Hi *,

We're experimenting with a 4-node Cassandra setup, aggregating input streams and persisting time-bucket-based aggregation results.

One test was to create "full dumps" of a table with around 32 million existing entries. The keyspace was created with a replication factor of 2 and the CF has 40 sstables across the four nodes.

The output of "describe table mykeyspace.mycf;":

--- cut here ---
CREATE TABLE mykeyspace.mycf (
    a timestamp,
    b bigint,
    c bigint,
    d bigint,
    e boolean,
    f text,
    g bigint,
    h text,
    i bigint,
    j text,
    k text,
    l bigint,
    m text,
    n text,
    o text,
    p text,
    q text,
    r bigint,
    PRIMARY KEY ((a, b, c), d, e, f)
) WITH CLUSTERING ORDER BY (d ASC, e ASC, f ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
--- cut here ---

Cassandra is at version 2.1.2.

The test program is written in Java and uses the DataStax Java driver 2.1.3. It issues the query "select * from mykeyspace.mycf;" as a SimpleStatement, reads each row (without further local processing), counts the number of rows returned, and measures the elapsed time.
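
For reference, a minimal sketch of what that scan loop looks like (class name and contact point are placeholders, not our actual code):

--- cut here ---
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class FullScanTest {
    public static void main(String[] args) {
        // "127.0.0.1" is a placeholder contact point
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        SimpleStatement stmt = new SimpleStatement("select * from mykeyspace.mycf;");
        stmt.setFetchSize(500); // paging size, see below

        long start = System.nanoTime();
        long count = 0;
        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {   // iterating transparently fetches further pages
            count++;           // no local processing, just counting
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;

        System.out.printf("%d rows in %d ms (%.2f ms/row)%n",
                count, elapsedMs, (double) elapsedMs / count);
        cluster.close();
    }
}
--- cut here ---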

Across multiple invocations, the time per row averages about 10.8 milliseconds, which adds up to a very long total run time for the whole table.

When selecting a single column instead of all columns, the average time per row stays at a similar level.

What puzzles me most: during the query (with no other activity running on the Cassandra nodes), I see plenty of idle CPU and close to no iowait on all four nodes.

Depending on the value passed to SimpleStatement.setFetchSize(), we run into more or less (up to severe) GC trouble on the client, so we settled on a fetch size of 500 (1000 works, too), which keeps the GC stress level low. Looking at the Cassandra logs, I do see some GC activity (we're currently using G1GC), but the pauses are mostly around a few hundred milliseconds and not too frequent - so GC blocking should not be the issue. The JVMs seem to have plenty of memory left, according to jconsole.
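
As far as I understand the 2.1 driver, the fetch size is simply the automatic-paging page size, and the next page is requested only once the current one is exhausted - unless it is prefetched explicitly. A rough illustration of that API (not something we currently do, just how I read the driver docs):

--- cut here ---
// session and stmt as in the scan sketch above; illustration only.
long count = 0;
ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    // When the current page is nearly drained, request the next one in the
    // background so the client doesn't sit idle between pages.
    if (rs.getAvailableWithoutFetching() == 100 && !rs.isFullyFetched()) {
        rs.fetchMoreResults();
    }
    count++;
}
--- cut here ---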

I have tried running the query with tracing enabled, but the number of trace entries was overwhelming. Scanning the part that fit into the screen buffer, nothing obvious turned up.
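
For completeness, a sketch of how the trace for a single statement could be pulled through the driver instead of cqlsh, so it can at least be filtered locally - class and method names as I read them from the 2.1 driver docs, not tested code:

--- cut here ---
// Needs com.datastax.driver.core.QueryTrace; stmt and session as above.
stmt.enableTracing();
ResultSet rs = session.execute(stmt);
QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
System.out.printf("trace %s took %d us%n",
        trace.getTraceId(), trace.getDurationMicros());
for (QueryTrace.Event e : trace.getEvents()) {
    // filter locally, e.g. only show events above some elapsed-time threshold
    if (e.getSourceElapsedMicros() > 10000) {
        System.out.printf("%8d us  %s (%s)%n",
                e.getSourceElapsedMicros(), e.getDescription(), e.getSource());
    }
}
--- cut here ---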

Given that the servers do not seem to be resource-constrained, adding more servers will likely not improve the response time.

How could I shed some light on why this query does not drive the servers to their CPU or I/O limits?

Regards,
Jens

PS: Doing full scans seems to be an anti-pattern - selecting individual records from our tables works sufficiently fast ATM. We're looking into changes to the solution architecture so we can avoid dumping such big tables, but nevertheless I'm curious what limit we're hitting... what's restricting the flow of data?
