Hi, thanks for the answer and sorry for the delay. Let me answer inline.
On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <[email protected]> wrote:

> > * select id from table where token(id) > token(some_value) and
> > secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with
> > 130.000 records, a single node Cassandra server (on my i7 notebook, 2GB
> > of JVM heap) and a secondary index built on a column with low
> > cardinality of its value set, this query takes 156 seconds to finish.
>
> Yes, this is why you have to add allow_filtering. You are asking the
> nodes to read all the data that matches and filter in memory, that's a
> SQL type operation.
>
> Your example query is somewhat complex and I doubt it could get decent
> performance, what does the query plan look like?

I don't know. How do I find out? The only mention of a query plan in
Cassandra I found is your article on your site, from 2011, covering
version 0.8.

The example query completes in a fraction of the time if I just fetch all
rows matching the token function and perform the filtering client side.

> IMHO you need to do further de-normalisation, you will get the best
> performance when you select rows by their full or part primary key.

I denormalize as much as I can. The problem is I need to support paging
and filtering at the same time. The API I must support allows
filter-by-example queries and paging - so how should I denormalize? Should
I somehow manage pages of primary row keys manually? Or should I maintain
a manual secondary index and somehow page through the denormalized wide
row?

The trouble goes even further - even this doesn't perform well:

select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;

where id and pk_cluster form the primary key (CQL3 table). I would expect
this to be an ordered row query plus an ordered column slice query, so
where is the performance problem?
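For the record, the client-side workaround I describe above can be sketched as follows - a self-contained simulation in plain Python (no Cassandra driver involved; the row set, the hash standing in for the partitioner's token function, and the "state" column are all invented for the example):

```python
import hashlib

def token(id_):
    # Stand-in for the partitioner's token function (assumption: any
    # deterministic hash is good enough for the simulation).
    return int(hashlib.md5(id_.encode()).hexdigest()[:8], 16)

# Fake table: 20 rows with a low-cardinality "state" column, as in my case.
rows = {"row%02d" % i: {"state": "active" if i % 4 else "retired"}
        for i in range(20)}

def page_by_token(last_token, page_size):
    # select id from table where token(id) > token(last) limit N -
    # an ordered walk of the token ring, simulated in memory.
    ordered = sorted(rows, key=token)
    return [r for r in ordered if token(r) > last_token][:page_size]

def filtered_page(last_token, page_size, state):
    # Fetch token-ordered pages and filter on the client, instead of
    # "secondary_index = val ... allow filtering" on the server.
    out, t = [], last_token
    while len(out) < page_size:
        page = page_by_token(t, page_size)
        if not page:
            break
        out += [r for r in page if rows[r]["state"] == state]
        t = token(page[-1])
    return out[:page_size]

first = filtered_page(-1, 2, "retired")
print(first)  # two row ids whose state is "retired", in token order
```

The last token of a returned page is the cursor for the next call, which is exactly how the paging in the API is driven.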
> > By the way, the performance is an order of magnitude better if this
> > patch is applied:
>
> That looks like it's tuned to your specific need, it would ignore the max
> results included in the query

It is tuned - it only demonstrates that the heuristic doesn't work well.

> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row
> > ids - scans all columns of all the rows and (probably) compares TTL
> > with current time (?) (we saw hundreds of thousands of gettimeofday(2)
> > calls). This means that if the table somehow mixes wide and narrow
> > rows, the performance suffers horribly.
>
> Select all rows from a table requires a range scan, which reads all rows
> from all nodes. It should never be used in production.

The trouble is I just need to perform it, sometimes. I know what the
problem with the query is, but I have just 150.000 records - the dataset
can be stored entirely in memory and the SSTables can be fully mmapped.
There is no reason for this query to be slow in this case.

> Not sure what you mean by "scans all columns from all rows", a select by
> column name will use a SliceByNamesReadCommand which will only read the
> required columns from each SSTable (it normally short circuits though
> and reads from less).

The query should fetch only IDs, yet it checks the TTLs of the columns.
That is the point. Why does it do that?

> if there is a TTL the ExpiringColumn.localExpirationTime must be
> checked, if there is no TTL it will not be checked.

It is a standard CQL3 table with an ID, a couple of columns and a CQL3
collection. I didn't do anything with TTL on the table or its columns.

> > As Cassandra checks all the columns in selects, performance suffers
> > badly if the collection is of any interesting size.
>
> This is not true, could you provide an example where you think this is
> happening?

We saw it in the trace log. It happened in the select id from table query.
The table had a collection column.
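To illustrate what we suspect is going on (a toy model only - this is not Cassandra's actual code path): if column liveness is checked with a fresh clock read per column, a full scan translates into one gettimeofday(2) per cell, which matches what we saw in strace; reading the clock once per query would not. Sketched in plain Python, with invented names:

```python
import time

class Column:
    def __init__(self, name, local_expiration_time=None):
        self.name = name
        # None means no TTL; otherwise a unix timestamp, conceptually
        # like ExpiringColumn.localExpirationTime.
        self.local_expiration_time = local_expiration_time

def live_columns_naive(columns):
    # One clock read per column - the pattern that would explain the
    # flood of gettimeofday(2) calls in the trace.
    return [c for c in columns
            if c.local_expiration_time is None
            or c.local_expiration_time > time.time()]

def live_columns_hoisted(columns, now=None):
    # Same result with a single clock read per query.
    now = time.time() if now is None else now
    return [c for c in columns
            if c.local_expiration_time is None
            or c.local_expiration_time > now]

cols = [Column("a"),                        # no TTL, always live
        Column("b", time.time() - 10),      # expired 10s ago
        Column("c", time.time() + 60)]      # expires in 60s
print([c.name for c in live_columns_hoisted(cols)])  # ['a', 'c']
```

If this is indeed the pattern, the syscall count scales with the number of cells scanned, not with the number of rows returned - which would explain why mixing wide and narrow rows hurts so much.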
> > Additionally, we saw various random irreproducible freezes, high CPU
> > consumption when nothing happens (even with trace log level set no
> > activity was reported) and highly unpredictable performance
> > characteristics after nodetool flush and/or major compaction.
>
> What was the HW platform and what was the load?

My i7/8GB notebook, single node cluster, and a virtualised AWS-like
environment, on nodes of various sizes.

> Typically freezes in the server correlate to JVM GC, the JVM GC can also
> be using the CPU.
> If you have wide rows or make large reads you may run into more JVM GC
> issues.
>
> nodetool flush will (as it says) flush all the tables to disk, if you
> have a lot of tables and/or a lot of secondary indexes this can cause
> the switch lock to be held, preventing write threads from progressing.
> Once flush threads stop waiting on the flush queue the lock will be
> released. See the help for memtable_flush_queue_size in the yaml file.

I will check this, thanks.

> major compaction is not recommended to be used in production. If you are
> seeing it cause performance problems I would guess it is related to JVM
> GC and/or the disk IO is not able to keep up. When used it creates a
> single SSTable for each table which will not be compacted again until
> (default) 3 other large SSTables are created or you run major compaction
> again. For this reason it is not recommended.
>
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of all row
> >   keys traversal
>
> These features all have a use, but it looks like you leaned on them
> heavily while creating a relational model. Especially the filtering, you
> have to explicitly enable it to prevent the client sending queries that
> will take a long time.
> The only time row key traversal is used normally is reading data through
> hadoop.
> You should always strive to read row(s) from a table by the full or
> partial primary key.
>
> > With these conclusions in mind, CQL seems redundant, plain old thrift
> > may be used, joins should be done client side and/or all indexes need
> > to be handled manually. Correct?
>
> No.
> CQL provides a set of functionality not present in the thrift API.
> Joins and indexes should generally be handled by denormalising the data
> during writes.
>
> It sounds like your data model was too relational, you need to
> denormalise and read rows by primary key. Secondary indexes are useful
> when you have a query pattern that is used infrequently.

regards,
ondrej cernos

> Hope that helps.
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 18/12/2013, at 3:47 am, Ondřej Černoš <[email protected]> wrote:
>
> > Hi all,
> >
> > we are reimplementing a legacy interface of an inventory-like service
> > (currently built on top of mysql) on Cassandra and I thought I would
> > share some findings with the list. The interface semantics is given
> > and cannot be changed. We chose Cassandra due to its multiple
> > datacenter capabilities and no-spof qualities. The dataset is small (6
> > tables having 150.000 records, a bunch of tables with up to thousands
> > of records), but the model is not trivial - the mysql model has some
> > 20+ tables, joins are frequent, m:n relationships are frequent and the
> > like. The interface is read heavy. We thought the size of the dataset
> > should allow the whole dataset to fit into memory of each node (3 node
> > cluster in each DC, 3 replicas, local quorum operations) and that even
> > though some operations (like secondary index lookup) are not
> > superfast, due to the size of the dataset it should perform ok. We
> > were wrong.
> > We use CQL3 exclusively and we use all of its capabilities
> > (collections, secondary indexes, filtering), because they make the
> > data model maintainable. We denormalised what had to be denormalised
> > in order to avoid client side joins. A usual query to the storage
> > means one CQL query on a denormalised table. We need to support
> > integer offset/limit paging, filter-by-example kind of queries, M:N
> > relationship queries and all the usual suspects of an old SQL-backed
> > interface.
> >
> > This is the list of operations we identified so far that perform
> > really poorly. The row id is called id in the following:
> >
> > * select id from table where token(id) > token(some_value) and
> >   secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with
> > 130.000 records, a single node Cassandra server (on my i7 notebook,
> > 2GB of JVM heap) and a secondary index built on a column with low
> > cardinality of its value set, this query takes 156 seconds to finish.
> > By the way, the performance is an order of magnitude better if this
> > patch is applied:
> >
> > diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > index 5ab1df6..13af671 100644
> > --- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > +++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > @@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
> >
> >      private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
> >      // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
> > -    private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +    // private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +    private final int rowsPerQuery = 100000;
> >
> >      public boolean needsFiltering()
> >      {
> >
> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row
> > ids - scans all columns of all the rows and (probably) compares TTL
> > with current time (?) (we saw hundreds of thousands of gettimeofday(2)
> > calls). This means that if the table somehow mixes wide and narrow
> > rows, the performance suffers horribly.
> >
> > * CQL collections
> >
> > See the point above about mixing wide rows and narrow rows. As
> > Cassandra checks all the columns in selects, performance suffers badly
> > if the collection is of any interesting size.
> >
> > Additionally, we saw various random irreproducible freezes, high CPU
> > consumption when nothing happens (even with trace log level set no
> > activity was reported) and highly unpredictable performance
> > characteristics after nodetool flush and/or major compaction.
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of all row
> >   keys traversal
> >
> > With these conclusions in mind, CQL seems redundant, plain old thrift
> > may be used, joins should be done client side and/or all indexes need
> > to be handled manually. Correct?
> >
> > Thanks for reading,
> >
> > ondrej cernos
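PS: to make the "all indexes need to be handled manually" option from my conclusions concrete, this is roughly what I mean - a plain Python simulation (table and column names invented; in Cassandra the two structures would be two CQL3 tables that the application keeps in sync on every write, e.g. with a batch of two INSERTs):

```python
# Simulated base table and manual index table.
table = {}           # id -> row
index_by_state = {}  # state -> set of ids (conceptually one wide row per state)

def upsert(id_, row):
    # Write path: update the base table and the manual index together,
    # removing the old index entry if the indexed value changed.
    old = table.get(id_)
    if old is not None:
        index_by_state[old["state"]].discard(id_)
    table[id_] = row
    index_by_state.setdefault(row["state"], set()).add(id_)

def ids_by_state(state):
    # Read path: a single lookup by key - no scan, no filtering.
    return sorted(index_by_state.get(state, ()))

upsert("a", {"state": "active"})
upsert("b", {"state": "retired"})
upsert("a", {"state": "retired"})  # state change moves the index entry
print(ids_by_state("retired"))     # ['a', 'b']
print(ids_by_state("active"))      # []
```

The price is paid on the write path and in the application code, which is exactly the maintainability cost I was hoping CQL3 secondary indexes would spare us.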
