Hi, thanks for the answer and sorry for the delay. Let me answer inline.
On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <[email protected]> wrote:

> > * select id from table where token(id) > token(some_value) and
> > secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with
> > 130.000 records, a single node Cassandra server (on my i7 notebook, 2GB
> > of JVM heap) and a secondary index built on a column with low
> > cardinality of its value set, this query takes 156 seconds to finish.
>
> Yes, this is why you have to add allow_filtering. You are asking the
> nodes to read all the data that matches and filter in memory, that's a
> SQL type operation.
>
> Your example query is somewhat complex and I doubt it could get decent
> performance, what does the query plan look like?

I don't know. How do I find out? The only mention of a query plan in
Cassandra I found is your article on your site, from 2011, covering
version 0.8.

The example query completes in a fraction of the time if I just fetch all
rows matching the token function and perform the filtering client side.

> IMHO you need to do further de-normalisation, you will get the best
> performance when you select rows by their full or part primary key.

I denormalize as much as I can. The problem is I need to support paging
and filtering at the same time. The API I must support allows
filter-by-example queries and paging - so how should I denormalize? Should
I somehow manage pages of primary row keys manually? Or should I maintain
a manual secondary index and somehow page through the denormalized wide
row?

The trouble goes even further - even this doesn't perform well:

select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;

where id and pk_cluster form the primary key (CQL3 table). I would expect
this to be an ordered row query plus an ordered column slice query, so
where is the performance problem?
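For the record, the client-side workaround I describe above can be sketched as follows - a self-contained simulation in plain Python (no Cassandra driver involved; the row set, the hash standing in for the partitioner's token function, and the "state" column are all invented for the example):

```python
import hashlib

def token(id_):
    # Stand-in for the partitioner's token function (assumption: any
    # deterministic hash is good enough for the simulation).
    return int(hashlib.md5(id_.encode()).hexdigest()[:8], 16)

# Fake table: 20 rows with a low-cardinality "state" column, as in my case.
rows = {"row%02d" % i: {"state": "active" if i % 4 else "retired"}
        for i in range(20)}

def page_by_token(last_token, page_size):
    # select id from table where token(id) > token(last) limit N -
    # an ordered walk of the token ring, simulated in memory.
    ordered = sorted(rows, key=token)
    return [r for r in ordered if token(r) > last_token][:page_size]

def filtered_page(last_token, page_size, state):
    # Fetch token-ordered pages and filter on the client, instead of
    # "secondary_index = val ... allow filtering" on the server.
    out, t = [], last_token
    while len(out) < page_size:
        page = page_by_token(t, page_size)
        if not page:
            break
        out += [r for r in page if rows[r]["state"] == state]
        t = token(page[-1])
    return out[:page_size]

first = filtered_page(-1, 2, "retired")
print(first)  # two row ids whose state is "retired", in token order
```

The last token of a returned page is the cursor for the next call, which is exactly how the paging in the API is driven.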
> > By the way, the performance is an order of magnitude better if this
> > patch is applied:
>
> That looks like it's tuned to your specific need, it would ignore the max
> results included in the query

It is tuned - it only demonstrates that the heuristic doesn't work well.

> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row
> > ids - scans all columns of all the rows and (probably) compares TTL
> > with current time (?) (we saw hundreds of thousands of gettimeofday(2)
> > calls). This means that if the table somehow mixes wide and narrow
> > rows, the performance suffers horribly.
>
> Select all rows from a table requires a range scan, which reads all rows
> from all nodes. It should never be used in production.

The trouble is I just need to perform it, sometimes. I know what the
problem with the query is, but I have just 150.000 records - the dataset
can be stored entirely in memory and the SSTables can be fully mmapped.
There is no reason for this query to be slow in this case.

> Not sure what you mean by "scans all columns from all rows", a select by
> column name will use a SliceByNamesReadCommand which will only read the
> required columns from each SSTable (it normally short circuits though
> and reads from less).

The query should fetch only IDs, yet it checks the TTLs of the columns.
That is the point. Why does it do that?

> if there is a TTL the ExpiringColumn.localExpirationTime must be
> checked, if there is no TTL it will not be checked.

It is a standard CQL3 table with an ID, a couple of columns and a CQL3
collection. I didn't do anything with TTL on the table or its columns.

> > As Cassandra checks all the columns in selects, performance suffers
> > badly if the collection is of any interesting size.
>
> This is not true, could you provide an example where you think this is
> happening?

We saw it in the trace log. It happened in the select id from table query.
The table had a collection column.
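To illustrate what we suspect is going on (a toy model only - this is not Cassandra's actual code path): if column liveness is checked with a fresh clock read per column, a full scan translates into one gettimeofday(2) per cell, which matches what we saw in strace; reading the clock once per query would not. Sketched in plain Python, with invented names:

```python
import time

class Column:
    def __init__(self, name, local_expiration_time=None):
        self.name = name
        # None means no TTL; otherwise a unix timestamp, conceptually
        # like ExpiringColumn.localExpirationTime.
        self.local_expiration_time = local_expiration_time

def live_columns_naive(columns):
    # One clock read per column - the pattern that would explain the
    # flood of gettimeofday(2) calls in the trace.
    return [c for c in columns
            if c.local_expiration_time is None
            or c.local_expiration_time > time.time()]

def live_columns_hoisted(columns, now=None):
    # Same result with a single clock read per query.
    now = time.time() if now is None else now
    return [c for c in columns
            if c.local_expiration_time is None
            or c.local_expiration_time > now]

cols = [Column("a"),                        # no TTL, always live
        Column("b", time.time() - 10),      # expired 10s ago
        Column("c", time.time() + 60)]      # expires in 60s
print([c.name for c in live_columns_hoisted(cols)])  # ['a', 'c']
```

If this is indeed the pattern, the syscall count scales with the number of cells scanned, not with the number of rows returned - which would explain why mixing wide and narrow rows hurts so much.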
> > Additionally, we saw various random irreproducible freezes, high CPU
> > consumption when nothing happens (even with trace log level set no
> > activity was reported) and highly unpredictable performance
> > characteristics after nodetool flush and/or major compaction.
>
> What was the HW platform and what was the load?

My i7/8GB notebook, single node cluster, and a virtualised AWS-like
environment, on nodes of various sizes.

> Typically freezes in the server correlate to JVM GC, the JVM GC can also
> be using the CPU.
> If you have wide rows or make large reads you may run into more JVM GC
> issues.
>
> nodetool flush will (as it says) flush all the tables to disk, if you
> have a lot of tables and/or a lot of secondary indexes this can cause
> the switch lock to be held, preventing write threads from progressing.
> Once flush threads stop waiting on the flush queue the lock will be
> released. See the help for memtable_flush_queue_size in the yaml file.

I will check this, thanks.

> major compaction is not recommended to be used in production. If you are
> seeing it cause performance problems I would guess it is related to JVM
> GC and/or the disk IO is not able to keep up. When used it creates a
> single SSTable for each table which will not be compacted again until
> (default) 3 other large SSTables are created or you run major compaction
> again. For this reason it is not recommended.
>
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of all row
> >   keys traversal
>
> These features all have a use, but it looks like you leaned on them
> heavily while creating a relational model. Especially the filtering, you
> have to explicitly enable it to prevent the client sending queries that
> will take a long time.
> The only time row key traversal is used normally is reading data through
> hadoop.
> You should always strive to read row(s) from a table by the full or
> partial primary key.
>
> > With these conclusions in mind, CQL seems redundant, plain old thrift
> > may be used, joins should be done client side and/or all indexes need
> > to be handled manually. Correct?
>
> No.
> CQL provides a set of functionality not present in the thrift API.
> Joins and indexes should generally be handled by denormalising the data
> during writes.
>
> It sounds like your data model was too relational, you need to
> denormalise and read rows by primary key. Secondary indexes are useful
> when you have a query pattern that is used infrequently.

regards,
ondrej cernos

> Hope that helps.
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 18/12/2013, at 3:47 am, Ondřej Černoš <[email protected]> wrote:
>
> > Hi all,
> >
> > we are reimplementing a legacy interface of an inventory-like service
> > (currently built on top of mysql) on Cassandra and I thought I would
> > share some findings with the list. The interface semantics is given
> > and cannot be changed. We chose Cassandra due to its multiple
> > datacenter capabilities and no-spof qualities. The dataset is small (6
> > tables having 150.000 records, a bunch of tables with up to thousands
> > of records), but the model is not trivial - the mysql model has some
> > 20+ tables, joins are frequent, m:n relationships are frequent and the
> > like. The interface is read heavy. We thought the size of the dataset
> > should allow the whole dataset to fit into memory of each node (3 node
> > cluster in each DC, 3 replicas, local quorum operations) and that even
> > though some operations (like secondary index lookup) are not
> > superfast, due to the size of the dataset it should perform ok. We
> > were wrong.
> > We use CQL3 exclusively and we use all of its capabilities
> > (collections, secondary indexes, filtering), because they make the
> > data model maintainable. We denormalised what had to be denormalised
> > in order to avoid client side joins. A usual query to the storage
> > means one CQL query on a denormalised table. We need to support
> > integer offset/limit paging, filter-by-example kind of queries, M:N
> > relationship queries and all the usual suspects of an old SQL-backed
> > interface.
> >
> > This is the list of operations we identified so far that perform
> > really poorly. The row id is called id in the following:
> >
> > * select id from table where token(id) > token(some_value) and
> >   secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with
> > 130.000 records, a single node Cassandra server (on my i7 notebook,
> > 2GB of JVM heap) and a secondary index built on a column with low
> > cardinality of its value set, this query takes 156 seconds to finish.
> > By the way, the performance is an order of magnitude better if this
> > patch is applied:
> >
> > diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > index 5ab1df6..13af671 100644
> > --- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > +++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > @@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
> >
> >      private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
> >      // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
> > -    private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +    // private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +    private final int rowsPerQuery = 100000;
> >
> >      public boolean needsFiltering()
> >      {
> >
> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row
> > ids - scans all columns of all the rows and (probably) compares TTL
> > with current time (?) (we saw hundreds of thousands of gettimeofday(2)
> > calls). This means that if the table somehow mixes wide and narrow
> > rows, the performance suffers horribly.
> >
> > * CQL collections
> >
> > See the point above about mixing wide rows and narrow rows. As
> > Cassandra checks all the columns in selects, performance suffers badly
> > if the collection is of any interesting size.
> >
> > Additionally, we saw various random irreproducible freezes, high CPU
> > consumption when nothing happens (even with trace log level set no
> > activity was reported) and highly unpredictable performance
> > characteristics after nodetool flush and/or major compaction.
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of all row
> >   keys traversal
> >
> > With these conclusions in mind, CQL seems redundant, plain old thrift
> > may be used, joins should be done client side and/or all indexes need
> > to be handled manually. Correct?
> >
> > Thanks for reading,
> >
> > ondrej cernos
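PS: to make the "all indexes need to be handled manually" option from my conclusions concrete, this is roughly what I mean - a plain Python simulation (table and column names invented; in Cassandra the two structures would be two CQL3 tables that the application keeps in sync on every write, e.g. with a batch of two INSERTs):

```python
# Simulated base table and manual index table.
table = {}           # id -> row
index_by_state = {}  # state -> set of ids (conceptually one wide row per state)

def upsert(id_, row):
    # Write path: update the base table and the manual index together,
    # removing the old index entry if the indexed value changed.
    old = table.get(id_)
    if old is not None:
        index_by_state[old["state"]].discard(id_)
    table[id_] = row
    index_by_state.setdefault(row["state"], set()).add(id_)

def ids_by_state(state):
    # Read path: a single lookup by key - no scan, no filtering.
    return sorted(index_by_state.get(state, ()))

upsert("a", {"state": "active"})
upsert("b", {"state": "retired"})
upsert("a", {"state": "retired"})  # state change moves the index entry
print(ids_by_state("retired"))     # ['a', 'b']
print(ids_by_state("active"))      # []
```

The price is paid on the write path and in the application code, which is exactly the maintainability cost I was hoping CQL3 secondary indexes would spare us.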
