Hi,

Query behaviour should not rely on compaction, so it'd be great to have a
Jira ticket for that.
It'd also be very useful if you described your setup a bit more: are you
using SASI for LIKE queries (assuming wildcards in '*Size_s*')?

Regarding the read path, there's a nice talk by Tyler Hobbs about the
read/write paths which might get you started:
https://www.youtube.com/watch?v=9Id5me7QFHU

Briefly, there are several paths, each one used in its own setting, and
multiple things play together during query execution. It all starts with
SelectStatement, which determines whether it's a single-partition read
(SinglePartitionReadCommand) or a partition range read
(PartitionRangeReadCommand). StatementRestrictions is responsible for
everything related to the WHERE clause of the query, including index
queries and filtering. You can see that the restrictions are separated
into partition key restrictions, clustering column restrictions and
non-primary column restrictions. Here, bounds are created for queries
whose keys are specified in an order that makes it possible to query
without filtering, and filter restrictions are added otherwise. (I've put
a rough sketch of this flow at the very end of this mail, below the quoted
message.)

Usually it's not necessary to go all the way down to the SSTables: the
read path is well abstracted from the underlying storage, and a read
involves a memtable lookup as well. Also, since the query returns too many
results, it seems that the engine itself did a good job of reading results
from disk, but some of them were not filtered out.

Hope this helps,

On Tue, Jun 14, 2016 at 7:26 AM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Hi All,
>
> I'm debugging an issue in Cassandra 3.5 which was reported on the user
> mailing list earlier; it is pretty critical to solve at our end. I'll
> give a brief intro. On issuing this query:
>
> select id,filter_name from navigation_bucket_filter where id=2429 and
> filter_name='*Size_s*';
>
>   id  | filter_name
> ------+----------------------
>  2429 | AdditionalProperty_s
>  2429 | Brand
> ------------more rows-------
>  2429 | Size_s
> ------------more rows-------
>  2429 | sdFullfilled
>  2429 | sellerCode
>
> (16 rows)
>
> Whereas *only one result was expected* (the row bearing filter_name
> 'Size_s'), we got that result along with 15 other unexpected rows.
>
> The total number of rows in the partition is 20 (verified using select
> id,filter_name from navigation_bucket_filter where id=2429; as well as a
> json dump). We are wondering why Cassandra could not filter the results
> completely. I have checked that the data is intact by taking a json dump
> and validating it using the sstabledump tool.
>
> The issue was resolved on production by running nodetool compact, but it
> is critical to debug what led to this, and issuing a manual compaction
> may not be possible every time.
>
> I copied the sstables of the particular table onto my local machine and
> *have been able to reproduce the same* issue. While running Cassandra in
> debug mode I have been able to connect my IDE to it, but unfortunately I
> have not been able to navigate very far in the read path. I would be glad
> to get some pointers on where in the code SSTables are read and the
> partition is filtered.
>
> Secondly, I wanted to know if there is a way to read the other SSTable
> files (the partition index, Filter.db, Statistics.db, et al.) as well as
> the commitlog. If such a utility does not exist currently but can be
> created from existing classes, please let me know; I would love to build
> and share one.
>
>
> Best Regards,
> Bhuvan Rawal

--
Alex Petrov
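
P.S. Since you asked where in the code to look, here is a rough standalone
sketch (plain Java, nothing from the Cassandra tree) of the decision
described above. All class and method names in it are made up for
illustration; the real SelectStatement / StatementRestrictions /
ReadCommand classes are of course far richer. It only shows where
"key-based lookup" vs "row filtering" gets decided.

    // NOT actual Cassandra code: a simplified, self-contained model of how a
    // WHERE clause turns into either a single-partition read or a range read,
    // with leftover restrictions becoming a row filter. Names are invented.
    import java.util.ArrayList;
    import java.util.List;

    public class ReadPathSketch {

        /** One WHERE-clause restriction, e.g. filter_name = 'Size_s'. */
        static class Restriction {
            final String column, op;
            final Object value;
            Restriction(String column, String op, Object value) {
                this.column = column; this.op = op; this.value = value;
            }
            public String toString() { return column + " " + op + " " + value; }
        }

        interface ReadCommand {}

        /** Stand-in for SinglePartitionReadCommand: one key, clustering bounds, row filter. */
        static class SinglePartitionRead implements ReadCommand {
            final Object key;
            final List<Restriction> clusteringBounds, rowFilter;
            SinglePartitionRead(Object key, List<Restriction> bounds, List<Restriction> filter) {
                this.key = key; this.clusteringBounds = bounds; this.rowFilter = filter;
            }
            public String toString() {
                return "SinglePartitionRead[key=" + key + ", bounds=" + clusteringBounds
                        + ", filter=" + rowFilter + "]";
            }
        }

        /** Stand-in for PartitionRangeReadCommand: scans partitions, filters rows. */
        static class PartitionRangeRead implements ReadCommand {
            final List<Restriction> rowFilter;
            PartitionRangeRead(List<Restriction> filter) { this.rowFilter = filter; }
            public String toString() { return "PartitionRangeRead[filter=" + rowFilter + "]"; }
        }

        /**
         * The gist: a fully '='-restricted partition key gives a single-partition
         * read; clustering restrictions the primary key order can satisfy become
         * bounds; every other restriction becomes a row filter applied after rows
         * come back from memtables/sstables.
         */
        static ReadCommand makeReadCommand(int partitionKeyCols,
                                           List<Restriction> partitionKey,
                                           List<Restriction> clustering,
                                           List<Restriction> nonPrimary) {
            boolean keyFullySpecified = partitionKey.size() == partitionKeyCols;
            for (Restriction r : partitionKey)
                keyFullySpecified &= r.op.equals("=");

            List<Restriction> rowFilter = new ArrayList<Restriction>(nonPrimary);
            if (!keyFullySpecified) {
                // Partially specified (or missing) partition key: range read + filtering.
                rowFilter.addAll(partitionKey);
                rowFilter.addAll(clustering);
                return new PartitionRangeRead(rowFilter);
            }
            Object key = partitionKey.get(0).value;  // single-column key, for brevity
            return new SinglePartitionRead(key, clustering, rowFilter);
        }

        public static void main(String[] args) {
            // Roughly: SELECT ... WHERE id = 2429 AND filter_name = '*Size_s*'
            List<Restriction> pk = new ArrayList<Restriction>();
            pk.add(new Restriction("id", "=", 2429));
            List<Restriction> ck = new ArrayList<Restriction>();
            ck.add(new Restriction("filter_name", "=", "*Size_s*"));
            System.out.println(makeReadCommand(1, pk, ck, new ArrayList<Restriction>()));
        }
    }

In the real code, SelectStatement, StatementRestrictions and the two
ReadCommand implementations mentioned above are good places to set
breakpoints when stepping through your reproduced case.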