Re: Cassandra Read Path Code Navigation

Bhuvan Rawal Tue, 14 Jun 2016 01:15:07 -0700

Thanks Oleksandr,

I'll have a look at the video you suggested, and I'll also debug the classes
you pointed to in order to find where the problem may lie. I have been pointing
at SSTables here because I issued `nodetool flush`, copied the SSTables to my
local computer and reproduced this issue (no Memtables are involved in that
case, as there are no writes on the local machine) on issuing
`nodetool compact`. I then started Cassandra in debug mode using these
arguments in cassandra-env.sh:


# Cassandra Debugging Arguments
JVM_OPTS="$JVM_OPTS -Xdebug"
JVM_OPTS="$JVM_OPTS -Xnoagent"
JVM_OPTS="$JVM_OPTS -Djava.compiler=NONE"
JVM_OPTS="$JVM_OPTS -Xrunjdwp:transport=dt_socket,server=y,address=5005,suspend=n"

Debugging mode works, and I get a breakpoint hit in the fetchRows() method of
the StorageProxy class.

I have created a Jira ticket, CASSANDRA-12003
<https://issues.apache.org/jira/browse/CASSANDRA-12003>, for reference. If
required, I can upload the SSTables for reproducing the issue (37 MB in total
for this table). In the meantime, we are also looking at our application to
see in which scenario this happened.

Best Regards,
Bhuvan

On Tue, Jun 14, 2016 at 12:36 PM, Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:

> Hi,
>
> The query behaviour should not rely on the compaction. It'd be great to
> have a Jira ticket for that.
>
> It'd also be very useful if you described your setup a bit more. Are you
> using SASI for LIKE queries (assuming the wildcards in '*Size_s*')?
>
> As for the read path, there's a nice talk by Tyler Hobbs about
> read/write paths which might get you started:
> https://www.youtube.com/watch?v=9Id5me7QFHU
>
> Briefly, there are several paths, each one used in its own setting, and
> multiple things play together for the query execution. It all starts with
> the SelectStatement, which determines whether it's a single partition
> (SinglePartitionReadCommand) or partition range read
> (PartitionRangeReadCommand).
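To make this dispatch concrete, here is a toy Java model of the decision described above. It is not Cassandra's API: `ReadDispatchSketch`, `ReadKind` and `choose` are hypothetical names, and the rule is deliberately simplified (only `=` restrictions, no `IN` and no token restrictions), assuming that a query pinning every partition key column by equality addresses exactly one partition.

```java
import java.util.List;
import java.util.Map;

public class ReadDispatchSketch {

    // Hypothetical labels mirroring the two command classes named in the thread.
    enum ReadKind { SINGLE_PARTITION, PARTITION_RANGE }

    // Toy rule: if every partition key column has an equality restriction,
    // the query addresses exactly one partition; otherwise it must scan a range.
    static ReadKind choose(List<String> partitionKeyColumns,
                           Map<String, Object> equalityRestrictions) {
        for (String column : partitionKeyColumns) {
            if (!equalityRestrictions.containsKey(column)) {
                return ReadKind.PARTITION_RANGE;
            }
        }
        return ReadKind.SINGLE_PARTITION;
    }

    public static void main(String[] args) {
        List<String> partitionKey = List.of("id"); // assumed: id is the whole partition key
        // WHERE id = 2429 AND filter_name = 'Size_s'
        System.out.println(choose(partitionKey, Map.of("id", 2429, "filter_name", "Size_s")));
        // WHERE filter_name = 'Size_s'
        System.out.println(choose(partitionKey, Map.of("filter_name", "Size_s")));
    }
}
```

Under this toy rule, the query from this thread takes the single-partition path, since `id` is pinned by equality.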
>
> StatementRestrictions are responsible for everything related to the WHERE
> clause of the query, including index queries and filtering. You can see
> that restrictions are separated into partition key restrictions,
> clustering column restrictions and non-primary-key column restrictions.
> Here, bounds are created for queries whose keys are specified in an order
> that makes it possible to query without filtering; otherwise, filter
> restrictions are added.
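A similarly simplified sketch of the split between bounds and filters follows. It assumes equality restrictions only and no secondary index; `RestrictionSketch`, `Plan` and `plan` are hypothetical names and do not correspond to the classes inside StatementRestrictions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RestrictionSketch {

    // Hypothetical split of restricted columns: "bounds" narrow the slice read
    // from storage, "filters" must be checked row by row afterwards.
    static final class Plan {
        final List<String> bounds = new ArrayList<>();
        final List<String> filters = new ArrayList<>();
    }

    // Toy rule (equality restrictions only): clustering columns restricted in an
    // unbroken prefix become bounds; a clustering column after a gap, and any
    // restricted regular (non-primary-key) column, becomes a filter.
    static Plan plan(List<String> clusteringColumns,
                     List<String> regularColumns,
                     Set<String> restricted) {
        Plan plan = new Plan();
        boolean prefixBroken = false;
        for (String column : clusteringColumns) {
            if (!restricted.contains(column)) {
                prefixBroken = true;          // later columns can no longer form a bound
            } else if (prefixBroken) {
                plan.filters.add(column);
            } else {
                plan.bounds.add(column);
            }
        }
        for (String column : regularColumns) {
            if (restricted.contains(column)) {
                plan.filters.add(column);     // no index assumed, so it must be filtered
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Assumed schema: PRIMARY KEY (id, filter_name)
        Plan p = plan(List.of("filter_name"), List.of(), Set.of("id", "filter_name"));
        System.out.println(p.bounds + " " + p.filters);   // [filter_name] []
        // Assumed schema: PRIMARY KEY (id, a, b), regular column c; b and c restricted
        Plan q = plan(List.of("a", "b"), List.of("c"), Set.of("b", "c"));
        System.out.println(q.bounds + " " + q.filters);   // [] [b, c]
    }
}
```

If `filter_name` is the clustering column of navigation_bucket_filter (the schema isn't shown in this thread), this toy rule would serve the query from a clustering bound with no filtering step at all.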
>
> Usually it's not necessary to go all the way down to the SSTables, as the
> read path is very well abstracted from the underlying storage, since the
> read involves a Memtable lookup as well. Also, if the query returns too many
> results, it seems that the engine itself did a good job of reading results
> from disk, but some of them were not filtered out.
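The point about the Memtable lookup can be illustrated with a toy merge: rows from every source (the Memtable and each SSTable) are reconciled by clustering key with last-write-wins, and the restriction is then applied to the merged result. This is a hypothetical sketch (`MergeThenFilterSketch`, `Row`, `merge` and `filter` are made-up names; tombstones, TTLs and per-cell reconciliation are ignored), and the real engine may push clustering filters down into each source rather than filtering only after the merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeThenFilterSketch {

    // Hypothetical row: a clustering value plus the timestamp of its latest write.
    static final class Row {
        final String clustering;
        final long timestamp;
        Row(String clustering, long timestamp) { this.clustering = clustering; this.timestamp = timestamp; }
    }

    // Merge all sources by clustering key, keeping the newest version of each
    // row (last write wins); TreeMap keeps the result in clustering order.
    static List<Row> merge(List<List<Row>> sources) {
        Map<String, Row> merged = new TreeMap<>();
        for (List<Row> source : sources) {
            for (Row row : source) {
                Row existing = merged.get(row.clustering);
                if (existing == null || row.timestamp > existing.timestamp) {
                    merged.put(row.clustering, row);
                }
            }
        }
        return new ArrayList<>(merged.values());
    }

    // Apply the equality restriction to the merged rows, so only matches survive.
    static List<Row> filter(List<Row> rows, String wanted) {
        List<Row> out = new ArrayList<>();
        for (Row row : rows) {
            if (row.clustering.equals(wanted)) out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> memtable = List.of(new Row("Size_s", 30));
        List<Row> sstable1 = List.of(new Row("Brand", 10), new Row("Size_s", 10));
        List<Row> sstable2 = List.of(new Row("sellerCode", 20));
        List<Row> result = filter(merge(List.of(memtable, sstable1, sstable2)), "Size_s");
        System.out.println(result.size() + " " + result.get(0).timestamp); // 1 30
    }
}
```

Whatever the individual sources contain, a restriction on `Size_s` should leave exactly one row, which is the behaviour the query in this thread failed to show.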
>
> Hope this helps,
>
> On Tue, Jun 14, 2016 at 7:26 AM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
> > Hi All,
> >
> > I'm debugging an issue in Cassandra 3.5 that was reported on the user
> > mailing list earlier and is pretty critical for us to solve. I'll give a
> > brief intro. On issuing this query:
> >
> > select id,filter_name from navigation_bucket_filter where id=2429 and
> > filter_name='*Size_s*';
> >
> >  id   | filter_name
> > ------+----------------------
> >  2429 | AdditionalProperty_s
> >  2429 |                Brand
> > ------------more rows-------
> >  2429 |               Size_s
> > ------------more rows-------
> >  2429 |         sdFullfilled
> >  2429 |           sellerCode
> >
> > (16 rows)
> >
> > Whereas *only one result was expected* (the row bearing filter_name
> > Size_s), we got that result along with 15 other unexpected rows.
> >
> > The total number of rows in the partition is 20, as verified both with
> > select id,filter_name from navigation_bucket_filter where id=2429; and
> > with a JSON dump. We are wondering why Cassandra could not filter the
> > results completely. I have checked that the data is intact by taking a
> > JSON dump and validating it with the sstabledump tool.
> >
> > The issue was resolved in production by running nodetool compact, but
> > debugging what led to it is critical, as issuing a manual compaction may
> > not be possible every time.
> >
> > I copied the SSTables of the particular table onto my local machine and
> > *have been able to reproduce the same* issue. While running Cassandra in
> > debug mode, I have been able to connect my IDE to it, but unfortunately I
> > have not been able to navigate very far into the read path. I would be
> > glad to get some pointers on where in the code SSTables are read and the
> > partition is filtered.
> >
> > Secondly, I wanted to know if there is a way to read the other SSTable
> > component files (partition index, Filter.db, Statistics.db, et al.) as
> > well as the commit log. If no such utility currently exists but one can be
> > created from existing classes, please let me know; I would love to build
> > and share one.
> >
> >
> > Best Regards,
> > Bhuvan Rawal
> >
> --
> Alex Petrov
>
