RE: results differ on two queries, based on secondary index key and partition key

Durity, Sean R Wed, 29 Mar 2017 10:36:06 -0700

This looks more like a problem for a graph-based model. Have you looked at DSE 
Graph as a possibility?

Sean Durity
From: ferit baver elhuseyni [mailto:feritba...@gmail.com]
Sent: Tuesday, March 14, 2017 11:40 AM
To: user@cassandra.apache.org
Subject: results differ on two queries, based on secondary index key and 
partition key

Hi all,

We are using a C* 2.2.8 cluster in our production system, composed of 5 nodes 
in 1 DC with RF=3. Our clients mostly write with CL.ALL and read with CL.ONE 
(both will be switched to quorum soon).
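
For reference, the usual replica-overlap rule behind these consistency levels can be sketched as below. The function names are illustrative only and not part of any driver. The rule covers only writes that actually succeeded at the stated level: a write that fails at CL.ALL may still have reached some replicas.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a quorum for a given replication factor."""
    return rf // 2 + 1


def replicas_overlap(write_acks: int, read_acks: int, rf: int) -> bool:
    """True when any replica set read must share a node with any set written."""
    return write_acks + read_acks > rf


RF = 3
all_one = replicas_overlap(RF, 1, RF)                      # write ALL, read ONE
quorum_quorum = replicas_overlap(quorum(RF), quorum(RF), RF)  # both QUORUM
one_one = replicas_overlap(1, 1, RF)                       # both ONE
print(all_one, quorum_quorum, one_one)
```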

We face several problems while trying to persist a classical "follow 
relationship". Has anyone had similar problems, or does anyone have an idea 
what could be wrong?

1) First, our model. It is based on two tables, followers and followings, that 
should be identical in content. The first is for getting the followers of a 
user; the second is for getting who a user is following.

followings (uid bigint, ts timeuuid, fid bigint, PRIMARY KEY (uid, ts)) WITH 
CLUSTERING ORDER BY (ts DESC);

followers (uid bigint, ts timeuuid, fid bigint, PRIMARY KEY (uid, ts)) WITH 
CLUSTERING ORDER BY (ts DESC);

2) Both tables have secondary indexes on fid columns.

3) Naturally, a new follow relationship should insert one row into each table, 
and a delete should remove the row from both.

Problems :

1) We have a serious discrepancy between the tables. According to "nodetool 
cfstats", followings is 18 MB and followers is 19 MB in total. To demonstrate 
the problem, I fetched the followers of the most-followed user from both 
tables.

A) select * from followers where uid=12345678
B) select * from followings where fid=12345678

Using a small script on Unix, I found the following about sets A and B:
count( A < B ) = 1247
count( B < A ) = 185
count( A ∩ B ) = 20894

2) Even more interesting: if I query the followers table on the secondary 
index, I don't get a row that I do get when filtering on just the partition 
key. Let me try to visualize it:

select uid,ts,fid from followers where fid=X (cannot find uid=12345678)
     A | BBB | X
     C | DDD | X
     E | FFF | X

select uid,ts,fid from followers where uid=12345678 | grep X
 12345678 | GGG | X
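
The comparison from problem 1 can be sketched roughly as follows. This assumes the follower ids returned by queries A and B have already been exported into two Python collections. The post does not define its "A < B" notation, so this sketch reports plain set differences instead, which is an assumption. The helper name and the sample ids are hypothetical.

```python
def compare_sets(rows_a, rows_b):
    """Count the ids found only in A, only in B, and in both result sets."""
    a, b = set(rows_a), set(rows_b)
    return {
        "only_in_a": len(a - b),  # in followers (query A) but not followings
        "only_in_b": len(b - a),  # in followings (query B) but not followers
        "in_both": len(a & b),
    }


# Made-up sample ids, only to show the shape of the result.
followers_a = [101, 102, 103, 104]
followings_b = [103, 104, 105]
print(compare_sets(followers_a, followings_b))
```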

My thoughts :

1) Currently, we don't use batches during inserts and deletes to both tables. 
Would this help with our problems?

2) I was first suspicious of a corruption in secondary indexes. But actually, 
through the use of secondary index, I get consistent results.

3) I also thought this could be a case of zombie rows. However, we haven't had 
any long downtimes on our nodes. But, to our shame, we haven't been running any 
scheduled repairs on the cluster.

4) Finally, do you think that there may be a problem with our modelling?
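
On thought 1, a minimal sketch of what a logged batch for one follow could look like, built here as plain CQL text. The function name is hypothetical, and real code would more likely use prepared statements through a driver. It assumes followings is keyed by the follower and followers by the followed user, as queries A and B suggest. A logged batch only addresses a write landing in one table and not the other; whether it would explain the discrepancy above is not established.

```python
import uuid


def follow_batch_cql(follower_id: int, followee_id: int, ts=None) -> str:
    """Build one logged batch that writes the same follow to both tables."""
    # uuid1() yields a version-1 (time-based) UUID, usable as a CQL timeuuid.
    # Sharing one value keeps the clustering key identical in both rows.
    ts = ts or uuid.uuid1()
    follower_id, followee_id = int(follower_id), int(followee_id)
    return (
        "BEGIN BATCH\n"
        f"  INSERT INTO followings (uid, ts, fid) "
        f"VALUES ({follower_id}, {ts}, {followee_id});\n"
        f"  INSERT INTO followers (uid, ts, fid) "
        f"VALUES ({followee_id}, {ts}, {follower_id});\n"
        "APPLY BATCH;"
    )


print(follow_batch_cql(42, 12345678))
```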

Thanks in advance.

