Re: results differ on two queries, based on secondary index key and partition key

ferit baver elhuseyni Fri, 31 Mar 2017 23:59:42 -0700

Thanks Steve. With your approach, I believe it will not allow double
entries for a follower-followee relationship which is good. In our
modeling, a timeuuid field is chosen to be the cluster key and this lets a
time-based ordering, which was one of the design goals I believe.


Actually, I started thinking that our problems are based on mis-management
of Cassandra. I wasn't at the core of cassandra activities from the
beginning, laterI learned that cluster ha evolved in a weird way until now :

- Started with a 3 node, 2RF, 1WC, 1RC configuration
- Reacting to some consistency problems, switched to 2WC
- Reacting to availability problems, 2 more nodes were added, but without
any repair. New configuration was chosen to be 5RF, QuorumWC, 1RC
- After some weird things (inconsistent results from nodes), switched to 3RF
- There has been no periodical repairs in 2 years.

Right now, we did a manual cleansing of the problematic table and are
waiting to observe arising new problems.


2017-03-29 22:10 GMT+03:00 Steve Robenalt <sroben...@highwire.org>:

> Hi Ferit,
>
> I have implemented the follower/following relationship in a manner similar
> to what you describe. However, I can't tell from the description of your
> tables if you are using the same convention I used. The two tables I used
> were symmetric, as with yours, but with the ids swapped. I inserted and
> removed in tandem, as you mention, and I used batches, though they probably
> aren't strictly necessary for this case.
>
> so for the followers table (which I interpret as "who are my followers?"),
> the table was defined as:
> followers: pk = (uid,fid) + fields used for display
> and for the following table (which I interpret as "who am I following?"),
> it was defined as:
> following: pk = (fid,uid) + fields relevant to that display
>
> so when user 87654321 follows user 12345678:
>
> insert into followers (12345678,87654321, ...)
> and
> insert into following (87654321,12345678, ...)
>
> Secondary indexes aren't needed for the 2 queries, though bulk deletes
> (i.e. remove all followers when a user is removed) take a bit of
> housekeeping.
>
> On Wed, Mar 29, 2017 at 10:35 AM, Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>> This looks more like a problem for a graph-based model. Have you looked
>> at DSE Graph as a possibility?
>>
>>
>>
>>
>>
>> Sean Durity
>>
>> *From:* ferit baver elhuseyni [mailto:feritba...@gmail.com]
>> *Sent:* Tuesday, March 14, 2017 11:40 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* results differ on two queries, based on secondary index key
>> and partition key
>>
>>
>>
>> Hi all,
>>
>>
>>
>>
>>
>> We are using a C* 2.2.8 cluster in our production system, composed of 5
>> nodes in 1 DC with RF=3. Our clients mostly write with CL.ALL and read with
>> CL.ONE (both will be switched to quorum soon).
>>
>>
>>
>> We face several problems while trying to persist classical "follow
>> relationship". Did anyone of you have similar problems / or have any idea
>> on what could be wrong?
>>
>>
>>
>> 1) First our model. It is based on two tables : follower and following,
>> that should be identical. First one is for queries on getting followers of
>> a user, latter is for getting who a user is following.
>>
>>
>>
>> followings (uid bigint, ts timeuuid, fid bigint, PRIMARY KEY (uid, ts))
>> WITH CLUSTERING ORDER BY (ts DESC);
>>
>>
>>
>> followers (uid bigint, ts timeuuid, fid bigint, PRIMARY KEY (uid, ts))
>> WITH CLUSTERING ORDER BY (ts DESC);
>>
>>
>>
>>
>>
>> 2) Both tables have secondary indexes on fid columns.
>>
>>
>>
>> 3) Definitely, a new follow relationship should insert one row to each
>> table and delete should work on both too.
>>
>>
>>
>>
>>
>>
>>
>> *Problems :*
>>
>>
>>
>> 1) We have a serious discrepancy problems between tables. With "nodetool
>> cfstats" followings is 18mb, follower is 19mb in total. For demonstration
>> purposes of this problem, I got followers of the most-followed user from
>> both tables.
>>
>>
>>
>> A) select * from followers where uid=12345678
>>
>> B) select * from followings where fid=12345678
>>
>>
>>
>> using a small script on unix, i could find out this info on sets A and B:
>>
>> count( A < B ) = 1247
>>
>> count( B < A ) = 185
>>
>> count( A ∩ B ) = 20894
>>
>>
>>
>>
>>
>> 2) Even more interesting than that is, if I query follower table on
>> secondary index, I don't get a row that I normally get with filtering just
>> on partition key. Let me try to visualize it :
>>
>>
>>
>> select uid,ts,fid from followers where fid=X (cannot find uid=12345678)
>>
>>      A | BBB | X
>>
>>      C | DDD | X
>>
>>      E | FFF | X
>>
>>
>>
>> select uid,ts,fid from followers where uid=12345678 | grep X
>>
>>  12345678 | GGG | *X*
>>
>>
>>
>>
>>
>> *My thoughts :*
>>
>>
>>
>> 1) Currently, we don't use batches during inserts and deletes to both
>> tables. Would this help with our problems?
>>
>>
>>
>> 2) I was first suspicious of a corruption in secondary indexes. But
>> actually, through the use of secondary index, I get consistent results.
>>
>>
>>
>> 3) I also thought, there could be the case of zombie rows. However we
>> didn't have any long downtimes with our nodes. But, to our shame, we
>> haven't been running any scheduled repairs on the cluster.
>>
>>
>>
>> 4) Finally, do you think that there may be problem with our modelling?
>>
>>
>>
>>
>>
>> Thanks in advance.
>>
>> ------------------------------
>>
>> The information in this Internet Email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this Email
>> by anyone else is unauthorized. If you are not the intended recipient, any
>> disclosure, copying, distribution or any action taken or omitted to be
>> taken in reliance on it, is prohibited and may be unlawful. When addressed
>> to our clients any opinions or advice contained in this Email are subject
>> to the terms and conditions expressed in any applicable governing The Home
>> Depot terms of business or client engagement letter. The Home Depot
>> disclaims all responsibility and liability for the accuracy and content of
>> this attachment and for any damages or losses arising from any
>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>> items of a destructive nature, which may be contained in this attachment
>> and shall not be liable for direct, indirect, consequential or special
>> damages in connection with this e-mail message or its attachment.
>>
>
>
>
> --
>
>
> * Steve Robenalt Software Architect, HighWire Press, Inc. *
> www.highwire.org| Los Gatos, CA| Belfast, NI| Brighton, UK
> <https://www.linkedin.com/company-beta/1797/>
> <http://twitter.com/highwirepress>
>
> *HighWire Spring Publishers' Meeting, April 4-6. Registration now open!
> <omenji...@highwire.org?subject=Please%20register%20me%20for%20HighWire%20Spring%20Publishers'%20Meeting>*
> *Register here for HighWire webinars
> <https://t.xink.io/Tracking/Index/XaoBAMZ_AAB-KCEA0>*
>
>
>
>
>

Re: results differ on two queries, based on secondary index key and partition key

Reply via email to