Re: results differ on two queries, based on secondary index key and partition key

Steve Robenalt Wed, 29 Mar 2017 12:10:23 -0700

Hi Ferit,

I have implemented the follower/following relationship in a manner similar
to what you describe. However, I can't tell from the description of your
tables if you are using the same convention I used. The two tables I used
were symmetric, as with yours, but with the ids swapped. I inserted and
removed in tandem, as you mention, and I used batches, though they probably
aren't strictly necessary for this case.


So for the followers table (which I interpret as "who are my followers?"),
the table was defined as:
followers: pk = (uid,fid) + fields used for display
and for the following table (which I interpret as "who am I following?"),
it was defined as:
following: pk = (fid,uid) + fields relevant to that display

So when user 87654321 follows user 12345678:

insert into followers (uid, fid, ...) values (12345678, 87654321, ...)
and
insert into following (fid, uid, ...) values (87654321, 12345678, ...)
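A rough in-memory sketch of the symmetric layout above, assuming each table is modeled as a plain dict from partition key to the set of clustering-key ids (the function names are hypothetical). It shows that both questions are single-partition reads and that an unfollow can address both rows directly, because the other id is part of the primary key.

```python
# Hypothetical in-memory stand-ins for the two tables. Each maps a
# partition key to the set of clustering-key values stored under it.
followers = {}  # pk = (uid, fid): partition uid holds the ids that follow it
following = {}  # pk = (fid, uid): partition fid holds the ids it follows


def follow(follower_id, followee_id):
    """Write one row to each table, with the ids swapped."""
    followers.setdefault(followee_id, set()).add(follower_id)
    following.setdefault(follower_id, set()).add(followee_id)


def unfollow(follower_id, followee_id):
    """Remove both mirror rows; the full key is already known, no lookup."""
    followers.get(followee_id, set()).discard(follower_id)
    following.get(follower_id, set()).discard(followee_id)


def who_follows(uid):
    """'Who are my followers?': a single-partition read of followers."""
    return sorted(followers.get(uid, set()))


def who_is_followed_by(fid):
    """'Who am I following?': a single-partition read of following."""
    return sorted(following.get(fid, set()))


follow(87654321, 12345678)
print(who_follows(12345678))         # [87654321]
print(who_is_followed_by(87654321))  # [12345678]
```

The design point is that neither query nor the delete needs a secondary index or a prior read.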

Secondary indexes aren't needed for either query, though bulk deletes
(e.g. removing all followers when a user is removed) take a bit of
housekeeping.
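The housekeeping for removing a user can be sketched like this, again assuming the two tables are modeled as dicts (followers: followed uid to follower ids; following: follower fid to followed ids) and using a hypothetical `remove_user` helper. Each of the user's own partitions tells you which mirror rows in the other table have to go.

```python
def remove_user(uid, followers, following):
    """Delete a user's own partitions and every mirror row that points at them.

    followers: dict mapping a followed uid -> set of follower ids
    following: dict mapping a follower fid -> set of followed ids
    """
    # Everyone who follows uid has a row (fid, uid) in `following`.
    for fid in followers.pop(uid, set()):
        following.get(fid, set()).discard(uid)
    # Everyone uid follows has a row (followed, uid) in `followers`.
    for followed in following.pop(uid, set()):
        followers.get(followed, set()).discard(uid)
```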

On Wed, Mar 29, 2017 at 10:35 AM, Durity, Sean R <
sean_r_dur...@homedepot.com> wrote:

> This looks more like a problem for a graph-based model. Have you looked at
> DSE Graph as a possibility?
>
> Sean Durity
>
> *From:* ferit baver elhuseyni [mailto:feritba...@gmail.com]
> *Sent:* Tuesday, March 14, 2017 11:40 AM
> *To:* user@cassandra.apache.org
> *Subject:* results differ on two queries, based on secondary index key
> and partition key
>
> Hi all,
>
> We are using a C* 2.2.8 cluster in our production system, composed of 5
> nodes in 1 DC with RF=3. Our clients mostly write with CL.ALL and read with
> CL.ONE (both will be switched to quorum soon).
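A quick sketch of the usual replica-overlap arithmetic for RF=3 (the helper names are hypothetical). A read is guaranteed to touch at least one replica holding a write when the replicas written plus the replicas read exceed RF. This only covers writes that actually succeeded at that level: a CL.ALL write that times out is not rolled back and may sit on only some replicas until repair fixes it.

```python
def quorum(rf):
    """Replicas needed for QUORUM: a strict majority of rf."""
    return rf // 2 + 1


def read_sees_latest_write(write_replicas, read_replicas, rf):
    """True when every read replica set must overlap every write replica set."""
    return write_replicas + read_replicas > rf


RF = 3
ALL, ONE, QUORUM = RF, 1, quorum(RF)

print(quorum(RF))                                  # 2
print(read_sees_latest_write(ALL, ONE, RF))        # True
print(read_sees_latest_write(QUORUM, QUORUM, RF))  # True
print(read_sees_latest_write(QUORUM, ONE, RF))     # False
```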
>
>
>
> We are facing several problems while trying to persist the classic "follow
> relationship". Has anyone had similar problems, or does anyone have an idea
> of what could be wrong?
>
>
>
> 1) First, our model. It is based on two tables, followers and followings,
> which should be identical. The first is for querying the followers of a
> user; the latter is for querying whom a user is following.
>
>
>
> CREATE TABLE followings (uid bigint, ts timeuuid, fid bigint,
>     PRIMARY KEY (uid, ts)) WITH CLUSTERING ORDER BY (ts DESC);
>
> CREATE TABLE followers (uid bigint, ts timeuuid, fid bigint,
>     PRIMARY KEY (uid, ts)) WITH CLUSTERING ORDER BY (ts DESC);
>
> 2) Both tables have secondary indexes on fid columns.
>
>
>
> 3) A new follow relationship should insert one row into each table, and
> an unfollow should delete the corresponding row from both.
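A minimal sketch of point 3 with this schema, assuming the same ts is written to both tables and modeling each table as a dict of uid to {ts: fid} (the helpers are hypothetical). Because the primary key is (uid, ts), deleting a row needs its ts, so the unfollow must first look it up by fid; the scan below stands in for the secondary-index read. If that lookup misses a row, for whatever reason, the delete never happens in that table, which is one way the two tables could drift apart.

```python
import uuid

# Hypothetical in-memory stand-ins for followings and followers, both
# PRIMARY KEY (uid, ts): each maps uid -> {ts: fid}.
followings = {}
followers = {}


def follow(follower, followee):
    """Insert one row into each table (assumes the same ts in both)."""
    ts = uuid.uuid1()  # time-based UUID, the Python analogue of a timeuuid
    followings.setdefault(follower, {})[ts] = followee
    followers.setdefault(followee, {})[ts] = follower
    return ts


def unfollow(follower, followee):
    """Delete the relationship from both tables.

    A row delete needs the full primary key (uid, ts), so the ts values
    must be found first by matching fid; this scan stands in for the
    secondary-index lookup the schema relies on.
    """
    for table, uid, fid in ((followings, follower, followee),
                            (followers, followee, follower)):
        rows = table.get(uid, {})
        for ts in [t for t, f in rows.items() if f == fid]:
            del rows[ts]
```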
>
> *Problems:*
>
>
>
> 1) We have a serious discrepancy between the tables. According to
> "nodetool cfstats", followings is 18 MB and followers is 19 MB in total. To
> demonstrate the problem, I fetched the followers of the most-followed user
> from both tables.
>
>
>
> A) select * from followers where uid=12345678
>
> B) select * from followings where fid=12345678
>
>
>
> Using a small script on Unix, I found the following about sets A and B:
>
> count( A < B ) = 1247
>
> count( B < A ) = 185
>
> count( A ∩ B ) = 20894
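The same comparison can be sketched in a few lines, assuming "A < B" means "in A but not in B" (the helper name is hypothetical). Note that the follower id is the fid column in result A but the uid column in result B, so the sets should be built from those columns rather than from ts, which need not match across tables.

```python
def compare(a_ids, b_ids):
    """Compare follower ids from the two queries.

    a_ids: fid values from `select * from followers where uid=...`
    b_ids: uid values from `select * from followings where fid=...`
    """
    a, b = set(a_ids), set(b_ids)
    return {
        "only_in_A": len(a - b),
        "only_in_B": len(b - a),
        "in_both": len(a & b),
    }


print(compare([1, 2, 3], [2, 3, 4]))
# {'only_in_A': 1, 'only_in_B': 1, 'in_both': 2}
```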
>
> 2) Even more interesting: if I query the followers table on the secondary
> index, I don't get a row that I do get when filtering on the partition key
> alone. Let me try to visualize it:
>
>
>
> select uid,ts,fid from followers where fid=X (cannot find uid=12345678)
>
>      A | BBB | X
>
>      C | DDD | X
>
>      E | FFF | X
>
>
>
> select uid,ts,fid from followers where uid=12345678 | grep X
>
>  12345678 | GGG | *X*
>
> *My thoughts:*
>
>
>
> 1) Currently, we don't use batches for the inserts and deletes on both
> tables. Would batches help with our problems?
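A sketch of what a logged batch for one follow might look like, assuming the followings/followers schema above and the same timeuuid in both rows (`follow_batch` is a hypothetical helper). Once a logged batch is written to the batch log, Cassandra will eventually apply all of its statements, so a half-finished pair of writes should not leave the tables permanently mismatched; it gives no isolation across partitions and does not replace repair. String formatting is used here only for illustration with integer ids; real client code would use prepared statements.

```python
import uuid


def follow_batch(follower, followee, ts=None):
    """Build a logged CQL batch that writes both mirror rows together.

    Assumes the followings/followers schema above, with the same timeuuid
    stored in both rows. CQL uuid literals are written unquoted.
    """
    if ts is None:
        ts = uuid.uuid1()
    return (
        "BEGIN BATCH\n"
        f"  INSERT INTO followings (uid, ts, fid) VALUES ({follower}, {ts}, {followee});\n"
        f"  INSERT INTO followers (uid, ts, fid) VALUES ({followee}, {ts}, {follower});\n"
        "APPLY BATCH;"
    )


print(follow_batch(87654321, 12345678))
```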
>
>
>
> 2) At first I suspected corruption in the secondary indexes. But actually,
> I get consistent results when querying through the secondary index.
>
>
>
> 3) I also thought there could be zombie rows. However, we haven't had any
> long downtimes on our nodes. But, to our shame, we haven't been running any
> scheduled repairs on the cluster.
>
>
>
> 4) Finally, do you think there may be a problem with our modelling?
>
> Thanks in advance.
>



-- 


* Steve Robenalt Software Architect, HighWire Press, Inc. *
www.highwire.org| Los Gatos, CA| Belfast, NI| Brighton, UK
<https://www.linkedin.com/company-beta/1797/>
<http://twitter.com/highwirepress>

