Thanks Steve. With your approach, I believe it will not allow double entries for a follower-followee relationship which is good. In our modeling, a timeuuid field is chosen to be the cluster key and this lets a time-based ordering, which was one of the design goals I believe.
Actually, I started thinking that our problems are based on mis-management of Cassandra. I wasn't at the core of cassandra activities from the beginning, laterI learned that cluster ha evolved in a weird way until now : - Started with a 3 node, 2RF, 1WC, 1RC configuration - Reacting to some consistency problems, switched to 2WC - Reacting to availability problems, 2 more nodes were added, but without any repair. New configuration was chosen to be 5RF, QuorumWC, 1RC - After some weird things (inconsistent results from nodes), switched to 3RF - There has been no periodical repairs in 2 years. Right now, we did a manual cleansing of the problematic table and are waiting to observe arising new problems. 2017-03-29 22:10 GMT+03:00 Steve Robenalt <sroben...@highwire.org>: > Hi Ferit, > > I have implemented the follower/following relationship in a manner similar > to what you describe. However, I can't tell from the description of your > tables if you are using the same convention I used. The two tables I used > were symmetric, as with yours, but with the ids swapped. I inserted and > removed in tandem, as you mention, and I used batches, though they probably > aren't strictly necessary for this case. > > so for the followers table (which I interpret as "who are my followers?"), > the table was defined as: > followers: pk = (uid,fid) + fields used for display > and for the following table (which I interpret as "who am I following?"), > it was defined as: > following: pk = (fid,uid) + fields relevant to that display > > so when user 87654321 follows user 12345678: > > insert into followers (12345678,87654321, ...) > and > insert into following (87654321,12345678, ...) > > Secondary indexes aren't needed for the 2 queries, though bulk deletes > (i.e. remove all followers when a user is removed) take a bit of > housekeeping. > > On Wed, Mar 29, 2017 at 10:35 AM, Durity, Sean R < > sean_r_dur...@homedepot.com> wrote: > >> This looks more like a problem for a graph-based model. Have you looked >> at DSE Graph as a possibility? >> >> >> >> >> >> Sean Durity >> >> *From:* ferit baver elhuseyni [mailto:feritba...@gmail.com] >> *Sent:* Tuesday, March 14, 2017 11:40 AM >> *To:* user@cassandra.apache.org >> *Subject:* results differ on two queries, based on secondary index key >> and partition key >> >> >> >> Hi all, >> >> >> >> >> >> We are using a C* 2.2.8 cluster in our production system, composed of 5 >> nodes in 1 DC with RF=3. Our clients mostly write with CL.ALL and read with >> CL.ONE (both will be switched to quorum soon). >> >> >> >> We face several problems while trying to persist classical "follow >> relationship". Did anyone of you have similar problems / or have any idea >> on what could be wrong? >> >> >> >> 1) First our model. It is based on two tables : follower and following, >> that should be identical. First one is for queries on getting followers of >> a user, latter is for getting who a user is following. >> >> >> >> followings (uid bigint, ts timeuuid, fid bigint, PRIMARY KEY (uid, ts)) >> WITH CLUSTERING ORDER BY (ts DESC); >> >> >> >> followers (uid bigint, ts timeuuid, fid bigint, PRIMARY KEY (uid, ts)) >> WITH CLUSTERING ORDER BY (ts DESC); >> >> >> >> >> >> 2) Both tables have secondary indexes on fid columns. >> >> >> >> 3) Definitely, a new follow relationship should insert one row to each >> table and delete should work on both too. >> >> >> >> >> >> >> >> *Problems :* >> >> >> >> 1) We have a serious discrepancy problems between tables. With "nodetool >> cfstats" followings is 18mb, follower is 19mb in total. For demonstration >> purposes of this problem, I got followers of the most-followed user from >> both tables. >> >> >> >> A) select * from followers where uid=12345678 >> >> B) select * from followings where fid=12345678 >> >> >> >> using a small script on unix, i could find out this info on sets A and B: >> >> count( A < B ) = 1247 >> >> count( B < A ) = 185 >> >> count( A ∩ B ) = 20894 >> >> >> >> >> >> 2) Even more interesting than that is, if I query follower table on >> secondary index, I don't get a row that I normally get with filtering just >> on partition key. Let me try to visualize it : >> >> >> >> select uid,ts,fid from followers where fid=X (cannot find uid=12345678) >> >> A | BBB | X >> >> C | DDD | X >> >> E | FFF | X >> >> >> >> select uid,ts,fid from followers where uid=12345678 | grep X >> >> 12345678 | GGG | *X* >> >> >> >> >> >> *My thoughts :* >> >> >> >> 1) Currently, we don't use batches during inserts and deletes to both >> tables. Would this help with our problems? >> >> >> >> 2) I was first suspicious of a corruption in secondary indexes. But >> actually, through the use of secondary index, I get consistent results. >> >> >> >> 3) I also thought, there could be the case of zombie rows. However we >> didn't have any long downtimes with our nodes. But, to our shame, we >> haven't been running any scheduled repairs on the cluster. >> >> >> >> 4) Finally, do you think that there may be problem with our modelling? >> >> >> >> >> >> Thanks in advance. >> >> ------------------------------ >> >> The information in this Internet Email is confidential and may be legally >> privileged. It is intended solely for the addressee. Access to this Email >> by anyone else is unauthorized. If you are not the intended recipient, any >> disclosure, copying, distribution or any action taken or omitted to be >> taken in reliance on it, is prohibited and may be unlawful. When addressed >> to our clients any opinions or advice contained in this Email are subject >> to the terms and conditions expressed in any applicable governing The Home >> Depot terms of business or client engagement letter. The Home Depot >> disclaims all responsibility and liability for the accuracy and content of >> this attachment and for any damages or losses arising from any >> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other >> items of a destructive nature, which may be contained in this attachment >> and shall not be liable for direct, indirect, consequential or special >> damages in connection with this e-mail message or its attachment. >> > > > > -- > > > * Steve Robenalt Software Architect, HighWire Press, Inc. * > www.highwire.org| Los Gatos, CA| Belfast, NI| Brighton, UK > <https://www.linkedin.com/company-beta/1797/> > <http://twitter.com/highwirepress> > > *HighWire Spring Publishers' Meeting, April 4-6. Registration now open! > <omenji...@highwire.org?subject=Please%20register%20me%20for%20HighWire%20Spring%20Publishers'%20Meeting>* > *Register here for HighWire webinars > <https://t.xink.io/Tracking/Index/XaoBAMZ_AAB-KCEA0>* > > > > >