Hi Kasper, Given the 2 most common queries you mention (top 5 and 5 nearest), I will assume that these are relative to the current user, so basically, they would correspond to the top 5 among my friends, and the 5 closest to my own top score. If that is your intention, I would suggest that you could compute both of these once at the time a write occurs (i.e. any new high score) and store them separate from the main friends list (which would still be updated as I suggested before). It wasn't clear to me if you're using CQL or not, but if so, your user record could have 2 collection columns for the top 5 and nearest 5, and I'm sure it'd be pretty easy to accomplish the same thing with Thrift as well, though I don't have experience in that area.
The top 5 list would be easy to update because each new high score would potentially displace one value. The nearest 5 would be likely to be a new set each time, so you'd just replace the entire column. This strategy does put more load onto your writes, but would be very efficient to read - one column for each set. Other queries, such as scrolling through the list, could still be handled from the full list with sorting, as I suggested before. So if my assumptions above are valid, this still seems to me to be better than using the score as part of the key and forcing read-before-write. YMMV, of course... Steve On Tue, Feb 25, 2014 at 1:11 AM, Kasper Middelboe Petersen < kas...@sybogames.com> wrote: > Hi Steve, > > I've considered this approach before and I'm partial to going this way > again. > > The reason I haven't yet is the fact that I'm fairly confident that the > user patterns would have a lot more highscore list reads than setting of > highscores or adding friends. And most of the reads from highscores is > either the 5 closest scores to a given score or the top-5. If a user has > 100 friends it seems very excessive to have to read all these 100 entries > to get the top 5. With the score as part of the clustering key this is > possible to do efficiently. > > Do you have any thoughts on this? > > > /Kasper > > > On Tue, Feb 25, 2014 at 1:30 AM, Steven A Robenalt > <srobe...@stanford.edu>wrote: > >> Hi Kasper, >> >> I am assuming that your friend list is symmetric (i.e. If I am your >> friend then you are also my friend), which your comments seem to indicate. >> >> First, I would suggest that you drop the friends score as a part of the >> clustering key, which eliminates the need to read-before-write. >> >> With that in mind, here's what I'd recommend: >> >> 1) Each user has their own high score and timestamp as part of their own >> attributes (thus keyed to their own user id). >> >> 2) Each user also has a list of their own friends, keyed to their own >> user id and the friend's user_id, and including the friend's high score and >> updated time as attributes. >> >> 3) When I add a friend, I copy their high score and timestamp to my own >> friends list, and I copy my current high score and timestamp to their >> friends list. >> >> 4) When I update my own high score, I grab the ids of all of my friends >> from my own friends list and push my new high score and timestamp to each >> of them. >> >> 5) When I remove a friend, I delete myself from their friends list and >> them from mine. >> >> 6) When I view any friends list, the result must be sorted at that time, >> presumably by descending high score (since the score is no longer part of >> the clustering key). >> >> There's a couple of potential conflicts in this strategy, which may occur >> (for example) if my friend updates a high score at the same time I drop >> them as a friend (they would end up re-inserting themselves to my friend >> list after I dropped them). In this case, I'd need to simply drop them >> again. This may or may not be acceptable in your application. Most conflict >> situations could be resolved with use of lightweight transactions if you >> want to sacrifice performance (and assuming that the condition happens >> often enough to warrant such treatment). >> >> For performance reasons, it may also be desirable to lump all of the high >> score updates for a user in a batch statement if you have the option to do >> so. >> >> Anyway, that eliminates the read-before-write problem, and also insures >> that if a high score is missed at the time a new user is added as a friend, >> it can at least be updated later with the correct value. Not sure how much >> help that is, but maybe it'll give you some ideas you can experiment with. >> >> Steve >> >> >> On Mon, Feb 24, 2014 at 9:47 AM, Kasper Middelboe Petersen < >> kas...@sybogames.com> wrote: >> >>> Hi, >>> >>> My requirements include a system that can handle friend based highscore >>> lists (as a user I have a bunch of friends from various social sites like >>> Facebook). The user must have a highscore list that consist of his friends >>> only. >>> >>> I have implemented this using the users ID as partition key and the >>> friends score and id as clustering keys. This keeps reads of the highscore >>> list fast and straight forward. >>> >>> Updating a highscore is a bit cumbersome and suffers from >>> read-before-write, but can largely be done without too much worry. The big >>> issue here is I need to know the exact old highscore to be able to set a >>> new one. >>> >>> The big problem arise when a new user connects and have many friends >>> that needs to be added. This can end up being quite an extensive amount of >>> queries that has to happen and could take some time to do: >>> - Lookup the friends user id based on the social credentials >>> - Lookup the friends highscore >>> - Lookup the users own highscore >>> - Add friend with his highscore to self >>> - Add self to with own highscore to friend >>> - Do update to self with lastUpdatedFriends timestamp >>> - Do update to friend with lastUpdatedFriends timestamp >>> >>> Now if a new user has a bunch of friends this could end up being quite a >>> lot of queries - all suffering from the read-before-write problem. Should >>> any of the friends set a new highscore between the lookup and the writes >>> the highscore would never be set correctly and duplicates would happen. >>> >>> I'm open to any suggestions ranging from how to model this differently >>> to avoid the read-before-write to how to do this without risking having >>> duplicate data that would be extremely painful to try and find again in >>> highscore lists? >>> >>> >>> Thanks, >>> Kasper >>> >> >> >> >> -- >> Steve Robenalt >> Software Architect >> HighWire | Stanford University >> 425 Broadway St, Redwood City, CA 94063 >> >> srobe...@stanford.edu >> http://highwire.stanford.edu >> >> >> >> >> >> > -- Steve Robenalt Software Architect HighWire | Stanford University 425 Broadway St, Redwood City, CA 94063 srobe...@stanford.edu http://highwire.stanford.edu