2012/9/20 aaron morton <aa...@thelastpickle.com>

> I would consider:
>
> # User CF
> * row_key: user_id
> * columns: user properties, key=value
>
> # UserRequests CF
> * row_key: <user_id : partition_start> where partition_start is the start
> of a time partition that makes sense in your domain, e.g. partition
> monthly. Generally want to avoid rows that grow forever; as a rule of thumb
> avoid rows more than a few 10's of MB.
> * columns: two possible approaches:
> 1) If the requests are immutable and you generally want all of the data,
> store the request in a single column using JSON or similar, with the column
> name a timestamp.
> 2) Otherwise use a composite column name of <timestamp : request_property>
> to store the request in many columns.
> * In either case consider using Reversed comparators so the most recent
> columns are first; see
> http://thelastpickle.com/2011/10/03/Reverse-Comparators/
>
> # GlobalRequests CF
> * row_key: partition_start - time partition as above. It may be easier to
> use the same partition scheme.
> * column name: <timestamp : user_id>
> * column value: empty
>
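For reference, the key layout quoted above can be sketched in plain Python. This is only an illustration under stated assumptions: monthly partitions labelled YYYYMM in UTC, a ":" separator in the row key, and hypothetical helper names (partition_start, user_requests_row_key, global_requests_entry) that do not come from Cassandra or Hector.

```python
from datetime import datetime, timezone


def partition_start(ts: datetime) -> str:
    """Return the monthly partition label (assumed format: YYYYMM, UTC).

    `ts` must be a timezone-aware datetime.
    """
    return ts.astimezone(timezone.utc).strftime("%Y%m")


def user_requests_row_key(user_id: str, ts: datetime) -> str:
    """Row key for the UserRequests CF: <user_id : partition_start>."""
    return f"{user_id}:{partition_start(ts)}"


def global_requests_entry(user_id: str, ts: datetime):
    """Row key, composite column name and (empty) value for GlobalRequests."""
    row_key = partition_start(ts)
    column_name = (ts, user_id)  # <timestamp : user_id>
    return row_key, column_name, b""


ts = datetime(2012, 9, 20, 12, 0, tzinfo=timezone.utc)
print(user_requests_row_key("u42", ts))
print(global_requests_entry("u42", ts)[0])
```

With this layout, each user gets one new row per month, so no single row keeps growing without bound.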
Ok, I think I understood your suggestion... But is the only advantage of
this solution that it splits the data among partitions? I understood how it
would work, but I didn't understand why it's better than the other
solution, without the GlobalRequests CF.

> - Select all the requests for a user
>
> Work out the current partition client side, get the first N columns. Then
> page.
>

What do you mean here by current partition? Do you mean I would perform a
query for each partition? If I want all the requests for the user, couldn't
I just select all UserRequests records that start with "userId"? I might be
missing something here, but in my understanding, if I use Hector to query a
column family, the Cassandra servers will automatically communicate with
each other to get the data I need, right? Is that bad? I really didn't
understand why to use partitions.

> - Select all the users which have new requests since date D
>
> Work out the current partition client side, get the first N columns from
> GlobalRequests, make a multi get call to UserRequests
> NOTE: Assuming the size of the global requests space is not huge.
> Hope that helps.
>

It is certainly helping a lot. However, I don't know what a multiget is...
I saw the Hector API reference and found this method, but I'm not sure what
Cassandra would do internally if I do a multiget... Is it expensive in
terms of performance and latency?

-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
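The two query paths discussed in this thread can be sketched with plain Python dictionaries standing in for the column families. This is a minimal model, not client code: it assumes monthly YYYYMM partitions, and the function names (recent_requests, users_with_requests_since, previous_month) are hypothetical. In a real client, each dictionary lookup would be a slice query on one row, and the final step would be a single multiget over the returned users' row keys.

```python
from datetime import datetime


def month_label(ts: datetime) -> str:
    """Monthly partition label, assumed format YYYYMM."""
    return ts.strftime("%Y%m")


def previous_month(label: str) -> str:
    """Return the partition label immediately before `label`."""
    year, month = int(label[:4]), int(label[4:])
    if month == 1:
        return f"{year - 1:04d}12"
    return f"{year:04d}{month - 1:02d}"


def recent_requests(user_requests, user_id, now, limit, oldest_label):
    """Start at the current partition and page backwards until `limit` is met."""
    results = []
    label = month_label(now)
    while len(results) < limit and label >= oldest_label:
        row = user_requests.get(f"{user_id}:{label}", {})
        # Newest first, as a reversed comparator would return the columns.
        for ts in sorted(row, reverse=True):
            results.append((ts, row[ts]))
            if len(results) == limit:
                break
        label = previous_month(label)
    return results


def users_with_requests_since(global_requests, since, now):
    """Collect user ids from GlobalRequests columns newer than `since`."""
    users = set()
    label = month_label(now)
    while label >= month_label(since):
        for ts, user_id in global_requests.get(label, {}):
            if ts >= since:
                users.add(user_id)
        label = previous_month(label)
    return users


user_requests = {
    "u1:201209": {datetime(2012, 9, 20): "c", datetime(2012, 9, 1): "b"},
    "u1:201208": {datetime(2012, 8, 15): "a"},
}
print(recent_requests(user_requests, "u1", datetime(2012, 9, 21), 3, "201201"))
```

Reading "all requests for a user" then means visiting one known row key per partition, newest first, rather than scanning for keys that share a prefix.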