2012/9/20 aaron morton <aa...@thelastpickle.com>

> I would consider:
>
> # User CF
> * row_key: user_id
> * columns: user properties, key=value
>
> # UserRequests CF
> * row_key: <user_id : partition_start> where partition_start is the start
> of a time partition that makes sense in your domain. e.g. partition
> monthly. Generally want to avoid rows that grow forever; as a rule of thumb
> avoid rows larger than a few tens of MB.
> * columns: two possible approaches:
> 1) If the requests are immutable and you generally want all of the data,
> store the request in a single column using JSON or similar, with a
> timestamp as the column name.
> 2) Otherwise use a composite column name of <timestamp : request_property>
> to store the request in many columns.
> * In either case consider using Reversed comparators so the most recent
> columns are first; see
> http://thelastpickle.com/2011/10/03/Reverse-Comparators/
>
> # GlobalRequests CF
> * row_key: partition_start - time partition as above. It may be easier to
> use the same partition scheme.
> * column name: <timestamp : user_id>
> * column value: empty
>

Ok, I think I understood your suggestion... But is the only advantage of this
solution that it splits data among partitions? I understood how it would
work, but I didn't understand why it's better than the other solution,
without the GlobalRequests CF.
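
To make the quoted UserRequests layout concrete, here is a minimal Python
sketch of the <user_id : partition_start> row key with monthly partitions,
using approach 1 (one JSON column per request, named by timestamp). The
function names and the in-memory dict standing in for the column family are
assumptions for illustration only, not Hector or Cassandra API calls.

```python
import json
from datetime import datetime, timezone

def partition_start(ts):
    """Label of the monthly partition that contains ts, e.g. '2012-09'."""
    return ts.strftime("%Y-%m")

def user_requests_row_key(user_id, ts):
    """Row key <user_id : partition_start>; one row holds one month of data."""
    return f"{user_id}:{partition_start(ts)}"

def add_request(store, user_id, ts, request):
    """Store a request as a single JSON column named by its timestamp."""
    row = store.setdefault(user_requests_row_key(user_id, ts), {})
    row[int(ts.timestamp())] = json.dumps(request)

# In-memory stand-in for the UserRequests CF (hypothetical data).
store = {}
add_request(store, "u1", datetime(2012, 8, 31, tzinfo=timezone.utc), {"path": "/a"})
add_request(store, "u1", datetime(2012, 9, 20, tzinfo=timezone.utc), {"path": "/b"})
row_keys = sorted(store)  # ['u1:2012-08', 'u1:2012-09']
```

The two requests end up in two different rows, so no single row keeps growing
for as long as the user stays active.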


> - Select all the requests for a user
>
> Work out the current partition client side, get the first N columns. Then
> page.
>

What do you mean here by the current partition? Do you mean I would perform
a query for each partition? If I want all the requests for the user,
couldn't I just select all UserRequests records whose key starts with
"userId"? I might be missing something here, but my understanding is that if
I use Hector to query a column family, the Cassandra servers will
automatically communicate with each other to get the data I need, right? Is
that bad? I really didn't understand why partitions are needed.
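
One possible reading of "work out the current partition client side, get the
first N columns, then page" is sketched below in Python. It assumes monthly
partitions and a plain dict standing in for the UserRequests CF; the helper
names are hypothetical and nothing here is a Hector or Cassandra call.

```python
from datetime import date

def month_partitions(newest, oldest):
    """Yield 'YYYY-MM' partition labels from newest back to oldest."""
    year, month = newest.year, newest.month
    while (year, month) >= (oldest.year, oldest.month):
        yield f"{year:04d}-{month:02d}"
        month -= 1
        if month == 0:
            year, month = year - 1, 12

def recent_requests(store, user_id, newest, oldest, limit):
    """Collect up to `limit` newest columns, reading one row per partition."""
    found = []
    for label in month_partitions(newest, oldest):
        row = store.get(f"{user_id}:{label}", {})
        # Newest first, as a reversed comparator would return them.
        for ts in sorted(row, reverse=True):
            found.append((ts, row[ts]))
            if len(found) == limit:
                return found
    return found

# Hypothetical data: row key -> {timestamp: value}
store = {"u1:2012-09": {3: "c", 2: "b"}, "u1:2012-07": {1: "a"}}
page = recent_requests(store, "u1", date(2012, 9, 20), date(2012, 6, 1), 3)
```

In this sketch the client stops as soon as it has enough columns, so older
partitions are only read when the newer ones do not fill the page.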



> - Select all the users that have new requests since date D
>
> Work out the current partition client side, get the first N columns from
> GlobalRequests, make a multiget call to UserRequests
> NOTE: Assuming the size of the global requests space is not huge.
> Hope that helps.
>
It is certainly helping a lot. However, I don't know what a multiget is...
I looked at the Hector API reference and found this method, but I'm not sure
what Cassandra would do internally if I do a multiget... Is it expensive
in terms of performance and latency?

-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
