> I created the following model: a UserCF, whose key is a userID generated by
> TimeUUID, and a RequestCF, whose key is composite: UserUUID + timestamp. For
> each user, I will store basic data and, for each request, I will insert a lot
> of columns.
I would consider:
# User CF
* row_key: user_id
* columns: user properties, key=value
# UserRequests CF
* row_key: <user_id : partition_start>, where partition_start is the start of a
time partition that makes sense in your domain, e.g. partition monthly.
Generally you want to avoid rows that grow forever; as a rule of thumb, avoid rows
larger than a few tens of MB.
* columns: two possible approaches:
1) If the requests are immutable and you generally want all of the data,
store the request in a single column using JSON or similar, with a timestamp
as the column name.
2) Otherwise, use a composite column name of <timestamp :
request_property> to store the request in many columns.
* In either case, consider using reversed comparators so the most recent
columns come first; see http://thelastpickle.com/2011/10/03/Reverse-Comparators/
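A minimal sketch of the UserRequests layout above, assuming monthly partitions and fixed-format ISO-8601 timestamp strings as column names (so they sort chronologically). The function names are hypothetical, plain Python values stand in for Cassandra rows and columns (no client library is used), and the reversed comparator is only emulated with a descending sort.

```python
import json
import uuid
from datetime import datetime


def partition_start(ts: datetime) -> str:
    """Start of the monthly time partition containing ts, e.g. '2012-09-01'."""
    return ts.strftime("%Y-%m-01")


def user_requests_row_key(user_id: uuid.UUID, ts: datetime) -> str:
    """Composite row key <user_id : partition_start> for the UserRequests CF."""
    return f"{user_id}:{partition_start(ts)}"


def json_column(ts: datetime, request: dict) -> dict:
    """Approach 1: the whole (immutable) request in one JSON column named by its timestamp."""
    return {ts.isoformat(): json.dumps(request, sort_keys=True)}


def composite_columns(ts: datetime, request: dict) -> dict:
    """Approach 2: one column per property, named <timestamp : request_property>."""
    return {(ts.isoformat(), prop): value for prop, value in request.items()}


def newest_first(column_names):
    """Emulates a reversed comparator: the most recent timestamps sort first."""
    return sorted(column_names, reverse=True)
```

Approach 1 means one column read per request; approach 2 lets you read or update single properties at the cost of more columns.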
# GlobalRequests CF
* row_key: partition_start, a time partition as above. It may be easiest
to use the same partition scheme as UserRequests.
* column name: <timestamp : user_id>
* column value: empty
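Putting the two CFs together, each incoming request becomes two writes. A rough sketch, assuming monthly partitions, approach 1 (one JSON column per request) for UserRequests, and module-level dicts as hypothetical stand-ins for the CFs; `record_request` is an illustrative name, not a Cassandra API.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical in-memory stand-ins for the CFs: row_key -> {column_name: value}.
user_requests = defaultdict(dict)
global_requests = defaultdict(dict)


def partition_start(ts: datetime) -> str:
    """Monthly time partition, e.g. 2012-09-20 -> '2012-09-01'."""
    return ts.strftime("%Y-%m-01")


def record_request(user_id: str, ts: datetime, request_json: str) -> None:
    """Write one request to both CFs, as two inserts would do in Cassandra."""
    part = partition_start(ts)
    # UserRequests: row <user_id : partition_start>, one JSON column per request.
    user_requests[f"{user_id}:{part}"][ts] = request_json
    # GlobalRequests: row partition_start, column <timestamp : user_id>, empty value.
    global_requests[part][(ts, user_id)] = ""
```

The GlobalRequests column carries no value; the column name alone says "this user made a request at this time".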
> - Select all the requests for a user
Work out the current partition client side and get the first N columns from UserRequests, then page.
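A sketch of that read, assuming monthly partitions and datetime column names in a dict that stands in for the row. `get_slice` here is a hypothetical helper that emulates a slice over a reversed comparator; it is not a client library call. It pages by re-using the last column seen as the (inclusive) start of the next slice, and only reads the current partition; walking back into earlier months is left out.

```python
from datetime import datetime
from typing import Dict, Iterator, List, Optional, Tuple

Row = Dict[datetime, str]  # column name (timestamp) -> value


def partition_start(ts: datetime) -> str:
    return ts.strftime("%Y-%m-01")


def get_slice(row: Row, count: int, start: Optional[datetime] = None) -> List[Tuple[datetime, str]]:
    """Emulates a slice over a reversed comparator: newest first, from `start`
    (inclusive) onwards, at most `count` columns."""
    names = sorted(row, reverse=True)
    if start is not None:
        names = [n for n in names if n <= start]
    return [(n, row[n]) for n in names[:count]]


def page_user_requests(store: Dict[str, Row], user_id: str, now: datetime,
                       page_size: int) -> Iterator[List[Tuple[datetime, str]]]:
    """Yield pages of the current partition's columns, newest first."""
    row = store.get(f"{user_id}:{partition_start(now)}", {})
    start: Optional[datetime] = None
    while True:
        if start is None:
            page = get_slice(row, page_size)
        else:
            # The start column is inclusive and was already returned,
            # so fetch one extra column and drop it.
            page = get_slice(row, page_size + 1, start)[1:]
        if not page:
            return
        yield page
        start = page[-1][0]
```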
> - Select all the users who have new requests since date D
Work out the current partition client side, get the first N columns from
GlobalRequests, then make a multiget call to UserRequests.
NOTE: This assumes the global requests space is not huge.
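A sketch of the "new requests since date D" query against the same hypothetical in-memory CFs, assuming monthly partitions and (datetime, user_id) column names in GlobalRequests. It walks the GlobalRequests row newest first, stops at the first column older than D, and then emulates the multiget with one dict lookup per user. Only the current partition is read.

```python
from datetime import datetime
from typing import Dict, List, Tuple

GlobalRow = Dict[Tuple[datetime, str], str]  # <timestamp : user_id> -> empty value
UserRow = Dict[datetime, str]


def partition_start(ts: datetime) -> str:
    return ts.strftime("%Y-%m-01")


def users_with_requests_since(global_requests: Dict[str, GlobalRow],
                              since: datetime, now: datetime) -> List[str]:
    """User ids with at least one request at or after `since`, most recent first.

    Only the current partition is read; if `since` falls before the partition
    start, earlier partitions would have to be read as well.
    """
    row = global_requests.get(partition_start(now), {})
    users: List[str] = []
    for ts, user_id in sorted(row, reverse=True):  # newest first
        if ts < since:
            break  # every remaining column is older than `since`
        if user_id not in users:
            users.append(user_id)
    return users


def multiget_user_requests(user_requests: Dict[str, UserRow],
                           user_ids: List[str], now: datetime) -> Dict[str, UserRow]:
    """Emulates a multiget on UserRequests: one row per user, current partition."""
    part = partition_start(now)
    return {uid: user_requests.get(f"{uid}:{part}", {}) for uid in user_ids}
```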
Hope that helps.
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 20/09/2012, at 11:19 AM, Marcelo Elias Del Valle <[email protected]> wrote:
> In your first email, you get a request and seem to shove it and a user in,
> generating the ids, which means that user never generates a request ever
> again??? If a user sends multiple requests in, how are you looking up his
> TimeUUID row key from your first email (I would do the same in my
> implementation)?
>
> Actually, I don't get it from Cassandra. I am using Cassandra for the writes,
> but to find the userId I look it up in a pre-indexed structure, because I think
> the reads would be faster this way. I need to find the userId by some key
> fields, so I use an index like this:
>
> user ID 5596 -> { name -> "john denver", phone -> "5555 5555", field3 ->
> "field 3 data"...., field 10 -> "field 10 data"}
>
> The values are just examples. This part is not implemented yet and I am
> looking for alternatives. Currently we have some similar indexes in SOLR, but
> we are thinking of keeping the index in memory and replicating it manually in
> the cluster, or using Voldemort, etc.
> I might be wrong, but I think that while Cassandra is great for writes, a
> solution like this would be better for reads.
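For illustration, the lookup described above (finding a userId from some key fields) can be sketched as an in-memory inverted index from (field, value) pairs to user IDs. This is a hypothetical single-process sketch: the class name is invented, and the manual replication across the cluster mentioned above is not covered.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class UserLookupIndex:
    """Hypothetical in-memory index from (field, value) pairs to user IDs."""

    def __init__(self) -> None:
        self._by_field: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, user_id: int, fields: Dict[str, str]) -> None:
        """Index every field=value pair of a user."""
        for field, value in fields.items():
            self._by_field[(field, value)].add(user_id)

    def find(self, **criteria: str) -> Set[int]:
        """User IDs matching every given field=value pair (empty set if none)."""
        # .get() avoids creating empty entries in the defaultdict on lookups.
        matches = [self._by_field.get((f, v), set()) for f, v in criteria.items()]
        return set.intersection(*matches) if matches else set()
```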
>
>
> If you had a unique LDAP username, I would just use that as the primary
> key, meaning you NEVER have to do reads. If you have a username and need
> to look up a UUID, you would have to do that in both implementations... not a
> real big deal though... a quick lookup table does the trick there and
> in most cases is still fast enough (i.e. read before write is OK here in a
> lot of cases).
>
> That X-ref table would simply be rowkey=username and value=the user's real
> primary key.
>
> Though again, we use LDAP and know no one's username is really going to
> change, so username is our primary key.
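The X-ref lookup with read before write could look roughly like this: read the username row and reuse the stored key, or create a time-based UUID on first sight. The dict is a hypothetical stand-in for the X-ref CF, and the sketch ignores the race where two first requests for the same username arrive concurrently.

```python
import uuid
from typing import Dict

# Hypothetical X-ref CF: rowkey=username -> value=the user's real primary key.
xref: Dict[str, uuid.UUID] = {}


def user_key_for(username: str) -> uuid.UUID:
    """Read before write: reuse the existing key, or create one on first sight."""
    key = xref.get(username)
    if key is None:
        key = uuid.uuid1()  # time-based UUID, analogous to a TimeUUID
        xref[username] = key
    return key
```

If the username itself is stable and unique (as with the LDAP case), it can simply be the row key and this lookup disappears.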
>
> In my case, a single user can have thousands of requests. In my userCF, I
> will have just 1 user with uuid X, but I am not sure what to have in my
> requestCF.
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr