Re: Correct model

aaron morton Thu, 20 Sep 2012 18:50:35 -0700

> I created the following model: an UserCF, whose key is a userID generated by 
> TimeUUID, and a RequestCF, whose key is composite: UserUUID + timestamp. For 
> each user, I will store basic data and, for each request, I will insert a lot 
> of columns.


I would consider:

# User CF
* row_key: user_id
* columns: user properties, key=value

# UserRequests CF
* row_key: <user_id : partition_start> where partition_start is the start of a 
time partition that makes sense in your domain, e.g. partition monthly. 
You generally want to avoid rows that grow forever; as a rule of thumb, avoid 
rows larger than a few tens of MB. 
* columns: two possible approaches:
        1) If the requests are immutable and you generally want all of the data, 
store the request in a single column using JSON or similar, with the column 
name a timestamp. 
        2) Otherwise use a composite column name of <timestamp : 
request_property> to store the request in many columns. 
        * In either case consider using Reversed comparators so the most recent 
columns are first; see http://thelastpickle.com/2011/10/03/Reverse-Comparators/

# GlobalRequests CF
* row_key: partition_start - time partition as above. It may be easier to use 
the same partition scheme. 
* column name: <timestamp : user_id>
* column value: empty 
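
As a rough sketch of how the keys above could be built client side (assuming 
monthly partitions, a "YYYY-MM" partition label, ":" as the separator and 
microsecond timestamps; all of the function names are made up for 
illustration and are not part of any Cassandra client):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def partition_start(ts):
    """Label of the monthly partition containing ts, e.g. '2012-09' (assumed format)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m")

def micros(ts):
    """Microseconds since the Unix epoch, computed with integer arithmetic."""
    return (ts - EPOCH) // timedelta(microseconds=1)

def user_requests_row_key(user_id, ts):
    """UserRequests row key: <user_id : partition_start>."""
    return f"{user_id}:{partition_start(ts)}"

def user_requests_column(ts, request_property):
    """Approach 2: composite column name <timestamp : request_property>."""
    return (micros(ts), request_property)

def global_requests_entry(ts, user_id):
    """GlobalRequests: (row key, column name, column value); the value is empty."""
    return partition_start(ts), (micros(ts), user_id), b""
```

For example, a request made by user 5596 at 2012-09-20 18:50:35 UTC would land 
in the UserRequests row "5596:2012-09" and the GlobalRequests row "2012-09".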

> - Select all the requests for an user

Work out the current partition client side, get the first N columns. Then page. 
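
A minimal sketch of that paging loop, using a plain dict in place of the 
UserRequests CF so it runs without a cluster. get_slice() here is a 
hypothetical stand-in for a column slice on a row with a reversed comparator, 
and the "user_id:YYYY-MM" row key format is an assumption:

```python
from datetime import datetime

def partitions_newest_first(now, oldest):
    """Yield monthly partition labels from now's month back to oldest's month."""
    year, month = now.year, now.month
    while (year, month) >= (oldest.year, oldest.month):
        yield f"{year:04d}-{month:02d}"
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)

def get_slice(row, start_after, count):
    """Stand-in for a slice query: up to count columns, newest first,
    strictly older than start_after (or from the newest if it is None)."""
    names = sorted(row, reverse=True)
    if start_after is not None:
        names = [n for n in names if n < start_after]
    return [(n, row[n]) for n in names[:count]]

def all_requests(cf, user_id, now, oldest, page_size=100):
    """Walk the partitions newest first and page through each row."""
    for part in partitions_newest_first(now, oldest):
        row = cf.get(f"{user_id}:{part}", {})
        last = None
        while True:
            page = get_slice(row, last, page_size)
            yield from page
            if len(page) < page_size:
                break  # short page: this row is exhausted
            last = page[-1][0]  # resume after the last column seen
```

The caller has to decide how far back to walk (oldest), since the client 
cannot tell from the row keys alone which partitions exist for a user.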

> - Select all the users which has new requests, since date D
Work out the current partition client side, get the first N columns from 
GlobalRequests, then make a multiget call to UserRequests. 

NOTE: Assuming the size of the global requests space is not huge.
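
Roughly, that could look like the sketch below. It again uses plain dicts in 
place of the CFs, assumes GlobalRequests columns are (timestamp, user_id) 
pairs read newest first, and assumes the "user_id:partition" row key format; 
the function names are illustrative only:

```python
def users_with_requests_since(global_cf, partitions, since_ts):
    """Distinct user ids with a request at or after since_ts, most recent first.
    partitions are the GlobalRequests row keys to scan, newest first."""
    seen = set()
    ordered = []
    for part in partitions:
        # Reversed comparator: newest column first, so stop at the first old one.
        for ts, user_id in sorted(global_cf.get(part, {}), reverse=True):
            if ts < since_ts:
                break
            if user_id not in seen:
                seen.add(user_id)
                ordered.append(user_id)
    return ordered

def multiget_row_keys(user_ids, partitions):
    """UserRequests row keys to pass to a single multiget call."""
    return [f"{user_id}:{part}" for user_id in user_ids for part in partitions]
```

If date D is older than the current partition, the partitions list needs to 
cover every partition from D up to now.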

Hope that helps. 
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/09/2012, at 11:19 AM, Marcelo Elias Del Valle <mvall...@gmail.com> wrote:

> In your first email, you get a request and seem to shove it and a user in
> generating the ids which means that user never generates a request ever
> again???  If a user sends multiple requests in, how are you looking up his
> TimeUUID row key from your first email(I would do the same in my
> implementation)?
> 
> Actually, I don't get it from Cassandra. I am using Cassandra for the writes, 
> but to find the userId I look on a pre-indexed structure, because I think the 
> reads would be faster this way. I need to find the userId by some key fields, 
> so I use an index like this:
> 
> user ID 5596 -> { name -> "john denver", phone -> "5555 5555", field3 -> 
> "field 3 data"...., field 10 -> "field 10 data"}
>    
> The values are just examples. This part is not implemented yet and I am 
> looking for alternatives. Currently we have some similar indexes in SOLR, but 
> we are thinking in keeping the index in memory and replicating manually in 
> the cluster, or using Voldemort, etc. 
> I might be wrong, but I think Cassandra is great for writes, but a solution 
> like this would be better for reads.
> 
>  
> If you had an ldap unique username, I would just use that as the primary
> key meaning you NEVER have to do reads.  If you have a username and need
> to lookup a UUID, you would have to do that in both implementations...not a
> real big deal though...a quick quick lookup table does the trick there and
> in most cases is still fast enough(ie. Read before write here is ok in a
> lot of cases).
> 
> That X-ref table would simply be rowkey=username and value=user's real
> primary key
> 
> Though again, we use ldap and know no one's username is really going to
> change so username is our primary key.
> 
> In my case, a single user can have thousands of requests. In my userCF, I 
> will have just 1 user with uuid X, but I am not sure about what to have in my 
> requestCF.
>  
> -- 
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
