I normally suggest trying a model with standard CFs first, as there are some 
downsides to super CFs. If you know there will only be a few sub columns 
they are probably OK (see 
http://wiki.apache.org/cassandra/CassandraLimitations). Your alternative design 
is fine. Test it out and see what works for you. 
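
For comparison, here is a rough sketch of the read path with the standard CF 
design quoted below. It uses plain Python dicts as stand-ins for the column 
families, not a real Cassandra client, and the ids, property names and the 
sessions_for function are made up for illustration.

```python
# In-memory stand-ins for two standard column families.
# Each CF maps row key -> {column name: column value}.
visitant_sessions_cf = {
    "visitant-1": {"session-1": "", "session-2": ""},  # column values unused
}
sessions_cf = {
    "session-1": {"p1": "v1", "p2": "v2"},
    "session-2": {"p1": "v1"},
}


def sessions_for(visitant_id):
    """Read all sessions for a visitant using two lookups."""
    # Read 1: one row from the index CF gives the session ids.
    session_ids = list(visitant_sessions_cf.get(visitant_id, {}))
    # Read 2: fetch the session rows for those ids (a multi-row read).
    return {sid: sessions_cf[sid] for sid in session_ids if sid in sessions_cf}
```

The point is only that the session data arrives after two reads against two 
CFs, which is the cost to weigh against the super CF limitations.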

Also (and I know not everyone agrees), depending on the use case it's OK to blob 
data up. Cassandra does not *need* to know about the individual properties of 
your entities. By that I mean there is no query planner that can make better 
decisions about how to execute your query based on data types and 
distributions, or about what types columns should have in projections. 

So an alternative here is to collapse VisitantSessions and Sessions into one, 
and store the session data as a JSON (or similar) blob in the column value. 
This works best if you do not need to concurrently update fields in the entity: 
that is, if you write the session data once, if you *always* update from a 
single thread / process, or if your data is designed to be overwritten. 
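
A minimal sketch of the collapsed layout, again with a plain Python dict 
standing in for the column family rather than a real client. The row key is 
the visitant id, the column name is the session id, and the column value is 
the JSON blob. The function names and sample data are hypothetical.

```python
import json

# Stand-in for the single collapsed CF:
# row key (visitant id) -> {column name (session id): JSON blob}
visitant_sessions_cf = {}


def write_session(visitant_id, session_id, session):
    """Store the whole session as one JSON blob in a single column."""
    row = visitant_sessions_cf.setdefault(visitant_id, {})
    # Writing replaces the entire blob; there is no per-field merge.
    row[session_id] = json.dumps(session, sort_keys=True)


def read_sessions(visitant_id):
    """One row read returns every session for the visitant."""
    row = visitant_sessions_cf.get(visitant_id, {})
    return {sid: json.loads(blob) for sid, blob in row.items()}
```

Because each write replaces the whole blob, two writers that each change a 
different field of the same session would overwrite one another, which is why 
this suits write-once or single-writer data.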

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2011, at 1:15 AM, Helder Oliveira wrote:

> Thanks Indranath Ghosh for your tip!
> 
> I will continue the question here.
> 
> Aaron, I have read your suggestion and tried to design around it, and I 
> have one question regarding it.
> 
> Let's forget for now the Requests and Events!
> 
> Just keep the Visitants and the Sessions.
> 
> My goal is, given a visitant, to get all the information about him, in this 
> case all his past sessions, so I can build a profile from his history.
> 
> You have suggested 3 CF's
> 
> Visitants CF
> key: id
> cn: pn
> cv: pv
> 
> Visitant Sessions CF
> key: visitant id
> cn: session id
> cv: none
> 
> Sessions CF
> key: session id
> cn: pn
> cv: pv
> 
> When I need to know everything about one visitant, I need to query "Visitant 
> Sessions CF" to get all the session keys, and then query "Sessions CF" for the 
> properties of each key.
> 
> In this case, wouldn't applying a Super Column Family to "Sessions" be better?
> 
> I mean something like:
> 
> {
>   "sessions": {
> 
>     "visitant id 1": {
>       "session id 1": {
>         "p1": {"p1": "v1"},
>         "p2": {"p2": "v2"}
>       },
>       "session id 2": {
>         "p1": {"p1": "v1"}
>       }
>     },
> 
>     "visitant id 2": {
>       "session id 3": {
>         "p1": {"p1": "v1"},
>         "p2": {"p2": "v2"}
>       },
>       "session id 4": {
>         "p1": {"p1": "v1"}
>       }
>     }
>   }
> }
> 
> Using this, I can get all sessions in the second query, instead of only 
> getting them in the third query.
> 
> Regarding your notes, a row in the Visitant CF will almost never change after 
> it is created; sessions will be added every time a known user visits again, 
> creating a new session.
> 
> Thanks a lot for your help guys, and I hope I was not saying crazy things :D
> 
> On Aug 22, 2011, at 11:23 PM, aaron morton wrote:
> 
>> Let's start with something quick and simple, all standard Column Families…
>> 
>> Visitant CF
>> key: id 
>> column name: property name
>> column value: property value 
>> 
>> Visitant Sessions CF
>> key: visitant id 
>> column name: session id
>> column value: none
>> 
>> Session CF
>> 
>> key: session_id
>> column_name: property name 
>> column_value: property value 
>> 
>> key: session_id/requests
>> column_name: request_id
>> column_value: none
>> 
>> key: session_id/events
>> column_name: event_id
>> column_value: none
>> 
>> Requests CF
>> 
>> key: request_id
>> column_name: property name
>> column_value: property value
>> 
>> Event CF
>> 
>> key: event_id
>> column_name: property name
>> column_value: property value
>> 
>> 
>> Notes:
>> 
>> * assuming the Visitant CF is slowly changing, I kept it in its own CF.  
>> * using compound keys to keep information related to sessions in the same 
>> CF. These could be different CFs, or in the Request or Event CF. 
>> * the best model is the one that allows you to do your reads by getting one 
>> or a few rows from a single cf. 
>> * you could collapse the Request and Event CF's into one. 
>> 
>> If the event and request data is immutable (or there are no issues with 
>> concurrent modifications) I would recommend this…
>> 
>> Request / Event CF:
>> 
>> key: session_id/events or session_id/requests
>> column_name: event_id or request_id
>> column_value: data
>> 
>> 
>> Start with the simple model and then make changes to better handle your read 
>> queries.
>> 
>> Have fun :)
>> 
>> 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 22/08/2011, at 11:13 PM, Helder Oliveira wrote:
>> 
>>> Hello all,
>>> 
>>> i have a SQL structure like this:
>>> 
>>> Visitant ( has several properties )
>>> Visitant has many Sessions
>>> Sessions ( has several properties )
>>> Sessions has many Requests ( has several properties )
>>> Sessions has many Events ( has several properties )
>>> 
>>> 
>>> I have read a lot and am still confused about how to put this on Cassandra. 
>>> Can someone give me an idea?
>> 
> 
