Thanks,

There is a very important point here: the key is, erhm, the key to success. That is, you must build the key in such a way that you can find it again.

If you create a login system, you would most likely use the login name as the key (and maybe link it there to a userid that will be used in keys later on).

Do I understand correctly that, instead of storing this specially formatted JSON text in a column, a supercolumn could have served you?

./Morten

On 20-09-2010 12:37, Juho Mäkinen wrote:
We have built a Facebook-style "messenger" into our web site which
uses Cassandra as the storage backend with two column families:
TalkMessages and TalkLastMessages. I've uploaded a screenshot showing
the feature in action to
http://img138.imageshack.us/img138/3807/talkexample.jpg

TalkMessages contains each message between two participants. The key
is a string built from the two users' uids: "$smaller_uid:$bigger_uid".
Each column inside this CF contains a single message. The column name
is the message timestamp in microseconds since epoch, stored as
LongType. The column value is a JSON-encoded string containing the
following fields: sender_uid, target_uid, msg.

This results in the following structure inside the column family.

"2249:9111" =>  [
   12345678 : { sender_uid : 2249, target_uid : 9111, msg : "Hello, how are you?" },
   12345679 : { sender_uid : 9111, target_uid : 2249, msg : "I'm fine, thanks" }
]
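As a minimal sketch of the key scheme above (not our production code): the
helper names talk_row_key and message_column are made up for illustration,
and the column is returned as a plain (name, value) pair rather than written
through any Cassandra client.

```python
import json


def talk_row_key(uid_a, uid_b):
    """Build the TalkMessages row key "$smaller_uid:$bigger_uid".

    Sorting the two uids means both participants compute the same key,
    no matter who sent the message.
    """
    smaller, bigger = sorted((uid_a, uid_b))
    return f"{smaller}:{bigger}"


def message_column(timestamp_us, sender_uid, target_uid, msg):
    """Return a (column name, column value) pair for one message.

    The name is the timestamp in microseconds since epoch; the value is
    the JSON-encoded payload with sender_uid, target_uid and msg.
    """
    value = json.dumps(
        {"sender_uid": sender_uid, "target_uid": target_uid, "msg": msg}
    )
    return timestamp_us, value
```

Because the key is built from the sorted uids, either participant can find
the conversation row again without knowing who started it.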

TalkLastMessages is used to quickly fetch a user's talk partners, the
last message sent between the peers, and other similar data.
This lets us fetch everything needed to display a "main view" for all
online friends with just one query to Cassandra. This column family
uses the user uid as its key. Each column represents a talk partner
whom the user has been talking to, and it uses the talk partner's uid
as the column name. The column value is a JSON-packed structure which
contains the following fields:
  - last message timestamp: microseconds since epoch when a message was
last sent between these two users.
  - unread timestamp: microseconds since epoch when the first unread
message was sent between these two users.
  - unread: counter of how many unread messages there are.
  - last message: last message between these two users.

This results in the following structure inside the column family for these
two example users: 2249 and 9111.

"2249" =>  [
   9111 : { last_message_timestamp : 12345679, unread_timestamp : 12345679, unread : 1, last_message: "I'm fine, thanks" }
],
"9111" =>  [
   2249 : { last_message_timestamp : 12345679, unread_timestamp : 12345679, unread : 0, last_message: "I'm fine, thanks" }
]

Displaying chat (this happens on every page load, needs to be fast)
  1) Fetch all columns from TalkLastMessages for the user

Display messages history between two participants:
  1) Fetch last n columns from TalkMessages for the relevant
"$smaller_uid:$bigger_uid" row.
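To illustrate the "last n columns" read, here is a small sketch that uses a
plain dict as a stand-in for one TalkMessages row (column name => JSON
value). The function name last_n_messages is hypothetical; a real client
would ask Cassandra for a reversed column slice instead of sorting in
application code.

```python
import json


def last_n_messages(row, n):
    """Return the newest n messages of one row, oldest first.

    `row` maps timestamp (microseconds since epoch) to a JSON string.
    Column names are timestamps, so sorting them gives time order.
    """
    if n <= 0:
        return []
    newest = sorted(row)[-n:]
    return [(ts, json.loads(row[ts])) for ts in newest]
```

Since the column names are LongType timestamps, the row is already ordered
by time on the server side, which is what makes this read cheap.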

Mark all sent messages from another participant as read (when you read
the messages)
  1) Get column $sender_uid from row $reader_uid from TalkLastMessages
  2) Update the JSON payload and insert the column back
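The two steps above could look roughly like this, again with nested dicts
standing in for the TalkLastMessages column family (row key => column name
=> JSON value). The name mark_read is made up, and leaving unread_timestamp
untouched is an assumption; the description above does not say what happens
to it on read.

```python
import json


def mark_read(talk_last_messages, reader_uid, sender_uid):
    """Reset the unread counter for messages sent by sender_uid.

    1) Get column sender_uid from row reader_uid.
    2) Update the JSON payload and insert the column back.
    """
    row = talk_last_messages[str(reader_uid)]
    payload = json.loads(row[sender_uid])
    payload["unread"] = 0  # assumption: unread_timestamp is left as-is
    row[sender_uid] = json.dumps(payload)
    return payload
```

Note that this is a read followed by a write, so two concurrent updates to
the same column can overwrite each other in this sketch.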

Sending a message involves the following operations:
  1) Insert a new column into TalkMessages
  2) Fetch the relevant column from TalkLastMessages from the $target_uid row
with the $sender_uid column
  3) Update the column JSON payload and insert it back into TalkLastMessages
  4) Fetch the relevant column from TalkLastMessages from the $sender_uid row
with the $target_uid column
  5) Update the column JSON payload and insert it back into TalkLastMessages
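The five steps can be sketched end to end with in-memory dicts in place of
the two column families. All function names here are hypothetical, and the
counter rules are assumptions: the sketch bumps unread only on the
receiver's side and sets unread_timestamp only when the counter goes from
0 to 1.

```python
import json

EMPTY_SUMMARY = {
    "last_message_timestamp": None,
    "unread_timestamp": None,
    "unread": 0,
    "last_message": None,
}


def update_summary(talk_last_messages, row_uid, column_uid, msg, ts, incoming):
    """Fetch one TalkLastMessages column, update its payload, write it back."""
    row = talk_last_messages.setdefault(str(row_uid), {})
    raw = row.get(column_uid)
    payload = json.loads(raw) if raw else dict(EMPTY_SUMMARY)
    payload["last_message_timestamp"] = ts
    payload["last_message"] = msg
    if incoming:
        # Assumption: remember when the first unread message arrived.
        if payload["unread"] == 0:
            payload["unread_timestamp"] = ts
        payload["unread"] += 1
    row[column_uid] = json.dumps(payload)


def send_message(talk_messages, talk_last_messages, sender_uid, target_uid, msg, ts):
    # 1) Insert the new column into TalkMessages.
    smaller, bigger = sorted((sender_uid, target_uid))
    row = talk_messages.setdefault(f"{smaller}:{bigger}", {})
    row[ts] = json.dumps(
        {"sender_uid": sender_uid, "target_uid": target_uid, "msg": msg}
    )
    # 2-3) Target's row, sender's column: the message is unread there.
    update_summary(talk_last_messages, target_uid, sender_uid, msg, ts, True)
    # 4-5) Sender's row, target's column: only the last message changes.
    update_summary(talk_last_messages, sender_uid, target_uid, msg, ts, False)
```

One send therefore costs three writes and two reads, in exchange for the
main view needing only a single row read per page load.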

There are also other operations and the actual payload is a bit more complex.

I'm happy to answer questions if somebody is interested :)

  - Juho Mäkinen



On Mon, Sep 20, 2010 at 12:57 PM, Morten Wegelbye Nissen<m...@monit.dk>  wrote:
  Hello List,

No matter where you look, almost everywhere you read that the NoSQL
data schema is completely different from the relational way, and after a
little insight into Cassandra everyone can second that.

But I miss seeing some real-life examples of how a real system can be
modelled. Let's take the example of a system where users can send messages
to each other. (Completely imaginary, no one would use Cassandra for a
mail system :) )

If one were to create such a system, what CFs would be used? And how would
you, for example, find all unread messages?

./Morten

