On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msega...@gmail.com> wrote:
> Hi Aaron,
>
> Thank you for your answer, I was beginning to think that my question would
> never be answered ;-)
>
> Actually, this is what I was going for, except for one thing: instead of
> partitioning rows per month, I thought about partitioning per day, so that
> every day I launch the cleaning tool and it deletes the day from X
> months earlier.
>

Use the TTL feature of columns: a column is removed automatically once its TTL is over, so there is no need for a manual job.

> I guess that will reduce the workload drastically. Does it have any
> downside compared to month partitioning?
>

A key belongs to a particular node, so whether you partition by day or by month matters depending on the size of your data. Otherwise it can lead to a fat row, which will cause system problems.

> At one point I was going to do something like the twissandra example:
> having a CF per user's queue, and another CF per day storing every
> message ID of the day. That way, if I want to delete them, I only look
> into this row and use the IDs to delete them from the user's
> queue CF… Is that a good way to do it? Or should I stick with the first
> implementation?
>
> Best regards,
>
> Morgan.
>
> On 30 Apr 2012, at 05:52, aaron morton wrote:
>
> Message Queue is often not a great use case for Cassandra. For information
> on how to handle high delete workloads see
> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>
> It's hard to create a model without some idea of the data load, but I would
> suggest you start with:
>
> CF: UserMessages
> Key: ReceiverID
> Columns : column name = TimeUUID ; column value = message ID and Body
>
> That will order the messages by time.
>
> Depending on load (and to support deleting a previous month's messages) you
> may want to partition the rows by month:
>
> CF: UserMessagesMonth
> Key: ReceiverID+YYYYMM
> Columns : column name = TimeUUID ; column value = message ID and Body
>
> Everything the same as before.
> But now a user has a row for each month, which you can delete as a whole.
> This also helps avoid very big rows.
>
> I really don't think that storage will be an issue, I have 2TB per node,
> messages are limited to 1KB.
>
> I would suggest you keep the per-node limit to 300 to 400 GB. It can take
> a long time to compact, repair and move the data when it gets above 400GB.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>
> Hi everyone!
>
> I'm fairly new to Cassandra and I'm not yet familiar with the
> column-oriented NoSQL model.
> I have worked on it for a while, but I can't seem to find the best model
> for what I'm looking for.
>
> I have Erlang software that lets users connect and communicate with
> each other. When a user (A) sends a message to a disconnected user (B),
> it stores it in the database, waits for user (B) to connect and retrieve
> the message queue, and then deletes it.
>
> Here are some key points:
> - Users are identified by integer IDs
> - Each message is unique by the combination of: Sender ID - Receiver ID -
>   Message ID - time
>
> I have a message queue, and here are the operations I need to do as
> fast as possible:
>
> - Store from 1 to X messages per registered user
> - Get the number of stored messages per user (can be an incremental
>   variable updated at each store; this is often retrieved)
> - Retrieve all messages from a user at once.
> - Delete all messages from a user at once.
> - Delete all messages that are older than Y months (from all users).
>
> I really don't think that storage will be an issue, I have 2TB per node,
> messages are limited to 1KB.
> I'm really looking for speed rather than storage optimization.
>
> My configuration is 2 dedicated servers, each with:
> - 4 x Intel i7 2.66 GHz
> - 64 bits
> - 24 GB
> - 2 TB
>
> Thank you all.
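
For anyone following the thread, here is a rough Python sketch of the two ideas discussed above: building a partitioned row key in the "ReceiverID+YYYYMM" style (or a per-day variant), and picking a TTL instead of running a cleanup job. It does not talk to Cassandra at all. The function names, the literal "+" separator, and the 31-days-per-month rounding are assumptions made for illustration only, not something stated in the thread.

```python
from datetime import date


def month_row_key(receiver_id: int, when: date) -> str:
    """Row key for the month-partitioned CF, e.g. '42+201204'.

    The literal '+' separator is an assumption; any unambiguous
    encoding of (receiver, month) would do.
    """
    return f"{receiver_id}+{when:%Y%m}"


def day_row_key(receiver_id: int, when: date) -> str:
    """Per-day variant of the same key, e.g. '42+20120430'."""
    return f"{receiver_id}+{when:%Y%m%d}"


def months_back(today: date, months: int) -> date:
    """First day of the month that lies `months` months before `today`."""
    index = today.year * 12 + (today.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def ttl_seconds(months: int, days_per_month: int = 31) -> int:
    """Approximate TTL in seconds for 'keep messages for Y months'.

    Rounds every month up to `days_per_month` days (an assumption),
    so messages live slightly longer than the nominal retention.
    """
    return months * days_per_month * 24 * 60 * 60


if __name__ == "__main__":
    today = date(2012, 4, 30)
    print(month_row_key(42, today))                   # key written today
    print(month_row_key(42, months_back(today, 6)))   # row a cleanup job would drop
    print(ttl_seconds(6))                             # TTL alternative to the job
```

With month keys, a cleanup job has one row per user per month to delete; with a TTL set at write time, no job is needed, at the cost of fixing the retention period when the message is stored.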