Re: Data model question, storing Queue Message

Morgan Segalis Mon, 30 Apr 2012 08:11:57 -0700

Isn't Kafka too young for production use?

Clearly it would fit my needs much better, but I can't afford an early-stage 
project that is not ready for production. Is it ready?


On 30 Apr 2012, at 14:28, samal <samalgo...@gmail.com> wrote:

> 
> 
> On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis <msega...@gmail.com> wrote:
> Hi Samal,
> 
> Thanks for the TTL feature, I wasn't aware of its existence.
> 
> Day partitioning will make rows less wide than month partitioning (about 30 
> times less, give or take ;-) ).
> Per day there should be something like 100 000 messages stored; most of them 
> would be retrieved, and so deleted, before the TTL feature comes to do its work.
> 
> TTL marks the last day a column can exist in the Cassandra world; after that 
> it is deleted. Deleting before the TTL expires is fine.
> Have you considered Kafka? http://incubator.apache.org/kafka/
>   
> 
>  
> On 30 Apr 2012, at 13:16, samal wrote:
> 
>> 
>> 
>> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msega...@gmail.com> wrote:
>> Hi Aaron,
>> 
>> Thank you for your answer, I was beginning to think that my question would 
>> never be answered ;-)
>> 
>> Actually, this is what I was going for, except for one thing: instead of 
>> partitioning rows per month, I thought about partitioning per day. That way, 
>> every day I launch the cleaning tool, and it deletes the day from X months 
>> earlier.
>> 
>> Use the TTL feature of columns: it will remove a column after its TTL is 
>> over (no need for a manual job).
>> 
>> I guess that will reduce the workload drastically. Does it have any downside 
>> compared to month partitioning?
>> 
>> A key belongs to a particular node, so whether you partition by day or by 
>> month matters, depending on the size of your data. Otherwise it can lead to 
>> fat rows, which will cause system problems.
>> 
>>  
>> At one point I was going to do something like the Twissandra example: having 
>> a CF for the users' queues, and another CF storing, per day, the ID of every 
>> message of that day. That way, if I want to delete them, I only look into 
>> that day's row and use the IDs to delete the messages in the users' queue CF. 
>> Is that a good way to do it? Or should I stick with the first implementation?
>> 
>> Best regards,
>> 
>> Morgan.
>> 
>> On 30 Apr 2012, at 05:52, aaron morton wrote:
>> 
>>> Message Queue is often not a great use case for Cassandra. For information 
>>> on how to handle high delete workloads see 
>>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>>> 
>>> It is hard to create a model without some idea of the data load, but I would 
>>> suggest you start with:
>>> 
>>> CF: UserMessages
>>> Key: ReceiverID
>>> Columns : column name = TimeUUID ; column value = message ID and Body
>>> 
>>> That will order the messages by time. 
>>> 
>>> Depending on load (and to support deleting a previous month's messages) you 
>>> may want to partition the rows by month:
>>> 
>>> CF: UserMessagesMonth
>>> Key: ReceiverID+YYYYMM
>>> Columns : column name = TimeUUID ; column value = message ID and Body
>>> 
>>> Everything is the same as before, but now a user has a row for each month, 
>>> which you can delete as a whole. This also helps avoid very big rows. 
>>> 
>>>> I really don't think that storage will be an issue: I have 2 TB per node, 
>>>> and messages are limited to 1 KB.
>>> I would suggest you keep the per node limit to 300 to 400 GB. It can take a 
>>> long time to compact, repair and move the data when it gets above 400GB. 
>>> 
>>> Hope that helps. 
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>>> 
>>>> Hi everyone !
>>>> 
>>>> I'm fairly new to Cassandra and I'm not yet quite familiar with the 
>>>> column-oriented NoSQL model.
>>>> I have worked on it for a while, but I can't seem to find the best model for 
>>>> what I'm looking for.
>>>> 
>>>> I have Erlang software that lets users connect and communicate with 
>>>> each other. When a user (A) sends
>>>> a message to a disconnected user (B), it stores the message in the database 
>>>> and waits for user (B) to connect, retrieve
>>>> the message queue, and delete it. 
>>>> 
>>>> Here are some key points: 
>>>> - Users are identified by integer IDs
>>>> - Each message is unique by the combination of: Sender ID - Receiver ID - 
>>>> Message ID - time
>>>> 
>>>> I have a message queue, and here are the operations I would need to do as 
>>>> fast as possible: 
>>>> 
>>>> - Store from 1 to X messages per registered user
>>>> - Get the number of stored messages per user (can be an incremental 
>>>> variable updated at each store; this is retrieved often)
>>>> - Retrieve all messages from a user at once.
>>>> - Delete all messages from a user at once.
>>>> - Delete all messages that are older than Y months (from all users).
>>>> 
>>>> I really don't think that storage will be an issue: I have 2 TB per node, 
>>>> and messages are limited to 1 KB.
>>>> I'm really looking for speed rather than storage optimization.
>>>> 
>>>> My configuration is 2 dedicated servers, each with:
>>>> - 4 x Intel i7 2.66 GHz
>>>> - 64 bits
>>>> - 24 GB
>>>> - 2 TB
>>>> 
>>>> Thank you all.
>>> 
>> 
>> 
> 
>
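
To make the row-key and TTL ideas from the quoted thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names, the ":" separator between ReceiverID and the date bucket, and the 90-day retention used in the example are all hypothetical, and the code deliberately calls no Cassandra client API.

```python
from datetime import date, timedelta


def month_row_key(receiver_id: int, day: date) -> str:
    """Row key in the 'ReceiverID+YYYYMM' style (the ':' separator is an assumption)."""
    return f"{receiver_id}:{day:%Y%m}"


def day_row_key(receiver_id: int, day: date) -> str:
    """Per-day variant of the same key: one row per user per day."""
    return f"{receiver_id}:{day:%Y%m%d}"


def ttl_seconds(retention_days: int) -> int:
    """Convert a retention period in days into a TTL value in seconds."""
    return retention_days * 24 * 60 * 60


def expired_day_bucket(today: date, retention_days: int) -> str:
    """Day bucket (YYYYMMDD) that a manual daily cleaning job would target."""
    return f"{today - timedelta(days=retention_days):%Y%m%d}"


if __name__ == "__main__":
    sent = date(2012, 4, 30)
    print(month_row_key(42, sent))       # key for the month-partitioned row
    print(day_row_key(42, sent))         # key for the day-partitioned row
    print(ttl_seconds(90))               # TTL to attach to each column
    print(expired_day_bucket(sent, 90))  # bucket to purge if TTL is not used
```

With per-day keys each row stays narrower, but reading a user's whole queue then means fetching one row per day that may hold messages. If a TTL is set on every column, the `expired_day_bucket` step should not be needed at all.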
