Re: Data model question, storing Queue Message

aaron morton Mon, 30 Apr 2012 18:21:22 -0700

> Isn't Kafka too young for production use?
The best way to advance the project is to use it and contribute your experience 
and time.


btw, checking out Kafka is a great idea. There are people around having Fun 
Times with Kafka in production.

Cheers
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/05/2012, at 3:11 AM, Morgan Segalis wrote:

> Isn't Kafka too young for production use?
> 
> Clearly that would fit my needs much better, but I can't afford an early-stage 
> project that isn't ready for production. Is it?
> 
> On 30 Apr 2012, at 14:28, samal <[email protected]> wrote:
> 
>> 
>> 
>> On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis <[email protected]> wrote:
>> Hi Samal,
>> 
>> Thanks for pointing out the TTL feature; I wasn't aware of its existence.
>> 
>> Day partitioning will give rows about 30 times narrower than month 
>> partitioning (give or take ;-) )
>> Per day there should be something like 100,000 messages stored. Most of them 
>> would be retrieved, and therefore deleted, before the TTL feature does its 
>> work.
>> 
>> The TTL is the maximum time a column can live in Cassandra; after that it is 
>> deleted. Deleting it before the TTL is up is fine.
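The following Python sketch illustrates the TTL semantics described above. It uses an in-memory row and a simulated clock measured in seconds. `TtlRow` and its methods are hypothetical illustrations, not a Cassandra client API. In Cassandra the expiry happens on the server side; this code only mimics what a reader would see.

```python
from typing import Dict, Optional, Tuple


class TtlRow:
    """Hypothetical in-memory stand-in for one row whose columns may carry a TTL."""

    def __init__(self) -> None:
        # column name -> (value, absolute expiry time in seconds, or None for no TTL)
        self._cols: Dict[str, Tuple[str, Optional[int]]] = {}

    def insert(self, name: str, value: str, now: int, ttl: Optional[int] = None) -> None:
        expires_at = now + ttl if ttl is not None else None
        self._cols[name] = (value, expires_at)

    def delete(self, name: str) -> None:
        # Deleting explicitly before the TTL runs out is allowed.
        self._cols.pop(name, None)

    def get_all(self, now: int) -> Dict[str, str]:
        # Columns whose TTL has run out are no longer visible to reads.
        return {
            name: value
            for name, (value, expires_at) in self._cols.items()
            if expires_at is None or now < expires_at
        }
```

For example, a message that was read and deleted early simply disappears. A message that is never fetched vanishes once its TTL is up, with no cleanup job needed.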
>> Have you considered Kafka? http://incubator.apache.org/kafka/ 
>>   
>> 
>>  
>> On 30 Apr 2012, at 13:16, samal wrote:
>> 
>>> 
>>> 
>>> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <[email protected]> wrote:
>>> Hi Aaron,
>>> 
>>> Thank you for your answer; I was beginning to think that my question would 
>>> never be answered ;-)
>>> 
>>> Actually, this is what I was going for, except for one thing: instead of 
>>> partitioning rows per month, I thought about partitioning per day. That way, 
>>> every day I launch the cleaning tool, and it deletes the day from X months 
>>> earlier.
>>> 
>>> Use the column TTL feature: it removes the column once the TTL is over (no 
>>> need for a manual job). 
>>> 
>>> I guess that will reduce the workload drastically. Does it have any 
>>> downside compared to month partitioning?
>>> 
>>> Each key belongs to a particular node, so depending on the size of your data, 
>>> day-wise or month-wise partitioning matters. Otherwise it can lead to fat 
>>> rows, which cause system problems. 
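The following arithmetic makes the "about 30 times" remark concrete for a single user's row. The thread does not give a per-user message rate, so the figure of 1,000 messages a day for one heavy user is purely hypothetical. The 1 KB limit is treated as 1,000 bytes, and overhead for column names and timestamps is ignored.

```python
MSG_BYTES_MAX = 1_000  # messages are limited to about 1 KB (from the thread)


def max_row_bytes(msgs_per_user_per_day: int, days_per_row: int) -> int:
    """Upper bound on one user's row size, counting message bodies only."""
    return msgs_per_user_per_day * days_per_row * MSG_BYTES_MAX


# Hypothetical heavy user receiving 1,000 messages a day while offline.
day_row = max_row_bytes(1_000, days_per_row=1)     # about 1 MB per row
month_row = max_row_bytes(1_000, days_per_row=30)  # about 30 MB per row
ratio = month_row // day_row                       # 30
```

Whichever rate applies, a month row holds roughly 30 day rows' worth of columns, and all of it lives under one key on the same replicas.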
>>> 
>>>  
>>> At one point I was going to do something like the Twissandra example: 
>>> having a CF for the users' queues, and another CF with a row per day storing 
>>> the ID of every message of that day. That way, if I want to delete them, I 
>>> only look into that row and use the IDs to delete the messages from the 
>>> users' queue CF. Is that a good way to do it? Or should I stick with the 
>>> first implementation?
>>> 
>>> Best regards,
>>> 
>>> Morgan.
>>> 
>>> On 30 Apr 2012, at 05:52, aaron morton wrote:
>>> 
>>>> A message queue is often not a great use case for Cassandra. For information 
>>>> on how to handle high-delete workloads, see 
>>>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>>>> 
>>>> It's hard to create a model without some idea of the data load, but I would 
>>>> suggest you start with:
>>>> 
>>>> CF: UserMessages
>>>> Key: ReceiverID
>>>> Columns: column name = TimeUUID ; column value = message ID and Body
>>>> 
>>>> That will order the messages by time. 
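Here is the UserMessages layout above as a Python sketch, with a plain dict standing in for the column family. `insert`, `read_row` and `timeuuid_at` are hypothetical helpers, not a client API. `timeuuid_at` builds a version-1 UUID with a chosen timestamp only to make the example deterministic; in practice you would generate the column name with `uuid.uuid1()` at write time. Cassandra keeps the columns sorted on disk, whereas this sketch sorts on read, and breaking ties on the raw bytes is this sketch's own choice.

```python
import uuid
from typing import Dict, List, Tuple


def timeuuid_at(ticks: int, clock_seq: int = 0, node: int = 1) -> uuid.UUID:
    """Build a version-1 (time-based) UUID whose 60-bit timestamp is `ticks`."""
    return uuid.UUID(fields=(
        ticks & 0xFFFFFFFF,                 # time_low
        (ticks >> 32) & 0xFFFF,             # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,  # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,   # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                   # clock_seq_low
        node,
    ))


# Row key: receiver ID. Columns: TimeUUID -> (message ID, body).
UserMessages = Dict[int, Dict[uuid.UUID, Tuple[str, str]]]


def insert(cf: UserMessages, receiver: int, col: uuid.UUID, msg_id: str, body: str) -> None:
    cf.setdefault(receiver, {})[col] = (msg_id, body)


def read_row(cf: UserMessages, receiver: int) -> List[Tuple[str, str]]:
    # Order columns by the timestamp embedded in each TimeUUID (ties: raw bytes).
    row = cf.get(receiver, {})
    return [row[col] for col in sorted(row, key=lambda u: (u.time, u.bytes))]
```

Messages inserted out of order come back in time order when the row is read.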
>>>> 
>>>> Depending on load (and to support deleting a previous month's messages), you 
>>>> may want to partition the rows by month:
>>>> 
>>>> CF: UserMessagesMonth
>>>> Key: ReceiverID+YYYYMM
>>>> Columns: column name = TimeUUID ; column value = message ID and Body
>>>> 
>>>> Everything is the same as before, but now a user has a row for each month, 
>>>> which you can delete as a whole. This also helps avoid very big rows. 
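Here is the month-partitioned variant as a Python sketch. The ":" separator in the "ReceiverID+YYYYMM" key and all function names are assumptions for illustration. With a hash-based partitioner, rows are not stored in key order, so you cannot cheaply find every key ending in a given month. The sketch therefore expects the caller to supply the receiver IDs whose month rows should be dropped.

```python
import uuid
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Row key -> {TimeUUID column -> (message ID, body)}
Rows = Dict[str, Dict[uuid.UUID, Tuple[str, str]]]


def yyyymm(when: datetime) -> str:
    return when.strftime("%Y%m")


def month_key(receiver_id: int, month: str) -> str:
    # "ReceiverID+YYYYMM"; the ':' separator is an assumption of this sketch.
    return f"{receiver_id}:{month}"


def store(rows: Rows, receiver_id: int, when: datetime, msg_id: str, body: str) -> None:
    rows.setdefault(month_key(receiver_id, yyyymm(when)), {})[uuid.uuid1()] = (msg_id, body)


def drop_month(rows: Rows, receiver_ids: Iterable[int], month: str) -> int:
    """Delete whole month rows for the given receivers; return how many existed."""
    removed = 0
    for rid in receiver_ids:
        if rows.pop(month_key(rid, month), None) is not None:
            removed += 1
    return removed
```

Each receiver needs one row-level delete per month, instead of one delete per message.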
>>>> 
>>>>> I really don't think that storage will be an issue; I have 2TB per node, 
>>>>> and messages are limited to 1KB.
>>>> I would suggest you keep the data per node to 300 to 400 GB. It can take 
>>>> a long time to compact, repair and move the data when it gets above 400 GB. 
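As a rough back-of-the-envelope check, this applies the 300 GB guidance to the numbers in the thread: about 100,000 messages a day (Morgan's estimate in a later reply) of at most 1 KB each. The calculation assumes every node holds a full copy of the data (for example, replication factor 2 on the 2-node cluster). It uses decimal units and ignores per-column overhead and compaction headroom. It also assumes no message is ever deleted, which overstates the load, since most messages are removed when fetched.

```python
MESSAGES_PER_DAY = 100_000       # estimate given later in the thread
MESSAGE_BYTES = 1_000            # upper bound of about 1 KB per message
NODE_LIMIT_BYTES = 300 * 10**9   # lower end of the 300-400 GB guidance


def bytes_retained(days: int) -> int:
    """Raw message bytes on one node if every message is kept for `days` days."""
    return MESSAGES_PER_DAY * MESSAGE_BYTES * days


per_day = bytes_retained(1)                      # 100,000,000 bytes, about 0.1 GB
days_to_limit = NODE_LIMIT_BYTES // per_day      # 3000 days
six_months_gb = bytes_retained(6 * 30) / 10**9   # 18.0 GB
```

Under these assumptions, even a several-month retention window stays well below the suggested per-node ceiling. The 2 TB of disk is not the constraint.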
>>>> 
>>>> Hope that helps. 
>>>> 
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>>>> 
>>>>> Hi everyone!
>>>>> 
>>>>> I'm fairly new to Cassandra and not yet quite familiar with the 
>>>>> column-oriented NoSQL model.
>>>>> I have worked on it for a while, but I can't seem to find the best model for 
>>>>> what I'm looking for.
>>>>> 
>>>>> I have an Erlang application that lets users connect and communicate with 
>>>>> each other. When a user (A) sends a message to a disconnected user (B), it 
>>>>> stores the message in the database and waits for user (B) to connect and 
>>>>> retrieve the message queue, and then deletes it. 
>>>>> 
>>>>> Here are some key points: 
>>>>> - Users are identified by integer IDs
>>>>> - Each message is unique by the combination of: Sender ID - Receiver ID - 
>>>>> Message ID - time
>>>>> 
>>>>> I have a message queue, and here are the operations I would need to do as 
>>>>> fast as possible: 
>>>>> 
>>>>> - Store from 1 to X messages per registered user
>>>>> - Get the number of stored messages per user (can be an incremental 
>>>>> variable updated at each store; this is retrieved often)
>>>>> - Retrieve all messages for a user at once.
>>>>> - Delete all messages for a user at once.
>>>>> - Delete all messages that are older than Y months (for all users).
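The five operations listed above can be sketched as a small in-memory Python class. `MessageStore`, `Message` and `cutoff_for` are hypothetical names, and this is not a Cassandra client. A separate per-receiver counter answers "how many are waiting" without reading the messages themselves. In Cassandra, a counter column could plausibly play that role. "Y months" is approximated as 30-day months, which is an assumption of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Message:
    sender_id: int
    message_id: int
    sent_at: datetime
    body: str


def cutoff_for(now: datetime, months: int) -> datetime:
    # Approximation: treat a month as 30 days.
    return now - timedelta(days=30 * months)


class MessageStore:
    def __init__(self) -> None:
        self.queues: Dict[int, List[Message]] = defaultdict(list)
        self.counts: Dict[int, int] = defaultdict(int)  # kept in sync on every write

    def store(self, receiver_id: int, *msgs: Message) -> None:
        self.queues[receiver_id].extend(msgs)
        self.counts[receiver_id] += len(msgs)

    def count(self, receiver_id: int) -> int:
        return self.counts.get(receiver_id, 0)

    def fetch_all(self, receiver_id: int) -> List[Message]:
        return sorted(self.queues.get(receiver_id, []), key=lambda m: m.sent_at)

    def delete_all(self, receiver_id: int) -> None:
        self.queues.pop(receiver_id, None)
        self.counts.pop(receiver_id, None)

    def delete_older_than(self, cutoff: datetime) -> int:
        """Drop messages sent before `cutoff` for every user; return how many."""
        removed = 0
        for rid in list(self.queues):
            keep = [m for m in self.queues[rid] if m.sent_at >= cutoff]
            removed += len(self.queues[rid]) - len(keep)
            self.queues[rid] = keep
            self.counts[rid] = len(keep)
        return removed
```

The last operation scans every queue, which is exactly why the thread turns to TTLs or month-partitioned rows for it.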
>>>>> 
>>>>> I really don't think that storage will be an issue; I have 2TB per node, 
>>>>> and messages are limited to 1KB.
>>>>> I'm really looking for speed rather than storage optimization.
>>>>> 
>>>>> My configuration is 2 dedicated servers, each with:
>>>>> - 4 x Intel i7 2.66 GHz
>>>>> - 64-bit
>>>>> - 24 GB
>>>>> - 2 TB
>>>>> 
>>>>> Thank you all.
>>>> 
>>> 
>>> 
>> 
>>
