> Isn't Kafka too young for production use?

The best way to advance the project is to use it and contribute your experience and time.
btw, checking out Kafka is a great idea. There are people around having Fun Times with Kafka in production.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/05/2012, at 3:11 AM, Morgan Segalis wrote:

> Isn't Kafka too young for production use?
>
> Clearly that would fit my needs much better, but I can't afford an early-stage
> project that is not ready for production. Is it?
>
> On 30 Apr 2012, at 14:28, samal <samalgo...@gmail.com> wrote:
>
>>
>>
>> On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis <msega...@gmail.com> wrote:
>> Hi Samal,
>>
>> Thanks for the TTL feature, I wasn't aware of its existence.
>>
>> Rows partitioned by day will be narrower than rows partitioned by month
>> (about 30 times narrower, give or take ;-) )
>> Per day it should have something like 100 000 messages stored; most of them
>> would be retrieved, and so deleted, before the TTL feature does its
>> work.
>>
>> TTL is the last day a column can exist in c-world; after that it is deleted.
>> Deleting before TTL is fine.
>> Have you considered KAFKA http://incubator.apache.org/kafka/
>>
>>
>>
>> On 30 Apr 2012, at 13:16, samal wrote:
>>
>>>
>>>
>>> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msega...@gmail.com> wrote:
>>> Hi Aaron,
>>>
>>> Thank you for your answer, I was beginning to think that my question would
>>> never be answered ;-)
>>>
>>> Actually, this is what I was going for, except for one thing: instead of
>>> partitioning rows per month, I thought about partitioning per day, so that
>>> every day I launch the cleaning tool and it deletes the day from X
>>> months earlier.
>>>
>>> Use the TTL feature of columns, as it will remove a column after its TTL is
>>> over (no need for a manual job).
>>>
>>> I guess that will reduce the workload drastically; does it have any
>>> downside compared to month partitioning?
>>>
>>> A key belongs to a particular node, so depending on the size of your data,
>>> day-wise or month-wise partitioning matters.
>>> Otherwise it can lead to a fat row, which
>>> will cause system problems.
>>>
>>>
>>> At one point I was going to do something like the twissandra example:
>>> having a CF per user's queue, and another CF per day storing every
>>> message ID of the day. That way, if I want to delete them, I only look
>>> into this row and delete them by ID in the user's
>>> queue CF… Is that a good way to do it? Or should I stick with the first
>>> implementation?
>>>
>>> Best regards,
>>>
>>> Morgan.
>>>
>>> On 30 Apr 2012, at 05:52, aaron morton wrote:
>>>
>>>> Message Queue is often not a great use case for Cassandra. For information
>>>> on how to handle high delete workloads see
>>>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>>>>
>>>> It is hard to create a model without some idea of the data load, but I would
>>>> suggest you start with:
>>>>
>>>> CF: UserMessages
>>>> Key: ReceiverID
>>>> Columns : column name = TimeUUID ; column value = message ID and Body
>>>>
>>>> That will order the messages by time.
>>>>
>>>> Depending on load (and to support deleting a previous month's messages) you
>>>> may want to partition the rows by month:
>>>>
>>>> CF: UserMessagesMonth
>>>> Key: ReceiverID+YYYYMM
>>>> Columns : column name = TimeUUID ; column value = message ID and Body
>>>>
>>>> Everything is the same as before. But now a user has a row for each month,
>>>> which you can delete as a whole. This also helps avoid very big rows.
>>>>
>>>>> I really don't think that storage will be an issue, I have 2TB per node,
>>>>> messages are limited to 1KB.
>>>> I would suggest you keep the per node limit to 300 to 400 GB. It can take
>>>> a long time to compact, repair and move the data when it gets above 400GB.
>>>>
>>>> Hope that helps.
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>>>>
>>>>> Hi everyone!
>>>>>
>>>>> I'm fairly new to Cassandra and I'm not yet quite familiar with the
>>>>> column-oriented NoSQL model.
>>>>> I have worked on it for a while, but I can't seem to find the best model for
>>>>> what I'm looking for.
>>>>>
>>>>> I have Erlang software that lets users connect and communicate with
>>>>> each other. When a user (A) sends
>>>>> a message to a disconnected user (B), it stores it in the database,
>>>>> waits for user (B) to connect and retrieve
>>>>> the message queue, and then deletes it.
>>>>>
>>>>> Here are some key points:
>>>>> - Users are identified by integer IDs
>>>>> - Each message is unique by the combination of: Sender ID - Receiver ID -
>>>>> Message ID - time
>>>>>
>>>>> I have a message queue, and here are the operations I would need to do as
>>>>> fast as possible:
>>>>>
>>>>> - Store from 1 to X messages per registered user
>>>>> - Get the number of stored messages per user (can be an incremental
>>>>> variable updated at each store // this is often retrieved)
>>>>> - Retrieve all messages from a user at once.
>>>>> - Delete all messages from a user at once.
>>>>> - Delete all messages that are older than Y months (from all users).
>>>>>
>>>>> I really don't think that storage will be an issue, I have 2TB per node,
>>>>> messages are limited to 1KB.
>>>>> I'm really looking for speed rather than storage optimization.
>>>>>
>>>>> My configuration is 2 dedicated servers, each with:
>>>>> - 4 x Intel i7 2.66 GHz
>>>>> - 64 bits
>>>>> - 24 GB
>>>>> - 2 TB
>>>>>
>>>>> Thank you all.
>>>>
>>>
>>>
>>
>>
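
For readers following the thread, the bucketed row keys suggested above (Key: ReceiverID+YYYYMM, or per day as proposed in the replies) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the literal "+" separator just mirrors the notation used in the thread, and no Cassandra client calls are shown. It builds the keys, generates a time-based UUID for the column name, and works out which month bucket is entirely older than a retention window of Y months.

```python
import uuid
from datetime import date


def month_row_key(receiver_id: int, day: date) -> str:
    """Month-bucket row key in the thread's ReceiverID+YYYYMM shape."""
    return f"{receiver_id}+{day:%Y%m}"


def day_row_key(receiver_id: int, day: date) -> str:
    """Day-bucket variant (ReceiverID+YYYYMMDD), as proposed in the replies."""
    return f"{receiver_id}+{day:%Y%m%d}"


def new_column_name() -> uuid.UUID:
    """Time-based (version 1) UUID, standing in for a TimeUUID column name."""
    return uuid.uuid1()


def months_back(day: date, months: int) -> date:
    """First day of the month lying `months` months before `day`'s month."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def expired_month_key(receiver_id: int, today: date, keep_months: int) -> str:
    """Key of the newest month bucket whose messages are all older than
    `keep_months` months. The bucket exactly `keep_months` back may still
    hold messages inside the window, so step one month further."""
    return month_row_key(receiver_id, months_back(today, keep_months + 1))


if __name__ == "__main__":
    today = date(2012, 5, 15)
    print(month_row_key(42, today))          # row to write today's messages to
    print(expired_month_key(42, today, 3))   # row a cleanup job could drop whole
```

A cleanup job along these lines would compute the expired key for each receiver and delete that row as a whole. With the TTL approach mentioned in the replies, the explicit cleanup step would not be needed.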