Isn't Kafka too young for production use? It would clearly fit my needs much better, but I can't afford to rely on an early-stage project that isn't ready for production. Is it?
On 30 Apr 2012, at 14:28, samal <samalgo...@gmail.com> wrote:
>
>
> On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis <msega...@gmail.com> wrote:
> Hi Samal,
>
> Thanks for the TTL feature, I wasn't aware of its existence.
>
> Day partitioning will give much narrower rows than month partitioning
> (about 30 times narrower, give or take ;-) )
> Per day there should be something like 100,000 messages stored; most of them
> will be retrieved, and so deleted, before the TTL gets to do its work.
>
> TTL is the longest a column can exist in Cassandra; after that it is deleted.
> Deleting before the TTL expires is fine.
> Have you considered Kafka? http://incubator.apache.org/kafka/
>
>
>
> On 30 Apr 2012, at 13:16, samal wrote:
>
>>
>>
>> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msega...@gmail.com> wrote:
>> Hi Aaron,
>>
>> Thank you for your answer, I was beginning to think that my question would
>> never be answered ;-)
>>
>> Actually, this is what I was going for, except for one thing: instead of
>> partitioning rows per month, I thought about partitioning per day, so that
>> every day I launch the cleaning tool and it deletes the day from X months
>> earlier.
>>
>> Use the TTL feature of columns: a column is removed once its TTL is over
>> (no need for a manual job).
>>
>> I guess that will reduce the workload drastically; does it have any downside
>> compared to month partitioning?
>>
>> A key belongs to a particular node, so depending on the size of your data,
>> the choice between day-wise and month-wise partitioning matters. Otherwise
>> it can lead to a fat row, which will cause system problems.
>>
>>
>> At one point I was going to do something like the twissandra example: having
>> a CF per user's queue, and another CF per day storing the ID of every message
>> of the day. That way, if I want to delete them, I only look into this row
>> and delete them from the user's queue CF using their IDs… Is that
>> a good way to do it? Or should I stick with the first implementation?
>>
>> Best regards,
>>
>> Morgan.
>>
>> On 30 Apr 2012, at 05:52, aaron morton wrote:
>>
>>> Message Queue is often not a great use case for Cassandra. For information
>>> on how to handle high delete workloads see
>>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>>>
>>> It is hard to create a model without some idea of the data load, but I would
>>> suggest you start with:
>>>
>>> CF: UserMessages
>>> Key: ReceiverID
>>> Columns : column name = TimeUUID ; column value = message ID and Body
>>>
>>> That will order the messages by time.
>>>
>>> Depending on load (and to support deleting a previous month's messages) you
>>> may want to partition the rows by month:
>>>
>>> CF: UserMessagesMonth
>>> Key: ReceiverID+YYYYMM
>>> Columns : column name = TimeUUID ; column value = message ID and Body
>>>
>>> Everything is the same as before, but now a user has a row for each month,
>>> which you can delete as a whole. This also helps avoid very big rows.
>>>
>>>> I really don't think that storage will be an issue, I have 2 TB per node,
>>>> messages are limited to 1 KB.
>>> I would suggest you keep the per node limit to 300 to 400 GB. It can take a
>>> long time to compact, repair and move the data when it gets above 400 GB.
>>>
>>> Hope that helps.
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>>>
>>>> Hi everyone !
>>>>
>>>> I'm fairly new to Cassandra and I'm not yet quite familiar with the
>>>> column oriented NoSQL model.
>>>> I have worked on it for a while, but I can't seem to find the best model
>>>> for what I'm looking for.
>>>>
>>>> I have Erlang software that lets users connect and communicate with
>>>> each other. When a user (A) sends a message to a disconnected user (B),
>>>> it stores it in the database and waits for user (B) to connect and
>>>> retrieve the message queue, then deletes it.
>>>>
>>>> Here are some key points:
>>>> - Users are identified by integer IDs
>>>> - Each message is unique by the combination of: Sender ID - Receiver ID -
>>>> Message ID - time
>>>>
>>>> I have a message queue, and here are the operations I would need to do as
>>>> fast as possible:
>>>>
>>>> - Store from 1 to X messages per registered user
>>>> - Get the number of stored messages per user (can be an incremental
>>>> variable updated at each store // this is often retrieved)
>>>> - Retrieve all of a user's messages at once.
>>>> - Delete all of a user's messages at once.
>>>> - Delete all messages that are older than Y months (from all users).
>>>>
>>>> I really don't think that storage will be an issue, I have 2 TB per node,
>>>> messages are limited to 1 KB.
>>>> I'm really looking for speed rather than storage optimization.
>>>>
>>>> My configuration is 2 dedicated servers, each with:
>>>> - 4 x Intel i7 2.66 GHz
>>>> - 64 bits
>>>> - 24 GB
>>>> - 2 TB
>>>>
>>>> Thank you all.
>>>
>>
>>
>
>
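
For reference, the bucketed row keys discussed in this thread (ReceiverID+YYYYMM, or a per-day variant) could be built roughly as follows. This is only a sketch in plain Python with no Cassandra client involved: the function names, the "+" separator, and the `keep_months` parameter are illustrative assumptions, not something taken from the thread or from any library.

```python
from datetime import date


def month_bucket(day: date) -> str:
    """Return the YYYYMM bucket for a date (month partitioning)."""
    return f"{day.year:04d}{day.month:02d}"


def day_bucket(day: date) -> str:
    """Return the YYYYMMDD bucket for a date (day partitioning)."""
    return f"{day.year:04d}{day.month:02d}{day.day:02d}"


def row_key(receiver_id: int, bucket: str) -> str:
    # Mirrors the "ReceiverID+YYYYMM" layout; the literal "+" is an assumption.
    return f"{receiver_id}+{bucket}"


def months_earlier(day: date, months: int) -> date:
    """Return the first day of the month that lies `months` months before `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def expired_row_key(receiver_id: int, today: date, keep_months: int) -> str:
    """Key of the monthly row that has just aged out and can be deleted whole."""
    return row_key(receiver_id, month_bucket(months_earlier(today, keep_months)))
```

With month buckets, the periodic cleanup only has to compute one key per receiver and delete that row as a whole. It still needs some way to enumerate receiver IDs, which this sketch does not cover.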