A row key based on the hour will create hot spots for writes: for an entire hour, all writes will go to the same node, i.e., the node where that row resides. You need a row key that distributes writes evenly across all your C* nodes, e.g., the time concatenated with a sequence counter.
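To make the suggestion concrete, here is a minimal sketch of one way to build such a key. It assumes the sequence counter is taken modulo a fixed bucket count, so that each hour maps to a small, known set of rows; that modulo step, the `NUM_BUCKETS` value, and both function names are illustrative assumptions, not something prescribed by Cassandra. With a hashing partitioner, distinct row keys will generally be placed on different nodes, which is what spreads the writes.

```python
from datetime import datetime

# Hypothetical bucket count; in practice it would be sized to the cluster.
NUM_BUCKETS = 16


def row_key(ts: datetime, seq: int, num_buckets: int = NUM_BUCKETS) -> str:
    """Build a row key from the hour plus a bucket derived from a sequence counter.

    Writes within the same hour are spread over `num_buckets` rows instead of one.
    """
    hour = ts.strftime("%Y%m%d%H")  # year-month-day-hour, sorts chronologically
    return f"{hour}:{seq % num_buckets:02d}"


def row_keys_for_hour(ts: datetime, num_buckets: int = NUM_BUCKETS) -> list[str]:
    """List every row key a reader must query to cover one hour."""
    hour = ts.strftime("%Y%m%d%H")
    return [f"{hour}:{bucket:02d}" for bucket in range(num_buckets)]


if __name__ == "__main__":
    ts = datetime(2013, 3, 7, 14, 10)
    # Consecutive sequence numbers rotate through the buckets.
    for seq in range(3):
        print(row_key(ts, seq))
    # A time-range read fans out over all buckets for each hour in the range.
    print(len(row_keys_for_hour(ts)), "rows to read for this hour")
```

The trade-off is on the read side: a query for data between time X and Y has to fetch every bucket for each hour in the range and merge the results in the client.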
From: Mohan L [mailto:l.mohan...@gmail.com]
Sent: Thursday, March 07, 2013 2:10 PM
To: user@cassandra.apache.org
Subject: data model to store large volume syslog

Dear All,

I am looking at Cassandra to store time series data (mostly syslog). The volume of data is very large, and many entries occur at the same timestamp. Each record contains the following fields:

timestamps:host-name:facility:message

The following are the things that need to be monitored:

1). Need to get data between time X and Y.
2). Need to get data between time X and Y for a host-name.
3). Need to search for a 'pattern' in the message.

The data model design I am thinking of is:

1). Create a column family 'cfrawlog' which stores the raw log as received. The row key could be 'yyyyddmmhh' (a new row is added for each hour or less); each 'column name' is a UUID and the 'value' is the raw log data. Since we are also going to use this log for forensic purposes, it will help us to have all the raw logs within the column family without missing any.
2). I want to create one more column family that holds the parsed log, so that we can use this column family for queries. My question is: how should I model this CF so that it answers the questions above? What would be the row key for this CF?
3). Does the above data model make sense?

Any help and suggestions would be greatly appreciated.

Thanks
Mohan L