A row key based on the hour will create write hot spots: for an entire hour, 
all writes will go to the same node, i.e., the node where that row resides. 
You need a row key that distributes writes evenly across all your C* nodes, 
e.g., the time concatenated with a sequence counter.
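
As a rough sketch of that idea (not a tested schema), the row key could be the 
hour bucket plus a small suffix derived from a sequence counter, so that each 
hour's writes are spread over several rows instead of one. The names row_key, 
keys_for_hour and NUM_SHARDS are made up for illustration, the year-month-day-hour 
bucket format is an assumption, and the spreading relies on a hash-based 
partitioner placing the distinct keys on different nodes.

```python
from datetime import datetime

# Hypothetical number of rows per hour bucket; tune to cluster size and write rate.
NUM_SHARDS = 8


def row_key(ts: datetime, seq: int, num_shards: int = NUM_SHARDS) -> str:
    """Build a row key from the hour bucket and a counter-derived suffix.

    Consecutive sequence numbers cycle through the suffixes, so writes
    within the same hour are spread over num_shards distinct rows.
    """
    bucket = ts.strftime("%Y%m%d%H")  # assumed format: year, month, day, hour
    return f"{bucket}:{seq % num_shards}"


def keys_for_hour(ts: datetime, num_shards: int = NUM_SHARDS) -> list:
    """List every row key a reader must fetch to cover one hour."""
    bucket = ts.strftime("%Y%m%d%H")
    return [f"{bucket}:{s}" for s in range(num_shards)]


if __name__ == "__main__":
    ts = datetime(2013, 3, 7, 14, 10)
    for seq in range(3):
        print(row_key(ts, seq))
    print(keys_for_hour(ts))
```

The trade-off is on the read side: a query for time X to Y has to fetch all 
NUM_SHARDS rows for each hour in the range and merge the results.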

From: Mohan L [mailto:l.mohan...@gmail.com]
Sent: Thursday, March 07, 2013 2:10 PM
To: user@cassandra.apache.org
Subject: data model to store large volume syslog


Dear All,

I am looking at Cassandra to store time series data (mostly syslog). The volume 
of data is very large, and many entries occur at the same timestamp. Each 
record contains the following fields:

timestamps:host-name:facility:message

The following are the things that need to be monitored:

1). Need to get data between time X and Y.
2). Need to get data between time X and Y for a host-name.
3). Need to search for a 'pattern' in the message.

The data model design I am thinking of is:

1). Create a column family 'cfrawlog' which stores the raw log as received. The 
row key could be 'yyyyddmmhh' (a new row is added for each hour or less), and 
each 'column name' is a UUID whose 'value' is the raw log data. Since we are 
also going to use this log for forensic purposes, it will help us to have all 
raw logs within the column family, with nothing missing.

2). I want to create one more column family that will hold the parsed log, so 
that we can use this column family for queries. My question is: how should I 
model this CF so that it can answer the above questions? What would be the row 
key for this CF?

3). Does the above data model make sense?

Any help or suggestions would be greatly appreciated.


Thanks
Mohan L


