> 1). create a column family 'cfrawlog' which stores raw log as received. row
> key could be 'yyyyddmmhh'(new row is added for each hour or less), each
> 'column name' is uuid with 'value' is raw log data. Since we are also going
> to use this log for forensics purpose, so it will help us to have all raw log
> with in the column family without missing.

As Moshe said, there is a chance of hot spotting if you are sending all writes to a certain row. You also need to consider how big the row will get; in general, stay below about 30MB. You can go higher, but there are some implications.

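To make the hot-spot and row-size points concrete, here is a minimal sketch of one way to split each hour across several rows. The bucket count, the key layout `<hour>:<bucket>`, and the function names are illustrative assumptions, not something from this thread or from Cassandra itself; the 30MB figure is the rough guideline mentioned above.

```python
from datetime import datetime

# Assumption: 16 sub-rows per hour; tune to cluster size and write rate.
NUM_BUCKETS = 16
# Rough per-row guideline from the reply above (about 30MB).
ROW_SIZE_TARGET_BYTES = 30 * 1024 * 1024


def raw_log_row_key(ts: datetime, seq: int, num_buckets: int = NUM_BUCKETS) -> str:
    """Build a row key from the hour plus a rotating bucket number.

    `seq` is any per-writer sequence counter; taking it modulo the bucket
    count spreads one hour of writes over `num_buckets` rows.
    """
    return f"{ts:%Y%m%d%H}:{seq % num_buckets:02d}"


def row_keys_for_hour(ts: datetime, num_buckets: int = NUM_BUCKETS) -> list[str]:
    """All row keys a reader has to fetch to see one full hour."""
    return [f"{ts:%Y%m%d%H}:{b:02d}" for b in range(num_buckets)]


def estimated_row_bytes(events_per_hour: int, avg_event_bytes: int,
                        num_buckets: int = NUM_BUCKETS) -> int:
    """Back-of-envelope size of one row, ignoring per-column overhead."""
    return events_per_hour * avg_event_bytes // num_buckets


if __name__ == "__main__":
    ts = datetime(2013, 3, 7, 14, 10)
    print(raw_log_row_key(ts, seq=21))
    # Hypothetical load: 1000 events/s at 200 bytes each.
    size = estimated_row_bytes(3_600_000, 200)
    print(size, "over target" if size > ROW_SIZE_TARGET_BYTES else "within target")
```

The trade-off is that a read for one hour now has to fetch every bucket row and merge the results, so the bucket count should stay as small as the write load allows.
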
> 2). I want to create one more column family which is going to have the parsed
> log so that we will use this column family to query. my question is How to
> model this CF so that it will give answer of the above question? what would
> be the row key for this CF?

Something like:

row_key: YYYYMMDD
column: <host:timestamp:>

Note, I've not considered how to handle duplicate time stamps from the same host.

> 3). Is the above data model makes sense?

Sort of. Do some googling for cassandra and log data, look at https://github.com/thobbs/logsandra

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/03/2013, at 4:16 AM, moshe.kr...@barclays.com wrote:

> Row key based on hour will create hot spots for write – for an entire hour,
> all the writes will be going to the same node, i.e., the node where the row
> resides. You need to come up with a row key that distributes writes evenly
> across all your C* nodes, e.g., time concatenated with a sequence counter.
>
> From: Mohan L [mailto:l.mohan...@gmail.com]
> Sent: Thursday, March 07, 2013 2:10 PM
> To: user@cassandra.apache.org
> Subject: data model to store large volume syslog
>
>
> Dear All,
>
> I am looking Cassandra to store time series data(mostly syslog). The volume
> of data is very huge and more entries happening at the same timestamps. each
> record contain the following fields.
>
> timestamps:host-name:facility:message
>
> The below are the things needs to be monitored:
>
>
> 1). Need to get data between time X and Y
> 2). Need to get data between time X and Y for a host-name.
> 3). Need to search a 'pattern' in the message
>
> the data model design which I am thinking is
>
> 1). create a column family 'cfrawlog' which stores raw log as received. row
> key could be 'yyyyddmmhh'(new row is added for each hour or less), each
> 'column name' is uuid with 'value' is raw log data.
> Since we are also going to use this log for forensics purpose, so it will
> help us to have all raw log with in the column family without missing.
>
> 2). I want to create one more column family which is going to have the parsed
> log so that we will use this column family to query. my question is How to
> model this CF so that it will give answer of the above question? what would
> be the row key for this CF?
>
> 3). Is the above data model makes sense?
>
> Any help and suggestion would be greatly appreciated.
>
>
> Thanks
> Mohan L