> 1). Create a column family 'cfrawlog' which stores the raw log as received. The
> row key could be 'yyyymmddhh' (a new row is added for each hour or less); each
> column name is a UUID and its value is the raw log data. Since we are also going
> to use this log for forensic purposes, it will help us to have all raw logs
> within the column family without anything missing.
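A minimal in-memory sketch of the 'cfrawlog' layout from point 1. It assumes the row key is meant to be in year-month-day-hour order (`%Y%m%d%H`) and uses Python's `uuid.uuid1()` (time-based UUIDs, comparable in spirit to Cassandra's TimeUUID type) for column names. The `cfrawlog` dict and the `raw_row_key` / `store_raw` helpers are hypothetical stand-ins, not a Cassandra client API.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the 'cfrawlog' column family:
# row key -> {column name (UUID): raw log line}.
cfrawlog: dict[str, dict[uuid.UUID, str]] = {}


def raw_row_key(received: datetime) -> str:
    # Hourly bucket in (assumed) year-month-day-hour order, e.g. "2013030714".
    return received.strftime("%Y%m%d%H")


def store_raw(received: datetime, raw_line: str) -> uuid.UUID:
    # uuid1() is time-based and unique per call, so lines arriving in the
    # same hour, even identical ones, never overwrite each other.
    column_name = uuid.uuid1()
    cfrawlog.setdefault(raw_row_key(received), {})[column_name] = raw_line
    return column_name


received = datetime(2013, 3, 7, 14, 10, tzinfo=timezone.utc)
# Two identical made-up syslog lines are both kept.
store_raw(received, "web1 sshd: example raw line")
store_raw(received, "web1 sshd: example raw line")
print(raw_row_key(received), len(cfrawlog[raw_row_key(received)]))  # 2013030714 2
```

Because every line gets its own UUID column, nothing is lost, which suits the forensic requirement; the row key is still hourly, though, so the hot-spot and row-size concerns that follow still apply.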
As Moshe said, there is a chance of hot spotting if you send all writes
to a single row.
You also need to consider how big the row will get; in general, stay below about
30MB. You can go higher, but there are some implications.
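Both concerns (hot spots and row size) can be checked with a little arithmetic. This sketch estimates the size of one hourly row and splits each hour across enough rows to stay under the ~30MB guideline, using Moshe's suggestion of a time bucket concatenated with a counter. The event rate, average event size, the 30 MiB budget and the `hour:shard` key format are illustrative assumptions, and per-column storage overhead is ignored.

```python
import math

# "Stay below about 30MB" per row; 30 MiB is used here as the budget.
MAX_ROW_BYTES = 30 * 1024 * 1024


def bytes_per_hour(events_per_sec: int, avg_event_bytes: int) -> int:
    # Approximate payload of one hourly row (ignores per-column overhead).
    return events_per_sec * 3600 * avg_event_bytes


def shards_needed(events_per_sec: int, avg_event_bytes: int) -> int:
    # Number of rows to split each hour across so no row exceeds the budget.
    total = bytes_per_hour(events_per_sec, avg_event_bytes)
    return max(1, math.ceil(total / MAX_ROW_BYTES))


def sharded_row_key(hour_key: str, seq: int, shards: int) -> str:
    # Time bucket concatenated with a counter: consecutive writes in the
    # same hour cycle through `shards` different row keys.
    return f"{hour_key}:{seq % shards}"


shards = shards_needed(1000, 250)         # hypothetical load: 1,000 events/s, 250 bytes each
print(bytes_per_hour(1000, 250), shards)  # 900000000 29
print([sharded_row_key("2013030714", n, shards) for n in range(3)])
# ['2013030714:0', '2013030714:1', '2013030714:2']
```

Reading an hour back then means querying every shard key for that hour. Sizing shards by row size alone does not guarantee the keys land on different nodes; the shard count also needs to be large relative to the cluster size for writes to spread evenly.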


> 2). I want to create one more column family which is going to hold the parsed
> log, and we will use this column family for queries. My question is: how should
> this CF be modeled so that it can answer the questions above? What would
> be the row key for this CF?
Something like:

row_key: YYYYMMDD
column: <host:timestamp:>
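A rough in-memory model of this layout, assuming a `(host, timestamp)` tuple stands in for the composite column name and that a row's columns are read back in sorted order. `cf_parsed`, `insert`, `day_keys`, `slice_host` and `slice_all` are hypothetical names. It illustrates why host-first ordering suits "data for a host between X and Y" better than "all data between X and Y".

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hypothetical stand-in for the parsed-log CF: row key "YYYYMMDD" ->
# {(host, timestamp): message}. A dict means a second entry with the same
# (host, timestamp) name overwrites the first.
cf_parsed: dict[str, dict[tuple[str, datetime], str]] = {}


def insert(host: str, ts: datetime, message: str) -> None:
    cf_parsed.setdefault(ts.strftime("%Y%m%d"), {})[(host, ts)] = message


def day_keys(start: datetime, end: datetime) -> list[str]:
    # Every daily row key touched by the interval [start, end].
    keys, day = [], start.date()
    while day <= end.date():
        keys.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return keys


def slice_host(host: str, start: datetime, end: datetime) -> list[str]:
    # Host is the first name component, so one host's entries for a day are
    # contiguous in sort order; the range filter emulates a single column
    # slice from (host, start) to (host, end) in each day row.
    out = []
    for key in day_keys(start, end):
        row = cf_parsed.get(key, {})
        out += [row[n] for n in sorted(row) if (host, start) <= n <= (host, end)]
    return out


def slice_all(start: datetime, end: datetime) -> list[str]:
    # The timestamp is the second component, so every column of each day
    # row has to be read and filtered on the timestamp.
    out = []
    for key in day_keys(start, end):
        row = cf_parsed.get(key, {})
        out += [row[n] for n in sorted(row) if start <= n[1] <= end]
    return out


t0, t1 = datetime(2013, 3, 7, 23, 59), datetime(2013, 3, 8, 0, 1)
insert("web1", t0, "a")
insert("web2", t0, "b")
insert("web1", t1, "c")
print(slice_host("web1", t0, t1))  # ['a', 'c']
print(slice_all(t0, t1))           # ['a', 'b', 'c']
```

Within each day row, `slice_all` returns entries grouped by host rather than in global time order, which is the trade-off of putting host first in the column name.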

Note: I've not considered how to handle duplicate timestamps from the same host.
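One possible way to deal with duplicate timestamps (an assumption, not something proposed in this thread) is to add a third, unique component to the column name. This sketch uses a time-based UUID from `uuid.uuid1()` for that component; `parsed_row` and `insert_unique` are hypothetical names.

```python
from __future__ import annotations

import uuid
from datetime import datetime

# Hypothetical parsed-log row whose column names carry a third, unique part.
parsed_row: dict[tuple[str, datetime, uuid.UUID], str] = {}


def insert_unique(host: str, ts: datetime, message: str) -> None:
    # uuid1() differs on every call, so two messages from the same host with
    # the same timestamp get distinct column names instead of overwriting.
    parsed_row[(host, ts, uuid.uuid1())] = message


t = datetime(2013, 3, 7, 14, 0, 0)
insert_unique("web1", t, "first")
insert_unique("web1", t, "second")
print(len(parsed_row))  # 2
```

Duplicates now survive as separate columns. In this sketch their relative order is decided by the UUID value, so it should not be read as arrival order.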

> 3). Does the above data model make sense?
Sort of.
Do some googling for Cassandra and log data, and look at
https://github.com/thobbs/logsandra


Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/03/2013, at 4:16 AM, moshe.kr...@barclays.com wrote:

> A row key based on the hour will create write hot spots: for an entire hour,
> all the writes will go to the same node, i.e., the node where the row
> resides. You need to come up with a row key that distributes writes evenly
> across all your C* nodes, e.g., time concatenated with a sequence counter.
>  
> From: Mohan L [mailto:l.mohan...@gmail.com] 
> Sent: Thursday, March 07, 2013 2:10 PM
> To: user@cassandra.apache.org
> Subject: data model to store large volume syslog
>  
> 
> Dear All,
> 
> I am looking at Cassandra to store time-series data (mostly syslog). The volume
> of data is very large, and many entries occur at the same timestamp. Each
> record contains the following fields:
>   
> timestamp:host-name:facility:message
> 
> The following things need to be monitored:
> 
> 
> 1). Need to get data between time X and Y
> 2). Need to get data between time X and Y for a host-name.
> 3). Need to search a 'pattern' in the message
> 
> The data model design I am thinking of is:
> 
> 1). Create a column family 'cfrawlog' which stores the raw log as received. The
> row key could be 'yyyymmddhh' (a new row is added for each hour or less); each
> column name is a UUID and its value is the raw log data. Since we are also going
> to use this log for forensic purposes, it will help us to have all raw logs
> within the column family without anything missing.
> 
> 2). I want to create one more column family which is going to hold the parsed
> log, and we will use this column family for queries. My question is: how should
> this CF be modeled so that it can answer the questions above? What would
> be the row key for this CF?
> 
> 3). Does the above data model make sense?
> 
> Any help and suggestions would be greatly appreciated.
> 
> 
> Thanks
> Mohan L