Re: Cassandra data model for financial data

2010-05-22 Thread Jonathan Ellis
On Sat, May 22, 2010 at 5:59 AM, Steve Lihn wrote:
> This is an indexing question. If I have a structure like
>
> RowKey => { Col => val }
>
> is Col indexed (assuming I will have a lot of columns)?

yes

> On the other hand, if I have a structure like
>
> RowKey => CF => { col => val }

you mean
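Within a row, Cassandra keeps columns sorted by name according to the column family's comparator, which is why a slice over a range of column names does not have to scan the whole row. A minimal in-memory sketch of that idea, assuming string column names compared lexically; the `SortedRow` class and its methods are hypothetical and this is not Cassandra's actual storage format:

```python
import bisect

class SortedRow:
    """Hypothetical model of one row: column names kept in sorted order."""

    def __init__(self):
        self._names = []   # column names, always sorted
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep names sorted on insert so range lookups can use binary search.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        # Columns with start <= name <= finish, located by binary search
        # instead of by scanning every column in the row.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

row = SortedRow()
row.insert("2010-04-14", 245)  # inserted out of order on purpose
row.insert("2010-04-13", 242)

print(row.slice("2010-04-14", "2010-04-30"))  # → [('2010-04-14', 245)]
```

ISO-formatted dates sort lexically in chronological order, so a name-range slice doubles as a date-range query here.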

Re: Cassandra data model for financial data

2010-05-22 Thread Steve Lihn
This is an indexing question. If I have a structure like

RowKey => { Col => val }

is Col indexed (assuming I will have a lot of columns)?

On the other hand, if I have a structure like

RowKey => CF => { col => val }

which components are indexed in addition to RowKey?

Thanks,
Steve

Re: Cassandra data model for financial data

2010-05-13 Thread Steve Lihn
For what I have to handle, yes, there are a lot of attributes (daily) in addition to the daily prices (OHLC). At the securities level: SharesOutstanding, TradedVolume, ShortInterest. At the company level, even more: MarketCap, DilutedSharesOutstanding, P/E, P/B, DividendYield, etc. Seems like ea

Re: Cassandra data model for financial data

2010-05-13 Thread Miguel Verde
I agree that it's more normal in a columnar store than in an RDBMS, but in my experience modelling similar data, the vast majority of the time I want all of {high, low, close, volume} and optimizing for that would be my goal. It does seem like Steve has more expansive attributes to track (e.g. shar
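If the usual read is "all of a day's fields at once", one way to serve it from a single column family is to pack a day's fields into one column value. This is a sketch of that option, not a layout proposed in the thread: the `<3dq` field layout, `pack_bar`, and `unpack_bar` are hypothetical, and the high/low values are placeholders (close and volume come from the thread's AAPL example).

```python
import struct

# Hypothetical fixed layout for one day's bar: high, low, close as
# little-endian doubles, then volume as a signed 64-bit integer (32 bytes).
BAR_FORMAT = "<3dq"

def pack_bar(high, low, close, volume):
    """Serialize one day's fields into a single column value (bytes)."""
    return struct.pack(BAR_FORMAT, high, low, close, volume)

def unpack_bar(blob):
    """Inverse of pack_bar: return the four fields as a dict."""
    high, low, close, volume = struct.unpack(BAR_FORMAT, blob)
    return {"high": high, "low": low, "close": close, "volume": volume}

# One row per symbol; column name = date, column value = packed bar.
# high/low are illustrative placeholders, not real quotes.
row = {"2010-04-13": pack_bar(243.0, 240.0, 242.0, 10_900_000)}

bar = unpack_bar(row["2010-04-13"])
print(bar["close"], bar["volume"])  # → 242.0 10900000
```

The trade-off: one column read returns the whole day, but adding an attribute (e.g. ShortInterest) means changing the packed layout, whereas a column per attribute can be added freely.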

Re: Cassandra data model for financial data

2010-05-13 Thread Benjamin Black
On Thu, May 13, 2010 at 12:45 PM, Miguel Verde wrote:
> I also think that's not a good design, but only because the typical query
> would have to hit several column families instead of just one.

This is completely normal in a columnar store. You query at least one index CF, then use the respon
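The two-step read pattern (query an index CF, then use its result to read the data CFs) can be sketched with plain dicts standing in for column families. The CF names, the `fetch` helper, and the choice to store one index column per AAPL+year row key are assumptions for illustration; a real client would issue these reads through the Cassandra API, not dict lookups.

```python
# Hypothetical column families as dicts: row key -> {column name: value}.
# The index CF maps a symbol to the data row keys that exist for it.
symbol_index = {"AAPL": {"AAPL+2010": b""}}
closing_price = {"AAPL+2010": {"04-13": 242, "04-14": 245}}
volume = {"AAPL+2010": {"04-13": 10_900_000, "04-14": 14_400_000}}

def fetch(symbol, data_cfs):
    """Step 1: read row keys from the index CF. Step 2: read each data CF."""
    row_keys = sorted(symbol_index.get(symbol, {}))
    result = {}
    for cf_name, cf in data_cfs.items():
        # One read per data CF, covering every row key found in step 1.
        result[cf_name] = {key: cf.get(key, {}) for key in row_keys}
    return result

data = fetch("AAPL", {"closingPrice": closing_price, "volume": volume})
print(data["volume"]["AAPL+2010"]["04-14"])  # → 14400000
```

The index lookup is what lets a client find every year's row for AAPL without knowing the years up front.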

Re: Cassandra data model for financial data

2010-05-13 Thread Miguel Verde
I also think that's not a good design, but only because the typical query would have to hit several column families instead of just one. To answer your question, use a KeyRange (http://wiki.apache.org/cassandra/API#KeyRange) that includes AAPL across all the years you might want in your http://wiki.apache.org/c
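A key-range scan returns every AAPL+year row without the client listing the years, but only if rows are ordered by key. In Cassandra of this era that meant an order-preserving partitioner; under the random partitioner rows are ordered by token, not key. The sketch below simulates the scan over a sorted list of keys; `key_range` is a hypothetical helper, not the Thrift API, and the extra symbols are illustrative.

```python
import bisect

# Row keys as an order-preserving partitioner would keep them: sorted.
row_keys = sorted(["AAPL+2008", "AAPL+2009", "AAPL+2010", "AMZN+2010", "AA+2010"])

def key_range(keys, start_key, end_key):
    """Return keys with start_key <= key <= end_key from a sorted list."""
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_right(keys, end_key)
    return keys[lo:hi]

# "AAPL+" sorts before every "AAPL+<year>" key, and "AAPL+~" sorts after
# them, because '~' (0x7E) is greater than any ASCII digit.
print(key_range(row_keys, "AAPL+", "AAPL+~"))
# → ['AAPL+2008', 'AAPL+2009', 'AAPL+2010']
```

Note that "AA+2010" is excluded even though it shares the "AA" prefix, because the separator '+' is part of the range bounds.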

Re: Cassandra data model for financial data

2010-05-13 Thread Steve Lihn
I am not sure this is a good design in Cassandra. What if I just want to get all the data points for AAPL? Since AAPL alone is not a key, how does Cassandra get the data if I don't provide the years?

On Thu, Apr 29, 2010 at 1:09 AM, Schubert Zhang wrote:
> key : stock ID, e.g. AAPL+year
> column fam

Re: Cassandra data model for financial data

2010-04-30 Thread Rob Coli
On 4/30/10 6:36 AM, Jonathan Ellis wrote:
> each row has a [column] index and bloom filter of column names, and then there is the overhead of the Java objects.

In addition to the aforementioned row column index, there's also the row key index, which is an int and a key-length-(string now/byte[]

Re: Cassandra data model for financial data

2010-04-30 Thread Jonathan Ellis
each row has an index and bloom filter of column names, and then there is the overhead of the Java objects.

On Thu, Apr 29, 2010 at 11:05 PM, Andrew Nguyen wrote:
> When making rough calculations regarding the potential size of a single row,
> what sort of overhead is there to consider? In other
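For a rough size calculation, the bloom-filter part can be estimated with the standard sizing rule for a filter holding n items at false-positive rate p: m = -n ln p / (ln 2)^2 bits, with k = (m/n) ln 2 hash functions. Whether Cassandra sizes its per-row column-name filter exactly this way is an assumption, not something stated in the thread; `bloom_filter_size` is a hypothetical helper.

```python
import math

def bloom_filter_size(n_items, fp_rate):
    """Textbook-optimal bloom filter size (bits) and hash count.

    Assumed, not confirmed, to approximate the per-row column-name
    filter discussed in the thread.
    """
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes

# e.g. a row with 2,520 columns (about ten years of ~252 trading days)
bits, hashes = bloom_filter_size(2520, 0.01)
print(bits, hashes, bits // 8)  # bits, hash functions, approximate bytes
```

At a 1% false-positive rate this works out to roughly 9.6 bits (about 1.2 bytes) per column name, small next to the names and values themselves.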

Re: Cassandra data model for financial data

2010-04-29 Thread Andrew Nguyen
When making rough calculations regarding the potential size of a single row, what sort of overhead is there to consider? In other words, for a particular column, what else is there to consider in terms of memory consumption besides the value itself?

On Apr 29, 2010, at 8:49 AM, Mark Jones wrot

RE: Cassandra data model for financial data

2010-04-29 Thread Mark Jones
At the moment, all the Columns OR SuperColumns for one key have to fit in memory during compaction.

From: Andrew Nguyen [mailto:andrew-lists-cassan...@ucsfcti.org]
Sent: Thursday, April 29, 2010 10:30 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra data model for financial data

What
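Since a whole row has to fit in memory during compaction, it helps to estimate the largest row a layout produces. The arithmetic sketch below compares one row per symbol (every date and attribute in one row, as in Design 1) with one row per symbol+year (as in AAPL+2010). The 20 years, 30 attributes, and all byte sizes, including the per-column overhead, are hypothetical inputs rather than Cassandra's real figures.

```python
TRADING_DAYS_PER_YEAR = 252  # approximate count for US equity markets

def row_bytes(n_columns, name_bytes, value_bytes, per_column_overhead):
    """Rough row size; per_column_overhead is an assumed input."""
    return n_columns * (name_bytes + value_bytes + per_column_overhead)

def largest_row_bytes(years, attributes, bucket_by_year, **sizes):
    """Size of the biggest row for one symbol.

    bucket_by_year=False: one row holds every date x attribute.
    bucket_by_year=True:  one row per symbol+year, so at most one year.
    """
    days = TRADING_DAYS_PER_YEAR * (1 if bucket_by_year else years)
    return row_bytes(days * attributes, **sizes)

sizes = dict(name_bytes=8, value_bytes=8, per_column_overhead=16)  # hypothetical
print(largest_row_bytes(20, 30, False, **sizes))  # → 4838400
print(largest_row_bytes(20, 30, True, **sizes))   # → 241920
```

Bucketing by year caps row size regardless of how much history accumulates, at the cost of needing a key range or an index to find all of a symbol's rows.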

Re: Cassandra data model for financial data

2010-04-29 Thread Andrew Nguyen
What is the upper limit on the number of super columns? Is it pretty much the same as for columns in general?

On Apr 28, 2010, at 10:09 PM, Schubert Zhang wrote:
> key : stock ID, e.g. AAPL+year
> column family: closing price and volume, two CFs.
> column name: timestamp LongType
>
> AAPL+201

Re: Cassandra data model for financial data

2010-04-28 Thread Schubert Zhang
key : stock ID, e.g. AAPL+year
column family: closing price and volume, two CFs.
column name: timestamp LongType

AAPL+2010 -> CF:closingPrice -> {'04-13' : 242, '04-14': 245}
AAPL+2010 -> CF:volume -> {'04-13' : 242, '04-14': 245}

On Thu, Apr 22, 2010 at 2:00 AM, Miguel Verde wrote:
> On Wed, Ap
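The symbol+year layout can be sketched with dicts standing in for the two column families. Encoding the LongType column name as milliseconds since the epoch at midnight UTC is an assumption (the thread only says "timestamp LongType"), and `row_key`, `column_name`, and `write_day` are hypothetical helpers; the volumes are the 10.9m and 14.4m figures from Design 1.

```python
from datetime import date

# Column families as dicts: CF name -> row key -> {column name: value}.
store = {"closingPrice": {}, "volume": {}}

MS_PER_DAY = 86_400_000

def row_key(symbol, day):
    """Row key in the AAPL+year style."""
    return f"{symbol}+{day.year}"

def column_name(day):
    """LongType-style name: ms since the epoch at midnight UTC (assumed)."""
    return (day - date(1970, 1, 1)).days * MS_PER_DAY

def write_day(symbol, day, close, vol):
    """Write one day's values into both CFs under the same row key."""
    key, col = row_key(symbol, day), column_name(day)
    store["closingPrice"].setdefault(key, {})[col] = close
    store["volume"].setdefault(key, {})[col] = vol

write_day("AAPL", date(2010, 4, 13), 242, 10_900_000)
write_day("AAPL", date(2010, 4, 14), 245, 14_400_000)

print(sorted(store["closingPrice"]["AAPL+2010"].items()))
```

Because LongType names sort numerically, and numeric order here matches date order, a column slice on one year's row selects a date range directly.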

Re: Cassandra data model for financial data

2010-04-21 Thread Miguel Verde
On Wed, Apr 21, 2010 at 12:17 PM, Steve Lihn wrote:
> [...]
> Design 1: Each attribute is a super column. Therefore each date is a
> column. So we have:
>
> AAPL -> closingPrice -> { '2010-04-13' : 242, '2010-04-14': 245 }
> AAPL -> volume -> { '2010-04-13' : 10.9m, '2010-04-14': 14.4m }
> etc
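Design 1 maps naturally onto nested dicts: row key, then super column (attribute), then column (date). The sketch below uses the values quoted above, with 10.9m and 14.4m written out as integers; `attribute_series` is a hypothetical helper. It only shows the shape of the data. The objection raised later in the thread is that a super column is not meant to hold a large number of subcolumns, which is exactly what a long date history per attribute would create.

```python
# Design 1 as nested dicts:
# row key -> super column (attribute) -> column (date) -> value.
design1 = {
    "AAPL": {
        "closingPrice": {"2010-04-13": 242, "2010-04-14": 245},
        "volume": {"2010-04-13": 10_900_000, "2010-04-14": 14_400_000},
    }
}

def attribute_series(row, attribute, start, finish):
    """(date, value) pairs in [start, finish] for one super column, date-sorted."""
    sub = row.get(attribute, {})
    return [(d, sub[d]) for d in sorted(sub) if start <= d <= finish]

print(attribute_series(design1["AAPL"], "volume", "2010-04-14", "2010-04-30"))
# → [('2010-04-14', 14400000)]
```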

Re: Cassandra data model for financial data

2010-04-21 Thread JKnight JKnight
I know Cassandra is very flexible.

a. Because a super column cannot contain a large number of columns, you should not use Design 1.
b. Then each query may have to be split across the separate ColumnFamilies.

On Wed, Apr 21, 2010 at 1:17 PM, Steve Lihn wrote:
> Hi,
> I am new to Cassandra. I would like to