key: stock ID + year, e.g. AAPL+2010
column families: closing price and volume (two CFs)
column name: timestamp (LongType)

AAPL+2010 -> CF:closingPrice -> {'04-13': 242, '04-14': 245}
AAPL+2010 -> CF:volume -> {'04-13': 10.9m, '04-14': 14.4m}

On Thu, Apr 22, 2010 at 2:00 AM, Miguel Verde <miguelitov...@gmail.com> wrote:

> On Wed, Apr 21, 2010 at 12:17 PM, Steve Lihn <stevel...@gmail.com> wrote:
>
>> [...]
>>
>> Design 1: Each attribute is a super column. Therefore each date is a
>> column. So we have:
>>
>> AAPL -> closingPrice -> { '2010-04-13' : 242, '2010-04-14': 245 }
>> AAPL -> volume -> { '2010-04-13' : 10.9m, '2010-04-14': 14.4m }
>> etc.
>>
> I would suggest not using this design, as each query involving an attribute
> will pull all dates for that attribute into memory on the server. i.e.
> getting the closingPrice for AAPL on '2010-04-13' would pull all closing
> prices for AAPL across all dates into memory.
>
>> Design 2: Each date is a super column. Therefore each attribute is a
>> column. So we have:
>>
>> AAPL -> '2010-04-13' -> { closingPrice -> 242, volume -> 10.9m }
>> AAPL -> '2010-04-14' -> { closingPrice -> 245, volume -> 14.4m }
>> etc.
>>
>> The date column / superColumn will need Order Preserving Partitioner since
>> we are going to do a lot of range queries.
>
> Partitioners split up keys between nodes, the partitioner you use has no
> effect on your ability to query columns in a row.
>
>> Examples are:
>> Query 1: Give me the data between date1 and date2 for a set of tickers
>> (say, the 100 tickers in QQQ).
>>
> You could use http://wiki.apache.org/cassandra/API#multiget_slice for
> this.
>
>> Query 2: More often than not, the query is: Give me the data for the max
>> available dates (for each ticker) between date1 and date2 in a set of
>> tickers.
>> (Since not every day is traded, and we only want the most recent data,
>> given a range of dates.)
>>
> A http://wiki.apache.org/cassandra/API#SliceRange allows you to specify
> limits and ordering for columns you are slicing.
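
To make the ticker+year layout at the top of this message a bit more concrete, here is a rough Python sketch of how Query 2 (most recent available date per ticker within a range) might work against it. It is only an in-memory stand-in, not Cassandra client code: `store`, `slice_columns` and `latest_in_range` are hypothetical names invented for illustration, and the `reverse`/`count` arguments merely imitate the ordering and limit idea behind a SliceRange. It assumes column names are 'MM-DD' strings as in the example rows, so they sort chronologically within one year.

```python
import bisect
from datetime import date

# Hypothetical in-memory stand-in for the layout above:
# store[column_family][row_key] -> list of (column_name, value), sorted by name.
store = {
    "closingPrice": {"AAPL+2010": [("04-13", 242), ("04-14", 245)]},
    "volume": {"AAPL+2010": [("04-13", "10.9m"), ("04-14", "14.4m")]},
}

def slice_columns(cf, row_key, start, finish, reverse=False, count=100):
    """Return up to `count` columns with start <= name <= finish.

    Only the one requested row is touched; other rows (years) are never read.
    """
    columns = store.get(cf, {}).get(row_key, [])
    names = [name for name, _ in columns]
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    selected = columns[lo:hi]
    if reverse:
        selected = selected[::-1]  # newest first
    return selected[:count]

def latest_in_range(cf, ticker, date1, date2):
    """Most recent (date, value) for `ticker` between date1 and date2, or None."""
    # Walk the year rows from newest to oldest and stop at the first hit.
    for year in range(date2.year, date1.year - 1, -1):
        start = date1.strftime("%m-%d") if year == date1.year else "01-01"
        finish = date2.strftime("%m-%d") if year == date2.year else "12-31"
        hit = slice_columns(cf, f"{ticker}+{year}", start, finish,
                            reverse=True, count=1)
        if hit:
            name, value = hit[0]
            return f"{year}-{name}", value
    return None

if __name__ == "__main__":
    for ticker in ["AAPL"]:  # would be the full ticker set, e.g. the QQQ members
        print(ticker, latest_in_range("closingPrice", ticker,
                                      date(2010, 4, 1), date(2010, 4, 30)))
```

Splitting rows by year keeps each row bounded in size, at the cost that a range crossing a year boundary has to look at more than one row key, which is what the loop over years is doing here.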