Right, this is an index row per time interval (your previous email was not).
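For what it's worth, reading that layout might look something like this with pycassa (a sketch only; the keyspace name, host, and connection setup are my assumptions, not anything from this thread):

# Hypothetical sketch with pycassa: read one hour-bucket row of the inverted
# IPSearchLog layout quoted below, then fetch the matching SearchLog rows.
import pycassa

pool = pycassa.ConnectionPool('Logs', ['localhost:9160'])  # assumed keyspace/host
ip_index = pycassa.ColumnFamily(pool, 'IPSearchLog')   # super CF: hour -> ip -> TimeUUIDs
search_log = pycassa.ColumnFamily(pool, 'SearchLog')   # row key is the TimeUUID

# Row key is the hour bucket; slice the subcolumns under one IP's super column.
uuid_cols = ip_index.get('2010080711', super_column='127.0.0.1')

# The TimeUUID subcolumn names are the SearchLog row keys; fetch them in bulk.
logs = search_log.multiget([str(u) for u in uuid_cols])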
On Sat, Aug 7, 2010 at 11:43 AM, Mark <static.void....@gmail.com> wrote:
> On 8/7/10 11:30 AM, Mark wrote:
>> On 8/7/10 4:22 AM, Thomas Heller wrote:
>>>> Ok, I think the part I was missing was the concatenation of the key and
>>>> partition to do the lookups. Is this the preferred way of accomplishing
>>>> needs such as this? Are there alternative ways?
>>>
>>> Depending on your needs you can concat the row key or use super columns.
>>>
>>>> How would one then "query" over multiple days? Same question for all
>>>> days. Should I use range_slice or multiget_slice? And if it's
>>>> range_slice, does that mean I need OrderPreservingPartitioner?
>>>
>>> The last 3 days is pretty simple: ['2010-08-07', '2010-08-06',
>>> '2010-08-05'], as is 7, 31, etc. Just generate the keys in your app
>>> and use multiget_slice.
>>>
>>> If you want to get all days where a specific IP address had some
>>> requests, you'll just need another CF where the row key is the addr and
>>> column names are the days (values optional again). Pretty much the
>>> same all over again; just add another CF and insert the data you need.
>>>
>>> get_range_slice in my experience is better used for "offline" tasks
>>> where you really want to process every row there is.
>>>
>>> /thomas
>>
>> Ok... as an example, looking up logs by IP for a certain
>> timeframe/range, would this work?
>>
>> <ColumnFamily Name="SearchLog"/>
>>
>> <ColumnFamily Name="IPSearchLog"
>>               ColumnType="Super"
>>               CompareWith="UTF8Type"
>>               CompareSubcolumnsWith="TimeUUIDType"/>
>>
>> Resulting in a structure like:
>>
>> {
>>   "127.0.0.1" : {
>>     "2010080711" : {
>>       uuid1 : "",
>>       uuid2 : "",
>>       uuid3 : ""
>>     },
>>     "2010080712" : {
>>       uuid1 : "",
>>       uuid2 : "",
>>       uuid3 : ""
>>     }
>>   },
>>   "some.other.ip" : {
>>     "2010080711" : {
>>       uuid1 : ""
>>     }
>>   }
>> }
>>
>> where each uuid is the key used for SearchLog. Is there anything wrong
>> with this? I know there is a 2 billion column limit, but in this case
>> that would never be exceeded because each column represents an hour.
>> However, does the above "schema" imply that for any given IP there can
>> only be a maximum of 2GB of data stored?
>
> Or should I invert the IP with the time slices? The limitation of this
> seems to be that there can only be 2 billion unique IPs per hour, which
> is more than enough for our application :)
>
> {
>   "2010080711" : {
>     "127.0.0.1" : {
>       uuid1 : "",
>       uuid2 : "",
>       uuid3 : ""
>     },
>     "some.other.ip" : {
>       uuid1 : "",
>       uuid2 : "",
>       uuid3 : ""
>     }
>   },
>   "2010080712" : {
>     "127.0.0.1" : {
>       uuid1 : ""
>     }
>   }
> }
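For completeness, the write path and a multi-hour query against the inverted layout above could look like this (again just a sketch with assumed names, following Thomas's advice to generate the keys in your app and use multiget_slice):

# Hypothetical sketch: write one search event into both CFs, then collect a
# single IP's SearchLog keys across several hour buckets with one multiget.
import uuid
from datetime import datetime, timedelta

import pycassa

pool = pycassa.ConnectionPool('Logs', ['localhost:9160'])  # assumed keyspace/host
search_log = pycassa.ColumnFamily(pool, 'SearchLog')
ip_index = pycassa.ColumnFamily(pool, 'IPSearchLog')

def log_search(ip, columns):
    """Insert the event, then index its TimeUUID under the current hour bucket."""
    event_id = uuid.uuid1()  # version-1 UUID, valid for TimeUUIDType
    search_log.insert(str(event_id), columns)
    hour_key = datetime.utcnow().strftime('%Y%m%d%H')  # e.g. '2010080711'
    # Subcolumn value stays empty; the TimeUUID column name is the pointer.
    ip_index.insert(hour_key, {ip: {event_id: ''}})
    return event_id

def uuids_for_ip(ip, start, hours):
    """Generate the hour-bucket keys in the app, fetch them in one round trip."""
    keys = [(start + timedelta(hours=h)).strftime('%Y%m%d%H') for h in range(hours)]
    rows = ip_index.multiget(keys, super_column=ip)
    return [u for subcols in rows.values() for u in subcols]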