Yes, but as I said, it may not be the optimal design. You may end up with a single very big row.

- you could use multiple rows, each holding a range of counts.

- you could use a standard CF and store the count in the row key, then use get_range_slices. With the RandomPartitioner you will need to sort the rows yourself; if you use the OrderPreservingPartitioner they will be sorted for you.
e.g.
{
  SearchLogs : {
    999 : { word1:word1 }
    998 : { word2:word2 }
  }
}
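A minimal sketch of the count-as-row-key option, in Python. It does not talk to Cassandra: plain dicts stand in for the SearchLogs CF, a shuffle stands in for the hash order that get_range_slices returns under the RandomPartitioner, and the helper names (row_key, build_rows, order_by_count, etc.) are hypothetical. The zero-padded key is an assumption of the sketch: row keys compare as strings, so without padding "1000" would sort before "999" under the OrderPreservingPartitioner.

```python
import random

# Hypothetical key width: zero-padding keeps lexical key order == numeric order.
KEY_WIDTH = 10


def row_key(count: int) -> str:
    """Encode a count as a zero-padded string row key."""
    return str(count).zfill(KEY_WIDTH)


def build_rows(word_counts: dict) -> dict:
    """One row per count value; each word is stored as a column {word: word}."""
    rows = {}
    for word, count in word_counts.items():
        rows.setdefault(row_key(count), {})[word] = word
    return rows


def simulate_random_partitioner_scan(rows: dict, seed: int = 0) -> list:
    """Stand-in for a range scan under RandomPartitioner: rows arrive in
    hash (token) order rather than key order, modelled here by a shuffle."""
    items = list(rows.items())
    random.Random(seed).shuffle(items)
    return items


def order_by_count(scanned: list, descending: bool = True) -> list:
    """Client-side sort: parse each row key back to an int and sort on it."""
    ordered = sorted(scanned, key=lambda kv: int(kv[0]), reverse=descending)
    return [(int(key), sorted(columns)) for key, columns in ordered]


if __name__ == "__main__":
    counts = {"cassandra": 100, "foo": 999, "bar": 1, "baz": 500, "fooz": 999}
    print(order_by_count(simulate_random_partitioner_scan(build_rows(counts))))
    # [(999, ['foo', 'fooz']), (500, ['baz']), (100, ['cassandra']), (1, ['bar'])]
```

Sorting on the client is cheap when the number of distinct counts is small; with the OrderPreservingPartitioner the padded keys would already come back in ascending order.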

get_range_slices over the RandomPartitioner has some performance issues when compared to the OrderPreservingPartitioner. But I think it returns the same data, just out of order. Try some experiments and see what happens.

Do you want to read back a portion of the index (e.g. words with 800 to 900 occurrences) or the entire index?
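For the multiple-rows option, and for reading back only part of the index (e.g. 800 to 900 occurrences), here is a rough sketch, again simulated in memory rather than against Cassandra. Each row holds a hypothetical bucket of 100 counts, column names are (count, word) pairs kept in sorted order, and a range read only touches the buckets that overlap the requested counts, then slices within each row. In Cassandra the (count, word) pair would have to be encoded into a single column name that sorts correctly under the chosen comparator; that encoding is an assumption left out of the sketch.

```python
from bisect import bisect_left

BUCKET_SIZE = 100  # hypothetical: bucket n holds counts n*100 .. n*100 + 99


def bucket_row_key(bucket: int) -> str:
    """Row key for a bucket number (hypothetical naming scheme)."""
    return f"bucket-{bucket:08d}"


def bucket_key(count: int) -> str:
    """Row key of the bucket that a given count falls into."""
    return bucket_row_key(count // BUCKET_SIZE)


def build_bucketed_rows(word_counts: dict) -> dict:
    """Group (count, word) column names into bucket rows, sorted per row."""
    rows = {}
    for word, count in word_counts.items():
        rows.setdefault(bucket_key(count), []).append((count, word))
    for columns in rows.values():
        columns.sort()  # stands in for the column comparator keeping names ordered
    return rows


def read_count_range(rows: dict, lo: int, hi: int) -> list:
    """Return (count, word) pairs with lo <= count <= hi, ascending by count."""
    result = []
    # Only rows whose buckets overlap [lo, hi] need to be read.
    for bucket in range(lo // BUCKET_SIZE, hi // BUCKET_SIZE + 1):
        columns = rows.get(bucket_row_key(bucket), [])
        # Equivalent of a column slice from (lo,) up to, but excluding, (hi + 1,).
        start = bisect_left(columns, (lo,))
        stop = bisect_left(columns, (hi + 1,))
        result.extend(columns[start:stop])
    return result
```

With the word counts from the original question, read_count_range(rows, 800, 900) returns an empty list (nothing falls in that range), while a range of 500 to 999 returns baz, foo and fooz in count order. Bigger buckets mean fewer rows per query but wider rows, which is the same trade-off as the single-row design.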
Aaron


On 30 Jul, 2010, at 10:04 AM, Mark <static.void....@gmail.com> wrote:

OK, so basically an "array" of words grouped by their count?

Something like this?

{
  SearchLogs : {
    ALL : {
      999 : { word1:word1, word2:word2, word3:word3 }
      998 : { word1:word1, word2:word2, word3:word3 }
    }
  }
}
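A sketch of that single ALL row, simulated in Python: a dict keyed by integer count stands in for super columns compared with LongType, and each inner dict stands in for the word subcolumns. Sorting the counts numerically (highest first) and capping how many are returned approximates a get_slice over the row with a reversed slice range and a count limit. The function names are hypothetical and nothing here calls a real Cassandra client.

```python
from collections import defaultdict


def build_all_row(word_counts: dict) -> dict:
    """Group words under their count, like the ALL row: {count: {word: word}}."""
    row = defaultdict(dict)
    for word, count in word_counts.items():
        row[count][word] = word
    return row


def slice_super_columns(row: dict, limit: int, reversed_order: bool = True) -> list:
    """Emulate reading super columns in LongType (numeric) order.

    reversed_order=True returns the highest counts first; limit caps the
    number of super columns returned, similar to a slice count.
    """
    counts = sorted(row, reverse=reversed_order)[:limit]
    return [(count, sorted(row[count])) for count in counts]


if __name__ == "__main__":
    counts = {"cassandra": 100, "foo": 999, "bar": 1, "baz": 500, "fooz": 999}
    print(slice_super_columns(build_all_row(counts), limit=2))
    # [(999, ['foo', 'fooz']), (500, ['baz'])]
```

Because LongType compares numerically, a count of 1000 would sort after 999 here, which is not the case for plain string row keys.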

On 7/29/10 2:50 PM, Aaron Morton wrote:
> One method would be to use a Super Column Family. Have one row, in
> that row create a super column for each count value you have, and then in
> each super column create a column for each word.
>
> Set the CompareWith for the super columns to be LongType and the
> CompareSubcolumnsWith to be AsciiType or UTF8Type.
>
> You could then use get_slice to read super columns in that row.
>
> This may not be the most efficient model; it will depend on how much
> data you have and what your read patterns are like. Also remember
> that before 0.7 you cannot atomically increment counters in Cassandra.
>
> Have a play and see what works for you.
>
> Aaron
>
> On 29 Jul, 2010, at 02:36 PM, Mark <static.void....@gmail.com> wrote:
>
>> I know there is no native support for "order by", "group by", etc., but I
>> was wondering how it could be accomplished with some custom indexes.
>>
>> For example, say I have a list of word counts like this (note that two
>> words have the same count):
>>
>> "cassandra" => 100
>> "foo" => 999
>> "bar" => 1
>> "baz" => 500
>> "fooz" => 999
>>
>> How can I store and then retrieve these words ordered by their count values?
>>
>> Thanks.
