How would you implement range queries?


On 29 May 2013 17:49, Hiller, Dean <dean.hil...@nrel.gov> wrote:

> We recently ran into too much data in one CF because LCS can't really run
> in parallel on one CF in a single tier, which got me thinking: why doesn't
> the CF directory have 100 or 1000 directories (0-999), with Cassandra
> hashing the key to pick a directory and then putting it in one of the
> sstables in that directory?  This would lead to
>
>  1.  Parallel compaction of LCS in a single CF !!!!  Yeah, faster
> compactions since there is less to sort in each directory (and it can be
> done in parallel too)
>  2.  Help with fast key lookups, as a key hashes to one of the 1000
> directories very quickly and then just needs to be found in one of the
> sstables, which are sorted (there would be 1000x fewer sstables in each
> directory than in one big CF)
>
> Am I on crack here? Or does that seem like it would be a pretty good
> direction to go?
>
> Maybe this is only because our system has 98% of its data in one CF while
> other systems have 10% of their data in each CF though.  I still tend to
> think a lot of people will end up with 80% of their data in one CF and 20%
> in all the other CFs… isn't the Pareto principle a natural tendency, and if
> it is, maybe the above feature should be considered?
>
> Later,
> Dean
>
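
To make the range-query question concrete, here is a minimal toy sketch of
the scheme quoted above. It is not Cassandra's storage code: the names
BucketedStore, bucket_of and NUM_BUCKETS are hypothetical, each "directory"
is modelled as a single sorted in-memory list, and MD5 is just an assumed
stand-in for whatever hash would pick the directory. Under those
assumptions, a point lookup touches one bucket, but a range query in key
order has to visit every bucket and merge the results.

```python
import bisect
import hashlib
import heapq

NUM_BUCKETS = 8  # hypothetical; the quoted mail suggests 100 or 1000


def bucket_of(key: str) -> int:
    """Pick a bucket ("directory") by hashing the key (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


class BucketedStore:
    """Toy model: each bucket is one sorted list standing in for sstables."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def put(self, key: str) -> None:
        bucket = self.buckets[bucket_of(key)]
        i = bisect.bisect_left(bucket, key)
        if i == len(bucket) or bucket[i] != key:
            bucket.insert(i, key)  # keep the bucket sorted, no duplicates

    def contains(self, key: str) -> bool:
        # Point lookup: only the one bucket the key hashes to is searched.
        bucket = self.buckets[bucket_of(key)]
        i = bisect.bisect_left(bucket, key)
        return i < len(bucket) and bucket[i] == key

    def range_query(self, lo: str, hi: str):
        """Return (keys with lo <= key < hi in sorted order, buckets visited)."""
        slices = []
        for bucket in self.buckets:
            # Keys in [lo, hi) can hash anywhere, so every bucket is consulted.
            start = bisect.bisect_left(bucket, lo)
            end = bisect.bisect_left(bucket, hi)
            slices.append(bucket[start:end])
        # k-way merge of the per-bucket sorted slices.
        return list(heapq.merge(*slices)), len(self.buckets)


store = BucketedStore()
for n in range(100):
    store.put(f"key{n:03d}")

keys, visited = store.range_query("key010", "key020")
```

In this model the per-bucket work is small, but the number of buckets
visited by a range query grows with the number of directories, which is the
cost the question is pointing at.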



-- 
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538
