How would you implement range queries?
On 29 May 2013 17:49, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> We recently ran into too much data in one CF because LCS can't really run
> in parallel on one CF in a single tier, which got me thinking: why doesn't
> the CF directory have 100 or 1000 directories (0-999), with Cassandra hashing
> the key to decide which directory it goes in and then putting it in one of
> the sstables in that directory? This would lead to:
>
> 1. Parallel compaction of LCS in a single CF!!!! Yeah, faster
>    compactions, since there is less to sort in each directory (and it can
>    be done in parallel too).
> 2. Faster key lookups, as the key hashes to one of the 1000 directories
>    very quickly and then just needs to be found in one of that directory's
>    sorted sstables (there would be 1000x fewer sstables in each directory
>    than in one big CF).
>
> Am I on crack here? Or does that seem like it would be a pretty good
> direction to go?
>
> Maybe this is only because our system has 98% of its data in one CF while
> other systems have 10% of their data in each CF, though. I still tend to
> think a lot of people will end up with 80% of their data in one CF and 20%
> in all the other CFs… isn't Pareto's principle a natural tendency, and if it
> is, maybe the above feature should be considered?
>
> Later,
> Dean

--
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538
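A minimal Python sketch of the quoted proposal and of one reading of the range-query question, assuming a toy in-memory model. `BucketedStore`, `bucket_for`, `NUM_BUCKETS` and the "sstable as a sorted list of (key, value) pairs" representation are hypothetical illustrations, not Cassandra code. The sketch shows the two benefits listed in the quoted message: each bucket can be compacted independently, and a point lookup touches a single bucket. It also shows the cost behind the reply. Hashing scatters adjacent keys across buckets, so a range scan has to visit every bucket and merge the results.

```python
import bisect
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 1000  # the "0-999" directories from the proposal (hypothetical constant)


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a row key to a bucket (directory) with a stable hash.

    Python's built-in hash() is randomized per process for str, so a
    digest is used instead to keep placement deterministic.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


class BucketedStore:
    """Toy model (hypothetical): each bucket holds a list of sstables, oldest
    first; each sstable is a list of (key, value) pairs sorted by key."""

    def __init__(self, num_buckets: int = NUM_BUCKETS):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def flush(self, rows: dict) -> None:
        # Split a memtable into one sorted sstable per bucket it touches.
        per_bucket = {}
        for key, value in rows.items():
            per_bucket.setdefault(bucket_for(key, self.num_buckets), []).append((key, value))
        for b, items in per_bucket.items():
            self.buckets[b].append(sorted(items))

    def get(self, key: str):
        # Point lookup: the hash selects exactly one bucket; search its
        # sstables newest first, with a binary search in each.
        for sstable in reversed(self.buckets[bucket_for(key, self.num_buckets)]):
            i = bisect.bisect_left(sstable, (key,))
            if i < len(sstable) and sstable[i][0] == key:
                return sstable[i][1]
        return None

    def _compact_bucket(self, b: int) -> None:
        # Merge every sstable in one bucket; later (newer) sstables win.
        merged = {}
        for sstable in self.buckets[b]:
            merged.update(sstable)
        self.buckets[b] = [sorted(merged.items())] if merged else []

    def compact_all(self, workers: int = 8) -> None:
        # Buckets never share keys, so each one can be compacted without
        # coordinating with the others. (In CPython the GIL limits the real
        # speedup of threads; the point here is the independence.)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(self._compact_bucket, range(self.num_buckets)))

    def range_query(self, start: str, end: str):
        # Keys in [start, end). Hashing scatters adjacent keys across
        # buckets, so every bucket must be scanned and the results merged.
        found = {}
        for sstables in self.buckets:
            for sstable in sstables:  # oldest to newest, so newer values overwrite
                lo = bisect.bisect_left(sstable, (start,))
                for key, value in sstable[lo:]:
                    if key >= end:
                        break
                    found[key] = value
        return sorted(found.items())


store = BucketedStore(num_buckets=16)
store.flush({f"user{i:03d}": i for i in range(100)})
store.flush({"user005": "updated"})
store.compact_all()
print(store.get("user005"))                     # updated
print(store.range_query("user010", "user013"))  # [('user010', 10), ('user011', 11), ('user012', 12)]
```

In this toy model `get` reads one bucket, but `range_query` always loops over all `num_buckets` buckets, even for a three-key range. Partitioning by contiguous key ranges instead of by hash would keep a range scan to the few buckets that overlap it, at the cost of uneven bucket sizes when keys are skewed.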