Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-06 Thread Eric Stevens
B would work better in the case where you need to do sequential or range-style reads on the id, particularly if id has any significant sparseness (e.g., id is a timeuuid). You can compute the buckets and do reads of entire buckets within your range. However, if you're doing random access by id, the
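A minimal sketch of the "compute the buckets" step, assuming integer ids and the bucket = id / N scheme from the original post in this thread. The function name and the value of N are hypothetical. For a timeuuid id you would presumably bucket on the embedded timestamp instead, which this sketch does not cover.

```python
def buckets_for_range(start_id: int, end_id: int, n: int) -> list[int]:
    """Return every bucket that can hold an id in [start_id, end_id].

    Assumes bucket = id // n (integer division), as in the original post.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if start_id > end_id:
        return []
    # Every bucket between the first and last id's bucket must be read,
    # because any of them may contain ids inside the requested range.
    return list(range(start_id // n, end_id // n + 1))
```

Each returned bucket would then be read as a whole partition, with ids outside the requested range filtered out on the client or with a clustering-column restriction.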

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread DuyHai Doan
Another argument for table A is that it makes heavy use of the Bloom filter for fast lookups. If the filter returns negative, there is no disk hit; otherwise there are at most 1 or 2 disk hits, depending on the false-positive chance. Compaction also works better on skinny partitions. On Fri, Dec 5, 2014 at 6:33 PM, Tyler Hobbs wrote: > > On Fri, Dec 5, 2

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Tyler Hobbs
On Fri, Dec 5, 2014 at 11:14 AM, Robert Wille wrote:
> And let’s say that bucket is computed as id / N. For analysis purposes,
> let’s assume I have 100 million ids to store.
>
> Table a is obviously going to have a larger bloom filter. That’s a clear
> negative.

That's true, *but*, if you
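To put rough numbers on the bloom filter comparison, here is a sketch that uses the textbook sizing formula for an optimally configured Bloom filter, m = -n ln(p) / (ln 2)^2 bits. The false-positive chance of 0.01 and N = 1000 are illustrative assumptions, not values from the thread, and the partition count assumes dense ids. Cassandra keeps a filter per SSTable, so real memory use will differ from this estimate.

```python
import math


def bloom_filter_bits(num_keys: int, fp_chance: float) -> float:
    """Textbook size in bits of an optimal Bloom filter for num_keys keys."""
    return -num_keys * math.log(fp_chance) / (math.log(2) ** 2)


def partition_count(num_ids: int, n: int) -> int:
    """Partitions needed when bucket = id // n and ids are dense (assumption)."""
    return math.ceil(num_ids / n)


def bloom_filter_megabytes(num_keys: int, fp_chance: float) -> float:
    """Same estimate expressed in megabytes (10**6 bytes)."""
    return bloom_filter_bits(num_keys, fp_chance) / 8 / 1e6


NUM_IDS = 100_000_000   # from the original post
FP_CHANCE = 0.01        # assumed, illustrative
N = 1000                # assumed bucket size, illustrative

table_a_mb = bloom_filter_megabytes(NUM_IDS, FP_CHANCE)
table_b_mb = bloom_filter_megabytes(partition_count(NUM_IDS, N), FP_CHANCE)
```

Under these assumptions table A needs on the order of a hundred megabytes of filter and table B roughly a thousandth of that, since filter size scales linearly with the number of partition keys.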

Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Robert Wille
At the data modeling class at the Cassandra Summit, the instructor said that lots of small partitions are just fine. I’ve heard on this list that that is not true, and that it’s better to cluster small partitions into fewer, larger partitions. Due to conflicting information on this issue, I’d be