B would work better in the case where you need to do sequential or
ranged-style reads on the id, particularly if id has any significant
sparseness (e.g., id is a timeuuid). You can compute the buckets and do
reads of entire buckets within your range. However, if you're doing random
access by id, the bucketing doesn't buy you much.
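To make the ranged-read argument concrete, here's a small sketch (the constant N and the helper names are illustrative, not from the thread) of why table B serves range queries with a handful of whole-partition reads, assuming bucket = id / N as proposed below:

```python
# Sketch: with bucket = id // N, all ids in a range [lo, hi] live in a
# contiguous run of buckets, so a ranged read becomes a few reads of
# entire partitions. N is an illustrative choice.

N = 1000  # ids per bucket (illustrative)

def bucket_of(id_):
    return id_ // N

def buckets_for_range(lo, hi):
    """Buckets that must be read to cover every id in [lo, hi]."""
    return list(range(bucket_of(lo), bucket_of(hi) + 1))

# A range of 5,000 ids touches only 6 buckets:
print(buckets_for_range(12_345, 17_344))  # [12, 13, 14, 15, 16, 17]
```

With random access by id, by contrast, each lookup still lands in exactly one bucket, so the bucketing adds nothing over one-row-per-id partitions.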
Another argument for table A is that it leverages the Bloom filter heavily
for fast lookups. If the filter comes back negative, there's no disk hit;
otherwise there are at most 1 or 2 disk hits, depending on the fp chance.
Compaction also works better on skinny partitions.
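The Bloom-filter point can be illustrated with a toy filter (a minimal sketch, not Cassandra's actual implementation): a negative answer means the key is definitely not in that SSTable, so no disk is touched at all; a positive answer costs at most a seek or two, with false positives governed by the fp chance.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter; Cassandra's real one differs in the details."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False -> definitely absent (no disk hit needed).
        # True  -> probably present (go to disk, maybe a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bf = BloomFilter()
bf.add("id:42")
print(bf.might_contain("id:42"))   # True -> worth the 1-2 disk hits
print(bf.might_contain("id:999"))  # almost certainly False -> skip disk
```

The trade-off Robert raises below is that table A needs a filter entry per id, so the filter itself is much bigger.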
On Fri, Dec 5, 2014 at 6:33 PM, Tyler Hobbs wrote:
>
On Fri, Dec 5, 2014 at 11:14 AM, Robert Wille wrote:
>
> And let's say that bucket is computed as id / N. For analysis purposes,
> let's assume I have 100 million ids to store.
>
> Table A is obviously going to have a larger bloom filter. That's a clear
> negative.
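To put a rough number on "larger": the standard Bloom-filter sizing formula m = -n·ln(p) / (ln 2)² gives the optimal bit count for n keys at false-positive rate p. The figures below are illustrative (an assumed fp chance of 0.01 and N = 1000), not Cassandra's exact defaults:

```python
import math

def bloom_bits(n, p):
    """Optimal Bloom filter size in bits for n keys at fp rate p."""
    return n * -math.log(p) / (math.log(2) ** 2)

n_ids = 100_000_000
p = 0.01  # assumed fp chance, for illustration

# Table A: one partition per id -> 100M partition keys in the filter.
skinny = bloom_bits(n_ids, p)
# Table B: bucket = id // 1000 -> 100k partition keys.
bucketed = bloom_bits(n_ids // 1000, p)

print(f"table A: ~{skinny / 8 / 2**20:.0f} MiB")    # ~114 MiB
print(f"table B: ~{bucketed / 8 / 2**20:.2f} MiB")  # ~0.11 MiB
```

So the filter for table A is about three orders of magnitude larger, though at roughly 10 bits per key it may still be an acceptable per-node memory cost.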
>
That's true, *but*, if you
At the data modeling class at the Cassandra Summit, the instructor said that
lots of small partitions are just fine. I’ve heard on this list that that is
not true, and that it's better to cluster small partitions into fewer, larger
partitions. Due to conflicting information on this issue, I’d be