On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> @Todd. Good catch about caching HFile blocks.
>
> My point still applies though. Caching HFile blocks on a single node
> vs individual "datums" on N nodes may not be more efficient. Thus
> terms like "Slower" and "Less Efficient" could be very misleading.
>
> Isn't caching only the item more efficient? In cases with high random
> reads, is evicting single keys more efficient than evicting blocks in
> terms of memory churn?
>
> These are difficult questions to answer absolutely, so bullet points
> such as '#Cassandra has slower this' are oversimplifications of
> complex problems.

Definitely complex, especially in a system like Java where memory
accounting is often difficult to quantify. Depending on the data structure
used for your cache, you are likely to have at least 8-16 bytes of overhead
per item in the data structure, and more likely much more. For example, we
calculate the following overhead for our cache:

CONCURRENT_HASHMAP_ENTRY =
    align(REFERENCE + OBJECT + (3 * REFERENCE) + (2 * Bytes.SIZEOF_INT));

which ends up being something like 48 bytes per entry on a 64-bit JVM.

So, if your rows are small (eg 64 bytes), caching a single 64KB block that
holds 1000 such rows, with one entry's worth of overhead, is much more
RAM-efficient than caching 1000 individual 64-byte rows, which would carry
roughly 48KB of overhead on their own.

I agree, of course, that absolutisms are way oversimplified. Discussions
like these that elicit the differences between the systems are productive,
though - I think each system can learn things from the other.

-Todd
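
[Editor's note: for readers who want to try the arithmetic themselves, below
is a minimal Java sketch of the same back-of-the-envelope calculation. The
REFERENCE, OBJECT, and SIZEOF_INT constants are assumed values for a 64-bit
JVM (the exact figures depend on the JVM and options such as compressed
oops), not values taken from HBase's ClassSize, and the CacheOverheadSketch
class name is made up for illustration.]

// Back-of-the-envelope sketch of the overhead arithmetic discussed above.
// The constants are illustrative assumptions for a 64-bit JVM; actual
// values vary with the JVM and settings such as compressed oops, so the
// result lands in the same ballpark as the ~48 bytes quoted above rather
// than matching it exactly.
public class CacheOverheadSketch {

    static final int REFERENCE = 8;   // assumed size of an object reference
    static final int OBJECT = 16;     // assumed object header size
    static final int SIZEOF_INT = 4;

    // Round up to the 8-byte alignment the JVM uses for object sizes.
    static long align(long size) {
        return (size + 7) & ~7L;
    }

    public static void main(String[] args) {
        // Same shape as the ConcurrentHashMap entry formula quoted above:
        // the map's reference to the entry, the entry object itself,
        // three references (key, value, next) and two ints.
        long perEntryOverhead = align(REFERENCE + OBJECT
                + (3 * REFERENCE) + (2 * SIZEOF_INT));
        System.out.println("Per-entry overhead: " + perEntryOverhead + " bytes");

        int rowSize = 64;          // small row, as in the example above
        int rowsPerBlock = 1000;   // roughly one 64KB block worth of rows

        // Block cache: one cached object holding all 1000 rows.
        long blockCacheTotal = (long) rowSize * rowsPerBlock + perEntryOverhead;

        // Row cache: one cache entry per row, each paying the overhead.
        long rowCacheTotal = (long) rowsPerBlock * (rowSize + perEntryOverhead);

        System.out.println("Block cache (one 64KB block): " + blockCacheTotal + " bytes");
        System.out.println("Row cache (1000 entries):     " + rowCacheTotal + " bytes");
    }
}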