> My point still applies though. Caching HFile blocks on a single node
> vs individual "dataums" on N nodes may not be more efficient. Thus
> terms like "Slower" and "Less Efficient" could be very misleading.

I seem to have missed this the first time around. Next time I correct the summary I'll include something about the subtleties of block vs record caching. If you access sparse/random rows, and rows are small, then record caching on multiple machines may in fact be more efficient than block caching on fewer machines, because most of each cached block is rows nobody asked for.
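To put rough numbers on the block vs record point, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the 1 GiB cache, the 100-byte rows, the 64 KiB block size, and the helper name `hot_rows_cached`. It also ignores per-entry overhead, indexes, and compression, so treat it as a rough model rather than a description of how either system actually accounts for memory.

```python
def hot_rows_cached(cache_bytes, row_bytes, block_bytes=None, hot_rows_per_block=1):
    """Estimate how many rows that are actually being read fit in a cache.

    block_bytes=None models a record cache (one entry per row).
    Otherwise the cache holds whole blocks, and only `hot_rows_per_block`
    rows in each block are ones the workload really touches.
    All sizes are hypothetical; overhead is ignored.
    """
    if block_bytes is None:
        # Record cache: every cached byte belongs to a row someone asked for.
        return cache_bytes // row_bytes
    # Block cache: a block cannot hold more hot rows than it has rows.
    rows_per_block = block_bytes // row_bytes
    hot = min(hot_rows_per_block, rows_per_block)
    return (cache_bytes // block_bytes) * hot


GIB = 2**30
KIB = 2**10

# Sparse/random access to small rows: one hot row per cached block.
record = hot_rows_cached(GIB, 100)
block_sparse = hot_rows_cached(GIB, 100, block_bytes=64 * KIB)

# Dense/sequential access: every row in a cached block gets read.
block_dense = hot_rows_cached(GIB, 100, block_bytes=64 * KIB, hot_rows_per_block=655)

print(record, block_sparse, block_dense)
```

Under these assumptions the record cache holds several hundred times more useful rows than the block cache for the sparse case, while for the dense case the two come out roughly equal. That is the sense in which "less efficient" depends on the access pattern.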
That said, the story for pinning ranges of data in memory doesn't seem to change.

Another interesting difference has to do with scan vs seek performance. There was one comment about Cassandra possibly having better seek performance than HBase because of some HDFS slowness, for which a fix was rumored to be in the works. Does anyone have any other comments about scan or seek performance comparisons?

Again, I understand Cassandra is not HBase. However, it's useful to be able to compare them (and their designs), so people can understand what might help them choose one over the other.

Thanks again!