>> My point still applies though. Caching HFile blocks on a single node
>> vs individual "dataums" on N nodes may not be more efficient. Thus
>> terms like "Slower" and "Less Efficient" could be very misleading.
>>
>
I seem to have missed this the first time around. Next time I correct the
summary I'll include something about the subtleties of block vs record
caching. If you access sparse/random rows, and the rows are small, record
caching on multiple machines may in fact be more efficient than block
caching on fewer machines.
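
To make that concrete, here is a rough back-of-the-envelope sketch in
Python. Every number in it (cache size, block size, row size, per-entry
overhead) is an assumption picked for illustration, not a measurement from
either system, and the function names are made up for this example. It only
models the worst case for block caching, where each cached block contains a
single row the workload actually reads.

```python
def hot_rows_block_cache(cache_bytes, block_bytes):
    # Sparse/random access, worst case: every cached block holds exactly
    # one hot row, so the cache holds one useful row per block.
    return cache_bytes // block_bytes


def hot_rows_record_cache(cache_bytes, row_bytes, overhead_bytes):
    # A record cache stores only the rows that were read, plus some
    # assumed per-entry bookkeeping overhead.
    return cache_bytes // (row_bytes + overhead_bytes)


# All values below are illustrative assumptions.
cache_bytes = 1024 ** 3      # 1 GiB of cache memory
block_bytes = 64 * 1024      # assumed 64 KiB block size
row_bytes = 100              # small rows
overhead_bytes = 50          # assumed per-record overhead

block_rows = hot_rows_block_cache(cache_bytes, block_bytes)
record_rows = hot_rows_record_cache(cache_bytes, row_bytes, overhead_bytes)

print("hot rows held by block cache: ", block_rows)
print("hot rows held by record cache:", record_rows)
```

With dense or sequential access the picture flips: most rows in a cached
block get used, and the block cache avoids paying the per-record overhead.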

That said, the story for pinning ranges of data in memory doesn't seem to
change.

Another interesting difference has to do with scan vs seek performance.
There was one comment about Cassandra possibly having better seek
performance than HBase because of some HDFS slowness, for which a fix was
rumored to be in the works. Does anyone have any other comments about scan
or seek performance comparisons?

Again, I understand Cassandra is not HBase. However, it's useful to be able
to compare them (and their designs), so people can understand what might
help them choose one over the other. Thanks again!
