Thanks for all the great answers last week about Cassandra. I have an
additional question about Cassandra and columns/super-columns. I had naively
assumed that columns and super-columns map to an internal row-key (like how
in Bigtable the indexed map is row/column-key/timestamp to data), but some
pe
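The data-model question above can be made concrete with a toy sketch. This is a hedged illustration, not either system's real storage format: it contrasts the Bigtable-style single indexed map keyed by row/column/timestamp with Cassandra's (0.6-era) row -> column family -> column nesting, where a super column adds one more level. All names and values here are invented.

```python
# Bigtable-style: one sorted map keyed by (row, column, timestamp).
bigtable = {
    ("row1", "cf:col1", 1290400000): b"value-a",
    ("row1", "cf:col1", 1290300000): b"old-value-a",
}

# Cassandra-style: row key -> column family -> column name -> value.
# A super column family nests one level deeper under the column family.
cassandra = {
    "row1": {
        "ColumnFamily1": {"col1": b"value-a"},
        "SuperCF1": {"super1": {"sub1": b"value-b"}},
    }
}

def get_bigtable(table, row, col):
    """Return the newest value for (row, col) in the Bigtable-style map."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == col]
    return max(versions)[1] if versions else None

assert get_bigtable(bigtable, "row1", "cf:col1") == b"value-a"
assert cassandra["row1"]["SuperCF1"]["super1"]["sub1"] == b"value-b"
```

The point of the contrast: in the Bigtable model every addressable cell lives in one flat keyed map, whereas in Cassandra the column (or sub-column) is reached by walking nested structure under a single row key.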
> My point still applies though. Caching HFile blocks on a single node
> vs individual "datums" on N nodes may not be more efficient. Thus
> terms like "Slower" and "Less Efficient" could be very misleading.
>
I seem to have missed this the first time around. Next time I correct the
summary I
On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
> @Todd. Good catch about caching HFile blocks.
>
> My point still applies though. Caching HFile blocks on a single node
> vs individual "datums" on N nodes may not be more efficient. Thus
> terms like "Slower" and "Less Efficient" could be very misleading.
Seems accurate to me. One small correction - the daemon in HBase that serves
regions is known as a "region server" rather than a region master. The RS is
the equivalent of the tablet server in Bigtable terminology.
-Todd
On Mon, Nov 22, 2010 at 4:50 PM, David Jeske wrote:
> This is my second at
This is my second attempt at a summary of Cassandra vs HBase consistency and
performance for an HBase-acceptable workload. I think these tricky subtleties
are hard to understand, yet it's helpful for the community to understand
them. I'm not trying to state my own facts (or opinions) but merely summarize
On Mon, Nov 22, 2010 at 5:48 PM, David Jeske wrote:
>
>
> On Mon, Nov 22, 2010 at 2:44 PM, David Jeske wrote:
>>
>> On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo
>> wrote:
>>>
>>> Return messages such as "your data was written to at least 1 node but
>>> not enough to make your write-consistency count" do not help the
On Mon, Nov 22, 2010 at 2:44 PM, David Jeske wrote:
> On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
>
>> Return messages such as "your data was written to at least 1 node but
>> not enough to make your write-consistency count" do not help the
>> situation, as the client that writes the
On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
> Return messages such as "your data was written to at least 1 node but
> not enough to make your write-consistency count" do not help the
> situation, as the client that writes the data would be aware of the
> inconsistency, but the other c
On Mon, Nov 22, 2010 at 5:14 PM, Todd Lipcon wrote:
> On Mon, Nov 22, 2010 at 1:58 PM, David Jeske wrote:
>>
>> On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
>>>
>>> Not quite. The replica synchronization code is pretty messy, but
>>> basically it will take the longest replica that may have been synced, not a quorum.
On Mon, Nov 22, 2010 at 1:58 PM, David Jeske wrote:
> On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
>
>> Not quite. The replica synchronization code is pretty messy, but basically
>> it will take the longest replica that may have been synced, not a quorum.
>>
>> i.e. the guarantee is that
On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
> Not quite. The replica synchronization code is pretty messy, but basically
> it will take the longest replica that may have been synced, not a quorum.
>
> i.e. the guarantee is that "if you successfully sync() data, it will be
> present after
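Todd's "longest replica, not a quorum" rule can be sketched in a few lines. This is a hedged illustration of the guarantee as described above, not the actual HDFS recovery code: during block recovery, the longest surviving replica is kept, which is sufficient because any successfully sync()ed bytes are present on every replica that acknowledged the sync.

```python
def recover_block(replica_lengths, last_synced_length):
    """replica_lengths: bytes present on each surviving replica.
    last_synced_length: the length the client successfully sync()ed.
    Returns the recovered block length (illustrative rule only)."""
    # Any replica at least as long as the last sync may have been synced.
    candidates = [n for n in replica_lengths if n >= last_synced_length]
    recovered = max(candidates)
    # The guarantee: successfully sync()ed data survives recovery.
    assert recovered >= last_synced_length
    return recovered

# Three replicas had received 100, 120, and 150 bytes; the client had
# sync()ed through byte 100. Recovery keeps the longest replica (150),
# so everything through byte 100 is guaranteed present afterward.
assert recover_block([100, 120, 150], last_synced_length=100) == 150
```

Note there is no vote among replicas here — unlike a quorum protocol, a single longest replica decides the recovered length.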
On Mon, Nov 22, 2010 at 1:26 PM, Edward Capriolo wrote:
> For cassandra all writes must be transmitted to all replicas.
> CASSANDRA-1314 does not change how writes happen. Write operations
> will still affect the cache (possibly evicting things if the cache is full).
> Reads, however, will prefer a single n
On Mon, Nov 22, 2010 at 1:26 PM, Edward Capriolo wrote:
> For cassandra all writes must be transmitted to all replicas.
>
I thought that was only true if you set the number of replicas required for
the write to the same as the number of replicas.
Further, we've established in this thread that ev
For cassandra all writes must be transmitted to all replicas.
CASSANDRA-1314 does not change how writes happen. Write operations
will still affect the cache (possibly evicting things if the cache is full).
Reads, however, will prefer a single node out of its possible replicas.
This should cause better cache uti
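The "reads prefer a single node" idea can be sketched as deterministic read routing. This is a hedged illustration, not Cassandra's actual snitch or routing code, and the hash-ring placement here is invented: if every read of a key goes to the same preferred live replica, that key is only ever cached on one node, instead of warming all RF replica caches with the same data.

```python
import hashlib

def replicas_for(key, nodes, rf=3):
    """Illustrative placement: rf consecutive nodes on a hash ring."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

def preferred_read_replica(key, nodes, down=()):
    """Pick the first live replica, so repeated reads of the same key
    hit the same node (and therefore the same cache)."""
    for node in replicas_for(key, nodes):
        if node not in down:
            return node
    raise RuntimeError("no live replica for key")

nodes = [f"node{i}" for i in range(6)]
# The same key always routes to the same node while the cluster is healthy...
assert preferred_read_replica("user:42", nodes) == \
       preferred_read_replica("user:42", nodes)
# ...and fails over to the next replica if the preferred node goes down.
first = preferred_read_replica("user:42", nodes)
assert preferred_read_replica("user:42", nodes, down={first}) != first
```

The cache benefit follows from the determinism: each key's working set lands on one node's cache, so the cluster's aggregate cache holds more distinct data.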
On Mon, Nov 22, 2010 at 12:03 PM, Edward Capriolo wrote:
> What of reads that are not in the cache?
> Cassandra can use memory-mapped I/O for its data and index files. HBase
> has a very expensive read path for things that are not in the cache. HDFS
> random read performance is historically poor.
>
Ye
>
> 2) Cassandra has a less efficient memory footprint for data pinned in
> memory (or cached). With 3 replicas on Cassandra, each element of data
> pinned in memory is kept in memory on 3 servers, whereas in HBase only
> region masters keep the data in memory, so there is only one copy of
> each data e
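The footprint claim in point 2 is back-of-envelope arithmetic, sketched below with invented numbers: if each pinned element is held in RAM on RF servers, the same aggregate memory holds RF times less distinct data than when each element is held once.

```python
nodes = 12
ram_for_pinning_per_node_gb = 8
aggregate_gb = nodes * ram_for_pinning_per_node_gb  # 96 GB cluster-wide

rf = 3  # every pinned element held in memory on 3 servers
distinct_pinned_with_rf_copies = aggregate_gb / rf  # 32 GB of distinct data

# Single in-memory copy per element (one serving node per region):
distinct_pinned_single_copy = aggregate_gb          # 96 GB of distinct data

assert distinct_pinned_single_copy == rf * distinct_pinned_with_rf_copies
```

As Todd's reply notes, the comparison is subtler than this (HBase caches HFile blocks rather than individual datums), so the factor-of-RF figure is an upper bound on the difference, not a measured result.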
On Mon, Nov 22, 2010 at 2:56 PM, Edward Capriolo wrote:
> On Mon, Nov 22, 2010 at 2:52 PM, Todd Lipcon wrote:
>> On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
>>>
>>> I haven't used either Cassandra or HBase, so please don't take any part of
>>> this message as me attempting to state facts
On Mon, Nov 22, 2010 at 2:52 PM, Todd Lipcon wrote:
> On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
>>
>> I haven't used either Cassandra or HBase, so please don't take any part of
>> this message as me attempting to state facts about either system. However,
>> I'm very familiar with data-s
On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
> I haven't used either Cassandra or HBase, so please don't take any part of
> this message as me attempting to state facts about either system. However,
> I'm very familiar with data-storage design details, and I've worked
> extensively optimiz
I already noticed a mistake in my own facts...
On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
> *4) Cassandra (N3/W3/R1) takes longer to allow data to become writable
> again in the face of a node failure than HBase/HDFS.* Cassandra must
> repair the keyrange to bring N from 2 to 3 to resume
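The availability point in item 4 reduces to a simple predicate, sketched here in hedged form (real repair is streaming and range-based, and this ignores hinted handoff): with N=3 replicas and W=3, one replica failure makes writes to that key range fail until repair restores the third replica.

```python
def writable(live_replicas, w):
    """A write at consistency level w succeeds only if at least
    w replicas are alive to acknowledge it."""
    return live_replicas >= w

n, w = 3, 3
assert writable(live_replicas=3, w=w)      # healthy: writes succeed
assert not writable(live_replicas=2, w=w)  # one node down: W=3 unreachable

# Writes resume only after repair brings the replica count back to 3.
# A lower consistency level (e.g. quorum, W=2) would have stayed
# writable through the single-node failure.
assert writable(live_replicas=2, w=2)
```

This is why the comparison is sensitive to the chosen W: the N3/W3/R1 configuration trades write availability during failures for strong read-one consistency.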