>> My point still applies though. Caching HFile blocks on a single node
>> vs individual "datums" on N nodes may not be more efficient. Thus
>> terms like "Slower" and "Less Efficient" could be very misleading.
>>
>
I seem to have missed this the first time around. Next time I correct the
summary I
On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
> @Todd. Good catch about caching HFile blocks.
>
> My point still applies though. Caching HFile blocks on a single node
> vs individual "datums" on N nodes may not be more efficient. Thus
> terms like "Slower" and "Less Efficient" could be very misleading.
Seems accurate to me. One small correction: the daemon in HBase that serves
regions is known as a "region server" rather than a region master. The RS is
the equivalent of the tablet server in Bigtable terminology.
-Todd
On Mon, Nov 22, 2010 at 4:50 PM, David Jeske wrote:
> This is my second attempt at a summary
This is my second attempt at a summary of Cassandra vs HBase consistency and
performance for an HBase-acceptable workload. I think these tricky subtleties
are hard to understand, yet it's helpful for the community to understand
them. I'm not trying to state my own facts (or opinion) but merely summarize
On Mon, Nov 22, 2010 at 5:48 PM, David Jeske wrote:
>
>
> On Mon, Nov 22, 2010 at 2:44 PM, David Jeske wrote:
>>
>> On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo
>> wrote:
>>>
>>> Return messages such as "your data was written to at least 1 node but
>>> not enough to make your write-consistency count"
On Mon, Nov 22, 2010 at 2:44 PM, David Jeske wrote:
> On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
>
>> Return messages such as "your data was written to at least 1 node but
>> not enough to make your write-consistency count" do not help the
>> situation, as the client that writes the
On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo wrote:
> Return messages such as "your data was written to at least 1 node but
> not enough to make your write-consistency count" do not help the
> situation, as the client that writes the data would be aware of the
> inconsistency, but the other clients
On Mon, Nov 22, 2010 at 5:14 PM, Todd Lipcon wrote:
> On Mon, Nov 22, 2010 at 1:58 PM, David Jeske wrote:
>>
>> On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
>>>
>>> Not quite. The replica synchronization code is pretty messy, but
>>> basically it will take the longest replica that may have been synced, not a quorum.
On Mon, Nov 22, 2010 at 1:58 PM, David Jeske wrote:
> On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
>
>> Not quite. The replica synchronization code is pretty messy, but basically
>> it will take the longest replica that may have been synced, not a quorum.
>>
>> i.e. the guarantee is that
On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
> Not quite. The replica synchronization code is pretty messy, but basically
> it will take the longest replica that may have been synced, not a quorum.
>
> i.e. the guarantee is that "if you successfully sync() data, it will be
> present after
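To restate Todd's point as a toy sketch (plain Python, not HDFS code, with invented replica lengths): block recovery adopts the longest replica that could have acknowledged a sync(), rather than taking a majority vote, so data acknowledged by a successful sync() survives even if only one replica holds it.

# Toy model of the guarantee described above: recovery keeps the longest
# replica's data rather than a quorum's. Lengths are made-up byte counts.
replica_lengths = {"dn1": 4096, "dn2": 8192, "dn3": 8192}

recovered_length = max(replica_lengths.values())  # longest replica wins, not a majority
print("recovered block length:", recovered_length)  # 8192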
On Mon, Nov 22, 2010 at 1:26 PM, Edward Capriolo wrote:
> For Cassandra, all writes must be transmitted to all replicas.
> CASSANDRA-1314 does not change how writes happen. Write operations
> will still affect the cache (possibly evicting things if the cache is full).
> Reads, however, will prefer a single node out of its possible replicas.
On Mon, Nov 22, 2010 at 1:26 PM, Edward Capriolo wrote:
> For Cassandra, all writes must be transmitted to all replicas.
>
I thought that was only true if you set the number of replicas required for
the write to the same as the number of replicas.
Further, we've established in this thread that ev
They are memtable_throughput_in_mb, memtable_flush_after_mins and
memtable_operations_in_millions. Under 0.7 these are per-CF settings; in 0.6
they are cluster-wide. To start with, try turning the MB one down to something
like 64 or 128, ops to 0.5 and mins to 60. What version are you using?
Aaron
On 23 Nov, 20
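To make the interaction of those three thresholds concrete, here is a small illustrative sketch (plain Python, not Cassandra source) of the per-CF flush decision: a memtable is flushed as soon as any one of the limits is crossed. The numbers below are just the example values suggested above.

# Illustrative only: a memtable flushes when ANY of the three thresholds is hit.
def should_flush(data_mb, ops_millions, minutes_since_flush,
                 memtable_throughput_in_mb=64,
                 memtable_operations_in_millions=0.5,
                 memtable_flush_after_mins=60):
    return (data_mb >= memtable_throughput_in_mb
            or ops_millions >= memtable_operations_in_millions
            or minutes_since_flush >= memtable_flush_after_mins)

# 10 MB of data but 600k operations: the operations threshold triggers the flush.
print(should_flush(data_mb=10, ops_millions=0.6, minutes_since_flush=5))  # True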
For Cassandra, all writes must be transmitted to all replicas.
CASSANDRA-1314 does not change how writes happen. Write operations
will still affect the cache (possibly evicting things if the cache is full).
Reads, however, will prefer a single node out of its possible replicas.
This should cause better cache utilization.
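A back-of-the-envelope way to see why preferring one replica per key for reads helps: if reads for a key can land on any of its N replicas, each replica's cache ends up holding the same hot keys; if reads for a key always go to the same replica, the cluster's caches hold roughly N times as many distinct keys. Purely illustrative numbers below, not Cassandra code:

# Rough illustration of cache utilization with and without read affinity.
replication_factor = 3
per_node_cache_keys = 1_000_000  # hypothetical cache capacity per node

# Reads spread across all replicas: every replica caches the same hot set.
distinct_keys_cached_random = per_node_cache_keys

# Reads pinned to a preferred replica: each hot key occupies only one cache.
distinct_keys_cached_pinned = replication_factor * per_node_cache_keys

print(distinct_keys_cached_random, distinct_keys_cached_pinned)  # 1000000 3000000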
Hi,
Are they the min_compaction_threshold and max_compaction_threshold? Do I
need to lower the memtable settings also?
Thanks,
Trung.
On Mon, Nov 22, 2010 at 12:02 PM, Jonathan Ellis wrote:
> Set your columnfamily thresholds lower.
>
> On Mon, Nov 22, 2010 at 12:45 PM, Trung Tran wrote:
>> Hi,
>>
On Mon, Nov 22, 2010 at 12:03 PM, Edward Capriolo wrote:
> What of reads that are not in the cache?
> Cassandra can use memory-mapped I/O for its data and index files. HBase
> has a very expensive read path for things that are not in cache. HDFS
> random read performance is historically poor.
>
Ye
>
> 2) Cassandra has a less efficient memory footprint for data pinned in
> memory (or cached). With 3 replicas on Cassandra, each element of data
> pinned in-memory is kept in memory on 3 servers, whereas in HBase only
> region masters keep the data in memory, so there is only one copy of
> each data element
On Mon, Nov 22, 2010 at 2:56 PM, Edward Capriolo wrote:
> On Mon, Nov 22, 2010 at 2:52 PM, Todd Lipcon wrote:
>> On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
>>>
>>> I haven't used either Cassandra or HBase, so please don't take any part of
>>> this message as me attempting to state facts
Set your columnfamily thresholds lower.
On Mon, Nov 22, 2010 at 12:45 PM, Trung Tran wrote:
> Hi,
>
> I have a test cluster of 3 nodes, 14GB of memory in each node,
> replication factor = 3. With default -Xms and -Xmx, my nodes are set to
> have max heap size = 7GB. After initial load with about 200M rows
On Mon, Nov 22, 2010 at 2:52 PM, Todd Lipcon wrote:
> On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
>>
>> I haven't used either Cassandra or HBase, so please don't take any part of
>> this message as me attempting to state facts about either system. However,
>> I'm very familiar with data-storage design details,
On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
> I haven't used either Cassandra or HBase, so please don't take any part of
> this message as me attempting to state facts about either system. However,
> I'm very familiar with data-storage design details, and I've worked
> extensively optimizing
Hi,
Thanks for the guideline. I did not tune any memory settings; the
nodes are configured with all default settings (except for disk access,
which is using mmap). I have 3 nodes with 1 client using Hector and 8
writing threads. There are 3 CFs: 1 standard and 2 super.
Thanks,
Trung.
On Mon, Nov 22, 2010
I think you'll need to show us how to reproduce without your custom
LoadFunc, e.g., with normal index scans outside of Pig.
On Wed, Nov 17, 2010 at 3:56 PM, Christian Decker
wrote:
> On Tue, Nov 16, 2010 at 6:58 PM, Jonathan Ellis wrote:
>>
>> I'm pretty sure that "reading an index" and "using p
The higher memory usage for the Java process may be because of memory-mapped
file access; take a look at disk_access_mode in cassandra.yaml.

WRT going OutOfMemory:
- What are your Memtable thresholds in cassandra.yaml?
- How many Column Families do you have?
- What are your row and key cache settings?
Hi,
I have a test cluster of 3 nodes, 14GB of memory in each node,
replication factor = 3. With default -Xms and -Xmx, my nodes are set to
have max heap size = 7GB. After initial load with about 200M rows
(written with Hector's default ConsistencyLevel = QUORUM), my nodes' memory
usage is up to 13.5GB, sh
I already noticed a mistake in my own facts...
On Mon, Nov 22, 2010 at 10:01 AM, David Jeske wrote:
> *4) Cassandra (N3/W3/R1) takes longer to allow data to become writable
> again in the face of a node-failure than HBase/HDFS.* Cassandra must
> repair the keyrange to bring N from 2 to 3 to resu
> > - graph-diskT-stat-with-jmx.png: graph of cpu load, LiveSSTableCount
> > and logarithm of MemtableDataSize.
> > - log-gc.20101122-12:41.160M.log.gz: GC log with -XX:+PrintGC
> > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> >
> > As you can see from the second graph,
I havn't used either Cassandra or hbase, so please don't take any part of
this message as me attempting to state facts about either system. However,
I'm very familiar with data-storage design details, and I've worked
extensively optimizing applications running on MySQL, Oracle, berkeledb
(including
- graph-diskT-stat-with-jmx.png: graph of cpu load, LiveSSTableCount
> and logarithm of MemtableDataSize.
> - log-gc.20101122-12:41.160M.log.gz: GC log with -XX:+PrintGC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>
> As you can see from the second graph, logarithm of MemtableDataSize
> and cp
Provided at least one node receives the write, it will eventually be written
to all replicas. A failure to meet the requested ConsistencyLevel is just
that: not a failure to write the data itself. Once the write is received by
a node, it will eventually reach all replicas; there is no rollback.
T
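A minimal sketch of the behaviour described above (plain Python, not the real write path, node states invented): the coordinator sends the write to every live replica and only reports success if enough of them acknowledge; the replicas that did accept the write keep it, because there is no rollback.

# Toy model: failing the ConsistencyLevel means "not enough acks", not "no write".
def coordinator_write(replicas, value, required_acks):
    acks = 0
    for replica in replicas:
        if replica["up"]:
            replica["value"] = value  # the write sticks on every live replica
            acks += 1
    return acks >= required_acks, acks

nodes = [{"up": True}, {"up": False}, {"up": False}]
ok, acks = coordinator_write(nodes, "v2", required_acks=2)  # QUORUM of RF=3
print(ok, acks, nodes[0])  # False 1 {'up': True, 'value': 'v2'}  -- not rolled back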
Yes, but the value is supposed to be 11, since the write failed.
On Mon, Nov 22, 2010 at 2:27 PM, André Fiedler wrote:
> Doesn't Cassandra sync all nodes when the network is up again? I think this
> was one of the reasons for storing a timestamp with every key/value pair?
> So I think the response will only temporarily be 11.
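The mechanism André is asking about is last-write-wins reconciliation on the column timestamp: when replicas disagree, the copy with the newest timestamp wins and is pushed to the stale replicas by read repair / anti-entropy. A toy sketch with invented timestamps, using the 11-vs-12 example from this exchange:

# Toy resolution rule: highest timestamp wins, even if the write that produced
# it failed its requested ConsistencyLevel from the client's point of view.
def resolve(copies):
    return max(copies, key=lambda c: c["timestamp"])

replica_copies = [
    {"value": 11, "timestamp": 100},  # replicas the "failed" write never reached
    {"value": 11, "timestamp": 100},
    {"value": 12, "timestamp": 200},  # the one replica that accepted the write
]
print(resolve(replica_copies))  # {'value': 12, 'timestamp': 200}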
Doesn't Cassandra sync all nodes when the network is up again? I think this
was one of the reasons for storing a timestamp with every key/value pair?
So I think the response will only temporarily be 11. If all nodes have synced,
it should be 12? Or isn't that so?
greetings, André
2010/11/22 Samuel Carrière
>Cassandra can work in a consistent way; see some of this discussion and the
>Consistency section here: http://wiki.apache.org/cassandra/ArchitectureOverview
>
>If you always read and write with CL.Quorum (or the other way discussed) you
>will have consistency. Even if some of the replicas are tem
If you are working inside the Cassandra code base, take a look at
o.a.c.hadoop.ColumnFamilyRecordReader. It reads all the rows in a CF using
tokens. I'm not sure that code cares too much about reading a row twice. AFAIK
using tokens is considered an internal feature.
WRT the start key / end
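To illustrate the token-based approach Aaron mentions (a sketch only, not the ColumnFamilyRecordReader code, with made-up small tokens): a full column-family scan is decomposed into the token ranges owned by each node and each range is read independently; the only subtle part is the range that wraps around the ring.

# Illustrative only: split a full ring scan into per-node (start, end] token ranges.
# A real RandomPartitioner ring uses tokens in [0, 2**127), not these tiny values.
node_tokens = sorted([10, 50, 90])  # one made-up token per node

def ring_ranges(tokens):
    for i, end in enumerate(tokens):
        start = tokens[i - 1]  # previous token; index -1 gives the wrap-around range
        yield start, end

for start, end in ring_ranges(node_tokens):
    print("scan range ({}, {}]".format(start, end))  # includes the wrapping (90, 10]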
Is this from a clean install? Have you been deleting data?
Could this be your problem?
http://wiki.apache.org/cassandra/FAQ#i_deleted_what_gives
If not, you'll need to provide some more details: which version, what the files
on disk are, what data you loaded, etc.
Hope that helps
Aaron
Cassandra can work in a consistent way; see some of this discussion and the
Consistency section here: http://wiki.apache.org/cassandra/ArchitectureOverview
If you always read and write with CL.Quorum (or the other way discussed) you
will have consistency. Even if some of the replicas are temporar
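The reason CL.Quorum on both reads and writes gives consistent results is simply that any write quorum and any read quorum must share at least one replica whenever R + W > N. A quick brute-force check with made-up replica names:

# Verify that every 2-node write quorum and 2-node read quorum out of 3 replicas
# overlap in at least one node, so a quorum read always sees the last quorum write.
from itertools import combinations

replicas = {"A", "B", "C"}
W = R = 2  # QUORUM when the replication factor N is 3

always_overlap = all(set(w) & set(r)
                     for w in combinations(replicas, W)
                     for r in combinations(replicas, R))
print(always_overlap)  # True, because R + W > N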
I am not using any client; I am trying to extend Cassandra with a new API
call so that a _node_ will do that on behalf of clients. Thank you for the
answer, but it doesn't answer my question!
Alexander
> Most of the high level clients do this for you.
>
> For example, pycassa and phpcassa both do
It's true that Cassandra has "tunable consistency", but if eventual
consistency is not sufficient for most of your use cases, Cassandra becomes
much less attractive. Am I wrong?
On Sun, Nov 21, 2010 at 7:56 PM, Eric Evans wrote:
> On Sun, 2010-11-21 at 11:32 -0500, Simon Reavely wrote:
> > As