First, the unhelpful advice: I strongly suggest changing the data model
so you do not have 100MB+ rows. They will make life harder.
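The usual way to get rid of oversized rows is to bucket them: split each
logical row across several physical rows under a compound key, and have
readers fetch the buckets in order. A minimal sketch in Java - the
"<key>:<bucket>" layout and the 10MB bucket size are illustrative
assumptions, not something from this thread:

    // Illustrative only: derive the physical row key for a chunk of an
    // oversized logical row. Key layout and bucket size are assumptions.
    public final class RowBuckets {
        private static final long BUCKET_BYTES = 10L * 1024 * 1024; // ~10MB cap per row

        // Map an offset within the logical row to the physical row key
        // that should hold that chunk, e.g. "2012-12-21:3".
        static String bucketKey(String logicalKey, long byteOffset) {
            return logicalKey + ":" + (byteOffset / BUCKET_BYTES);
        }
    }

Readers then fetch buckets 0..N for a logical key and concatenate, which
keeps any single row comfortably small.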
> Write request latency is about 900 microsecs, read request latency
> is about 4000 microsecs.

4 milliseconds to drag 100 to 300 MB of data off a SAN, through your
network, into C* and out to the client does not sound terrible at first
glance. Can you benchmark an individual request to get an idea of the
throughput?

I would recommend removing the SAN from the equation; Cassandra will run
better with local disks. A SAN also introduces a single point of failure
into a distributed system.

> but it's likely in the Linux disk cache, given the sizing of the
> node/data/jvm.

Are you sure that the local Linux machine is going to cache files stored
on the SAN?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/12/2012, at 6:56 AM, Yiming Sun <yiming....@gmail.com> wrote:

> James, you could experiment with the row cache, with the off-heap JNA
> cache, and see if it helps. My own experience with the row cache was
> not good, and the OS cache seemed to be the most useful, but in my
> case our data space was big, over 10TB. Your sequential access pattern
> certainly doesn't play well with LRU, but given the small data space
> you have, you may be able to fit the data from one column family
> entirely into the row cache.
>
> On Fri, Dec 21, 2012 at 12:03 PM, James Masson <james.mas...@opigram.com>
> wrote:
>
> On 21/12/12 16:27, Yiming Sun wrote:
>> James, using RandomPartitioner, the order of the rows is random, so
>> when you request these rows in "sequential" order (sorted by date?),
>> Cassandra is not reading them sequentially.
>
> Yes, I understand the "next" row to be retrieved in sequence is likely
> to be on a different node, and that the ordering is random. I'm using
> the word sequential to explain that the data is requested in an order,
> and not repeated, until the next cycle. The data is not guaranteed to
> be of a size that is cache-able as a whole.
>
>> The sizes of the data - 200MB, 300MB, and 40MB - are these the sizes
>> of each column? Or the total sizes of the entire column families? It
>> wasn't too clear to me. But if these are the total sizes of the
>> column families, you will be able to fit them mostly in memory, so
>> you should enable the row cache.
>
> Size of the column family, on a single node. Row caching is off at the
> moment.
>
> Are you saying that I should increase the JVM heap to fit some data in
> the row cache, at the expense of Linux disk caching?
>
> Bear in mind that the data is only going to be re-requested in
> sequence again - I'm not sure what the value of Cassandra's native
> caching is if rows are not re-requested before being evicted.
>
> My current key-cache hit-rates are near zero on this workload, hence
> I'm interested in Cassandra's zero-cache performance. Unless I can
> guarantee to fit the entire data-set in memory, it's difficult to
> justify spending memory on a Cassandra cache if LRU and the workload
> mean it's not actually a benefit.
>
>> I happen to have done some performance tests of my own on Cassandra,
>> mostly on reads, and was also only able to get less than 6MB/sec read
>> rate out of a cluster of 6 nodes at RF=2 using a single-threaded
>> client. But it made a huge difference when I changed the client to an
>> asynchronous multi-threaded structure.
>
> Yes, I've been talking to the developers about having a separate
> thread or two that keeps Cassandra busy, keeping Disruptor
> (http://lmax-exchange.github.com/disruptor/) fed to do the processing
> work.
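To make that multi-threaded read pattern concrete, here is a minimal
sketch of one way to do it with Hector; the cluster name, host, thread
count, and key source are placeholders, not details from this thread:

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class ParallelReader {
        public static void main(String[] args) throws InterruptedException {
            Cluster cluster = HFactory.getOrCreateCluster("cluster", "cass-host:9160");
            final Keyspace ksp = HFactory.createKeyspace("mykeyspace", cluster);

            // Hector's Keyspace is safe to share between threads (it pools
            // connections), so N workers keep N requests in flight instead of 1.
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (final String key : keysInDateOrder()) {
                pool.submit(new Runnable() {
                    public void run() {
                        SliceQuery<String, byte[], byte[]> q =
                            HFactory.createSliceQuery(ksp, StringSerializer.get(),
                                BytesArraySerializer.get(), BytesArraySerializer.get());
                        q.setColumnFamily("entities");
                        q.setKey(key);
                        q.setRange(new byte[0], new byte[0], false, 1000);
                        q.execute(); // hand the slice off to the processing stage here
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        // Placeholder for however the date-ordered keys are produced.
        static List<String> keysInDateOrder() {
            return Arrays.asList("key-2012-12-01", "key-2012-12-02");
        }
    }

With per-request latencies in the 1-4 ms range, each extra in-flight
request buys roughly another request's worth of throughput until the
cluster itself saturates.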
> But this all doesn't change the fact that under this zero-cache
> workload, Cassandra seems to be very CPU-expensive for the throughput
> it delivers.
>
> thanks
>
> James M
>
> On Fri, Dec 21, 2012 at 10:36 AM, James Masson <james.mas...@opigram.com>
> wrote:
>
> Hi,
>
> thanks for the reply
>
> On 21/12/12 14:36, Yiming Sun wrote:
>> I have a few questions for you, James,
>>
>> 1. how many nodes are in your Cassandra ring?
>
> 2 or 3, depending on the environment - it doesn't seem to make much
> difference to throughput. What is a 30-minute task in a 2-node
> environment is still a 30-minute task in a 3-node environment.
>
>> 2. what is the replication factor?
>
> 1
>
>> 3. when you say sequentially, what do you mean? what partitioner do
>> you use?
>
> The data is organised by date - the keys are read sequentially in
> order, only once.
>
> RandomPartitioner - the data is equally spread across the nodes to
> avoid hotspots.
>
>> 4. how many columns per row? how much data per row? per column?
>
> It varies - described in the schema:
>
> create keyspace mykeyspace
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = {replication_factor : 1}
>   and durable_writes = true;
>
> create column family entities
>   with column_type = 'Standard'
>   and comparator = 'BytesType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'AsciiType'
>   and read_repair_chance = 0.0
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 0
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = false
>   and compaction_strategy =
>     'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'NONE'
>   and column_metadata = [
>     {column_name : '64656c65746564',
>      validation_class : BytesType,
>      index_name : 'deleted_idx',
>      index_type : 0},
>     {column_name : '6576656e744964',
>      validation_class : TimeUUIDType,
>      index_name : 'eventId_idx',
>      index_type : 0},
>     {column_name : '7061796c6f6164',
>      validation_class : UTF8Type}];
>
> 2 columns per row here - about 200MB of data in total.
>
> create column family events
>   with column_type = 'Standard'
>   and comparator = 'BytesType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'TimeUUIDType'
>   and read_repair_chance = 0.0
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 0
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = false
>   and compaction_strategy =
>     'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'NONE';
>
> 1 column per row - about 300MB of data.
>
> create column family intervals
>   with column_type = 'Standard'
>   and comparator = 'BytesType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'AsciiType'
>   and read_repair_chance = 0.0
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 0
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = false
>   and compaction_strategy =
>     'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'NONE';
>
> Variable columns per row - about 40MB of data.
>
>> 5. what client library do you use to access Cassandra? (Hector?)
>> Is your client code single-threaded?
>
> Hector - yes, the processing side of the client is single-threaded,
> but it is largely waiting for Cassandra responses and has plenty of
> CPU headroom.
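As an aside - this is a suggestion, not something raised in the thread:
since RandomPartitioner scatters the date-ordered keys anyway, if the
per-cycle ordering requirement could be relaxed, the whole column family
can be read in token order with a paged range query instead of per-key
random reads. A sketch with Hector; the page sizes are guesses:

    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.OrderedRows;
    import me.prettyprint.hector.api.beans.Row;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.RangeSlicesQuery;

    public class TokenOrderScan {
        // Page through every row of a column family in token order.
        static void scan(Keyspace ksp, String cf) {
            String start = "";
            while (true) {
                RangeSlicesQuery<String, byte[], byte[]> q =
                    HFactory.createRangeSlicesQuery(ksp, StringSerializer.get(),
                        BytesArraySerializer.get(), BytesArraySerializer.get());
                q.setColumnFamily(cf);
                q.setKeys(start, "");              // from last seen key to ring end
                q.setRowCount(100);                // rows per page (a guess)
                q.setRange(new byte[0], new byte[0], false, 1000);

                OrderedRows<String, byte[], byte[]> rows = q.execute().get();
                for (Row<String, byte[], byte[]> row : rows) {
                    if (row.getKey().equals(start)) continue; // skip page-overlap row
                    // process(row) goes here
                }
                if (rows.getCount() < 100) break;  // short page means we're done
                start = rows.peekLast().getKey();
            }
        }
    }

The trade-off is that rows arrive in token order rather than date order,
so this only helps if the processing stage can tolerate, or re-sort,
that ordering.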
> I guess what I'm most interested in is the discrepancy between read
> and write latency - although I understand the data volume is much
> larger for reads, even though the request rate is lower.
>
> Network usage on a Cassandra box barely gets above 20Mbit/s, including
> inter-cluster comms. It averages about 5Mbit/s between client and
> Cassandra.
>
> There is near-zero disk I/O, and what little there is is served in
> under 1ms. Storage is backed by a very fast SAN, but like I said
> earlier, the dataset just about fits in the Linux disk cache: 2GB VM,
> 512MB Cassandra heap - GCs are nice and quick, there are no JVM memory
> problems, and used heap oscillates between 280-350MB.
>
> Basically, I'm just puzzled because Cassandra doesn't behave as I
> would expect: huge CPU use in Cassandra for very little throughput.
> I'm struggling to find anything wrong with the environment - there's
> no bottleneck that I can see.
>
> thanks
>
> James M
>
> On Fri, Dec 21, 2012 at 7:27 AM, James Masson <james.mas...@opigram.com>
> wrote:
>
>> Hi list-users,
>>
>> We have an application with a relatively unusual access pattern in
>> Cassandra 1.1.6.
>>
>> Essentially, we read an entire multi-hundred-megabyte column family
>> sequentially (little chance of a Cassandra cache hit), perform some
>> operations on the data, and write the data back to another column
>> family in the same keyspace.
>>
>> We do about 250 writes/sec and 100 reads/sec during this process.
>> Write request latency is about 900 microsecs; read request latency
>> is about 4000 microsecs.
>>
>> * First Question: Do these numbers make sense?
>>
>> Read-request latency seems a little high to me - Cassandra hasn't had
>> a chance to cache this data, but it's likely in the Linux disk cache,
>> given the sizing of the node/data/jvm.
>>
>> thanks
>>
>> James M
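For what it's worth, a back-of-envelope check on the numbers quoted in
this thread suggests the single-threaded client, not Cassandra, sets the
throughput ceiling:

    time the client thread spends blocked, per second of wall clock:
      reads:  100 req/s x 4.0 ms = 400 ms
      writes: 250 req/s x 0.9 ms = 225 ms
                             total 625 ms

So the one thread is waiting on request round-trips for roughly 60% of
its time, with the remaining ~375 ms/s left for processing. Under that
model, server-side cache tuning cannot raise throughput much; keeping
more requests in flight can.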