Hi Aaron,

On 23/12/12 20:18, aaron morton wrote:
First, the non helpful advice, I strongly suggest changing the data
model so you do not have 100MB+ rows. They will make life harder.

I don't think we have 100MB+ rows. Column families, yes - but not rows.


Write request latency is about 900 microsecs, read request latency
is about 4000 microsecs.


4 milliseconds to drag 100 to 300 MB of data off a SAN, through your
network, into C* and out to the client does not sound terrible at first
glance. Can you benchmark an individual request to get an idea of the
throughput?

It's large numbers of small requests: about 250 writes/sec and about 100 reads/sec. I might look at some tcpdumps to see what it's actually doing...
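
For a rough per-request benchmark from the client side, something like
this would do (a minimal sketch - it assumes the DataStax Python driver
and a hypothetical keyspace/table/key list, not necessarily what we run):

    # Time individual reads from the client side; node IPs, keyspace,
    # table and keys below are placeholders.
    import time
    from cassandra.cluster import Cluster

    cluster = Cluster(['10.0.0.1', '10.0.0.2', '10.0.0.3'])
    session = cluster.connect('app')

    keys = ['k1', 'k2', 'k3']  # substitute a representative sample of real keys
    timings = []
    for key in keys:
        start = time.perf_counter()
        rows = session.execute('SELECT * FROM data WHERE key = %s', (key,))
        list(rows)  # force the full result set to be fetched
        timings.append(time.perf_counter() - start)

    timings.sort()
    print('median: %.1f ms' % (timings[len(timings) // 2] * 1000))
    print('max:    %.1f ms' % (timings[-1] * 1000))

    cluster.shutdown()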

With a total volume of approx 400MB, split over 3 nodes, it takes about 30 mins to run through the complete data set. There's near-zero disk I/O and disk wait. It's definitely coming out of the Linux disk cache.

That works out at about 0.2MB/sec in data-crunching terms, and about 0.6MB/sec of network I/O.
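
The back-of-the-envelope arithmetic, for reference (the replica count
mentioned below is my assumption, not a measured figure):

    # Sanity-check the throughput figures above.
    data_set_mb = 400.0      # total data volume, MB
    run_time_s = 30 * 60.0   # ~30 minutes to walk the full data set

    crunch_rate = data_set_mb / run_time_s
    print('data rate: %.2f MB/sec' % crunch_rate)   # ~0.22 MB/sec

    # The ~0.6 MB/sec of network I/O is roughly 3x that, which could be
    # consistent with requests touching 3 replicas (an assumption).
    print('network/data ratio: %.1f' % (0.6 / crunch_rate))  # ~2.7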


I would recommend removing the SAN from the equation; Cassandra will run
better with local disks. It also introduces a single point of failure
into a distributed system.

Understood about the SPoF, but that's mitigated by good SAN fabric design. I think a single local disk or two is going to find it hard to compete with an FC-attached SAN with GB of dedicated DRAM cache and SSD tiering.
This is all on VMware anyway, so there's no option of local disks.


but it's likely in the Linux disk cache, given the sizing of the
node/data/jvm.
Are you sure that the local Linux machine is going to cache files stored
on the SAN?

Yes. Linux doesn't care (and isn't aware) at the filesystem level whether the volume is 'local' or not; everything goes through the same caching strategy. Again, because this is VMware, it appears as a 'local' disk anyway.
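
If it's useful, a quick way to see that in action (a minimal sketch; the
SSTable path is a placeholder) is to read the same data file twice and
compare timings - the second pass comes out of the page cache whether the
block device behind the filesystem is local, SAN or a VMware vdisk:

    # Read the same file twice; the second pass should be served almost
    # entirely from the Linux page cache. Path below is a placeholder.
    # For a truly cold first pass, drop caches first (as root):
    #   echo 3 > /proc/sys/vm/drop_caches
    import time

    PATH = '/var/lib/cassandra/data/MyKeyspace/MyCF-hd-1-Data.db'

    def read_all(path):
        start = time.perf_counter()
        with open(path, 'rb') as f:
            while f.read(1024 * 1024):
                pass
        return time.perf_counter() - start

    print('first pass:  %.2f s' % read_all(PATH))
    print('second pass: %.2f s' % read_all(PATH))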

In short, disk isn't the limiting factor here.

thanks

James M
