Thanks for the reply.
I am getting timeout errors while reading.
I have 5 CFs, but only two of them have heavy write/read traffic.
The data is organized in time-series rows: in CF1 the new rows are read
every 10 seconds and then deleted in full, while in CF2 the rows are read
in different time-range slices and eventually deleted, perhaps after a few
hours.
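
Roughly, the access pattern looks like this (a minimal sketch in Python;
the client object, its method names, and the timestamp-column layout are
illustrative assumptions, not our actual code):

    import time

    def poll_cf1(client):
        # CF1: every 10 seconds, read each new time-series row in full,
        # process it, then delete the whole row.
        while True:
            for key in client.new_row_keys('CF1'):   # hypothetical helper
                row = client.get_row('CF1', key)     # full-row read
                # ... process row ...
                client.remove_row('CF1', key)        # whole-row delete
            time.sleep(10)

    def read_cf2_slice(client, key, start_ts, end_ts):
        # CF2: columns are keyed by timestamp, so a time-range read is a
        # column slice over one row; the row is deleted hours later.
        return client.get_slice('CF2', key,
                                column_start=start_ts,
                                column_finish=end_ts)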

Thanks

On 11/3/2010 1:58 PM, Tyler Hobbs wrote:
SSDs will not generally improve your write performance very much, but they
can significantly improve read performance.

You do *not* want to waste an SSD on the commitlog drive, as even a slow HDD
can write sequentially very quickly.  For the data drive, they might make
sense.

As Jonathan mentioned, it has a lot to do with your access patterns.  If
you (1) delete parts of rows, (2) update parts of rows, or (3) frequently
insert new columns into existing rows, you'll end up with rows spread
across several SSTables on disk.  This means that each read may require
several seeks, which are very slow for HDDs but very quick for SSDs.
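
To make the seek arithmetic concrete, here is a toy model (plain Python,
no Cassandra involved; the flush cadence and column names are made up) of
how updates across memtable flushes fragment a row:

    # Each flush freezes the memtable into a new "sstable"; a row that
    # keeps receiving new columns leaves a fragment in each one.
    sstables = []
    memtable = {}

    def update(key, col, val):
        memtable.setdefault(key, {})[col] = val

    def flush():
        global memtable
        if memtable:
            sstables.append(memtable)
            memtable = {}

    for i in range(5):
        update('row1', 'col%d' % i, 'value')  # new column, existing row
        flush()

    fragments = sum(1 for t in sstables if 'row1' in t)
    # Reading row1 now touches 5 sstables: roughly 5 seeks on an HDD,
    # but nearly free random reads on an SSD.
    print('row1 spans %d sstables' % fragments)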

Of course, the randomness of what rows you access is also important, but
Jonathan did a good job of covering that.  Don't forget about the effects of
caching here, too.

The only way to tell if it is cost-effective is to test your particular
access patterns (using a configured stress.py test or, preferably, your
actual application).
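
As a starting point, a run might look something like this (the flags are
from the contrib/py_stress tool of that era and may differ in your
version, so treat them as an assumption and check stress.py --help):

    # load a baseline data set, then measure reads against it
    stress.py -o insert -n 1000000 -t 50
    stress.py -o read   -n 1000000 -t 50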

- Tyler

On Wed, Nov 3, 2010 at 3:44 PM, Jonathan Shook <jsh...@gmail.com> wrote:

SSDs become unreliable after a number of writes that is relatively low
compared to spinning disks.
They may significantly boost performance if used for the "journal"
storage, but they will suffer short lifetimes under highly random write
patterns.

In general, plan to replace them frequently. Whether they are worth it,
weighing the performance improvement against the cost of replacement,
hardware, and logistics, is a calculation you have to do case by case.
It's difficult to make a generic argument for or against them.

You might be better off in general by throwing more memory at your
servers, and isolating your random access from your journaled data.
Is there any pattern to your reads and writes/deletes? If it is fully
random across your keys, then you have the worst-case scenario.
Sometimes you can impose access patterns or structural patterns in
your app which make caching more effective.
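
For example (an illustrative sketch; the hour-wide bucketing scheme is an
assumption on my part, not something from your setup), bucketing
time-series keys so that reads for a time window land on a single row
keeps the hot data contiguous and cache-friendly:

    # Route all samples for one sensor within the same hour to one row,
    # so a time-range read hits a single (likely cached) row key.
    def row_key(sensor_id, ts):
        hour_bucket = int(ts) // 3600
        return '%s:%d' % (sensor_id, hour_bucket)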

Good questions to ask about your data access:
Is there a "user session" which shows an access pattern to proximal data?
Are there sets of access which always happen close together?
Are there keys or maps which add extra indirection?

I'm not familiar with your situation; I was just offering some general
ideas.

Jonathan Shook

On Wed, Nov 3, 2010 at 2:32 PM, Alaa Zubaidi <alaa.zuba...@pdf.com> wrote:
Hi,
We have continuous high-throughput writes, reads, and deletes, and we are
trying to find the best hardware.
Does using SSDs improve Cassandra's performance? Has anyone compared SSDs
vs. HDDs? Any recommendations on SSDs?

Thanks,
Alaa



--
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 700
San Jose, CA 95110  USA
Tel: 408-283-5639 (or 408-280-7900 x5639)
fax: 408-938-6479
email: alaa.zuba...@pdf.com

