On Sat, Sep 18, 2010 at 9:26 AM, Peter Schuller
<peter.schul...@infidyne.com> wrote:
>>  - performance (it should not be much worse than sharded MySQL and
>> should scale linearly; we want no more than 10K inserts per second
>> of writes, and probably no more than 1K/s of reads, which will be
>> mostly random)
>>  - ability to store large amounts of data (it now looks like we will
>> have about 50GB of uncompressed data per day)
>
> I'm curious what the writes look like, if you expect to do 10k inserts
> per second with MySQL on large data sets that do not fit in RAM. Are
> they of such a nature that the writes turn mostly sequential even in
> MySQL?
>
> In particular consider that for indexed columns in the MySQL case, if
> you're inserting values randomly with respect to the index order of
> indexed columns, you can expect to take seeks for your writes even if
> your database is logically "append only" in terms of not having
> deletes.
>
> Unless the write activity does happen to be very unusually structured
> such that you don't take any seeking in a MySQL/Tokyo Cabinet
> environment, it seems likely that write performance will be the most
> significant issue for you (and cassandra should shine here), given the
> size of the data set and given that you say you expect 10x the amount
> of writes as you do reads, and that the reads are expected to be
> random.
>
> With regards to reads, I re-read your original post. I'm trying to
> figure out what the most likely cause of the discrepancy is. Even with
> zero cache locality, my reading of the index read path is that, in
> cassandra, you would expect exactly one disk seek into the index file,
> followed by streaming (though this may be turned into multiple seeks
> depending on OS readahead behavior). Then, you would expect a single
> seek in the data file followed by streaming.
>
> With low cache hit ratios, I would not expect a btree index lookup +
> data file seek in mysql/cabinet/postgresql to be significantly faster
> in terms of disk seeks, assuming (1) read-ahead is in effect and
> avoids the streaming turning into multiple seeks, and (2) that the
> data size is large enough that btree index leaves are not cached most
> of the time.
>
> Some random thoughts:
>
> * Is MySQL configured such that most or all of the index will be in
> MySQL's internal key cache? If this is the case, it is not competing
> with the data file and may account for MySQL effectively not needing
> to go down on disk for index lookups, where Cassandra may have to.
> However, if my understanding is correct, this should amount to no
> more than doubling the number of seeks required in Cassandra relative
> to MySQL.
>
> * For rows that span multiple sstables, you may see a magnifying
> effect with respect to seeks across multiple sstables. My reading of
> py_stress is that this should not be what is hitting you, assuming you
> populated the database once and then did the reads, since each row key
> seems to be populated once for an insertion run. To ensure this is not
> an issue, you can try confirming that read performance does not
> significantly improve after having triggered a major compaction (and
> waiting for it to complete).
>
> As to why you just saw 200 -> 450 instead of 200 -> 800 when moving to
> 8 nodes and doubling RF, I don't know, given that you say you were
> still fully disk bound.
>
> Going from 450 down to 300 when going from RF 2 to 3 seems very much
> expected though (almost exactly), given that you're increasing the
> reads by 50% (for an underlying read capacity of n seeks per second
> and replication factor f, the ideal case would be a read rate of
> r = n/f; so r = 450, f = 2 => n = 900. Then given n = 900 and f = 3,
> r = 900/3 = 300).
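The back-of-envelope model above can be sanity-checked in a few lines. This is just a sketch of the r = n/f relation from the paragraph; `read_rate` is a hypothetical helper, not a real API:

```python
def read_rate(n_capacity, replication_factor):
    """Ideal-case read rate r = n / f: a cluster with an underlying
    capacity of n seeks/second, where each logical read costs f
    replicas' worth of disk work."""
    return n_capacity / replication_factor

# Observed 450 reads/sec at RF=2 implies n = 450 * 2 = 900 seeks/sec.
n = 450 * 2

# With that same capacity at RF=3, the model predicts 300 reads/sec,
# matching the observed drop from 450 to 300.
assert read_rate(n, 3) == 300
```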
>
> Anyways... on to the numbers comparison between cassandra and MySQL:
>
> With MySQL you got 340 reads/second @ 21 gb on a single machine. The
> poorest number from cassandra (discounting the expected decrease of
> going from rf 2->3) is 450/second on 8 nodes with rf 2. This is
> roughly equivalent to 450/second on 4 nodes with rf 1, or about 112
> reads/second for a single node.
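The per-node normalization above can be restated as a small sketch (hypothetical helper name; it only encodes the arithmetic in the paragraph, under the assumption that RF multiplies the disk work per read):

```python
def per_node_rate(cluster_rate, nodes, rf):
    """Normalize a cluster-wide read rate to a single RF=1 node.
    With replication factor rf, `nodes` machines do the disk work of
    nodes/rf machines at RF=1."""
    effective_nodes = nodes / rf
    return cluster_rate / effective_nodes

# 450 reads/sec on 8 nodes at RF=2 ~= 450 reads/sec on 4 RF=1 nodes,
# i.e. about 112 reads/sec per node -- roughly a factor of 3 below the
# 340 reads/sec reported for a single MySQL machine.
print(per_node_rate(450, 8, 2))
```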
>
> In other words, the poorest number you got from cassandra, compared
> to the best number you reported for MySQL, shows a factor of 3
> difference. What is this caused by? I'm not sure. But if the MySQL
> key cache hypothesis is correct, that might account for a factor of 2
> of the difference. If so, the remainder would leave MySQL about 50%
> faster. Add to this that the 2.5 GB heap given to Cassandra may be
> large relative to MySQL's, leaving less room for OS caching.
>
> I'm not trying to make excuses for cassandra, and perhaps the MySQL
> key cache is not even at play here. But it's difficult to offer an
> accurate explanation for what exactly is happening based on the
> information provided, and whether the cause of the performance
> difference would be likely to hold for the real production use case.
> This is not to complain that you are not providing enough information,
> but to make the point that the performance of these things can be
> subtly affected by many factors, and it is difficult to predict
> expected performance to a high degree of precision - particularly when
> multiple different forms of caching are involved in the comparison.
>
> --
> / Peter Schuller
>
Are you using 0.7? Have you experimented with read_repair_chance=0 (or
some other low number)? Even a READ at CL.ONE will eventually have to
communicate with other nodes to do digest reads. If you are going to
compare c* to mysql, I feel you should set read_repair_chance=0. That
would make the read happen only on that machine. Otherwise you have to
take into account that read repair is going to cause activity on
neighbour nodes.
