I think that your benchmark will soon be relevant :). Do not hesitate to share your exact use case (configurations, size & number of requests, results - latencies / throughput / errors if any).
The only benchmark I have seen so far is the one made by DataStax that I already shared with you (http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster). Your results are quite different, though. Having a third party like you do this, plus with a non-RC 2.1 and the latest 2.0, might be of interest to many people, IMHO.

C*heers,

Alain

2015-06-25 18:41 GMT+02:00 Zhiyan Shao <zhiyan.s...@gmail.com>:

> Yes, our clients didn't specify the port, so they are using 9042 by default.
>
> On Thu, Jun 25, 2015 at 9:23 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>
>> Hi Zhiyan,
>>
>> 2 - RF 2 will improve overall performance, but it should not change the 2.0.* vs 2.1.* comparison. Same comment about adding 3 nodes. Yet Cassandra is supposed to be linearly scalable, so...
>>
>> 3 - I guess this was the first thing to do. You did not answer about the heap size. One of the main differences between 2.0 and 2.1 is that memtables can now be stored off heap. So if you set a big heap with a high memtable size, you will leave less space for page caching on 2.1. You should go with the defaults and modify things incrementally to reach an objective (latency / throughput / percentiles / ...).
>>
>> About Thrift vs native protocol: Thrift is becoming deprecated over time. You should stick with native, and since I think the DataStax driver allows CQL / native protocol only, you should be good to go. Basically, do your clients use port 9042 (the default)?
>>
>> C*heers,
>>
>> Alain
>>
>> 2015-06-25 17:36 GMT+02:00 Zhiyan Shao <zhiyan.s...@gmail.com>:
>>
>>> Thanks Alain,
>>>
>>> For 2, we tried CL ONE but the improvement is small. Will try RF 2 and see. Maybe adding 3 more boxes will help too.
>>> For 3, we changed the key cache back to the default (100 MB) and it helped improve the performance, but it is still worse than 2.0.14. We also noticed that the hit rate grew more slowly than on 2.0.14.
>>> For 4, we are querying 1 partition key each time. There are 5 rows on average for each partition key.
>>>
>>> We are using the DataStax Java driver, so I guess it is the native protocol. We will try out 2.1.7 too.
>>>
>>> Thanks,
>>> Zhiyan
>>>
>>> On Wed, Jun 24, 2015 at 11:48 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>>
>>>> I am amazed to see that you don't have OOM with this setup...
>>>>
>>>> 1 - For performance, and given Cassandra's replication properties and I/O usage, you might want to try RAID 0. But I imagine this is a tradeoff.
>>>>
>>>> 2 - A billion rows is quite a few, and every one of your nodes takes the full load. You might want to try RF 2 and CL ONE if performance is what you are looking for.
>>>>
>>>> 3 - Using 50 GB of key cache is something I have never seen and can't be good, since AFAIK the key cache is on heap and you don't really want a heap bigger than 8 GB (or 10/12 GB in some cases). Try with the default heap size and key cache.
>>>>
>>>> 4 - Are you querying the whole set at once? You might want to query rows one by one, maybe in a synchronous way, to have back pressure.
>>>>
>>>> Another question would be: did you use the native protocol or rather Thrift? (http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster)
>>>>
>>>> BTW, interesting benchmark, but having the right configuration matters. Also, you might want to go to 2.1.7, which mainly fixes a memory leak, AFAIK.
>>>>
>>>> C*heers,
>>>>
>>>> Alain
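Point 4 above, in DataStax Java driver terms, would look roughly like the sketch below: a synchronous, one-partition-per-request read loop over the native protocol on port 9042. This is only an illustration of the suggestion, not code from the thread; the contact point, keyspace, table and column names (my_keyspace, my_table, id) are placeholders.

    // A sketch of the "query rows one by one, synchronously" suggestion, assuming the
    // DataStax Java driver 2.x. Contact point, keyspace, table and column names are
    // placeholders, not from the thread. The client connects over the native protocol
    // on port 9042 and issues one synchronous read per partition, giving back pressure.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class ReadLoop {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1")   // placeholder contact point
                    .withPort(9042)                // native protocol (CQL) port
                    .build();
            Session session = cluster.connect("my_keyspace");
            PreparedStatement stmt = session.prepare(
                    "SELECT * FROM my_table WHERE id = ?");
            for (long id = 0; id < 1_000_000; id++) {
                ResultSet rs = session.execute(stmt.bind(id));   // synchronous read
                for (Row row : rs) {
                    // process the ~5 rows returned for this partition
                }
            }
            cluster.close();
        }
    }

If the client builds its Cluster this way (no Thrift port, no Thrift API), it is using CQL over the native protocol, which answers the "9042 by default" question above.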
>>>> On 25 June 2015 at 01:23, "Zhiyan Shao" <zhiyan.s...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We recently experimented with read performance on both versions and found that reads are slower in 2.1.6. Here is our setup:
>>>>>
>>>>> 1. Machines: 3 physical hosts. Each node has 24 CPU cores, 256 GB of memory and 8x600 GB SAS disks in RAID 1.
>>>>> 2. Replication factor is 3 and a billion rows of data are inserted.
>>>>> 3. Key cache capacity is increased to 50 GB on each node.
>>>>> 4. Keep querying the same set of a million partition keys in a loop.
>>>>>
>>>>> Result:
>>>>> For 2.0.14, we get an average of 6 ms, while for 2.1.6, we only get 18 ms.
>>>>>
>>>>> It seems the key cache hit rate of 0.011 is pretty low even though the same set of keys was used. Has anybody done similar read performance testing? Could you share your results?
>>>>>
>>>>> Thanks,
>>>>> Zhiyan
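On the key cache hit rate mentioned above: the 0.011 figure can be watched while the test runs, either with "nodetool info" or over JMX (default port 7199). A minimal sketch of the JMX route follows, assuming the standard Cassandra 2.x cache metric name and local, unauthenticated JMX access; the host, port and class name are placeholders.

    // A sketch, not from the thread: read the key cache hit rate gauge over JMX.
    // Assumes the standard Cassandra 2.x metric name below, local JMX on port 7199,
    // and no JMX authentication; adjust the host/port for your nodes.
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class KeyCacheHitRate {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName hitRate = new ObjectName(
                        "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate");
                // The metric is exposed as a gauge; its "Value" attribute is a double in [0, 1].
                System.out.println("Key cache hit rate: " + mbs.getAttribute(hitRate, "Value"));
            } finally {
                connector.close();
            }
        }
    }

Sampling this during the loop on both 2.0.14 and 2.1.6 would show whether the low hit rate is a warm-up effect or stays low for the whole run.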