Yes, our clients didn't specify the port so they are using 9042 by default.

On Thu, Jun 25, 2015 at 9:23 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi Zhiyan,
>
> 2 - RF 2 will improve overall performance, but it will not change the
> 2.0.* vs 2.1.* comparison. The same goes for adding 3 nodes. Yet
> Cassandra is supposed to be linearly scalable, so...
> 3 - I guess this was the first thing to do. You did not answer about
> heap size. One of the main differences between 2.0 and 2.1 is that
> memtables can now be stored off heap. So if you set a big heap with a
> high memtable size, you will leave less space for page caching on 2.1.
> You should go with the defaults and modify things incrementally to reach
> an objective (latency / throughput / percentiles / ...).
>
> About Thrift vs native protocol: Thrift is becoming deprecated over time.
> You should stick with native, and I think the DataStax driver allows
> CQL / native protocol only, so you should be good to go. Basically, do
> your clients use port 9042 (the default)?
>
> C*heers,
>
> Alain
>
> 2015-06-25 17:36 GMT+02:00 Zhiyan Shao <zhiyan.s...@gmail.com>:
>
>> Thanks Alain,
>>
>> For 2, we tried CL ONE but the improvement is small. We will try RF 2
>> and see. Maybe adding 3 more boxes will help too.
>> For 3, we changed the key cache back to the default (100MB) and it
>> helped improve the perf, but it is still worse than 2.0.14. We also
>> noticed that the hit rate grew more slowly than on 2.0.14.
>> For 4, we are querying 1 partition key each time. There are 5 rows on
>> average for each partition key.
>>
>> We are using the DataStax Java driver, so I guess it is native protocol.
>> We will try out 2.1.7 too.
>>
>> Thanks,
>> Zhiyan
>>
>> On Wed, Jun 24, 2015 at 11:48 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> I am amazed to see that you don't have OOM with this setup...
>>>
>>> 1 - For performance, and given Cassandra's replication properties and
>>> I/O usage, you might want to try RAID 0. But I imagine this is a
>>> tradeoff.
>>>
>>> 2 - A billion is quite a lot, and each of your nodes takes the full
>>> load. You might want to try RF 2 and CL ONE if performance is what you
>>> are looking for.
>>>
>>> 3 - Using 50 GB of key cache is something I have never seen and can't
>>> be good since, afaik, the key cache is on heap and you don't really
>>> want a heap bigger than 8 GB (or 10/12 GB in some cases). Try with the
>>> default heap size and key cache.
>>>
>>> 4 - Are you querying the whole set at once? You might want to query
>>> rows one by one, maybe in a synchronous way to have back pressure.
>>>
>>> Another question would be: did you use the native protocol or Thrift?
>>> (http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster)
>>>
>>> BTW, interesting benchmark, but it would be worth having the right conf.
>>> Also you might want to go to 2.1.7, which mainly fixes a memory leak
>>> afaik.
>>>
>>> C*heers,
>>>
>>> Alain
>>>
>>> On 25 June 2015 at 01:23, "Zhiyan Shao" <zhiyan.s...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> we recently experimented with read performance on both versions and
>>>> found reads are slower in 2.1.6. Here is our setup:
>>>>
>>>> 1. Machines: 3 physical hosts. Each node has a 24-core CPU, 256G
>>>> memory and 8x600GB SAS disks with RAID 1.
>>>> 2. Replication factor is 3 and a billion rows of data are inserted.
>>>> 3. Key cache capacity is increased to 50G on each node.
>>>> 4. Keep querying the same set of a million partition keys in a loop.
>>>>
>>>> Result:
>>>> For 2.0.14, we can get an average of 6 ms, while for 2.1.6 we can
>>>> only get 18 ms.
>>>>
>>>> The key cache hit rate of 0.011 seems pretty low even though the same
>>>> set of keys was used. Has anybody done similar read performance
>>>> testing? Could you share your results?
>>>>
>>>> Thanks,
>>>> Zhiyan
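
For anyone repeating this test, here is a minimal sketch in Python of the bounded read loop suggested in point 4 of the quoted advice (query partitions one at a time with back pressure) and of the latency numbers being compared. It is only an illustration under assumptions: `read_partition` is a hypothetical stand-in for whatever driver call fetches one partition, and `run_read_benchmark`, `summarize` and `max_in_flight` are names invented here, not part of Cassandra or the DataStax driver.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_read_benchmark(read_partition, partition_keys, max_in_flight=32):
    """Read every key once with at most max_in_flight concurrent reads.

    read_partition is a hypothetical callable (key -> rows) standing in
    for the real driver call. Returns per-request latencies in ms.
    """
    def timed_read(key):
        start = time.perf_counter()
        read_partition(key)
        return (time.perf_counter() - start) * 1000.0

    # The fixed-size pool provides the back pressure: a new read starts
    # only when one of the max_in_flight workers becomes free.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(timed_read, partition_keys))


def summarize(latencies_ms):
    """Return count, mean and nearest-rank 99th percentile in ms."""
    ordered = sorted(latencies_ms)
    p99_index = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "mean_ms": statistics.mean(ordered),
        "p99_ms": ordered[p99_index],
    }


if __name__ == "__main__":
    def fake_read(key):
        # Placeholder for a real query; sleeps about 1 ms per call.
        time.sleep(0.001)
        return [key]

    print(summarize(run_read_benchmark(fake_read, range(200), max_in_flight=8)))
```

Reporting a percentile next to the mean follows the advice above to tune toward an explicit objective (latency / throughput / percentiles) rather than a single average.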