Aiman, I believe that is one of the cases we added a check for:
https://github.com/librato/tablesnap/blob/master/tablesnap#L203-L207
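A file disappearing between the inotify event and the upload is expected
behavior under compaction, so it shouldn't take the whole daemon down. As a
rough sketch of the kind of guard that link points at (not the exact upstream
code; the helper name is purely illustrative):

    import os

    def stat_if_present(filename):
        """Return os.stat(filename), or None if the file already vanished."""
        try:
            return os.stat(filename)
        except OSError:
            # The SSTable was compacted away between the notification and
            # the upload, so there is nothing left to back up. Skip it.
            return None

If the stat comes back as None, the worker can simply move on to the next
queued file instead of SIGKILLing itself.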
Mike

On Thu, Jul 11, 2013 at 1:54 PM, Aiman Parvaiz <ai...@grapheffect.com> wrote:

> Thanks for the info Mike. We ran into a race condition which was killing
> tablesnap; I want to share the problem and the solution/workaround, and
> maybe someone can throw some light on the effects of the solution.
>
> tablesnap was getting killed with this error message:
>
> "Failed uploading %s. Aborting.\n%s"
>
> Looking at the code, it took me to the following:
>
> def worker(self):
>     bucket = self.get_bucket()
>
>     while True:
>         f = self.fileq.get()
>         keyname = self.build_keyname(f)
>         try:
>             self.upload_sstable(bucket, keyname, f)
>         except:
>             self.log.critical("Failed uploading %s. Aborting.\n%s" %
>                               (f, format_exc()))
>             # Brute force kill self
>             os.kill(os.getpid(), signal.SIGKILL)
>
>         self.fileq.task_done()
>
> It builds the filename, and then before it can upload the file, the file
> disappears (which is possible). I simply commented out the line which kills
> tablesnap if the file is not found. It fixes the issue we were having, but
> I would appreciate it if someone has any insights on any ill effects this
> might have on the backup or restoration process.
>
> Thanks
>
>
> On Jul 11, 2013, at 7:03 AM, Mike Heffner <m...@librato.com> wrote:
>
> We've also noticed very good read and write latencies with the hi1.4xls
> compared to our previous instance classes. We actually ran a mixed cluster
> of hi1.4xls and m2.4xls to watch a side-by-side comparison.
>
> Despite the significant improvement in underlying hardware, we've noticed
> that streaming performance with 1.2.6+vnodes is a lot slower than we would
> expect. Bootstrapping a node into a ring with large storage loads can take
> 6+ hours. We have a JIRA open that describes our current config:
> https://issues.apache.org/jira/browse/CASSANDRA-5726
>
> Aiman: We also use tablesnap for our backups. We're using a slightly
> modified version [1]. We currently back up every sstable as soon as it hits
> disk (tablesnap's inotify), but we're considering moving to a periodic
> snapshot approach as the sstable churn after going from 24 nodes -> 6 nodes
> is quite high.
>
> Mike
>
>
> [1]: https://github.com/librato/tablesnap
>
>
> On Thu, Jul 11, 2013 at 7:33 AM, Aiman Parvaiz <ai...@grapheffect.com> wrote:
>
>> Hi,
>> We also recently migrated to 3 hi1.4xlarge boxes (RAID0 SSD), and the disk
>> IO performance is definitely better than on the earlier non-SSD servers;
>> we are serving up to 14k reads/s with a latency of 3-3.5 ms/op.
>> I wanted to share our config options and ask about the data backup
>> strategy for RAID0.
>>
>> We are using C* 1.2.6 with
>>
>> key_cache and row_cache of 300MB.
>> I have not changed/modified any other parameter except for going with
>> multithreaded GC. I will be playing around with other factors and update
>> everyone if I find something interesting.
>>
>> Also, I just wanted to share our backup strategy and see if I can get
>> something useful from how others are taking backups of their RAID0. I am
>> using tablesnap to upload SSTables to S3, and I have attached a separate
>> EBS volume to every box and have set up rsync to mirror the Cassandra data
>> from RAID0 to EBS. I would really appreciate it if you guys could share
>> how you are taking backups.
>>
>> Thanks
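>>
>> (A minimal sketch of the kind of rsync mirror job I mean; the paths and
>> flags here are illustrative, not our exact setup:)
>>
>>     #!/usr/bin/env python
>>     # Illustrative only: mirror the Cassandra data directory from the
>>     # ephemeral RAID0 SSDs onto the attached EBS volume.
>>     import subprocess
>>
>>     SRC = '/var/lib/cassandra/data/'    # assumed data dir on the RAID0
>>     DST = '/mnt/ebs-backup/cassandra/'  # assumed EBS mount point
>>
>>     # --delete keeps the EBS copy from accumulating SSTables that have
>>     # already been compacted away on the source.
>>     subprocess.check_call(['rsync', '-a', '--delete', SRC, DST])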
>>
>>
>> On Jul 9, 2013, at 7:11 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > Using C* 1.2.2.
>> >
>> > We recently dropped our 18 m1.xlarge (4 CPU, 15GB RAM, 4 RAID0 disks)
>> > servers to get 3 hi1.4xlarge (16 CPU, 60GB RAM, 2 RAID0 SSD) servers
>> > instead, for about the same price.
>> >
>> > We tried it after reading some benchmarks published by Netflix.
>> >
>> > It is awesome and I recommend it to anyone who is using more than 18
>> > xlarge servers or can afford these high-cost / high-performance EC2
>> > instances. The SSDs give very good throughput with awesome latency.
>> >
>> > Yet, we had about 200 GB of data per server and now have about 1 TB.
>> >
>> > To alleviate memory pressure inside the heap I had to reduce the index
>> > sampling. I changed the index_interval value from 128 to 512, with no
>> > visible impact on latency, but a great improvement inside the heap,
>> > which doesn't complain about any pressure anymore.
>> >
>> > Is there some more tuning I could use, more tricks that could be useful
>> > when using big servers with a lot of data per node and relatively high
>> > throughput?
>> >
>> > The SSDs are at 20-40% of their throughput capacity (according to
>> > OpsCenter), the CPUs almost never reach a load higher than 5 or 6 (with
>> > 16 CPUs), and 15 GB of RAM is used out of 60 GB.
>> >
>> > At this point I have kept my previous configuration, which is almost
>> > the default one from the DataStax community AMI. Here is part of it;
>> > you can consider that any property not listed here is configured with
>> > its default value:
>> >
>> > cassandra.yaml
>> >
>> > key_cache_size_in_mb: (empty), so the default of 100MB (hit rate
>> > between 88% and 92%, good enough?)
>> > row_cache_size_in_mb: 0 (not usable in our use case, a lot of different
>> > and random reads)
>> > flush_largest_memtables_at: 0.80
>> > reduce_cache_sizes_at: 0.90
>> >
>> > concurrent_reads: 32 (I am thinking of increasing this to 64 or more,
>> > since I have just a few servers to handle more concurrency)
>> > concurrent_writes: 32 (I am thinking of increasing this to 64 or more
>> > too)
>> > memtable_total_space_in_mb: 1024 (to avoid having a full heap; should I
>> > use a bigger value, and what for?)
>> >
>> > rpc_server_type: sync (I tried hsha and had the "ERROR 12:02:18,971
>> > Read an invalid frame size of 0. Are you using TFramedTransport on the
>> > client side?" error). No idea how to fix this, and I use 5 different
>> > clients for different purposes (Hector, Cassie, phpCassa, Astyanax,
>> > Helenus)...
>> >
>> > multithreaded_compaction: false (Should I try enabling this since I now
>> > use SSDs?)
>> > compaction_throughput_mb_per_sec: 16 (I will definitely up this to 32
>> > or even more)
>> >
>> > cross_node_timeout: true
>> > endpoint_snitch: Ec2MultiRegionSnitch
>> >
>> > index_interval: 512
>> >
>> > cassandra-env.sh
>> >
>> > I am not sure how to tune the heap, so I mainly use the defaults:
>> >
>> > MAX_HEAP_SIZE="8G"
>> > HEAP_NEWSIZE="400M" (I tried higher values, and they produced longer GC
>> > pauses: 1600 ms instead of the < 200 ms I see now with 400M)
>> >
>> > -XX:+UseParNewGC
>> > -XX:+UseConcMarkSweepGC
>> > -XX:+CMSParallelRemarkEnabled
>> > -XX:SurvivorRatio=8
>> > -XX:MaxTenuringThreshold=1
>> > -XX:CMSInitiatingOccupancyFraction=70
>> > -XX:+UseCMSInitiatingOccupancyOnly
>> >
>> > Does this configuration seem coherent? Right now performance is
>> > correct, with latency < 5 ms almost all the time. What can I do to
>> > handle more data per node and keep this performance, or do even better?
>> >
>> > I know this is a long message, but if you have any comment or insight,
>> > even on part of it, don't hesitate to share it.
>> > I guess this kind of comment on configuration is useful to the entire
>> > community.
>> >
>> > Alain
>
>
> --
> Mike Heffner <m...@librato.com>
> Librato, Inc.


--
Mike Heffner <m...@librato.com>
Librato, Inc.