Thanks for the info Mike. We ran into a race condition which was killing 
tablesnap, so I want to share the problem and the solution/work-around, and 
maybe someone can shed some light on the effects of the solution.

tablesnap was getting killed with this error message:

"Failed uploading %s. Aborting.\n%s"

Looking at the code led me to the following:

def worker(self):
        bucket = self.get_bucket()

        while True:
            f = self.fileq.get()
            keyname = self.build_keyname(f)
            try:
                self.upload_sstable(bucket, keyname, f)
            except:
                self.log.critical("Failed uploading %s. Aborting.\n%s" %
                             (f, format_exc()))
                # Brute force kill self
                os.kill(os.getpid(), signal.SIGKILL)

            self.fileq.task_done()

It builds the key name, and then, before it can upload the file, the file 
disappears (which is possible). I simply commented out the line that kills 
tablesnap when the file is not found. That fixes the issue we were having, but 
I would appreciate it if someone could share any insight into ill effects this 
might have on the backup or restoration process.
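
For what it's worth, a narrower work-around than removing the kill entirely 
would be to skip only the case where the file has already vanished and keep 
the abort for every other failure. This is just a sketch against the worker() 
above, not a tested patch; it assumes a missing file surfaces inside 
upload_sstable as an IOError/OSError with ENOENT, which may not hold for every 
code path:

import errno
import os
import signal
from traceback import format_exc

def worker(self):
    bucket = self.get_bucket()

    while True:
        f = self.fileq.get()
        keyname = self.build_keyname(f)
        try:
            self.upload_sstable(bucket, keyname, f)
        except (IOError, OSError) as e:
            if e.errno == errno.ENOENT:
                # File vanished between the inotify event and the upload
                # (e.g. removed by compaction); log it and move on.
                self.log.warning("Skipping %s, no longer on disk" % f)
            else:
                self.log.critical("Failed uploading %s. Aborting.\n%s" %
                                  (f, format_exc()))
                os.kill(os.getpid(), signal.SIGKILL)
        except Exception:
            # Any other failure keeps the original brute-force behaviour.
            self.log.critical("Failed uploading %s. Aborting.\n%s" %
                              (f, format_exc()))
            os.kill(os.getpid(), signal.SIGKILL)

        self.fileq.task_done()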

Thanks


On Jul 11, 2013, at 7:03 AM, Mike Heffner <m...@librato.com> wrote:

> We've also noticed very good read and write latencies with the hi1.4xls 
> compared to our previous instance classes. We actually ran a mixed cluster of 
> hi1.4xls and m2.4xls for a side-by-side comparison.
> 
> Despite the significant improvement in underlying hardware, we've noticed 
> that streaming performance with 1.2.6+vnodes is a lot slower than we would 
> expect. Bootstrapping a node into a ring with large storage loads can take 6+ 
> hours. We have a JIRA open that describes our current config: 
> https://issues.apache.org/jira/browse/CASSANDRA-5726
> 
> Aiman: We also use tablesnap for our backups. We're using a slightly modified 
> version [1]. We currently back up every SSTable as soon as it hits disk 
> (tablesnap's inotify), but we're considering moving to a periodic snapshot 
> approach as the SSTable churn after going from 24 nodes -> 6 nodes is quite high.
> 
> Mike
> 
> 
> [1]: https://github.com/librato/tablesnap
> 
> 
> On Thu, Jul 11, 2013 at 7:33 AM, Aiman Parvaiz <ai...@grapheffect.com> wrote:
> Hi,
> We also recently migrated to 3 hi1.4xlarge boxes (RAID0 SSD), and the disk IO 
> performance is definitely better than on the earlier non-SSD servers; we are 
> serving up to 14k reads/s with a latency of 3-3.5 ms/op.
> I wanted to share our config options and ask about the data backup strategy 
> for RAID0.
> 
> We are using C* 1.2.6 with
> 
> key_cache and row_cache of 300MB
> I have not changed or modified any other parameter except for going with 
> multithreaded GC. I will be playing around with other factors and will update 
> everyone if I find something interesting.
> 
> Also, I just wanted to share our backup strategy and see if I can get something 
> useful from how others are taking backups of their RAID0. I am using tablesnap 
> to upload SSTables to S3, and I have attached a separate EBS volume to every 
> box and set up rsync to mirror Cassandra data from RAID0 to EBS. I would 
> really appreciate it if you guys could share how you are taking backups.
> 
> Thanks
> 
> 
> On Jul 9, 2013, at 7:11 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> 
> > Hi,
> >
> > Using C*1.2.2.
> >
> > We recently dropped our 18 m1.xLarge (4CPU, 15GB RAM, 4 Raid-0 Disks) 
> > servers to get 3 hi1.4xLarge (16CPU, 60GB RAM, 2 Raid-0 SSD) servers 
> > instead, for about the same price.
> >
> > We tried it after reading some benchmarks published by Netflix.
> >
> > It is awesome and I recommend it to anyone who is using more than 18 xLarge 
> > servers or can afford these high-cost / high-performance EC2 instances. SSDs 
> > give very good throughput with awesome latency.
> >
> > That said, we used to have about 200 GB of data per server and now have about 1 TB.
> >
> > To alleviate memory pressure inside the heap, I had to reduce the index 
> > sampling. I changed the index_interval value from 128 to 512, with no 
> > visible impact on latency but a great improvement inside the heap, which 
> > no longer shows any pressure.
> >
> > Is there more tuning I could do, or other tricks that could be useful 
> > when using big servers with a lot of data per node and relatively high 
> > throughput?
> >
> > The SSDs are at 20-40% of their throughput capacity (according to OpsCenter), 
> > the CPU load almost never goes above 5 or 6 (with 16 CPUs), and 15 GB of RAM 
> > is used out of 60 GB.
> >
> > At this point I have kept my previous configuration, which is almost the 
> > default one from the Datastax community AMI. Here is part of it; you can 
> > consider any property that is not listed here to be configured as default:
> >
> > cassandra.yaml
> >
> > key_cache_size_in_mb: (empty) - so the default of 100MB (hit rate between 88% 
> > and 92%; good enough?)
> > row_cache_size_in_mb: 0 (not usable in our use case: a lot of different and 
> > random reads)
> > flush_largest_memtables_at: 0.80
> > reduce_cache_sizes_at: 0.90
> >
> > concurrent_reads: 32 (I am thinking of increasing this to 64 or more since I 
> > have just a few servers to handle more concurrency)
> > concurrent_writes: 32 (I am thinking of increasing this to 64 or more too)
> > memtable_total_space_in_mb: 1024 (to avoid having a full heap; should I use a 
> > bigger value, and why?)
> >
> > rpc_server_type: sync (I tried hsha and got the "ERROR 12:02:18,971 Read an 
> > invalid frame size of 0. Are you using TFramedTransport on the client 
> > side?" error). No idea how to fix this, and I use 5 different clients for 
> > different purposes (Hector, Cassie, phpCassa, Astyanax, Helenus)...
> >
> > multithreaded_compaction: false (Should I try enabling this since I now use 
> > SSDs?)
> > compaction_throughput_mb_per_sec: 16 (I will definitely up this to 32 or 
> > even more)
> >
> > cross_node_timeout: true
> > endpoint_snitch: Ec2MultiRegionSnitch
> >
> > index_interval: 512
> >
> > cassandra-env.sh
> >
> > I am not sure how to tune the heap, so I mainly use the defaults.
> >
> > MAX_HEAP_SIZE="8G"
> > HEAP_NEWSIZE="400M" (I tried higher values, and they produced longer GC 
> > pauses: 1600 ms instead of the < 200 ms I see now with 400M)
> >
> > -XX:+UseParNewGC
> > -XX:+UseConcMarkSweepGC
> > -XX:+CMSParallelRemarkEnabled
> > -XX:SurvivorRatio=8
> > -XX:MaxTenuringThreshold=1
> > -XX:CMSInitiatingOccupancyFraction=70
> > -XX:+UseCMSInitiatingOccupancyOnly
> >
> > Does this configuration seem coherent? Right now, performance is 
> > acceptable, with latency < 5 ms almost all the time. What can I do to 
> > handle more data per node and keep this performance, or even improve it?
> >
> > I know this is a long message, but if you have any comments or insights, even 
> > on part of it, don't hesitate to share them. I guess this kind of feedback on 
> > configuration is useful to the entire community.
> >
> > Alain
> >
> 
> 
> 
> 
> -- 
> 
>   Mike Heffner <m...@librato.com>
>   Librato, Inc.
> 
