Anuj,

We originally started testing with Java 8 + G1, but we were able to reproduce the same results with the default CMS settings that ship in cassandra-env.sh from the Debian package. We didn't detect any large GC pauses during the runs.
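To put a rough number on "no large GC pauses", here is a minimal sketch of sampling collector overhead via the standard GC MXBeans. As written it reports on the JVM it runs in, so it would have to run inside the Cassandra process (or be adapted to attach to Cassandra's JMX port) to measure the server side; the one-minute interval is an arbitrary choice:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcOverheadSampler {
        public static void main(String[] args) throws InterruptedException {
            long prevCount = 0;
            long prevTimeMs = 0;
            while (true) {
                long totalCount = 0;
                long totalTimeMs = 0;
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // getCollectionCount()/getCollectionTime() are cumulative; -1 means undefined.
                    totalCount += Math.max(gc.getCollectionCount(), 0);
                    totalTimeMs += Math.max(gc.getCollectionTime(), 0);
                }
                // Deltas since the previous sample: number of collections and time spent in GC.
                System.out.printf("GC: %d collections, %d ms in the last interval%n",
                        totalCount - prevCount, totalTimeMs - prevTimeMs);
                prevCount = totalCount;
                prevTimeMs = totalTimeMs;
                Thread.sleep(60_000); // sample once a minute
            }
        }
    }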
Query pattern during our testing was 100% writes, batching (via Thrift mostly) to 5 tables, between 6-1500 rows per batch. (A rough sketch of the CQL variant of this write path is included below the quoted thread.)

Mike

On Thu, Feb 18, 2016 at 12:22 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> What's the GC overhead? Can you share your GC collector and settings?
>
> What's your query pattern? Do you use secondary indexes, batches, IN clauses, etc.?
>
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner <m...@librato.com> wrote:
>
> Alain,
>
> Thanks for the suggestions.
>
> Sure, tpstats are here:
> https://gist.github.com/mheffner/a979ae1a0304480b052a. Looking at the
> metrics across the ring, there were no blocked tasks nor dropped messages.
>
> Iowait metrics look fine, so it doesn't appear to be blocking on disk.
> Similarly, there are no long GC pauses.
>
> We haven't noticed latency on any particular table higher than others, or
> correlated with the occurrence of a timeout. We have noticed with further
> testing that running cassandra-stress against the ring, while our workload
> is writing to the same ring, will incur similar 10 second timeouts. If our
> workload is not writing to the ring, cassandra-stress will run without
> hitting timeouts. This seems to imply that our workload pattern is causing
> something to block cluster-wide, since the stress tool writes to a
> different keyspace than our workload.
>
> I mentioned in another reply that we've tracked it to something between
> 2.0.x and 2.1.x, so we are focusing on narrowing down which point release
> it was introduced in.
>
> Cheers,
>
> Mike
>
> On Thu, Feb 18, 2016 at 3:33 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> Hi Mike,
>>
>> What about the output of tpstats? I imagine you have dropped messages
>> there. Any blocked threads? Could you paste this output here?
>>
>> Might this be due to some network hiccup accessing the disks, as they are
>> EBS? Can you think of any way of checking this? Do you have a lot of GC
>> logs, and how long are the pauses (use something like: grep -i
>> 'GCInspector' /var/log/cassandra/system.log)?
>>
>> Something else you could check is the local_writes stats, to see whether
>> only one table is affected or this is keyspace / cluster wide. You can use
>> the metrics exposed by Cassandra, or if you have no dashboards I believe a
>> 'nodetool cfstats <myks> | grep -e 'Table:' -e 'Local'' should give you a
>> rough idea of local latencies.
>>
>> Those are just things I would check; I don't have a clue what is
>> happening here, but hope this will help.
>>
>> C*heers,
>> -----------------
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-18 5:13 GMT+01:00 Mike Heffner <m...@librato.com>:
>>
>>> Jaydeep,
>>>
>>> No, we don't use any lightweight transactions.
>>>
>>> Mike
>>>
>>> On Wed, Feb 17, 2016 at 6:44 PM, Jaydeep Chovatia <
>>> chovatia.jayd...@gmail.com> wrote:
>>>
>>>> Are you guys using lightweight transactions in your write path?
>>>>
>>>> On Thu, Feb 11, 2016 at 12:36 AM, Fabrice Facorat <
>>>> fabrice.faco...@gmail.com> wrote:
>>>>
>>>>> Are your commitlog and data on the same disk? If yes, you should put
>>>>> the commitlog on a separate disk that doesn't have a lot of IO.
>>>>>
>>>>> Other IO may have a great impact on your commitlog writes and may
>>>>> even block them.
>>>>>
>>>>> An example of the impact IO can have, even for async writes:
>>>>>
>>>>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>>>>
>>>>> 2016-02-11 0:31 GMT+01:00 Mike Heffner <m...@librato.com>:
>>>>> > Jeff,
>>>>> >
>>>>> > We have both commitlog and data on a 4TB EBS volume with 10k IOPS.
>>>>> >
>>>>> > Mike
>>>>> >
>>>>> > On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa <
>>>>> > jeff.ji...@crowdstrike.com> wrote:
>>>>> >>
>>>>> >> What disk size are you using?
>>>>> >>
>>>>> >> From: Mike Heffner
>>>>> >> Reply-To: "user@cassandra.apache.org"
>>>>> >> Date: Wednesday, February 10, 2016 at 2:24 PM
>>>>> >> To: "user@cassandra.apache.org"
>>>>> >> Cc: Peter Norton
>>>>> >> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>>>>> >>
>>>>> >> Paulo,
>>>>> >>
>>>>> >> Thanks for the suggestion, we ran some tests against CMS and saw the
>>>>> >> same timeouts. On that note though, we are going to try doubling the
>>>>> >> instance sizes and testing with double the heap (even though current
>>>>> >> usage is low).
>>>>> >>
>>>>> >> Mike
>>>>> >>
>>>>> >> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta <
>>>>> >> pauloricard...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Are you using the same GC settings as the staging 2.0 cluster? If
>>>>> >>> not, could you try using the default GC settings (CMS) and see if
>>>>> >>> that changes anything? This is just a wild guess, but there were
>>>>> >>> reports before of G1-caused instabilities with small heap sizes
>>>>> >>> (< 16GB - see CASSANDRA-10403 for more context). Please ignore if
>>>>> >>> you already tried reverting back to CMS.
>>>>> >>>
>>>>> >>> 2016-02-10 16:51 GMT-03:00 Mike Heffner <m...@librato.com>:
>>>>> >>>>
>>>>> >>>> Hi all,
>>>>> >>>>
>>>>> >>>> We've recently embarked on a project to update our Cassandra
>>>>> >>>> infrastructure running on EC2. We are long-time users of 2.0.x and
>>>>> >>>> are testing out a move to version 2.2.5 running on VPC with EBS.
>>>>> >>>> Our test setup is a 3 node, RF=3 cluster supporting a small write
>>>>> >>>> load (a mirror of our staging load).
>>>>> >>>>
>>>>> >>>> We are writing at QUORUM, and while p95s look good compared to our
>>>>> >>>> staging 2.0.x cluster, we are seeing frequent write operations that
>>>>> >>>> time out at the max write_request_timeout_in_ms (10 seconds). CPU
>>>>> >>>> across the cluster is < 10% and EBS write load is < 100 IOPS.
>>>>> >>>> Cassandra is running with the Oracle JDK 8u60, we're using G1GC,
>>>>> >>>> and any GC pauses are less than 500 ms.
>>>>> >>>>
>>>>> >>>> We run on c4.2xl instances with GP2 EBS attached storage for the
>>>>> >>>> data and commitlog directories. The nodes are using EC2 enhanced
>>>>> >>>> networking and have the latest Intel network driver module. We are
>>>>> >>>> running on HVM instances using Ubuntu 14.04.2.
>>>>> >>>>
>>>>> >>>> Our schema is 5 tables, all with COMPACT STORAGE. Each table is
>>>>> >>>> similar to the definition here:
>>>>> >>>> https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a
>>>>> >>>>
>>>>> >>>> This is our cassandra.yaml:
>>>>> >>>> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml
>>>>> >>>>
>>>>> >>>> Like I mentioned, we use 8u60 with G1GC and have used many of the
>>>>> >>>> GC settings in Al Tobey's tuning guide.
>>>>> >>>> This is our upstart config with JVM and other CPU settings:
>>>>> >>>> https://gist.github.com/mheffner/dc44613620b25c4fa46d
>>>>> >>>>
>>>>> >>>> We've used several of the sysctl settings from Al's guide as well:
>>>>> >>>> https://gist.github.com/mheffner/ea40d58f58a517028152
>>>>> >>>>
>>>>> >>>> Our client application is able to write using either Thrift
>>>>> >>>> batches using the Astyanax driver or CQL async INSERTs using the
>>>>> >>>> DataStax Java driver.
>>>>> >>>>
>>>>> >>>> For testing against Thrift (our legacy infra uses this) we write
>>>>> >>>> batches of anywhere from 6 to 1500 rows at a time. Our p99 for
>>>>> >>>> batch execution is around 45 ms, and our maximum (p100) sits below
>>>>> >>>> 150 ms except when it periodically spikes to the full 10 seconds.
>>>>> >>>>
>>>>> >>>> Testing the same write path using CQL writes instead demonstrates
>>>>> >>>> similar behavior: low p99s except for periodic full timeouts. We
>>>>> >>>> enabled tracing for several operations but were unable to get a
>>>>> >>>> trace that completed successfully -- Cassandra started logging
>>>>> >>>> many messages like:
>>>>> >>>>
>>>>> >>>> INFO [ScheduledTasks:1] - MessagingService.java:946 - _TRACE
>>>>> >>>> messages were dropped in last 5000 ms: 52499 for internal timeout
>>>>> >>>> and 0 for cross node timeout
>>>>> >>>>
>>>>> >>>> And all the traces contained rows with a "null" source_elapsed
>>>>> >>>> value:
>>>>> >>>> https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out
>>>>> >>>>
>>>>> >>>> We've exhausted as many configuration option permutations as we
>>>>> >>>> can think of. This cluster does not appear to be under any
>>>>> >>>> significant load, and latencies seem to largely fall into two
>>>>> >>>> bands: low normal or max timeout. This seems to imply that
>>>>> >>>> something is getting stuck and timing out at the max write
>>>>> >>>> timeout.
>>>>> >>>>
>>>>> >>>> Any suggestions on what to look for? We had debug enabled for a
>>>>> >>>> while but we didn't see any message that pointed to something
>>>>> >>>> obvious. Happy to provide any more information that may help.
>>>>> >>>>
>>>>> >>>> We are pretty much at the point of sprinkling debug around the
>>>>> >>>> code to track down what could be blocking.
>>>>> >>>>
>>>>> >>>> Thanks,
>>>>> >>>>
>>>>> >>>> Mike
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Mike Heffner <m...@librato.com>
>>>>> >>>> Librato, Inc.
>>>>>
>>>>> --
>>>>> Close the World, Open the Net
>>>>> http://www.linux-wizard.net

--

Mike Heffner <m...@librato.com>
Librato, Inc.
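For anyone who wants to poke at this outside our application, the CQL write path described in the quoted message looks roughly like the sketch below, written against the DataStax Java driver. The contact point, keyspace, table and column names here are placeholders rather than our real schema (that is in the gist quoted above), and the 1-second "slow write" threshold is arbitrary; the sketch simply times each QUORUM insert and reports server-side write timeouts.

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;
    import java.util.Date;
    import java.util.UUID;
    import java.util.concurrent.TimeUnit;

    public class AsyncWriteProbe {
        public static void main(String[] args) {
            // Placeholder contact point, keyspace, table and columns -- not our real schema.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("test_ks");

            PreparedStatement insert = session.prepare(
                    "INSERT INTO measures (id, ts, value) VALUES (?, ?, ?)");
            insert.setConsistencyLevel(ConsistencyLevel.QUORUM); // we write at QUORUM

            for (int i = 0; i < 1000; i++) {
                BoundStatement bound = insert.bind(UUID.randomUUID(), new Date(), (double) i);
                long start = System.nanoTime();
                ResultSetFuture future = session.executeAsync(bound);
                try {
                    future.getUninterruptibly(); // block here just to time this single write
                    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    if (elapsedMs > 1000) {
                        System.out.println("slow write: " + elapsedMs + " ms");
                    }
                } catch (WriteTimeoutException e) {
                    // This corresponds to the server-side write_request_timeout_in_ms we keep hitting.
                    System.out.println("write timed out: " + e.getMessage());
                }
            }
            session.close();
            cluster.close();
        }
    }

executeAsync is used the same way the application issues writes, but the sketch blocks on each future so a single slow or timed-out write is easy to attribute.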