Anuj,

We originally started testing with Java 8 + G1, but we were able to reproduce the same results with the default CMS settings that ship in cassandra-env.sh from the Debian package. We didn't detect any large GC pauses during the runs.
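To put a rough number on "no large GC pauses", here is a minimal sketch of sampling collector overhead via the standard GC MXBeans. As written it reports on the JVM it runs in, so it would have to run inside the Cassandra process (or be adapted to attach to Cassandra's JMX port) to measure the server side; the one-minute interval is an arbitrary choice:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcOverheadSampler {
        public static void main(String[] args) throws InterruptedException {
            long prevCount = 0;
            long prevTimeMs = 0;
            while (true) {
                long totalCount = 0;
                long totalTimeMs = 0;
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // getCollectionCount()/getCollectionTime() are cumulative; -1 means undefined.
                    totalCount += Math.max(gc.getCollectionCount(), 0);
                    totalTimeMs += Math.max(gc.getCollectionTime(), 0);
                }
                // Deltas since the previous sample: number of collections and time spent in GC.
                System.out.printf("GC: %d collections, %d ms in the last interval%n",
                        totalCount - prevCount, totalTimeMs - prevTimeMs);
                prevCount = totalCount;
                prevTimeMs = totalTimeMs;
                Thread.sleep(60_000); // sample once a minute
            }
        }
    }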
Query pattern during our testing was 100% writes, batching (via Thrift mostly) to 5 tables, between 6-1500 rows per batch. (A rough sketch of the CQL variant of this write path is included below the quoted thread.)

Mike

On Thu, Feb 18, 2016 at 12:22 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> What's the GC overhead? Can you share your GC collector and settings?
>
> What's your query pattern? Do you use secondary indexes, batches, IN clauses, etc.?
>
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner <m...@librato.com> wrote:
>
> Alain,
>
> Thanks for the suggestions.
>
> Sure, tpstats are here:
> https://gist.github.com/mheffner/a979ae1a0304480b052a. Looking at the
> metrics across the ring, there were no blocked tasks nor dropped messages.
>
> Iowait metrics look fine, so it doesn't appear to be blocking on disk.
> Similarly, there are no long GC pauses.
>
> We haven't noticed latency on any particular table higher than others, or
> correlated with the occurrence of a timeout. We have noticed with further
> testing that running cassandra-stress against the ring, while our workload
> is writing to the same ring, will incur similar 10 second timeouts. If our
> workload is not writing to the ring, cassandra-stress will run without
> hitting timeouts. This seems to imply that our workload pattern is causing
> something to block cluster-wide, since the stress tool writes to a
> different keyspace than our workload.
>
> I mentioned in another reply that we've tracked it to something between
> 2.0.x and 2.1.x, so we are focusing on narrowing down which point release
> it was introduced in.
>
> Cheers,
>
> Mike
>
> On Thu, Feb 18, 2016 at 3:33 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> Hi Mike,
>>
>> What about the output of tpstats? I imagine you have dropped messages
>> there. Any blocked threads? Could you paste this output here?
>>
>> Might this be due to some network hiccup accessing the disks, as they are
>> EBS? Can you think of any way of checking this? Do you have a lot of GC
>> logs, and how long are the pauses (use something like: grep -i
>> 'GCInspector' /var/log/cassandra/system.log)?
>>
>> Something else you could check is the local_writes stats, to see whether
>> only one table is affected or this is keyspace / cluster wide. You can use
>> the metrics exposed by Cassandra, or if you have no dashboards I believe a
>> 'nodetool cfstats <myks> | grep -e 'Table:' -e 'Local'' should give you a
>> rough idea of local latencies.
>>
>> Those are just things I would check; I don't have a clue what is
>> happening here, but hope this will help.
>>
>> C*heers,
>> -----------------
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-18 5:13 GMT+01:00 Mike Heffner <m...@librato.com>:
>>
>>> Jaydeep,
>>>
>>> No, we don't use any lightweight transactions.
>>>
>>> Mike
>>>
>>> On Wed, Feb 17, 2016 at 6:44 PM, Jaydeep Chovatia <
>>> chovatia.jayd...@gmail.com> wrote:
>>>
>>>> Are you guys using lightweight transactions in your write path?
>>>>
>>>> On Thu, Feb 11, 2016 at 12:36 AM, Fabrice Facorat <
>>>> fabrice.faco...@gmail.com> wrote:
>>>>
>>>>> Are your commitlog and data on the same disk? If yes, you should put
>>>>> the commitlog on a separate disk that doesn't have a lot of IO.
>>>>>
>>>>> Other IO may have a great impact on your commitlog writes and may
>>>>> even block them.
>>>>>
>>>>> An example of the impact IO can have, even for async writes:
>>>>>
>>>>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>>>>
>>>>> 2016-02-11 0:31 GMT+01:00 Mike Heffner <m...@librato.com>:
>>>>> > Jeff,
>>>>> >
>>>>> > We have both commitlog and data on a 4TB EBS volume with 10k IOPS.
>>>>> >
>>>>> > Mike
>>>>> >
>>>>> > On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa <
>>>>> > jeff.ji...@crowdstrike.com> wrote:
>>>>> >>
>>>>> >> What disk size are you using?
>>>>> >>
>>>>> >> From: Mike Heffner
>>>>> >> Reply-To: "user@cassandra.apache.org"
>>>>> >> Date: Wednesday, February 10, 2016 at 2:24 PM
>>>>> >> To: "user@cassandra.apache.org"
>>>>> >> Cc: Peter Norton
>>>>> >> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>>>>> >>
>>>>> >> Paulo,
>>>>> >>
>>>>> >> Thanks for the suggestion, we ran some tests against CMS and saw the
>>>>> >> same timeouts. On that note though, we are going to try doubling the
>>>>> >> instance sizes and testing with double the heap (even though current
>>>>> >> usage is low).
>>>>> >>
>>>>> >> Mike
>>>>> >>
>>>>> >> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta <
>>>>> >> pauloricard...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Are you using the same GC settings as the staging 2.0 cluster? If
>>>>> >>> not, could you try using the default GC settings (CMS) and see if
>>>>> >>> that changes anything? This is just a wild guess, but there were
>>>>> >>> reports before of G1-caused instabilities with small heap sizes
>>>>> >>> (< 16GB - see CASSANDRA-10403 for more context). Please ignore if
>>>>> >>> you already tried reverting back to CMS.
>>>>> >>>
>>>>> >>> 2016-02-10 16:51 GMT-03:00 Mike Heffner <m...@librato.com>:
>>>>> >>>>
>>>>> >>>> Hi all,
>>>>> >>>>
>>>>> >>>> We've recently embarked on a project to update our Cassandra
>>>>> >>>> infrastructure running on EC2. We are long-time users of 2.0.x and
>>>>> >>>> are testing out a move to version 2.2.5 running on VPC with EBS.
>>>>> >>>> Our test setup is a 3 node, RF=3 cluster supporting a small write
>>>>> >>>> load (a mirror of our staging load).
>>>>> >>>>
>>>>> >>>> We are writing at QUORUM, and while p95s look good compared to our
>>>>> >>>> staging 2.0.x cluster, we are seeing frequent write operations that
>>>>> >>>> time out at the max write_request_timeout_in_ms (10 seconds). CPU
>>>>> >>>> across the cluster is < 10% and EBS write load is < 100 IOPS.
>>>>> >>>> Cassandra is running with the Oracle JDK 8u60, we're using G1GC,
>>>>> >>>> and any GC pauses are less than 500 ms.
>>>>> >>>>
>>>>> >>>> We run on c4.2xl instances with GP2 EBS attached storage for the
>>>>> >>>> data and commitlog directories. The nodes are using EC2 enhanced
>>>>> >>>> networking and have the latest Intel network driver module. We are
>>>>> >>>> running on HVM instances using Ubuntu 14.04.2.
>>>>> >>>>
>>>>> >>>> Our schema is 5 tables, all with COMPACT STORAGE. Each table is
>>>>> >>>> similar to the definition here:
>>>>> >>>> https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a
>>>>> >>>>
>>>>> >>>> This is our cassandra.yaml:
>>>>> >>>> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml
>>>>> >>>>
>>>>> >>>> Like I mentioned, we use 8u60 with G1GC and have used many of the
>>>>> >>>> GC settings in Al Tobey's tuning guide.
>>>>> >>>> This is our upstart config with JVM and other CPU settings:
>>>>> >>>> https://gist.github.com/mheffner/dc44613620b25c4fa46d
>>>>> >>>>
>>>>> >>>> We've used several of the sysctl settings from Al's guide as well:
>>>>> >>>> https://gist.github.com/mheffner/ea40d58f58a517028152
>>>>> >>>>
>>>>> >>>> Our client application is able to write using either Thrift
>>>>> >>>> batches using the Astyanax driver or CQL async INSERTs using the
>>>>> >>>> DataStax Java driver.
>>>>> >>>>
>>>>> >>>> For testing against Thrift (our legacy infra uses this) we write
>>>>> >>>> batches of anywhere from 6 to 1500 rows at a time. Our p99 for
>>>>> >>>> batch execution is around 45 ms, and our maximum (p100) sits below
>>>>> >>>> 150 ms except when it periodically spikes to the full 10 seconds.
>>>>> >>>>
>>>>> >>>> Testing the same write path using CQL writes instead demonstrates
>>>>> >>>> similar behavior: low p99s except for periodic full timeouts. We
>>>>> >>>> enabled tracing for several operations but were unable to get a
>>>>> >>>> trace that completed successfully -- Cassandra started logging
>>>>> >>>> many messages like:
>>>>> >>>>
>>>>> >>>> INFO [ScheduledTasks:1] - MessagingService.java:946 - _TRACE
>>>>> >>>> messages were dropped in last 5000 ms: 52499 for internal timeout
>>>>> >>>> and 0 for cross node timeout
>>>>> >>>>
>>>>> >>>> And all the traces contained rows with a "null" source_elapsed
>>>>> >>>> value:
>>>>> >>>> https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out
>>>>> >>>>
>>>>> >>>> We've exhausted as many configuration option permutations as we
>>>>> >>>> can think of. This cluster does not appear to be under any
>>>>> >>>> significant load, and latencies seem to largely fall into two
>>>>> >>>> bands: low normal or max timeout. This seems to imply that
>>>>> >>>> something is getting stuck and timing out at the max write
>>>>> >>>> timeout.
>>>>> >>>>
>>>>> >>>> Any suggestions on what to look for? We had debug enabled for a
>>>>> >>>> while but we didn't see any message that pointed to something
>>>>> >>>> obvious. Happy to provide any more information that may help.
>>>>> >>>>
>>>>> >>>> We are pretty much at the point of sprinkling debug around the
>>>>> >>>> code to track down what could be blocking.
>>>>> >>>>
>>>>> >>>> Thanks,
>>>>> >>>>
>>>>> >>>> Mike
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Mike Heffner <m...@librato.com>
>>>>> >>>> Librato, Inc.
>>>>>
>>>>> --
>>>>> Close the World, Open the Net
>>>>> http://www.linux-wizard.net

--

Mike Heffner <m...@librato.com>
Librato, Inc.
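For anyone who wants to poke at this outside our application, the CQL write path described in the quoted message looks roughly like the sketch below, written against the DataStax Java driver. The contact point, keyspace, table and column names here are placeholders rather than our real schema (that is in the gist quoted above), and the 1-second "slow write" threshold is arbitrary; the sketch simply times each QUORUM insert and reports server-side write timeouts.

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;
    import java.util.Date;
    import java.util.UUID;
    import java.util.concurrent.TimeUnit;

    public class AsyncWriteProbe {
        public static void main(String[] args) {
            // Placeholder contact point, keyspace, table and columns -- not our real schema.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("test_ks");

            PreparedStatement insert = session.prepare(
                    "INSERT INTO measures (id, ts, value) VALUES (?, ?, ?)");
            insert.setConsistencyLevel(ConsistencyLevel.QUORUM); // we write at QUORUM

            for (int i = 0; i < 1000; i++) {
                BoundStatement bound = insert.bind(UUID.randomUUID(), new Date(), (double) i);
                long start = System.nanoTime();
                ResultSetFuture future = session.executeAsync(bound);
                try {
                    future.getUninterruptibly(); // block here just to time this single write
                    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    if (elapsedMs > 1000) {
                        System.out.println("slow write: " + elapsedMs + " ms");
                    }
                } catch (WriteTimeoutException e) {
                    // This corresponds to the server-side write_request_timeout_in_ms we keep hitting.
                    System.out.println("write timed out: " + e.getMessage());
                }
            }
            session.close();
            cluster.close();
        }
    }

executeAsync is used the same way the application issues writes, but the sketch blocks on each future so a single slow or timed-out write is easy to attribute.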