Apologies for the typo, Mike

On Wed, Sep 20, 2017 at 9:49 AM, Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
> Hello Nike,
>
> were you able to fix the issue? If so, what change helped you?
>
> On Wed, Feb 24, 2016 at 5:36 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> Great that you found a specific release that triggers the problem - 2.1.x has a huge number of changes.
>>
>> How many partitions and rows do you have? What's the largest row count for a single partition? And all of these CQL tables are COMPACT STORAGE, correct? Are you writing a large number of skinny partitions or a smaller number of very wide partitions? It wouldn't surprise me if behavior for large partitions varies between releases since they can be so memory-intensive.
>>
>> I see this change in 2.1.5 that could possibly introduce some memory usage:
>> Write partition size estimates into a system table (CASSANDRA-7688)
>>
>> At this stage it would probably help for you to try to produce a reasonably small repro test case that you could file as a Jira. And if you could run that repro test case on 3.x to verify that the problem still exists, that would be helpful as well.
>>
>> How long does it take to repro the timeout?
>>
>> Can you repro the timeout using a single node?
>>
>> What is the pattern of the timeouts - just random and occasional, or heavy and continuous once they start?
>>
>> Are they occurring uniformly on all three nodes?
>>
>> If you bounce the cluster and continue testing, do the timeouts commence immediately, fairly soon, or only after about as long as they take from a clean fresh start?
>>
>> -- Jack Krupansky
>>
>> On Wed, Feb 24, 2016 at 7:04 PM, Mike Heffner <m...@librato.com> wrote:
>>
>>> Nate,
>>>
>>> So we have run several install tests, bisecting the 2.1.x release line, and we believe that the regression was introduced in version 2.1.5. This is the first release that clearly hits the timeout for us.
>>>
>>> It looks like quite a large release, so our next step will likely be bisecting the major commits to see if we can narrow it down: https://github.com/apache/cassandra/blob/3c0a337ebc90b0d99349d0aa152c92b5b3494d8c/CHANGES.txt. Obviously, any suggestions on potential suspects are appreciated.
>>>
>>> These are the memtable settings we've configured differently from the defaults during our testing:
>>>
>>> memtable_allocation_type: offheap_objects
>>> memtable_flush_writers: 8
>>>
>>> Cheers,
>>>
>>> Mike
>>>
>>> On Fri, Feb 19, 2016 at 1:46 PM, Nate McCall <n...@thelastpickle.com> wrote:
>>>
>>>> The biggest change which *might* explain your behavior has to do with the changes in memtable flushing between 2.0 and 2.1: https://issues.apache.org/jira/browse/CASSANDRA-5549
>>>>
>>>> However, the tpstats you posted shows no dropped mutations, which would otherwise have made me more certain of this as the cause.
>>>>
>>>> What values do you have right now for each of these (my recommendations for each on a c4.2xl with the stock cassandra-env.sh are in parentheses):
>>>>
>>>> - memtable_flush_writers (2)
>>>> - memtable_heap_space_in_mb (2048)
>>>> - memtable_offheap_space_in_mb (2048)
>>>> - memtable_cleanup_threshold (0.11)
>>>> - memtable_allocation_type (offheap_objects)
>>>>
>>>> The biggest win IMO will be moving to offheap_objects. By default, everything is on heap. Regardless, spending some time tuning these for your workload will pay off.
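For reference, Nate's suggested values above translate into a cassandra.yaml fragment roughly like the following. These are his c4.2xl numbers quoted from the list above, not general-purpose defaults, so re-check them against your own heap size and workload:

    # Memtable tuning discussed above; values are Nate's c4.2xl suggestions
    memtable_allocation_type: offheap_objects
    memtable_flush_writers: 2
    memtable_heap_space_in_mb: 2048
    memtable_offheap_space_in_mb: 2048
    memtable_cleanup_threshold: 0.11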
>>>> You may also want to be explicit about
>>>>
>>>> - native_transport_max_concurrent_connections
>>>> - native_transport_max_concurrent_connections_per_ip
>>>>
>>>> Depending on the driver, these may now be allowing 32k streams per connection(!) as detailed in v3 of the native protocol: https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec#L130-L152
>>>>
>>>> On Fri, Feb 19, 2016 at 8:48 AM, Mike Heffner <m...@librato.com> wrote:
>>>>
>>>>> Anuj,
>>>>>
>>>>> So we originally started testing with Java 8 + G1; however, we were able to reproduce the same results with the default CMS settings that ship in the cassandra-env.sh from the Deb pkg. We didn't detect any large GC pauses during the runs.
>>>>>
>>>>> The query pattern during our testing was 100% writes, batching (via Thrift mostly) to 5 tables, between 6-1500 rows per batch.
>>>>>
>>>>> Mike
>>>>>
>>>>> On Thu, Feb 18, 2016 at 12:22 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>>>>
>>>>>> What's the GC overhead? Can you share your GC collector and settings?
>>>>>>
>>>>>> What's your query pattern? Do you use secondary indexes, batches, IN clause, etc.?
>>>>>>
>>>>>> Anuj
>>>>>>
>>>>>> Sent from Yahoo Mail on Android <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>
>>>>>> On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner <m...@librato.com> wrote:
>>>>>> Alain,
>>>>>>
>>>>>> Thanks for the suggestions.
>>>>>>
>>>>>> Sure, tpstats are here: https://gist.github.com/mheffner/a979ae1a0304480b052a. Looking at the metrics across the ring, there were no blocked tasks nor dropped messages.
>>>>>>
>>>>>> Iowait metrics look fine, so it doesn't appear to be blocking on disk. Similarly, there are no long GC pauses.
>>>>>>
>>>>>> We haven't noticed latency on any particular table higher than others, or correlated with the occurrence of a timeout. We have noticed with further testing that running cassandra-stress against the ring, while our workload is writing to the same ring, will incur similar 10 second timeouts. If our workload is not writing to the ring, cassandra-stress will run without hitting timeouts. This seems to imply that our workload pattern is causing something to block cluster-wide, since the stress tool writes to a different keyspace than our workload.
>>>>>>
>>>>>> I mentioned in another reply that we've tracked it to something between 2.0.x and 2.1.x, so we are focusing on narrowing down which point release it was introduced in.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 3:33 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mike,
>>>>>>>
>>>>>>> What about the output of tpstats? I imagine you have dropped messages there. Any blocked threads? Could you paste this output here?
>>>>>>>
>>>>>>> Might this be due to some network hiccup to access the disks, as they are EBS? Can you think of any way of checking this? Do you have a lot of GC logs, and how long are the pauses (use something like: grep -i 'GCInspector' /var/log/cassandra/system.log)?
>>>>>>>
>>>>>>> Something else you could check are the local_writes stats, to see if only one table is affected or whether this is keyspace / cluster wide.
>>>>>>> You can use metrics exposed by Cassandra, or if you have no dashboards I believe a 'nodetool cfstats <myks> | grep -e 'Table:' -e 'Local'' should give you a rough idea of local latencies.
>>>>>>>
>>>>>>> Those are just things I would check; I don't have a clue about what is happening here, but I hope this will help.
>>>>>>>
>>>>>>> C*heers,
>>>>>>> -----------------
>>>>>>> Alain Rodriguez
>>>>>>> France
>>>>>>>
>>>>>>> The Last Pickle
>>>>>>> http://www.thelastpickle.com
>>>>>>>
>>>>>>> 2016-02-18 5:13 GMT+01:00 Mike Heffner <m...@librato.com>:
>>>>>>>
>>>>>>>> Jaydeep,
>>>>>>>>
>>>>>>>> No, we don't use any lightweight transactions.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2016 at 6:44 PM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Are you guys using lightweight transactions in your write path?
>>>>>>>>>
>>>>>>>>> On Thu, Feb 11, 2016 at 12:36 AM, Fabrice Facorat <fabrice.faco...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Are your commitlog and data on the same disk? If yes, you should put the commitlogs on a separate disk which doesn't have a lot of IO.
>>>>>>>>>>
>>>>>>>>>> Other IO may have a great impact on your commitlog writing, and it may even block.
>>>>>>>>>>
>>>>>>>>>> An example of the impact IO may have, even for async writes: https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>>>>>>>>>
>>>>>>>>>> 2016-02-11 0:31 GMT+01:00 Mike Heffner <m...@librato.com>:
>>>>>>>>>> > Jeff,
>>>>>>>>>> >
>>>>>>>>>> > We have both commitlog and data on a 4TB EBS with 10k IOPS.
>>>>>>>>>> >
>>>>>>>>>> > Mike
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> What disk size are you using?
>>>>>>>>>> >>
>>>>>>>>>> >> From: Mike Heffner
>>>>>>>>>> >> Reply-To: "user@cassandra.apache.org"
>>>>>>>>>> >> Date: Wednesday, February 10, 2016 at 2:24 PM
>>>>>>>>>> >> To: "user@cassandra.apache.org"
>>>>>>>>>> >> Cc: Peter Norton
>>>>>>>>>> >> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>>>>>>>>>> >>
>>>>>>>>>> >> Paulo,
>>>>>>>>>> >>
>>>>>>>>>> >> Thanks for the suggestion, we ran some tests against CMS and saw the same timeouts. On that note though, we are going to try doubling the instance sizes and testing with double the heap (even though current usage is low).
>>>>>>>>>> >>
>>>>>>>>>> >> Mike
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Are you using the same GC settings as the staging 2.0 cluster? If not, could you try using the default GC settings (CMS) and see if that changes anything? This is just a wild guess, but there were reports before of G1-caused instabilities with small heap sizes (< 16GB - see CASSANDRA-10403 for more context). Please ignore if you already tried reverting back to CMS.
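As an aside, Fabrice's separate-disk suggestion above comes down to pointing commitlog_directory at a different volume than data_file_directories in cassandra.yaml. A minimal sketch, with made-up mount points:

    # Hypothetical layout: commitlog isolated from sstable IO
    data_file_directories:
        - /mnt/ebs-data/cassandra/data            # gp2 volume for sstables
    commitlog_directory: /mnt/ebs-commitlog/cassandra/commitlog   # separate, quieter volume

Mike mentions above that commitlog and data currently share one 4TB volume, which is exactly the contention scenario Fabrice describes.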
>>>>>>>>>> >>>
>>>>>>>>>> >>> 2016-02-10 16:51 GMT-03:00 Mike Heffner <m...@librato.com>:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Hi all,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> We've recently embarked on a project to update our Cassandra infrastructure running on EC2. We are long time users of 2.0.x and are testing out a move to version 2.2.5 running on VPC with EBS. Our test setup is a 3 node, RF=3 cluster supporting a small write load (a mirror of our staging load).
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> We are writing at QUORUM, and while p95s look good compared to our staging 2.0.x cluster, we are seeing frequent write operations that time out at the max write_request_timeout_in_ms (10 seconds). CPU across the cluster is < 10% and EBS write load is < 100 IOPS. Cassandra is running with the Oracle JDK 8u60, we're using G1GC, and any GC pauses are less than 500ms.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> We run on c4.2xl instances with GP2 EBS attached storage for data and commitlog directories. The nodes are using EC2 enhanced networking and have the latest Intel network driver module. We are running on HVM instances using Ubuntu 14.04.2.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Our schema is 5 tables, all with COMPACT STORAGE. Each table is similar to the definition here: https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> This is our cassandra.yaml: https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Like I mentioned, we use 8u60 with G1GC and have used many of the GC settings in Al Tobey's tuning guide. This is our upstart config with JVM and other CPU settings: https://gist.github.com/mheffner/dc44613620b25c4fa46d
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> We've used several of the sysctl settings from Al's guide as well: https://gist.github.com/mheffner/ea40d58f58a517028152
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Our client application is able to write using either Thrift batches via the Astyanax driver or CQL async INSERTs using the DataStax Java driver.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> For testing against Thrift (our legacy infra uses this), we write batches of anywhere from 6 to 1500 rows at a time. Our p99 for batch execution is around 45ms, and our maximum (p100) sits below 150ms except when it periodically spikes to the full 10 seconds.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Testing the same write path using CQL writes instead demonstrates similar behavior: low p99s except for periodic full timeouts.
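For readers skimming the thread: the 10-second ceiling Mike describes is the coordinator-side write timeout from his cassandra.yaml, so latencies pinned at exactly that value suggest requests are stalling until the coordinator gives up rather than failing fast. From the figure quoted in the message, that setting would look like:

    # Write timeout referenced above (10 seconds); well above the stock 2000 ms default
    write_request_timeout_in_ms: 10000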
>>>>>>>>>> >>>> We enabled tracing for several operations but were unable to get a trace that completed successfully -- Cassandra started logging many messages like:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> INFO [ScheduledTasks:1] - MessagingService.java:946 - _TRACE messages were dropped in last 5000 ms: 52499 for internal timeout and 0 for cross node timeout
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> And all the traces contained rows with a "null" source_elapsed row: https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> We've exhausted as many configuration option permutations as we can think of. This cluster does not appear to be under any significant load, and latencies seem to largely fall in two bands: low normal or max timeout. This seems to imply that something is getting stuck and timing out at the max write timeout.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Any suggestions on what to look for? We had debug enabled for a while but we didn't see any message that pointed to something obvious. Happy to provide any more information that may help.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> We are pretty much at the point of sprinkling debug around the code to track down what could be blocking.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Thanks,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Mike
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> --
>>>>>>>>>> >>>> Mike Heffner <m...@librato.com>
>>>>>>>>>> >>>> Librato, Inc.
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Mike Heffner <m...@librato.com>
>>>>>>>>>> >> Librato, Inc.
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>> > Mike Heffner <m...@librato.com>
>>>>>>>>>> > Librato, Inc.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Close the World, Open the Net
>>>>>>>>>> http://www.linux-wizard.net
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mike Heffner <m...@librato.com>
>>>>>>>> Librato, Inc.
>>>>>>
>>>>>> --
>>>>>> Mike Heffner <m...@librato.com>
>>>>>> Librato, Inc.
>>>>>
>>>>> --
>>>>> Mike Heffner <m...@librato.com>
>>>>> Librato, Inc.
>>>>
>>>> --
>>>> -----------------
>>>> Nate McCall
>>>> Austin, TX
>>>> @zznate
>>>>
>>>> Co-Founder & Sr. Technical Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>
>>> --
>>> Mike Heffner <m...@librato.com>
>>> Librato, Inc.