Jaydeep,

No, we don't use any lightweight transactions.

Mike
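(For context, a lightweight transaction is any conditional write, i.e. an INSERT or UPDATE with an IF clause; it runs a Paxos round before the write and adds extra round trips, which is why it is a common timeout suspect. A hypothetical CQL sketch of the kind of statement being ruled out here; the keyspace, table, and columns are invented:

    -- Hypothetical; the IF clause is what makes this a lightweight
    -- transaction (Paxos-backed) rather than a plain write.
    INSERT INTO metrics.samples (key, ts, value)
    VALUES ('m1', 42, 3.0)
    IF NOT EXISTS;

A plain INSERT without the IF clause skips Paxos entirely.)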
On Wed, Feb 17, 2016 at 6:44 PM, Jaydeep Chovatia
<chovatia.jayd...@gmail.com> wrote:

> Are you guys using lightweight transactions in your write path?
>
> On Thu, Feb 11, 2016 at 12:36 AM, Fabrice Facorat
> <fabrice.faco...@gmail.com> wrote:
>
>> Are your commitlog and data directories on the same disk? If yes, you
>> should put the commitlog on a separate disk that doesn't see much
>> other IO.
>>
>> Other IO can have a large impact on your commitlog writes and may
>> even cause them to block.
>>
>> An example of the impact IO can have, even on async writes:
>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
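(Fabrice's suggestion maps to two settings in cassandra.yaml. A minimal sketch, assuming a dedicated second volume; the mount points below are hypothetical, not taken from this thread:

    # Keep the commitlog on its own low-IO volume so commitlog appends
    # never queue behind data-file flushes or compaction. Paths are
    # examples only.
    data_file_directories:
        - /mnt/data/cassandra/data
    commitlog_directory: /mnt/commitlog/cassandra/commitlog

Both are standard cassandra.yaml keys; only the paths need to change.)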
>>
>> 2016-02-11 0:31 GMT+01:00 Mike Heffner <m...@librato.com>:
>> > Jeff,
>> >
>> > We have both commitlog and data on a 4TB EBS volume with 10k IOPS.
>> >
>> > Mike
>> >
>> > On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa
>> > <jeff.ji...@crowdstrike.com> wrote:
>> >>
>> >> What disk size are you using?
>> >>
>> >> From: Mike Heffner
>> >> Reply-To: "user@cassandra.apache.org"
>> >> Date: Wednesday, February 10, 2016 at 2:24 PM
>> >> To: "user@cassandra.apache.org"
>> >> Cc: Peter Norton
>> >> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>> >>
>> >> Paulo,
>> >>
>> >> Thanks for the suggestion; we ran some tests against CMS and saw the
>> >> same timeouts. On that note, though, we are going to try doubling
>> >> the instance sizes and testing with double the heap (even though
>> >> current usage is low).
>> >>
>> >> Mike
>> >>
>> >> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta
>> >> <pauloricard...@gmail.com> wrote:
>> >>>
>> >>> Are you using the same GC settings as the staging 2.0 cluster? If
>> >>> not, could you try using the default GC settings (CMS) and see if
>> >>> that changes anything? This is just a wild guess, but there have
>> >>> been reports of G1-caused instability with small heap sizes
>> >>> (< 16GB; see CASSANDRA-10403 for more context). Please ignore if
>> >>> you already tried reverting to CMS.
>> >>>
>> >>> 2016-02-10 16:51 GMT-03:00 Mike Heffner <m...@librato.com>:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> We've recently embarked on a project to update our Cassandra
>> >>>> infrastructure running on EC2. We are long-time users of 2.0.x
>> >>>> and are testing a move to version 2.2.5 running in a VPC with
>> >>>> EBS. Our test setup is a 3-node, RF=3 cluster supporting a small
>> >>>> write load (a mirror of our staging load).
>> >>>>
>> >>>> We are writing at QUORUM and, while p95s look good compared to
>> >>>> our staging 2.0.x cluster, we are seeing frequent write
>> >>>> operations that time out at the max write_request_timeout_in_ms
>> >>>> (10 seconds). CPU across the cluster is < 10% and EBS write load
>> >>>> is < 100 IOPS. Cassandra is running on the Oracle JDK 8u60 with
>> >>>> G1GC, and GC pauses are all under 500ms.
>> >>>>
>> >>>> We run on c4.2xl instances with GP2 EBS attached storage for the
>> >>>> data and commitlog directories. The nodes use EC2 enhanced
>> >>>> networking and have the latest Intel network driver module. We
>> >>>> are running on HVM instances using Ubuntu 14.04.2.
>> >>>>
>> >>>> Our schema is 5 tables, all with COMPACT STORAGE. Each table is
>> >>>> similar to the definition here:
>> >>>> https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a
>> >>>>
>> >>>> This is our cassandra.yaml:
>> >>>> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml
>> >>>>
>> >>>> As mentioned, we use 8u60 with G1GC and have applied many of the
>> >>>> GC settings from Al Tobey's tuning guide. This is our upstart
>> >>>> config with the JVM and other CPU settings:
>> >>>> https://gist.github.com/mheffner/dc44613620b25c4fa46d
>> >>>>
>> >>>> We've used several of the sysctl settings from Al's guide as
>> >>>> well: https://gist.github.com/mheffner/ea40d58f58a517028152
>> >>>>
>> >>>> Our client application can write either Thrift batches via the
>> >>>> Astyanax driver or async CQL INSERTs via the DataStax Java
>> >>>> driver.
>> >>>>
>> >>>> For testing against Thrift (our legacy infra uses this) we write
>> >>>> batches of anywhere from 6 to 1500 rows at a time. Our p99 for
>> >>>> batch execution is around 45ms and our maximum (p100) sits below
>> >>>> 150ms, except when it periodically spikes to the full 10 seconds.
>> >>>>
>> >>>> Testing the same write path using CQL writes instead shows
>> >>>> similar behavior: low p99s except for periodic full timeouts. We
>> >>>> enabled tracing for several operations but were unable to get a
>> >>>> trace that completed successfully; Cassandra started logging many
>> >>>> messages like:
>> >>>>
>> >>>> INFO [ScheduledTasks:1] - MessagingService.java:946 - _TRACE
>> >>>> messages were dropped in last 5000 ms: 52499 for internal timeout
>> >>>> and 0 for cross node timeout
>> >>>>
>> >>>> And all the traces contained rows with a null source_elapsed
>> >>>> value:
>> >>>> https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out
>> >>>>
>> >>>> We've exhausted as many configuration-option permutations as we
>> >>>> can think of. This cluster does not appear to be under any
>> >>>> significant load, and latencies seem to fall largely into two
>> >>>> bands: low normal or max timeout. This seems to imply that
>> >>>> something is getting stuck and timing out at the max write
>> >>>> timeout.
>> >>>>
>> >>>> Any suggestions on what to look for? We had debug logging enabled
>> >>>> for a while but didn't see any message that pointed to something
>> >>>> obvious. Happy to provide any more information that may help.
>> >>>>
>> >>>> We are pretty much at the point of sprinkling debug statements
>> >>>> around the code to track down what could be blocking.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Mike
>> >>>>
>> >>>> --
>> >>>> Mike Heffner <m...@librato.com>
>> >>>> Librato, Inc.
>> >>
>> >> --
>> >> Mike Heffner <m...@librato.com>
>> >> Librato, Inc.
>> >
>> > --
>> > Mike Heffner <m...@librato.com>
>> > Librato, Inc.
>>
>> --
>> Close the World, Open the Net
>> http://www.linux-wizard.net

--
Mike Heffner <m...@librato.com>
Librato, Inc.
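(Since the thread describes reproducing the timeouts over both write paths, here is a minimal sketch of the async CQL path Mike describes, using the DataStax Java driver at QUORUM; the contact point, keyspace, and schema are hypothetical, not taken from this thread:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    import java.util.ArrayList;
    import java.util.List;

    public class AsyncWriteSketch {
        public static void main(String[] args) {
            // Hypothetical contact point and schema; not from the thread.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("metrics")) {

                // Prepare once, write many times at QUORUM, matching the
                // consistency level used in the tests above.
                PreparedStatement insert = session.prepare(
                        "INSERT INTO samples (key, ts, value) VALUES (?, ?, ?)");
                insert.setConsistencyLevel(ConsistencyLevel.QUORUM);

                // Fire async inserts, then block for completion so
                // per-request latency (and any timeout) is observable.
                List<ResultSetFuture> futures = new ArrayList<>();
                for (int i = 0; i < 1000; i++) {
                    futures.add(session.executeAsync(
                            insert.bind("m1", (long) i, Math.random())));
                }
                for (ResultSetFuture f : futures) {
                    // Throws WriteTimeoutException for a timed-out write.
                    f.getUninterruptibly();
                }
            }
        }
    }

Each getUninterruptibly() surfaces a WriteTimeoutException for any request that hits the 10-second cap, which makes the periodic spikes easy to isolate from the low-latency band.)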