On 2017-06-15 19:10 (-0700), srinivasarao daruna <sree.srin...@gmail.com> wrote:
> Hi,
>
> Recently one of our spark job had missed cassandra consistency property and
> number of concurrent writes property.

Just for the record, you still have a consistency level set, it's just set to whatever your driver/Spark defaults to (probably LOCAL_ONE). This probably means it's firing writes faster than you'd expect (no backpressure), which may have contributed to your problems.

> Due to that, some of mutations are failed when we checked tpstats. Also, we
> observed readtimeouts are occurring with not only the table that the job
> inserts, but also from other tables, for which have always had consistency
> level proper. We started repair, but due to the volume of data, repair
> might take a day or two to complete. Mean while, wanted to get some inputs.
>
> As the error planted lot of questions.
> 1) Is there a relation between mutation fails to read time outs and overall
> cluster performance, if yes, how.?
>

When the cluster is heavily loaded, you'll see both dropped mutations and read timeouts, yes. It's also true that reads can impact writes, and writes can impact reads, especially since it's all in one shared JVM process with common garbage collection.

> 2) When i checked the log, i found a warning in debug.log as below.
> SELECT * FROM our_table WHERE partition_key = required_value LIMIT 5000:
> total time 20353 msec - timeout 20000 msec
>
> Actual query:
> SELECT * FROM our_table WHERE partition_key = required_value
>
> Even though we are hitting partition key, i do not understand the reason
> for such huge read time and timeouts.

Likely related to JVM GC pauses. How big is that partition (nodetool cfstats may help here)? Are you seeing a lot of other GC pauses going on (you should have monitoring, or at least glance at the log for 'GCInspector' lines)?

>
> 3) We are using prepared statements to query the tables from API. How can
> we set the fetch size, so that it wont use LIMIT 5000.?
> Any thoughts.?
>
>

Driver dependent, but most of them offer this for prepared statements as well.
The DataStax Java driver also offers it globally on the Cluster builder:

    Cluster.builder().withQueryOptions(new QueryOptions().setFetchSize(100))
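
On the backpressure point earlier in this reply: here is a minimal sketch, in plain Java, of capping the number of in-flight asynchronous writes with a semaphore, so the producer blocks instead of flooding the cluster. Nothing in it is driver or Spark connector API. `fakeWriteAsync` is a hypothetical stand-in for whatever async write call your client exposes, and the numbers (200 writes, at most 8 in flight, 5 ms per write) are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedWrites {

    // Hypothetical stand-in for an async write. It sleeps briefly to simulate
    // latency and records how many simulated writes are running at once.
    static CompletableFuture<Void> fakeWriteAsync(ExecutorService pool,
                                                  AtomicInteger inFlight,
                                                  AtomicInteger peak) {
        return CompletableFuture.runAsync(() -> {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inFlight.decrementAndGet();
        }, pool);
    }

    // Issues totalWrites simulated writes, never allowing more than
    // maxInFlight to be outstanding. Returns the peak concurrency observed.
    static int runWrites(int totalWrites, int maxInFlight) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < totalWrites; i++) {
                // Blocks the producer once maxInFlight writes are pending:
                // this is the backpressure.
                permits.acquireUninterruptibly();
                fakeWriteAsync(pool, inFlight, peak)
                        .whenComplete((result, error) -> permits.release());
            }
            // Wait for the remaining writes by collecting every permit.
            permits.acquireUninterruptibly(maxInFlight);
            permits.release(maxInFlight);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight writes: " + runWrites(200, 8));
    }
}
```

The permit is taken before the write is submitted and handed back only when its future completes, so the peak can never exceed the cap. Without the semaphore, the loop would queue all 200 writes immediately, which is the "faster than you'd expect" behaviour described above.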