BTW, a few other details, sorry for omitting these earlier:

   - We are using version 2.0.4 of the Java driver
   - We are running against Cassandra 2.0.9
   - I tried messing around with the page size (even reducing it to a
   single record) and that didn't seem to help in the cases where I was
   observing the timeout (see the sketch below)
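
For reference, adjusting the page size looked roughly like this (a
simplified sketch rather than our exact code; the contact point and
the SELECT are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PageSizeCheck {
  public static void main(String[] args) {
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();
    SimpleStatement stmt = new SimpleStatement(
        "SELECT * FROM \"kiji_it0\".\"t_foo\"");
    // setFetchSize(1) means one record per page; even this did not
    // make the timeouts go away.
    stmt.setFetchSize(1);
    ResultSet results = session.execute(stmt);
    for (Row row : results) {
      // The driver fetches subsequent pages transparently as we iterate.
    }
    cluster.close();
  }
}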

Best regards,
Clint


On Fri, Aug 1, 2014 at 5:02 PM, Clint Kelly <clint.ke...@gmail.com> wrote:

> Hi everyone,
>
> I am seeing occasional read timeouts during multi-row queries, but I'm
> having difficulty reproducing them or understanding what the problem
> is.
>
> First, some background:
>
> Our team wrote a custom MapReduce InputFormat that looks pretty
> similar to the DataStax InputFormat, except that it allows queries
> that touch multiple CQL tables with the same PRIMARY KEY format (it
> then assembles results from multiple tables for the same primary
> key before sending them back to the user in the RecordReader).
>
> During a large batch job on a cluster, and during some integration
> tests, we see errors like the following:
>
> com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra
> timeout during read query at consistency ONE (1 responses were
> required but only 0 replica responded)
>
> Our queries look like this:
>
> SELECT token(eid_component), eid_component, lg, family, qualifier,
> version, value FROM "kiji_it0"."t_foo" WHERE lg=? AND family=? AND
> qualifier=? AND token(eid_component) >= ? AND token(eid_component) <=
> ? ALLOW FILTERING;
>
> Our tables look like the following:
>
> CREATE TABLE "kiji_it0"."t_foo" (
>  eid_component varchar,
>  lg varchar,
>  family blob,
>  qualifier blob,
>  version bigint,
>  value blob,
>  PRIMARY KEY ((eid_component), lg, family, qualifier, version))
> WITH CLUSTERING ORDER BY (lg ASC, family ASC, qualifier ASC, version DESC);
>
> with an additional index on the "lg" column (the lg column is
> *extremely* low cardinality).
>
> (FWIW I realize that having "ALLOW FILTERING" is potentially a Very
> Bad Idea, but we are building a framework on top of Cassandra and
> MapReduce that allows our users to occasionally make queries like
> this.  We don't really mind taking a performance hit since these are
> batch jobs.  We are considering eventually supporting some automatic
> denormalization, but have not done so yet.)
>
> If I change the query above to remove the WHERE clauses, the errors go
> away.
>
> I think I understand the problem here: some rows contain huge amounts
> of data that we have to scan over, and occasionally those scans take
> so long that the query times out.
>
> I have a couple of questions:
>
> 1. What parameters in my code or in the Cassandra cluster do I need to
> adjust to get rid of these timeouts?  Our table layout is designed
> such that its real-time performance should be pretty good, so I don't
> mind if the batch queries are a little bit slow.  Do I need to change
> the read_request_timeout_in_ms parameter?  Or something else?
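>
> One more thought: since these queries are token range scans, is
> range_request_timeout_in_ms in cassandra.yaml perhaps the relevant
> server-side setting, rather than read_request_timeout_in_ms?  On the
> driver side, the only knob I know of is the socket read timeout; a
> minimal sketch of what I mean is below (the contact point and the
> 60-second value are placeholders):
>
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.SocketOptions;
>
> public class TimeoutConfig {
>   public static void main(String[] args) {
>     // Raise the driver-side socket read timeout (12 seconds by
>     // default in the 2.0 driver, IIRC) so the client does not give
>     // up on a slow batch scan before the server responds.
>     Cluster cluster = Cluster.builder()
>         .addContactPoint("cassandra-host")  // placeholder
>         .withSocketOptions(
>             new SocketOptions().setReadTimeoutMillis(60000))
>         .build();
>     cluster.close();
>   }
> }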
>
> 2. I have tried to create a test to reproduce this problem, but I have
> been unable to do so.  Any suggestions on how to do this?  I tried
> creating a table similar to the one described above and filling in a
> huge amount of data for some rows, to increase the amount of data that
> the scan would have to skip over.  I also tried reducing
> read_request_timeout_in_ms from 5000 ms to 50 ms, and still no dice.
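>
> Roughly, my repro attempt looks like the sketch below (simplified;
> the row counts and blob sizes are made up):
>
> import java.nio.ByteBuffer;
>
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.PreparedStatement;
> import com.datastax.driver.core.Session;
>
> public class ReproAttempt {
>   public static void main(String[] args) {
>     Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
>     Session session = cluster.connect();
>     PreparedStatement insert = session.prepare(
>         "INSERT INTO \"kiji_it0\".\"t_foo\" "
>         + "(eid_component, lg, family, qualifier, version, value) "
>         + "VALUES (?, ?, ?, ?, ?, ?)");
>     // Pack a few partitions with many largish cells so that a
>     // filtered scan has a lot of data to step over.
>     ByteBuffer blob = ByteBuffer.wrap(new byte[10 * 1024]);
>     for (int entity = 0; entity < 10; entity++) {
>       for (long version = 0; version < 1000; version++) {
>         session.execute(insert.bind(
>             "entity-" + entity, "lg0", blob.duplicate(),
>             blob.duplicate(), version, blob.duplicate()));
>       }
>     }
>     // Even with read_request_timeout_in_ms dropped to 50 ms, this
>     // filtered query still completes without a ReadTimeoutException.
>     session.execute(
>         "SELECT * FROM \"kiji_it0\".\"t_foo\" "
>         + "WHERE lg='lg0' ALLOW FILTERING");
>     cluster.close();
>   }
> }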
>
> Let me know if anyone has any thoughts or suggestions.  At a minimum
> I'd like to be able to reproduce these read timeout errors in some
> integration tests.
>
> Thanks!
>
> Best regards,
> Clint
>
