Try breaking it up into smaller chunks using multiple threads and token ranges. 86,400 rows is pretty large for a single query; I've found ~1,000 results per query works well. This also spreads the load across the servers a little more evenly.
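One way to sketch the chunking (my own illustration, not from the thread): since all 86,400 rows for a day sit in a single partition, the sub-ranges would be expressed on the clustering column ("UtcDate") rather than on token ranges, which only split across partitions. The helper below just computes the time windows; the names and chunk count are assumptions for illustration.

```python
from datetime import datetime, timedelta

def split_day(day_start, n_chunks):
    """Split a 24-hour window into n_chunks equal [start, end) sub-ranges.

    Each sub-range can then back one query of the form
    ... AND "UtcDate" >= %s AND "UtcDate" < %s, issued concurrently.
    """
    step = timedelta(seconds=86400 // n_chunks)
    return [(day_start + i * step, day_start + (i + 1) * step)
            for i in range(n_chunks)]

# Example: 96 chunks of 900 one-second points each (~1000 rows per query)
day = datetime(2015, 5, 5)
ranges = split_day(day, 96)
```

With a driver that supports asynchronous execution (e.g. the DataStax Python driver's `session.execute_async`), each range becomes one in-flight query and the results are merged client-side.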
On Thu, May 7, 2015 at 4:27 AM, Alprema <alpr...@alprema.com> wrote:
> Hi,
>
> I am writing an application that will periodically read big amounts of
> data from Cassandra, and I am experiencing odd performance.
>
> My column family is a classic time-series one, with series ID and day as
> the partition key and a timestamp as the clustering key, the value being
> a double.
>
> The query I run gets all the values for a given time series for a given
> day (so about 86,400 points):
>
> SELECT "UtcDate", "Value"
> FROM "Metric_OneSec"
> WHERE "MetricId" = 12215ece-6544-4fcf-a15d-4f9e9ce1567e
>   AND "Day" = '2015-05-05 00:00:00+0000'
> LIMIT 86400;
>
> This takes about 450 ms to run, and when I trace the query I see that it
> takes about 110 ms to read the data from disk and 224 ms to send the data
> from the responsible node to the coordinator (full trace in attachment).
>
> I did a quick estimation of the requested data (correct me if I'm wrong):
>
> 86400 * (column name + column value + timestamp + TTL)
> = 86400 * (8 + 8 + 8 + 8?)
> = 2.6 MB
>
> Let's say about 3 MB with misc. overhead, so these timings seem pretty
> slow to me for a modern SSD and a 1 Gb/s NIC.
>
> Do those timings seem normal? Am I missing something?
>
> Thank you,
>
> Kévin