Checked tpstats; there are very few dropped messages.

Checked histograms. Mostly nothing surprising. The vast majority of rows
are small, and most reads only access one or two SSTables.

What I did discover is that of our 5 nodes, one is performing well, with
disk I/O at a level that seems reasonable. The other 4 nodes are doing
roughly 4x the disk I/O per second. Interestingly, the node that is
performing well also seems to be servicing about twice as many reads as
the other nodes.

I compared the configuration of the node that is performing well with that
of the nodes that aren't, and so far haven't found any discrepancies.
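
For scale, here's a rough back-of-envelope using the iostat and query-rate
numbers from my original message below. It ignores replication factor, read
repair, and cache hits, so treat it as an order-of-magnitude estimate only:

    4 nodes x ~27,000 kB/s of reads (iostat, md0)  ≈ 105 MB/s cluster-wide
    105 MB/s / ~30 queries/s                       ≈ 3-4 MB read from disk per query

That's at least an order of magnitude more than the few hundred KB I'd expect
a 50-column slice to need.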

On Fri, Mar 22, 2013 at 10:43 AM, Wei Zhu <wz1...@yahoo.com> wrote:

> According to your cfstats, read latency is over 100 ms, which is really,
> really slow. I am seeing less than 3ms reads for my cluster, which is on
> SSD. Can you also check nodetool cfhistograms? It tells you more about
> the number of SSTables involved and the read/write latency. Sometimes the
> average doesn't tell you the whole story.
> Also check your nodetool tpstats: are there a lot of dropped reads?
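>
> (On 1.1, if I remember right, that's:
>
>     nodetool tpstats
>     nodetool cfhistograms <keyspace> <column_family>
>
> run against each node.)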
>
> -Wei
> ----- Original Message -----
> From: "Jon Scarborough" <j...@fifth-aeon.net>
> To: user@cassandra.apache.org
> Sent: Friday, March 22, 2013 9:42:34 AM
> Subject: Re: High disk I/O during reads
>
> Key distribution across SSTables probably varies a lot from row to row in our
> case. Most reads would probably only need to look at a few SSTables, but a few
> might need to look at more.
>
> I don't yet have a deep understanding of C* internals, but I would imagine
> even the more expensive use cases would involve something like this:
>
> 1) Check the index for each SSTable to determine if part of the row is
> there.
> 2) Look at the endpoints of the slice to determine if the data in a
> particular SSTable is relevant to the query.
> 3) Read the chunks of those SSTables, working backwards from the end of
> the slice until enough columns have been read to satisfy the limit clause
> in the query.
>
> So I would have guessed that even the more expensive queries on wide rows
> typically wouldn't need to read more than a few hundred KB from disk to do
> all that. Seems like I'm missing something major.
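>
> To make sure I'm picturing it right, here's a minimal sketch of those three
> steps as Java pseudocode. The types (SSTableReader, IndexEntry, Column) are
> made up for illustration and are not actual C* classes:
>
> import java.nio.ByteBuffer;
> import java.util.ArrayList;
> import java.util.List;
>
> // Hypothetical stand-ins, not Cassandra internals.
> interface Column {}
> interface IndexEntry { boolean overlaps(ByteBuffer start, ByteBuffer end); }
> interface SSTableReader {
>     IndexEntry lookupRow(ByteBuffer rowKey);          // index/bloom filter check
>     Iterable<Column> reverseSlice(IndexEntry entry,   // chunks, newest first
>                                   ByteBuffer start, ByteBuffer end);
> }
>
> class ReverseSliceSketch {
>     // Simplified: a real read also merges columns from all SSTables by
>     // timestamp before applying the limit; that part is omitted here.
>     static List<Column> read(List<SSTableReader> sstables, ByteBuffer rowKey,
>                              ByteBuffer start, ByteBuffer end, int limit) {
>         List<Column> result = new ArrayList<Column>();
>         for (SSTableReader sstable : sstables) {
>             // 1) Check the index to see if part of the row is in this SSTable.
>             IndexEntry entry = sstable.lookupRow(rowKey);
>             if (entry == null)
>                 continue;
>             // 2) Use the slice endpoints to skip SSTables whose data for this
>             //    row is entirely outside the requested range.
>             if (!entry.overlaps(start, end))
>                 continue;
>             // 3) Read chunks backwards from the end of the slice until enough
>             //    columns have been collected to satisfy the limit.
>             for (Column c : sstable.reverseSlice(entry, start, end)) {
>                 result.add(c);
>                 if (result.size() >= limit)
>                     return result;
>             }
>         }
>         return result;
>     }
> }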
>
> Here's the complete CF definition, including compression settings:
>
> CREATE COLUMNFAMILY conversation_text_message (
>   conversation_key bigint PRIMARY KEY
> ) WITH
>   comment='' AND
>   comparator='CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.AsciiType,org.apache.cassandra.db.marshal.AsciiType)' AND
>   read_repair_chance=0.100000 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write=True AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
>
> Much thanks for any additional ideas.
>
> -Jon
>
>
>
> On Fri, Mar 22, 2013 at 8:15 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
>
>
> Did you mean to ask "are 'all' your keys spread across all SSTables"? I am
> guessing at your intention.
>
> I mean, I would very well hope my keys are spread across all sstables, or
> otherwise that sstable should not be there, as it has no keys in it ;).
>
> And I know we had HUGE disk usage from the duplication in our sstables under
> size-tiered compaction. We never ran a major compaction, but after we
> switched to LCS we went from 300G to some 120G or something like that,
> which was nice. We only have 300 data point posts / second across 6 nodes,
> so not an extreme write load, though these posts cause reads to check
> authorization and such in our system.
>
> Dean
>
> From: Kanwar Sangha <kan...@mavenir.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Friday, March 22, 2013 8:38 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: RE: High disk I/O during reads
>
>
> Are your keys spread across all SSTables? That will cause every SSTable to
> be read, which will increase the I/O.
>
> What compaction are you using?
>
> From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon Scarborough
>
> Sent: 21 March 2013 23:00
> To: user@cassandra.apache.org
>
>
> Subject: High disk I/O during reads
>
> Hello,
>
> We've had a 5-node C* cluster (version 1.1.0) running for several months.
> Up until now we've mostly been writing data, but now we're starting to
> service more read traffic. We're seeing far more disk I/O to service these
> reads than I would have anticipated.
>
> The CF being queried consists of chat messages. Each row represents a
> conversation between two people. Each column represents a message. The
> column key is composite, consisting of the message date and a few other
> bits of information. The CF is using compression.
>
> The query is looking for a maximum of 50 messages between two dates, in
> reverse order. Usually the two dates used as endpoints are 30 days ago and
> the current time. The query in Astyanax looks like this:
>
> ColumnList<ConversationTextMessageKey> result =
>     keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
>         .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
>         .getKey(conversationKey)
>         .withColumnRange(
>             textMessageSerializer.makeEndpoint(endDate, Equality.LESS_THAN).toBytes(),
>             textMessageSerializer.makeEndpoint(startDate, Equality.GREATER_THAN_EQUALS).toBytes(),
>             true,
>             maxMessages)
>         .execute()
>         .getResult();
>
> We're currently servicing around 30 of these queries per second.
>
> Here's what the cfstats for the CF look like:
>
> Column Family: conversation_text_message
> SSTable count: 15
> Space used (live): 211762982685
> Space used (total): 211762982685
> Number of Keys (estimate): 330118528
> Memtable Columns Count: 68063
> Memtable Data Size: 53093938
> Memtable Switch Count: 9743
> Read Count: 4313344
> Read Latency: 118.831 ms.
> Write Count: 817876950
> Write Latency: 0.023 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 6055
> Bloom Filter False Ratio: 0.00260
> Bloom Filter Space Used: 686266048
> Compacted row minimum size: 87
> Compacted row maximum size: 14530764
> Compacted row mean size: 1186
>
> On the C* nodes, iostat output like this is typical, and can spike to be
> much worse:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.91    0.00    2.08   30.66    0.50   64.84
>
> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
> xvdap1            0.13         0.00         1.07          0         16
> xvdb            474.20     13524.53        25.33     202868        380
> xvdc            469.87     13455.73        30.40     201836        456
> md0             972.13     26980.27        55.73     404704        836
>
> Any thoughts on what could be causing read I/O to the disk from these
> queries?
>
> Much thanks!
>
> -Jon
>
>
>
