You can try disabling readahead on the Cassandra data disks.
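For example (a rough sketch; the device names are taken from the iostat output further down the thread, and blockdev settings do not persist across a reboot):

# check the current readahead, in 512-byte sectors
sudo blockdev --getra /dev/xvdb /dev/xvdc /dev/md0

# drop it to something small, e.g. 8 sectors (4 KB); 0 disables readahead entirely
sudo blockdev --setra 8 /dev/xvdb
sudo blockdev --setra 8 /dev/xvdc
sudo blockdev --setra 8 /dev/md0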
Jon Scarborough <j...@fifth-aeon.net> wrote:

Checked tpstats, there are very few dropped messages.

Checked histograms. Mostly nothing surprising. The vast majority of rows are small, and most reads only access one or two SSTables.

What I did discover is that of our 5 nodes, one is performing well, with disk I/O in a ballpark that seems reasonable. The other 4 nodes are doing roughly 4x the disk I/O per second. Interestingly, the node that is performing well also seems to be servicing about twice the number of reads that the other nodes are.

I compared configuration between the node performing well and those that aren't, and so far haven't found any discrepancies.

On Fri, Mar 22, 2013 at 10:43 AM, Wei Zhu <wz1...@yahoo.com> wrote:

According to your cfstats, read latency is over 100 ms, which is really, really slow. I am seeing less than 3 ms reads for my cluster, which is on SSDs. Can you also check nodetool cfhistograms? It tells you more about the number of SSTables involved and the read/write latencies; sometimes the average doesn't tell you the whole story. Also check your nodetool tpstats: are there a lot of dropped reads?

-Wei
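For reference, those checks look roughly like this (the keyspace name here is just a placeholder):

nodetool -h localhost cfhistograms <keyspace> conversation_text_message
nodetool -h localhost tpstats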
----- Original Message -----
From: "Jon Scarborough" <j...@fifth-aeon.net>
To: user@cassandra.apache.org
Sent: Friday, March 22, 2013 9:42:34 AM
Subject: Re: High disk I/O during reads

Key distribution across SSTables probably varies a lot from row to row in our case. Most reads would probably only need to look at a few SSTables; a few might need to look at more.

I don't yet have a deep understanding of C* internals, but I would imagine even the more expensive use cases would involve something like this:

1) Check the index for each SSTable to determine if part of the row is there.
2) Look at the endpoints of the slice to determine if the data in a particular SSTable is relevant to the query.
3) Read the chunks of those SSTables, working backwards from the end of the slice until enough columns have been read to satisfy the limit clause in the query.

So I would have guessed that even the more expensive queries on wide rows typically wouldn't need to read more than a few hundred KB from disk to do all that. Seems like I'm missing something major.

Here's the complete CF definition, including compression settings:

CREATE COLUMNFAMILY conversation_text_message (
  conversation_key bigint PRIMARY KEY
) WITH
  comment='' AND
  comparator='CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.AsciiType,org.apache.cassandra.db.marshal.AsciiType)' AND
  read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  default_validation=text AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write=True AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';

Much thanks for any additional ideas.

-Jon

On Fri, Mar 22, 2013 at 8:15 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote:

Did you mean to ask "are 'all' your keys spread across all SSTables"? I am guessing at your intention. I mean, I would very well hope my keys are spread across all SSTables, or otherwise that SSTable should not be there, as it has no keys in it ;).

And I know we had HUGE disk usage from the duplication in our SSTables on size-tiered compaction. We never ran a major compaction, but after we switched to LCS we went from about 300 GB to some 120 GB or something like that, which was nice. We only have 300 data point posts per second, so not an extreme write load on 6 nodes, though these posts cause reads to check authorization and such in our system.

Dean
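For what it's worth, the equivalent change on the conversation_text_message CF would be something along these lines (only a sketch, mirroring the CREATE COLUMNFAMILY syntax quoted above; exact syntax varies between CQL versions, and on 1.1 the same change can also be made from cassandra-cli). Note that switching strategies kicks off a large amount of recompaction:

ALTER COLUMNFAMILY conversation_text_message
  WITH compaction_strategy_class='LeveledCompactionStrategy';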
From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Friday, March 22, 2013 8:38 AM
To: user@cassandra.apache.org
Subject: RE: High disk I/O during reads

Are your keys spread across all SSTables? That will cause every SSTable to be read, which will increase the I/O.

What compaction are you using?

From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon Scarborough
Sent: 21 March 2013 23:00
To: user@cassandra.apache.org
Subject: High disk I/O during reads

Hello,

We've had a 5-node C* cluster (version 1.1.0) running for several months. Up until now we've mostly been writing data, but now we're starting to service more read traffic. We're seeing far more disk I/O to service these reads than I would have anticipated.

The CF being queried consists of chat messages. Each row represents a conversation between two people. Each column represents a message. The column key is composite, consisting of the message date and a few other bits of information. The CF is using compression.

The query is looking for a maximum of 50 messages between two dates, in reverse order. Usually the two dates used as endpoints are 30 days ago and the current time. The query in Astyanax looks like this:

ColumnList<ConversationTextMessageKey> result = keyspace
    .prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
    .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
    .getKey(conversationKey)
    .withColumnRange(
        textMessageSerializer.makeEndpoint(endDate, Equality.LESS_THAN).toBytes(),
        textMessageSerializer.makeEndpoint(startDate, Equality.GREATER_THAN_EQUALS).toBytes(),
        true,
        maxMessages)
    .execute()
    .getResult();

We're currently servicing around 30 of these queries per second.

Here's what the cfstats for the CF look like:

Column Family: conversation_text_message
SSTable count: 15
Space used (live): 211762982685
Space used (total): 211762982685
Number of Keys (estimate): 330118528
Memtable Columns Count: 68063
Memtable Data Size: 53093938
Memtable Switch Count: 9743
Read Count: 4313344
Read Latency: 118.831 ms
Write Count: 817876950
Write Latency: 0.023 ms
Pending Tasks: 0
Bloom Filter False Positives: 6055
Bloom Filter False Ratio: 0.00260
Bloom Filter Space Used: 686266048
Compacted row minimum size: 87
Compacted row maximum size: 14530764
Compacted row mean size: 1186

On the C* nodes, iostat output like this is typical, and can spike to be much worse:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.91   0.00     2.08    30.66    0.50  64.84

Device:     tps   kB_read/s  kB_wrtn/s   kB_read  kB_wrtn
xvdap1     0.13        0.00       1.07         0       16
xvdb     474.20    13524.53      25.33    202868      380
xvdc     469.87    13455.73      30.40    201836      456
md0      972.13    26980.27      55.73    404704      836

Any thoughts on what could be causing this much read I/O to the disk from these queries?

Much thanks!

-Jon

--
Sent from K-9 Mail. Please excuse my brevity.