> Device:            tps    kB_read/s    kB_wrtn/s    kB_read   kB_wrtn
> xvdap1            0.13         0.00         1.07          0        16
> xvdb            474.20     13524.53        25.33     202868       380
> xvdc            469.87     13455.73        30.40     201836       456

Perchance are you running on m1.large instances? You may be seeing the "moderate" IO performance they offer.

> The query is looking for a maximum of 50 messages between two dates, in
> reverse order.

This is probably the slowest query you can do. Removing the reverse would make it faster, or using a reversed comparator.

You mentioned that most queries in cfhistograms were hitting one SSTable. If you are comfortable with the output from that, I would work out the distribution for read latency; the read latency in cfstats is the most recent only.
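For reference, something along these lines will pull those distributions; <keyspace> is just a placeholder (the keyspace name isn't given in this thread), and it's untested against your cluster:

    # read latency and SSTables-per-read distributions for the CF
    nodetool -h 127.0.0.1 cfhistograms <keyspace> conversation_text_message

    # per-CF stats; as noted above, the read latency reported here is recent only
    nodetool -h 127.0.0.1 cfstats | grep -A 25 "Column Family: conversation_text_message"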
Having a lot of deletes on a row can slow down reads as well. It does not sound like the case here, but I wanted to mention it.

The next thing to look at is proxyhistograms, which will give you the end-to-end request latency in Cassandra. That will help identify whether the issue is the disk read, the network, or something else. The something else may be DigestMismatch errors if you have been dropping writes and not running repair / have disabled hints.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 2:59 PM, Matt Kap <matvey1...@gmail.com> wrote:

> Having battled similar issues with read latency recently, here are some
> general things to look out for.
>
> - At 118ms, something is definitely broken. You should be looking at
>   under 10ms or lower, depending on hardware.
> - Do "nodetool info" on all 5 nodes. Is the load distributed evenly?
>   Is it reasonable (under 500GB)?
> - Make sure you aren't running low on heap space. You could see that
>   from "nodetool info" also. If you are running low, very bad things
>   begin to happen (lots of GC, constant flushing of Memtables, reduction
>   of Key Cache, etc). Generally, once there, the node doesn't recover,
>   and read latency goes to sh*t.
> - Which compaction strategy are you using? Leveled compaction or
>   size-tiered? There are different issues with both.
> - Is your Key Cache turned on? What's the Key Cache hit rate?
> - Is the Read Latency the same on all nodes? Or just one in particular?
> - Are pending compactions building up?
> - What's %util on disk? Same on all nodes?
>
> I would go through "nodetool cfstats, info, compactionstats, tpstats"
> and see if things are roughly the same across all the nodes. You could
> also just be under capacity, but more likely there's an actual
> problem looming somewhere.
>
> Cheers!
> -Matt
>
> On Sat, Mar 23, 2013 at 3:18 AM, <i...@4friends.od.ua> wrote:
>> You can try to disable readahead on the cassandra data disk.
>>
>> Jon Scarborough <j...@fifth-aeon.net> wrote:
>>>
>>> Checked tpstats, there are very few dropped messages.
>>>
>>> Checked histograms. Mostly nothing surprising. The vast majority of rows
>>> are small, and most reads only access one or two SSTables.
>>>
>>> What I did discover is that of our 5 nodes, one is performing well, with
>>> disk I/O in the ballpark of what seems reasonable. The other 4 nodes are
>>> doing roughly 4x the disk I/O per second. Interestingly, the node that is
>>> performing well also seems to be servicing about twice the number of reads
>>> that the other nodes are.
>>>
>>> I compared configuration between the node performing well and those that
>>> aren't, and so far haven't found any discrepancies.
>>>
>>> On Fri, Mar 22, 2013 at 10:43 AM, Wei Zhu <wz1...@yahoo.com> wrote:
>>>>
>>>> According to your cfstats, read latency is over 100 ms, which is really,
>>>> really slow. I am seeing less than 3ms reads for my cluster, which is on SSD.
>>>> Can you also check nodetool cfhistograms? It tells you more about the
>>>> number of SSTables involved and the read/write latency. Sometimes the
>>>> average doesn't tell you the whole story.
>>>> Also check your nodetool tpstats, are there a lot of dropped reads?
>>>>
>>>> -Wei
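A quick, untested way to eyeball the dropped messages, load, and heap numbers across all five nodes (the host names below are placeholders, not from this thread):

    for h in node1 node2 node3 node4 node5; do
        echo "== $h =="
        nodetool -h "$h" info | grep -E "Load|Heap"
        nodetool -h "$h" tpstats | grep -i -A 12 dropped
    done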
>>>> ----- Original Message -----
>>>> From: "Jon Scarborough" <j...@fifth-aeon.net>
>>>> To: user@cassandra.apache.org
>>>> Sent: Friday, March 22, 2013 9:42:34 AM
>>>> Subject: Re: High disk I/O during reads
>>>>
>>>> Key distribution across SSTables probably varies a lot from row to row in
>>>> our case. Most reads would probably only need to look at a few SSTables; a
>>>> few might need to look at more.
>>>>
>>>> I don't yet have a deep understanding of C* internals, but I would imagine
>>>> even the more expensive use cases would involve something like this:
>>>>
>>>> 1) Check the index for each SSTable to determine if part of the row is
>>>> there.
>>>> 2) Look at the endpoints of the slice to determine if the data in a
>>>> particular SSTable is relevant to the query.
>>>> 3) Read the chunks of those SSTables, working backwards from the end of
>>>> the slice until enough columns have been read to satisfy the limit clause
>>>> in the query.
>>>>
>>>> So I would have guessed that even the more expensive queries on wide rows
>>>> typically wouldn't need to read more than a few hundred KB from disk to do
>>>> all that. Seems like I'm missing something major.
>>>>
>>>> Here's the complete CF definition, including compression settings:
>>>>
>>>> CREATE COLUMNFAMILY conversation_text_message (
>>>>   conversation_key bigint PRIMARY KEY
>>>> ) WITH
>>>>   comment='' AND
>>>>   comparator='CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.AsciiType,org.apache.cassandra.db.marshal.AsciiType)' AND
>>>>   read_repair_chance=0.100000 AND
>>>>   gc_grace_seconds=864000 AND
>>>>   default_validation=text AND
>>>>   min_compaction_threshold=4 AND
>>>>   max_compaction_threshold=32 AND
>>>>   replicate_on_write=True AND
>>>>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>>>>   compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
>>>>
>>>> Much thanks for any additional ideas.
>>>>
>>>> -Jon
>>>>
>>>> On Fri, Mar 22, 2013 at 8:15 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
>>>>
>>>> Did you mean to ask "are 'all' your keys spread across all SSTables"? I
>>>> am guessing at your intention.
>>>>
>>>> I mean, I would very well hope my keys are spread across all sstables, or
>>>> otherwise that sstable should not be there, as it has no keys in it ;).
>>>>
>>>> And I know we had HUGE disk size from the duplication in our sstables on
>>>> size-tiered compaction....we never ran a major compaction, but after we
>>>> switched to LCS we went from 300G to some 120G or something like that,
>>>> which was nice. We only have 300 data point posts / second, so not an
>>>> extreme write load on 6 nodes, though these posts cause reads to check
>>>> authorization and such in our system.
>>>>
>>>> Dean
>>>>
>>>> From: Kanwar Sangha <kan...@mavenir.com>
>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Date: Friday, March 22, 2013 8:38 AM
>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Subject: RE: High disk I/O during reads
>>>>
>>>> Are your keys spread across all SSTables? That will cause every SSTable to
>>>> be read, which will increase the I/O.
>>>>
>>>> What compaction are you using?
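If the size-tiered to LCS switch Dean mentions were ever worth trying here, the schema change itself is small; a rough, untested sketch for cassandra-cli (1.1-era syntax, with <your_keyspace> as a placeholder, and not a recommendation either way for this workload):

    use <your_keyspace>;
    update column family conversation_text_message
      with compaction_strategy = 'LeveledCompactionStrategy';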
>>>> From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon Scarborough
>>>> Sent: 21 March 2013 23:00
>>>> To: user@cassandra.apache.org
>>>> Subject: High disk I/O during reads
>>>>
>>>> Hello,
>>>>
>>>> We've had a 5-node C* cluster (version 1.1.0) running for several months.
>>>> Up until now we've mostly been writing data, but now we're starting to
>>>> service more read traffic. We're seeing far more disk I/O to service these
>>>> reads than I would have anticipated.
>>>>
>>>> The CF being queried consists of chat messages. Each row represents a
>>>> conversation between two people. Each column represents a message. The
>>>> column key is composite, consisting of the message date and a few other
>>>> bits of information. The CF is using compression.
>>>>
>>>> The query is looking for a maximum of 50 messages between two dates, in
>>>> reverse order. Usually the two dates used as endpoints are 30 days ago and
>>>> the current time. The query in Astyanax looks like this:
>>>>
>>>> ColumnList<ConversationTextMessageKey> result = keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
>>>>     .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
>>>>     .getKey(conversationKey)
>>>>     .withColumnRange(
>>>>         textMessageSerializer.makeEndpoint(endDate, Equality.LESS_THAN).toBytes(),
>>>>         textMessageSerializer.makeEndpoint(startDate, Equality.GREATER_THAN_EQUALS).toBytes(),
>>>>         true,
>>>>         maxMessages)
>>>>     .execute()
>>>>     .getResult();
>>>>
>>>> We're currently servicing around 30 of these queries per second.
>>>>
>>>> Here's what the cfstats for the CF look like:
>>>>
>>>> Column Family: conversation_text_message
>>>> SSTable count: 15
>>>> Space used (live): 211762982685
>>>> Space used (total): 211762982685
>>>> Number of Keys (estimate): 330118528
>>>> Memtable Columns Count: 68063
>>>> Memtable Data Size: 53093938
>>>> Memtable Switch Count: 9743
>>>> Read Count: 4313344
>>>> Read Latency: 118.831 ms.
>>>> Write Count: 817876950
>>>> Write Latency: 0.023 ms.
>>>> Pending Tasks: 0
>>>> Bloom Filter False Positives: 6055
>>>> Bloom Filter False Ratio: 0.00260
>>>> Bloom Filter Space Used: 686266048
>>>> Compacted row minimum size: 87
>>>> Compacted row maximum size: 14530764
>>>> Compacted row mean size: 1186
>>>>
>>>> On the C* nodes, iostat output like this is typical, and it can spike to
>>>> be much worse:
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            1.91    0.00    2.08   30.66    0.50   64.84
>>>>
>>>> Device:            tps    kB_read/s    kB_wrtn/s    kB_read   kB_wrtn
>>>> xvdap1            0.13         0.00         1.07          0        16
>>>> xvdb            474.20     13524.53        25.33     202868       380
>>>> xvdc            469.87     13455.73        30.40     201836       456
>>>> md0             972.13     26980.27        55.73     404704       836
>>>>
>>>> Any thoughts on what could be causing read I/O to the disk from these
>>>> queries?
>>>>
>>>> Much thanks!
>>>>
>>>> -Jon
>>>>
>>>
>>
>> --
>> Sent via K-9 Mail. Please excuse my brevity.
>
>
> --
> www.calcmachine.com - easy online calculator.