Look for any activity in the LOG. Level-0 "creations" are fast and not typically relevant. You are most interested in LOG lines containing "Compacting" (start) and "Compacted" (end); the time in between is when the write throttle engages. The idea is that these compaction events can pile up, one after another and multiple overlapping. It is during those heavy periods that the throttle saves the user experience.
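Roughly, something along these lines will pull the compaction events out of the combined LOGs (the path is illustrative for a default install; adjust it to whatever data_root you use in app.config):

    # concatenate every partition's leveldb LOG, keep only compaction events,
    # and order them by the timestamp at the start of each line
    cat /var/lib/riak/leveldb/*/LOG* | grep -E 'Compacting|Compacted' | sort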
Matthew

On Nov 1, 2012, at 8:54 PM, Dietrich Featherston wrote:

> Thanks. The amortized stalls may very well describe what we are seeing. If I combine leveldb logs from all partitions on one of the upgraded nodes, what should I look for in terms of compaction activity to verify this?
>
>
> On Thu, Nov 1, 2012 at 5:48 PM, Matthew Von-Maszewski <matth...@basho.com> wrote:
> Dietrich,
>
> I can see your concern with the write IOS statistic. Let me comment on the easy question first: block_size.
>
> The block_size parameter in 1.1 was not getting passed to leveldb from the erlang layer. You were using a 4096 byte block parameter no matter what you typed in the app.config. The block_size is used by leveldb as a threshold. Once you accumulate enough data above that threshold, the current block is written to disk and a new one started. If you have 10k data values, you get one data item per block and its size is ~10k. If you have 1k data values, you get about four per block and the block is about 4k.
>
> We recommend 4k blocks to help read performance. The entire block has to run through decompression and potentially CRC calculation when it comes off the disk. That CPU time really kills any disk performance gains from having larger blocks. Ok, that might change in 1.3 as we enable hardware CRC … but only if you have "verify_checksums, true" in app.config.
>
>
> Back to performance: I have not seen the change your graph details when testing with SAS drives under moderate load. I am only today starting qualification tests with SSD drives.
>
> But my 1.2 and 1.3 tests focus on drive / Riak saturation. 1.1 has the nasty tendency to stall (intentionally) when we saturate the write side of leveldb. The stall was measured in seconds or even minutes in 1.1. 1.2.1 has a write throttle that forecasts leveldb's stall state and incrementally slows the individual writes to prevent the stalls. Maybe that is what is being seen in the graph. The only way to know for sure is to get a dump of your leveldb LOG files, combine them, then compare compaction activity to your graph.
>
> Write stalls are detailed here:
> http://basho.com/blog/technical/2012/10/30/leveldb-in-riak-1p2/
>
> How can I better assist you at this point?
>
> Matthew
>
>
> On Nov 1, 2012, at 8:13 PM, Dietrich Featherston wrote:
>
>> We've just gone through the process of upgrading two riak clusters from 1.1 to 1.2.1. Both are on the leveldb backend backed by RAID0'd SSDs. The process has gone smoothly and we see that latencies as measured at the gen_fsm level are largely unaffected.
>>
>> However, we are seeing some troubling disk statistics and I'm looking for an explanation before we upgrade the remainder of our nodes. The source of the worry seems to be a huge amplification in the number of writes serviced by the disk, which may be the cause of rising io wait times.
>>
>> My first thought was that this could be due to some leveldb tuning in 1.2.1 which increases file sizes per the release notes (https://github.com/basho/riak/blob/master/RELEASE-NOTES.md). But nodes that were upgraded yesterday are still showing this symptom. I would have expected any block re-writing to have subsided by now.
>>
>> Next hypothesis has to do with block size overriding in app.config. In 1.1, we had specified custom block sizes of 256k.
>> We removed this prior to upgrading to 1.2.1 at the advice of #riak since block size configuration was ignored prior to 1.2 ('"block_size" parameter within app.config for leveldb was ignored. This parameter is now properly passed to leveldb.' --> https://github.com/basho/riak/commit/f12596c221a9d942cc23d8e4fd83c9ca46e02105). I'm wondering if the block size parameter really was being passed to leveldb and, having removed it, blocks are now being rewritten to a new size, perhaps different from the size they were being written at before (https://github.com/basho/riak_kv/commit/ad192ee775b2f5a68430d230c0999a2caabd1155).
>>
>> Here is the output of the following script showing the increased writes to disk (https://gist.github.com/37319a8ed2679bb8b21d):
>>
>> --an upgraded 1.2.1 node--
>> read ios: 238406742
>> write ios: 4814320281
>> read/write ratio: .04952033
>> avg wait: .10712340
>> read wait: .49174364
>> write wait: .42695475
>>
>> --a node still running 1.1--
>> read ios: 267770032
>> write ios: 944170656
>> read/write ratio: .28360342
>> avg wait: .34237204
>> read wait: .47222371
>> write wait: 1.83283749
>>
>> And here's what munin is showing us in terms of avg io wait times.
>>
>> [image attachment: munin graph of average io wait times]
>>
>> Any thoughts on what might explain this?
>>
>> Thanks,
>> D
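For reference, the block_size and verify_checksums settings discussed in the thread above live in the eleveldb section of app.config. A minimal sketch, assuming a default data_root; the values shown are illustrative, not recommendations beyond the 4k default mentioned earlier:

    %% eleveldb section of app.config (sketch; path and values are illustrative)
    {eleveldb, [
        {data_root, "/var/lib/riak/leveldb"},
        {block_size, 4096},          %% threshold in bytes; 4k is the default
        {verify_checksums, true}     %% needed for the CRC behavior noted above
    ]},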
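For anyone wanting to reproduce the per-device numbers quoted above without the linked gist, arithmetic of roughly this shape over /proc/diskstats yields the same kind of figures. This is a sketch, not the referenced script; the device name and field positions are assumptions (see Documentation/iostats.txt in the kernel tree for the field layout):

    #!/bin/sh
    # Sketch only: per-device read/write IO counts and average waits (ms per IO)
    # from /proc/diskstats. Adjust DEV to the device backing your leveldb data.
    DEV=sda
    awk -v dev="$DEV" '$3 == dev {
        read_ios  = $4;  read_ms  = $7;
        write_ios = $8;  write_ms = $11;
        printf "read ios: %d\n", read_ios;
        printf "write ios: %d\n", write_ios;
        printf "read/write ratio: %.8f\n", read_ios / write_ios;
        printf "avg wait: %.8f\n", (read_ms + write_ms) / (read_ios + write_ios);
        printf "read wait: %.8f\n", read_ms / read_ios;
        printf "write wait: %.8f\n", write_ms / write_ios;
    }' /proc/diskstats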