Dietrich,

I can see your concern with the write IOs statistic.  Let me comment on the 
easy question first:  block_size.

The block_size parameter in 1.1 was not getting passed to leveldb from the 
Erlang layer.  You were using a 4096 byte block size no matter what you 
typed in app.config.  leveldb uses block_size as a threshold:  once the data 
accumulated for the current block exceeds that threshold, the block is 
written to disk and a new one is started.  If your data values are ~10k each, 
you get one value per block and the block is ~10k.  If your values are ~1k, 
you get about four per block and the block is about 4k.
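
If you do want to set a specific size, it goes in the eleveldb section of 
app.config.  A minimal sketch (the data_root path is just an example):

    {eleveldb, [
                {data_root, "/var/lib/riak/leveldb"},
                %% bytes; a threshold, not a fixed block size
                {block_size, 4096}
               ]}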

We recommend 4k blocks to help read performance.  The entire block has to run 
through decompression and potentially a CRC calculation when it comes off the 
disk.  That CPU time really kills any disk performance gain from having larger 
blocks.  Ok, that might change in 1.3 as we enable hardware CRC … but only if 
you have {verify_checksums, true} in app.config.
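
For completeness, that flag sits in the same eleveldb section:

    {eleveldb, [
                %% required if you want reads to CRC-check each block
                {verify_checksums, true}
               ]}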


Back to performance:  I have not seen the change your graph shows when 
testing with SAS drives under moderate load.  I am only today starting 
qualification tests with SSD drives.

But my 1.2 and 1.3 tests focus on drive / Riak saturation.  1.1 had the nasty 
tendency to stall (intentionally) when we saturated the write side of leveldb. 
 The stalls were measured in seconds or even minutes in 1.1.  1.2.1 has a write 
throttle that forecasts leveldb's stall state and incrementally slows the 
individual writes to prevent the stalls.  Maybe that is what is being seen in 
your graph.  The only way to know for sure is to get a dump of your leveldb LOG 
files, combine them, then compare compaction activity to your graph.
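
If you want to go that route, here is a rough sketch of the idea as an 
escript.  The data_root path and the "Compact" match are assumptions from my 
own setup; adjust both to your install:

    #!/usr/bin/env escript
    %% Gather compaction lines from every vnode's leveldb LOG file and
    %% sort them by their leading timestamp so activity across vnodes
    %% interleaves chronologically.  Line the output up against your
    %% io wait graph.
    main(_) ->
        Files = filelib:wildcard("/var/lib/riak/leveldb/*/LOG*"),
        Lines = lists:flatmap(fun file_lines/1, Files),
        Compactions = [L || L <- Lines, string:str(L, "Compact") > 0],
        lists:foreach(fun(L) -> io:format("~s~n", [L]) end,
                      lists:sort(Compactions)).

    file_lines(Path) ->
        {ok, Bin} = file:read_file(Path),
        string:tokens(binary_to_list(Bin), "\n").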

Write stalls are detailed here:   
http://basho.com/blog/technical/2012/10/30/leveldb-in-riak-1p2/

How can I better assist you at this point?

Matthew


On Nov 1, 2012, at 8:13 PM, Dietrich Featherston wrote:

> We've just gone through the process of upgrading two riak clusters from 1.1  
> to 1.2.1. Both are on the leveldb backend backed by RAID0'd SSDs. The process 
> has gone smoothly and we see that latencies as measured at the gen_fsm level 
> are largely unaffected.
> 
> However, we are seeing some troubling disk statistics and I'm looking for an 
> explanation before we upgrade the remainder of our nodes. The source of the 
> worry seems to be a huge amplification in the number of writes serviced by 
> the disk which may be the cause of rising io wait times.
> 
> My first thought was that this could be due to some leveldb tuning in 1.2.1 
> which increases file sizes per the release notes 
> (https://github.com/basho/riak/blob/master/RELEASE-NOTES.md). But nodes that 
> were upgraded yesterday are still showing this symptom. I would have expected 
> any block re-writing to have subsided by now.
> 
> Next hypothesis has to do with block size overriding in app.config. In 1.1, 
> we had specified custom block sizes of 256k. We removed this prior to 
> upgrading to 1.2.1 at the advice of #riak since block size configuration was 
> ignored prior to 1.2 ('"block_size" parameter within app.config for leveldb 
> was ignored.  This parameter is now properly passed to leveldb.' --> 
> https://github.com/basho/riak/commit/f12596c221a9d942cc23d8e4fd83c9ca46e02105).
>  I'm wondering if the block size parameter really was being passed to 
> leveldb, and having removed it, blocks are now being rewritten to a new size, 
> perhaps different from what they were being written as before 
> (https://github.com/basho/riak_kv/commit/ad192ee775b2f5a68430d230c0999a2caabd1155)
> 
> Here is the output of the following script showing the increased writes to 
> disk (https://gist.github.com/37319a8ed2679bb8b21d)
> 
> --an upgraded 1.2.1 node--
> read ios: 238406742
> write ios: 4814320281
> read/write ratio: .04952033
> avg wait: .10712340
> read wait: .49174364
> write wait: .42695475
> 
> 
> --a node still running 1.1--
> read ios: 267770032
> write ios: 944170656
> read/write ratio: .28360342
> avg wait: .34237204
> read wait: .47222371
> write wait: 1.83283749
> 
> And here's what munin is showing us in terms of avg io wait times.
> 
> [image: munin graph of avg io wait times]
> 
> 
> Any thoughts on what might explain this?
> 
> Thanks,
> D
> 

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
