Thanks. The amortized stalls may very well describe what we are seeing. If I combine the leveldb LOG files from all partitions on one of the upgraded nodes, what should I look for in terms of compaction activity to verify this?
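Something like this rough Python sketch is what I had in mind for combining them. The data_root path is just a guess at our layout, and the keywords being matched ("Compacting", "Compacted", "Level-0 table") are what I'd expect compaction lines in a leveldb LOG to look like rather than anything official:

#!/usr/bin/env python
# Rough sketch: merge the per-partition leveldb LOG files on one node and
# pull out compaction-related lines so they can be lined up against the
# io-wait graph. The path and the matched keywords are assumptions.
import glob
import re

# Assumed data_root for this install; adjust to match app.config.
LOG_GLOB = "/var/lib/riak/leveldb/*/LOG*"   # catches LOG and LOG.old

# Lines that appear to mark compaction starts/finishes and level-0 flushes.
PATTERN = re.compile(r"Compacting|Compacted|Level-0 table")

events = []
for path in glob.glob(LOG_GLOB):
    partition = path.split("/")[-2]
    with open(path) as f:
        for line in f:
            if PATTERN.search(line):
                # leveldb LOG lines begin with a timestamp, so a plain
                # string sort puts the merged list in chronological order.
                events.append((line.split()[0], partition, line.rstrip()))

events.sort()
for ts, partition, line in events:
    print("%s %s" % (partition, line))

If that is roughly the right approach, I'll line the merged timeline up against the munin graph.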
On Thu, Nov 1, 2012 at 5:48 PM, Matthew Von-Maszewski <matth...@basho.com> wrote:

> Dietrich,
>
> I can see your concern with the write IOS statistic. Let me comment on the easy question first: block_size.
>
> The block_size parameter in 1.1 was not getting passed to leveldb from the erlang layer. You were using a 4096-byte block parameter no matter what you typed in the app.config. The block_size is used by leveldb as a threshold. Once you accumulate enough data above that threshold, the current block is written to disk and a new one started. If you have 10k data values, you get one data item per block and its size is ~10k. If you have 1k data values, you get about four per block and the block is about 4k.
>
> We recommend 4k blocks to help read performance. The entire block has to run through decompression and potentially CRC calculation when it comes off the disk. That CPU time really kills any disk performance gains from having larger blocks. Ok, that might change in 1.3 as we enable hardware CRC ... but only if you have "verify_checksums, true" in app.config.
>
> Back to performance: I have not seen the change your graph details when testing with SAS drives under moderate load. I am only today starting qualification tests with SSD drives.
>
> But my 1.2 and 1.3 tests focus on drive / Riak saturation. 1.1 has the nasty tendency to stall (intentionally) when we saturate the write side of leveldb. The stall was measured in seconds or even minutes in 1.1. 1.2.1 has a write throttle that forecasts leveldb's stall state and incrementally slows the individual writes to prevent the stalls. Maybe that is what is being seen in the graph. The only way to know for sure is to get a dump of your leveldb LOG files, combine them, then compare compaction activity to your graph.
>
> Write stalls are detailed here:
> http://basho.com/blog/technical/2012/10/30/leveldb-in-riak-1p2/
>
> How can I better assist you at this point?
>
> Matthew
>
>
> On Nov 1, 2012, at 8:13 PM, Dietrich Featherston wrote:
>
> We've just gone through the process of upgrading two Riak clusters from 1.1 to 1.2.1. Both are on the leveldb backend backed by RAID0'd SSDs. The process has gone smoothly and we see that latencies as measured at the gen_fsm level are largely unaffected.
>
> However, we are seeing some troubling disk statistics, and I'm looking for an explanation before we upgrade the remainder of our nodes. The source of the worry seems to be a huge amplification in the number of writes serviced by the disk, which may be the cause of rising io wait times.
>
> My first thought was that this could be due to some leveldb tuning in 1.2.1 which increases file sizes per the release notes (https://github.com/basho/riak/blob/master/RELEASE-NOTES.md). But nodes that were upgraded yesterday are still showing this symptom. I would have expected any block re-writing to have subsided by now.
>
> Next hypothesis has to do with block size overriding in app.config. In 1.1, we had specified custom block sizes of 256k. We removed this prior to upgrading to 1.2.1 at the advice of #riak since block size configuration was ignored prior to 1.2 ('"block_size" parameter within app.config for leveldb was ignored. This parameter is now properly passed to leveldb.' --> https://github.com/basho/riak/commit/f12596c221a9d942cc23d8e4fd83c9ca46e02105).
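(Aside, mostly to check that I follow the block_size arithmetic in Matthew's explanation above: a toy Python sketch of the thresholding as I understand it, not leveldb's actual block builder.)

# Toy model of the block_size threshold described above: values are
# appended to the current block, and once the accumulated size crosses
# block_size the block is flushed and a new one started.
def blocks_for(value_size, value_count, block_size=4096):
    blocks, current = [], 0
    for _ in range(value_count):
        current += value_size
        if current >= block_size:      # threshold crossed -> flush block
            blocks.append(current)
            current = 0
    if current:
        blocks.append(current)
    return blocks

for value_size in (1024, 10 * 1024):
    sizes = blocks_for(value_size, 100)
    print("value_size=%6d -> %d blocks, ~%d bytes each, ~%d values/block"
          % (value_size, len(sizes), sizes[0], sizes[0] // value_size))

# With 1k values you get ~4 values per ~4k block; with 10k values every
# block holds a single ~10k value, matching the description above.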
> I'm wondering if the block size parameter really was being passed to leveldb after all, and, having removed it, blocks are now being rewritten at a new size, perhaps different from what they were being written at before (https://github.com/basho/riak_kv/commit/ad192ee775b2f5a68430d230c0999a2caabd1155).
>
> Here is the output of the following script showing the increased writes to disk (https://gist.github.com/37319a8ed2679bb8b21d):
>
> --an upgraded 1.2.1 node--
> read ios: 238406742
> write ios: 4814320281
> read/write ratio: .04952033
> avg wait: .10712340
> read wait: .49174364
> write wait: .42695475
>
> --a node still running 1.1--
> read ios: 267770032
> write ios: 944170656
> read/write ratio: .28360342
> avg wait: .34237204
> read wait: .47222371
> write wait: 1.83283749
>
> And here's what munin is showing us in terms of avg io wait times.
>
> <image.png>
>
> Any thoughts on what might explain this?
>
> Thanks,
> D
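(For context on how to read those numbers: they are the kind of per-device counters exposed in /proc/diskstats. The Python below is a rough sketch of that arithmetic, not the gist itself; in particular, the field choices and treating "avg wait" as io_ticks over total ios are assumptions on my part.)

# Sketch of deriving stats like the ones above from /proc/diskstats.
# Field layout after major/minor/name: 1=reads completed, 4=ms reading,
# 5=writes completed, 8=ms writing, 10=ms spent doing I/O (io_ticks).
import sys

def disk_stats(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                counters = [int(x) for x in fields[3:]]
                reads, read_ms = counters[0], counters[3]
                writes, write_ms = counters[4], counters[7]
                io_ticks = counters[9]
                print("read ios: %d" % reads)
                print("write ios: %d" % writes)
                print("read/write ratio: %.8f" % (float(reads) / writes))
                # "avg wait" as io_ticks over total ios is an assumption.
                print("avg wait: %.8f" % (float(io_ticks) / (reads + writes)))
                print("read wait: %.8f" % (float(read_ms) / reads))
                print("write wait: %.8f" % (float(write_ms) / writes))
                return
    print("device %s not found" % device)

if __name__ == "__main__":
    disk_stats(sys.argv[1] if len(sys.argv) > 1 else "sda")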
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com