Thanks. The amortized stalls may very well describe what we are seeing. If I combine the leveldb LOG files from all partitions on one of the upgraded nodes, what should I look for in terms of compaction activity to verify this?
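Something like this rough Python sketch is what I had in mind for combining them. The data_root path is just a guess at our layout, and the keywords being matched ("Compacting", "Compacted", "Level-0 table") are what I'd expect compaction lines in a leveldb LOG to look like rather than anything official:

#!/usr/bin/env python
# Rough sketch: merge the per-partition leveldb LOG files on one node and
# pull out compaction-related lines so they can be lined up against the
# io-wait graph. The path and the matched keywords are assumptions.
import glob
import re

# Assumed data_root for this install; adjust to match app.config.
LOG_GLOB = "/var/lib/riak/leveldb/*/LOG*"   # catches LOG and LOG.old

# Lines that appear to mark compaction starts/finishes and level-0 flushes.
PATTERN = re.compile(r"Compacting|Compacted|Level-0 table")

events = []
for path in glob.glob(LOG_GLOB):
    partition = path.split("/")[-2]
    with open(path) as f:
        for line in f:
            if PATTERN.search(line):
                # leveldb LOG lines begin with a timestamp, so a plain
                # string sort puts the merged list in chronological order.
                events.append((line.split()[0], partition, line.rstrip()))

events.sort()
for ts, partition, line in events:
    print("%s %s" % (partition, line))

If that is roughly the right approach, I'll line the merged timeline up against the munin graph.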
On Thu, Nov 1, 2012 at 5:48 PM, Matthew Von-Maszewski <matth...@basho.com> wrote:

> Dietrich,
>
> I can see your concern with the write IOS statistic. Let me comment on the easy question first: block_size.
>
> The block_size parameter in 1.1 was not getting passed to leveldb from the erlang layer. You were using a 4096-byte block parameter no matter what you typed in the app.config. The block_size is used by leveldb as a threshold. Once you accumulate enough data above that threshold, the current block is written to disk and a new one started. If you have 10k data values, you get one data item per block and its size is ~10k. If you have 1k data values, you get about four per block and the block is about 4k.
>
> We recommend 4k blocks to help read performance. The entire block has to run through decompression and potentially CRC calculation when it comes off the disk. That CPU time really kills any disk performance gains from having larger blocks. Ok, that might change in 1.3 as we enable hardware CRC ... but only if you have "verify_checksums, true" in app.config.
>
> Back to performance: I have not seen the change your graph details when testing with SAS drives under moderate load. I am only today starting qualification tests with SSD drives.
>
> But my 1.2 and 1.3 tests focus on drive / Riak saturation. 1.1 has the nasty tendency to stall (intentionally) when we saturate the write side of leveldb. The stall was measured in seconds or even minutes in 1.1. 1.2.1 has a write throttle that forecasts leveldb's stall state and incrementally slows the individual writes to prevent the stalls. Maybe that is what is being seen in the graph. The only way to know for sure is to get a dump of your leveldb LOG files, combine them, then compare compaction activity to your graph.
>
> Write stalls are detailed here:
> http://basho.com/blog/technical/2012/10/30/leveldb-in-riak-1p2/
>
> How can I better assist you at this point?
>
> Matthew
>
>
> On Nov 1, 2012, at 8:13 PM, Dietrich Featherston wrote:
>
> We've just gone through the process of upgrading two Riak clusters from 1.1 to 1.2.1. Both are on the leveldb backend backed by RAID0'd SSDs. The process has gone smoothly and we see that latencies as measured at the gen_fsm level are largely unaffected.
>
> However, we are seeing some troubling disk statistics, and I'm looking for an explanation before we upgrade the remainder of our nodes. The source of the worry seems to be a huge amplification in the number of writes serviced by the disk, which may be the cause of rising io wait times.
>
> My first thought was that this could be due to some leveldb tuning in 1.2.1 which increases file sizes per the release notes (https://github.com/basho/riak/blob/master/RELEASE-NOTES.md). But nodes that were upgraded yesterday are still showing this symptom. I would have expected any block re-writing to have subsided by now.
>
> Next hypothesis has to do with block size overriding in app.config. In 1.1, we had specified custom block sizes of 256k. We removed this prior to upgrading to 1.2.1 at the advice of #riak since block size configuration was ignored prior to 1.2 ('"block_size" parameter within app.config for leveldb was ignored. This parameter is now properly passed to leveldb.' --> https://github.com/basho/riak/commit/f12596c221a9d942cc23d8e4fd83c9ca46e02105).
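(Aside, mostly to check that I follow the block_size arithmetic in Matthew's explanation above: a toy Python sketch of the thresholding as I understand it, not leveldb's actual block builder.)

# Toy model of the block_size threshold described above: values are
# appended to the current block, and once the accumulated size crosses
# block_size the block is flushed and a new one started.
def blocks_for(value_size, value_count, block_size=4096):
    blocks, current = [], 0
    for _ in range(value_count):
        current += value_size
        if current >= block_size:      # threshold crossed -> flush block
            blocks.append(current)
            current = 0
    if current:
        blocks.append(current)
    return blocks

for value_size in (1024, 10 * 1024):
    sizes = blocks_for(value_size, 100)
    print("value_size=%6d -> %d blocks, ~%d bytes each, ~%d values/block"
          % (value_size, len(sizes), sizes[0], sizes[0] // value_size))

# With 1k values you get ~4 values per ~4k block; with 10k values every
# block holds a single ~10k value, matching the description above.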
> I'm wondering if the block size parameter really was being passed to leveldb after all, and, having removed it, blocks are now being rewritten at a new size, perhaps different from what they were being written at before (https://github.com/basho/riak_kv/commit/ad192ee775b2f5a68430d230c0999a2caabd1155).
>
> Here is the output of the following script showing the increased writes to disk (https://gist.github.com/37319a8ed2679bb8b21d):
>
> --an upgraded 1.2.1 node--
> read ios: 238406742
> write ios: 4814320281
> read/write ratio: .04952033
> avg wait: .10712340
> read wait: .49174364
> write wait: .42695475
>
> --a node still running 1.1--
> read ios: 267770032
> write ios: 944170656
> read/write ratio: .28360342
> avg wait: .34237204
> read wait: .47222371
> write wait: 1.83283749
>
> And here's what munin is showing us in terms of avg io wait times.
>
> <image.png>
>
> Any thoughts on what might explain this?
>
> Thanks,
> D
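(For context on how to read those numbers: they are the kind of per-device counters exposed in /proc/diskstats. The Python below is a rough sketch of that arithmetic, not the gist itself; in particular, the field choices and treating "avg wait" as io_ticks over total ios are assumptions on my part.)

# Sketch of deriving stats like the ones above from /proc/diskstats.
# Field layout after major/minor/name: 1=reads completed, 4=ms reading,
# 5=writes completed, 8=ms writing, 10=ms spent doing I/O (io_ticks).
import sys

def disk_stats(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                counters = [int(x) for x in fields[3:]]
                reads, read_ms = counters[0], counters[3]
                writes, write_ms = counters[4], counters[7]
                io_ticks = counters[9]
                print("read ios: %d" % reads)
                print("write ios: %d" % writes)
                print("read/write ratio: %.8f" % (float(reads) / writes))
                # "avg wait" as io_ticks over total ios is an assumption.
                print("avg wait: %.8f" % (float(io_ticks) / (reads + writes)))
                print("read wait: %.8f" % (float(read_ms) / reads))
                print("write wait: %.8f" % (float(write_ms) / writes))
                return
    print("device %s not found" % device)

if __name__ == "__main__":
    disk_stats(sys.argv[1] if len(sys.argv) > 1 else "sda")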
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com