Hi Matthew,

I believe 60 seconds is the default timeout in the client, so it is possible that the `busy_dist_port` issues caused a request to time out and that the automatic retry then succeeded.
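One way to check the timeout-plus-retry theory is to see whether the slow requests cluster just above 60 seconds rather than being spread out. Below is a minimal Python sketch that does this, assuming the application log uses the same "Riak load time for N: ..." format as the excerpt quoted in this thread. The names `LOAD_TIME` and `slow_loads` and the 60 second default threshold are my own choices for illustration and are not part of Riak or the Java client.

```python
import re

# Matches the "Riak load time" lines from the quoted log excerpt, e.g.
#   "... Riak load time for 62: 1 minutes, 0.190 seconds"
# The "N minutes, " part is optional because fast requests omit it.
LOAD_TIME = re.compile(
    r"Riak load time for (?P<id>\d+): "
    r"(?:(?P<minutes>\d+) minutes, )?(?P<seconds>\d+(?:\.\d+)?) seconds"
)


def slow_loads(lines, threshold=60.0):
    """Return (id, total_seconds) for each load at or above threshold."""
    slow = []
    for line in lines:
        match = LOAD_TIME.search(line)
        if match is None:
            continue  # not a load-time line
        minutes = int(match.group("minutes") or 0)
        total = minutes * 60 + float(match.group("seconds"))
        if total >= threshold:
            slow.append((int(match.group("id")), total))
    return slow
```

Calling `slow_loads(open(path))` on a log file (where `path` is wherever your application writes its log) returns the slow entries. If nearly all of them sit a fraction of a second above 60, that is consistent with one client timeout followed by a successful retry.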
A small +zdbbl value will cause the internal distribution buffers to fill up and result in `busy_dist_port` messages, which will cause performance problems. I would recommend setting +zdbbl to 16384 (16MB) or 32768 (32MB) and verifying that you stop seeing `busy_dist_port` messages in the logs. If problems persist, it may be necessary to set it even higher.

It is also important to note that `busy_dist_port` messages can be caused by individual large objects even if +zdbbl is set to a reasonably large value as outlined above. Riak 1.4.8 introduced logging of large objects, which allows you to identify objects that could cause problems by going through the logs. You can also track large objects by trending the `node_get_fsm_objsize_100` statistic.

Best regards,

Christian

On Wed, Mar 12, 2014 at 5:43 AM, Matthew MacClary <maccl...@lifetime.oregonstate.edu> wrote:

> Hi everyone, we are running Riak 1.4.1 on RHEL 6.2 using bitcask. We are using protobufs with the Java client, and our binary objects are typically a few hundred KB in size. I have noticed a persistent anomaly with riak reads and writes. It seems like often, maybe 0.5% of the time, writing to Riak takes 60 seconds longer than it should. Here is a prime example I just trimmed from a log file (see the last time entry below).
>
> This is not bitcask merge related because it happens before any of the bitcask slabs are large enough to merge. I am seeing lots of busy_dist_port messages in the Riak logs. One unique setting is that we have a small zdbbl setting of 128K because this seemed to prevent congestive collapse of throughput at high sustained loads. I believe that this 60 second timeout persisted across the various zdbbl settings we tried. Also we see this occasional 60 second delay on both VMs and real server hardware.
>
> Does anyone know where this 60 second delay comes from?
>
> Thanks!
>
> -Matt
>
>
> 2014-03-11 20:40:00,747 INFO [Thread-61] sally.ReportHandler - Riak load time for 88: 0.087 seconds
> 2014-03-11 20:40:01,137 INFO [Thread-62] sally.ReportHandler - Riak load time for 70: 0.185 seconds
> 2014-03-11 20:40:01,958 INFO [Thread-63] sally.ReportHandler - Riak load time for 97: 0.054 seconds
> 2014-03-11 20:40:02,566 INFO [Thread-64] sally.ReportHandler - Riak load time for 90: 0.043 seconds
> 2014-03-11 20:40:02,830 INFO [Thread-65] sally.ReportHandler - Riak load time for 85: 0.051 seconds
> 2014-03-11 20:40:04,162 INFO [Thread-66] sally.ReportHandler - Riak load time for 101: 0.075 seconds
> 2014-03-11 20:40:04,503 INFO [Thread-67] sally.ReportHandler - Riak load time for 103: 0.048 seconds
> 2014-03-11 20:40:05,745 INFO [Thread-68] sally.ReportHandler - Riak load time for 98: 0.031 seconds
> 2014-03-11 20:40:06,041 INFO [Thread-69] sally.ReportHandler - Riak load time for 102: 0.063 seconds
> 2014-03-11 20:40:06,444 INFO [Thread-70] sally.ReportHandler - Riak load time for 92: 0.022 seconds
> 2014-03-11 20:40:06,903 INFO [Thread-71] sally.ReportHandler - Riak load time for 99: 0.039 seconds
> 2014-03-11 20:40:09,847 INFO [Thread-72] sally.ReportHandler - Riak load time for 106: 0.019 seconds
> 2014-03-11 20:40:10,107 INFO [Thread-73] sally.ReportHandler - Riak load time for 108: 0.043 seconds
> 2014-03-11 20:40:47,820 INFO [Thread-52] sally.ReportHandler - Riak load time for 62: 1 minutes, 0.190 seconds
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com