Hi,

We’re currently expanding our cluster to increase the number of IOPS we can 
provide to clients. We’re still on Hammer but in the process of upgrading to 
Jewel. Over the last few days we started adding pure-SSD OSDs (based on MICRON 
S610DC-3840), and the slow requests we’ve seen in the past have started to show 
a different pattern.

I’m currently seeing these:

2016-12-05 15:13:37.527469 osd.60 172.22.4.46:6818/19894 8080 : cluster [WRN] 5 
slow requests, 1 included below; oldest blocked for > 31.675358 secs
2016-12-05 15:13:37.527478 osd.60 172.22.4.46:6818/19894 8081 : cluster [WRN] 
slow request 31.674886 seconds old, received at 2016-12-05 15:13:05.852525: 
osd_op(client.518589944.0:2734750 rbd_data.1e2b40f879e2a9e3.00000000000000a2 
[stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1892352~4096] 
277.ceaf1c22 ack+ondisk+write+known_if_redirected e1107736) currently waiting 
for rw locks

As slow requests happen to us a lot, I’m willing to invest some time to 
understand them in more depth. I’d be happy to either write an open source tool 
to help interpret and diagnose them, or at least write a blog post. The 
documentation and Google don't say much about how to interpret these messages.
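
As a starting point for such a tool, here is a minimal sketch of how one of 
these lines could be split into fields. The function name and the field names 
(age, pg, flags, epoch, state) are my own labels, not taken from the Ceph 
source, and the regular expression only covers the message shape shown above 
(after undoing the mail line wrapping); other slow request variants would need 
more patterns.

```python
import re

# Matches the "slow request ... osd_op(...) currently ..." shape shown above.
# Group names are illustrative labels, not official Ceph terminology.
SLOW_REQUEST = re.compile(
    r"(?P<osd>osd\.\d+) .*?slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) \[(?P<ops>.*?)\] "
    r"(?P<pg>\S+) (?P<flags>\S+) e(?P<epoch>\d+)\) "
    r"currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return a dict of fields for one slow request line, or None."""
    match = SLOW_REQUEST.search(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["age"] = float(info["age"])
    info["epoch"] = int(info["epoch"])
    # The bracketed part is a comma-separated list of sub-operations.
    info["ops"] = [op.strip() for op in info["ops"].split(",")]
    info["flags"] = info["flags"].split("+")
    return info


# The second log line from above, with the mail line wrapping removed.
example = (
    "2016-12-05 15:13:37.527478 osd.60 172.22.4.46:6818/19894 8081 : "
    "cluster [WRN] slow request 31.674886 seconds old, received at "
    "2016-12-05 15:13:05.852525: osd_op(client.518589944.0:2734750 "
    "rbd_data.1e2b40f879e2a9e3.00000000000000a2 [stat,set-alloc-hint "
    "object_size 4194304 write_size 4194304,write 1892352~4096] "
    "277.ceaf1c22 ack+ondisk+write+known_if_redirected e1107736) "
    "currently waiting for rw locks"
)

info = parse_slow_request(example)
print(info["osd"], info["age"], info["state"])
```

Grouping the parsed lines by state and by OSD would be the obvious next step, 
for example to see whether “waiting for rw locks” clusters on the new SSD OSDs.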

So. Two questions:

- any hints (aside from meticulously reading the source) on interpreting those 
slow request messages in detail?
- specifically, “waiting for rw locks” is new to us. Can someone enlighten me 
as to what it means, given the message above?

Cheers,
Christian

-- 
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
