Re: [ceph-users] osd_op_tp timeouts

2017-06-13 Thread Eric Choi
I realized I sent this under the wrong thread, so I am sending it again: --- Hello all, I work on the same team as Tyler, and I can provide more info. The cluster is indeed an RGW cluster, with many small (100 KB) objects, similar to your use case, Bryan. But we have the blind bucket se

Re: [ceph-users] osd_op_tp timeouts

2017-06-13 Thread Bryan Stillwell
users on behalf of Tyler Bischel | Date: Monday, June 12, 2017 at 5:12 PM | To: "ceph-us...@ceph.com" | Subject: [ceph-users] osd_op_tp timeouts | Hi, We've been having this ongoing problem with threads timing out on the OSDs. Typically we'll see the OSD become unresponsive for a

Re: [ceph-users] osd_op_tp timeouts

2017-06-13 Thread Mark Nelson
Hi Tyler, I wanted to make sure you got a reply to this, but unfortunately I don't have much to give you. It sounds like you already took a look at the disk metrics, and based on your description Ceph is probably not waiting on disk I/O. If you can easily invoke the problem, you could attach g

[ceph-users] osd_op_tp timeouts

2017-06-12 Thread Tyler Bischel
Hi, We've been having this ongoing problem with threads timing out on the OSDs. Typically we'll see the OSD become unresponsive for about a minute, as threads from other OSDs time out. The timeouts don't seem to be correlated with high load. We turned up the logs to 10/10 for part of a day to ca