Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-04-01 Thread mart.v
Thanks for this advice. It helped me identify a subset of devices (only 3 in the whole cluster) where this problem was happening. The SAS adapter (LSI SAS 3008) on my Supermicro board was the issue. RAID mode is enabled by default. I have flashed the latest firmware (v

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-03-11 Thread mart.v
From: ceph-users on behalf of Paul Emmerich Sent: Friday, February 22, 2019 9:04 AM To: Massimo Sgaravatto Cc: Ceph Users Subject: Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time   Bad SSDs can also cause this. Which SSD are you using? Paul -- Paul Emmerich Lookin

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-28 Thread Matthew H
ebruary 22, 2019 9:04 AM To: Massimo Sgaravatto Cc: Ceph Users Subject: Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time Bad SSDs can also cause this. Which SSD are you using? Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Fr

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-26 Thread Massimo Sgaravatto
On Mon, Feb 25, 2019 at 9:26 PM mart.v wrote: > - As far as I understand the reported 'implicated osds' are only the primary ones. In the log of the osds you should find also the relevant pg number, and with this information you can get all the involved OSDs. This might be useful e.g. to se

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-25 Thread mart.v
- As far as I understand the reported 'implicated osds' are only the primary ones. In the log of the osds you should find also the relevant pg number, and with this information you can get all the involved OSDs. This might be useful e.g. to see if a specific OSD node is always involved. This w

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-25 Thread mart.v
times are different each day so it is not a periodic task. Martin -- Original e-mail -- From: David Turner To: mart.v Date: 22. 2. 2019 12:23:37 Subject: Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time " Can you correlate the times to scheduled tasks inside o

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-25 Thread mart.v
I'm using a combination of Intel S4510 and Micron 5200 MAX. Slow requests are happening on both brands. Martin -- Original e-mail -- From: Paul Emmerich To: Massimo Sgaravatto Date: 22. 2. 2019 15:08:29 Subject: Re: [ceph-users] REQUEST_SLOW across many OSDs a

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread Paul Emmerich
Bad SSDs can also cause this. Which SSD are you using? Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, Feb 22, 2019 at 2:53 PM Massimo Sgaravatto wrote: > > A

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread Massimo Sgaravatto
A couple of hints to debug the issue (since I recently had to debug a problem with the same symptoms): - As far as I understand, the reported 'implicated osds' are only the primary ones. In the OSD logs you should also find the relevant PG number, and with this information you can get all th
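The hint above (map each affected PG to all of its OSDs, then check whether one node keeps turning up) could be sketched roughly as follows. This is only an illustration under stated assumptions: the PG-to-OSD mapping is assumed to have been collected by hand, for example from `ceph pg map <pgid>`, and the OSD-to-host mapping from `ceph osd tree`. The function name, PG IDs, OSD numbers and host names are all hypothetical.

```python
from collections import Counter

def count_involved_hosts(pg_to_osds, osd_to_host):
    """Count, per host, how many affected PGs have at least one OSD on that host."""
    counts = Counter()
    for osds in pg_to_osds.values():
        # Use a set so a host is counted once per PG, even if it holds several OSDs.
        hosts = {osd_to_host[osd] for osd in osds}
        counts.update(hosts)
    return counts

# Hypothetical data: PG IDs taken from slow-request log entries, with their OSD sets.
pg_to_osds = {
    "1.2f": [3, 7, 12],
    "1.a0": [7, 15, 20],
    "1.11": [1, 7, 9],
}
# Hypothetical OSD-to-host mapping.
osd_to_host = {
    1: "node-a", 3: "node-a", 7: "node-b", 9: "node-c",
    12: "node-c", 15: "node-d", 20: "node-e",
}

counts = count_involved_hosts(pg_to_osds, osd_to_host)
for host, n in counts.most_common():
    print(host, n)
```

A host that appears in nearly every affected PG is a candidate for closer inspection (controller, disks, network), although a small sample can easily mislead.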

Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread David Turner
Can you correlate the times to scheduled tasks inside of any VMs? For instance, if you have several Linux VMs with the updatedb command installed, by default they will all scan their disks at the same time each day to see where files are. Other common culprits could be scheduled backups,
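One rough way to check for such a correlation is to bucket the times of the slow-request warnings by hour of day and see whether they cluster. The sketch below assumes the timestamps have already been extracted from the cluster log into `YYYY-MM-DD HH:MM:SS` strings; the function name and the sample values are hypothetical, not taken from this thread.

```python
from collections import Counter
from datetime import datetime

def events_per_hour(timestamps):
    """Return a Counter mapping hour of day (0-23) to the number of events in that hour."""
    return Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps
    )

# Hypothetical timestamps of slow-request warnings.
timestamps = [
    "2019-02-20 06:25:03",
    "2019-02-21 06:27:41",
    "2019-02-21 14:02:10",
    "2019-02-22 06:25:59",
]

per_hour = events_per_hour(timestamps)
for hour, n in sorted(per_hour.items()):
    print(f"{hour:02d}:00  {n}")
```

A strong peak in one hour would point towards a scheduled job; an even spread would suggest looking elsewhere.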

[ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-02-22 Thread mart.v
Hello everyone, I'm experiencing strange behaviour. My cluster is relatively small (43 OSDs, 11 nodes), running Ceph 12.2.10 (and Proxmox 5). Nodes are connected via a 10 Gbit network (Nexus 6000). The cluster is mixed (SSD and HDD), but with different pools. The described error is only on the SSD par