Hi Lukasz,
If this is filestore then most probably my comments are irrelevant; the
issue I expected is BlueStore-specific.
Unfortunately I'm not an expert in filestore, hence unable to help with
further investigation. Sorry...
Thanks,
Igor
On 7/9/2019 11:39 AM, Luk wrote:
We have (still) on these OSDs filestore.
Regards
Lukasz
> Hi Igor,
> Thank You for Your input, will try Your suggestion with
> ceph-objectstore-tool.
> But for now it looks like the main problem is this:
> 2019-07-09 09:29:25.410839 7f5e4b64f700 1 heartbeat_map is_healthy
> 'OSD::o
Hi Igor,
Thank You for Your input, will try Your suggestion with
ceph-objectstore-tool.
But for now it looks like the main problem is this:
2019-07-09 09:29:25.410839 7f5e4b64f700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f5e20e87700' had timed out after 15
2019-07-09 09:
Hi Lukasz,
I've seen something like that - slow requests and relevant OSD reboots
on suicide timeout at least twice with two different clusters. The root
cause was slow omap listing for some objects which had started to happen
after massive removals from RocksDB.
To verify if this is the cas
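(Igor's verification step is cut off above. Independent of whatever he had in mind, a rough generic way to spot slow omap listing on a bucket index object is simply to time a key listing; the object name below is only a placeholder, not taken from this thread:)

  # time an omap key listing on a suspect index object; output that stalls
  # or takes many seconds points at the slow-omap-listing problem described above
  time rados -p default.rgw.buckets.index listomapkeys .dir.example-bucket-id | wc -l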
On Wed, Jul 3, 2019 at 4:47 PM Luk wrote:
>
>
> this pool is that 'big' :
>
> [root@ceph-mon-01 ~]# rados df | grep -e index -e WR
> POOL_NAME USED OBJECTS CLONES COPIES
> MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
>
> default.rgw.buckets.index
Hi,
> On Wed 3 July 2019 at 09:01, Luk wrote:
> Hello,
> I have a strange problem with scrubbing.
> When scrubbing starts on a PG which belongs to the default.rgw.buckets.index
> pool, I can see that this OSD is very busy (see attachment) and starts
> showing many
> slow requests, after the
On Wed 3 July 2019 at 09:01, Luk wrote:
> Hello,
>
> I have a strange problem with scrubbing.
>
> When scrubbing starts on a PG which belongs to the default.rgw.buckets.index
> pool, I can see that this OSD is very busy (see attachment) and starts
> showing many
> slow requests, after the scrubbin
Hello Robert,
I did not make any changes, so I'm still using the prio queue.
Regards
On Mon 10 June 2019 at 17:44, Robert LeBlanc wrote:
> I'm glad it's working, to be clear did you use wpq, or is it still the
> prio queue?
>
> Sent from a mobile device, please excuse any typos.
>
> On Mon, J
I'm glad it's working, to be clear did you use wpq, or is it still the prio
queue?
Sent from a mobile device, please excuse any typos.
On Mon, Jun 10, 2019, 4:45 AM BASSAGET Cédric
wrote:
> an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
>
> On Mon 10 June 2019 at 12:25, BASS
an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
On Mon 10 June 2019 at 12:25, BASSAGET Cédric wrote:
> Hi Robert,
> Before doing anything on my prod env, I generate r/w on ceph cluster using
> fio .
> On my newest cluster, release 12.2.12, I did not manage to get
> the (REQ
Hi Robert,
Before doing anything on my prod env, I generate r/w on ceph cluster using
fio .
On my newest cluster, release 12.2.12, I did not manage to get
the (REQUEST_SLOW) warning, even if my OSD disk usage goes above 95% (fio
ran from 4 different hosts)
On my prod cluster, release 12.2.9, as soo
On Mon, Jun 10, 2019 at 1:00 AM BASSAGET Cédric <
cedric.bassaget...@gmail.com> wrote:
> Hello Robert,
> My disks did not reach 100% on the last warning, they climb to 70-80%
> usage. But I see rrqm / wrqm counters increasing...
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
Hello Robert,
My disks did not reach 100% on the last warning, they climb to 70-80%
usage. But I see rrqm / wrqm counters increasing...
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz
avgqu-sz await r_await w_await svctm %util
sda 0.00 4.00 0.00
With the low number of OSDs, you are probably saturating the disks. Check
with `iostat -xd 2` and see what the utilization of your disks is. A lot
of SSDs don't perform well with Ceph's heavy sync writes and performance is
terrible.
If some of your drives are 100% while others are lower utilizati
ate: 395 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> >> > 2019-05-20 00:04:19.234877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173641 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 238 slow
> requests are blocked > 32 sec. Impli
-s43 mon.0 10.23.27.153:6789/0
>> > 173640 : cluster [WRN] Health check update: 395 slow requests are blocked
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:04:19.234877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
>> > 173641 : cluster [INF
/0
> 174035 : cluster [INF] overall HEALTH_OK
> >
> > The parameters of our environment:
> >
> > Storage System (OSDs and MONs)
> >
> > Ceph 12.2.11
> > Ubuntu 16.04/1804
> > 30 * 8GB spinners distributed over
> >
> > Client
> >
> &g
On Tue, May 21, 2019 at 11:28 AM Marc Schöchlin wrote:
>
> Hello Jason,
>
> On 20.05.19 at 23:49, Jason Dillaman wrote:
>
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an rbd
> volume.
> (Thi
Hello Jason,
On 20.05.19 at 23:49, Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>> Hello cephers,
>>
>> we have a few systems which utilize an rbd-nbd map/mount to get access to an
>> rbd volume.
>> (This problem seems to be related to "[ceph-users] Slow requests f
>
> [client]
> rbd cache = true
> rbd cache size = 536870912
> rbd cache max dirty = 268435456
> rbd cache target dirty = 134217728
> rbd cache max dirty age = 30
> rbd readahead max bytes = 4194304
>
>
> Regards
> Marc
>
> On 13.05.19 at 07:40, EDH -
el Rios Fernandez wrote:
> Hi Marc,
>
> Try to compact OSD with slow request
>
> ceph tell osd.[ID] compact
>
> This will make the OSD offline for some seconds(SSD) to minutes(HDD) and
> perform a compact of OMAP database.
>
> Regards,
>
>
>
>
> -----Mens
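As an illustrative sketch of the compaction advice above (ceph tell osd.[ID] compact), compacting a handful of suspect OSDs one after another could look like the loop below; the OSD IDs are placeholders, not taken from any cluster in this thread:

  # compact the OMAP/RocksDB of a few OSDs showing slow requests;
  # each OSD stops serving I/O briefly while it compacts, so go one at a time
  for id in 12 27 51; do
      ceph tell osd.$id compact
  done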
Quoting Marc Schöchlin (m...@256bit.org):
> Our new setup is now:
> (12.2.10 on Ubuntu 16.04)
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent s
ome seconds(SSD) to minutes(HDD) and
> perform a compact of OMAP database.
>
> Regards,
>
>
>
>
> -----Original Message-----
> From: ceph-users On behalf of Marc Schöchlin
> Sent: Monday, 13 May 2019 6:59
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-
May 2019 6:59
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow requests from bluestore osds
Hello cephers,
one week ago we replaced the bluestore cache size settings with "osd memory target" and
removed the detailed memory settings.
This storage class now runs 42*8GB spinners with a
Hello cephers,
one week ago we replaced the bluestore cache size settings with "osd memory target" and
removed the detailed memory settings.
This storage class now runs 42*8GB spinners with a permanent write workload of
2000-3000 write IOPS, and 1200-8000 read IOPS.
Our new setup is now:
(12.2.10 on Ubuntu
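For reference, a minimal sketch of what such an [osd] section could look like; the 4 GiB value here is only an example, not the setting actually used on this cluster:

  [osd]
  # osd_memory_target takes a value in bytes; the OSD sizes its caches to stay under it
  osd memory target = 4294967296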
Hello cephers,
as described - we also have the slow requests in our setup.
We recently updated from ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the
latest patchlevel (with kernel 4.15.0-43) and applied Dell firmware 2.8.0.
On 12.2.5 (before updating the cluster) we had in a frequency of 10m
On Fri, Jan 18, 2019 at 11:06:54AM -0600, Mark Nelson wrote:
> IE even though you guys set bluestore_cache_size to 1GB, it is being
> overridden by bluestore_cache_size_ssd.
Isn't it vice versa [1]?
[1]
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L3976
--
Mykola G
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such
issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-
From: Stefan Priebe - Profih
>>>>>> 21,1318474'61584855] local-lis/les=1318472/1318473 n=1912
>>>>>> ec=133405/133405 lis/c 1318472/1278145 les/c/f
>>>>>> 1318473/1278148/1211861 131
>>>>>> 8472/1318472/1318472) [33,3,22] r=0 lpr=1318472
>>>>>> pi
s only in the recovery case.
Greets,
Stefan
On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel -
aded m=183 snaptrimq=[ec1a0~1,ec808~1]
>>>> mbc={255={(2+0)=183,(3+0)=3}}] _update_calc_stats ml 183 upset size 3 up 2
>>>>
>>>> Greets,
>>>> Stefan
>>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>>> Hi,
>&
>>> Greets,
>>> Stefan
>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> no ok it was not. Bug still present. It was only working because the
>>>> osdmap was so far away that it has started backf
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent:
> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>>
>>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
>>>>> (osd's are
such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.
the recovery case.
>>
>> Greets,
>> Stefan
>>
>> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>
>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>
>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without su
>> it turns out this was a kernel bug. Updating to a newer kernel - has
>> solved this issue.
>>
>> Greets,
>> Stefan
>>
>>
>>> -Original Message-
>>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>>> Sen
-Original Message-
>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>> Sent: 15 January 2019 10:26
>> To: ceph-users@lists.ceph.com
>> Cc: n.fahldi...@profihost.ag
>> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
>
ofihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-users] slow requests and high i/o / read rate on
bluestore osds after upgrade 12.2.8 -> 12.2.10
Hello list,
I also tested the current upstream/luminous branch and it happens as well. A
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
> Sent: 15 January 2019 10:26
> To: ceph-users@lists.ceph.com
> Cc: n.fahldi...@profihost.ag
> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
> bluestore osds after upgrade 12.2.8 -> 12.2.10
>
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
-Original Message-
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-
Hello list,
I also tested the current upstream/luminous branch and it happens as well. A
clean install works fine. It only happens on upgraded bluestore osds.
Greets,
Stefan
On 14.01.19 at 20:35, Stefan Priebe - Profihost AG wrote:
> while trying to upgrade a cluster from 12.2.8 to 12.2.10 i'm expe
Hi Stefan,
Any idea if the reads are constant or bursty? One cause of heavy reads
is when rocksdb is compacting and has to read SST files from disk. It's
also possible you could see heavy read traffic during writes if data has
to be read from SST files rather than cache. It's possible this
Hi Paul,
On 14.01.19 at 21:39, Paul Emmerich wrote:
> What's the output of "ceph daemon osd. status" on one of the OSDs
> while it's starting?
{
"cluster_fsid": "b338193d-39e0-40e9-baba-4965ef3868a3",
"osd_fsid": "d95d0e3b-7441-4ab0-869c-fe0551d3bd52",
"whoami": 2,
"state": "act
What's the output of "ceph daemon osd. status" on one of the OSDs
while it's starting?
Is the OSD crashing and being restarted all the time? Anything weird
in the log files? Was there recovery or backfill during the upgrade?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contac
Hi all,
after increasing the mon_max_pg_per_osd value, ceph starts rebalancing as usual.
However, the slow requests warnings are still there, even after setting
primary-affinity to 0 beforehand.
On the other hand, if I destroy the osd, ceph will start rebalancing unless
noout flag is set, am I ri
You can prevent creation of the PGs on the old filestore OSDs (which
seems to be the culprit here) during replacement by replacing the
disks the hard way:
* ceph osd destroy osd.X
* re-create with bluestore under the same id (ceph volume ... --osd-id X)
it will then just backfill onto the same di
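A sketch of that sequence; the OSD id (23) and device path (/dev/sdf) are hypothetical placeholders:

  # mark the OSD destroyed while keeping its id and CRUSH position
  ceph osd destroy 23 --yes-i-really-mean-it
  # re-create a bluestore OSD on the same device under the same id
  ceph-volume lvm create --bluestore --data /dev/sdf --osd-id 23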
Hi,
to reduce impact on clients during migration I would set the OSD's
primary-affinity to 0 beforehand. This should prevent the slow
requests, at least this setting has helped us a lot with problematic
OSDs.
Regards
Eugen
Quoting Jaime Ibar:
Hi all,
we recently upgrade from Jewel
Hello,
2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update:
249 PGs pending on creation (PENDING_CREATING_PGS)
This error might indicate that you are hitting a PG limit per osd.
Here some information on it
https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
migh
I solved my slow requests by increasing the size of block.db. Calculate 4% per
stored TB and preferably host the DB on NVMe.
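As a worked example of that rule of thumb (the 6 TB figure is just an example): 4% of 6 TB of stored data is roughly 240 GB of block.db per OSD.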
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Hello Uwe,
as described in my mail we are running 4.13.0-39.
In conjunction with some later mails in this thread it seems that this problem
might be related to OS/microcode (Spectre) updates.
I am planning a ceph/ubuntu upgrade in the next week because of various
reasons, let's see what happens...
On Sat, Sep 01, 2018 at 12:45:06PM -0400, Brett Chancellor wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although
Mine is currently at 1000 due to the high number of pgs we had coming from
Jewel. I do find it odd that only the bluestore OSDs have this issue.
Filestore OSDs seem to be unaffected.
On Wed, Sep 5, 2018, 3:43 PM Samuel Taylor Liston
wrote:
> Just a thought - have you looked at increasing your "--
Just a thought - have you looked at increasing your "--mon_max_pg_per_osd" both
on the mons and osds? I was having a similar issue while trying to add more
OSDs to my cluster (12.2.27, CentOS7.5, 3.10.0-862.9.1.el7.x86_64). I
increased mine to 300 temporarily while adding OSDs and stopped havi
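A sketch of raising it at runtime along the lines described above; the value 300 mirrors the temporary value mentioned, and injectargs changes are not persistent, so the same value would also need to go into ceph.conf to survive restarts:

  ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'
  ceph tell osd.* injectargs '--mon_max_pg_per_osd 300'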
I've experienced the same thing during scrubbing and/or any kind of
expansion activity.
*Daniel Pryor*
On Mon, Sep 3, 2018 at 2:13 AM Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior for some weeks on our not
> so performance critical hdd pools.
> We haven't spent
I'm running Centos 7.5. If I turn off spectre/meltdown protection then a
security sweep will disconnect it from the network.
-Brett
On Wed, Sep 5, 2018 at 2:24 PM, Uwe Sauter wrote:
> I'm also experiencing slow requests though I cannot point it to scrubbing.
>
> Which kernel do you run? Would y
I'm also experiencing slow requests though I cannot point it to scrubbing.
Which kernel do you run? Would you be able to test against the same kernel with Spectre/Meltdown mitigations disabled
("noibrs noibpb nopti nospectre_v2" as boot option)?
Uwe
On 05.09.18 at 19:30, Brett
Marc,
As with you, this problem manifests itself only when the bluestore OSD is
involved in some form of deep scrub. Anybody have any insight on what
might be causing this?
-Brett
On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior f
The warnings look like this.
6 ops are blocked > 32.768 sec on osd.219
1 osds have slow requests
On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote:
> On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
> wrote:
> > Hi Cephers,
> > I am in the process of upgrading a cluster from Filestore to blue
On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although the
> sp
2. What is the best way to remove an OSD node from the cluster during
maintenance? ceph osd set noout is not the way to go, since no OSD's are
out during yum update and the node is still part of the cluster and will
handle I/O.
I think the best way is the combination of "ceph osd set noout" + stop
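(The suggestion above is cut short. A common sequence along those lines, as a sketch assuming systemd-managed OSDs on the node being updated, might be:)

  ceph osd set noout
  systemctl stop ceph-osd.target     # on the node going into maintenance
  # ... yum update, reboot, etc. ...
  systemctl start ceph-osd.target
  ceph osd unset noout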
On Mon, Jul 9, 2018 at 5:28 PM, Benjamin Naber
wrote:
> Hi @all,
>
> Problem seems to be solved, afther downgrading from Kernel 4.17.2 to
> 3.10.0-862.
> Anyone other have issues with newer Kernels and osd nodes?
I'd suggest you pursue that with whoever supports the kernel
exhibiting the problem
On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber wrote:
> Hi @all,
>
> I'm currently testing a setup for a production environment based on the
> following OSD Nodes:
>
> CEPH Version: luminous 12.2.5
>
> 5x OSD Nodes with following specs:
>
> - 8 Core Intel Xeon 2,0 GHZ
>
> - 96GB Ram
>
> - 10x 1,
Hi Caspar,
thank you for the reply. I've updated all SSDs to the latest firmware. Still having the
same error. The strange thing is that this issue moves from node to node and
from OSD to OSD.
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
1 ops ar
Hi Ben,
At first glance I would say the CPUs are a bit weak for this setup.
The recommendation is to have at least 1 core per OSD. Since you have 8 cores and
10 OSD's there isn't much left for other processes.
Furthermore, did you upgrade the firmware of those DC S4500's to the latest
firmware? (SCV101
By looking at the operations that are slow in your dump_*_ops command.
We've found that it's best to move all the metadata stuff for RGW onto
SSDs, i.e., all pools except the actual data pool.
But that depends on your use case and whether the slow requests you are
seeing are actually a problem for
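One way to implement the "metadata pools on SSD" advice with device classes is sketched below; the rule name and pool name are examples, and changing a pool's rule triggers data movement for that pool:

  # replicated CRUSH rule restricted to OSDs with device class "ssd"
  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  # point a non-data RGW pool at it, e.g. the bucket index pool
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd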
Hello Paul!
Thanks for your answer.
How did you work out that it's RGW metadata stuff?
No, I don't use any SSDs. Where can I find out more about metadata
pools, using SSDs, etc.?
Thanks.
Grigory Murashov
Voximplant
15.05.2018 23:42, Paul Emmerich wrote:
Looks like it's mostly RGW metadata stuf
I've been running into slow requests with my rgw metadata pools just this
week. I tracked it down because the slow requests were on my nmve osds. I
haven't solved the issue yet, but I can confirm that no resharding was
taking place and that the auto-resharder is working as all of my larger
bucket
Looks like it's mostly RGW metadata stuff; are you running your non-data
RGW pools on SSDs (you should, that can help *a lot*)?
Paul
2018-05-15 18:49 GMT+02:00 Grigory Murashov :
> Hello guys!
>
> I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
> daemon osd.16 dump_historic
Hi Grigory,
looks like osd.16 is having a hard time acknowledging the write request (for
bucket resharding operations from what it looks like) as it takes about 15
seconds for osd.16 to receive the commit confirmation from osd.21 on subop
communication.
Have a go and check at the journal devic
Hello guys!
I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
daemon osd.16 dump_historic_ops.
Here is the output of ceph heath details in the moment of problem
HEALTH_WARN 20 slow requests are blocked > 32 sec
REQUEST_SLOW 20 slow requests are blocked > 32 sec
20 ops a
Hello David!
2. I set it to 10/10.
3. Thanks, my problem was that I did it on a host where there was no osd.15 daemon.
Could you please help me read the OSD logs?
Here is a part from ceph.log
2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0
185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healt
2. When logging the 1/5 is what's written to the log file/what's
temporarily stored in memory. If you want to increase logging, you need to
increase both numbers to 20/20 or 10/10. You can also just set it to 20 or
10 and ceph will set them to the same number. I personally do both numbers
to rem
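For example, raising both numbers at runtime on a single OSD (osd.15, as in this thread) and then putting them back to the default could look like this:

  ceph tell osd.15 injectargs '--debug_osd 10/10'
  # ... reproduce the slow request, collect the log ...
  ceph tell osd.15 injectargs '--debug_osd 1/5'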
Hi JC!
Thanks for your answer first.
1. I have added the output of ceph health detail to Zabbix in case of
warning. So every time I will see which OSD the problem is with.
2. I have the default level for all logs. As I see here
http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/
d
Hi Grigory,
are these lines the only lines in your log file for OSD 15?
Just for sanity, what are the log levels you have set, if any, in your config
file away from the default? If you set all log levels to 0 like some people do
you may want to simply go back to the default by commenting out th
Hello Jean-Charles!
I have finally caught the problem. It was at 13-02.
[cephuser@storage-ru1-osd3 ~]$ ceph health detail
HEALTH_WARN 18 slow requests are blocked > 32 sec
REQUEST_SLOW 18 slow requests are blocked > 32 sec
3 ops are blocked > 65.536 sec
15 ops are blocked > 32.768 sec
Hi,
ceph health detail
This will tell you which OSDs are experiencing the problem so you can then go
and inspect the logs and use the admin socket to find out which requests are at
the source.
Regards
JC
> On May 7, 2018, at 03:52, Grigory Murashov wrote:
>
> Hello!
>
> I'm not much experi
On Mon, Mar 5, 2018 at 11:20 PM, Brad Hubbard wrote:
> On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev
> wrote:
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>>> for the exact same thing.
>>>
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>> for the exact same thing.
>>
>>
>> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
>> wrote:
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 9:56 AM, Alex Gorbachev wrote:
>
> On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
>>
>> On 2018-03-02 07:54, Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
>> wrote:
>>
>> Blocked requests and slow requests are synonyms in ceph. They are 2 n
On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
> On 2018-03-02 07:54, Alex Gorbachev wrote:
>
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
> wrote:
>
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 P
On 2018-03-02 07:54, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
> wrote:
> On Thu, Mar 1, 2018 at 2:47 PM
On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
>> wrote:
>> > `ceph health de
Blocked requests and slow requests are synonyms in ceph. They are 2 names
for the exact same thing.
On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
> wrote:
> > `ceph health detail` should show you more information about the slow
> > requests.
On Thu, Mar 1, 2018 at 2:47 PM, David Turner wrote:
> `ceph health detail` should show you more information about the slow
> requests. If the output is too much stuff, you can grep out for blocked or
> something. It should tell you which OSDs are involved, how long they've
> been slow, etc. The
`ceph health detail` should show you more information about the slow
requests. If the output is too much stuff, you can grep out for blocked or
something. It should tell you which OSDs are involved, how long they've
been slow, etc. The default is for them to show '> 32 sec' but that may
very wel
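(A trivial sketch of the grep suggested above:)

  ceph health detail | grep -iE 'blocked|slow'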
Hi Wes,
On 15-1-2018 20:57, Wes Dillingham wrote:
My understanding is that the exact same objects would move back to the
OSD if weight went 1 -> 0 -> 1 given the same cluster state and same
object names; CRUSH is deterministic, so that would be the almost certain
result.
Ok, thanks! So this
My understanding is that the exact same objects would move back to the OSD
if weight went 1 -> 0 -> 1 given the same cluster state and same object
names; CRUSH is deterministic, so that would be the almost certain result.
On Mon, Jan 15, 2018 at 2:46 PM, lists wrote:
> Hi Wes,
>
> On 15-1-2018 20
Hi Wes,
On 15-1-2018 20:32, Wes Dillingham wrote:
I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over
the mailing list history it seems to have been brought up very
infrequently and never as a suggestion for regular maintenance. Perhaps
it's not needed.
True, it's just some
I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over
the mailing list history it seems to have been brought up very infrequently
and never as a suggestion for regular maintenance. Perhaps it's not needed.
One thing to consider trying, and to rule out something funky with the XFS
Here's some good reading for you.
https://www.spinics.net/lists/ceph-users/msg32895.html
I really like how Wido puts it, "Loosing two disks at the same time is
something which doesn't happen that much, but if it happens you don't want
to modify any data on the only copy which you still have left.
I disagree.
We have the following setting...
osd pool default size = 3
osd pool default min size = 1
There's maths that needs to be conducted for 'osd pool default size'. A
setting of 3 and 1 allows for 2 disks to fail ... at the same time ...
without a loss of data. This is standard storage
Hi David,
What is your min_size in the cache pool? If your min_size is 2, then the
cluster would block requests to that pool due to it having too few copies
available.
this is a little embarrassing, but it seems it was the min_size indeed.
I had changed this setting a couple of weeks ago, bu
PPS - or min_size 1 in production
On Wed, Nov 1, 2017 at 10:08 AM David Turner wrote:
> What is your min_size in the cache pool? If your min_size is 2, then the
> cluster would block requests to that pool due to it having too few copies
> available.
>
> PS - Please don't consider using rep_size
What is your min_size in the cache pool? If your min_size is 2, then the
cluster would block requests to that pool due to it having too few copies
available.
PS - Please don't consider using rep_size 2 in production.
On Wed, Nov 1, 2017 at 5:14 AM Eugen Block wrote:
> Hi experts,
>
> we have u
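A quick way to check and adjust the setting discussed above; "cache-pool" is a placeholder name, and note the earlier caveat that min_size 1 in production trades away safety:

  ceph osd pool get cache-pool min_size
  ceph osd pool set cache-pool min_size 1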
On Fri, Oct 20, 2017 at 8:23 PM, Ольга Ухина wrote:
> I was able to collect dump data during slow request, but this time I saw
> that it was related to high load average and iowait so I keep watching.
> And it was on particular two osds, but yesterday on other osds.
> I see in dump of these two
I was able to collect dump data during slow request, but this time I saw
that it was related to high load average and iowait so I keep watching.
And it was on particular two osds, but yesterday on other osds.
I see in dump of these two osds that operations are stuck on queued_for_pg,
for example:
Hi! Thanks for your help.
How can I increase the history interval for the command ceph daemon osd.
dump_historic_ops? It only shows the last several minutes.
I see slow requests on random osds each time and on different hosts (there
are three). As I see in logs the problem doesn't relate to scrubbing.
Regar
On Fri, Oct 20, 2017 at 1:09 PM, J David wrote:
> On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote:
>> I guess you have both read and followed
>> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>>
>> What was the result?
On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote:
> I guess you have both read and followed
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>
> What was the result?
Not sure if you’re asking Ольга or myself, but in my cas