Hi, cephers,
Recently I have been testing ceph 12.2.12 with bluestore using cosbench.
Both SATA OSDs and SSD OSDs show slow requests.
Many slow requests occur, and most of them are logged right after RocksDB
"delete wal" or table_file_deletion log entries.
Does that mean RocksDB is the bottleneck? If so, how can I improve it? If not,
how do I fix it?
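One way to check whether the OSDs are actually waiting on RocksDB is to watch the RocksDB and BlueFS perf counters on an OSD admin socket while the slow requests happen; a rough sketch, with osd.0 as a placeholder (on builds where perf dump does not accept a section name, dump everything and grep for "rocksdb"):

  ceph daemon osd.0 perf dump rocksdb
  ceph daemon osd.0 perf dump bluefs

If compaction-related counters and BlueFS read traffic jump at the same time as the slow requests, that points at RocksDB compaction rather than client I/O.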
Hi Lukasz,
If this is filestore then most probably my comments are irrelevant; the
issue I suspected is BlueStore-specific.
Unfortunately I'm not an expert in filestore, hence I'm unable to help with
further investigation. Sorry...
Thanks,
Igor
On 7/9/2019 11:39 AM, Luk wrote:
We have (still) on these OSDs filestore.
Regards
Lukasz
Hi Igor,
Thank you for your input, I will try your suggestion with
ceph-objectstore-tool.
But for now it looks like the main problem is this:
2019-07-09 09:29:25.410839 7f5e4b64f700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f5e20e87700' had timed out after 15
2019-07-09 09:
Hi Lukasz,
I've seen something like that - slow requests and the corresponding OSD
restarts on suicide timeout - at least twice, with two different clusters. The
root cause was slow omap listing for some objects, which had started to happen
after massive removals from RocksDB.
To verify if this is the case
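A rough way to check how slow the omap listing is for a suspect object (for example a bucket index object) with ceph-objectstore-tool, run with the OSD stopped; the id, PG id and object name below are placeholders:

  systemctl stop ceph-osd@<ID>
  time ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<ID> \
      --pgid <PGID> '<OBJECT>' list-omap > /dev/null

If listing the omap takes many seconds, compacting that OSD's RocksDB (as discussed later in this thread) usually brings it back to normal.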
On Wed, Jul 3, 2019 at 4:47 PM Luk wrote:
>
>
> this pool is that 'big' :
>
> [root@ceph-mon-01 ~]# rados df | grep -e index -e WR
> POOL_NAME USED OBJECTS CLONES COPIES
> MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
>
> default.rgw.buckets.index
Hello,
I have a strange problem with scrubbing.
When scrubbing starts on a PG which belongs to the default.rgw.buckets.index
pool, I can see that this OSD is very busy (see attachment) and starts showing
many slow requests; after the scrubbing of this PG stops, the slow requests
stop immediately.
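If index-pool scrubs turn out to be the trigger, one common mitigation is to throttle scrubbing; a sketch using runtime injection (the values are only examples and injected settings do not survive an OSD restart):

  ceph tell osd.\* injectargs '--osd_scrub_sleep 0.2 --osd_max_scrubs 1'
  ceph osd set noscrub        # temporarily, just to confirm scrubbing is the cause
  ceph osd set nodeep-scrub
  # ... observe, then re-enable:
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub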
Hello Robert,
I did not make any changes, so I'm still using the prio queue.
Regards
On Mon, Jun 10, 2019 at 17:44, Robert LeBlanc
wrote:
> I'm glad it's working, to be clear did you use wpq, or is it still the
> prio queue?
>
> Sent from a mobile device, please excuse any typos.
>
> On Mon, J
I'm glad it's working. To be clear, did you use wpq, or is it still the prio
queue?
Sent from a mobile device, please excuse any typos.
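For reference, switching to the WPQ scheduler is a ceph.conf change plus an OSD restart; a minimal sketch (these are the commonly suggested values, not something taken from this thread, and the cut-off setting is optional):

  [osd]
  osd op queue = wpq
  osd op queue cut off = high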
On Mon, Jun 10, 2019, 4:45 AM BASSAGET Cédric
wrote:
> an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
>
> Le lun. 10 juin 2019 à 12:25, BASS
an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
On Mon, Jun 10, 2019 at 12:25, BASSAGET Cédric
wrote:
> Hi Robert,
> Before doing anything on my prod env, I generate r/w on ceph cluster using
> fio .
> On my newest cluster, release 12.2.12, I did not manage to get
> the (REQ
Hi Robert,
Before doing anything on my prod env, I generated r/w load on the ceph cluster
using fio.
On my newest cluster, release 12.2.12, I did not manage to get
the (REQUEST_SLOW) warning, even when my OSD disk usage goes above 95% (fio
ran from 4 different hosts).
On my prod cluster, release 12.2.9, as soo
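The fio load generation mentioned above is not shown in the thread; a rough equivalent against an RBD image (pool, image and client names are placeholders, and this writes to that image, so use a throwaway one):

  fio --name=rbd-bench --ioengine=rbd --clientname=admin --pool=bench \
      --rbdname=bench-img --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
      --runtime=300 --time_based --direct=1 --group_reporting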
On Mon, Jun 10, 2019 at 1:00 AM BASSAGET Cédric <
cedric.bassaget...@gmail.com> wrote:
> Hello Robert,
> My disks did not reach 100% on the last warning, they climb to 70-80%
> usage. But I see rrqm / wrqm counters increasing...
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
Hello Robert,
My disks did not reach 100% on the last warning, they climb to 70-80%
usage. But I see rrqm / wrqm counters increasing...
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz
avgqu-sz await r_await w_await svctm %util
sda 0.00 4.00 0.00
With the low number of OSDs, you are probably saturating the disks. Check
with `iostat -xd 2` and see what the utilization of your disks is. A lot
of SSDs don't perform well with Ceph's heavy sync writes, and performance is
terrible.
If some of your drives are at 100% while others are at lower utilization
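The sync-write point can be checked per drive with a small fio test; a sketch of the commonly used O_DSYNC test (the filename is a placeholder - point it at a file on a filesystem that lives on the SSD you want to test, never at a raw device holding data):

  fio --name=sync-write-test --filename=/mnt/ssd-under-test/fio-test \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --size=1G \
      --direct=1 --sync=1 --runtime=60 --time_based

A drive that only manages a few hundred IOPS here will struggle with Ceph's journal/WAL traffic even if its datasheet numbers look good.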
Hello,
I see messages related to REQUEST_SLOW a few times per day.
Here's my ceph -s:
root@ceph-pa2-1:/etc/ceph# ceph -s
cluster:
id: 72d94815-f057-4127-8914-448dfd25f5bc
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-pa2-1,ceph-pa2-2,ceph-pa2-3
mgr: ceph-pa2-
> >
> > On Tue, May 21, 2019, 4:49 AM Jason Dillaman
> wrote:
> >>
> >> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
> >> >
> >> > Hello cephers,
> >> >
> >> > we have a few systems which utilize a rbd-bd map/m
-s43 mon.0 10.23.27.153:6789/0
>> > 173640 : cluster [WRN] Health check update: 395 slow requests are blocked
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:04:19.234877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
>> > 173641 : cluster [INF
AM Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
> >
> > Hello cephers,
> >
> > we have a few systems which utilize a rbd-bd map/mount to get access to
> a rbd volume.
> > (This problem seems to be related to "[ceph-users]
get access to a rbd
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from
> bluestore osds" (the original thread))
>
> Unfortunately the rbd-nbd device of a system crashes three mondays in series
> at ~00:00 when the systemd fstrim timer exec
Hello Jason,
On 20.05.19 at 23:49, Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>> Hello cephers,
>>
>> we have a few systems which utilize a rbd-bd map/mount to get access to a
>> rbd volume.
>> (This problem seems to be relat
On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>
> Hello cephers,
>
> we have a few systems which utilize a rbd-bd map/mount to get access to a rbd
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from
> bluestore osds" (the orig
Hello cephers,
we have a few systems which utilize a rbd-bd map/mount to get access to a rbd
volume.
(This problem seems to be related to "[ceph-users] Slow requests from bluestore
osds" (the original thread))
Unfortunately the rbd-nbd device of a system crashes three mondays in seri
Quoting Marc Schöchlin (m...@256bit.org):
> Our new setup is now:
> (12.2.10 on Ubuntu 16.04)
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent s
ome seconds (SSD) to minutes (HDD) and
> perform a compaction of the OMAP database.
>
> Regards,
>
>
>
>
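The OMAP compaction mentioned above can be triggered per OSD; a sketch with osd.12 as a placeholder (check `ceph daemon osd.12 help` first - if your build has no compact admin socket command, the compaction has to be done offline with the OSD stopped):

  ceph daemon osd.12 compact

The compaction itself can take seconds on SSD up to minutes on HDD and keeps the OSD busy, so it is best done one OSD at a time.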
> -----Original Message-----
> From: ceph-users On behalf of Marc Schöchlin
> Sent: Monday, May 13, 2019 6:59
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-
May 2019 6:59
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow requests from bluestore osds
Hello cephers,
one week ago we replaced the bluestore cache size by "osd memory target" and
removed the detail memory settings.
This storage class now runs 42*8GB spinners with a
Hello cephers,
one week ago we replaced the bluestore cache size by "osd memory target" and
removed the detail memory settings.
This storage class now runs 42*8GB spinners with a permanent write workload of
2000-3000 write IOPS, and 1200-8000 read IOPS.
Our new setup is now:
(12.2.10 on Ubuntu
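For context, the "osd memory target" switch referred to above is a single per-OSD setting (the poster is on 12.2.10, where it is available); a minimal sketch with a placeholder value:

  [osd]
  osd memory target = 6442450944    # ~6 GiB per OSD daemon, example value only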
Hello cephers,
as described - we also have the slow requests in our setup.
We recently updated from ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the
latest patchlevel (with kernel 4.15.0-43) and applied dell firmware 2.8.0.
On 12.2.5 (before updating the cluster) we had in a frequency of 10m
On Fri, Jan 18, 2019 at 11:06:54AM -0600, Mark Nelson wrote:
> I.e. even though you guys set bluestore_cache_size to 1GB, it is being
> overridden by bluestore_cache_size_ssd.
Isn't it vice versa [1]?
[1]
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L3976
--
Mykola G
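For what it's worth, my reading of the code linked above (hedged - check the file for your exact version) is that a non-zero bluestore_cache_size wins, and the _hdd/_ssd variants are only fallbacks used when it is 0, so setting the generic option explicitly avoids the ambiguity:

  [osd]
  bluestore cache size = 1073741824    # 1 GiB; non-zero, so the *_hdd/*_ssd
                                       # per-device-type defaults are not consulted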
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such
issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel -
has
solved this issue.
Greets,
Stefan
-Original Message-
From: Stefan Priebe - Profih
;>>> 21,1318474'61584855] local-lis/les=1318472/1318473 n=1912
>>>>>> ec=133405/133405 lis/c 1318472/1278145 les/c/f
>>>>>> 1318473/1278148/1211861 131
>>>>>> 8472/1318472/1318472) [33,3,22] r=0 lpr=1318472
>>>>>> pi
s only in the recovery case.
Greets,
Stefan
On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel -
aded m=183 snaptrimq=[ec1a0~1,ec808~1]
>>>> mbc={255={(2+0)=183,(3+0)=3}}] _update_calc_stats ml 183 upset size 3 up 2
>>>>
>>>> Greets,
>>>> Stefan
>>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>>> Hi,
>
>>> Greets,
>>> Stefan
>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> no ok it was not. Bug still present. It was only working because the
>>>> osdmap was so far away that it has started backf
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent:
> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>>
>>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
>>>>> (osd's are
such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.
the recovery case.
>>
>> Greets,
>> Stefan
>>
>> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>
>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>
>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without su
>> it turns out this was a kernel bug. Updating to a newer kernel - has
>> solved this issue.
>>
>> Greets,
>> Stefan
>>
>>
>>> -Original Message-
>>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>>> Sen
-Original Message-
>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>> Sent: 15 January 2019 10:26
>> To: ceph-users@lists.ceph.com
>> Cc: n.fahldi...@profihost.ag
>> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
>
ofihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-users] slow requests and high i/o / read rate on
bluestore osds after upgrade 12.2.8 -> 12.2.10
Hello list,
i also tested current upstream/luminous branch and it happens as well. A
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
> Sent: 15 January 2019 10:26
> To: ceph-users@lists.ceph.com
> Cc: n.fahldi...@profihost.ag
> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
> bluestore osds after upgrade 12.2.8 -> 12.2.10
>
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
-Original Message-
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-
Hello list,
I also tested the current upstream/luminous branch and it happens as well. A
clean install works fine. It only happens on upgraded bluestore OSDs.
Greets,
Stefan
On 14.01.19 at 20:35, Stefan Priebe - Profihost AG wrote:
> while trying to upgrade a cluster from 12.2.8 to 12.2.10 I'm expe
Hi Stefan,
Any idea if the reads are constant or bursty? One cause of heavy reads
is when RocksDB is compacting and has to read SST files from disk. It's
also possible you could see heavy read traffic during writes if data has
to be read from SST files rather than from the cache. It's possible this
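A hedged way to see whether compaction lines up with the read spikes is to sample the RocksDB compaction counters on an affected OSD while the reads are high (osd.2 is a placeholder and counter names can differ slightly between releases):

  while sleep 5; do
    ceph daemon osd.2 perf dump | grep -Eo '"compact[^"]*": *[0-9]+'
    echo ---
  done

If the compact counters advance exactly during the read bursts, the reads are most likely compaction traffic rather than client I/O.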
Hi Paul,
On 14.01.19 at 21:39, Paul Emmerich wrote:
> What's the output of "ceph daemon osd.<id> status" on one of the OSDs
> while it's starting?
{
"cluster_fsid": "b338193d-39e0-40e9-baba-4965ef3868a3",
"osd_fsid": "d95d0e3b-7441-4ab0-869c-fe0551d3bd52",
"whoami": 2,
"state": "act
What's the output of "ceph daemon osd.<id> status" on one of the OSDs
while it's starting?
Is the OSD crashing and being restarted all the time? Anything weird
in the log files? Was there recovery or backfill during the upgrade?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contac
Hi,
while trying to upgrade a cluster from 12.2.8 to 12.2.10 I'm experiencing
issues with bluestore OSDs - so I cancelled the upgrade and all bluestore
OSDs are stopped now.
After starting a bluestore OSD I'm seeing a lot of slow requests caused
by very high read rates.
Device: rrqm/s wr
Hello all,
We have an issue with our ceph cluster where 'ceph -s' shows that
several requests are blocked, however querying further with 'ceph health
detail' indicates that the PGs affected are either active+clean or do
not currently exist.
OSD 32 appears to be working fine, and the cluster is
Hi all,
after increasing the mon_max_pg_per_osd number ceph starts rebalancing as usual.
However, the slow requests warnings are still there, even after setting
primary-affinity to 0 beforehand.
On the other hand, if I destroy the osd, ceph will start rebalancing unless the
noout flag is set, am I right?
You can prevent creation of the PGs on the old filestore OSDs (which
seem to be the culprit here) during replacement by replacing the
disks the hard way:
* ceph osd destroy osd.X
* re-create with bluestore under the same id (ceph-volume ... --osd-id X)
it will then just backfill onto the same disk
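A concrete sketch of that replacement flow (the device path and the id X are placeholders; the destroy step is irreversible for that OSD's data, so double-check the id):

  ceph osd destroy osd.X --yes-i-really-mean-it
  ceph-volume lvm zap /dev/sdX
  ceph-volume lvm create --bluestore --data /dev/sdX --osd-id X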
Hi,
to reduce impact on clients during migration I would set the OSD's
primary-affinity to 0 beforehand. This should prevent the slow
requests, at least this setting has helped us a lot with problematic
OSDs.
Regards
Eugen
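The primary-affinity change is a one-liner per OSD and can be reverted the same way once the migration is done (the id is a placeholder):

  ceph osd primary-affinity osd.X 0
  # ... migrate / replace the OSD ...
  ceph osd primary-affinity osd.X 1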
Quoting Jaime Ibar:
Hi all,
we recently upgrade from Jewel
Hello,
2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update:
249 PGs pending on creation (PENDING_CREATING_PGS)
This error might indicate that you are hitting a PG limit per OSD.
Here is some information on it:
https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
migh
Hi all,
we recently upgraded from Jewel 10.2.10 to Luminous 12.2.7, and now we're
trying to migrate the
OSDs to Bluestore following this document [0], however when I mark the
osd as out,
I'm getting warnings similar to these ones:
2018-09-20 09:32:46.079630 mon.dri-ceph01 [WRN] Health check fail
I solved my slow requests by increasing the size of block.db. Calculate 4% per
stored TB and preferably host the DB on NVMe.
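As a worked example of that rule of thumb (4% is the poster's guideline; device names are placeholders): a 4 TB data disk would get roughly 0.04 * 4 TB = 160 GB of block.db, which with ceph-volume looks something like:

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

where /dev/nvme0n1p1 is a partition of roughly that size on the NVMe device.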
Hello Uwe,
as described in my mail we are running 4.13.0-39.
In conjunction with some later mails in this thread it seems that this problem
might be related to OS/microcode (Spectre) updates.
I am planning a ceph/ubuntu upgrade in the next week because of various
reasons, let's see what happens...
On Sat, Sep 01, 2018 at 12:45:06PM -0400, Brett Chancellor wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although
Mine is currently at 1000 due to the high number of pgs we had coming from
Jewel. I do find it odd that only the bluestore OSDs have this issue.
Filestore OSDs seem to be unaffected.
On Wed, Sep 5, 2018, 3:43 PM Samuel Taylor Liston
wrote:
> Just a thought - have you looked at increasing your "—
Just a thought - have you looked at increasing your "mon_max_pg_per_osd" both
on the mons and osds? I was having a similar issue while trying to add more
OSDs to my cluster (12.2.27, CentOS 7.5, 3.10.0-862.9.1.el7.x86_64). I
increased mine to 300 temporarily while adding OSDs and stopped havi
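A sketch of raising that limit (300 is only the example value used above; the ceph.conf form is persistent, the injectargs form is runtime-only and does not survive restarts):

  # ceph.conf on the mons and osds
  [global]
  mon_max_pg_per_osd = 300

  # or at runtime
  ceph tell mon.\* injectargs '--mon_max_pg_per_osd=300'
  ceph tell osd.\* injectargs '--mon_max_pg_per_osd=300'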
I've experienced the same thing during scrubbing and/or any kind of
expansion activity.
Daniel Pryor
On Mon, Sep 3, 2018 at 2:13 AM Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior for some weeks on our not
> so performance critical hdd pools.
> We haven't spent
I'm running CentOS 7.5. If I turn off Spectre/Meltdown protection then a
security sweep will disconnect it from the network.
-Brett
On Wed, Sep 5, 2018 at 2:24 PM, Uwe Sauter wrote:
> I'm also experiencing slow requests though I cannot point it to scrubbing.
>
> Which kernel do you run? Would y
I'm also experiencing slow requests though I cannot point it to scrubbing.
Which kernel do you run? Would you be able to test against the same kernel with Spectre/Meltdown mitigations disabled
("noibrs noibpb nopti nospectre_v2" as boot option)?
Uwe
On 05.09.18 at 19:30, Brett
Marc,
As with you, this problem manifests itself only when the bluestore OSD is
involved in some form of deep scrub. Anybody have any insight on what
might be causing this?
-Brett
On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior f
Hi,
we are also experiencing this type of behavior for some weeks on our not
so performance critical hdd pools.
We haven't spent so much time on this problem, because there are
currently more important tasks - but here are a few details:
Running the following loop results in the following output:
The warnings look like this.
6 ops are blocked > 32.768 sec on osd.219
1 osds have slow requests
On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote:
> On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
> wrote:
> > Hi Cephers,
> > I am in the process of upgrading a cluster from Filestore to blue
On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although the
> sp
Hi Cephers,
I am in the process of upgrading a cluster from Filestore to bluestore,
but I'm concerned about frequent warnings popping up against the new
bluestore devices. I'm frequently seeing messages like this, although the
specific osd changes, it's always one of the few hosts I've converted
2. What is the best way to remove an OSD node from the cluster during
maintenance? ceph osd set noout is not the way to go, since no OSDs are
out during the yum update and the node is still part of the cluster and will
handle I/O.
I think the best way is the combination of "ceph osd set noout" + stop
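A sketch of that noout-plus-stop approach for a single node (run the stop/start on the node itself, and unset noout only after its OSDs have rejoined):

  ceph osd set noout
  systemctl stop ceph-osd.target      # on the node going into maintenance
  # ... yum update / reboot ...
  systemctl start ceph-osd.target
  ceph osd unset noout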
Hi,
On one of our OSD nodes I performed a "yum update" with the Ceph repositories
disabled, so only the OS packages were being updated.
During, and especially at the end of, the yum update, the cluster started to
have slow/blocked requests and all VMs with a Ceph storage backend had high
I/O load. After ~15
Hi,
I'm using ceph primarily for block storage (which works quite well) and as
an object gateway using the S3 API.
Here is some info about my system:
Ceph: 12.2.4, OS: Ubuntu 18.04
OSD: Bluestore
6 servers in total, about 60 OSDs, 2TB SSDs each, no HDDs, CFQ scheduler
20 GBit private network
20 G
On Mon, Jul 9, 2018 at 5:28 PM, Benjamin Naber
wrote:
> Hi @all,
>
> Problem seems to be solved, after downgrading from kernel 4.17.2 to
> 3.10.0-862.
> Does anyone else have issues with newer kernels and OSD nodes?
I'd suggest you pursue that with whoever supports the kernel
exhibiting the problem
On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber wrote:
> Hi @all,
>
> im currently in testing for setup an production environment based on the
> following OSD Nodes:
>
> CEPH Version: luminous 12.2.5
>
> 5x OSD Nodes with following specs:
>
> - 8 Core Intel Xeon 2,0 GHZ
>
> - 96GB Ram
>
> - 10x 1,
Hi Caspar,
thanks for the reply. I've updated all SSDs to the current firmware. Still having the
same error. The strange thing is that this issue switches from node to node and
from OSD to OSD.
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
1 ops ar
Hi Ben,
At first glance I would say the CPUs are a bit weak for this setup.
It is recommended to have at least 1 core per OSD. Since you have 8 cores and
10 OSDs there isn't much left for other processes.
Furthermore, did you upgrade the firmware of those DC S4500s to the latest
version? (SCV101
Hi @all,
I'm currently testing a setup for a production environment based on the
following OSD nodes:
CEPH version: luminous 12.2.5
5x OSD nodes with the following specs:
- 8-core Intel Xeon 2.0 GHz
- 96GB RAM
- 10x 1.92 TB Intel DC S4500 connected via SATA
- 4x 10 Gbit NIC, 2 bonded via LACP f
By looking at the operations that are slow in your dump_*_ops command.
We've found that it's best to move all the metadata stuff for RGW onto
SSDs, i.e., all pools except the actual data pool.
But that depends on your use case and whether the slow requests you are
seeing are actually a problem for
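A sketch of pinning the small RGW pools to SSDs with a device-class CRUSH rule (the rule name and pool list are examples, this assumes the OSDs already carry hdd/ssd device classes, and changing a pool's rule triggers backfill):

  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.meta crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.log crush_rule rgw-meta-ssd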
Hello Paul!
Thanks for your answer.
How did you figure out it's RGW metadata stuff?
No, I don't use any SSDs. Where can I find out more about metadata
pools, using SSDs, etc.?
Thanks.
Grigory Murashov
Voximplant
On 15.05.2018 at 23:42, Paul Emmerich wrote:
Looks like it's mostly RGW metadata stuf
I've been running into slow requests with my RGW metadata pools just this
week. I tracked it down because the slow requests were on my NVMe OSDs. I
haven't solved the issue yet, but I can confirm that no resharding was
taking place and that the auto-resharder is working, as all of my larger
bucket
Looks like it's mostly RGW metadata stuff; are you running your non-data
RGW pools on SSDs (you should, that can help *a lot*)?
Paul
2018-05-15 18:49 GMT+02:00 Grigory Murashov :
> Hello guys!
>
> I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
> daemon osd.16 dump_historic
Hi Grigory,
looks like osd.16 is having a hard time acknowledging the write request (for
bucket resharding operations from what it looks like) as it takes about 15
seconds for osd.16 to receive the commit confirmation from osd.21 on subop
communication.
Have a go and check at the journal devic
Hello guys!
I collected the output of ceph daemon osd.16 dump_ops_in_flight and ceph
daemon osd.16 dump_historic_ops.
Here is the output of ceph health detail at the moment of the problem:
HEALTH_WARN 20 slow requests are blocked > 32 sec
REQUEST_SLOW 20 slow requests are blocked > 32 sec
20 ops a
Hello David!
2. I set it up 10/10
3. Thanks, my problem was that I did it on a host where there was no osd.15 daemon.
Could you please help me read the osd logs?
Here is a part of ceph.log:
2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0
185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healt
2. When logging, the 1/5 is what's written to the log file / what's
temporarily stored in memory. If you want to increase logging, you need to
increase both numbers, to 20/20 or 10/10. You can also just set it to 20 or
10 and ceph will set them to the same number. I personally do both numbers
to rem
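For example, to bump the OSD debug level at runtime on the host that actually runs osd.15 and put it back afterwards (10/10 mirrors the level mentioned above):

  ceph daemon osd.15 config set debug_osd 10/10
  # ... reproduce the slow requests, then:
  ceph daemon osd.15 config set debug_osd 1/5

The same can be made persistent with "debug osd = 10/10" under [osd] in ceph.conf plus an OSD restart.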
Hi JC!
Thanks for your answer first.
1. I have added the output of ceph health detail to Zabbix in case of a
warning, so every time I will see which OSD has the problem.
2. I have the default level for all logs. As I see here
http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/
d
Hi Grigory,
are these lines the only lines in your log file for OSD 15?
Just for sanity, what are the log levels you have set, if any, in your config
file away from the default? If you set all log levels to 0, like some people do,
you may want to simply go back to the default by commenting out th
Hello Jean-Charles!
I have finally catch the problem, It was at 13-02.
[cephuser@storage-ru1-osd3 ~]$ ceph health detail
HEALTH_WARN 18 slow requests are blocked > 32 sec
REQUEST_SLOW 18 slow requests are blocked > 32 sec
3 ops are blocked > 65.536 sec
15 ops are blocked > 32.768 sec
Hi,
ceph health detail
This will tell you which OSDs are experiencing the problem so you can then go
and inspect the logs and use the admin socket to find out which requests are at
the source.
Regards
JC
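Putting that workflow together, roughly (osd.16 is just the id reported later in this thread; both dump commands are the ones already used here, though the exact JSON fields vary a bit between releases):

  ceph health detail
  ceph daemon osd.16 dump_ops_in_flight
  ceph daemon osd.16 dump_historic_ops | grep -E '"description"|"duration"'

The per-op event list in dump_historic_ops shows where the time went, e.g. waiting for a sub-op commit from another OSD.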
> On May 7, 2018, at 03:52, Grigory Murashov wrote:
>
> Hello!
>
> I'm not much experi
Hello!
I'm not very experienced in ceph troubleshooting, which is why I'm asking for help.
I have multiple warnings coming from Zabbix as a result of ceph -s:
REQUEST_SLOW: HEALTH_WARN : 21 slow requests are blocked > 32 sec
I don't see any hardware problems at that time.
I'm able to find the same strings
On Mon, Mar 5, 2018 at 11:20 PM, Brad Hubbard wrote:
> On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev
> wrote:
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>>> for the exact same thing.
>>>
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>> for the exact same thing.
>>
>>
>> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
>> wrote:
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 9:56 AM, Alex Gorbachev wrote:
>
> On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
>>
>> On 2018-03-02 07:54, Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
>> wrote:
>>
>> Blocked requests and slow requests are synonyms in ceph. They are 2 n
On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
> On 2018-03-02 07:54, Alex Gorbachev wrote:
>
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
> wrote:
>
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 P
On 2018-03-02 07:54, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
> wrote:
> On Thu, Mar 1, 2018 at 2:47 PM
On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
>> wrote:
>> > `ceph health de
Blocked requests and slow requests are synonyms in ceph. They are 2 names
for the exact same thing.
On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
> wrote:
> > `ceph health detail` should show you more information about the slow
> > requests.
On Thu, Mar 1, 2018 at 2:47 PM, David Turner wrote:
> `ceph health detail` should show you more information about the slow
> requests. If the output is too much stuff, you can grep out for blocked or
> something. It should tell you which OSDs are involved, how long they've
> been slow, etc. The
`ceph health detail` should show you more information about the slow
requests. If the output is too much stuff, you can grep out for blocked or
something. It should tell you which OSDs are involved, how long they've
been slow, etc. The default is for them to show '> 32 sec' but that may
very wel
Is there a switch to turn on the display of specific OSD issues? Or
does the output below indicate a generic problem, e.g. network, and not any
specific OSD?
2018-02-28 18:09:36.438300 7f6dead56700 0
mon.roc-vm-sc3c234@0(leader).data_health(46) update_stats avail 56%
total 15997 MB, used 6154 MB, avail 9
Hi Wes,
On 15-1-2018 20:57, Wes Dillingham wrote:
My understanding is that the exact same objects would move back to the
OSD if the weight went 1 -> 0 -> 1, given the same cluster state and same
object names; CRUSH is deterministic, so that would be the almost certain
result.
Ok, thanks! So this