Hi, cephers,
Recently I have been testing ceph 12.2.12 with bluestore using cosbench.
Both SATA OSDs and SSD OSDs show slow requests.
Many slow requests occur, and most of them are logged right after RocksDB
"delete wal" or table_file_deletion log entries.
Does that mean RocksDB is the bottleneck? If so, how can I improve it? If not,
how do I fix it?
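One way to check whether the OSDs are actually waiting on RocksDB is to watch the RocksDB and BlueFS perf counters on an OSD admin socket while the slow requests happen; a rough sketch, with osd.0 as a placeholder (on builds where perf dump does not accept a section name, dump everything and grep for "rocksdb"):

  ceph daemon osd.0 perf dump rocksdb
  ceph daemon osd.0 perf dump bluefs

If compaction-related counters and BlueFS read traffic jump at the same time as the slow requests, that points at RocksDB compaction rather than client I/O.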
Hi Lukasz,
If this is filestore then most probably my comments are irrelevant; the
issue I suspected is BlueStore-specific.
Unfortunately I'm not an expert in filestore, hence I'm unable to help with
further investigation. Sorry...
Thanks,
Igor
On 7/9/2019 11:39 AM, Luk wrote:
We have (still) on these OSDs filestore.
Regards
Lukasz
Hi Igor,
Thank you for your input, I will try your suggestion with
ceph-objectstore-tool.
But for now it looks like the main problem is this:
2019-07-09 09:29:25.410839 7f5e4b64f700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f5e20e87700' had timed out after 15
2019-07-09 09:
Hi Lukasz,
I've seen something like that - slow requests and the corresponding OSD
restarts on suicide timeout - at least twice, with two different clusters. The
root cause was slow omap listing for some objects, which had started to happen
after massive removals from RocksDB.
To verify if this is the case
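A rough way to check how slow the omap listing is for a suspect object (for example a bucket index object) with ceph-objectstore-tool, run with the OSD stopped; the id, PG id and object name below are placeholders:

  systemctl stop ceph-osd@<ID>
  time ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<ID> \
      --pgid <PGID> '<OBJECT>' list-omap > /dev/null

If listing the omap takes many seconds, compacting that OSD's RocksDB (as discussed later in this thread) usually brings it back to normal.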
On Wed, Jul 3, 2019 at 4:47 PM Luk wrote:
>
>
> this pool is that 'big' :
>
> [root@ceph-mon-01 ~]# rados df | grep -e index -e WR
> POOL_NAME USED OBJECTS CLONES COPIES
> MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
>
> default.rgw.buckets.index
Hello,
I have a strange problem with scrubbing.
When scrubbing starts on a PG which belongs to the default.rgw.buckets.index
pool, I can see that this OSD is very busy (see attachment) and starts showing
many slow requests; after the scrubbing of this PG stops, the slow requests
stop immediately.
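If index-pool scrubs turn out to be the trigger, one common mitigation is to throttle scrubbing; a sketch using runtime injection (the values are only examples and injected settings do not survive an OSD restart):

  ceph tell osd.\* injectargs '--osd_scrub_sleep 0.2 --osd_max_scrubs 1'
  ceph osd set noscrub        # temporarily, just to confirm scrubbing is the cause
  ceph osd set nodeep-scrub
  # ... observe, then re-enable:
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub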
Hello Robert,
I did not make any changes, so I'm still using the prio queue.
Regards
On Mon, Jun 10, 2019 at 17:44, Robert LeBlanc
wrote:
> I'm glad it's working, to be clear did you use wpq, or is it still the
> prio queue?
>
> Sent from a mobile device, please excuse any typos.
>
> On Mon, J
I'm glad it's working. To be clear, did you use wpq, or is it still the prio
queue?
Sent from a mobile device, please excuse any typos.
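For reference, switching to the WPQ scheduler is a ceph.conf change plus an OSD restart; a minimal sketch (these are the commonly suggested values, not something taken from this thread, and the cut-off setting is optional):

  [osd]
  osd op queue = wpq
  osd op queue cut off = high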
On Mon, Jun 10, 2019, 4:45 AM BASSAGET Cédric
wrote:
> an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
>
> Le lun. 10 juin 2019 à 12:25, BASS
an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
On Mon, Jun 10, 2019 at 12:25, BASSAGET Cédric
wrote:
> Hi Robert,
> Before doing anything on my prod env, I generate r/w on ceph cluster using
> fio .
> On my newest cluster, release 12.2.12, I did not manage to get
> the (REQ
Hi Robert,
Before doing anything on my prod env, I generated r/w load on the ceph cluster
using fio.
On my newest cluster, release 12.2.12, I did not manage to get
the (REQUEST_SLOW) warning, even when my OSD disk usage goes above 95% (fio
ran from 4 different hosts).
On my prod cluster, release 12.2.9, as soo
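The fio load generation mentioned above is not shown in the thread; a rough equivalent against an RBD image (pool, image and client names are placeholders, and this writes to that image, so use a throwaway one):

  fio --name=rbd-bench --ioengine=rbd --clientname=admin --pool=bench \
      --rbdname=bench-img --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
      --runtime=300 --time_based --direct=1 --group_reporting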
On Mon, Jun 10, 2019 at 1:00 AM BASSAGET Cédric <
cedric.bassaget...@gmail.com> wrote:
> Hello Robert,
> My disks did not reach 100% on the last warning, they climb to 70-80%
> usage. But I see rrqm / wrqm counters increasing...
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
Hello Robert,
My disks did not reach 100% on the last warning, they climb to 70-80%
usage. But I see rrqm / wrqm counters increasing...
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz
avgqu-sz await r_await w_await svctm %util
sda 0.00 4.00 0.00
With the low number of OSDs, you are probably saturating the disks. Check
with `iostat -xd 2` and see what the utilization of your disks is. A lot
of SSDs don't perform well with Ceph's heavy sync writes, and performance is
terrible.
If some of your drives are at 100% while others are at lower utilization
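The sync-write point can be checked per drive with a small fio test; a sketch of the commonly used O_DSYNC test (the filename is a placeholder - point it at a file on a filesystem that lives on the SSD you want to test, never at a raw device holding data):

  fio --name=sync-write-test --filename=/mnt/ssd-under-test/fio-test \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --size=1G \
      --direct=1 --sync=1 --runtime=60 --time_based

A drive that only manages a few hundred IOPS here will struggle with Ceph's journal/WAL traffic even if its datasheet numbers look good.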
Hello,
I see messages related to REQUEST_SLOW a few times per day.
Here's my ceph -s:
root@ceph-pa2-1:/etc/ceph# ceph -s
cluster:
id: 72d94815-f057-4127-8914-448dfd25f5bc
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-pa2-1,ceph-pa2-2,ceph-pa2-3
mgr: ceph-pa2-
> >
> > On Tue, May 21, 2019, 4:49 AM Jason Dillaman
> wrote:
> >>
> >> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
> >> >
> >> > Hello cephers,
> >> >
> >> > we have a few systems which utilize a rbd-bd map/m
-s43 mon.0 10.23.27.153:6789/0
>> > 173640 : cluster [WRN] Health check update: 395 slow requests are blocked
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:04:19.234877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
>> > 173641 : cluster [INF
AM Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
> >
> > Hello cephers,
> >
> > we have a few systems which utilize a rbd-bd map/mount to get access to
> a rbd volume.
> > (This problem seems to be related to "[ceph-users]
get access to a rbd
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from
> bluestore osds" (the original thread))
>
> Unfortunately the rbd-nbd device of a system crashes three mondays in series
> at ~00:00 when the systemd fstrim timer exec
Hello Jason,
On 20.05.19 at 23:49, Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>> Hello cephers,
>>
>> we have a few systems which utilize a rbd-bd map/mount to get access to a
>> rbd volume.
>> (This problem seems to be relat
On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>
> Hello cephers,
>
> we have a few systems which utilize a rbd-bd map/mount to get access to a rbd
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from
> bluestore osds" (the orig
Hello cephers,
we have a few systems which utilize a rbd-bd map/mount to get access to a rbd
volume.
(This problem seems to be related to "[ceph-users] Slow requests from bluestore
osds" (the original thread))
Unfortunately the rbd-nbd device of a system crashes three mondays in seri
Quoting Marc Schöchlin (m...@256bit.org):
> Our new setup is now:
> (12.2.10 on Ubuntu 16.04)
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent s
ome seconds (SSD) to minutes (HDD) and
> perform a compaction of the OMAP database.
>
> Regards,
>
>
>
>
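The OMAP compaction mentioned above can be triggered per OSD; a sketch with osd.12 as a placeholder (check `ceph daemon osd.12 help` first - if your build has no compact admin socket command, the compaction has to be done offline with the OSD stopped):

  ceph daemon osd.12 compact

The compaction itself can take seconds on SSD up to minutes on HDD and keeps the OSD busy, so it is best done one OSD at a time.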
> -----Original Message-----
> From: ceph-users On behalf of Marc Schöchlin
> Sent: Monday, May 13, 2019 6:59
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-
May 2019 6:59
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow requests from bluestore osds
Hello cephers,
one week ago we replaced the bluestore cache size by "osd memory target" and
removed the detail memory settings.
This storage class now runs 42*8GB spinners with a
Hello cephers,
one week ago we replaced the bluestore cache size by "osd memory target" and
removed the detail memory settings.
This storage class now runs 42*8GB spinners with a permanent write workload of
2000-3000 write IOPS, and 1200-8000 read IOPS.
Our new setup is now:
(12.2.10 on Ubuntu
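For context, the "osd memory target" switch referred to above is a single per-OSD setting (the poster is on 12.2.10, where it is available); a minimal sketch with a placeholder value:

  [osd]
  osd memory target = 6442450944    # ~6 GiB per OSD daemon, example value only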
Hello cephers,
as described - we also have the slow requests in our setup.
We recently updated from ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the
latest patchlevel (with kernel 4.15.0-43) and applied dell firmware 2.8.0.
On 12.2.5 (before updating the cluster) we had in a frequency of 10m
On Fri, Jan 18, 2019 at 11:06:54AM -0600, Mark Nelson wrote:
> I.e. even though you guys set bluestore_cache_size to 1GB, it is being
> overridden by bluestore_cache_size_ssd.
Isn't it vice versa [1]?
[1]
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L3976
--
Mykola G
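For what it's worth, my reading of the code linked above (hedged - check the file for your exact version) is that a non-zero bluestore_cache_size wins, and the _hdd/_ssd variants are only fallbacks used when it is 0, so setting the generic option explicitly avoids the ambiguity:

  [osd]
  bluestore cache size = 1073741824    # 1 GiB; non-zero, so the *_hdd/*_ssd
                                       # per-device-type defaults are not consulted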
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such
issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel -
has
solved this issue.
Greets,
Stefan
-Original Message-
From: Stefan Priebe - Profih
;>>> 21,1318474'61584855] local-lis/les=1318472/1318473 n=1912
>>>>>> ec=133405/133405 lis/c 1318472/1278145 les/c/f
>>>>>> 1318473/1278148/1211861 131
>>>>>> 8472/1318472/1318472) [33,3,22] r=0 lpr=1318472
>>>>>> pi
s only in the recovery case.
Greets,
Stefan
On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel -
aded m=183 snaptrimq=[ec1a0~1,ec808~1]
>>>> mbc={255={(2+0)=183,(3+0)=3}}] _update_calc_stats ml 183 upset size 3 up 2
>>>>
>>>> Greets,
>>>> Stefan
>>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>>> Hi,
>
>>> Greets,
>>> Stefan
>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> no ok it was not. Bug still present. It was only working because the
>>>> osdmap was so far away that it has started backf
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent:
> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>>
>>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
>>>>> (osd's are
such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.
the recovery case.
>>
>> Greets,
>> Stefan
>>
>> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>
>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>
>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without su
>> it turns out this was a kernel bug. Updating to a newer kernel - has
>> solved this issue.
>>
>> Greets,
>> Stefan
>>
>>
>>> -Original Message-
>>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>>> Sen
-Original Message-
>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>> Sent: 15 January 2019 10:26
>> To: ceph-users@lists.ceph.com
>> Cc: n.fahldi...@profihost.ag
>> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
>
ofihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-users] slow requests and high i/o / read rate on
bluestore osds after upgrade 12.2.8 -> 12.2.10
Hello list,
i also tested current upstream/luminous branch and it happens as well. A
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
> Sent: 15 January 2019 10:26
> To: ceph-users@lists.ceph.com
> Cc: n.fahldi...@profihost.ag
> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
> bluestore osds after upgrade 12.2.8 -> 12.2.10
>
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
-Original Message-
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-
Hello list,
I also tested the current upstream/luminous branch and it happens as well. A
clean install works fine. It only happens on upgraded bluestore OSDs.
Greets,
Stefan
On 14.01.19 at 20:35, Stefan Priebe - Profihost AG wrote:
> while trying to upgrade a cluster from 12.2.8 to 12.2.10 I'm expe
Hi Stefan,
Any idea if the reads are constant or bursty? One cause of heavy reads
is when RocksDB is compacting and has to read SST files from disk. It's
also possible you could see heavy read traffic during writes if data has
to be read from SST files rather than from the cache. It's possible this
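A hedged way to see whether compaction lines up with the read spikes is to sample the RocksDB compaction counters on an affected OSD while the reads are high (osd.2 is a placeholder and counter names can differ slightly between releases):

  while sleep 5; do
    ceph daemon osd.2 perf dump | grep -Eo '"compact[^"]*": *[0-9]+'
    echo ---
  done

If the compact counters advance exactly during the read bursts, the reads are most likely compaction traffic rather than client I/O.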
Hi Paul,
On 14.01.19 at 21:39, Paul Emmerich wrote:
> What's the output of "ceph daemon osd.<id> status" on one of the OSDs
> while it's starting?
{
"cluster_fsid": "b338193d-39e0-40e9-baba-4965ef3868a3",
"osd_fsid": "d95d0e3b-7441-4ab0-869c-fe0551d3bd52",
"whoami": 2,
"state": "act
What's the output of "ceph daemon osd.<id> status" on one of the OSDs
while it's starting?
Is the OSD crashing and being restarted all the time? Anything weird
in the log files? Was there recovery or backfill during the upgrade?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contac
Hi,
while trying to upgrade a cluster from 12.2.8 to 12.2.10 I'm experiencing
issues with bluestore OSDs - so I cancelled the upgrade and all bluestore
OSDs are stopped now.
After starting a bluestore OSD I'm seeing a lot of slow requests caused
by very high read rates.
Device: rrqm/s wr
Hello all,
We have an issue with our ceph cluster where 'ceph -s' shows that
several requests are blocked, however querying further with 'ceph health
detail' indicates that the PGs affected are either active+clean or do
not currently exist.
OSD 32 appears to be working fine, and the cluster is
Hi all,
after increasing the mon_max_pg_per_osd number ceph starts rebalancing as usual.
However, the slow requests warnings are still there, even after setting
primary-affinity to 0 beforehand.
On the other hand, if I destroy the osd, ceph will start rebalancing unless the
noout flag is set, am I right?
You can prevent creation of the PGs on the old filestore OSDs (which
seem to be the culprit here) during replacement by replacing the
disks the hard way:
* ceph osd destroy osd.X
* re-create with bluestore under the same id (ceph-volume ... --osd-id X)
it will then just backfill onto the same disk
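A concrete sketch of that replacement flow (the device path and the id X are placeholders; the destroy step is irreversible for that OSD's data, so double-check the id):

  ceph osd destroy osd.X --yes-i-really-mean-it
  ceph-volume lvm zap /dev/sdX
  ceph-volume lvm create --bluestore --data /dev/sdX --osd-id X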
Hi,
to reduce impact on clients during migration I would set the OSD's
primary-affinity to 0 beforehand. This should prevent the slow
requests, at least this setting has helped us a lot with problematic
OSDs.
Regards
Eugen
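The primary-affinity change is a one-liner per OSD and can be reverted the same way once the migration is done (the id is a placeholder):

  ceph osd primary-affinity osd.X 0
  # ... migrate / replace the OSD ...
  ceph osd primary-affinity osd.X 1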
Quoting Jaime Ibar:
Hi all,
we recently upgrade from Jewel
Hello,
2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update:
249 PGs pending on creation (PENDING_CREATING_PGS)
This error might indicate that you are hitting a PG limit per OSD.
Here is some information on it:
https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
migh
Hi all,
we recently upgraded from Jewel 10.2.10 to Luminous 12.2.7, and now we're
trying to migrate the
OSDs to Bluestore following this document [0], however when I mark the
osd as out,
I'm getting warnings similar to these ones:
2018-09-20 09:32:46.079630 mon.dri-ceph01 [WRN] Health check fail
I solved my slow requests by increasing the size of block.db. Calculate 4% per
stored TB and preferably host the DB on NVMe.
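As a worked example of that rule of thumb (4% is the poster's guideline; device names are placeholders): a 4 TB data disk would get roughly 0.04 * 4 TB = 160 GB of block.db, which with ceph-volume looks something like:

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

where /dev/nvme0n1p1 is a partition of roughly that size on the NVMe device.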
Hello Uwe,
as described in my mail we are running 4.13.0-39.
In conjunction with some later mails in this thread it seems that this problem
might be related to OS/microcode (Spectre) updates.
I am planning a ceph/ubuntu upgrade in the next week because of various
reasons, let's see what happens...
On Sat, Sep 01, 2018 at 12:45:06PM -0400, Brett Chancellor wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although
Mine is currently at 1000 due to the high number of pgs we had coming from
Jewel. I do find it odd that only the bluestore OSDs have this issue.
Filestore OSDs seem to be unaffected.
On Wed, Sep 5, 2018, 3:43 PM Samuel Taylor Liston
wrote:
> Just a thought - have you looked at increasing your "—
Just a thought - have you looked at increasing your "mon_max_pg_per_osd" both
on the mons and osds? I was having a similar issue while trying to add more
OSDs to my cluster (12.2.27, CentOS 7.5, 3.10.0-862.9.1.el7.x86_64). I
increased mine to 300 temporarily while adding OSDs and stopped havi
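A sketch of raising that limit (300 is only the example value used above; the ceph.conf form is persistent, the injectargs form is runtime-only and does not survive restarts):

  # ceph.conf on the mons and osds
  [global]
  mon_max_pg_per_osd = 300

  # or at runtime
  ceph tell mon.\* injectargs '--mon_max_pg_per_osd=300'
  ceph tell osd.\* injectargs '--mon_max_pg_per_osd=300'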
I've experienced the same thing during scrubbing and/or any kind of
expansion activity.
Daniel Pryor
On Mon, Sep 3, 2018 at 2:13 AM Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior for some weeks on our not
> so performance critical hdd pools.
> We haven't spent
I'm running CentOS 7.5. If I turn off Spectre/Meltdown protection then a
security sweep will disconnect it from the network.
-Brett
On Wed, Sep 5, 2018 at 2:24 PM, Uwe Sauter wrote:
> I'm also experiencing slow requests though I cannot point it to scrubbing.
>
> Which kernel do you run? Would y
I'm also experiencing slow requests though I cannot point it to scrubbing.
Which kernel do you run? Would you be able to test against the same kernel with Spectre/Meltdown mitigations disabled
("noibrs noibpb nopti nospectre_v2" as boot option)?
Uwe
On 05.09.18 at 19:30, Brett
Marc,
As with you, this problem manifests itself only when the bluestore OSD is
involved in some form of deep scrub. Anybody have any insight on what
might be causing this?
-Brett
On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior f
Hi,
we are also experiencing this type of behavior for some weeks on our not
so performance critical hdd pools.
We haven't spent so much time on this problem, because there are
currently more important tasks - but here are a few details:
Running the following loop results in the following output:
The warnings look like this.
6 ops are blocked > 32.768 sec on osd.219
1 osds have slow requests
On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote:
> On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
> wrote:
> > Hi Cephers,
> > I am in the process of upgrading a cluster from Filestore to blue
On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although the
> sp
Hi Cephers,
I am in the process of upgrading a cluster from Filestore to bluestore,
but I'm concerned about frequent warnings popping up against the new
bluestore devices. I'm frequently seeing messages like this, although the
specific osd changes, it's always one of the few hosts I've converted
2. What is the best way to remove an OSD node from the cluster during
maintenance? ceph osd set noout is not the way to go, since no OSDs are
out during the yum update and the node is still part of the cluster and will
handle I/O.
I think the best way is the combination of "ceph osd set noout" + stop
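A sketch of that noout-plus-stop approach for a single node (run the stop/start on the node itself, and unset noout only after its OSDs have rejoined):

  ceph osd set noout
  systemctl stop ceph-osd.target      # on the node going into maintenance
  # ... yum update / reboot ...
  systemctl start ceph-osd.target
  ceph osd unset noout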
Hi,
On one of our OSD nodes I performed a "yum update" with the Ceph repositories
disabled, so only the OS packages were being updated.
During, and especially at the end of, the yum update, the cluster started to
have slow/blocked requests and all VMs with a Ceph storage backend had high
I/O load. After ~15
Hi,
I'm using ceph primarily for block storage (which works quite well) and as
an object gateway using the S3 API.
Here is some info about my system:
Ceph: 12.2.4, OS: Ubuntu 18.04
OSD: Bluestore
6 servers in total, about 60 OSDs, 2TB SSDs each, no HDDs, CFQ scheduler
20 GBit private network
20 G
On Mon, Jul 9, 2018 at 5:28 PM, Benjamin Naber
wrote:
> Hi @all,
>
> Problem seems to be solved, after downgrading from kernel 4.17.2 to
> 3.10.0-862.
> Does anyone else have issues with newer kernels and OSD nodes?
I'd suggest you pursue that with whoever supports the kernel
exhibiting the problem
On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber wrote:
> Hi @all,
>
> im currently in testing for setup an production environment based on the
> following OSD Nodes:
>
> CEPH Version: luminous 12.2.5
>
> 5x OSD Nodes with following specs:
>
> - 8 Core Intel Xeon 2,0 GHZ
>
> - 96GB Ram
>
> - 10x 1,
Hi Caspar,
thanks for the reply. I've updated all SSDs to the current firmware. Still having the
same error. The strange thing is that this issue switches from node to node and
from OSD to OSD.
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
1 ops ar
Hi Ben,
At first glance I would say the CPUs are a bit weak for this setup.
It is recommended to have at least 1 core per OSD. Since you have 8 cores and
10 OSDs there isn't much left for other processes.
Furthermore, did you upgrade the firmware of those DC S4500s to the latest
version? (SCV101
Hi @all,
I'm currently testing a setup for a production environment based on the
following OSD nodes:
CEPH version: luminous 12.2.5
5x OSD nodes with the following specs:
- 8-core Intel Xeon 2.0 GHz
- 96GB RAM
- 10x 1.92 TB Intel DC S4500 connected via SATA
- 4x 10 Gbit NIC, 2 bonded via LACP f
By looking at the operations that are slow in your dump_*_ops command.
We've found that it's best to move all the metadata stuff for RGW onto
SSDs, i.e., all pools except the actual data pool.
But that depends on your use case and whether the slow requests you are
seeing are actually a problem for
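A sketch of pinning the small RGW pools to SSDs with a device-class CRUSH rule (the rule name and pool list are examples, this assumes the OSDs already carry hdd/ssd device classes, and changing a pool's rule triggers backfill):

  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.meta crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.log crush_rule rgw-meta-ssd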
Hello Paul!
Thanks for your answer.
How did you figure out it's RGW metadata stuff?
No, I don't use any SSDs. Where can I find out more about metadata
pools, using SSDs, etc.?
Thanks.
Grigory Murashov
Voximplant
On 15.05.2018 at 23:42, Paul Emmerich wrote:
Looks like it's mostly RGW metadata stuf
I've been running into slow requests with my RGW metadata pools just this
week. I tracked it down because the slow requests were on my NVMe OSDs. I
haven't solved the issue yet, but I can confirm that no resharding was
taking place and that the auto-resharder is working, as all of my larger
bucket
Looks like it's mostly RGW metadata stuff; are you running your non-data
RGW pools on SSDs (you should, that can help *a lot*)?
Paul
2018-05-15 18:49 GMT+02:00 Grigory Murashov :
> Hello guys!
>
> I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
> daemon osd.16 dump_historic
Hi Grigory,
looks like osd.16 is having a hard time acknowledging the write request (for
bucket resharding operations from what it looks like) as it takes about 15
seconds for osd.16 to receive the commit confirmation from osd.21 on subop
communication.
Have a go and check at the journal devic
Hello guys!
I collected the output of ceph daemon osd.16 dump_ops_in_flight and ceph
daemon osd.16 dump_historic_ops.
Here is the output of ceph health detail at the moment of the problem:
HEALTH_WARN 20 slow requests are blocked > 32 sec
REQUEST_SLOW 20 slow requests are blocked > 32 sec
20 ops a
Hello David!
2. I set it up 10/10
3. Thanks, my problem was that I did it on a host where there was no osd.15 daemon.
Could you please help me read the osd logs?
Here is a part of ceph.log:
2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0
185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healt
2. When logging, the 1/5 is what's written to the log file / what's
temporarily stored in memory. If you want to increase logging, you need to
increase both numbers, to 20/20 or 10/10. You can also just set it to 20 or
10 and ceph will set them to the same number. I personally do both numbers
to rem
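For example, to bump the OSD debug level at runtime on the host that actually runs osd.15 and put it back afterwards (10/10 mirrors the level mentioned above):

  ceph daemon osd.15 config set debug_osd 10/10
  # ... reproduce the slow requests, then:
  ceph daemon osd.15 config set debug_osd 1/5

The same can be made persistent with "debug osd = 10/10" under [osd] in ceph.conf plus an OSD restart.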
Hi JC!
Thanks for your answer first.
1. I have added the output of ceph health detail to Zabbix in case of a
warning, so every time I will see which OSD has the problem.
2. I have the default level for all logs. As I see here
http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/
d
Hi Grigory,
are these lines the only lines in your log file for OSD 15?
Just for sanity, what are the log levels you have set, if any, in your config
file away from the default? If you set all log levels to 0, like some people do,
you may want to simply go back to the default by commenting out th
Hello Jean-Charles!
I have finally catch the problem, It was at 13-02.
[cephuser@storage-ru1-osd3 ~]$ ceph health detail
HEALTH_WARN 18 slow requests are blocked > 32 sec
REQUEST_SLOW 18 slow requests are blocked > 32 sec
3 ops are blocked > 65.536 sec
15 ops are blocked > 32.768 sec
Hi,
ceph health detail
This will tell you which OSDs are experiencing the problem so you can then go
and inspect the logs and use the admin socket to find out which requests are at
the source.
Regards
JC
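Putting that workflow together, roughly (osd.16 is just the id reported later in this thread; both dump commands are the ones already used here, though the exact JSON fields vary a bit between releases):

  ceph health detail
  ceph daemon osd.16 dump_ops_in_flight
  ceph daemon osd.16 dump_historic_ops | grep -E '"description"|"duration"'

The per-op event list in dump_historic_ops shows where the time went, e.g. waiting for a sub-op commit from another OSD.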
> On May 7, 2018, at 03:52, Grigory Murashov wrote:
>
> Hello!
>
> I'm not much experi
Hello!
I'm not very experienced in ceph troubleshooting, which is why I'm asking for help.
I have multiple warnings coming from Zabbix as a result of ceph -s:
REQUEST_SLOW: HEALTH_WARN : 21 slow requests are blocked > 32 sec
I don't see any hardware problems at that time.
I'm able to find the same strings
On Mon, Mar 5, 2018 at 11:20 PM, Brad Hubbard wrote:
> On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev
> wrote:
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>>> for the exact same thing.
>>>
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>> for the exact same thing.
>>
>>
>> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
>> wrote:
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 9:56 AM, Alex Gorbachev wrote:
>
> On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
>>
>> On 2018-03-02 07:54, Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
>> wrote:
>>
>> Blocked requests and slow requests are synonyms in ceph. They are 2 n
On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
> On 2018-03-02 07:54, Alex Gorbachev wrote:
>
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
> wrote:
>
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 P
On 2018-03-02 07:54, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
> wrote:
> On Thu, Mar 1, 2018 at 2:47 PM
On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
>> wrote:
>> > `ceph health de
Blocked requests and slow requests are synonyms in ceph. They are 2 names
for the exact same thing.
On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
> wrote:
> > `ceph health detail` should show you more information about the slow
> > requests.
On Thu, Mar 1, 2018 at 2:47 PM, David Turner wrote:
> `ceph health detail` should show you more information about the slow
> requests. If the output is too much stuff, you can grep out for blocked or
> something. It should tell you which OSDs are involved, how long they've
> been slow, etc. The
`ceph health detail` should show you more information about the slow
requests. If the output is too much stuff, you can grep out for blocked or
something. It should tell you which OSDs are involved, how long they've
been slow, etc. The default is for them to show '> 32 sec' but that may
very wel
Is there a switch to turn on the display of specific OSD issues? Or
does the output below indicate a generic problem, e.g. network, and not any
specific OSD?
2018-02-28 18:09:36.438300 7f6dead56700 0
mon.roc-vm-sc3c234@0(leader).data_health(46) update_stats avail 56%
total 15997 MB, used 6154 MB, avail 9
Hi Wes,
On 15-1-2018 20:57, Wes Dillingham wrote:
My understanding is that the exact same objects would move back to the
OSD if the weight went 1 -> 0 -> 1, given the same cluster state and same
object names; CRUSH is deterministic, so that would be the almost certain
result.
Ok, thanks! So this