Hi Lukasz,
If this is filestore then most probably my comments are irrelevant; the
issue I expected is BlueStore-specific.
Unfortunately I'm not an expert in filestore, hence unable to help with
further investigation. Sorry...
Thanks,
Igor
On 7/9/2019 11:39 AM, Luk wrote:
We have (still) on these OSDs filestore.
Regards
Lukasz
> Hi Igor,
> Thank You for Your input, will try Your suggestion with
> ceph-objectstore-tool.
> But for now it looks like the main problem is this:
> 2019-07-09 09:29:25.410839 7f5e4b64f700 1 heartbeat_map is_healthy
> 'OSD::o
Hi Igor,
Thank You for Your input, will try Your suggestion with
ceph-objectstore-tool.
But for now it looks like the main problem is this:
2019-07-09 09:29:25.410839 7f5e4b64f700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f5e20e87700' had timed out after 15
2019-07-09 09:
Hi Lukasz,
I've seen something like that - slow requests and relevant OSD reboots
on suicide timeout at least twice with two different clusters. The root
cause was slow omap listing for some objects which had started to happen
after massive removals from RocksDB.
To verify if this is the cas
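(Igor's verification step is cut off above. Independent of whatever he had in mind, a rough generic way to spot slow omap listing on a bucket index object is simply to time a key listing; the object name below is only a placeholder, not taken from this thread:)

  # time an omap key listing on a suspect index object; output that stalls
  # or takes many seconds points at the slow-omap-listing problem described above
  time rados -p default.rgw.buckets.index listomapkeys .dir.example-bucket-id | wc -l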
On Wed, Jul 3, 2019 at 4:47 PM Luk wrote:
>
>
> this pool is that 'big' :
>
> [root@ceph-mon-01 ~]# rados df | grep -e index -e WR
> POOL_NAME USED OBJECTS CLONES COPIES
> MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
>
> default.rgw.buckets.index
Hi,
> On Wed 3 July 2019 at 09:01, Luk wrote:
> Hello,
> I have a strange problem with scrubbing.
> When scrubbing starts on a PG which belongs to the default.rgw.buckets.index
> pool, I can see that this OSD is very busy (see attachment) and starts
> showing many
> slow requests, after the
On Wed 3 July 2019 at 09:01, Luk wrote:
> Hello,
>
> I have a strange problem with scrubbing.
>
> When scrubbing starts on a PG which belongs to the default.rgw.buckets.index
> pool, I can see that this OSD is very busy (see attachment) and starts
> showing many
> slow requests, after the scrubbin
Hello Robert,
I did not make any changes, so I'm still using the prio queue.
Regards
On Mon 10 June 2019 at 17:44, Robert LeBlanc wrote:
> I'm glad it's working, to be clear did you use wpq, or is it still the
> prio queue?
>
> Sent from a mobile device, please excuse any typos.
>
> On Mon, J
I'm glad it's working, to be clear did you use wpq, or is it still the prio
queue?
Sent from a mobile device, please excuse any typos.
On Mon, Jun 10, 2019, 4:45 AM BASSAGET Cédric
wrote:
> an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
>
> On Mon 10 June 2019 at 12:25, BASS
an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
On Mon 10 June 2019 at 12:25, BASSAGET Cédric wrote:
> Hi Robert,
> Before doing anything on my prod env, I generate r/w on ceph cluster using
> fio .
> On my newest cluster, release 12.2.12, I did not manage to get
> the (REQ
Hi Robert,
Before doing anything on my prod env, I generate r/w on ceph cluster using
fio .
On my newest cluster, release 12.2.12, I did not manage to get
the (REQUEST_SLOW) warning, even if my OSD disk usage goes above 95% (fio
ran from 4 different hosts)
On my prod cluster, release 12.2.9, as soo
On Mon, Jun 10, 2019 at 1:00 AM BASSAGET Cédric <
cedric.bassaget...@gmail.com> wrote:
> Hello Robert,
> My disks did not reach 100% on the last warning, they climb to 70-80%
> usage. But I see rrqm / wrqm counters increasing...
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
Hello Robert,
My disks did not reach 100% on the last warning, they climb to 70-80%
usage. But I see rrqm / wrqm counters increasing...
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz
avgqu-sz await r_await w_await svctm %util
sda 0.00 4.00 0.00
With the low number of OSDs, you are probably saturating the disks. Check
with `iostat -xd 2` and see what the utilization of your disks is. A lot
of SSDs don't perform well with Ceph's heavy sync writes and performance is
terrible.
If some of your drives are 100% while others are lower utilizati
ate: 395 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> >> > 2019-05-20 00:04:19.234877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173641 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 238 slow
> requests are blocked > 32 sec. Impli
-s43 mon.0 10.23.27.153:6789/0
>> > 173640 : cluster [WRN] Health check update: 395 slow requests are blocked
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:04:19.234877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
>> > 173641 : cluster [INF
/0
> 174035 : cluster [INF] overall HEALTH_OK
> >
> > The parameters of our environment:
> >
> > Storage System (OSDs and MONs)
> >
> > Ceph 12.2.11
> > Ubuntu 16.04/1804
> > 30 * 8GB spinners distributed over
> >
> > Client
> >
> &g
On Tue, May 21, 2019 at 11:28 AM Marc Schöchlin wrote:
>
> Hello Jason,
>
> On 20.05.19 at 23:49, Jason Dillaman wrote:
>
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an rbd
> volume.
> (Thi
Hello Jason,
On 20.05.19 at 23:49, Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>> Hello cephers,
>>
>> we have a few systems which utilize an rbd-nbd map/mount to get access to an
>> rbd volume.
>> (This problem seems to be related to "[ceph-users] Slow requests f
>
> [client]
> rbd cache = true
> rbd cache size = 536870912
> rbd cache max dirty = 268435456
> rbd cache target dirty = 134217728
> rbd cache max dirty age = 30
> rbd readahead max bytes = 4194304
>
>
> Regards
> Marc
>
> On 13.05.19 at 07:40, EDH -
el Rios Fernandez wrote:
> Hi Marc,
>
> Try to compact OSD with slow request
>
> ceph tell osd.[ID] compact
>
> This will make the OSD offline for some seconds(SSD) to minutes(HDD) and
> perform a compact of OMAP database.
>
> Regards,
>
>
>
>
> -----Mens
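As an illustrative sketch of the compaction advice above (ceph tell osd.[ID] compact), compacting a handful of suspect OSDs one after another could look like the loop below; the OSD IDs are placeholders, not taken from any cluster in this thread:

  # compact the OMAP/RocksDB of a few OSDs showing slow requests;
  # each OSD stops serving I/O briefly while it compacts, so go one at a time
  for id in 12 27 51; do
      ceph tell osd.$id compact
  done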
Quoting Marc Schöchlin (m...@256bit.org):
> Our new setup is now:
> (12.2.10 on Ubuntu 16.04)
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent s
ome seconds(SSD) to minutes(HDD) and
> perform a compact of OMAP database.
>
> Regards,
>
>
>
>
> -----Original Message-----
> From: ceph-users On behalf of Marc Schöchlin
> Sent: Monday, 13 May 2019 6:59
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-
May 2019 6:59
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow requests from bluestore osds
Hello cephers,
one week ago we replaced the bluestore cache size settings with "osd memory target" and
removed the detailed memory settings.
This storage class now runs 42*8GB spinners with a
Hello cephers,
one week ago we replaced the bluestore cache size settings with "osd memory target" and
removed the detailed memory settings.
This storage class now runs 42*8GB spinners with a permanent write workload of
2000-3000 write IOPS, and 1200-8000 read IOPS.
Our new setup is now:
(12.2.10 on Ubuntu
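For reference, a minimal sketch of what such an [osd] section could look like; the 4 GiB value here is only an example, not the setting actually used on this cluster:

  [osd]
  # osd_memory_target takes a value in bytes; the OSD sizes its caches to stay under it
  osd memory target = 4294967296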
Hello cephers,
as described - we also have the slow requests in our setup.
We recently updated from ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the
latest patchlevel (with kernel 4.15.0-43) and applied Dell firmware 2.8.0.
On 12.2.5 (before updating the cluster) we had in a frequency of 10m
On Fri, Jan 18, 2019 at 11:06:54AM -0600, Mark Nelson wrote:
> IE even though you guys set bluestore_cache_size to 1GB, it is being
> overridden by bluestore_cache_size_ssd.
Isn't it vice versa [1]?
[1]
https://github.com/ceph/ceph/blob/luminous/src/os/bluestore/BlueStore.cc#L3976
--
Mykola G
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such
issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-
From: Stefan Priebe - Profih
>>>>>> 21,1318474'61584855] local-lis/les=1318472/1318473 n=1912
>>>>>> ec=133405/133405 lis/c 1318472/1278145 les/c/f
>>>>>> 1318473/1278148/1211861 131
>>>>>> 8472/1318472/1318472) [33,3,22] r=0 lpr=1318472
>>>>>> pi
s only in the recovery case.
Greets,
Stefan
On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel -
aded m=183 snaptrimq=[ec1a0~1,ec808~1]
>>>> mbc={255={(2+0)=183,(3+0)=3}}] _update_calc_stats ml 183 upset size 3 up 2
>>>>
>>>> Greets,
>>>> Stefan
>>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>>> Hi,
>&
>>> Greets,
>>> Stefan
>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> no ok it was not. Bug still present. It was only working because the
>>>> osdmap was so far away that it has started backf
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent:
> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>>
>>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
>>>>> (osd's are
such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel - has
solved this issue.
Greets,
Stefan
-Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.
the recovery case.
>>
>> Greets,
>> Stefan
>>
>> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>
>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>
>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without su
>> it turns out this was a kernel bug. Updating to a newer kernel - has
>> solved this issue.
>>
>> Greets,
>> Stefan
>>
>>
>>> -Original Message-
>>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>>> Sen
-Original Message-
>> From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
>> Sent: 15 January 2019 10:26
>> To: ceph-users@lists.ceph.com
>> Cc: n.fahldi...@profihost.ag
>> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
>
ofihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-users] slow requests and high i/o / read rate on
bluestore osds after upgrade 12.2.8 -> 12.2.10
Hello list,
I also tested the current upstream/luminous branch and it happens as well. A
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
> Sent: 15 January 2019 10:26
> To: ceph-users@lists.ceph.com
> Cc: n.fahldi...@profihost.ag
> Subject: Re: [ceph-users] slow requests and high i/o / read rate on
> bluestore osds after upgrade 12.2.8 -> 12.2.10
>
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
-Original Message-
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
Sent: 15 January 2019 10:26
To: ceph-users@lists.ceph.com
Cc: n.fahldi...@profihost.ag
Subject: Re: [ceph-
Hello list,
I also tested the current upstream/luminous branch and it happens as well. A
clean install works fine. It only happens on upgraded bluestore osds.
Greets,
Stefan
On 14.01.19 at 20:35, Stefan Priebe - Profihost AG wrote:
> while trying to upgrade a cluster from 12.2.8 to 12.2.10 i'm expe
Hi Stefan,
Any idea if the reads are constant or bursty? One cause of heavy reads
is when rocksdb is compacting and has to read SST files from disk. It's
also possible you could see heavy read traffic during writes if data has
to be read from SST files rather than cache. It's possible this
Hi Paul,
On 14.01.19 at 21:39, Paul Emmerich wrote:
> What's the output of "ceph daemon osd. status" on one of the OSDs
> while it's starting?
{
"cluster_fsid": "b338193d-39e0-40e9-baba-4965ef3868a3",
"osd_fsid": "d95d0e3b-7441-4ab0-869c-fe0551d3bd52",
"whoami": 2,
"state": "act
What's the output of "ceph daemon osd. status" on one of the OSDs
while it's starting?
Is the OSD crashing and being restarted all the time? Anything weird
in the log files? Was there recovery or backfill during the upgrade?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contac
Hi all,
after increasing the mon_max_pg_per_osd value, ceph starts rebalancing as usual.
However, the slow requests warnings are still there, even after setting
primary-affinity to 0 beforehand.
On the other hand, if I destroy the osd, ceph will start rebalancing unless
noout flag is set, am I ri
You can prevent creation of the PGs on the old filestore OSDs (which
seems to be the culprit here) during replacement by replacing the
disks the hard way:
* ceph osd destroy osd.X
* re-create with bluestore under the same id (ceph volume ... --osd-id X)
it will then just backfill onto the same di
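A sketch of that sequence; the OSD id (23) and device path (/dev/sdf) are hypothetical placeholders:

  # mark the OSD destroyed while keeping its id and CRUSH position
  ceph osd destroy 23 --yes-i-really-mean-it
  # re-create a bluestore OSD on the same device under the same id
  ceph-volume lvm create --bluestore --data /dev/sdf --osd-id 23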
Hi,
to reduce impact on clients during migration I would set the OSD's
primary-affinity to 0 beforehand. This should prevent the slow
requests, at least this setting has helped us a lot with problematic
OSDs.
Regards
Eugen
Quoting Jaime Ibar:
Hi all,
we recently upgrade from Jewel
Hello,
2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update:
249 PGs pending on creation (PENDING_CREATING_PGS)
This error might indicate that you are hitting a PG limit per osd.
Here some information on it
https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
migh
I solved my slow requests by increasing the size of block.db. Calculate 4% per
stored TB and preferably host the DB on NVMe.
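As a worked example of that rule of thumb (the 6 TB figure is just an example): 4% of 6 TB of stored data is roughly 240 GB of block.db per OSD.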
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Hello Uwe,
as described in my mail we are running 4.13.0-39.
In conjunction with some later mails in this thread it seems that this problem
might be related to OS/microcode (Spectre) updates.
I am planning a ceph/ubuntu upgrade in the next week because of various
reasons, let's see what happens...
On Sat, Sep 01, 2018 at 12:45:06PM -0400, Brett Chancellor wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although
Mine is currently at 1000 due to the high number of pgs we had coming from
Jewel. I do find it odd that only the bluestore OSDs have this issue.
Filestore OSDs seem to be unaffected.
On Wed, Sep 5, 2018, 3:43 PM Samuel Taylor Liston
wrote:
> Just a thought - have you looked at increasing your "--
Just a thought - have you looked at increasing your "--mon_max_pg_per_osd" both
on the mons and osds? I was having a similar issue while trying to add more
OSDs to my cluster (12.2.27, CentOS7.5, 3.10.0-862.9.1.el7.x86_64). I
increased mine to 300 temporarily while adding OSDs and stopped havi
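A sketch of raising it at runtime along the lines described above; the value 300 mirrors the temporary value mentioned, and injectargs changes are not persistent, so the same value would also need to go into ceph.conf to survive restarts:

  ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'
  ceph tell osd.* injectargs '--mon_max_pg_per_osd 300'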
I've experienced the same thing during scrubbing and/or any kind of
expansion activity.
*Daniel Pryor*
On Mon, Sep 3, 2018 at 2:13 AM Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior for some weeks on our not
> so performance critical hdd pools.
> We haven't spent
I'm running Centos 7.5. If I turn off spectre/meltdown protection then a
security sweep will disconnect it from the network.
-Brett
On Wed, Sep 5, 2018 at 2:24 PM, Uwe Sauter wrote:
> I'm also experiencing slow requests though I cannot point it to scrubbing.
>
> Which kernel do you run? Would y
I'm also experiencing slow requests though I cannot point it to scrubbing.
Which kernel do you run? Would you be able to test against the same kernel with Spectre/Meltdown mitigations disabled
("noibrs noibpb nopti nospectre_v2" as boot option)?
Uwe
On 05.09.18 at 19:30, Brett
Marc,
As with you, this problem manifests itself only when the bluestore OSD is
involved in some form of deep scrub. Anybody have any insight on what
might be causing this?
-Brett
On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin wrote:
> Hi,
>
> we are also experiencing this type of behavior f
The warnings look like this.
6 ops are blocked > 32.768 sec on osd.219
1 osds have slow requests
On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote:
> On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
> wrote:
> > Hi Cephers,
> > I am in the process of upgrading a cluster from Filestore to blue
On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although the
> sp
2. What is the best way to remove an OSD node from the cluster during
maintenance? ceph osd set noout is not the way to go, since no OSD's are
out during yum update and the node is still part of the cluster and will
handle I/O.
I think the best way is the combination of "ceph osd set noout" + stop
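(The suggestion above is cut short. A common sequence along those lines, as a sketch assuming systemd-managed OSDs on the node being updated, might be:)

  ceph osd set noout
  systemctl stop ceph-osd.target     # on the node going into maintenance
  # ... yum update, reboot, etc. ...
  systemctl start ceph-osd.target
  ceph osd unset noout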
On Mon, Jul 9, 2018 at 5:28 PM, Benjamin Naber
wrote:
> Hi @all,
>
> Problem seems to be solved, afther downgrading from Kernel 4.17.2 to
> 3.10.0-862.
> Anyone other have issues with newer Kernels and osd nodes?
I'd suggest you pursue that with whoever supports the kernel
exhibiting the problem
On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber wrote:
> Hi @all,
>
> I'm currently testing a setup for a production environment based on the
> following OSD Nodes:
>
> CEPH Version: luminous 12.2.5
>
> 5x OSD Nodes with following specs:
>
> - 8 Core Intel Xeon 2,0 GHZ
>
> - 96GB Ram
>
> - 10x 1,
Hi Caspar,
thank you for the reply. I've updated all SSDs to the latest firmware. Still having the
same error. The strange thing is that this issue moves from node to node and
from OSD to OSD.
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
1 ops ar
Hi Ben,
At first glance I would say the CPUs are a bit weak for this setup.
The recommendation is to have at least 1 core per OSD. Since you have 8 cores and
10 OSD's there isn't much left for other processes.
Furthermore, did you upgrade the firmware of those DC S4500's to the latest
firmware? (SCV101
By looking at the operations that are slow in your dump_*_ops command.
We've found that it's best to move all the metadata stuff for RGW onto
SSDs, i.e., all pools except the actual data pool.
But that depends on your use case and whether the slow requests you are
seeing are actually a problem for
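One way to implement the "metadata pools on SSD" advice with device classes is sketched below; the rule name and pool name are examples, and changing a pool's rule triggers data movement for that pool:

  # replicated CRUSH rule restricted to OSDs with device class "ssd"
  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  # point a non-data RGW pool at it, e.g. the bucket index pool
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd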
Hello Paul!
Thanks for your answer.
How did you work out that it's RGW metadata stuff?
No, I don't use any SSDs. Where can I find out more about metadata
pools, using SSDs, etc.?
Thanks.
Grigory Murashov
Voximplant
15.05.2018 23:42, Paul Emmerich wrote:
Looks like it's mostly RGW metadata stuf
I've been running into slow requests with my rgw metadata pools just this
week. I tracked it down because the slow requests were on my nmve osds. I
haven't solved the issue yet, but I can confirm that no resharding was
taking place and that the auto-resharder is working as all of my larger
bucket
Looks like it's mostly RGW metadata stuff; are you running your non-data
RGW pools on SSDs (you should, that can help *a lot*)?
Paul
2018-05-15 18:49 GMT+02:00 Grigory Murashov :
> Hello guys!
>
> I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
> daemon osd.16 dump_historic
Hi Grigory,
looks like osd.16 is having a hard time acknowledging the write request (for
bucket resharding operations from what it looks like) as it takes about 15
seconds for osd.16 to receive the commit confirmation from osd.21 on subop
communication.
Have a go and check at the journal devic
Hello guys!
I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph
daemon osd.16 dump_historic_ops.
Here is the output of ceph heath details in the moment of problem
HEALTH_WARN 20 slow requests are blocked > 32 sec
REQUEST_SLOW 20 slow requests are blocked > 32 sec
20 ops a
Hello David!
2. I set it to 10/10.
3. Thanks, my problem was that I did it on a host where there was no osd.15 daemon.
Could you please help me read the OSD logs?
Here is a part from ceph.log
2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0
185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healt
2. When logging the 1/5 is what's written to the log file/what's
temporarily stored in memory. If you want to increase logging, you need to
increase both numbers to 20/20 or 10/10. You can also just set it to 20 or
10 and ceph will set them to the same number. I personally do both numbers
to rem
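For example, raising both numbers at runtime on a single OSD (osd.15, as in this thread) and then putting them back to the default could look like this:

  ceph tell osd.15 injectargs '--debug_osd 10/10'
  # ... reproduce the slow request, collect the log ...
  ceph tell osd.15 injectargs '--debug_osd 1/5'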
Hi JC!
Thanks for your answer first.
1. I have added the output of ceph health detail to Zabbix in case of
warning. So every time I will see which OSD the problem is with.
2. I have the default level for all logs. As I see here
http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/
d
Hi Grigory,
are these lines the only lines in your log file for OSD 15?
Just for sanity, what are the log levels you have set, if any, in your config
file away from the default? If you set all log levels to 0 like some people do
you may want to simply go back to the default by commenting out th
Hello Jean-Charles!
I have finally caught the problem. It was at 13-02.
[cephuser@storage-ru1-osd3 ~]$ ceph health detail
HEALTH_WARN 18 slow requests are blocked > 32 sec
REQUEST_SLOW 18 slow requests are blocked > 32 sec
3 ops are blocked > 65.536 sec
15 ops are blocked > 32.768 sec
Hi,
ceph health detail
This will tell you which OSDs are experiencing the problem so you can then go
and inspect the logs and use the admin socket to find out which requests are at
the source.
Regards
JC
> On May 7, 2018, at 03:52, Grigory Murashov wrote:
>
> Hello!
>
> I'm not much experi
On Mon, Mar 5, 2018 at 11:20 PM, Brad Hubbard wrote:
> On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev
> wrote:
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>>> for the exact same thing.
>>>
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>> for the exact same thing.
>>
>>
>> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
>> wrote:
>>>
>>> On Thu,
On Fri, Mar 2, 2018 at 9:56 AM, Alex Gorbachev wrote:
>
> On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
>>
>> On 2018-03-02 07:54, Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
>> wrote:
>>
>> Blocked requests and slow requests are synonyms in ceph. They are 2 n
On Fri, Mar 2, 2018 at 4:17 AM Maged Mokhtar wrote:
> On 2018-03-02 07:54, Alex Gorbachev wrote:
>
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner
> wrote:
>
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 P
On 2018-03-02 07:54, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
> wrote:
> On Thu, Mar 1, 2018 at 2:47 PM
On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
> Blocked requests and slow requests are synonyms in ceph. They are 2 names
> for the exact same thing.
>
>
> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
>>
>> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
>> wrote:
>> > `ceph health de
Blocked requests and slow requests are synonyms in ceph. They are 2 names
for the exact same thing.
On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 2:47 PM, David Turner
> wrote:
> > `ceph health detail` should show you more information about the slow
> > requests.
On Thu, Mar 1, 2018 at 2:47 PM, David Turner wrote:
> `ceph health detail` should show you more information about the slow
> requests. If the output is too much stuff, you can grep out for blocked or
> something. It should tell you which OSDs are involved, how long they've
> been slow, etc. The
`ceph health detail` should show you more information about the slow
requests. If the output is too much stuff, you can grep out for blocked or
something. It should tell you which OSDs are involved, how long they've
been slow, etc. The default is for them to show '> 32 sec' but that may
very wel
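(A trivial sketch of the grep suggested above:)

  ceph health detail | grep -iE 'blocked|slow'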
Hi Wes,
On 15-1-2018 20:57, Wes Dillingham wrote:
My understanding is that the exact same objects would move back to the
OSD if weight went 1 -> 0 -> 1 given the same cluster state and same
object names; CRUSH is deterministic, so that would be the almost certain
result.
Ok, thanks! So this
My understanding is that the exact same objects would move back to the OSD
if weight went 1 -> 0 -> 1 given the same cluster state and same object
names; CRUSH is deterministic, so that would be the almost certain result.
On Mon, Jan 15, 2018 at 2:46 PM, lists wrote:
> Hi Wes,
>
> On 15-1-2018 20
Hi Wes,
On 15-1-2018 20:32, Wes Dillingham wrote:
I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over
the mailing list history it seems to have been brought up very
infrequently and never as a suggestion for regular maintenance. Perhaps
it's not needed.
True, it's just some
I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over
the mailing list history it seems to have been brought up very infrequently
and never as a suggestion for regular maintenance. Perhaps it's not needed.
One thing to consider trying, and to rule out something funky with the XFS
Here's some good reading for you.
https://www.spinics.net/lists/ceph-users/msg32895.html
I really like how Wido puts it, "Loosing two disks at the same time is
something which doesn't happen that much, but if it happens you don't want
to modify any data on the only copy which you still have left.
I disagree.
We have the following setting...
osd pool default size = 3
osd pool default min size = 1
There's maths that needs to be conducted for 'osd pool default size'. A
setting of 3 and 1 allows for 2 disks to fail ... at the same time ...
without a loss of data. This is standard storage
Hi David,
What is your min_size in the cache pool? If your min_size is 2, then the
cluster would block requests to that pool due to it having too few copies
available.
this is a little embarrassing, but it seems it was the min_size indeed.
I had changed this setting a couple of weeks ago, bu
PPS - or min_size 1 in production
On Wed, Nov 1, 2017 at 10:08 AM David Turner wrote:
> What is your min_size in the cache pool? If your min_size is 2, then the
> cluster would block requests to that pool due to it having too few copies
> available.
>
> PS - Please don't consider using rep_size
What is your min_size in the cache pool? If your min_size is 2, then the
cluster would block requests to that pool due to it having too few copies
available.
PS - Please don't consider using rep_size 2 in production.
On Wed, Nov 1, 2017 at 5:14 AM Eugen Block wrote:
> Hi experts,
>
> we have u
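A quick way to check and adjust the setting discussed above; "cache-pool" is a placeholder name, and note the earlier caveat that min_size 1 in production trades away safety:

  ceph osd pool get cache-pool min_size
  ceph osd pool set cache-pool min_size 1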
On Fri, Oct 20, 2017 at 8:23 PM, Ольга Ухина wrote:
> I was able to collect dump data during slow request, but this time I saw
> that it was related to high load average and iowait so I keep watching.
> And it was on particular two osds, but yesterday on other osds.
> I see in dump of these two
I was able to collect dump data during slow request, but this time I saw
that it was related to high load average and iowait so I keep watching.
And it was on particular two osds, but yesterday on other osds.
I see in dump of these two osds that operations are stuck on queued_for_pg,
for example:
Hi! Thanks for your help.
How can I increase the history interval for the command ceph daemon osd.
dump_historic_ops? It only shows the last several minutes.
I see slow requests on random osds each time and on different hosts (there
are three). As I see in logs the problem doesn't relate to scrubbing.
Regar
On Fri, Oct 20, 2017 at 1:09 PM, J David wrote:
> On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote:
>> I guess you have both read and followed
>> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>>
>> What was the result?
On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote:
> I guess you have both read and followed
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>
> What was the result?
Not sure if you’re asking Ольга or myself, but in my cas