Hi all,
We have a Ceph cluster which has been expanded from 10 to 16 nodes.
Each node has between 14 and 16 OSDs, of which 2 are NVMe drives.
Most disks (except the NVMe drives) are 16 TB.
The expansion to 16 nodes went OK, but we've configured the system to
prevent automatic rebalancing towards the new disks (
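For context, a common way to do this, though not necessarily the exact mechanism used here, is to bring new OSDs up with zero CRUSH weight or to set the norebalance flag; a sketch, with placeholder values:

  # new OSDs join the CRUSH map with weight 0 and receive no data yet
  ceph config set osd osd_crush_initial_weight 0
  # or: temporarily stop all rebalancing cluster-wide
  ceph osd set norebalance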
Hi Anthony,
Thanks for the reply.
Average CPU values (%):
User: 3.5
Idle: 78.4
Wait: 20
System: 1.2
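For reference, numbers like these come from the usual tools, e.g. the sketch below (intervals are arbitrary). A high %iowait with a mostly idle CPU usually points at the disks rather than the CPUs.

  # CPU breakdown including %iowait, sampled every 5 seconds
  iostat -c 5
  # or, if sysstat is collecting data:
  sar -u 5 12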
/K.
On Tue, 6 Oct 2020 at 10:18, Anthony D'Atri wrote:
>
>
> >
> > Diving into the nodes, we could see that the OSD daemons are consuming the
> > CPU power, resulting in average CPU loads going near 10 (!)
Thanks to @Anthony:
Diving in further, I see that I was probably blinded by the CPU load...
Some disks are very slow (so my first observations were incorrect), and the
latency seen with iostat is more or less the same as what we see in
dump_historic_ops (r_await of 3 s and more).
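For anyone following along, a sketch of how the two views can be lined up; the OSD id is a placeholder:

  # per-device latencies (r_await/w_await) and utilization, 1-second samples
  iostat -x 1
  # slowest recent ops as seen by the OSD itself, via its admin socket
  ceph daemon osd.<id> dump_historic_ops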
So, it l
ov :
> Hi Kristof,
>
> are you seeing high (around 100%) utilization on the OSDs' disks (main or
> DB ones) along with slow ops?
>
>
> Thanks,
>
> Igor
>
> On 10/6/2020 11:09 AM, Kristof Coucke wrote:
> > Hi all,
> >
> > We have a Ceph cluster which h
using the disk, but I can't immediately find the process. I've read that
there can be old client processes that keep connecting to an OSD to retrieve
data for a specific PG even though that PG is no longer on that disk.
On Tue, 6 Oct 2020 at 11:41, Kristof Coucke wrote:
Is there a way I can check whether this process is the cause of the
performance issues?
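A sketch of how this could be narrowed down on the OSD host, using generic tooling rather than anything thread-specific; the OSD id is a placeholder:

  # local processes that are actually generating disk I/O
  iotop -oP
  pidstat -d 5
  # client operations the OSD is currently servicing, and any that are blocked
  ceph daemon osd.<id> dump_ops_in_flight
  ceph daemon osd.<id> dump_blocked_ops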
On Tue, 6 Oct 2020 at 13:05, Igor Fedotov wrote:
>
> On 10/6/2020 1:04 PM, Kristof Coucke wrote:
>
> Another strange thing is going on:
"": "                     3   0.064%   99.957% ",
"": "(      10,      15 ]        2   0.043%  100.000% ",
"": "(      22,      34 ]        1   0.021%  100.021% ",
"": "",
"": "",
"": "** DB Stats **"
it ought to be for L4)
> in bluefs section of OSD perf counters dump...
>
>
> On 10/6/2020 3:18 PM, Kristof Coucke wrote:
>
> OK, I did the compact on one OSD.
> The utilization is back to normal, so that's good... Thumbs up to you guys!
> Though, one thing I want to
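For readers hitting the same symptoms: the spillover hinted at above shows up in the bluefs counters, and the manual compaction is a one-liner. The OSD id is a placeholder and jq is only used for readability:

  # non-zero slow_used_bytes means RocksDB has spilled over onto the slow device
  ceph daemon osd.<id> perf dump | jq '.bluefs | {db_used_bytes, slow_used_bytes}'
  # trigger a manual RocksDB compaction on one OSD
  ceph daemon osd.<id> compact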
Hi all,
We've now been having trouble with our Ceph cluster for over a week.
Short info regarding our situation:
- Original cluster had 10 OSD nodes, each having 16 OSDs
- Expansion was necessary, so another 6 nodes have been added
- Version: 14.2.11
Last week we saw heavily loaded OSD servers, after help
Diving into the various logs and searching for answers, I came across the
following:
PG_DEGRADED Degraded data redundancy: 2101057/10339536570 objects degraded
(0.020%), 3 pgs degraded, 3 pgs undersized
pg 1.4b is stuck undersized for 63114.227655, current state
active+undersized+degraded
I'll answer it myself:
When CRUSH fails to find enough OSDs to map to a PG, the missing slot shows
up as 2147483647, which is ITEM_NONE, i.e. no OSD found.
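For anyone searching later: 2147483647 is simply 2^31 - 1, used as a sentinel value. The affected PGs and their incomplete up/acting sets can be listed like this:

  # list PGs stuck in the undersized state
  ceph pg dump_stuck undersized
  # inspect one PG; 2147483647 in the "up" set marks the slot CRUSH could not fill
  ceph pg 1.4b query | grep -A5 '"up"'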
Hi,
I have a cluster with 182 OSDs; this has been expanded to 282 OSDs.
Some disks were nearly full.
The new disks have been added with an initial weight of 0.
The original plan was to increase this slowly towards their full weight
using the gentle reweight script. However, this is going way too slowly.
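Two common alternatives to the gentle reweight script, mentioned here only as a sketch and not as this thread's conclusion: larger manual CRUSH weight steps, or letting the upmap balancer move PGs gradually. Ids and weights are placeholders:

  # raise the CRUSH weight of a new OSD in larger steps
  ceph osd crush reweight osd.<id> 4.0
  # or hand the job to the balancer in upmap mode
  ceph balancer mode upmap
  ceph balancer on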
influence this
> to some degree by using force_recovery commands on PGs on the fullest OSDs.
>
> Best regards and good luck,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
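The force_recovery mechanism Frank refers to works per PG; a sketch, with placeholder PG ids:

  # prioritise recovery/backfill of specific PGs, e.g. those on the fullest OSDs
  ceph pg force-recovery <pgid> [<pgid> ...]
  ceph pg force-backfill <pgid> [<pgid> ...]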
> From: Kristof Couc
er :
> Hi Kristof,
>
> I missed that: why do you need to do manual compaction?
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________
> From: Kristof Coucke
> Sent: 26 Octo
Okay, so far I've figured out that the value in the Ceph dashboard comes from
a Prometheus metric (*ceph_osd_numpg*). Does anyone here know how this metric
is populated?
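For what it's worth: the metric is exported by the ceph-mgr prometheus module and should match the PGS column of ceph osd df. The port below is that module's default and may differ in your setup:

  # scrape the active mgr's exporter and filter the metric
  curl -s http://<active-mgr>:9283/metrics | grep '^ceph_osd_numpg'
  # cross-check against the per-OSD PG counts Ceph itself reports
  ceph osd df tree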
On Mon, 26 Oct 2020 at 12:52, Kristof Coucke wrote:
> Hi Frank,
>
> We're having a lot of small
Hi,
My cluster is in a warning state as it is rebalancing after I've added a
bunch of disks (no issue there!).
However, there are a few things I just cannot understand... I hope someone
can help me; I'm getting hopeless trying to find the answers myself. If you
can answer any question (even one), it wil
Hi,
We are having slow OSDs... a hot topic to search on... I've tried to dive
as deep as I can, but I need to know which debug settings will help me dive
even deeper...
Okay, the situation:
- After the expansion, lots of backfill operations are running, spread over
the OSDs.
- max_backfills is set
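A sketch of the knobs usually involved here; the values are examples, not recommendations, and the OSD id is a placeholder:

  # current backfill/recovery throttles on a running OSD
  ceph daemon osd.<id> config get osd_max_backfills
  ceph daemon osd.<id> config get osd_recovery_max_active
  # temporarily raise OSD debug logging to dig deeper, then put it back
  ceph tell osd.<id> injectargs '--debug_osd 10/10'
  ceph tell osd.<id> injectargs '--debug_osd 1/5'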
Hi all,
I have an issue on my Ceph cluster.
For one of my pools I have 107TiB STORED and 298TiB USED.
This is strange, since I've configured erasure coding (6 data chunks, 3
coding chunks).
So, in an ideal world this should result in approx. 160.5TiB USED.
The question now is why this is the case
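The 160.5 TiB expectation above is simple arithmetic, and the profile can be verified directly; pool and profile names are placeholders:

  # EC 6+3 overhead factor: (k+m)/k = 9/6 = 1.5, so 107 TiB stored -> ~160.5 TiB used
  echo '107 * (6 + 3) / 6' | bc -l
  # confirm which erasure code profile the pool uses and its k/m values
  ceph osd pool get <pool> erasure_code_profile
  ceph osd erasure-code-profile get <profile-name>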
compatible?
The current setting is 64KB
On Wed, 12 Feb 2020 at 12:57, Kristof Coucke wrote:
> Hi all,
>
> I have an issue on my Ceph cluster.
> For one of my pools I have 107TiB STORED and 298TiB USED.
> This is strange, since I've configured erasure coding (6 data chunks, 3
>
Hi Simon and Janne,
Thanks for the replies.
It seems indeed related to the bluestore_min_alloc_size.
In an old thread I've also found the following:
*S3 object saving pipeline:*
*- S3 object is divided into multipart shards by client.*
*- Rgw shards each multipart shard into rados objects of siz
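A way to see the values involved on a live system; note that in Nautilus the allocation size is baked in when an OSD is created, so the running config only tells you what newly built OSDs would get. Daemon names are placeholders:

  # allocation unit for newly created HDD-backed OSDs
  ceph daemon osd.<id> config get bluestore_min_alloc_size_hdd
  # stripe size RGW uses when splitting uploads into RADOS objects
  ceph daemon client.rgw.<name> config get rgw_obj_stripe_size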
Hi all,
A while back, I indicated we had an issue with our cluster filling up too
fast. After checking everything, we concluded this was because we had a lot
of small files and the allocation size on BlueStore was too high (64 KB).
We are now recreating the OSDs (2 disks at a time) bu
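For anyone doing the same: the smaller allocation unit has to be set before the OSDs are rebuilt, since it only takes effect at OSD creation time; 4 KiB is used here purely as an example value:

  # applies only to OSDs created after this change
  ceph config set osd bluestore_min_alloc_size_hdd 4096
  # then destroy and recreate the OSDs a few at a time (e.g. with ceph-volume),
  # letting the cluster backfill in between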