Hello,
No, I don't have osd_scrub_auto_repair enabled. Interestingly, about a week
after I had forgotten about this, an error manifested:
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 4.1d is active+clean+inconsistent, acting [4,2]
which could be
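Not part of the original mail, but for reference, a minimal sketch of how an
inconsistency like the one reported above is usually inspected and repaired,
assuming PG 4.1d:

    # list the objects that failed the deep-scrub comparison
    rados list-inconsistent-obj 4.1d --format=json-pretty
    # ask the primary OSD to repair the PG from its replicas
    ceph pg repair 4.1d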
That would have been my next question: did you verify that the
corrupted OSD was a primary? The default deep-scrub config scrubs all
PGs within a week, so yes, it can take a week until it's detected. It
could have been detected sooner if those objects had been in
use by clients and
Hi Bailey,
yes, this should be doable using the following steps (see the command sketch after the list):
1. Copy the very first block 0~4096 from a different OSD to that
non-working one.
2. Use ceph-bluestore-tool's set-label-key command to modify "osd_uuid"
at the target OSD.
3. Adjust "size" field at target OSD if DB volume size at
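Not from the original mail, but a rough command-line sketch of those steps;
the device paths and UUID are placeholders, and it assumes the standard 4 KiB
BlueStore label at offset 0 of the DB volume:

    # 1. copy the first 4 KiB (the BlueStore label) from a healthy OSD's DB volume
    dd if=/dev/src-vg/db-lv of=/dev/dst-vg/db-lv bs=4096 count=1
    # 2. fix the osd_uuid on the target so it matches the target OSD
    ceph-bluestore-tool set-label-key --dev /dev/dst-vg/db-lv \
        -k osd_uuid -v <uuid-of-target-osd>
    # 3. adjust the size field if the DB volume sizes differ
    ceph-bluestore-tool set-label-key --dev /dev/dst-vg/db-lv \
        -k size -v <db-volume-size-in-bytes>
    # verify the resulting label
    ceph-bluestore-tool show-label --dev /dev/dst-vg/db-lv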
Hi all,
My Ceph setup:
- 12 OSD nodes, 4 OSD nodes per rack. Replication of 3, 1 replica per
rack.
- 20 spinning SAS disks per node.
- Some nodes have 256GB RAM, some nodes 128GB.
- CPU varies between Intel E5-2650 and Intel Gold 5317.
- Each node has 10Gbit/s network.
Using rados bench I am g
Hey Igor,
Thanks for the validation. I was also able to validate this myself in testing
over the weekend, though on a DB I had broken myself, and it was able to be
restored.
If this ends up being the solution for the customer in this case, I will
follow up here if anyone is curious.
Thanks again Igor.
> Hi all,
>
> My Ceph setup:
> - 12 OSD nodes, 4 OSD nodes per rack. Replication of 3, 1 replica per rack.
> - 20 spinning SAS disks per node.
Don't use legacy HDDs if you care about performance.
> - Some nodes have 256GB RAM, some nodes 128GB.
128GB is on the low side for 20 OSDs.
> - CPU
Hi ceph users,
I've seen this happen a couple times and been meaning to ask the group
about it.
Sometimes I get a failed block device and I have to replace it. My normal
process is (sketched in commands below) -
* stop the osd process
* remove the osd from crush map
* rm -rf /var/lib/ceph/osd/-/*
* run mkfs
* start osd process
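A sketch of that process, assuming OSD id $ID and the default cluster name
(the mkfs step will differ depending on how the OSD was originally deployed):

    systemctl stop ceph-osd@$ID
    ceph osd crush remove osd.$ID
    rm -rf /var/lib/ceph/osd/ceph-$ID/*
    ceph-osd -i $ID --mkfs --mkkey     # recreate the OSD data store
    systemctl start ceph-osd@$ID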
Most likely it wasn't; the ceph help or documentation is not very clear about
this:
osd deep-scrub <who>
    initiate deep scrub on osd <who>, or use <all|any> to deep scrub all
It doesn't say anything like "initiate dee
On 2024-06-10 15:20, Anthony D'Atri wrote:
Hi all,
My Ceph setup:
- 12 OSD nodes, 4 OSD nodes per rack. Replication of 3, 1 replica per
rack.
- 20 spinning SAS disks per node.
Don't use legacy HDDs if you care about performance.
You are right here, but we use Ceph mainly for RBD. It performs 'good enough'
for our RBD load.
>>> - 20 spinning SAS disks per node.
>> Don't use legacy HDDs if you care about performance.
>
> You are right here, but we use Ceph mainly for RBD. It performs 'good enough'
> for our RBD load.
You use RBD for archival?
>>> - Some nodes have 256GB RAM, some nodes 128GB.
>> 128GB is on the
Scrubs are of PGs, not OSDs; the lead OSD for a PG orchestrates subops to the
secondary OSDs. If you can point me to where this is in docs/src I'll clarify
it; ideally, if you can, put in a tracker ticket and send me a link.
Scrubbing all PGs on an OSD at once or even in sequence would be impactful.
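As an aside (not from the original mail), the PG-versus-OSD distinction shows
up directly in the CLI; using PG 4.1d from earlier in the thread:

    # deep-scrub a single placement group
    ceph pg deep-scrub 4.1d
    # ask osd.4 to initiate deep scrubs of the PGs for which it is primary
    ceph osd deep-scrub 4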
# quincy now past estimated 2024-06-01 end-of-life
will 17.2.8 be the last point release? maybe not, depending on timing
# centos 8 eol
* Casey tried to summarize the fallout in
https://lists.ceph.io/hyperkitty/list/d...@ceph.io/thread/H7I4Q4RAIT6UZQNPPZ5O3YB6AUXLLAFI/
* c8 builds were disabled
> Not the most helpful response, but on a (admittedly well-tuned)
Actually this was the most helpful since you ran the same rados bench
command. I'm trying to stay away from rbd & qemu issues and just test
rados bench on a non-virtualized client.
I have a test instance with newer drives, CPUs, and Ce
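For context (not quoted from the thread), a rados bench write test of the
kind being compared here typically looks something like this; the pool name
and thread count are placeholders:

    # 60-second write test with 16 concurrent ops, keeping objects for a read pass
    rados bench -p testpool 60 write -t 16 --no-cleanup
    # sequential read pass over the same objects, then remove them
    rados bench -p testpool 60 seq -t 16
    rados -p testpool cleanup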
Eh? cf. Mark and Dan's 1TB/s presentation.
> On Jun 10, 2024, at 13:58, Mark Lehrer wrote:
>
> It
> seems like Ceph still hasn't adjusted to SSD performance.
On 2024-06-10 17:42, Anthony D'Atri wrote:
- 20 spinning SAS disks per node.
Don't use legacy HDDs if you care about performance.
You are right here, but we use Ceph mainly for RBD. It performs 'good
enough' for our RBD load.
You use RBD for archival?
No, storage for (light-weight) virtual machines.
As this is my first submission to the Ceph docs, I want to start by saying
a big thank you to the Ceph team for all the efforts that have been put
into improving the docs. The improvements already made have been many and
have made it easier for me to operate Ceph.
In
https://docs.ceph.com/en/lates
>>> You are right here, but we use Ceph mainly for RBD. It performs 'good
>>> enough' for our RBD load.
>> You use RBD for archival?
>
> No, storage for (light-weight) virtual machines.
I'm surprised that it's enough, I've seen HDDs fail miserably in that role.
> The (CPU) load on the
You could try manually deleting the files from the directory
fragments, using `rados` commands. Make sure to flush your MDS journal
first and take the fs offline (`ceph fs fail`).
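A very rough sketch of what that could look like, with placeholder names
(filesystem 'cephfs', metadata pool 'cephfs_metadata', and a dirfrag object
id); double-check the object names before removing anything:

    # flush the MDS journal, then take the filesystem offline
    ceph tell mds.cephfs:0 flush journal
    ceph fs fail cephfs
    # inspect and remove dentries from a directory-fragment object's omap
    rados -p cephfs_metadata listomapkeys <dirfrag-object-id>
    rados -p cephfs_metadata rmomapkey <dirfrag-object-id> <dentry-key>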
On Tue, Jun 4, 2024 at 8:50 AM Stolte, Felix wrote:
>
> Hi Patrick,
>
> it has been a year now and we did not have a
On 2024-06-10 21:37, Anthony D'Atri wrote:
You are right here, but we use Ceph mainly for RBD. It performs
'good enough' for our RBD load.
You use RBD for archival?
No, storage for (light-weight) virtual machines.
I'm surprised that it's enough, I've seen HDDs fail miserably in that
role.
We have a reef 18.2.2 cluster with 6 radosgw servers on Rocky 8.9. The radosgw
servers are not fronted by anything like HAProxy as the clients connect
directly to a DNS name via round-robin DNS. Each of the radosgw servers has
a certificate using SAN entries for all 6 radosgw servers as well
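For reference (not quoted from the mail), a beast SSL frontend configuration
of the kind described might look roughly like this per radosgw instance; the
section name, certificate path, and DNS name are placeholders:

    [client.rgw.gw1]
        rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw-san.pem
        rgw_dns_name = s3.example.com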
Joel,
Thank you for this message. This is a model of what, in a perfect world, user
communication with upstream documentation could be.
I identify four things in your message that I can work on immediately:
1. leader/peon documentation improvement
2. Ceph command-presentation convention standardi
>> To be clear, you don't need more nodes. You can add RGWs to the ones you
>> already have. You have 12 OSD nodes - why not put an RGW on each?
> Might be an option, just don't like the idea to host multiple components on
> nodes. But I'll consider it.
I really don't like mixing mon/mgr wi
If they can do 1 TB/s with a single 16K write thread, that will be
quite impressive :D Otherwise not really applicable. Ceph scaling
has always been good.
More seriously, would you mind sending a link to this?
Thanks!
Mark
On Mon, Jun 10, 2024 at 12:01 PM Anthony D'Atri wrote:
>
> Eh? cf
What specifically are your OSD devices?
> On Jun 10, 2024, at 22:23, Phong Tran Thanh wrote:
>
> Hi ceph user!
>
> I am encountering a problem with IOPS and disk utilization of OSDs. Sometimes
> my disks peak in IOPS and their utilization becomes too high, which affects my
> cluster and causes slow
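Not part of the original exchange, but the usual first checks for this kind
of per-OSD spike are along these lines:

    # per-OSD commit/apply latency as reported by the cluster
    ceph osd perf
    # device-level utilization and queue depths on the OSD host
    iostat -x 1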