https://docs.ceph.com/en/reef/cephfs/createfs/ says:
> The data pool used to create the file system is the “default” data pool and
> the location for storing all inode backtrace information, which is used for
> hard link management and disaster recovery.
> For this reason, all CephFS inodes have at least one object in the default data pool.
Is the answer that easy?
Why does CephFS then not store this info on the metadata pool automatically?
Why do I have to infer this advice about how to get better performance for
replicated pools from information that is only discussed for EC pools?
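For anyone finding this later: the layout the docs hint at is to keep the default
data pool as a small replicated pool that only ever holds the backtrace objects,
and to direct the actual file data into a separate pool. A minimal sketch,
assuming hypothetical pool names cephfs_meta, cephfs_default and cephfs_bulk, and
a mount at /mnt/cephfs:
# ceph osd pool create cephfs_meta
# ceph osd pool create cephfs_default
# ceph osd pool create cephfs_bulk
# ceph fs new myfs cephfs_meta cephfs_default
# ceph fs add_data_pool myfs cephfs_bulk
# setfattr -n ceph.dir.layout.pool -v cephfs_bulk /mnt/cephfs
New files created below the mount then go to cephfs_bulk, while cephfs_default
only carries the small per-inode backtrace objects.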
I observed that on an otherwise idle cluster, scrubbing cannot fully utilise
the speed of my HDDs.
`iostat` shows only 8-10 MB/s per disk, instead of the ~100 MB/s most HDDs can
easily deliver.
Changing scrubbing settings does not help (see below).
Environment:
* 6 active+clean+scrubbing+deep
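For reference, the knobs usually meant by "scrubbing settings", with purely
illustrative values (a sketch, not a recommendation):
# ceph config set osd osd_max_scrubs 3
# ceph config set osd osd_scrub_sleep 0
# ceph config set osd osd_scrub_chunk_max 64
# ceph config set osd osd_deep_scrub_stride 4194304
As the rest of the thread explains, none of these change the underlying random
read pattern.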
Hi Marc,
thanks for your reply.
100 MB/s is sequential; your scrubbing is random. AFAIK everything is random.
Is there any docs that explain this, any code, or other definitive answer?
Also, wouldn't it make sense for scrubbing to be able to read the disk linearly,
at least to some significant degree?
The question you should ask yourself is why you want to change/investigate this.
Because if scrubbing takes 10x longer due to seek thrashing, my scrubs never
finish in time (the default deadline is 1 week).
I end with e.g.
267 pgs not deep-scrubbed in time
On a 38 TB cluster, if you scrub 8 MB/s on 10 disks
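A rough back-of-the-envelope under those assumptions, ignoring replication
overhead and client load: 10 disks × 8 MB/s ≈ 80 MB/s aggregate, and
38,000,000 MB / 80 MB/s ≈ 475,000 s ≈ 5.5 days for a single full pass, which
leaves almost no headroom against the 1-week deadline.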
Hi Marc,
thanks for your numbers, this seems to confirm the suspicions.
Oh, I get it. Interesting. I think if you expand the cluster in the future with
more disks, you will spread the load and have more IOPS, and this will
disappear.
This one I'm not sure about:
If I expand the cluster 2x, I'll also end up storing roughly 2x the data, so the
amount each disk has to scrub stays about the same.
Hi all,
Scrubs only read the data that actually exists in Ceph, object by object as it
is stored, not every sector of the drive, written or not.
Thanks, this does explain it.
I just discovered:
ZFS had this problem in the past:
* https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSNonlinearScrubs?showcomments#comments
That one talks about resilvering, which is not the same as either ZFS scrubs or
Ceph scrubs.
The commit I linked is titled "Sequential scrub and resilvers".
So ZFS scrubs are included.
Hi,
I have a 3x-replicated pool with Ceph 12.2.7.
One HDD broke, its OSD "2" was automatically marked "out", the disk was
physically replaced by a new one, and the new OSD was added back in.
Now `ceph health detail` continues to permanently show:
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
Hi Alvaro,
Can you post the entire Ceph status output?
Pasting here since it is short:

  cluster:
    id:     d9000ec0-93c2-479f-bd5d-94ae9673e347
    health: HEALTH_ERR
            1 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daem
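For a single scrub error like this, the usual next step is to identify the
inconsistent PG and inspect the affected object; a sketch (the PG id is whatever
`ceph health detail` reports):
# ceph health detail
# rados list-inconsistent-obj <pgid> --format=json-pretty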
I, too, have the problem that `ceph pg deep-scrub` does not start the scrub,
with Ceph 16.2.7.
# ceph pg deep-scrub 2.87
instructing pg 2.87 on osd.33 to deep-scrub
However, on the machine where that osd.33 runs:
# ceph daemon osd.33 dump_scrubs | jq . | head -n 13
[
{
On 28/06/2023 05:24, Alexander E. Patrakov wrote:
What you can do is try extracting the PG from the dead OSD disk
I believe this is not possible for me because the dead disk does not turn on at
all.
Hi Frank,
The response to that is not to try manual repair but to issue a deep-scrub.
I am a bit confused, because in your script you do issue "ceph pg repair", not
a scrub.
Frank,
high likelihood that at least one OSD of any PG is already part of a scrub at
any time. In that case, if a PG is not eligible for scrubbing because one of its
OSDs already has max-scrubs (default = 1) scrubs running, the reservation has no
observable effect.
This is a great hint.
I ha
On 28/06/2023 21:26, Niklas Hambüchen wrote:
I have increased the number of scrubs per OSD from 1 to 3 using `ceph config
set osd osd_max_scrubs 3`.
Now the problematic PG is scrubbing in `ceph pg ls`:
active+clean+scrubbing+deep+inconsistent
This succeeded!
The deep-scrub fixed the PG
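For the archives, the sequence that resolved it, as a sketch (dropping
osd_max_scrubs back to its default of 1 afterwards is an extra step not
mentioned above):
# ceph config set osd osd_max_scrubs 3
# ceph pg deep-scrub 2.87
(wait until the PG shows active+clean again)
# ceph config set osd osd_max_scrubs 1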
Hi Ceph users,
I have a Ceph 16.2.7 cluster that so far has been replicated over the `host`
failure domain.
All `hosts` have been chosen to be in different `datacenter`s, so that was
sufficient.
Now I wish to add more hosts, including some in already-used data centers, so
I'm planning to use
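Presumably this means switching the pools to a datacenter-level rule; a minimal
sketch, assuming a hypothetical rule name rep_datacenter and pool name mypool:
# ceph osd crush rule create-replicated rep_datacenter default datacenter
# ceph osd pool set mypool crush_rule rep_datacenter
Changing the rule on an existing pool triggers the data movement discussed in
the replies below.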
Thank you both Michel and Christian.
Looks like I will have to do the rebalancing eventually.
From past experience with Ceph 16 the rebalance will likely take at least a
month with my 500 M objects.
It seems like a good idea to upgrade to Ceph 17 first as Michel suggests.
Unless:
I was hoping
I can believe the month timeframe for a cluster with multiple large spinners
behind each HBA. I’ve witnessed such personally.
I do have the numbers for this:
My original post showed "1167541260/1595506041 objects misplaced (73.177%)".
During my last recovery with Ceph 16.2.7, the recovery sp
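For scale, a rough estimate assuming the rebalance really does take about a
month: 1,167,541,260 misplaced objects over 30 days works out to roughly 450
objects/s of sustained recovery.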
My server provider usually does infrastructure maintenance and planned
downtimes on a per-datacenter-building granularity, and thus I have a Ceph
cluster with that set as the "datacenter" failure domain in CRUSH.
However, it now has a planned maintenance that affects two buildings
simultaneously.
Hi Joachim,
I'm currently looking for the general methodology, and whether it's possible
without rebalancing everything.
But of course I'd also appreciate tips directly for my deployment; here is the
info:
Ceph 18, Simple 3-replication (osd_pool_default_size = 3, default CRUSH rules
Ceph creates fo
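As a starting point for the general methodology, it helps to inspect what the
default rule actually does before changing anything; a sketch (replicated_rule
is the default rule name on a stock install):
# ceph osd crush rule ls
# ceph osd crush rule dump replicated_rule
# ceph osd tree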