[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-27 Thread Nizamudeen A
Dashboard LGTM!

On Sat, Mar 25, 2023 at 1:16 AM Yuri Weinstein  wrote:

> Details of this release are updated here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The slowness we experienced seemed to be self-cured.
> Neha, Radek, and Laura please provide any findings if you have them.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (rerun on Build 2 with
> PRs merged on top of quincy-release)
> rgw - Casey (rerun on Build 2 with PRs merged on top of quincy-release)
> fs - Venky
>
> upgrade/octopus-x - Neha, Laura (package issue Adam Kraitman any updates?)
> upgrade/pacific-x - Neha, Laura, Ilya see
> https://tracker.ceph.com/issues/58914
> upgrade/quincy-p2p - Neha, Laura
> client-upgrade-octopus-quincy-quincy - Neha, Laura (package issue Adam
> Kraitman any updates?)
> powercycle - Brad
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> On Tue, Mar 21, 2023 at 1:04 PM Yuri Weinstein 
> wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The reruns were in the queue for 4 days because of some slowness issues.
> > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > down the root cause.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > the core)
> > rgw - Casey
> > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > any reason to suspect it in the observed slowness?)
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/octopus-x - Laura is looking into failures
> > upgrade/pacific-x - Laura is looking into failures
> > upgrade/quincy-p2p - Laura is looking into failures
> > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > is looking into it
> > powercycle - Brad
> > ceph-volume - needs a rerun on merged
> > https://github.com/ceph/ceph-ansible/pull/7409
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Also, share any findings or hypotheses about the slowness in the
> > execution of the suite.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Boris Behrens
Hello everyone,

I've redeployed all OSDs in the cluster and did a blkdiscard before
deploying them again. Latency now looks a lot better, even better than before
the octopus upgrade. I am waiting for confirmation from the dev and customer
teams, as the average over all OSDs can be misleading, and we still have some
OSDs whose 5-minute mean is between 1 and 2 ms.

What I also see is that I have three OSDs with quite a lot of OMAP data
compared to other OSDs (~20 times higher). I don't know if this is an issue:
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE   DATA      OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
...
 91   ssd   1.74660       1.0  1.7 TiB  1.1 TiB   1.1 TiB    26 MiB  2.9 GiB  670 GiB  62.52  1.08   59      up  osd.91
 92   ssd   1.74660       1.0  1.7 TiB  1.0 TiB   1022 GiB  575 MiB  2.6 GiB  764 GiB  57.30  0.99   56      up  osd.92
 93   ssd   1.74660       1.0  1.7 TiB  986 GiB   983 GiB    25 MiB  3.0 GiB  803 GiB  55.12  0.95   53      up  osd.93
...
130   ssd   1.74660       1.0  1.7 TiB  1018 GiB  1015 GiB   25 MiB  3.1 GiB  771 GiB  56.92  0.98   53      up  osd.130
131   ssd   1.74660       1.0  1.7 TiB  1023 GiB  1019 GiB  574 MiB  2.9 GiB  766 GiB  57.17  0.98   54      up  osd.131
132   ssd   1.74660       1.0  1.7 TiB  1.1 TiB   1.1 TiB    26 MiB  3.1 GiB  675 GiB  62.26  1.07   58      up  osd.132
...
 41   ssd   1.74660       1.0  1.7 TiB  991 GiB   989 GiB    25 MiB  2.5 GiB  797 GiB  55.43  0.95   52      up  osd.41
 44   ssd   1.74660       1.0  1.7 TiB  1.1 TiB   1.1 TiB   576 MiB  2.8 GiB  648 GiB  63.75  1.10   60      up  osd.44
 56   ssd   1.74660       1.0  1.7 TiB  993 GiB   990 GiB    25 MiB  2.9 GiB  796 GiB  55.51  0.95   54      up  osd.56

IMHO this might be due to the blkdiscard. We moved a lot of 2TB disks from
the nautilus cluster (c-2) to the then-octopus, now-pacific cluster (c-1), and
we only removed the LVM data. Doing the blkdiscard took around 10 minutes
on an 8TB SSD on the first run, and around 5s on the second run.
I could imagine that this might be a problem with SSDs in combination with
bluestore, because there is no trimmable FS, so what the OSD thinks is free
and what the disk controller thinks is free might deviate. But I am not
really deep into storage mechanics, so this is just a wild guess.
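
(For reference, the discard step itself is a single util-linux command per
device - the device name below is an example, and it is of course only safe on
a disk that has already been removed from the cluster, since it irreversibly
discards all data on it:)

blkdiscard /dev/sdX    # discard the whole device; the first run can take minutes on large SSDs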

Nonetheless the IOPS the bench command generates are still VERY low
compared to the nautilus cluster (~150 vs ~250). But this is something I
would pin to this bug: https://tracker.ceph.com/issues/58530
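
(For context, the numbers above are from the built-in per-OSD benchmark,
roughly like this - the arguments are from memory, so treat the exact
parameters as an assumption:)

ceph tell osd.40 bench                  # default: write 1 GiB in 4 MiB blocks
ceph tell osd.40 bench 12288000 4096    # small run with 4 KiB writes; large totals with tiny blocks may hit the OSD's bench safety limits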

@Igor do you want me to update the ticket with my findings and the logs
from pastebin?

@marc
If I interpret the linked bug correctly, you might want to have the
metadata on an SSD, because the write amplification might hit very hard on
HDDs. But maybe someone else from the mailing list can say more about it.

Cheers
 Boris

Am Mi., 22. März 2023 um 22:45 Uhr schrieb Boris Behrens :

> Hey Igor,
>
> sadly we do not have the data from the time where c1 was on nautilus.
> The RocksDB warning persisted the recreation.
>
> Here are the measurements.
> I've picked the same SSD models from the clusters to have some
> comparability.
> For the 8TB disks it's even the same chassis configuration
> (CPU/Memory/Board/Network)
>
> The IOPS seem VERY low for me. Or are these normal values for SSDs? After
> recreation the IOPS are a lot better on the pacific cluster.
>
> I also blkdiscarded the SSDs before recreating them.
>
> Nautilus Cluster
> osd.22  = 8TB
> osd.343 = 2TB
> https://pastebin.com/EfSSLmYS
>
> Pacific Cluster before recreating OSDs
> osd.40  = 8TB
> osd.162 = 2TB
> https://pastebin.com/wKMmSW9T
>
> Pacific Cluster after recreating OSDs
> osd.40  = 8TB
> osd.162 = 2TB
> https://pastebin.com/80eMwwBW
>
> Am Mi., 22. März 2023 um 11:09 Uhr schrieb Igor Fedotov <
> igor.fedo...@croit.io>:
>
>> Hi Boris,
>>
>> first of all I'm not sure if it's valid to compare two different clusters
>> (pacific vs . nautilus, C1 vs. C2 respectively). The perf numbers
>> difference might be caused by a bunch of other factors: different H/W, user
>> load, network etc... I can see that you got ~2x latency increase after
>> Octopus to Pacific upgrade at C1 but Octopus numbers had been much above
>> Nautilus at C2 before the upgrade. Did you observe even lower numbers at C1
>> when it was running Nautilus if any?
>>
>>
>> You might want to try "ceph tell osd.N bench" to compare OSDs performance
>> for both C1 and C2. Would it be that different?
>>
>>
>> Then redeploy a single OSD at C1, wait till rebalance completion and
>> benchmark it again. What would be the new numbers? Please also collect perf
>> counters from the to-be-redeployed OSD beforehand.
>>
>> W.r.t. rocksdb warning - I presume this might be caused by newer RocksDB
>> version running on top of DB with a legacy format.. Perhaps redeployment
>> would fix that...
>>
>>
>> Thanks,
>>
>> Igor
>> On 3/21/2023 5:31 PM, B

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-27 Thread Venky Shankar
On Sat, Mar 25, 2023 at 1:17 AM Yuri Weinstein  wrote:
>
> Details of this release are updated here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The slowness we experienced seemed to be self-cured.
> Neha, Radek, and Laura please provide any findings if you have them.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (rerun on Build 2 with
> PRs merged on top of quincy-release)
> rgw - Casey (rerun on Build 2 with PRs merged on top of quincy-release)
> fs - Venky

fs approved.

>
> upgrade/octopus-x - Neha, Laura (package issue Adam Kraitman any updates?)
> upgrade/pacific-x - Neha, Laura, Ilya see 
> https://tracker.ceph.com/issues/58914
> upgrade/quincy-p2p - Neha, Laura
> client-upgrade-octopus-quincy-quincy - Neha, Laura (package issue Adam
> Kraitman any updates?)
> powercycle - Brad
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> On Tue, Mar 21, 2023 at 1:04 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The reruns were in the queue for 4 days because of some slowness issues.
> > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > down the root cause.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > the core)
> > rgw - Casey
> > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > any reason to suspect it in the observed slowness?)
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/octopus-x - Laura is looking into failures
> > upgrade/pacific-x - Laura is looking into failures
> > upgrade/quincy-p2p - Laura is looking into failures
> > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > is looking into it
> > powercycle - Brad
> > ceph-volume - needs a rerun on merged
> > https://github.com/ceph/ceph-ansible/pull/7409
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Also, share any findings or hypotheses about the slowness in the
> > execution of the suite.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Igor Fedotov

Hi Boris,

I wouldn't recommend taking absolute "osd bench" numbers too seriously.
It's definitely not a full-scale quality benchmark tool.


The idea was just to make brief OSDs comparison from c1 and c2.

And for your reference -  IOPS numbers I'm getting in my lab with 
data/DB colocated:


1) OSD on top of Intel S4600 (SATA SSD) - ~110 IOPS

2) OSD on top of Samsung DCT 983 (M.2 NVMe) - 310 IOPS

3) OSD on top of Intel 905p (Optane NVMe) - 546 IOPS.


Could you please provide a bit more info on the H/W and OSD setup?

What are the disk models? NVMe or SATA? Are DB and main disk shared?
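
(Most of this can be read from the OSD metadata, e.g. - field names may
differ slightly between releases:)

ceph osd metadata 40 | egrep 'devices|device_ids|rotational|bluefs_dedicated_db'
ceph device ls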


Thanks,

Igor

On 3/23/2023 12:45 AM, Boris Behrens wrote:

Hey Igor,

sadly we do not have the data from the time where c1 was on nautilus.
The RocksDB warning persisted the recreation.

Here are the measurements.
I've picked the same SSD models from the clusters to have some comparability.
For the 8TB disks it's even the same chassis configuration
(CPU/Memory/Board/Network)

The IOPS seem VERY low for me. Or are these normal values for SSDs? After
recreation the IOPS are a lot better on the pacific cluster.

I also blkdiscarded the SSDs before recreating them.

Nautilus Cluster
osd.22  = 8TB
osd.343 = 2TB
https://pastebin.com/EfSSLmYS

Pacific Cluster before recreating OSDs
osd.40  = 8TB
osd.162 = 2TB
https://pastebin.com/wKMmSW9T

Pacific Cluster after recreating OSDs
osd.40  = 8TB
osd.162 = 2TB
https://pastebin.com/80eMwwBW

Am Mi., 22. März 2023 um 11:09 Uhr schrieb Igor Fedotov <
igor.fedo...@croit.io>:


Hi Boris,

first of all I'm not sure if it's valid to compare two different clusters
(pacific vs . nautilus, C1 vs. C2 respectively). The perf numbers
difference might be caused by a bunch of other factors: different H/W, user
load, network etc... I can see that you got ~2x latency increase after
Octopus to Pacific upgrade at C1 but Octopus numbers had been much above
Nautilus at C2 before the upgrade. Did you observe even lower numbers at C1
when it was running Nautilus if any?


You might want to try "ceph tell osd.N bench" to compare OSDs performance
for both C1 and C2. Would it be that different?


Then redeploy a single OSD at C1, wait till rebalance completion and
benchmark it again. What would be the new numbers? Please also collect perf
counters from the to-be-redeployed OSD beforehand.

W.r.t. rocksdb warning - I presume this might be caused by newer RocksDB
version running on top of DB with a legacy format.. Perhaps redeployment
would fix that...


Thanks,

Igor
On 3/21/2023 5:31 PM, Boris Behrens wrote:

Hi Igor,
i've offline compacted all the OSDs and reenabled the bluefs_buffered_io

It didn't change anything and the commit and apply latencies are around
5-10 times higher than on our nautlus cluster. The pacific cluster got a 5
minute mean over all OSDs 2.2ms, while the nautilus cluster is around 0.2 -
0.7 ms.

I also see these kind of logs. Google didn't really help:
2023-03-21T14:08:22.089+ 7efe7b911700  3 rocksdb:
[le/block_based/filter_policy.cc:579] Using legacy Bloom filter with high
(20) bits/key. Dramatic filter space and/or accuracy improvement is
available with format_version>=5.




Am Di., 21. März 2023 um 10:46 Uhr schrieb Igor Fedotov:


Hi Boris,

additionally you might want to manually compact RocksDB for every OSD.


Thanks,

Igor
On 3/21/2023 12:22 PM, Boris Behrens wrote:

Disabling the write cache and the bluefs_buffered_io did not change
anything.
What we see is that larger disks seem to be the leader in terms of
slowness (we have 70% 2TB, 20% 4TB and 10% 8TB SSDs in the cluster), but
removing some of the 8TB disks and replace them with 2TB (because it's by
far the majority and we have a lot of them) disks did also not change
anything.

Are there any other ideas I could try? Customers are starting to complain about the
slower performance and our k8s team mentions problems with ETCD because the
latency is too high.

Would it be an option to recreate every OSD?

Cheers
  Boris

Am Di., 28. Feb. 2023 um 22:46 Uhr schrieb Boris Behrens
  :


Hi Josh,
thanks a lot for the breakdown and the links.
I disabled the write cache but it didn't change anything. Tomorrow I will
try to disable bluefs_buffered_io.

It doesn't sound like I can mitigate the problem with more SSDs.


Am Di., 28. Feb. 2023 um 15:42 Uhr schrieb Josh Baergen  
:


Hi Boris,

OK, what I'm wondering is whether https://tracker.ceph.com/issues/58530 is 
involved. There are two
aspects to that ticket:
* A measurable increase in the number of bytes written to disk in
Pacific as compared to Nautilus
* The same, but for IOPS

Per the current theory, both are due to the loss of rocksdb log
recycling when using default recovery options in rocksdb 6.8; Octopus
uses version 6.1.2, Pacific uses 6.8.1.

16.2.11 largely addressed the bytes-written amplification, but the
IOPS amplification remains. In practice, whether this results in a
write performance degradation depends on the speed of the underlying
media and the workload, an

[ceph-users] Re: EC profiles where m>k (EC 8+12)

2023-03-27 Thread Clyso GmbH - Ceph Foundation Member

Hi Fabien,

we have also used it several times for 2 DC setups.

However, we always try to use as few chunks as possible, as it is very
inefficient when storing small files (due to the minimum allocation size) and
it can also lead to quite a few problems with backfill and recovery in large
ceph clusters.
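
A quick worked example of that small-file overhead, assuming a 4 KiB bluestore
min_alloc_size: with EC 8+12 a 16 KiB object is split into 8 data chunks of
2 KiB plus 12 coding chunks of the same size, and each of the 20 chunks is
rounded up to 4 KiB on disk - 80 KiB stored for 16 KiB of user data, i.e. 5x
overhead instead of the nominal 2.5x of an 8+12 profile.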


Joachim

___
Clyso GmbH - Ceph Foundation Member

Am 24.03.23 um 13:00 schrieb Fabien Sirjean:

Hi Ceph users!

I've been proposed an interesting EC setup I hadn't thought about before.

The scenario is: we have two server rooms and want to store ~4PiB with the
ability to lose one server room without loss of data or RW availability.


For the context, performance is not needed (cold storage mostly, used 
as a big filesystem).


The idea is to use EC 8+12 over 24 servers (12 in each server room), so if we
lose one room we still have half of the EC parts (10/20) and are able to lose
2 more servers before reaching the point where we lose data.


I find this pretty elegant in a two-site context, as the efficiency is 40%
(better than the 33% of three-times replication) and the redundancy is good.


What do you think of this setup? Have you ever used EC profiles with M > K?
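
(Concretely, the placement would need a CRUSH rule along these lines - an
untested sketch, assuming "room" buckets exist in the CRUSH map and the EC
profile is k=8, m=12:)

rule ec-8-12-two-rooms {
    id 99                                # example id
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 2 type room        # pick both rooms
    step chooseleaf indep 10 type host   # 10 chunks on distinct hosts per room
    step emit
}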


Thanks for sharing your thoughts!

Cheers,

Fabien
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Marc
> 
> What I also see is that I have three OSDs that have quite a lot of OMAP
> data, in compare to other OSDs (~20 time higher). I don't know if this
> is an issue:

On 2TB ssd's I have 2GB - 4GB of omap data, while on 8TB hdd's the omap data
is only 53MB - 100MB.
Should I manually clean this? (how? :))
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Anthony D'Atri



> 
>> 
>> What I also see is that I have three OSDs that have quite a lot of OMAP
>> data, in compare to other OSDs (~20 time higher). I don't know if this
>> is an issue:
> 
> I have on 2TB ssd's with 2GB - 4GB omap data, while on 8TB hdd's the omap 
> data is only 53MB - 100MB.
> Should I manually clean this? (how? :))

The amount of omap data depends on multiple things, especially the use-case.  
If a given OSD is only used for RBD, it will have a different omap experience 
than if it were used for an RGW index pool.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Marc
> 
> >
> >>
> >> What I also see is that I have three OSDs that have quite a lot of
> OMAP
> >> data, in compare to other OSDs (~20 time higher). I don't know if
> this
> >> is an issue:
> >
> > I have on 2TB ssd's with 2GB - 4GB omap data, while on 8TB hdd's the
> omap data is only 53MB - 100MB.
> > Should I manually clean this? (how? :))
> 
> The amount of omap data depends on multiple things, especially the use-
> case.  If a given OSD is only used for RBD, it will have a different
> omap experience than if it were used for an RGW index pool.
> 

This (mine) is mostly an rbd cluster. 

Is it correct that compacting leveldb addresses 'cleaning omap data'? And can
this only be done by setting leveldb_compact_on_mount = true in ceph.conf and
restarting the osd?
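
(For what it's worth: on BlueStore OSDs the omap data lives in RocksDB rather
than LevelDB, and a compaction can also be triggered without a config change -
a sketch, exact invocations may vary by release:)

ceph tell osd.N compact                                           # online, against a running OSD
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-N compact   # offline, with the OSD stopped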
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph cluster out of balance after adding OSDs

2023-03-27 Thread Pat Vaughan
We set up a small Ceph cluster about 6 months ago with just 6x 200GB OSDs
with one EC 4x2 pool. When we created that pool, we enabled pg_autoscale.
The OSDs stayed pretty well balanced.

After our developers released a new "feature" that caused the storage to
balloon up to over 80%, we added another 6x 200GB OSDs. When we did that,
we looked at the number of PGs for that pool, and found that there was only
1 for the rgw.data and rgw.log pools, and "osd pool autoscale-status"
doesn't return anything, so it looks like that hasn't been working. The
rebalance operation was extremely slow, and wasn't balancing out osd.0, so
we bumped up the PGs for the rgw.data pool to 16. All the OSDs except osd.0
balanced out quickly, but that one OSD's utilization keeps climbing, and the
number of misplaced objects is increasing, rather than decreasing. We set
noscrub and nodeep-scrub so scrubbing wouldn't slow down the process.

At this point, I don't want to do any more tuning to this cluster until we
can get it back to a healthy state, but it's not fixing itself. I'm open to
any ideas.
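
(For reference, the checks and changes described above were done with commands
along these lines - listed from memory, so treat them as approximate:)

ceph osd pool autoscale-status
ceph osd pool get charlotte.rgw.buckets.data pg_num
ceph osd pool set charlotte.rgw.buckets.data pg_num 16
ceph osd set noscrub
ceph osd set nodeep-scrub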

Here's the output of ceph -s:
  cluster:
id: 159d23e4-2a36-11ed-8b6e-fd27d573fa65
health: HEALTH_WARN
1 pools have many more objects per pg than average
noscrub,nodeep-scrub flag(s) set
1 backfillfull osd(s)
Low space hindering backfill (add storage if this doesn't
resolve itself): 12 pgs backfill_toofull
7 pool(s) backfillfull

  services:
mon: 3 daemons, quorum ceph3,ceph5,ceph6 (age 6h)
mgr: ceph5.ksxevx(active, since 23h), standbys: ceph4.frkyyl,
ceph6.slvpzl
osd: 12 osds: 12 up (since 11h), 12 in (since 11h); 12 remapped pgs
 flags noscrub,nodeep-scrub
rgw: 3 daemons active (3 hosts, 1 zones)

  data:
pools:   7 pools, 161 pgs
objects: 28.61M objects, 211 GiB
usage:   1.5 TiB used, 834 GiB / 2.3 TiB avail
pgs: 91779228/171665865 objects misplaced (53.464%)
 149 active+clean
 12  active+remapped+backfill_toofull

  io:
client:   11 KiB/s rd, 61 KiB/s wr, 11 op/s rd, 27 op/s wr

  progress:
Global Recovery Event (23h)
  [=...] (remaining: 115m)

ceph df:
--- RAW STORAGE ---
CLASS SIZEAVAIL USED  RAW USED  %RAW USED
ssd2.3 TiB  834 GiB  1.5 TiB   1.5 TiB  65.24
TOTAL  2.3 TiB  834 GiB  1.5 TiB   1.5 TiB  65.24

--- POOLS ---
POOL                         ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                          1    1  897 KiB        2  2.6 MiB   0.18    479 MiB
.rgw.root                     2   32  7.1 KiB       18  204 KiB   0.01    479 MiB
charlotte.rgw.log             3   32   27 KiB      347  2.0 MiB   0.14    479 MiB
charlotte.rgw.control         4   32      0 B        9      0 B      0    479 MiB
charlotte.rgw.meta            5   32  9.7 KiB       16  167 KiB   0.01    479 MiB
charlotte.rgw.buckets.data    6   16  734 GiB   28.61M  1.1 TiB  99.87    958 MiB
charlotte.rgw.buckets.index   7   16   16 GiB      691   47 GiB  97.12    479 MiB

ceph osd tree:
ID   CLASS  WEIGHT   TYPE NAME   STATUS  REWEIGHT  PRI-AFF
 -1 2.34357  root default
 -3 0.39059  host ceph1
  0ssd  0.19530  osd.0   up   0.8  1.0
  1ssd  0.19530  osd.1   up   1.0  1.0
 -5 0.39059  host ceph2
  6ssd  0.19530  osd.6   up   1.0  1.0
  7ssd  0.19530  osd.7   up   1.0  1.0
 -7 0.39059  host ceph3
  2ssd  0.19530  osd.2   up   1.0  1.0
  8ssd  0.19530  osd.8   up   1.0  1.0
 -9 0.39059  host ceph4
  3ssd  0.19530  osd.3   up   1.0  1.0
  9ssd  0.19530  osd.9   up   1.0  1.0
-11 0.39059  host ceph5
  4ssd  0.19530  osd.4   up   1.0  1.0
 10ssd  0.19530  osd.10  up   1.0  1.0
-13 0.39059  host ceph6
  5ssd  0.19530  osd.5   up   1.0  1.0
 11ssd  0.19530  osd.11  up   1.0  1.0

ceph osd df:
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    ssd  0.19530       0.8  200 GiB  190 GiB  130 GiB   12 GiB   48 GiB   10 GiB  94.94  1.46   52      up
 1    ssd  0.19530       1.0  200 GiB  7.3 GiB  9.8 MiB  6.4 GiB  858 MiB  193 GiB   3.64  0.06   42      up
 6    ssd  0.19530       1.0  200 GiB  148 GiB   97 GiB   14 GiB   38 GiB   52 GiB  74.06  1.14   51      up
 7    ssd  0.19530       1.0  200 GiB  133 GiB   97 GiB    2 KiB   35 GiB   67 GiB  66.47  1.02   43      up
 2    ssd  0.19530       1.0  200 GiB  134 GiB   97 GiB   12 KiB   37 GiB   66 GiB  66.94  1.03   40      up
 8    ssd  0.19530       1.0  200 GiB  136 GiB   97 GiB  2.2 GiB   36 GiB   64 GiB  67.85  1.04   40      up
 3    ssd  0.19530       1.0  200 GiB  134 GiB   97 GiB    4 KiB   37 GiB   66 G

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Boris Behrens
Hey Igor,

we are currently using these disks - all SATA attached (is it normal to
have some OSDs without wear counter?):
# ceph device ls | awk '{print $1}' | cut -f 1,2 -d _ | sort | uniq -c
 18 SAMSUNG_MZ7KH3T8 (4TB)
126 SAMSUNG_MZ7KM1T9 (2TB)
 24 SAMSUNG_MZ7L37T6 (8TB)
  1 TOSHIBA_THNSN81Q (2TB) (ceph device ls shows a wear of 16% so maybe
we remove this one)

These are the CPUs in the storage hosts:
# ceph osd metadata | grep -F '"cpu": "' | sort -u
"cpu": "Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz",
"cpu": "Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz",

The hosts have between 128GB and 256GB memory and each got between 20 and
30 OSDs.
DB and OSD are using same device, no extra device for DB/WAL.

Seeing your IOPS it looks like we are around the same level.
I am curious if the performance will stay at the current level or degrade
over time.

Am Mo., 27. März 2023 um 13:42 Uhr schrieb Igor Fedotov <
igor.fedo...@croit.io>:

> Hi Boris,
>
> I wouldn't recommend to take absolute "osd bench" numbers too seriously.
> It's definitely not a full-scale quality benchmark tool.
>
> The idea was just to make brief OSDs comparison from c1 and c2.
>
> And for your reference -  IOPS numbers I'm getting in my lab with data/DB
> colocated:
>
> 1) OSD on top of Intel S4600 (SATA SSD) - ~110 IOPS
>
> 2) OSD on top of Samsung DCT 983 (M.2 NVMe) - 310 IOPS
>
> 3) OSD on top of Intel 905p (Optane NVMe) - 546 IOPS.
>
>
> Could you please provide a bit more info on the H/W and OSD setup?
>
> What are the disk models? NVMe or SATA? Are DB and main disk shared?
>
>
> Thanks,
>
> Igor
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li

Frank,

Sorry for the late reply.

On 24/03/2023 01:56, Frank Schilder wrote:

Hi Xiubo and Gregory,

sorry for the slow reply, I did some more debugging and didn't have too much
time. First some questions about collecting logs, but please see also below
for reproducing the issue yourselves.

I can reproduce it reliably but need some input for these:


enabling the kclient debug logs and

How do I do that? I thought the kclient ignores the ceph.conf and I'm not aware of a 
mount option to this effect. Is there a "ceph config set ..." setting I can 
change for a specific client (by host name/IP) and how exactly?


$ echo "module ceph +p" > /sys/kernel/debug/dynamic_debug/control

This will enable the debug logs in kernel ceph. Then please provide the 
message logs.




also the mds debug logs

I guess here I should set a higher loglevel for the MDS serving this directory 
(it is pinned to a single rank) or is it something else?


$ ceph daemon mds.X config set debug_mds 25
$ ceph daemon mds.X config set debug_ms 1



The issue seems to require a certain load to show up. I created a minimal tar 
file mimicking the problem and having 2 directories with a hard link from a 
file in the first to a new name in the second directory. This does not cause 
any problems, so it's not that easy to reproduce.

How you can reproduce it:

As an alternative to my limited skills of pulling logs out, I make the 
tgz-archive available to you both. You will receive an e-mail from our 
one-drive with a download link. If you un-tar the archive on an NFS client dir 
that's a re-export of a kclient mount, after some time you should see the 
errors showing up.

I can reliably reproduce these errors on our production- as well as on our test 
cluster. You should be able to reproduce it too with the tgz file.

Here is a result on our set-up:

- production cluster (executed in a sub-dir conda to make cleanup easy):

$ time tar -xzf ../conda.tgz
tar: mambaforge/pkgs/libstdcxx-ng-9.3.0-h6de172a_18/lib/libstdc++.so.6.0.28: 
Cannot hard link to ‘envs/satwindspy/lib/libstdc++.so.6.0.28’: Read-only file 
system
[...]
tar: mambaforge/pkgs/boost-cpp-1.72.0-h9d3c048_4/lib/libboost_log.so.1.72.0: 
Cannot hard link to ‘envs/satwindspy/lib/libboost_log.so.1.72.0’: Read-only 
file system
^C

real1m29.008s
user0m0.612s
sys 0m6.870s

By this time there are already hard links created, so it doesn't fail right 
away:
$ find -type f -links +1
./mambaforge/pkgs/libev-4.33-h516909a_1/share/man/man3/ev.3
./mambaforge/pkgs/libev-4.33-h516909a_1/include/ev++.h
./mambaforge/pkgs/libev-4.33-h516909a_1/include/ev.h
...

- test cluster (octopus latest stable, 3 OSD hosts with 3 HDD OSDs each, simple 
ceph-fs):

# ceph fs status
fs - 2 clients
==
RANK  STATE MDSACTIVITY DNSINOS
  0active  tceph-02  Reqs:0 /s  1807k  1739k
   POOL  TYPE USED  AVAIL
fs-meta1  metadata  18.3G   156G
fs-meta2data   0156G
fs-data data1604G   312G
STANDBY MDS
   tceph-01
   tceph-03
MDS version: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
octopus (stable)

Its the new recommended 3-pool layout with fs-data being a 4+2 EC pool.

$ time tar -xzf / ... /conda.tgz
tar: mambaforge/ssl/cacert.pem: Cannot hard link to 
‘envs/satwindspy/ssl/cacert.pem’: Read-only file system
[...]
tar: mambaforge/lib/engines-1.1/padlock.so: Cannot hard link to 
‘envs/satwindspy/lib/engines-1.1/padlock.so’: Read-only file system
^C

real6m23.522s
user0m3.477s
sys 0m25.792s

Same story here, a large number of hard links has already been created before 
it starts failing:

$ find -type f -links +1
./mambaforge/lib/liblzo2.so.2.0.0
...

Looking at the output of find in both cases it also looks a bit 
non-deterministic when it starts failing.

It would be great if you can reproduce the issue on a similar test setup using 
the archive conda.tgz. If not, I'm happy to collect any type of logs on our 
test cluster.

We now have one user who has problems with rsync to an NFS share, and it would
be really appreciated if this could be sorted.


The ceph qa teuthology test cases already have one similar test, which untars
a kernel tarball, but I have never seen this issue there yet.


I will try this again tomorrow without the NFS client.

Thanks

- Xiubo



Thanks for your help and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Thursday, March 23, 2023 2:41 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': 
Read-only file system

Hi Frank,

Could you reproduce it again by enabling the kclient debug logs and also
the mds debug logs ?

I need to know what exactly has happened in kclient and mds side.
Locally I couldn't reproduce it.

Thanks

- Xiubo

On 22/03/2023 23:27, Frank Schilder wrote:

Hi Gregory,

thanks for your reply. First a quick update. Here is how I get

[ceph-users] Re: Ceph cluster out of balance after adding OSDs

2023-03-27 Thread Robert Sander

On 27.03.23 16:04, Pat Vaughan wrote:


we looked at the number of PGs for that pool, and found that there was only
1 for the rgw.data and rgw.log pools, and "osd pool autoscale-status"
doesn't return anything, so it looks like that hasn't been working.


If you are in this situation, have a look at the crush rules of your 
pools. If the cluster has multiple device classes (hdd, ssd) then all 
pools need to use just one device class each.


The autoscaler currently does not work when one pool uses just one 
device class and another pool uses the default crush rule and therefore 
multiple device classes.
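
(A quick way to check is to compare the rules the pools use - a sketch, rule
names are whatever your cluster has:)

ceph osd pool ls detail          # shows the crush_rule id of every pool
ceph osd crush rule ls
ceph osd crush rule dump <rule>  # a class-restricted rule shows e.g. "item_name": "default~ssd"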


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Marc




> >
> > And for your reference -  IOPS numbers I'm getting in my lab with
> data/DB
> > colocated:
> >
> > 1) OSD on top of Intel S4600 (SATA SSD) - ~110 IOPS
> >

sata ssd's on Nautilus:
Micron 5100 117
MZ7KM1T9HMJP-5 122

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster out of balance after adding OSDs

2023-03-27 Thread Robert Sander

On 27.03.23 16:34, Pat Vaughan wrote:

Yes, all the OSDs are using the SSD device class.


Do you have multiple CRUSH rules by chance?
Are all pools using the same CRUSH rule?

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-27 Thread Casey Bodley
On Fri, Mar 24, 2023 at 3:46 PM Yuri Weinstein  wrote:
>
> Details of this release are updated here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The slowness we experienced seemed to be self-cured.
> Neha, Radek, and Laura please provide any findings if you have them.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (rerun on Build 2 with
> PRs merged on top of quincy-release)
> rgw - Casey (rerun on Build 2 with PRs merged on top of quincy-release)

rgw approved

> fs - Venky
>
> upgrade/octopus-x - Neha, Laura (package issue Adam Kraitman any updates?)
> upgrade/pacific-x - Neha, Laura, Ilya see 
> https://tracker.ceph.com/issues/58914
> upgrade/quincy-p2p - Neha, Laura
> client-upgrade-octopus-quincy-quincy - Neha, Laura (package issue Adam
> Kraitman any updates?)
> powercycle - Brad
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> On Tue, Mar 21, 2023 at 1:04 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The reruns were in the queue for 4 days because of some slowness issues.
> > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > down the root cause.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > the core)
> > rgw - Casey
> > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > any reason to suspect it in the observed slowness?)
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/octopus-x - Laura is looking into failures
> > upgrade/pacific-x - Laura is looking into failures
> > upgrade/quincy-p2p - Laura is looking into failures
> > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > is looking into it
> > powercycle - Brad
> > ceph-volume - needs a rerun on merged
> > https://github.com/ceph/ceph-ansible/pull/7409
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Also, share any findings or hypotheses about the slowness in the
> > execution of the suite.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Frank Schilder
> Sorry for late.
No worries.

> The ceph qa teuthology test cases already have one similar test, which
> untars a kernel tarball, but I have never seen this issue there yet.
>
> I will try this again tomorrow without the NFS client.

Great. In case you would like to use the archive I sent you a link for, please 
keep it confidential. It contains files not for publication.

I will collect the log information you asked for.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Monday, March 27, 2023 4:15 PM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': 
Read-only file system

Frank,

Sorry for late.

On 24/03/2023 01:56, Frank Schilder wrote:
> Hi Xiubo and Gregory,
>
> sorry for the slow reply, I did some more debugging and didn't have too much 
> time. First some questions to collecting logs, but please see also below for 
> reproducing the issue yourselves.
>
> I can reproduce it reliably but need some input for these:
>
>> enabling the kclient debug logs and
> How do I do that? I thought the kclient ignores the ceph.conf and I'm not 
> aware of a mount option to this effect. Is there a "ceph config set ..." 
> setting I can change for a specific client (by host name/IP) and how exactly?
>
$ echo "module ceph +p" > /sys/kernel/debug/dynamic_debug/control

This will enable the debug logs in kernel ceph. Then please provide the
message logs.


>> also the mds debug logs
> I guess here I should set a higher loglevel for the MDS serving this 
> directory (it is pinned to a single rank) or is it something else?

$ ceph daemon mds.X config set debug_mds 25
$ ceph daemon mds.X config set debug_ms 1

>
> The issue seems to require a certain load to show up. I created a minimal tar 
> file mimicking the problem and having 2 directories with a hard link from a 
> file in the first to a new name in the second directory. This does not cause 
> any problems, so its not that easy to reproduce.
>
> How you can reproduce it:
>
> As an alternative to my limited skills of pulling logs out, I make the 
> tgz-archive available to you both. You will receive an e-mail from our 
> one-drive with a download link. If you un-tar the archive on an NFS client 
> dir that's a re-export of a kclient mount, after some time you should see the 
> errors showing up.
>
> I can reliably reproduce these errors on our production- as well as on our 
> test cluster. You should be able to reproduce it too with the tgz file.
>
> Here is a result on our set-up:
>
> - production cluster (executed in a sub-dir conda to make cleanup easy):
>
> $ time tar -xzf ../conda.tgz
> tar: mambaforge/pkgs/libstdcxx-ng-9.3.0-h6de172a_18/lib/libstdc++.so.6.0.28: 
> Cannot hard link to ‘envs/satwindspy/lib/libstdc++.so.6.0.28’: Read-only file 
> system
> [...]
> tar: mambaforge/pkgs/boost-cpp-1.72.0-h9d3c048_4/lib/libboost_log.so.1.72.0: 
> Cannot hard link to ‘envs/satwindspy/lib/libboost_log.so.1.72.0’: Read-only 
> file system
> ^C
>
> real1m29.008s
> user0m0.612s
> sys 0m6.870s
>
> By this time there are already hard links created, so it doesn't fail right 
> away:
> $ find -type f -links +1
> ./mambaforge/pkgs/libev-4.33-h516909a_1/share/man/man3/ev.3
> ./mambaforge/pkgs/libev-4.33-h516909a_1/include/ev++.h
> ./mambaforge/pkgs/libev-4.33-h516909a_1/include/ev.h
> ...
>
> - test cluster (octopus latest stable, 3 OSD hosts with 3 HDD OSDs each, 
> simple ceph-fs):
>
> # ceph fs status
> fs - 2 clients
> ==
> RANK  STATE MDSACTIVITY DNSINOS
>   0active  tceph-02  Reqs:0 /s  1807k  1739k
>POOL  TYPE USED  AVAIL
> fs-meta1  metadata  18.3G   156G
> fs-meta2data   0156G
> fs-data data1604G   312G
> STANDBY MDS
>tceph-01
>tceph-03
> MDS version: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
> octopus (stable)
>
> Its the new recommended 3-pool layout with fs-data being a 4+2 EC pool.
>
> $ time tar -xzf / ... /conda.tgz
> tar: mambaforge/ssl/cacert.pem: Cannot hard link to 
> ‘envs/satwindspy/ssl/cacert.pem’: Read-only file system
> [...]
> tar: mambaforge/lib/engines-1.1/padlock.so: Cannot hard link to 
> ‘envs/satwindspy/lib/engines-1.1/padlock.so’: Read-only file system
> ^C
>
> real6m23.522s
> user0m3.477s
> sys 0m25.792s
>
> Same story here, a large number of hard links has already been created before 
> it starts failing:
>
> $ find -type f -links +1
> ./mambaforge/lib/liblzo2.so.2.0.0
> ...
>
> Looking at the output of find in both cases it also looks a bit 
> non-deterministic when it starts failing.
>
> It would be great if you can reproduce the issue on a similar test setup 
> using the archive conda.tgz. If not, I'm happy to collect any type of logs on 
> our test cluster.
>
> We have now one user who has problems with rsync to an NFS share and it would 
> be really appreciated if t

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Igor Fedotov


On 3/27/2023 12:19 PM, Boris Behrens wrote:

Nonetheless the IOPS the bench command generates are still VERY low
compared to the nautilus cluster (~150 vs ~250). But this is something I
would pin to this bug: https://tracker.ceph.com/issues/58530


I've just run "ceph tell bench" against main, octopus and nautilus 
branches (fresh osd deployed with vstart.sh) - I don't see any 
difference between releases - the sata drive shows around 110 IOPS in my case.


So I suspect some difference between clusters in your case. E.g. are you 
sure disk caching is off for both?
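
(If in doubt, the volatile write cache can be checked and toggled per device,
e.g. for SATA drives:)

hdparm -W /dev/sdX      # show the current write-cache state
hdparm -W 0 /dev/sdX    # disable the volatile write cache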



@Igor do you want me to update the ticket with my findings and the logs
from pastebin?
Feel free to update it if you like, but IMO we still lack an understanding of
what triggered the perf improvements in your case - OSD redeployment, disk
trimming, or both?

--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Question about adding SSDs

2023-03-27 Thread Kyriazis, George
Hello ceph community,

We have a ceph cluster (Proxmox based) which is HDD-based.  We’ve had some 
performance and “slow MDS” issues while doing VM/CT backups from the Proxmox 
cluster, especially when rebalancing is going on at the same time.

My thought is that one of the following is going to improve performance / response:
1. Add an M.2 drive for DB store on each node
2. Migrate the cephfs metadata pool to SSDs

We have ~25 nodes with ~3 OSDs per node.

(1) is a lot of work and will cost more.
(2) seems more risky (to me) since the metadata pool would have to be migrated 
(potential loss in transit?)

Which one of the 2 solutions above will give us more bang for the buck, or just 
plain better performance?  I would hate to implement (1) to find out that 
another solution would’ve been better.

Any other solutions that I haven’t thought of?

Thank you!

George

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-27 Thread Laura Flores
Rados review, second round:

Failures:
1. https://tracker.ceph.com/issues/58560
2. https://tracker.ceph.com/issues/58476
3. https://tracker.ceph.com/issues/58475 -- pending Q backport
4. https://tracker.ceph.com/issues/49287
5. https://tracker.ceph.com/issues/58585

Details:
1. test_envlibrados_for_rocksdb.sh failed to subscribe to repo -
Infrastructure
2. test_non_existent_cluster: cluster does not exist - Ceph -
Orchestrator
3. test_dashboard_e2e.sh: Conflicting peer dependency: postcss@8.4.21 -
Ceph - Mgr - Dashboard
4. podman: setting cgroup config for procHooks process caused: Unit
libpod-$hash.scope not found - Ceph - Orchestrator
5. rook: failed to pull kubelet image - Ceph - Orchestrator

@Radoslaw Zarzynski  will give final approval for
rados.

On Mon, Mar 27, 2023 at 10:02 AM Casey Bodley  wrote:

> On Fri, Mar 24, 2023 at 3:46 PM Yuri Weinstein 
> wrote:
> >
> > Details of this release are updated here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The slowness we experienced seemed to be self-cured.
> > Neha, Radek, and Laura please provide any findings if you have them.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (rerun on Build 2 with
> > PRs merged on top of quincy-release)
> > rgw - Casey (rerun on Build 2 with PRs merged on top of quincy-release)
>
> rgw approved
>
> > fs - Venky
> >
> > upgrade/octopus-x - Neha, Laura (package issue Adam Kraitman any
> updates?)
> > upgrade/pacific-x - Neha, Laura, Ilya see
> https://tracker.ceph.com/issues/58914
> > upgrade/quincy-p2p - Neha, Laura
> > client-upgrade-octopus-quincy-quincy - Neha, Laura (package issue Adam
> > Kraitman any updates?)
> > powercycle - Brad
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > On Tue, Mar 21, 2023 at 1:04 PM Yuri Weinstein 
> wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/59070#note-1
> > > Release Notes - TBD
> > >
> > > The reruns were in the queue for 4 days because of some slowness
> issues.
> > > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > > down the root cause.
> > >
> > > Seeking approvals/reviews for:
> > >
> > > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > > the core)
> > > rgw - Casey
> > > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > > any reason to suspect it in the observed slowness?)
> > > orch - Adam King
> > > rbd - Ilya
> > > krbd - Ilya
> > > upgrade/octopus-x - Laura is looking into failures
> > > upgrade/pacific-x - Laura is looking into failures
> > > upgrade/quincy-p2p - Laura is looking into failures
> > > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > > is looking into it
> > > powercycle - Brad
> > > ceph-volume - needs a rerun on merged
> > > https://github.com/ceph/ceph-ansible/pull/7409
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > Also, share any findings or hypotheses about the slowness in the
> > > execution of the suite.
> > >
> > > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > > RC release - pending major suites approvals.
> > >
> > > Thx
> > > YuriW
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd cp vs. rbd clone + rbd flatten

2023-03-27 Thread Ilya Dryomov
On Wed, Mar 22, 2023 at 10:51 PM Tony Liu  wrote:
>
> Hi,
>
> I want
> 1) copy a snapshot to an image,
> 2) no need to copy snapshots,
> 3) no dependency after copy,
> 4) all same image format 2.
> In that case, is rbd cp the same as rbd clone + rbd flatten?
> I ran some tests, seems like it, but want to confirm, in case of missing 
> anything.

Hi Tony,

Yes, at a high level it should be the same.

> Also, seems cp is a bit faster and flatten, is that true?

I can't think of anything that would make "rbd cp" faster.  I would
actually expect it to be slower since "rbd cp" also attempts to sparsify
the destination image (see --sparse-size option), making it more space
efficient.
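
In command form, the two equivalent paths look roughly like this (pool/image
names are examples):

rbd cp rbd/src@snap1 rbd/dst

rbd clone rbd/src@snap1 rbd/dst    # older clusters may require "rbd snap protect rbd/src@snap1" first
rbd flatten rbd/dst                # removes the dependency on the parent snapshot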

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about adding SSDs

2023-03-27 Thread Marc
> 
> We have a ceph cluster (Proxmox based) with is HDD-based.  We’ve had
> some performance and “slow MDS” issues while doing VM/CT backups from
> the Proxmox cluster, especially when rebalancing is going on at the same
> time.

I also had to increase the mds cache quite a lot to get rid of 'slow' issues:
mds_cache_memory_limit = 
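
(e.g. via the config database - the value is just an example:)
ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB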

But after some Luminous(?) update I started using cephfs less, because of
issues with the kernel mount.

> My thought is that one of following is going to improve performance /
> response:
> 1. Add an M.2 drive for DB store on each node
> 2. Migrate the cephfs metadata pool to SSDs
> 
> We have ~25 nodes with ~3 OSDs per node.
> 
> (1) is a lot of work and will cost more.
> (2) seems more risky (to me) since the metadata pool would have to be
> migrated (potential loss in transit?)

I can't remember running into any issues, having done this switch quite a
while ago. The ssd metadata pool still holds only 7GB, vs 45TB / 13kk objects
of data.
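
(The switch itself is basically pointing the metadata pool at an SSD-only
CRUSH rule and letting the PGs remap - a sketch, assuming the pool is called
cephfs_metadata:)

ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set cephfs_metadata crush_rule replicated-ssd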

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-27 Thread Ken Dreyer
I hope we don't backport such a big change to Quincy. That will have a
large impact on how we build in restricted environments with no
internet access.

We could get the missing packages into EPEL.

- Ken

On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  wrote:
>
> Hi Casey,
>
> The original idea was to leave this to Reef alone, but given that the CentOS 
> 9 Quincy release is also blocked by missing Python packages, I think that 
> it'd make sense to backport it.
>
> I'm coordinating with Pere (in CC) to expedite this. We may need help to 
> troubleshoot Shaman/rpmbuild issues. Who would be the best one to help with 
> that?
>
> Regarding your last question, I don't know who's the maintainer of those 
> packages in EPEL. There's this BZ (https://bugzilla.redhat.com/2166620) 
> requesting that specific package, but that's only one out of the dozen of 
> missing packages (plus transitive dependencies)...
>
> Kind Regards,
> Ernesto
>
>
> On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  wrote:
>>
>> hi Ernesto and lists,
>>
>> > [1] https://github.com/ceph/ceph/pull/47501
>>
>> are we planning to backport this to quincy so we can support centos 9
>> there? enabling that upgrade path on centos 9 was one of the
>> conditions for dropping centos 8 support in reef, which i'm still keen
>> to do
>>
>> if not, can we find another resolution to
>> https://tracker.ceph.com/issues/58832? as i understand it, all of
>> those python packages exist in centos 8. do we know why they were
>> dropped for centos 9? have we looked into making those available in
>> epel? (cc Ken and Kaleb)
>>
>> On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  wrote:
>> >
>> > Hi Kevin,
>> >
>> >>
>> >> Isn't this one of the reasons containers were pushed, so that the 
>> >> packaging isn't as big a deal?
>> >
>> >
>> > Yes, but the Ceph community has a strong commitment to provide distro 
>> > packages for those users who are not interested in moving to containers.
>> >
>> >> Is it the continued push to support lots of distros without using 
>> >> containers that is the problem?
>> >
>> >
>> > If not a problem, it definitely makes it more challenging. Compiled 
>> > components often sort this out by statically linking deps whose packages 
>> > are not widely available in distros. The approach we're proposing here 
>> > would be the closest equivalent to static linking for interpreted code 
>> > (bundling).
>> >
>> > Thanks for sharing your questions!
>> >
>> > Kind regards,
>> > Ernesto
>> > ___
>> > Dev mailing list -- d...@ceph.io
>> > To unsubscribe send an email to dev-le...@ceph.io
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-27 Thread Casey Bodley
i would hope that packaging for epel9 would be relatively easy, given
that the epel8 packages already exist. as a first step, we'd need to
build a full list of the missing packages. the tracker issue only
complains about python3-asyncssh python3-pecan and python3-routes, but
some of their dependencies may be missing too

On Mon, Mar 27, 2023 at 3:06 PM Ken Dreyer  wrote:
>
> I hope we don't backport such a big change to Quincy. That will have a
> large impact on how we build in restricted environments with no
> internet access.
>
> We could get the missing packages into EPEL.
>
> - Ken
>
> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  wrote:
> >
> > Hi Casey,
> >
> > The original idea was to leave this to Reef alone, but given that the 
> > CentOS 9 Quincy release is also blocked by missing Python packages, I think 
> > that it'd make sense to backport it.
> >
> > I'm coordinating with Pere (in CC) to expedite this. We may need help to 
> > troubleshoot Shaman/rpmbuild issues. Who would be the best one to help with 
> > that?
> >
> > Regarding your last question, I don't know who's the maintainer of those 
> > packages in EPEL. There's this BZ (https://bugzilla.redhat.com/2166620) 
> > requesting that specific package, but that's only one out of the dozen of 
> > missing packages (plus transitive dependencies)...
> >
> > Kind Regards,
> > Ernesto
> >
> >
> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  wrote:
> >>
> >> hi Ernesto and lists,
> >>
> >> > [1] https://github.com/ceph/ceph/pull/47501
> >>
> >> are we planning to backport this to quincy so we can support centos 9
> >> there? enabling that upgrade path on centos 9 was one of the
> >> conditions for dropping centos 8 support in reef, which i'm still keen
> >> to do
> >>
> >> if not, can we find another resolution to
> >> https://tracker.ceph.com/issues/58832? as i understand it, all of
> >> those python packages exist in centos 8. do we know why they were
> >> dropped for centos 9? have we looked into making those available in
> >> epel? (cc Ken and Kaleb)
> >>
> >> On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  wrote:
> >> >
> >> > Hi Kevin,
> >> >
> >> >>
> >> >> Isn't this one of the reasons containers were pushed, so that the 
> >> >> packaging isn't as big a deal?
> >> >
> >> >
> >> > Yes, but the Ceph community has a strong commitment to provide distro 
> >> > packages for those users who are not interested in moving to containers.
> >> >
> >> >> Is it the continued push to support lots of distros without using 
> >> >> containers that is the problem?
> >> >
> >> >
> >> > If not a problem, it definitely makes it more challenging. Compiled 
> >> > components often sort this out by statically linking deps whose packages 
> >> > are not widely available in distros. The approach we're proposing here 
> >> > would be the closest equivalent to static linking for interpreted code 
> >> > (bundling).
> >> >
> >> > Thanks for sharing your questions!
> >> >
> >> > Kind regards,
> >> > Ernesto
> >> > ___
> >> > Dev mailing list -- d...@ceph.io
> >> > To unsubscribe send an email to dev-le...@ceph.io
> >>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-27 Thread Ken Dreyer
Yeah, unfortunately we had all of these in the Copr, and some
infrastructure change deleted them:
https://bugzilla.redhat.com/show_bug.cgi?id=2143742

So the quickest route back will be to rebuild the missing-from-EPEL
packages with the newer Copr settings, and I have written notes for
that in https://github.com/ktdreyer/ceph-el9

And the longer-term solution is to get the packages into EPEL proper.

- Ken

On Mon, Mar 27, 2023 at 4:04 PM Casey Bodley  wrote:
>
> i would hope that packaging for epel9 would be relatively easy, given
> that the epel8 packages already exist. as a first step, we'd need to
> build a full list of the missing packages. the tracker issue only
> complains about python3-asyncssh python3-pecan and python3-routes, but
> some of their dependencies may be missing too
>
> On Mon, Mar 27, 2023 at 3:06 PM Ken Dreyer  wrote:
> >
> > I hope we don't backport such a big change to Quincy. That will have a
> > large impact on how we build in restricted environments with no
> > internet access.
> >
> > We could get the missing packages into EPEL.
> >
> > - Ken
> >
> > On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  wrote:
> > >
> > > Hi Casey,
> > >
> > > The original idea was to leave this to Reef alone, but given that the 
> > > CentOS 9 Quincy release is also blocked by missing Python packages, I 
> > > think that it'd make sense to backport it.
> > >
> > > I'm coordinating with Pere (in CC) to expedite this. We may need help to 
> > > troubleshoot Shaman/rpmbuild issues. Who would be the best one to help 
> > > with that?
> > >
> > > Regarding your last question, I don't know who's the maintainer of those 
> > > packages in EPEL. There's this BZ (https://bugzilla.redhat.com/2166620) 
> > > requesting that specific package, but that's only one out of the dozen of 
> > > missing packages (plus transitive dependencies)...
> > >
> > > Kind Regards,
> > > Ernesto
> > >
> > >
> > > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  wrote:
> > >>
> > >> hi Ernesto and lists,
> > >>
> > >> > [1] https://github.com/ceph/ceph/pull/47501
> > >>
> > >> are we planning to backport this to quincy so we can support centos 9
> > >> there? enabling that upgrade path on centos 9 was one of the
> > >> conditions for dropping centos 8 support in reef, which i'm still keen
> > >> to do
> > >>
> > >> if not, can we find another resolution to
> > >> https://tracker.ceph.com/issues/58832? as i understand it, all of
> > >> those python packages exist in centos 8. do we know why they were
> > >> dropped for centos 9? have we looked into making those available in
> > >> epel? (cc Ken and Kaleb)
> > >>
> > >> On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  
> > >> wrote:
> > >> >
> > >> > Hi Kevin,
> > >> >
> > >> >>
> > >> >> Isn't this one of the reasons containers were pushed, so that the 
> > >> >> packaging isn't as big a deal?
> > >> >
> > >> >
> > >> > Yes, but the Ceph community has a strong commitment to provide distro 
> > >> > packages for those users who are not interested in moving to 
> > >> > containers.
> > >> >
> > >> >> Is it the continued push to support lots of distros without using 
> > >> >> containers that is the problem?
> > >> >
> > >> >
> > >> > If not a problem, it definitely makes it more challenging. Compiled 
> > >> > components often sort this out by statically linking deps whose 
> > >> > packages are not widely available in distros. The approach we're 
> > >> > proposing here would be the closest equivalent to static linking for 
> > >> > interpreted code (bundling).
> > >> >
> > >> > Thanks for sharing your questions!
> > >> >
> > >> > Kind regards,
> > >> > Ernesto
> > >> > ___
> > >> > Dev mailing list -- d...@ceph.io
> > >> > To unsubscribe send an email to dev-le...@ceph.io
> > >>
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph orch ps shows version, container and image id as unknown

2023-03-27 Thread Adiga, Anantha
Hi,

Has anybody noticed this?

ceph orch ps shows version, container and image id as unknown only for mon, mgr 
and osds. Ceph health is  OK and all daemons are running fine.
cephadm ls shows values for version, container and image id.

root@cr21meg16ba0101:~# cephadm shell ceph orch ps
Inferring fsid a6f52598-e5cd-4a08-8422-7b6fdb1d5dbe
Using recent ceph image ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586
NAME                                                    HOST             PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION         IMAGE ID      CONTAINER ID
crash.cr21meg16ba0101                                   cr21meg16ba0101               running (4d)  2m ago     5w   7107k    -        16.2.5          6e73176320aa  1001776a7c02
crash.cr21meg16ba0102                                   cr21meg16ba0102               running (5w)  2m ago     5w   63.8M    -        16.2.5          6e73176320aa  ecfd19d15dbb
crash.cr21meg16ba0103                                   cr21meg16ba0103               running (5w)  2m ago     5w   8131k    -        16.2.5          6e73176320aa  c508ad3979b0
grafana.cr21meg16ba0101                                 cr21meg16ba0101  *:3000       running (4d)  2m ago     4d   58.0M    -        6.7.4           be4c69a1aae8  ce2741a091c7
grafana.cr21meg16ba0102                                 cr21meg16ba0102  *:3000       running (4d)  2m ago     7d   53.0M    -        6.7.4           be4c69a1aae8  c09f53b31999
grafana.cr21meg16ba0103                                 cr21meg16ba0103  *:3000       running (4d)  2m ago     7d   54.5M    -        6.7.4           be4c69a1aae8  e58f6d9f44a2
haproxy.nfs.nfs-1.cr21meg16ba0101.cwsweq                cr21meg16ba0101  *:2049,9049  running (4d)  2m ago     5w   66.7M    -        2.3.21-3ce4ee0  7ecd3fda00f4  c5f4d94b5354
haproxy.nfs.nfs-1.cr21meg16ba0102.yodyxa                cr21meg16ba0102  *:2049,9049  running (5w)  2m ago     5w   75.9M    -        2.3.21-3ce4ee0  7ecd3fda00f4  0a6629e27463
haproxy.rgw.default.default.cr21meg16ba0101.ecpnxq      cr21meg16ba0101  *:80,9050    running (4d)  2m ago     5w   102M     -        2.3.21-3ce4ee0  7ecd3fda00f4  3c61d34b8b7d
haproxy.rgw.default.default.cr21meg16ba0102.nffdzb      cr21meg16ba0102  *:80,9050    running (5w)  2m ago     5w   114M     -        2.3.21-3ce4ee0  7ecd3fda00f4  406ee603a311
haproxy.rgw.default.default.cr21meg16ba0103.lvypmb      cr21meg16ba0103  *:80,9050    running (5w)  2m ago     5w   108M     -        2.3.21-3ce4ee0  7ecd3fda00f4  a514c26c0a8e
keepalived.nfs.nfs-1.cr21meg16ba0101.qpvesr             cr21meg16ba0101               running (4d)  2m ago     5w   26.9M    -        2.0.5           073e0c3cd1b9  c4003cb45da6
keepalived.nfs.nfs-1.cr21meg16ba0102.hedpuo             cr21meg16ba0102               running (5w)  2m ago     5w   38.0M    -        2.0.5           073e0c3cd1b9  b654e661493b
keepalived.rgw.default.default.cr21meg16ba0101.biaqvq   cr21meg16ba0101               running (4d)  2m ago     5w   39.2M    -        2.0.5           073e0c3cd1b9  020c7cb700c4
keepalived.rgw.default.default.cr21meg16ba0102.dufodx   cr21meg16ba0102               running (5w)  2m ago     5w   46.0M    -        2.0.5           073e0c3cd1b9  fe218ecaf398
keepalived.rgw.default.default.cr21meg16ba0103.utplxz   cr21meg16ba0103               running (5w)  2m ago     5w   45.1M    -        2.0.5           073e0c3cd1b9  18a99c36ef29
mds.cephfs.cr21meg16ba0101.tmfknc                       cr21meg16ba0101               running (4d)  2m ago     5w   29.5M    -        16.2.5          6e73176320aa  e753a2498ccf
mds.cephfs.cr21meg16ba0102.vdrcvi                       cr21meg16ba0102               running (5w)  2m ago     5w   212M     -        16.2.5          6e73176320aa  925f151da4de
mds.cephfs.cr21meg16ba0103.yacxeu                       cr21meg16ba0103               running (5w)  2m ago     5w   38.2M    -        16.2.5          6e73176320aa  79599f7ca3c8
mgr.cr21meg16ba0101                                     cr21meg16ba0101               running       2m ago     5w   -        -
mgr.cr21meg16ba0102                                     cr21meg16ba0102               running       2m ago     5w   -        -
mgr.cr21meg16ba0103                                     cr21meg16ba0103               running       2m ago     5w   -        -
mon.cr21meg16ba0101                                     cr21meg16ba0101               running       2m ago     5w   -        2048M
mon.cr21meg16ba0102                                     cr21meg16ba0102               running       2m ago     5w   -        2048M
mon.cr21meg16ba0103                                     cr21meg16ba0103               running       2m ago     5w   -        2048M
nfs.nfs-1.0.63.cr21meg16ba0102.kkxpfh                   cr21meg16ba0102  *:12049      running (5w)
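
(Two low-impact things that often make the orchestrator re-gather daemon metadata in a
cephadm-managed cluster, offered only as a hedged suggestion, not a confirmed fix for
this case:

  ceph orch ps --refresh
  ceph mgr fail <active-mgr-name>

The second forces a manager failover, after which the newly active mgr rebuilds its
daemon inventory.)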

[ceph-users] Re: Ceph cluster out of balance after adding OSDs

2023-03-27 Thread Pat Vaughan
Looking at the pools, there are 2 crush rules. Only one pool has a
meaningful amount of data, the  charlotte.rgw.buckets.data pool. This is
the crush rule for that pool.

{
"rule_id": 1,
"rule_name": "charlotte.rgw.buckets.data",
"type": 3,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -2,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}

Everything else is using the replicated_rule:
{
"rule_id": 0,
"rule_name": "replicated_rule",
"type": 1,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
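
For cross-checking which pools map to which rule, and how the OSDs are filling up per
host, a few read-only commands (a sketch, nothing cluster-specific assumed):

  ceph osd pool ls detail | grep crush_rule
  ceph osd crush rule ls
  ceph osd df tree

One detail visible in the dumps above: the bucket data rule takes "default~ssd" (the
device-class shadow hierarchy), while replicated_rule takes the plain "default" root
with no device-class filter.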

On Mon, Mar 27, 2023 at 10:59 AM Robert Sander 
wrote:

> On 27.03.23 16:34, Pat Vaughan wrote:
> > Yes, all the OSDs are using the SSD device class.
>
> Do you have multiple CRUSH rules by chance?
> Are all pools using the same CRUSH rule?
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] orphan multipart objects in Ceph cluster

2023-03-27 Thread Ramin Najjarbashi
I hope this email finds you well. I wanted to share a recent experience I
had with our Ceph cluster and get your feedback on a solution I came up
with.

Recently, we had some orphan objects stuck in our cluster that were not
visible by any client like s3cmd, boto3, and mc. This caused some confusion
for our users, as the sum of all objects in their buckets was much less
than what we showed in the panel. We made some adjustments for them, but
the issue persisted.
As we have billions of objects in our cluster, using normal tools to find
orphans was impossible. So, I came up with a tricky way to handle the
situation. I created a bash script that identifies and removes the orphan
objects using radosgw-admin and rados commands. Here is the script:

https://gist.github.com/RaminNietzsche/b9baa06b69fc5f56d907f3c953769182
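
In outline, one common way to approach this kind of comparison (which may or may not
match what the gist does exactly; pool and bucket names below are placeholders, and
nothing here deletes anything):

  # what RGW believes are the RADOS objects backing a bucket
  radosgw-admin bucket radoslist --bucket=mybucket | sort > expected.txt

  # what is actually present in the data pool
  rados -p default.rgw.buckets.data ls | sort > actual.txt

  # objects present in the pool but unknown to RGW are orphan candidates
  comm -13 expected.txt actual.txt > orphan-candidates.txt

In practice the radoslist output of every bucket has to be concatenated before the
comparison, and recent releases also ship an rgw-orphan-list script that automates a
similar scan.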

I am hoping to get some feedback from the community on this solution. Have
any of you faced similar challenges with orphan objects in your Ceph
clusters? Do you have any suggestions or improvements for my script?

Thank you for your time and help.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd cp vs. rbd clone + rbd flatten

2023-03-27 Thread Tony Liu
Thank you Ilya!

Tony

From: Ilya Dryomov 
Sent: March 27, 2023 10:28 AM
To: Tony Liu
Cc: ceph-users@ceph.io; d...@ceph.io
Subject: Re: [ceph-users] rbd cp vs. rbd clone + rbd flatten

On Wed, Mar 22, 2023 at 10:51 PM Tony Liu  wrote:
>
> Hi,
>
> I want
> 1) copy a snapshot to an image,
> 2) no need to copy snapshots,
> 3) no dependency after copy,
> 4) all same image format 2.
> In that case, is rbd cp the same as rbd clone + rbd flatten?
> I ran some tests, seems like it, but want to confirm, in case of missing 
> anything.

Hi Tony,

Yes, at a high level it should be the same.

> Also, it seems cp is a bit faster than clone + flatten, is that true?

I can't think of anything that would make "rbd cp" faster.  I would
actually expect it to be slower since "rbd cp" also attempts to sparsify
the destination image (see --sparse-size option), making it more space
efficient.

Thanks,

Ilya
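
For reference, the two command sequences being compared look roughly like this (pool,
image and snapshot names are placeholders):

  rbd cp rbd/src@snap1 rbd/dst

  rbd clone rbd/src@snap1 rbd/dst
  rbd flatten rbd/dst

Depending on the clone format in use, the snapshot may need to be protected first
(rbd snap protect rbd/src@snap1) and can be unprotected again once the flatten has
removed the dependency.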
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Frank Schilder
Dear Xiubo,

I managed to collect logs and uploaded them to:

ceph-post-file: 3d4d1419-a11e-4937-b0b1-bd99234d4e57

By the way, if you run the test with the conda.tgz at the link location, be 
careful: it contains a .bashrc file to activate the conda environment. Un-tar 
it only in a dedicated location. Unfortunately, this is default with a conda 
installation. I will remove this file from the archive tomorrow. Well, I hope 
the logs contain what you are looking for.

I enabled dmesg debug logs for both, the kclient and nfsd. However, nfsd seems 
not to log anything, I see only ceph messages. I interrupted the tar command as 
soon as the error showed up for a number of times. There is indeed a change in 
log messages at the end, indicating an issue with ceph client caps under high 
load.
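
For reference, kclient messages of this kind are typically enabled through the kernel's
dynamic debug facility, and nfsd logging through rpcdebug; a sketch only, since the
exact knobs depend on the kernel build:

  echo 'module ceph +p'    > /sys/kernel/debug/dynamic_debug/control
  echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
  rpcdebug -m nfsd -s all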

It looks as if instead of waiting for an MDS response the kclient completes a 
request prematurely with insufficient caps. I really hope it is possible to fix 
that.

I will keep the files on the system in case you need FS info for specific 
inodes.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Monday, March 27, 2023 5:22 PM
To: Xiubo Li; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': 
Read-only file system

> Sorry for late.
No worries.

> The ceph qa teuthology test cases have already one similar test, which
> will untar a kernel tarball, but never seen this yet.
>
> I will try this again tomorrow without the NFS client.

Great. In case you would like to use the archive I sent you a link for, please 
keep it confidential. It contains files not for publication.

I will collect the log information you asked for.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about adding SSDs

2023-03-27 Thread Kyriazis, George
Thanks,

I’ll try adjusting mds_cache_memory_limit.  I did get some messages about MDS 
being slow trimming the cache, which implies that it was over its cache size.
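
A one-liner for that, in case it is useful (the value is in bytes; 8 GiB is shown here
only as an example, pick something that fits the MDS host's RAM):

  ceph config set mds mds_cache_memory_limit 8589934592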

I never had any problems with the kernel mount, fortunately.  I am running 
17.2.5 (Quincy)

My metadata pool size if about 15GB with a data pool of 170 TB stored / 85M 
objects.

What switch did you do?  Metadata to SSD, or increased mds_cache_memory_limit?


George


> On Mar 27, 2023, at 1:58 PM, Marc  wrote:
> 
>> 
>> We have a ceph cluster (Proxmox based) with is HDD-based.  We’ve had
>> some performance and “slow MDS” issues while doing VM/CT backups from
>> the Proxmox cluster, especially when rebalancing is going on at the same
>> time.
> 
> I also had to increase the mds cache quite a lot to get rid of 'slow' issues,
> mds_cache_memory_limit = 
> 
> But after some Luminous(?) update I started using less the cephfs, because of 
> issues with the kernel mount.
> 
>> My thought is that one of following is going to improve performance /
>> response:
>> 1. Add an M.2 drive for DB store on each node
>> 2. Migrate the cephfs metadata pool to SSDs
>> 
>> We have ~25 nodes with ~3 OSDs per node.
>> 
>> (1) is a lot of work and will cost more.
>> (2) seems more risky (to me) since the metadata pool would have to be
>> migrated (potential loss in transit?)
> 
> Can't remember running into issues having done this switch quite a while ago. 
> Still have only 7GB on ssd meta data pool vs 45TB / 13kk objects.
> 
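
For the metadata-to-SSD option discussed above, one common approach is a device-class
rule plus a pool rule switch, e.g. (rule and pool names are placeholders; Ceph moves
the PGs via normal backfill, so the metadata stays available during the migration):

  ceph osd crush rule create-replicated replicated_ssd default host ssd
  ceph osd pool set cephfs_metadata crush_rule replicated_ssd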

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li

Hi Frank,

Thanks very much for your logs.

I will check it.

- Xiubo

On 28/03/2023 06:35, Frank Schilder wrote:

Dear Xiubo,

I managed to collect logs and uploaded them to:

ceph-post-file: 3d4d1419-a11e-4937-b0b1-bd99234d4e57

By the way, if you run the test with the conda.tgz at the link location, be 
careful: it contains a .bashrc file to activate the conda environment. Un-tar 
it only in a dedicated location. Unfortunately, this is default with a conda 
installation. I will remove this file from the archive tomorrow. Well, I hope 
the logs contain what you are looking for.

I enabled dmesg debug logs for both, the kclient and nfsd. However, nfsd seems 
not to log anything, I see only ceph messages. I interrupted the tar command as 
soon as the error showed up for a number of times. There is indeed a change in 
log messages at the end, indicating an issue with ceph client caps under high 
load.

It looks as if instead of waiting for an MDS response the kclient completes a 
request prematurely with insufficient caps. I really hope it is possible to fix 
that.

I will keep the files on the system in case you need FS info for specific 
inodes.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Monday, March 27, 2023 5:22 PM
To: Xiubo Li; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': 
Read-only file system


Sorry for late.

No worries.


The ceph qa teuthology test cases have already one similar test, which
will untar a kernel tarball, but never seen this yet.

I will try this again tomorrow without the NFS client.

Great. In case you would like to use the archive I sent you a link for, please 
keep it confidential. It contains files not for publication.

I will collect the log information you asked for.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



--
Best Regards,

Xiubo Li (李秀波)

Email: xiu...@redhat.com/xiu...@ibm.com
Slack: @Xiubo Li
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Adding new server to existing ceph cluster - with separate block.db on NVME

2023-03-27 Thread Robert W. Eckert
Hi,

I am trying to add a new server to an existing cluster, but cannot get the OSDs 
to create correctly
When I try
Cephadm ceph-volume lvm create, it returns nothing but the container info.

[root@hiho ~]# cephadm ceph-volume lvm create --bluestore --data /dev/sdd 
--block.db /dev/nvme0n1p3
Inferring fsid fe3a7cb0-69ca-11eb-8d45-c86000d08867
Using ceph image with id 'cc65afd6173a' and tag '' created on 2022-10-17 
23:41:41 + UTC
quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346

so I tried cephadm shell,
and
ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p3


Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
41dafd4d-0579-4119-acca-6db31586a10f
stderr: 2023-03-28T03:32:27.436+ 7fa5d6253700 -1 auth: unable to find a 
keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or 
directory
stderr: 2023-03-28T03:32:27.436+ 7fa5d6253700 -1 
AuthRegistry(0x7fa5d0060d70) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-03-28T03:32:27.436+ 7fa5d6253700 -1 auth: unable to find a 
keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or 
directory
stderr: 2023-03-28T03:32:27.436+ 7fa5d6253700 -1 
AuthRegistry(0x7fa5d0063da0) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-03-28T03:32:27.437+ 7fa5d6253700 -1 auth: unable to find a 
keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or 
directory
stderr: 2023-03-28T03:32:27.437+ 7fa5d6253700 -1 
AuthRegistry(0x7fa5d6251ea0) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-03-28T03:32:27.451+ 7fa5ceffd700 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [1]
stderr: 2023-03-28T03:32:27.453+ 7fa5cf7fe700 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [1]
stderr: 2023-03-28T03:32:27.473+ 7fa5c700 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [1]
stderr: 2023-03-28T03:32:27.474+ 7fa5d6253700 -1 monclient: authenticate 
NOTE: no keyring found; disabled cephx authentication
stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id

I then copy the keyring file into the container using scp, but by that time the 
orchestrator has already created OSDs on the drives, so I have to delete the OSDs and 
start over.

Then if I get the timing just right, I get this (from within cephadm shell):

[ceph: root@hiho bootstrap-osd]# ceph-volume lvm create --bluestore --data 
/dev/sdd --block.db /dev/nvme0n1p3
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
e6e316d4-670d-4a9b-a50c-bc14d57394a3
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt 
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net 
--uts=/rootfs/proc/1/ns/uts /sbin/vgcreate --force --yes 
ceph-4d95584a-df28-4e21-9480-09a13f1fb804 /dev/sdd
stdout: Physical volume "/dev/sdd" successfully created.
stdout: Volume group "ceph-4d95584a-df28-4e21-9480-09a13f1fb804" successfully 
created
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt 
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net 
--uts=/rootfs/proc/1/ns/uts /sbin/lvcreate --yes -l 953861 -n 
osd-block-e6e316d4-670d-4a9b-a50c-bc14d57394a3 
ceph-4d95584a-df28-4e21-9480-09a13f1fb804
stdout: Logical volume "osd-block-e6e316d4-670d-4a9b-a50c-bc14d57394a3" created.
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt 
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net 
--uts=/rootfs/proc/1/ns/uts /sbin/lvcreate --yes -l 119209 -n 
osd-db-9fc4f199-2c95-4ca7-a35c-ef4b08c86804 
ceph-948a633c-420e-4f55-8515-b33e1c0ef18c
stderr: Volume group "ceph-948a633c-420e-4f55-8515-b33e1c0ef18c" has 
insufficient free space (0 extents): 119209 required.
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.12 
--yes-i-really-mean-it
stderr: purged osd.12
-->  RuntimeError: Unable to find any LV for zapping OSD: 12
[ceph: root@hiho bootstrap-osd]# ceph-volume lvm create --bluestore --data 
/dev/sdd --block.db /dev/nvme0n1p3
-->  RuntimeError: Device /dev/sdd has a filesystem.
[ceph: root@hiho bootstrap-osd]# ceph-volume lvm create --bluestore --data 
/dev/sdd --block.db /dev/nvme0n1p3
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--ke
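
For comparison, the declarative cephadm route for the same layout is an OSD service
spec; this is only a sketch, using the host and device paths from the message above,
and untested with a pre-made NVMe partition as the db device:

  service_type: osd
  service_id: osd_sdd_nvme_db
  placement:
    hosts:
      - hiho
  spec:
    data_devices:
      paths:
        - /dev/sdd
    db_devices:
      paths:
        - /dev/nvme0n1p3

applied with:

  ceph orch apply -i osd-spec.yaml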

[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services

2023-03-27 Thread Lokendra Rathour
Hi All,
Any help in this issue would be appreciated.
Thanks once again.


On Tue, Jan 24, 2023 at 7:32 PM Lokendra Rathour 
wrote:

> Hi Team,
>
>
>
> We have a ceph cluster with 3 storage nodes:
>
> 1. storagenode1 - abcd:abcd:abcd::21
>
> 2. storagenode2 - abcd:abcd:abcd::22
>
> 3. storagenode3 - abcd:abcd:abcd::23
>
>
>
> The requirement is to mount ceph using the domain name of MON node:
>
> Note: we resolved the domain name via DNS server.
>
>
> For this we are using the command:
>
> ```
>
> mount -t ceph [storagenode.storage.com]:6789:/  /backup -o
> name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
>
> ```
>
>
>
> We are getting the following logs in /var/log/messages:
>
> ```
>
> Jan 24 17:23:17 localhost kernel: libceph: resolve '
> storagenode.storage.com' (ret=-3): failed
>
> Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip '
> storagenode.storage.com:6789'
>
> ```
>
>
>
> We also tried mounting ceph storage using IP of MON which is working fine.
>
>
>
> Query:
>
>
> Could you please help us out with how we can mount ceph using FQDN.
>
>
>
> My /etc/ceph/ceph.conf is as follows:
>
> [global]
>
> ms bind ipv6 = true
>
> ms bind ipv4 = false
>
> mon initial members = storagenode1,storagenode2,storagenode3
>
> osd pool default crush rule = -1
>
> fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
>
> mon host =
> [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
>
> public network = abcd:abcd:abcd::/64
>
> cluster network = eff0:eff0:eff0::/64
>
>
>
> [osd]
>
> osd memory target = 4294967296
>
>
>
> [client.rgw.storagenode1.rgw0]
>
> host = storagenode1
>
> keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
>
> log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
>
> rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
>
> rgw thread pool size = 512
>
> --
> ~ Lokendra
> skype: lokendrarathour
>
>
>

-- 
~ Lokendra
skype: lokendrarathour
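
(One userspace workaround, since the in-kernel resolver is what appears to be failing
here (ret=-3): resolve the FQDN before calling mount and hand the kernel an IP. A
sketch built only from the details quoted above, with the secret replaced by a
placeholder:

  MON_IP=$(getent ahosts storagenode.storage.com | awk '{print $1; exit}')
  mount -t ceph "[${MON_IP}]:6789:/" /backup -o name=admin,secret=<key>

This sidesteps DNS at mount time rather than making the kernel resolve the name.)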
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Almalinux 9

2023-03-27 Thread Arvid Picciani
on rocky, which should be identical to alma (?), i had to do this:

https://almalinux.discourse.group/t/nothing-provides-python3-pecan-in-almalinux-9/2017/4

because the rpm has a broken dependency to pecan.

But switching from debian to the official ceph rpm packages was worth
it. The systemd unit now actually works, while it was broken in
several ways on debian.

On Mon, Mar 20, 2023 at 11:12 AM Sere Gerrit  wrote:
>
> Hello,
>
> Has anyone used AlmaLinux 9 to install Ceph? Have you encountered problems? 
> Other tips on this installation are also welcome.
>
> Regards,
> Gerrit
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
+4916093821054
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li


On 22/03/2023 23:41, Gregory Farnum wrote:

On Wed, Mar 22, 2023 at 8:27 AM Frank Schilder  wrote:


Hi Gregory,

thanks for your reply. First a quick update. Here is how I get ln to work
after it failed, there seems no timeout:

$ ln envs/satwindspy/include/ffi.h
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
ln: failed to create hard link
'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h': Read-only file system
$ ls -l envs/satwindspy/include mambaforge/pkgs/libffi-3.3-h58526e2_2
envs/satwindspy/include:
total 7664
-rw-rw-r--.   1 rit rit959 Mar  5  2021 ares_build.h
[...]
$ ln envs/satwindspy/include/ffi.h
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h

After an ls -l on both directories ln works.

To the question: How can I pull out a log from the nfs server? There is
nothing in /var/log/messages.


So you’re using the kernel server and re-exporting, right?

I’m not very familiar with its implementation; I wonder if it’s doing
something strange via the kernel vfs.
AFAIK this isn’t really supportable for general use because nfs won’t
respect the CephFS file consistency protocol. But maybe it’s trying a bit
and that’s causing trouble?


Yeah, I think you are right Greg.

Checked the logs uploaded by Frank. I found that the kclient just send 
one request like:


++

2023-03-27T23:24:37.866+0200 7f0c1a0d1700  7 mds.0.server 
dispatch_client_request client_request(client.186555:475421 link 
#0x1682337/liblz4.so.1.9.3 #0x166d6d8// 
2023-03-27T23:24:37.864907+0200 caller_uid=1000, 
caller_gid=1000{4,24,27,30,46,122,134,135,1000,}) v4
2023-03-27T23:24:37.866+0200 7f0c1a0d1700  7 mds.0.server 
handle_client_link #0x1682337/liblz4.so.1.9.3 to #0x166d6d8//
2023-03-27T23:24:37.866+0200 7f0c1a0d1700 10 mds.0.server 
rdlock_two_paths_xlock_destdn request(client.186555:475421 nref=2 
cr=0x5601bbc60500) #0x1682337/liblz4.so.1.9.3 #0x166d6d8//
2023-03-27T23:24:37.866+0200 7f0c1a0d1700  7 mds.0.server 
reply_client_request -30 ((30) Read-only file system) 
client_request(client.186555:475421 link #0x1682337/liblz4.so.1.9.3 
#0x166d6d8// 2023-03-27T23:24:37.864907+0200 caller_uid=1000, 
caller_gid=1000{4,24,27,30,46,122,134,135,1000,}) v4


--

The kclient set the src dentry to "#0x166d6d8//", and the MDS parses the trailing 
"//" as a snapdir, which is read-only. This is why the MDS returns a -EROFS error.


But from mds logs we can see that the "0x166d6d8" is 
"/data/nfs/envs/satwindspy/lib/liblz4.so.1.9.3":


++

2023-03-27T23:24:37.866+0200 7f0c1a0d1700  7 mds.0.locker issue_caps 
allowed=pAsLsXsFscrl, xlocker allowed=pAsLsXsFscrl on [inode 
0x166d6d8 [...7b,head] /data/nfs/envs/satwindspy/lib/liblz4.so.1.9.3 
auth v7035 snaprealm=0x55fe3785e500 s=215880 nl=2 n(v0 
rc2023-03-27T23:15:22.568391+0200 b215880 1=1+0) (iversion lock) 
caps={186555=pAsXsFscr/-@3} | ptrwaiter=0 request=0 lock=0 caps=1 
remoteparent=1 dirtyparent=0 dirty=0 authpin=0 0x5601b7174800]


--


Then from the kernel debug logs:

++

31358125 [16380611.812642] ceph:  do_request mds0 session 
a66983cb state open
31358126 [16380611.812644] ceph:  __prepare_send_request 
1ebc34fd tid 475421 link (attempt 1)
31358127 [16380611.812647] ceph:   dentry 6cbb0f2e 
1682337/liblz4.so.1.9.3

31358128 [16380611.812649] ceph:   dentry 126d4660 166d6d8//

--

We can see that the kclient set the src dentry to "166d6d8//".

This is incorrect; it should be "166d2e3/liblz4.so.1.9.3", where 
"166d2e3" is the parent dir's inode and the path is 
"/data/nfs/envs/satwindspy/lib/".


From the fs/ceph/dir.c code, we can see that the ceph_link() will parse 
the src dentry:


2735 static int build_dentry_path(struct dentry *dentry, struct inode *dir,
2736                              const char **ppath, int *ppathlen, u64 *pino,
2737                              bool *pfreepath, bool parent_locked)
2738 {
2739         char *path;
2740
2741         rcu_read_lock();
2742         if (!dir)
2743                 dir = d_inode_rcu(dentry->d_parent);
2744         if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) {
2745                 *pino = ceph_ino(dir);
2746                 rcu_read_unlock();
2747                 *ppath = dentry->d_name.name;
2748                 *ppathlen = dentry->d_name.len;
2749                 return 0;
2750         }
2751         rcu_read_unlock();
2752         path = ceph_mdsc_build_path(dentry, ppathlen, pino, 1);
2753         if (IS_ERR(path))
2754                 return PTR_ERR(path);
2755         *ppath = path;
2756         *pfreepath = true;
2757         return 0;
2758 }

In Line#2743, 'dir' was resolved to the inode of "liblz4.so.1.9.3" itself 
(ino# "166d6d8"), which is incorrect; it should be the parent dir's ino# 
"166d2e3". And in Line#2747 the "ppath" is "/", which is also 
incorrect; it should be "liblz4.so.1.9.3".


That means the nfs client passed an invalid or corrupted old_dentry t

[ceph-users] Re: Unexpected slow read for HDD cluster (good write speed)

2023-03-27 Thread Arvid Picciani
Yes, during my last adventure of trying to get any reasonable
performance out of ceph, i realized my testing methodology was wrong.
Both the kernel client and qemu have queues everywhere that make the
numbers hard to understand.

fio has rbd support, which gives more useful values.

https://subscription.packtpub.com/book/cloud-&-networking/9781784393502/10/ch10lvl1sec112/benchmarking-ceph-rbd-using-fio

frustratingly, much lower ones, showing just how slow ceph actually is.
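
For anyone who wants to reproduce that kind of measurement, a minimal fio job against
an existing test image looks roughly like this (pool, image and client names are
placeholders):

  fio --name=rbd-read --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
      --rw=read --bs=4M --iodepth=16 --runtime=60 --time_based --group_reporting

Switching --rw between read, randread and write, and varying --bs and --iodepth, makes
it easier to see where the cluster actually tops out, independent of kernel or qemu
queueing.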

On Sat, Mar 18, 2023 at 8:59 PM Rafael Weingartner
 wrote:
>
> Hello guys!
>
> I would like to ask if somebody has already experienced a similar
> situation. We have a new cluster with 5 nodes with the following setup:
>
>- 128 GB of RAM
>- 2 cpus Intel(R) Intel Xeon Silver 4210R
>- 1 NVME of 2 TB for the rocks DB caching
>- 5 HDDs of 14TB
>- 1 NIC dual port of 25GiB in BOND mode.
>
>
> We are starting with a single dual port NIC (the bond has 50GiB in total),
> the design has been prepared so a new NIC can be added, and a new BOND can
> be created, where we intend to offload the cluster network. Therefore,
> logically speaking, we already configured different VLANs and networks for
> public and cluster traffic of Ceph.
>
>
> We are using Ubuntu 20.04 with Ceph Octopus. It is a standard deployment
> that we are used to. During our initial validations and evaluations of the
> cluster, we are reaching write speeds between 250-300MB/s, which would be
> the ballpark for this kind of setup for HDDs with the NVME as Rocks.db
> cache (in our experience). However, the issue is the reading process. While
> reading, we barely hit the mark of 100MB/s; we would expect at least
> something similar to the write speed. These tests are being performed in a
> pool with a replication factor of 3.
>
>
> We have already checked the disks, and they all seem to be reading just
> fine. The network does not seem to be the bottleneck either (checked with
> atop while reading/writing to the cluster).
>
>
> Have you guys ever encountered similar situations? Do you have any tips for
> us to proceed with the troubleshooting?
>
>
> We suspect that we are missing some small tuning detail, which is affecting
> the read performance only, but so far we could not pinpoint it. Any help
> would be much appreciated :)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
+4916093821054
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io