[ceph-users] Re: ceph status not showing correct monitor services

2024-04-02 Thread Eugen Block
You can add a mon manually to the monmap, but that requires downtime  
of the mons. Here's an example [1] of how to modify the monmap (including  
a network change, which you don't need, of course). But that would be my  
last resort; first I would try to find out why the MON fails to join  
the quorum. What is mon.a001s016 logging, and what are the other  
two logging?
Do you have another host where you could place a mon daemon to see if  
that works?



[1]  
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#example-procedure
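
For reference, a rough sketch of the monmap edit (mon names and IPs are taken from this thread; the exact commands assume a cephadm-managed deployment and should be treated as a sketch, not a recipe):

```
# stop all mon daemons first (this is the downtime), then on one surviving mon host:
cephadm shell --name mon.a001s018

ceph-mon -i a001s018 --extract-monmap /tmp/monmap          # dump the current monmap
monmaptool --print /tmp/monmap                             # verify what is in it
monmaptool --add a001s016 10.45.128.26:6789 /tmp/monmap    # add the missing mon
ceph-mon -i a001s018 --inject-monmap /tmp/monmap           # write it back

# repeat the inject for the other mons, then start them all again
```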


Quote from "Adiga, Anantha":


# ceph mon stat
e6: 2 mons at {a001s017=[v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0],a001s018=[v2:10.45.128.28:3300/0,v1:10.45.128.28:6789/0]}, election epoch 162, leader 0 a001s018, quorum 0,1 a001s018,a001s017


# ceph orch ps | grep mon
mon.a001s016  a001s016  running (3h)   6m ago   3h   527M  2048M  16.2.5  6e73176320aa  39db8cfba7e1
mon.a001s017  a001s017  running (22h)  47s ago  1h   993M  2048M  16.2.5  6e73176320aa  e5e5cb6c256c
mon.a001s018  a001s018  running (5w)   48s ago  2y  1167M  2048M  16.2.5  6e73176320aa  7d2bb6d41f54


# ceph mgr stat
{
"epoch": 1130365,
"available": true,
"active_name": "a001s016.ctmoay",
"num_standby": 1
}

# ceph orch ps | grep mgr
mgr.a001s016.ctmoay  a001s016  *:8443  running (18M)  109s ago  23M  518M  -  16.2.5  6e73176320aa  169cafcbbb99
mgr.a001s017.bpygfm  a001s017  *:8443  running (19M)    5m ago  23M  501M  -  16.2.5  6e73176320aa  97257195158c
mgr.a001s018.hcxnef  a001s018  *:8443  running (20M)    5m ago  23M  113M  -  16.2.5  6e73176320aa  21ba5896cee2


# ceph orch ls --service_name=mgr --export
service_type: mgr
service_name: mgr
placement:
  count: 3
  hosts:
  - a001s016
  - a001s017
  - a001s018

# ceph orch ls --service_name=mon --export
service_type: mon
service_name: mon
placement:
  count: 3
  hosts:
  - a001s016
  - a001s017
  - a001s018

-Original Message-
From: Adiga, Anantha
Sent: Monday, April 1, 2024 6:06 PM
To: Eugen Block 
Cc: ceph-users@ceph.io
Subject: RE: [ceph-users] Re: ceph status not showing correct  
monitor services


# ceph tell mon.a001s016 mon_status
Error ENOENT: problem getting command descriptions from mon.a001s016


a001s016 is outside the quorum, see below

# ceph tell mon.a001s017 mon_status
{
"name": "a001s017",
"rank": 1,
"state": "peon",
"election_epoch": 162,
"quorum": [
0,
1
],
"quorum_age": 79938,
"features": {
"required_con": "2449958747317026820",
"required_mon": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
],
"quorum_con": "4540138297136906239",
"quorum_mon": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
]
},
"outside_quorum": [],
"extra_probe_peers": [
{
"addrvec": [
{
"type": "v2",
"addr": "10.45.128.26:3300",
"nonce": 0
},
{
"type": "v1",
"addr": "10.45.128.26:6789",
"nonce": 0
}
]
}
],
"sync_provider": [],
"monmap": {
"epoch": 6,
"fsid": "604d56db-2fab-45db-a9ea-c418f9a8cca8",
"modified": "2024-03-31T23:54:18.692983Z",
"created": "2021-09-30T16:15:12.884602Z",
"min_mon_release": 16,
"min_mon_release_name": "pacific",
"election_strategy": 1,
"disallowed_leaders: ": "",
"stretch_mode": false,
"features": {
"persistent": [
"kraken",
"luminous",
"mimic",
"osdmap-prune",
"nautilus",
"octopus",
"pacific",
"elector-pinging"
],
"optional": []
},
"mons": [
{
"rank": 0,
"name": "a001s018",
"public_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "10.45.128.28:3300",
"nonce": 0
},
{
"type": "v1",
"addr": "10.45.128.28:6789",
"nonce": 0

[ceph-users] Re: Drained A Single Node Host On Accident

2024-04-02 Thread Eugen Block

Hi,

without knowing the whole story, to cancel OSD removal you can run  
this command:


ceph orch osd rm stop <osd_id>
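
For example (the OSD id here is only an illustration):

```
ceph orch osd rm status      # list OSDs currently queued for removal/draining
ceph orch osd rm stop 3      # cancel the removal of osd.3
```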

Regards,
Eugen

Quote from "adam.ther":


Hello,

I have a single node host with a VM as a backup MON, MGR, etc.

This has caused all OSDs to be pending as 'deleting'. Can I safely  
cancel this deletion request?


Regards,

Adam
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





[ceph-users] Re: Replace block drives of combined NVME+HDD OSDs

2024-04-02 Thread Eugen Block

Hi,

here's the link to the docs [1] on how to replace OSDs.

ceph orch osd rm <osd_id> --replace --zap [--force]

This should zap both the data drive and the DB LV (yes, its data is  
useless without the data drive); I'm not sure how it will handle the case  
where the data drive isn't accessible, though.
One thing I'm not sure about is how your spec file will be handled.  
Since the drive letters can change, I recommend using a more generic  
approach, for example the rotational flags and drive sizes instead of  
paths. But if the drive letters won't change for the replaced drives,  
it should work. I also don't expect an impact on the rest of the OSDs  
(except for backfilling, of course).
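
A generic spec of the kind mentioned above might look roughly like this (a sketch only, assuming all rotational drives are data devices and all flash devices carry DB/WAL):

```
service_type: osd
service_id: ceph02_combined_osd
placement:
  hosts:
  - ceph02
spec:
  data_devices:
    rotational: 1        # any spinning disk becomes a data device
  db_devices:
    rotational: 0        # any flash device is used for DB/WAL
    size: '1TB:'         # optional: only consider flash devices of at least 1 TB
  filter_logic: AND
  objectstore: bluestore
```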


Regards,
Eugen

[1] https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd

Quote from Zakhar Kirpichenko:


Hi,

Unfortunately, some of our HDDs failed and we need to replace these drives
which are parts of "combined" OSDs (DB/WAL on NVME, block storage on HDD).
All OSDs are defined with a service definition similar to this one:

```
service_type: osd
service_id: ceph02_combined_osd
service_name: osd.ceph02_combined_osd
placement:
  hosts:
  - ceph02
spec:
  data_devices:
paths:
- /dev/sda
- /dev/sdb
- /dev/sdc
- /dev/sdd
- /dev/sde
- /dev/sdf
- /dev/sdg
- /dev/sdh
- /dev/sdi
  db_devices:
paths:
- /dev/nvme0n1
- /dev/nvme1n1
  filter_logic: AND
  objectstore: bluestore
```

In the above example, HDDs `sda` and `sdb` are not readable and data cannot
be copied over to new HDDs. NVME partitions of `nvme0n1` with DB/WAL data
are intact, but I guess that data is useless. I think the best approach is
to replace the dead drives and completely rebuild each affected OSD. How
should we go about this, preferably in a way that other OSDs on the node
remain unaffected and operational?

I would appreciate any advice or pointers to the relevant documentation.

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





[ceph-users] Re: cephfs inode backtrace information

2024-04-02 Thread Loïc Tortay

On 29/03/2024 04:18, Niklas Hambüchen wrote:

Hi Loïc, I'm surprised by that high storage amount, my "default" pool uses only 
~512 Bytes per file, not ~32 KiB like in your pool. That's a 64x difference!

(See also my other response to the original post I just sent.)

I'm using Ceph 16.2.1.

Hello,
We actually traced the source of this issue: a configuration mistake 
(data pool not set properly on a client directory).


The directories for this client had "a few" large (tens of GiB) files, 
which were stored in the "default" pool and used up a lot of space.


With this client's data moved to where it belongs:
[ceph: root@NODE /]# ceph df
--- RAW STORAGE ---
CLASS SIZEAVAIL USED  RAW USED  %RAW USED
hdd6.1 PiB  3.8 PiB  2.3 PiB   2.3 PiB  37.15
ssd 52 TiB   49 TiB  3.2 TiB   3.2 TiB   6.04
TOTAL  6.1 PiB  3.9 PiB  2.3 PiB   2.3 PiB  36.89

--- POOLS ---
POOL  ID   PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
device_health_metrics  2 1  710 MiB  664  2.1 GiB  0 15 TiB
cephfs_EC_data 3  8192  1.7 PiB  606.79M  2.1 PiB  38.132.8 PiB
cephfs_metadata4   128  101 GiB   14.55M  304 GiB   0.64 15 TiB
cephfs_default 5   128  0 B  162.90M  0 B  0 15 TiB
[...]

So the "correct" stored value for the default pool should be 0 bytes.
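
For anyone hitting the same mistake, the directory data pool is set with an xattr like this (a sketch; the mount point and directory name are made up, and only newly created files follow the changed layout):

```
# send new files under this directory to the EC data pool
setfattr -n ceph.dir.layout.pool -v cephfs_EC_data /mnt/cephfs/client_dir
# verify the layout that new files will inherit
getfattr -n ceph.dir.layout /mnt/cephfs/client_dir
```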


Loïc.
--
|   Loïc Tortay  - IN2P3 Computing Centre  |
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CEPH Quincy installation with multipathd enabled

2024-04-02 Thread youssef . khristo
Greetings community,

we have a setup comprising 6 servers running a CentOS 8 minimal installation 
with Ceph Quincy version 18.2.2, backed by 20Gbps fiber optic NICs and 
dual Intel Xeon processors. We bootstrapped the installation on the first node, 
then expanded to the others using the cephadm method, with monitor 
services deployed on 5 of these nodes as well as 3 manager nodes. Each server 
has an NVMe boot disk as well as a 1TB SATA SSD on which the OSDs are 
deployed. An EC profile was created with k=3 and m=3, serving a CephFS 
filesystem on top with NFS exports to serve other servers. Up to this point, 
the setup has been quite stable, in the sense that upon emergency reboot or network 
connection failure the OSDs did not fail and remained functional/started normally 
after reboot.

At a certain point in our project, we needed to activate the multipathd 
service, adding the boot drive partition and the Ceph SSD to its blacklist so 
that they are not initialized for use by an mpath partition; the blacklist looks like this:

boot blacklist:
===
blacklist {
wwid "eui."
}

SATA SSD blacklist:
===
blacklist {
wwid "naa."
}

The above blacklist configuration ensures that both the boot disk and 
Ceph's OSD function properly, with the following lsblk output:

NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0 894.3G  0 disk
└─ceph---osd--block--       252:3    0 894.3G  0 lvm
nvme0n1                     259:0    0 238.5G  0 disk
├─nvme0n1p1                 259:1    0   600M  0 part /boot/efi
├─nvme0n1p2                 259:2    0     1G  0 part /boot
└─nvme0n1p3                 259:3    0 236.9G  0 part
  ├─centos-root             252:0    0   170G  0 lvm  /
  ├─centos-swap             252:1    0  23.4G  0 lvm  [SWAP]
  ├─centos-var_log_audit    252:2    0   7.5G  0 lvm  /var/log/audit
  ├─centos-home             252:4    0    26G  0 lvm  /home
  └─centos-var_log          252:5    0    10G  0 lvm  /var/log

In addition to the above multipathd configuration, we have use_devicesfile=1 in 
/etc/lvm/lvm.conf, with the /etc/lvm/devices/system.devices file looking like this, 
where the PVID is taken from the output of the pvdisplay command and the IDNAME value 
is extracted from the output of "ls -lha /dev/disk/by-id":

VERSION=1.1.1
IDTYPE=sys_wwid IDNAME=eui. DEVNAME=/dev/nvme0n1p3 PVID= PART=3
IDTYPE=sys_wwid IDNAME=naa. DEVNAME=/dev/sda PVID=


Issues started when performing certain tests of the system's integrity, 
the most important of which are emergency shutdowns and reboots of all the nodes. 
The behavior that follows is that the OSDs are not started automatically and 
their respective LVM volumes do not show up properly (except on a single 
node for some reason), so the lsblk output changes as in the snippet below, 
requiring us to reboot the nodes one by one until all the OSDs are back online:

NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0 894.3G  0 disk
nvme0n1                     259:0    0 238.5G  0 disk
├─nvme0n1p1                 259:1    0   600M  0 part /boot/efi
├─nvme0n1p2                 259:2    0     1G  0 part /boot
└─nvme0n1p3                 259:3    0 236.9G  0 part
  ├─centos-root             252:0    0   170G  0 lvm  /
  ├─centos-swap

[ceph-users] Pacific 16.2.15 `osd noin`

2024-04-02 Thread Zakhar Kirpichenko
Hi,

I'm adding a few OSDs to an existing cluster; the cluster is running with
`osd noout,noin`:

  cluster:
id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
health: HEALTH_WARN
noout,noin flag(s) set

Specifically `noin` is documented as "prevents booting OSDs from being
marked in". But freshly added OSDs were immediately marked `up` and `in`:

  services:
...
osd: 96 osds: 96 up (since 5m), 96 in (since 6m); 338 remapped pgs
 flags noout,noin

# ceph osd tree in | grep -E "osd.11|osd.12|osd.26"
 11hdd9.38680  osd.11   up   1.0  1.0
 12hdd9.38680  osd.12   up   1.0  1.0
 26hdd9.38680  osd.26   up   1.0  1.0

Is this expected behavior? Do I misunderstand the purpose of the `noin`
option?
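
For what it's worth, newly created OSDs appear to be governed by a separate mon option rather than the `noin` flag; a sketch of checking and changing it, assuming that option is the relevant one here:

```
ceph config get mon mon_osd_auto_mark_new_in         # defaults to true
ceph config set mon mon_osd_auto_mark_new_in false   # keep brand-new OSDs from being marked in automatically
```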

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replace block drives of combined NVME+HDD OSDs

2024-04-02 Thread Zakhar Kirpichenko
Thank you, Eugen.

It was actually very straightforward. I'm happy to report back that there
were no issues with removing and zapping the OSDs whose data devices were
unavailable. I had to manually remove stale dm entries, but that was it.

/Z

On Tue, 2 Apr 2024 at 11:00, Eugen Block  wrote:

> Hi,
>
> here's the link to the docs [1] how to replace OSDs.
>
> ceph orch osd rm  --replace --zap [--force]
>
> This should zap both the data drive and db LV (yes, its data is
> useless without the data drive), not sure how it will handle if the
> data drive isn't accessible though.
> One thing I'm not sure about is how your spec file will be handled.
> Since the drive letters can change I recommend to use a more generic
> approach, for example the rotational flags and drive sizes instead of
> paths. But if the drive letters won't change for the replaced drives
> it should work. I also don't expect an impact on the rest of the OSDs
> (except for backfilling, of course).
>
> Regards,
> Eugen
>
> [1] https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > Unfortunately, some of our HDDs failed and we need to replace these
> drives
> > which are parts of "combined" OSDs (DB/WAL on NVME, block storage on
> HDD).
> > All OSDs are defined with a service definition similar to this one:
> >
> > ```
> > service_type: osd
> > service_id: ceph02_combined_osd
> > service_name: osd.ceph02_combined_osd
> > placement:
> >   hosts:
> >   - ceph02
> > spec:
> >   data_devices:
> > paths:
> > - /dev/sda
> > - /dev/sdb
> > - /dev/sdc
> > - /dev/sdd
> > - /dev/sde
> > - /dev/sdf
> > - /dev/sdg
> > - /dev/sdh
> > - /dev/sdi
> >   db_devices:
> > paths:
> > - /dev/nvme0n1
> > - /dev/nvme1n1
> >   filter_logic: AND
> >   objectstore: bluestore
> > ```
> >
> > In the above example, HDDs `sda` and `sdb` are not readable and data
> cannot
> > be copied over to new HDDs. NVME partitions of `nvme0n1` with DB/WAL data
> > are intact, but I guess that data is useless. I think the best approach
> is
> > to replace the dead drives and completely rebuild each affected OSD. How
> > should we go about this, preferably in a way that other OSDs on the node
> > remain unaffected and operational?
> >
> > I would appreciate any advice or pointers to the relevant documentation.
> >
> > Best regards,
> > Zakhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about rbd flatten command

2024-04-02 Thread Anthony D'Atri
Do these RBD volumes have a full feature set?  I would think that fast-diff and 
objectmap would speed this.
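
A quick way to check and, if needed, enable those features (pool/image names are placeholders):

```
rbd info rbd/myimage | grep features
# object-map requires exclusive-lock; fast-diff requires object-map
rbd feature enable rbd/myimage exclusive-lock object-map fast-diff
rbd object-map rebuild rbd/myimage   # rebuild the map for a pre-existing image
```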

> On Apr 2, 2024, at 00:36, Henry lol  wrote:
> 
> I'm not sure, but it seems that read and write operations are
> performed for all objects in rbd.
> If so, is there any method to apply qos for flatten operation?
> 
> On Mon, Apr 1, 2024 at 11:59 PM, Henry lol wrote:
>> 
>> Hello,
>> 
>> I executed multiple 'rbd flatten' commands simultaneously on a client.
>> The elapsed time of each flatten job increased as the number of jobs
>> increased, and network I/O was nearly full.
>> 
>> so, I have two questions.
>> 1. isn’t the flatten job running within the ceph cluster? Why is
>> client-side network I/O so high?
>> 2. How can I apply qos for each flatten job to reduce network I/O?
>> 
>> Sincerely,
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about rbd flatten command

2024-04-02 Thread Henry lol
Yes, they do.
Actually, the read/write ops will be skipped as you said.

Also, is it possible to limit the maximum network throughput per flatten
operation or per image?
I want to avoid the scenario where the flatten operation fully saturates the
network.
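
There does not appear to be a flatten-specific throttle, but librbd exposes per-image QoS options via `rbd config`; whether the flatten I/O path honors them would need testing (pool/image names and values below are examples):

```
rbd config image set rbd/myimage rbd_qos_bps_limit 104857600        # ~100 MiB/s total
rbd config image set rbd/myimage rbd_qos_write_bps_limit 52428800   # ~50 MiB/s writes
rbd config image ls rbd/myimage | grep qos                          # review what is set
```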
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replace block drives of combined NVME+HDD OSDs

2024-04-02 Thread Eugen Block

Nice, thanks for the info.

Quote from Zakhar Kirpichenko:


Thank you, Eugen.

It was actually very straightforward. I'm happy to report back that there
were no issues with removing and zapping the OSDs whose data devices were
unavailable. I had to manually remove stale dm entries, but that was it.

/Z

On Tue, 2 Apr 2024 at 11:00, Eugen Block  wrote:


Hi,

here's the link to the docs [1] how to replace OSDs.

ceph orch osd rm  --replace --zap [--force]

This should zap both the data drive and db LV (yes, its data is
useless without the data drive), not sure how it will handle if the
data drive isn't accessible though.
One thing I'm not sure about is how your spec file will be handled.
Since the drive letters can change I recommend to use a more generic
approach, for example the rotational flags and drive sizes instead of
paths. But if the drive letters won't change for the replaced drives
it should work. I also don't expect an impact on the rest of the OSDs
(except for backfilling, of course).

Regards,
Eugen

[1] https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd

Zitat von Zakhar Kirpichenko :

> Hi,
>
> Unfortunately, some of our HDDs failed and we need to replace these
drives
> which are parts of "combined" OSDs (DB/WAL on NVME, block storage on
HDD).
> All OSDs are defined with a service definition similar to this one:
>
> ```
> service_type: osd
> service_id: ceph02_combined_osd
> service_name: osd.ceph02_combined_osd
> placement:
>   hosts:
>   - ceph02
> spec:
>   data_devices:
> paths:
> - /dev/sda
> - /dev/sdb
> - /dev/sdc
> - /dev/sdd
> - /dev/sde
> - /dev/sdf
> - /dev/sdg
> - /dev/sdh
> - /dev/sdi
>   db_devices:
> paths:
> - /dev/nvme0n1
> - /dev/nvme1n1
>   filter_logic: AND
>   objectstore: bluestore
> ```
>
> In the above example, HDDs `sda` and `sdb` are not readable and data
cannot
> be copied over to new HDDs. NVME partitions of `nvme0n1` with DB/WAL data
> are intact, but I guess that data is useless. I think the best approach
is
> to replace the dead drives and completely rebuild each affected OSD. How
> should we go about this, preferably in a way that other OSDs on the node
> remain unaffected and operational?
>
> I would appreciate any advice or pointers to the relevant documentation.
>
> Best regards,
> Zakhar
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io






[ceph-users] Re: ceph status not showing correct monitor services

2024-04-02 Thread Adiga, Anantha
Hi Eugen,

Currently there are only three nodes, but I can add a node to the cluster and 
check it out. I will take a look at the mon logs.


Thank you,
Anantha

-Original Message-
From: Eugen Block  
Sent: Tuesday, April 2, 2024 12:19 AM
To: Adiga, Anantha 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph status not showing correct monitor services

You can add a mon manually to the monmap, but that requires a downtime of the 
mons. Here's an example [1] how to modify the monmap (including network change 
which you don't need, of course). But that would be my last resort, first I 
would try to find out why the MON fails to join the quorum. What is that 
mon.a001s016 logging, and what are the other two logging?
Do you have another host where you could place a mon daemon to see if that 
works?


[1]
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#example-procedure

Quote from "Adiga, Anantha":

> # ceph mon stat
> e6: 2 mons at
> {a001s017=[v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0],a001s018=[v2
> :10.45.128.28:3300/0,v1:10.45.128.28:6789/0]}, election epoch 162, 
> leader 0 a001s018, quorum 0,1
> a001s018,a001s017
>
> # ceph orch ps | grep mon
> mon.a001s016a001s016   running  
> (3h)  6m ago   3h 527M2048M  16.2.5   
> 6e73176320aa  39db8cfba7e1
> mon.a001s017a001s017   running  
> (22h)47s ago   1h 993M2048M  16.2.5   
> 6e73176320aa  e5e5cb6c256c
> mon.a001s018a001s018   running  
> (5w) 48s ago   2y1167M2048M  16.2.5   
> 6e73176320aa  7d2bb6d41f54
>
> # ceph mgr stat
> {
> "epoch": 1130365,
> "available": true,
> "active_name": "a001s016.ctmoay",
> "num_standby": 1
> }
>
> # ceph orch ps | grep mgr
> mgr.a001s016.ctmoay a001s016  *:8443   running  
> (18M)   109s ago  23M 518M-  16.2.5   
> 6e73176320aa  169cafcbbb99
> mgr.a001s017.bpygfm a001s017  *:8443   running  
> (19M) 5m ago  23M 501M-  16.2.5   
> 6e73176320aa  97257195158c
> mgr.a001s018.hcxnef a001s018  *:8443   running  
> (20M) 5m ago  23M 113M-  16.2.5   
> 6e73176320aa  21ba5896cee2
>
> # ceph orch ls --service_name=mgr --export
> service_type: mgr
> service_name: mgr
> placement:
>   count: 3
>   hosts:
>   - a001s016
>   - a001s017
>   - a001s018
>
> # ceph orch ls --service_name=mon --export
> service_type: mon
> service_name: mon
> placement:
>   count: 3
>   hosts:
>   - a001s016
>   - a001s017
>   - a001s018
>
> -Original Message-
> From: Adiga, Anantha
> Sent: Monday, April 1, 2024 6:06 PM
> To: Eugen Block 
> Cc: ceph-users@ceph.io
> Subject: RE: [ceph-users] Re: ceph status not showing correct monitor 
> services
>
> # ceph tell mon.a001s016 mon_status Error ENOENT: problem getting 
> command descriptions from mon.a001s016
>
> a001s016 is outside quorum see below
>
> # ceph tell mon.a001s017 mon_status {
> "name": "a001s017",
> "rank": 1,
> "state": "peon",
> "election_epoch": 162,
> "quorum": [
> 0,
> 1
> ],
> "quorum_age": 79938,
> "features": {
> "required_con": "2449958747317026820",
> "required_mon": [
> "kraken",
> "luminous",
> "mimic",
> "osdmap-prune",
> "nautilus",
> "octopus",
> "pacific",
> "elector-pinging"
> ],
> "quorum_con": "4540138297136906239",
> "quorum_mon": [
> "kraken",
> "luminous",
> "mimic",
> "osdmap-prune",
> "nautilus",
> "octopus",
> "pacific",
> "elector-pinging"
> ]
> },
> "outside_quorum": [],
> "extra_probe_peers": [
> {
> "addrvec": [
> {
> "type": "v2",
> "addr": "10.45.128.26:3300",
> "nonce": 0
> },
> {
> "type": "v1",
> "addr": "10.45.128.26:6789",
> "nonce": 0
> }
> ]
> }
> ],
> "sync_provider": [],
> "monmap": {
> "epoch": 6,
> "fsid": "604d56db-2fab-45db-a9ea-c418f9a8cca8",
> "modified": "2024-03-31T23:54:18.692983Z",
> "created": "2021-09-30T16:15:12.884602Z",
> "min_mon_release": 16,
> "min_mon_release_name": "pacific",
> "election_strategy": 1,
> "disallowed_leaders: ": "",
> "stretch_mode": false,
> "features": {
> "persistent": [
> "kraken",
> "luminous",
> "mimic",
> "osdmap-prune",
> "nautilus",
> "octopus",
> 

[ceph-users] "ceph orch daemon add osd" deploys broken OSD

2024-04-02 Thread service . plant
Hi everybody.
I've run into a situation where I cannot redeploy an OSD on a new disk.

So, I need to replace osd.30 because the disk keeps reporting I/O problems.
I run `ceph orch daemon osd.30 --replace`

Then I zap DB

 ```
root@server-2:/# ceph-volume lvm zap /dev/ceph-db/db-88
--> Zapping: /dev/ceph-db/db-88
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-db/db-88 bs=1M count=10 
conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0247342 s, 424 MB/s
--> Zapping successful for: 
```

And now zap DATA

```
root@server-2:/# ceph-volume lvm zap /dev/sdn
--> Zapping: /dev/sdn
--> --destroy was not specified, but zapping a whole device will remove the 
partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdn bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 1.35239 s, 7.8 MB/s
--> Zapping successful for: 
```

Okay, now the disk is ready and the orchestrator confirms it:

```
root@server-1:~# ceph orch device ls host server-2 --refresh
server-2  /dev/sdnhdd   ST18000NM008J_5000c500d80398bf   16.3T  
Yes4m ago
```

Now it's time for the orchestrator to add the new OSD:

```
root@server-1:~# ceph orch daemon add osd 
server-2:data_devices=/dev/sdn,db_devices=/dev/ceph-db/db-88
Created no osd(s) on host server-2; already created?
```
But this leaves osd.30 in state down.

If I try to run the systemd service manually, it cannot start because:

```
Apr 02 12:30:41 server-2 systemd[1]: Started Ceph osd.30 for 
ea98e312-dfd9-11ee-a226-33f018c3a407.
Apr 02 12:30:41 server-2 bash[3316003]: /bin/bash: 
/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/osd.30/unit.run: No such 
file or directory
Apr 02 12:30:41 server-2 systemd[1]: 
ceph-ea98e312-dfd9-11ee-a226-33f018c3a407@osd.30.service: Main process exited, 
code=exited, status=127/n/a
Apr 02 12:30:41 server-2 bash[3316014]: /bin/bash: 
/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/osd.30/unit.poststop: No 
such file or directory
Apr 02 12:30:41 server-2 systemd[1]: 
ceph-ea98e312-dfd9-11ee-a226-33f018c3a407@osd.30.service: Failed with result 
'exit-code'.
Apr 02 12:30:51 server-2 systemd[1]: 
ceph-ea98e312-dfd9-11ee-a226-33f018c3a407@osd.30.service: Scheduled restart 
job, restart counter is at 1.
Apr 02 12:30:51 server-2 systemd[1]: Stopped Ceph osd.30 for 
ea98e312-dfd9-11ee-a226-33f018c3a407.
```


And even if I try to redeploy osd.30 with `ceph orch osd redeploy osd.30`, I 
get the following error in `ceph -W cephadm`:

```
2024-04-02T12:41:39.856767+ mgr.server-2.opelxj (mgr.2994187) 5453 : 
cephadm [INF] Reconfiguring daemon osd.30 on server-2
2024-04-02T12:41:41.048352+ mgr.server-2.opelxj (mgr.2994187) 5454 : 
cephadm [ERR] cephadm exited with an error code: 1, stderr: Non-zero exit code 
1 from /usr/bin/docker container inspect --format {{.State.Status}} 
ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd-30
/usr/bin/docker: stdout 
/usr/bin/docker: stderr Error response from daemon: No such container: 
ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd-30
Non-zero exit code 1 from /usr/bin/docker container inspect --format 
{{.State.Status}} ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd.30
/usr/bin/docker: stdout 
/usr/bin/docker: stderr Error response from daemon: No such container: 
ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd.30
Reconfig daemon osd.30 ...
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py",
 line 10700, in 
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py",
 line 10688, in main
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py",
 line 6620, in command_deploy_from
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py",
 line 6638, in _common_deploy
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py",
 line , in _dispatch_deploy
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py",
 line 3792, in deploy_daemon
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py",
 line 3078, in create_daemon_dirs
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
next(self.gen)
  File 
"/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/ceph

[ceph-users] Multi-MDS

2024-04-02 Thread quag...@bol.com.br
Hello,
     I configured multi-active MDS (multimds) in Ceph. The parameters I 
entered looked like this:
3 active
1 standby

     I also set the distributed pinning configuration at the root of the 
mounted directory of the filesystem:
setfattr -n ceph.dir.pin.distributed -v 1 /

 This configuration is working well, but the balance between the MDS ranks 
is not ok. Look:
RANK  STATE   MDS                          ACTIVITY       DNS     INOS    DIRS   CAPS
 0    active  lovelace.ceph05-ceph.bpqxla  Reqs:   91 /s   1396k   1078k   179k   176k
 1    active  lovelace.ceph01-ceph.rncaqh  Reqs:   18 /s    862k    571k   110k   292k
 2    active  lovelace.ceph02-ceph.yarywe  Reqs: 1155 /s  12804k  12830k  1251k  11672k

 Is there any extra configuration to improve this balance that I haven't 
done?
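
Two things that may be worth checking (a sketch; the directory path is an example): whether the ephemeral distributed pin option is actually enabled on the MDS side, and, failing that, pinning the hottest subtrees explicitly.

```
ceph config get mds mds_export_ephemeral_distributed   # must be true for ceph.dir.pin.distributed to take effect
# alternatively, pin an especially busy subtree to a specific rank
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/hot_project
```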

Thanks
Rafael.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: daemon osd.x on yyy is in error state

2024-04-02 Thread service . plant
probably `ceph mgr fail` will help.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm shell version not consistent across monitors

2024-04-02 Thread J-P Methot

Hi,

We are still running ceph Pacific with cephadm and we have run into a 
peculiar issue. When we run the `cephadm shell` command on monitor1, the 
container we get runs ceph 16.2.9. However, when we run the same command 
on monitor2, the container runs 16.2.15, which is the current version of 
the cluster. Why does it do that and is there a way to force it to 
16.2.15 on monitor1?


Please note that both monitors have the same configuration. Cephadm has 
been pulled from GitHub for both monitors instead of the package 
manager's version.


--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm shell version not consistent across monitors

2024-04-02 Thread Adam King
From what I can see with the most recent cephadm binary on pacific, unless
you have the CEPHADM_IMAGE env variable set, it does a `podman images
--filter label=ceph=True --filter dangling=false` (or docker) and takes the
first image in the list. It seems to be getting sorted by creation time by
default. If you want to guarantee what you get, you can run `cephadm
--image  shell` and it will try to use the image specified. You
could also try that env variable (although I haven't tried that in a very
long time if I'm honest, so hopefully it works correctly). If nothing else,
just seeing the output of that podman command and removing images that
appear before the 16.2.15 one on the list should work.
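
Concretely, something like this (the image tag is an example):

```
# pin the shell to a specific image
cephadm --image quay.io/ceph/ceph:v16.2.15 shell
# or via the environment variable
CEPHADM_IMAGE=quay.io/ceph/ceph:v16.2.15 cephadm shell
```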

On Tue, Apr 2, 2024 at 5:03 PM J-P Methot 
wrote:

> Hi,
>
> We are still running ceph Pacific with cephadm and we have run into a
> peculiar issue. When we run the `cephadm shell` command on monitor1, the
> container we get runs ceph 16.2.9. However, when we run the same command
> on monitor2, the container runs 16.2.15, which is the current version of
> the cluster. Why does it do that and is there a way to force it to
> 16.2.15 on monitor1?
>
> Please note that both monitors have the same configuration. Cephadm has
> been pulled from GitHub for both monitors instead of the package
> manager's version.
>
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific Bug?

2024-04-02 Thread Adam King
https://tracker.ceph.com/issues/64428 should be it. Backports are done for
quincy, reef, and squid and the patch will be present in the next release
for each of those versions. There isn't a pacific backport as, afaik, there
are no more pacific releases planned.

On Fri, Mar 29, 2024 at 6:03 PM Alex  wrote:

> Hi again Adam :-)
>
> Would you happen to have the Bug Tracker issue for label bug?
>
> Thanks.
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failed adding back a node

2024-04-02 Thread Alex
Hi Adam.

Re-deploying didn't work, but `ceph config dump` showed
one of the container_images specified 16.2.10-160.
After we removed that var, it instantly redeployed the OSDs.
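
For anyone else hitting this, the offending setting can be found and dropped with something like the following (the config section shown is an example; use whichever section carries the stale image):

```
ceph config dump | grep container_image      # find which section pins a stale image
ceph config rm osd container_image           # drop the pin from that section
```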

Thanks again for your help.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Are we logging IRC channels?

2024-04-02 Thread Alvaro Soto
I'll start working on the needed configurations and let you know.

On Sat, Mar 23, 2024, 12:09 PM Anthony D'Atri  wrote:

> I fear this will raise controversy, but in 2024 what’s the value in
> perpetuating an interface from early 1980s BITnet batch operating systems?
>
> > On Mar 23, 2024, at 5:45 AM, Janne Johansson 
> wrote:
> >
> >> Sure!  I think Wido just did it all unofficially, but afaik we've lost
> >> all of those records now.  I don't know if Wido still reads the mailing
> >> list but he might be able to chime in.  There was a ton of knowledge in
> >> the irc channel back in the day.  With slack, it feels like a lot of
> >> discussions have migrated into different channels, though #ceph still
> >> gets some community traffic (and a lot of hardware design discussion).
> >
> > It's also a bit cumbersome to be on IRC when someone pastes 655 lines
> > of text on slack, then edits a whitespace or comma that ended up wrong
> > and we get a total dump of 655 lines again from the gateway.
> >
> > --
> > May the most significant bit of your life be positive.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph status not showing correct monitor services

2024-04-02 Thread Adiga, Anantha
Hi Eugen,


Noticed this in the config dump: why is only "mon.a001s016" listed? And this is 
the one that is not listed in "ceph -s".
 

  mon           advanced  auth_allow_insecure_global_id_reclaim      false
  mon           advanced  auth_expose_insecure_global_id_reclaim     false
  mon           advanced  mon_compact_on_start                       true
  mon.a001s016  basic     container_image                            docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586  *
  mgr           advanced  mgr/cephadm/container_image_base           docker.io/ceph/daemon
  mgr           advanced  mgr/cephadm/container_image_node_exporter  docker.io/prom/node-exporter:v0.17.0



  cluster:
id: 604d56db-2fab-45db-a9ea-c418f9a8cca8
health: HEALTH_OK

  services:
mon: 2 daemons, quorum a001s018,a001s017 (age 45h)
mgr: a001s016.ctmoay(active, since 28h), standbys: a001s017.bpygfm
mds: 1/1 daemons up, 2 standby
osd: 36 osds: 36 up (since 29h), 36 in (since 2y)
rgw: 3 daemons active (3 hosts, 1 zones)

unit.image for each mon under /var/lib/ceph:


a001s016: 
# cat /var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s016/unit.image
docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586

a001s017:
# cat /var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s017/unit.image
docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586
a001s018:
# cat /var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s018/unit.image
docker.io/ceph/daemon:latest-pacific

Ceph image tag and digest from `docker inspect` of ceph/daemon 
(latest-pacific, 6e73176320aa, 2 years ago, 1.27GB):
==
a001s016:
"Id": 
"sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"RepoTags": [ 
"ceph/daemon:latest-pacific"
"RepoDigests": [

"ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"

a001s017:
"Id": 
"sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"RepoTags": [
"ceph/daemon:latest-pacific"
"RepoDigests": [

"ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"

a001s018:
"Id": 
"sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"RepoTags": [
"ceph/daemon:latest-pacific"
"RepoDigests": [

"ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"

-Original Message-
From: Adiga, Anantha 
Sent: Tuesday, April 2, 2024 10:42 AM
To: Eugen Block 
Cc: ceph-users@ceph.io
Subject: RE: [ceph-users] Re: ceph status not showing correct monitor services

Hi Eugen,

Currently there are only three nodes, but I can add  a node to the cluster and 
check it out. I will take a look at the mon logs 


Thank you,
Anantha

-Original Message-
From: Eugen Block 
Sent: Tuesday, April 2, 2024 12:19 AM
To: Adiga, Anantha 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph status not showing correct monitor services

You can add a mon manually to the monmap, but that requires a downtime of the 
mons. Here's an example [1] how to modify the monmap (including network change 
which you don't need, of course). But that would be my last resort, first I 
would try to find out why the MON fails to join the quorum. What is that 
mon.a001s016 logging, and what are the other two logging?
Do you have another host where you could place a mon daemon to see if that 
works?


[1]
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#example-procedure

Quote from "Adiga, Anantha":

> # ceph mon stat
> e6: 2 mons at
> {a001s017=[v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0],a001s018=[v2
> :10.45.128.28:3300/0,v1:10.45.128.28:6789/0]}, election epoch 162, 
> leader 0 a001s018, quorum 0,1
> a001s018,a001s017
>
> # ceph orch ps | grep mon
> mon.a001s016a001s016   running  
> (3h)  6m ago   3h 527M2048M  16.2.5   
> 6e73176320aa  39db8cfba7e1
> mon.a001s017a001s017   running  
> (22h)47s ago   1h 993M2048M  16.2.5   
> 6e73176320aa  e5e5cb6c256c
> mon.a001s018a001s018   running  
> (5w) 48s ago   2y1167M2048M  16.2.5   
> 6e73176320aa  7d2bb6d41f54
>
>

[ceph-users] Re: RGW Data Loss Bug in Octopus 15.2.0 through 15.2.6

2024-04-02 Thread xu chenhui
Jonas Nemeiksis wrote:
> Hello,
> 
> Maybe your issue depends to this https://tracker.ceph.com/issues/63642
> 
> 
> 
> On Wed, Mar 27, 2024 at 7:31 PM xu chenhui  
> wrote:
> 
> >   Hi, Eric Ivancich
> >I have similar problem in ceph version 16.2.5. Has this problem been
> >  completely resolved in Pacific version?
> >  Our bucket has no lifecycle rules and no copy operation. This is a very
> >  serious data loss issue for us and It happens occasionally in our
> >  environment.
> > 
> >  Detail describe:
> > 
> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/XQRUOEPZ7Y…
> > 
> >  thanks.
> >  ___
> >  ceph-users mailing list -- ceph-users(a)ceph.io
> >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >
Hi,
   My problem is different from https://tracker.ceph.com/issues/63642. All 
of the multipart and shadow objects were lost; only the head object remains in our 
environment. Maybe this is a new issue that happens with low probability, and I 
haven't been able to reproduce it.

  Is there any other information that could help locate the root cause or reduce data 
loss?
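
One way to capture more detail for an affected object, in case it helps narrow down the root cause (bucket and key names are placeholders):

```
radosgw-admin object stat --bucket=mybucket --object=path/to/key        # dump the head object's manifest
radosgw-admin bucket radoslist --bucket=mybucket > rados_objects.txt    # list the RADOS objects the bucket should reference
```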

thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD image metric

2024-04-02 Thread Szabo, Istvan (Agoda)
Hi,

I'm trying to pull some metrics out of Ceph about RBD image sizes, but 
I haven't found anything beyond pool-related metrics.

Is there any per-image metric, or do I need to collect this myself with 
some third-party tool?
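
In the meantime, per-image sizes can be pulled with `rbd du` and fed into whatever collector you use (pool/image names are examples; the usage figures are fast and accurate when the fast-diff feature is enabled):

```
rbd du --pool rbd --format json      # provisioned vs. used size for every image in the pool
rbd du rbd/myimage                   # a single image
```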

Thank you


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw s3 bucket policies limitations (on users)

2024-04-02 Thread Christian Rohmann

Hey Garcetto,

On 29.03.24 4:13 PM, garcetto wrote:

   I am trying to set bucket policies to allow different users to access the
same bucket with different permissions, BUT it seems that this is not yet
supported. Am I wrong?

https://docs.ceph.com/en/reef/radosgw/bucketpolicy/#limitations

"We do not yet support setting policies on users, groups, or roles."


Maybe check out my previous, somewhat similar question: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/S2TV7GVFJTWPYA6NVRXDL2JXYUIQGMIN/

And PR https://github.com/ceph/ceph/pull/44434 could also be of interest.

I would love for RGW to support more detailed bucket policies, 
especially with external / Keystone authentication.




Regards


Christian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io