[ceph-users] Re: [Ceph incident] PG stuck in peering.

2024-09-26 Thread Frank Schilder
Hi Loan,

thanks for the detailed post-mortem to the list!

I misread your first message, unfortunately. On our cluster we also had issues 
with 1-2 PGs being stuck in peering resulting in blocked IO and warnings piling 
up. We identified the "bad" OSD by shutting one member-OSD down at a time and 
setting it out, so it was in state down+out. As soon as the bad OSD was 
down+out, the PG recovered and became active. In our case the disks were bad 
and we replaced them.
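
For reference, the kind of sequence meant here is roughly the following (a
sketch; <id> is the suspect OSD, and the stop command depends on how the OSDs
are deployed):

```
ceph orch daemon stop osd.<id>     # cephadm clusters; or: systemctl stop ceph-osd@<id>
ceph osd out <id>                  # OSD is now down+out, so the PG gets remapped
ceph pg dump_stuck inactive        # check whether the stuck PG peers and goes active
```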

I thought you had done that, but after re-reading I see it was restarts only, 
which will not force a remapping. Sorry for the confusion, and hopefully our 
experience reports here help other users.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Backup strategies for rgw s3

2024-09-26 Thread Alex Hussein-Kershaw (HE/HIM)
We have been using rclone (rclone.org) to copy 
all the data to a filesystem nightly to provide an S3 backup mechanism.

It has Ceph support out of the box (added by one of my colleagues a few years ago).


From: Adam Prycki 
Sent: Wednesday, September 25, 2024 7:09 PM
To: Shilpa Manjrabad Jagannath 
Cc: ceph-users@ceph.io 
Subject: [EXTERNAL] [ceph-users] Re: Backup strategies for rgw s3


Yes, I know. It's just that I would need to define a zone-wide default
lifecycle.
For example, the archive zone stores 30 days of object versions unless
specified otherwise.
Is there a way to do it?

As far as I know, the lifecycle you linked is configured per bucket.
As a small cloud provider we cannot really configure lifecycle policies
for our users.
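
(For reference, the per-bucket form being discussed looks roughly like this
when pushed through the S3 API; a sketch only, with the bucket name, endpoint
and 30-day window as placeholders:)

```
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF
aws --endpoint-url https://rgw.example.com s3api put-bucket-lifecycle-configuration \
    --bucket example-bucket --lifecycle-configuration file://lifecycle.json
```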

Adam Prycki

On 25.09.2024 19:10, Shilpa Manjrabad Jagannath wrote:
> starting from quincy, you can define rules for lifecycle to execute on
> Archive zone alone by specifying
>  flag under 
>
> https://tracker.ceph.com/issues/53361
>
>
> On Wed, Sep 25, 2024 at 7:59 AM Adam Prycki  wrote:
>
>> Hi,
>>
>> I'm currently working on a project which requires us to back up 2
>> separate s3 zones/realms and retain them for a few months. The requirements were
>> written by someone who doesn't know Ceph RGW capabilities.
>> We have to do incremental and full backups. Each type of backup has
>> separate retention period.
>>
>> Is there a way to accomplish this in a sensible way?
>>
>> My first idea would be to create multisite replication to archive-zone.
>> But I cannot really enforce data retention on archive zone. It would
>> require us to overwrite lifecycle policies created by our users.
>> As far as I know it's not possible to create a zone-level lifecycle
>> policy. Users' accounts are provisioned via OpenStack Swift.
>>
>> Second idea would be to create custom backup script and copy all the
>> buckets in the cluster to different s3 zone. Destination buckets could
>> be all versioned to have desired retention. But this option feels very
>> hackish and messy. Backing up 2 separate s3 zones to one could cause
>> collisions in bucket names. Prefixing bucket names with additional
>> information is not safe because bucket names have a fixed maximum length.
>> Prefixing object key names is also not ideal.
>>
>> Best regards
>> Adam Prycki
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs +inotify = caps problem?

2024-09-26 Thread Frédéric Nass
Hi Burkhard,

This is a known issue. We ran into it a few months back with VS Code containers 
working on CephFS under Kubernetes.
Tweaking the settings.json file as suggested by Dietmar here [1] did the 
trick for us.

Regards,
Frédéric.

[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/EUCRAYM3JEGK6KCJND7SZNB3HYHJAB65/
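
As a rough sketch of that tweak (assuming the default per-user settings path on
Linux and jq being available; the exclude patterns are examples, adjust them to
the workspace):

```
# Merge watcher exclusions into VS Code's settings.json so the file watcher
# stops holding caps for every file in the CephFS workspace.
SETTINGS=~/.config/Code/User/settings.json
jq '. + {"files.watcherExclude": {
      "**/.git/objects/**": true,
      "**/node_modules/**": true,
      "**/data/**": true
    }}' "$SETTINGS" > "$SETTINGS.tmp" && mv "$SETTINGS.tmp" "$SETTINGS"
```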

- On 25 Sep 24, at 9:55, Burkhard Linke 
burkhard.li...@computational.bio.uni-giessen.de wrote:

> Hi,
> 
> 
> we are currently trying to debug and understand a problem with cephfs
> and inotify watchers. A user is running Visual Studio Code with a
> workspace on a cephfs mount. VSC uses inotify for monitoring files and
> directories in the workspace:
> 
> 
> root@cli:~# ./inotify-info
> --
> INotify Limits:
>   max_queued_events    16,384
>   max_user_instances   128
>   max_user_watches 1,048,576
> --
>    Pid Uid    App   Watches  Instances
>    3599940 1236   node    1,681  1
>  1 0  systemd   106  5
>    3600170 1236   node   54  1
>     874797 0  udevadm    17  1
>    3599118 0  systemd 7  3
>    3599707 1236   systemd 7  3
>    3599918 1236   node    6  1
>   2047 100    dbus-daemon 3  1
>   2054 0  sssd    2  1
>   2139 0  systemd-logind (deleted)    1  1
>   2446 0  agetty  1  1
>    361 1236   node    1  1
> --
> Total inotify Watches:   1886
> Total inotify Instances: 20
> --
> root@cli:~# cat /sys/kernel/debug/ceph/XYZ.client354064780/caps  | wc -l
> 1773083
> 
> root@cli:~# uname -a
> Linux cli 6.1.0-23-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1
> (2024-07-15) x86_64 GNU/Linux
> 
> 
> So roughly 1,700 watchers result in over 1.7 million caps (some of the
> watchers might be for files on different filesystems). I've also checked
> this on the MDS side; it also reports a very high number of caps for
> that client. Running tools like lsof on the host as root reports only
> very few open files (<50). So inotify seems to be responsible for the
> massive caps build-up. Terminating VSC results in a sharp drop of the
> caps (just a few open files / directories left afterwards).
> 
> 
> Is this a known problem?
> 
> 
> Best regards,
> 
> Burkhard Linke
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-26 Thread Eugen Block

Hi,

it seems a bit unnecessary to rebuild OSDs just to get them managed.
If you apply a spec file that targets your hosts/OSDs, they will
appear as managed. So when you need to replace a drive, you can
already use the orchestrator to remove and zap it.
That works just fine.
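A minimal sketch of that workflow (the file name is an example):

```
ceph orch apply -i osd-spec.yaml --dry-run   # preview which hosts/devices the spec matches
ceph orch apply -i osd-spec.yaml
ceph orch ls osd                             # existing OSDs picked up by the spec now show as managed
```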
How to get out of your current situation is not entirely clear to me  
yet. I’ll reread your post tomorrow.


Regards,
Eugen

Zitat von Bob Gibson :


Hi,

We recently converted a legacy cluster running Quincy v17.2.7 to  
cephadm. The conversion went smoothly and left all osds unmanaged by  
the orchestrator as expected. We’re now in the process of converting  
the osds to be managed by the orchestrator. We successfully  
converted a few of them, but then the orchestrator somehow got  
confused. `ceph health detail` reports a “stray daemon” for the osd  
we’re trying to convert, and the orchestrator is unable to refresh  
its device list so it doesn’t see any available devices.


From the perspective of the osd node, the osd has been wiped and is  
ready to be reinstalled. We’ve also rebooted the node for good  
measure. `ceph osd tree` shows that the osd has been destroyed, but  
the orchestrator won’t reinstall it because it thinks the device is  
still active. The orchestrator device information is stale, but  
we’re unable to refresh it. The usual recommended workaround of  
failing over the mgr hasn’t helped. We’ve also tried `ceph orch  
device ls --refresh` to no avail. In fact, after running that command,  
subsequent runs of `ceph orch device ls` produce no output until the  
mgr is failed over again.


Is there a way to force the orchestrator to refresh its list of  
devices when in this state? If not, can anyone offer any suggestions  
on how to fix this problem?


Cheers,
/rjg

P.S. Some additional information in case it’s helpful...

We’re using the following command to replace existing devices so  
that they’re managed by the orchestrator:


```
ceph orch osd rm <osd_id> --replace --zap
```

and we’re currently stuck on osd 88.

```
ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon osd.88 on host ceph-osd31 not managed by cephadm
```

`ceph osd tree` shows that the osd has been destroyed and is ready  
to be replaced:


```
ceph osd tree-from ceph-osd31
ID   CLASS  WEIGHTTYPE NAMESTATUS REWEIGHT  PRI-AFF
-46 34.93088  host ceph-osd31
 84ssd   3.49309  osd.84  up   1.0  1.0
 85ssd   3.49309  osd.85  up   1.0  1.0
 86ssd   3.49309  osd.86  up   1.0  1.0
 87ssd   3.49309  osd.87  up   1.0  1.0
 88ssd   3.49309  osd.88   destroyed 0  1.0
 89ssd   3.49309  osd.89  up   1.0  1.0
 90ssd   3.49309  osd.90  up   1.0  1.0
 91ssd   3.49309  osd.91  up   1.0  1.0
 92ssd   3.49309  osd.92  up   1.0  1.0
 93ssd   3.49309  osd.93  up   1.0  1.0
```

The cephadm log shows a claim on node `ceph-osd31` for that osd:

```
2024-09-25T14:15:45.699348-0400 mgr.ceph-mon3.qzjgws [INF] Found osd  
claims -> {'ceph-osd31': ['88']}
2024-09-25T14:15:45.699534-0400 mgr.ceph-mon3.qzjgws [INF] Found osd  
claims for drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}

```

`ceph orch device ls` shows that the device list isn’t refreshing:

```
ceph orch device ls ceph-osd31
HOST        PATH      TYPE  DEVICE ID                                SIZE   AVAILABLE  REFRESHED  REJECT REASONS
ceph-osd31  /dev/sdc  ssd   INTEL_SSDSC2KG038T8_PHYG039603PE3P8EGN   3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdd  ssd   INTEL_SSDSC2KG038T8_PHYG039600AY3P8EGN   3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sde  ssd   INTEL_SSDSC2KG038T8_PHYG039600CW3P8EGN   3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdf  ssd   INTEL_SSDSC2KG038T8_PHYG039600CM3P8EGN   3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdg  ssd   INTEL_SSDSC2KG038T8_PHYG039600UB3P8EGN   3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdh  ssd   INTEL_SSDSC2KG038T8_PHYG039603753P8EGN   3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdi  ssd   INTEL_SSDSC2KG038T8_PHYG039603R63P8EGN   3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
ceph-osd31  /dev/sdj  ssd   INTEL_SSDSC2KG038TZ_PHYJ4011032M3P8DGN   3576G  No         22h ago    Insufficient space (<10 extents) on

[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-26 Thread Eugen Block
Right, if you need encryption, a rebuild is required. Your procedure  
has already worked 4 times, so I'd say nothing seems wrong with it  
per se.
Regarding the stuck device list, do you see the mgr logging anything  
suspicious, especially given that the device list only returns output  
after a failover? Those two osd specs are not conflicting, since the  
first one is "unmanaged" after adoption.
Is there anything in 'ceph orch osd rm status'? Can you run 'cephadm  
ceph-volume inventory' locally on that node? Do you see any hints in  
the node's syslog? Maybe try a reboot or something?
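
Roughly, the checks meant here (the inventory runs on the OSD node itself; the
log query assumes the default cephadm log channel):

```
ceph orch osd rm status               # any pending/stuck removal for osd.88?
cephadm ceph-volume inventory         # run locally on ceph-osd31
ceph log last 100 info cephadm        # recent cephadm events from the active mgr
```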



Zitat von Bob Gibson :

Thanks for your reply Eugen. I’m fairly new to cephadm so I wasn’t  
aware that we could manage the drives without rebuilding them.  
However, we thought we’d take advantage of this opportunity to also  
encrypt the drives, and that does require a rebuild.


I have a theory on why the orchestrator is confused. I want to  
create an osd service for each osd node so I can manage drives on a  
per-node basis.


I started by creating a spec for the first node:

service_type: osd
service_id: ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
rotational: 0
size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore

But I also see a default spec, “osd”, which has placement set to “unmanaged”.

`ceph orch ls osd --export` shows the following:

service_type: osd
service_name: osd
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: ceph-osd31
service_name: osd.ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
rotational: 0
size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore

`ceph orch ls osd` shows that I was able to convert 4 drives using my spec:

NAME            PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd                         95  10m ago    -
osd.ceph-osd31               4  10m ago    43m  ceph-osd31

Despite being able to convert 4 drives, I’m wondering if these specs  
are conflicting with one another, and that has confused the  
orchestrator. If so, how do I safely get from where I am now to  
where I want to be? :-)


Cheers,
/rjg

On Sep 26, 2024, at 3:31 PM, Eugen Block  wrote:


Hi,

this seems a bit unnecessary to rebuild OSDs just to get them managed.
If you apply a spec file that targets your hosts/OSDs, they will
appear as managed. So when you would need to replace a drive, you
could already utilize the orchestrator to remove and zap the drive.
That works just fine.
How to get out of your current situation is not entirely clear to me
yet. I’ll reread your post tomorrow.

Regards,
Eugen

Zitat von Bob Gibson :

Hi,

We recently converted a legacy cluster running Quincy v17.2.7 to
cephadm. The conversion went smoothly and left all osds unmanaged by
the orchestrator as expected. We’re now in the process of converting
the osds to be managed by the orchestrator. We successfully
converted a few of them, but then the orchestrator somehow got
confused. `ceph health detail` reports a “stray daemon” for the osd
we’re trying to convert, and the orchestrator is unable to refresh
its device list so it doesn’t see any available devices.

From the perspective of the osd node, the osd has been wiped and is
ready to be reinstalled. We’ve also rebooted the node for good
measure. `ceph osd tree` shows that the osd has been destroyed, but
the orchestrator won’t reinstall it because it thinks the device is
still active. The orchestrator device information is stale, but
we’re unable to refresh it. The usual recommended workaround of
failing over the mgr hasn’t helped. We’ve also tried `ceph orch
device ls --refresh` to no avail. In fact, after running that command,
subsequent runs of `ceph orch device ls` produce no output until the
mgr is failed over again.

Is there a way to force the orchestrator to refresh its list of
devices when in this state? If not, can anyone offer any suggestions
on how to fix this problem?

Cheers,
/rjg

P.S. Some additional information in case it’s helpful...

We’re using the following command to replace existing devices so
that they’re managed by the orchestrator:

```
ceph orch osd rm <osd_id> --replace --zap
```

and we’re currently stuck on osd 88.

```
ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
   stray daemon osd.88 on host ceph-osd31 not managed by cephadm
```

`ceph osd tree` shows that the osd has been destroyed and is ready
to be replaced:

```
ceph osd tree-from ceph-osd31
ID   CLASS  WEIGHTTYPE NAMESTATUS REWEIGHT  PRI-AFF
-46 34.93088  host ceph-osd31
84ssd   3.49309  osd.84  up   1.0  1.0
85ssd   3.49309  osd.85  up   1.0  1.0
86ssd   3.49309  osd.86  up   1.0  1.0
87ssd   3.49309  osd.87  up

[ceph-users] v19.2.0 Squid released

2024-09-26 Thread Laura Flores
We're very happy to announce the first stable release of the Squid series.

We express our gratitude to all members of the Ceph community who
contributed by proposing pull requests, testing this release, providing
feedback, and offering valuable suggestions.

Highlights:

RADOS
* BlueStore has been optimized for better performance in snapshot-intensive
workloads.
* BlueStore RocksDB LZ4 compression is now enabled by default to improve
average performance and "fast device" space usage.
* Other improvements include more flexible EC configurations, an OpTracker
to help debug mgr module issues, and better scrub scheduling.

Dashboard
* Improved navigation layout

CephFS
* Support for managing CephFS snapshots and clones, as well as snapshot
schedule management
* Manage authorization capabilities for CephFS resources
* Helpers on mounting a CephFS volume

RBD
* diff-iterate can now execute locally, bringing a dramatic performance
improvement for QEMU live disk synchronization and backup use cases.
* Support for cloning from non-user type snapshots is added.
* rbd-wnbd driver has gained the ability to multiplex image mappings.

RGW
* The User Accounts feature unlocks several new AWS-compatible IAM APIs for
the self-service management of users, keys, groups, roles, policy and more.

Crimson/Seastore
* Crimson's first tech preview release! Supporting RBD workloads on
Replicated pools. For more information please visit:
https://ceph.io/en/news/crimson

We encourage you to read the full release notes at
https://ceph.io/en/news/blog/2024/v19-2-0-squid-released/

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-19.2.0.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 16063ff2022298c9300e49a547a16ffda59baf13

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph can list volumes from a pool but can not remove the volume

2024-09-26 Thread bryansoong21
We have a volume in our cluster:

[r...@ceph-1.lab-a ~]# rbd ls volume-ssd
volume-8a30615b-1c91-4e44-8482-3c7d15026c28

[r...@ceph-1.lab-a ~]# rbd rm 
volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
Removing image: 0% complete...failed.
rbd: error opening image volume-8a30615b-1c91-4e44-8482-3c7d15026c28: (2) No 
such file or directory
rbd: image has snapshots with linked clones - these must be deleted or 
flattened before the image can be removed.

Any ideas on how can I remove the volume? Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph can list volumes from a pool but can not remove the volume

2024-09-26 Thread Anthony D'Atri
https://docs.ceph.com/en/reef/rbd/rbd-snapshot/ should give you everything you 
need.

Sounds like maybe you have snapshots / clones that have left the parent 
lingering as a tombstone?

Start with

rbd children volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
rbd info volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
rbd du volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28

That looks like the only volume in that pool?  If targeted cleanup doesn’t 
work, you could just delete the whole pool, but triple check everything before 
taking action here.
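
If the children turn out to be disposable clones, a typical cleanup sequence
looks roughly like the following (a sketch only; child image and snapshot names
are placeholders, so verify each one before touching it):

```
rbd trash ls volume-ssd                       # check for images lingering in the trash
rbd flatten volume-ssd/<child-image>          # or remove the child if it is not needed
rbd snap unprotect volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28@<snap>
rbd snap purge volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
rbd rm volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
```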


> On Sep 25, 2024, at 1:50 PM, bryansoon...@gmail.com wrote:
> 
> We have a volume in our cluster:
> 
> [r...@ceph-1.lab-a ~]# rbd ls volume-ssd
> volume-8a30615b-1c91-4e44-8482-3c7d15026c28
> 
> [r...@ceph-1.lab-a ~]# rbd rm 
> volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
> Removing image: 0% complete...failed.
> rbd: error opening image volume-8a30615b-1c91-4e44-8482-3c7d15026c28: (2) No 
> such file or directory
> rbd: image has snapshots with linked clones - these must be deleted or 
> flattened before the image can be removed.
> 
> Any ideas on how can I remove the volume? Thanks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Quincy: osd_pool_default_crush_rule being ignored?

2024-09-26 Thread Florian Haas

Hello everyone,

my cluster has two CRUSH rules: the default replicated_rule (rule_id 0), 
and another rule named rack-aware (rule_id 1).


Now, if I'm not misreading the config reference, I should be able to 
define that all future-created pools use the rack-aware rule, by setting 
osd_pool_default_crush_rule to 1.


I've verified that this option is defined in 
src/common/options/global.yaml.in, so the "global" configuration section 
should be the applicable one (I did try with "mon" and "osd" also, for 
good measure).


However, setting this option, in Quincy, apparently has no effect:

# ceph config set global osd_pool_default_crush_rule 1
# ceph osd pool create foo
pool 'foo' created
# ceph osd pool ls detail | grep foo
pool 9 'foo' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 264 flags 
hashpspool stripe_width 0


I am seeing this behaviour in 17.2.7. After an upgrade to Reef (18.2.4) 
it is gone, the option behaves as documented, and new pools are created 
with a crush_rule of 1:


# ceph osd pool create bar
pool 'bar' created
# ceph osd pool ls detail | grep bar
pool 10 'bar' replicated size 3 min_size 2 crush_rule 1 object_hash 
rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 302 flags 
hashpspool stripe_width 0 read_balance_score 4.00


However, the documentation at 
https://docs.ceph.com/en/quincy/rados/configuration/pool-pg-config-ref/#confval-osd_pool_default_crush_rule 
asserts that osd_pool_default_crush_rule should already work in Quincy, 
and the Reef release notes at 
https://docs.ceph.com/en/latest/releases/reef/ don't mention a fix 
covering this.


Am I doing something wrong? Is this a documentation bug, and the option 
can't work in Quincy? Was this "accidentally" fixed at some point in the 
Reef cycle?
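
(For what it's worth, a workaround that sidesteps the default entirely is to
name the rule explicitly at pool creation time; a quick sketch, with pg counts
as examples:)

```
ceph config get mon osd_pool_default_crush_rule   # confirm what the mons report
ceph osd pool create foo 32 32 replicated rack-aware
ceph osd pool ls detail | grep foo
```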


Thanks in advance for any insight you might be able to share.

Cheers,
Florian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Dashboard TLS

2024-09-26 Thread matthew
Yeap, that was my issue (forgot to open up port 8443 in the firewall) 
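
For anyone else landing here, opening the dashboard's default HTTPS port with
firewalld looks roughly like this (assuming the default port 8443 hasn't been
changed):

```
firewall-cmd --permanent --add-port=8443/tcp
firewall-cmd --reload
```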

Thanks for the help

PS Oh, and you *can* use ECC TLS Certs - if anyone wanted to know.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephalocon 2024 Developer Summit & New Users Workshop!

2024-09-26 Thread Neha Ojha
Dear Ceph Community,

We are happy to announce two key events, the Ceph Developer Summit and the
Ceph New Users Workshop (limited capacity), the day before Cephalocon 2024.
More details and registration information are now live on our Ceph website!

Sponsorship opportunities are running out quickly! Check out the full
agenda on the Cephalocon website and get in touch with us at
Sponsor | LF Events to learn more about how you can become a sponsor.

Can’t wait to see you in Geneva!
Best regards,
Neha, Josh, & Dan

On Thu, Sep 12, 2024 at 11:59 AM Dan van der Ster 
wrote:

> Dear Ceph Community,
>
> We are excited to announce that the agenda for Cephalocon 2024 is now
> live! This year's event, hosted at CERN, the birthplace of Ceph at scale,
> promises to be a unique opportunity to learn from experts, network with the
> community, and explore the latest innovations in software-defined storage.
> Whether you're new to Ceph or a long-time expert, there will be sessions
> and workshops tailored to all levels.
>
> We also want to highlight that sponsorship opportunities are still
> available. Sponsoring Cephalocon is an excellent way to increase your
> visibility in this influential community while supporting the ongoing
> growth and development of Ceph.
>
> Check out the full agenda on the Cephalocon website and get in touch with us
> at Sponsor | LF Events to learn more about how you can become a sponsor.
>
> We look forward to seeing you in Geneva!
> Best regards,
> Neha, Josh, & Dan
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mds daemon damaged - assert failed

2024-09-26 Thread Eugen Block
It could be a bug, sure, but I haven't searched the tracker for long;  
maybe there is an existing report. I'd leave it to the devs to comment  
on that. But the assert alone isn't of much help (to me); more MDS logs  
could help track this down.


Zitat von "Kyriazis, George" :


On Sep 25, 2024, at 1:05 AM, Eugen Block  wrote:

Great that you got your filesystem back.


cephfs-journal-tool journal export
cephfs-journal-tool event recover_dentries summary

Both failed


Your export command seems to be missing the output file, or was it  
not the exact command?


Yes I didn’t include the output file in my snippet.  Sorry for the  
confusion.  But the command did in fact complain that the journal  
was corrupted.




Also, I understand that the metadata itself is sitting on the  
disk, but it looks like a single point of failure.  What's the  
logic behind having a single metadata location, but multiple mds  
servers?


I think there's a misunderstanding, the metadata is in the cephfs  
metadata pool, not on the local disk of your machine.




By "disk" I meant the concept of permanent storage, i.e. Ceph. Yes,  
our understanding matches. But the question still remains as to  
why that assert would trigger. Is it because of a software issue  
(bug?) that corrupted the journal, or did something else corrupt  
the journal and cause the MDS to throw the assertion?  
Basically, I'm trying to find a possible root cause.


Thank you!

George




Zitat von "Kyriazis, George" :


I managed to recover my filesystem.

cephfs-journal-tool journal export
cephfs-journal-tool event recover_dentries summary

Both failed

But truncating the journal and following some of the instructions  
in  
https://people.redhat.com/bhubbard/nature/default/cephfs/disaster-recovery-experts/ helped me to get the mds  
up.


Then I scrubbed and repaired the filesystem, and I “believe” I’m  
back in business.
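
For anyone following along, the sequence described above corresponds roughly to
the steps below (a sketch only, and dangerous: keep the journal backup and
follow the disaster-recovery docs; <fs> is the filesystem name):

```
cephfs-journal-tool --rank=<fs>:0 journal export /root/mds0-journal-backup.bin
cephfs-journal-tool --rank=<fs>:0 event recover_dentries summary
cephfs-journal-tool --rank=<fs>:0 journal reset       # the "truncate" step mentioned above
ceph tell mds.<fs>:0 scrub start / recursive,repair   # then scrub/repair the tree
```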


What is weird though is that an assert failed as shown in the  
stack dump below.  Was that a legitimate assertion that indicates  
a bigger issue, or was it a false assertion?


Also, I understand that the metadata itself is sitting on the  
disk, but it looks like a single point of failure.  What's the  
logic behind having a single metadata location, but multiple mds  
servers?


Thanks!

George


On Sep 24, 2024, at 5:55 AM, Eugen Block  wrote:

Hi,

I would probably start by inspecting the journal with the  
cephfs-journal-tool [0]:


cephfs-journal-tool [--rank=<fs_name>:{mds-rank|all}] journal inspect

And it could be helpful to have the logs prior to the assert.

[0]  
https://docs.ceph.com/en/latest/cephfs/cephfs-journal-tool/#example-journal-inspect


Zitat von "Kyriazis, George" :

Hello ceph users,

I am in the unfortunate situation of having a status of “1 mds  
daemon damaged”.  Looking at the logs, I see that the daemon died  
with an assert as follows:


./src/osdc/Journaler.cc: 1368: FAILED ceph_assert(trim_to > trimming_pos)

ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748)  
reef (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char  
const*)+0x12a) [0x73a83189d7d9]

2: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
3: (Journaler::_trim()+0x671) [0x57235caa70b1]
4: (Journaler::_finish_write_head(int, Journaler::Header&,  
C_OnFinisher*)+0x171) [0x57235caaa8f1]

5: (Context::complete(int)+0x9) [0x57235c716849]
6: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
7: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
8: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]

   0> 2024-09-23T14:10:26.490-0500 73a822c006c0 -1 *** Caught  
signal (Aborted) **

in thread 73a822c006c0 thread_name:MR_Finisher

ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748)  
reef (stable)

1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x73a83105b050]
2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x73a8310a9e2c]
3: gsignal()
4: abort()
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char  
const*)+0x185) [0x73a83189d834]

6: /usr/lib/ceph/libceph-common.so.2(+0x29d974) [0x73a83189d974]
7: (Journaler::_trim()+0x671) [0x57235caa70b1]
8: (Journaler::_finish_write_head(int, Journaler::Header&,  
C_OnFinisher*)+0x171) [0x57235caaa8f1]

9: (Context::complete(int)+0x9) [0x57235c716849]
10: (Finisher::finisher_thread_entry()+0x16d) [0x73a83194659d]
11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x73a8310a8134]
12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x73a8311287dc]
NOTE: a copy of the executable, or `objdump -rdS ` is  
needed to interpret this.



As listed above, I am running 18.2.2 on a Proxmox cluster with a  
hybrid hdd/ssd setup and 2 cephfs filesystems. The mds responsible  
for the hdd filesystem is the one that died.


Output of ceph -s follows:

root@vis-mgmt:~/bin# ceph -s
cluster:
  id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
  health: HEALTH_ERR
  1 filesystem is degraded
  1 filesystem is offline
  

[ceph-users] Re: Quincy: osd_pool_default_crush_rule being ignored?

2024-09-26 Thread Eugen Block
Hm, I don't know much about ceph-ansible. Did you check if there is  
any config set for a specific daemon which would override the global  
setting? For example, 'ceph config show-with-defaults mon.' for each  
mon, and then also check 'ceph config dump | grep rule'. I would also  
probably grep for crush_rule in all the usual  
places.
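
Something along these lines (the daemon name is a placeholder):

```
ceph config show-with-defaults mon.quincy-1 | grep osd_pool_default_crush_rule
ceph config dump | grep -i crush_rule
grep -rn crush_rule /etc/ceph/ 2>/dev/null    # any leftover ceph.conf overrides
```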


Zitat von Florian Haas :


On 25/09/2024 15:21, Eugen Block wrote:

Hm, do you have any local ceph.conf on your client which has an
override for this option as well?


No.


By the way, how do you bootstrap your cluster? Is it cephadm based?


This one is bootstrapped (on Quincy) with ceph-ansible. And when the  
"ceph config set" change didn't make a difference, I did also make a  
point of cycling all my mons and osds (which shouldn't be necessary,  
but I figured I'd try that, just in case).


And I also confirmed this same issue, in Quincy, after the cluster  
was adopted into cephadm management. At that point, the behaviour  
was still unchanged.


It was only after I upgraded the cluster to Reef, with  
cephadm/ceph orch, that the problem went away.


Cheers,
Florian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-26 Thread Bob Gibson
Thanks for your reply Eugen. I’m fairly new to cephadm so I wasn’t aware that 
we could manage the drives without rebuilding them. However, we thought we’d 
take advantage of this opportunity to also encrypt the drives, and that does 
require a rebuild.

I have a theory on why the orchestrator is confused. I want to create an osd 
service for each osd node so I can manage drives on a per-node basis.

I started by creating a spec for the first node:

service_type: osd
service_id: ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
rotational: 0
size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore

But I also see a default spec, “osd”, which has placement set to “unmanaged”.

`ceph orch ls osd --export` shows the following:

service_type: osd
service_name: osd
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: ceph-osd31
service_name: osd.ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
rotational: 0
size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore

`ceph orch ls osd` shows that I was able to convert 4 drives using my spec:

NAME            PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd                         95  10m ago    -
osd.ceph-osd31               4  10m ago    43m  ceph-osd31

Despite being able to convert 4 drives, I’m wondering if these specs are 
conflicting with one another, and that has confused the orchestrator. If so, 
how do I safely get from where I am now to where I want to be? :-)

Cheers,
/rjg

On Sep 26, 2024, at 3:31 PM, Eugen Block  wrote:


Hi,

this seems a bit unnecessary to rebuild OSDs just to get them managed.
If you apply a spec file that targets your hosts/OSDs, they will
appear as managed. So when you would need to replace a drive, you
could already utilize the orchestrator to remove and zap the drive.
That works just fine.
How to get out of your current situation is not entirely clear to me
yet. I’ll reread your post tomorrow.

Regards,
Eugen

Zitat von Bob Gibson :

Hi,

We recently converted a legacy cluster running Quincy v17.2.7 to
cephadm. The conversion went smoothly and left all osds unmanaged by
the orchestrator as expected. We’re now in the process of converting
the osds to be managed by the orchestrator. We successfully
converted a few of them, but then the orchestrator somehow got
confused. `ceph health detail` reports a “stray daemon” for the osd
we’re trying to convert, and the orchestrator is unable to refresh
its device list so it doesn’t see any available devices.

From the perspective of the osd node, the osd has been wiped and is
ready to be reinstalled. We’ve also rebooted the node for good
measure. `ceph osd tree` shows that the osd has been destroyed, but
the orchestrator won’t reinstall it because it thinks the device is
still active. The orchestrator device information is stale, but
we’re unable to refresh it. The usual recommended workaround of
failing over the mgr hasn’t helped. We’ve also tried `ceph orch
device ls --refresh` to no avail. In fact, after running that command,
subsequent runs of `ceph orch device ls` produce no output until the
mgr is failed over again.

Is there a way to force the orchestrator to refresh its list of
devices when in this state? If not, can anyone offer any suggestions
on how to fix this problem?

Cheers,
/rjg

P.S. Some additional information in case it’s helpful...

We’re using the following command to replace existing devices so
that they’re managed by the orchestrator:

```
ceph orch osd rm <osd_id> --replace --zap
```

and we’re currently stuck on osd 88.

```
ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
   stray daemon osd.88 on host ceph-osd31 not managed by cephadm
```

`ceph osd tree` shows that the osd has been destroyed and is ready
to be replaced:

```
ceph osd tree-from ceph-osd31
ID   CLASS  WEIGHTTYPE NAMESTATUS REWEIGHT  PRI-AFF
-46 34.93088  host ceph-osd31
84ssd   3.49309  osd.84  up   1.0  1.0
85ssd   3.49309  osd.85  up   1.0  1.0
86ssd   3.49309  osd.86  up   1.0  1.0
87ssd   3.49309  osd.87  up   1.0  1.0
88ssd   3.49309  osd.88   destroyed 0  1.0
89ssd   3.49309  osd.89  up   1.0  1.0
90ssd   3.49309  osd.90  up   1.0  1.0
91ssd   3.49309  osd.91  up   1.0  1.0
92ssd   3.49309  osd.92  up   1.0  1.0
93ssd   3.49309  osd.93  up   1.0  1.0
```

The cephadm log shows a claim on node `ceph-osd31` for that osd:

```
2024-09-25T14:15:45.699348-0400 mgr.ceph-mon3.qzjgws [INF] Found osd
claims -> {'ceph-osd31': ['88']}
2024-09-25T14:15:45.699534-040

[ceph-users] Re: CephFS snaptrim bug?

2024-09-26 Thread Linkriver Technology
Hello,

We recently upgraded to Quincy (17.2.7) and I can see in the ceph logs
many messages of the form:

1713256584.3135679 osd.28 (osd.28) 66398 : cluster 4 osd.28 found snap
mapper error on pg 7.284 oid 7:214b503b:::100125de9b8.:5c snaps
in mapper: {}, oi: {5a} ...repaired
1713256584.3136106 osd.28 (osd.28) 66399 : cluster 4 osd.28 found snap
mapper error on pg 7.284 oid 7:214b4f95:::1001654390d.:5c snaps
in mapper: {}, oi: {5a} ...repaired
1713256584.3136535 osd.28 (osd.28) 66400 : cluster 4 osd.28 found snap
mapper error on pg 7.284 oid 7:214b4f3f:::1001549ed54.:5c snaps
in mapper: {}, oi: {5a} ...repaired
1713256584.9496887 osd.29 (osd.29) 70001 : cluster 4 osd.29 found snap
mapper error on pg 7.b4 oid 7:2d089bdc:::10016105140.:5c snaps
in mapper: {}, oi: {5a} ...repaired
1713256590.9785151 osd.28 (osd.28) 66401 : cluster 4 osd.28 found snap
mapper error on pg 7.284 oid 7:214b5179:::100128b85a0.0cfe:5c snaps
in mapper: {}, oi: {5a} ...repaired
1713256598.6286905 osd.29 (osd.29) 70002 : cluster 4 osd.29 found snap
mapper error on pg 7.17c oid 7:3e877f95:::100151d8670.:5c snaps
in mapper: {}, oi: {5a} ...repaired
...

A cursory reading of the code involved suggests that the scrubber in
Quincy has gained the ability to detect and clean up the snapshots that
were lost under Octopus, if I understand it correctly.
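
In case it helps others, nudging this along and watching for it amounts to
roughly the following (deep scrubs are I/O heavy, so pace them accordingly):

```
ceph osd deep-scrub all            # or deep-scrub the affected pool's PGs selectively
ceph -w | grep -i 'snap mapper'    # watch for "found snap mapper error ... repaired"
```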

Cheers,

Linkriver Technology

On Sat, 2022-06-25 at 19:36 +, Kári Bertilsson wrote:
> Hello
> 
> I am also having this issue after having
> set osd_pg_max_concurrent_snap_trims = 0 previously to pause the
> snaptrim. I upgraded to ceph 17.2.0. Have tried restarting,
> repeering, deep-scrubbing all OSD's, so far nothing works.
> 
> For one of the affected pools `cephfs_10k` I have tested removing
> ALL data and it's still showing 26% usage. All snapshots have been
> deleted and all pg's for the pool remain at SNAPTRIMQ_LEN = 0. All
> pg's are active+clean.
> 
> The pool still shows 589k object usage. When testing `rados get
> object` on all the objects, it only works for 2.420 of them. The rest
> seem to be in some kind of limbo and can not be read or deleted using
> rados.
> 
> # rados -p cephfs_10k listsnaps 10010539c22.
>  
> 
> 10010539c22.:
> cloneid snaps   size    overlap
> 288 288 30767656    []
> 
> # rados -p cephfs_10k get 10010539c22. 10010539c22.
> error getting cephfs_10k/10010539c22.: (2) No such file or
> directory
> 
> # rados -p cephfs_10k rm 10010539c22.  
> error removing cephfs_10k/10010539c22.: (2) No such file or
> directory
> 
> Is there some way to make the snap trimmer rediscover these objects
> and remove them ?
> 
> On Fri, Mar 18, 2022 at 2:21 PM Linkriver Technology
>  wrote:
> > Hello,
> > 
> > If I understand my issue correctly, it is in fact unrelated to
> > CephFS itself,
> > rather the problem happens at a lower level (in Ceph itself). IOW,
> > it affects
> > all kind of snapshots, not just CephFS ones. I believe my FS is
> > healthy
> > otherwise. In any case, here is the output of the command you
> > asked:
> > 
> > I ran it a few hours ago:
> > 
> >         "num_strays": 235,
> >         "num_strays_delayed": 38,
> >         "num_strays_enqueuing": 0,
> >         "strays_created": 5414436,
> >         "strays_enqueued": 5405983,
> >         "strays_reintegrated": 17892,
> >         "strays_migrated": 0,
> > 
> > And just now:
> > 
> >         "num_strays": 186,
> >         "num_strays_delayed": 0,
> >         "num_strays_enqueuing": 0,
> >         "strays_created": 5540016,
> >         "strays_enqueued": 5531494,
> >         "strays_reintegrated": 18128,
> >         "strays_migrated": 0,
> > 
> > 
> > Regards,
> > 
> > LRT
> > 
> > -Original Message-
> > From: Arnaud M 
> > To: Linkriver Technology 
> > Cc: Dan van der Ster , Ceph Users
> > 
> > Subject: [ceph-users] Re: CephFS snaptrim bug?
> > Date: Thu, 17 Mar 2022 21:48:18 +0100
> > 
> > Hello Linkriver
> > 
> > I might have an issue close to your
> > 
> > Can you tell us if your strays dirs are full ?
> > 
> > What does this command output to you ?
> > 
> > ceph tell mds.0 perf dump | grep strays
> > 
> > Does the value change over time ?
> > 
> > All the best
> > 
> > Arnaud
> > 
> > Le mer. 16 mars 2022 à 15:35, Linkriver Technology <
> > technol...@linkriver-capital.com> a écrit :
> > 
> > > Hi,
> > > 
> > > Has anyone figured whether those "lost" snaps are rediscoverable
> > /
> > > trimmable?
> > > All pgs in the cluster have been deep scrubbed since my previous
> > email and
> > > I'm
> > > not seeing any of that wasted space being recovered.
> > > 
> > > Regards,
> > > 
> > > LRT
> > > 
> > > -Original Message-
> > > From: Dan van der Ster 
> > > To: technol...@linkriver-capital.com
> > > Cc: Ceph Users , Neha Ojha 
> > > Subject: Re: [ceph-users] CephFS snaptrim bug?
> > > Date: 

[ceph-users] Re: Quincy: osd_pool_default_crush_rule being ignored?

2024-09-26 Thread Florian Haas

On 25/09/2024 09:05, Eugen Block wrote:

Hi,

for me this worked in a 17.2.7 cluster just fine


Huh, interesting!


(except for erasure-coded pools).


Okay, *that* bit is expected. 
https://docs.ceph.com/en/quincy/rados/configuration/pool-pg-config-ref/#confval-osd_pool_default_crush_rule 
does say that the option sets the "default CRUSH rule to use when 
creating a replicated pool".



quincy-1:~ # ceph osd crush rule create-replicated new-rule default osd hdd


Mine was a rule created with "create-simple"; would that make a difference?
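
Comparing the two rules directly might show whether they differ in anything
relevant (rule names taken from this thread):

```
ceph osd crush rule dump replicated_rule
ceph osd crush rule dump rack-aware
```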

Cheers,
Florian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm bootstrap ignoring --skip-firewalld

2024-09-26 Thread Kozakis, Anestis
As I mentioned in my earlier e-mail, I'm new to Ceph and trying to set up 
automation to deploy, configure, and manage a Ceph cluster.

We configure our Firewall rules through SaltStack.

I am passing the --skip-firewalld option to the cephadm bootstrap command, but 
cephadm seems to ignore the option and configures the firewall anyway.

I have even reordered the options to match the order in cephadm bootstrap 
--help, but it still ignores the option and configures the firewall. This 
creates issues because it modifies the public zone, which we don't want changed.

Below is the command we are using (with obvious settings changed/removed).

cephadm bootstrap --mon-ip 10.0.0.0 --mgr-id host.domain.name --fsid [fsid] 
--ssh-private-key id_rsa --ssh-public-key id_rsa.pub --ssh-user [user]  
--cluster-network 192.168.0.0/25 --allow-fqdn-hostname --config=./ceph.conf 
--initial-dashboard-user admin --initial-dashboard-password SuperS3cr3tPassw0rd 
--dashboard-password-noupdate --skip-firewalld --with-centralized-logging 
--apply-spec=spec.yaml

What am I missing?
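
One way to narrow this down might be to check the cephadm log on the host and
see whether the firewalld changes happen during bootstrap itself or only later,
when the daemons from --apply-spec are deployed (assuming the default log
location):

```
grep -i firewall /var/log/ceph/cephadm.log
```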

Anestis Kozakis
Systems Administrator  - Multi-Level Security Solutions

P: + 61 2 6122 0205
M: +61 4 88 376 339
anestis.koza...@raytheon.com.au

Raytheon Australia
Cybersecurity and Information Assurance
4 Brindabella Cct
Brindabella Business Park
Canberra Airport, ACT 2609

www.raytheonaustralia.com.au

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io