[ceph-users] Re: Squid Manager Daemon: balancer crashing orchestrator and dashboard

2024-10-28 Thread Laura Flores
Hey all,

I believe I have found the issue. You can follow updates on
https://tracker.ceph.com/issues/68657.

Thanks,
Laura

On Fri, Oct 25, 2024 at 9:29 AM Kristaps Čudars 
wrote:

> Experiencing the same problem.
> Disabling balancer helps.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Steering Committee Meeting 2024-10-28

2024-10-28 Thread Laura Flores
Hi all,

Topics we discussed today included an explanation of the CVE fix process, an
evaluation of the OS recommendation documentation, a discussion of options for
releasing el8 packages for the upcoming Quincy point release, and the CEC
election results.

Here are the highlights from today's meeting:

   - Existing CVEs and fix/release process [Ernesto + Sage + Yaarit]:
      - Sage: better to build this into the process instead of relying on hotfixes (with exceptions for serious situations)
      - Revising the CVE reporting process to use GitHub
      - https://docs.ceph.com/en/latest/security/process/
   - [zdover] os-recommendations.rst is possibly out of date -- who is the authority on which distros are supported?
      - Add Squid: CentOS 9 (AH), Ubuntu 22 (AH), Debian 12 (C)
      - Is Debian 12 supported in Reef?
         - I think so, based on https://download.ceph.com/debian-reef/dists/bookworm/ -- [zdover again]: that was exactly the source on which I based the question, so I think I have all the information I need.
   - [cbodley] Should the Quincy release block on CentOS 8 or not?
      - Blocked on adding a CentOS 8 builder back to our matrix
      - Community members have managed to build packages -- could we point to these?
      - The CERN archives are another option
      - If we drop in packages/containers built outside of the usual upstream build process, how confident are we that these artifacts will be functionally identical to what we would have built ourselves? Cf. the issues seen in the past with Debian/Ubuntu-channel builds. A build done by a community member might have different compilation options, directory paths, non-stock libraries, etc.
      - Will meet later on this with Dan Mick
   - Election:
      - Results: https://vote.heliosvoting.org/helios/elections/e03494ce-e04c-41d0-bb05-ec5ccc632ce4/view
      - Governance update: https://github.com/ceph/ceph/pull/60518
      - https://pad.ceph.com/p/css-vote-2024q4


Thanks,
Laura
-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-10-28 Thread Bob Gibson
I enabled debug logging with `ceph config set mgr 
mgr/cephadm/log_to_cluster_level debug` and viewed the logs with `ceph -W 
cephadm --watch-debug`. I can see the orchestrator refreshing the device list, 
and this is reflected in the `ceph-volume.log` file on the target osd nodes. 
When I restart the mgr, `ceph orch device ls` reports each device with “5w ago” 
under the “REFRESHED” column. After the orchestrator attempts to refresh the 
device list, `ceph orch device ls` stops outputting any data at all until I 
restart the mgr again.

I discovered that I can query the cached device data using `ceph config-key 
dump`. On the problematic cluster, the `created` attribute is stale, e.g.

ceph config-key dump | jq -r .'"mgr/cephadm/host.ceph-osd31.devices.0"' | jq .devices[].created
"2024-09-23T17:56:44.914535Z"
"2024-09-23T17:56:44.914569Z"
"2024-09-23T17:56:44.914591Z"
"2024-09-23T17:56:44.914612Z"
"2024-09-23T17:56:44.914632Z"
"2024-09-23T17:56:44.914652Z"
"2024-09-23T17:56:44.914672Z"
"2024-09-23T17:56:44.914692Z"
"2024-09-23T17:56:44.914711Z"
"2024-09-23T17:56:44.914732Z"

whereas on working clusters the `created` attribute is set to the time the 
device information was last cached, e.g.

ceph config-key dump | jq -r .'"mgr/cephadm/host.ceph-osd1.devices.0"' | jq .devices[].created
"2024-10-28T21:49:29.510593Z"
"2024-10-28T21:49:29.510635Z"
"2024-10-28T21:49:29.510657Z"
"2024-10-28T21:49:29.510678Z"

It appears that the orchestrator is polling the devices but failing to update 
the cache for some reason. It would be interesting to see what happens if I 
removed one of these device entries from the cache, but the cluster is in 
production so I’m hesitant to poke at it.
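
For the record, this is roughly what I would try on a non-production cluster (the key name is just the example from above, and I haven't verified that cephadm repopulates the entry cleanly after removal):

# list the per-host device cache keys
ceph config-key ls | grep devices
# inspect the suspect entry
ceph config-key get mgr/cephadm/host.ceph-osd31.devices.0 | jq '.devices[].created'
# on a test cluster only: drop the stale entry and restart the active mgr so it gets rebuilt
ceph config-key rm mgr/cephadm/host.ceph-osd31.devices.0
ceph mgr fail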

We have a maintenance window scheduled in December which will provide an 
opportunity to perform a complete restart of the cluster. Hopefully that will 
clean things up. In the meantime, I’ve set all devices to be unmanaged, and the 
cluster is otherwise healthy, so unless anyone has any other ideas to offer I 
guess I’ll just leave things as-is until the maintenance window.

Cheers,
/rjg

On Oct 25, 2024, at 10:31 AM, Bob Gibson  wrote:

[…]
My hunch is that some persistent state is corrupted, or there’s something else 
preventing the orchestrator from successfully refreshing its device status, but 
I don’t know how to troubleshoot this. Any ideas?

I don't think this is related to the 'osd' service. As suggested by Tobi, 
enabling cephadm debug will tell you more.

Agreed. I’ll dig through the logs some more today to see if I can spot any 
problems.

Cheers,
/rjg

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Strange container restarts?

2024-10-28 Thread Eugen Block

Hi,

I haven't looked too deeply into it yet, but I think it's the regular
cephadm check. The timestamps should match those in
/var/log/ceph/cephadm.log, where you will see something like this:


cephadm ['--image', '{YOUR_REGISTRY}', 'ls']

It goes through your inventory and runs several 'gather-facts'
commands and a couple more. I don't think you need to worry about this.
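
If you want to line the timestamps up yourself, something like this should do it (paths are the defaults on a cephadm-managed host; the exact log lines vary a bit between versions):

# timestamps of the periodic cephadm inventory runs on this host
grep -e "'ls'" -e gather-facts /var/log/ceph/cephadm.log | tail -n 20
# compare with the exec_died entries from your report
grep exec_died /var/log/syslog | tail -n 20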


Regards,
Eugen

Zitat von Jan Marek :


Hello,

we have a Ceph cluster that consists of 12 hosts; every host has
12 NVMe "disks".

On most of these hosts (9 of 12) we see errors in the logs; see the
attached file.

We have tried to investigate this problem, and we have these observations:

1) On every host only one OSD is affected. So it's probably not a
general problem with version 18.2.2, because otherwise it would show up
on other OSDs as well, not just one per host?

2) Sometimes one of these OSDs crashes :-( It seems that the crashed
OSDs are from the set of OSDs that have this problem.

3) The Ceph cluster runs fine and "doesn't know" about any problem
with these OSDs. It seems that a new instance of the ceph-osd
daemon tries to start either podman or conmon itself. We've checked
the PID files for conmon, but they seem to be OK.

4) We checked the 'ceph orch' command, but it does not try to
start these containers, because it knows that they exist and are running
('ceph orch ps' lists these containers as running).

5) I've tried to pause the orchestrator, but I still find these
entries in syslog... :-(

Please, is there any way to find out where the problem is and stop
this?

All of the Ceph hosts are provisioned by Ansible, so the environment
is the same on every host.

On every machine we have podman version 4.3.1+ds1-8+deb12u1 and
conmon version 2.1.6+ds1-1. The OS is Debian Bookworm.

The attached logs were prepared with:

grep exec_died /var/log/syslog

Sincerely
Jan Marek
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-10-28 Thread Eugen Block
You're right about deleting the service, of course. I wasn't very
clear in my statement; what I actually meant was that it won't be
removed entirely until all OSDs report a different spec in their
unit.meta file. I forgot to add that info in my last response; that's
actually how I've done it several times after adopting a cluster.
Thanks for clearing that up! :-)


Zitat von Frédéric Nass :

- Le 25 Oct 24, à 18:21, Frédéric Nass  
frederic.n...@univ-lorraine.fr a écrit :



- Le 25 Oct 24, à 16:31, Bob Gibson r...@oicr.on.ca a écrit :


Hi Frédéric,

I think this message shows up because this very specific post-adoption
'osd' service has already been marked as 'deleted', maybe when you ran
the command for the first time.
The only reason it still shows up in 'ceph orch ls' is that 95
OSDs are still referencing this service in their configuration.

Once you've edited all OSDs' /var/lib/ceph/$(ceph
fsid)/osd.xxx/unit.meta files (changed their service_name) and
restarted all OSDs (or recreated these 95 OSDs, encrypted, under
another service_name), the 'osd' service will disappear by itself and
won't show up in 'ceph orch ls' output anymore. At least this is
what I've observed in the past.


Yes, as Eugen pointed out, it doesn’t make sense to try to delete  
an unmanaged

service using the orchestrator.


Well actually you **can** delete a service whatever its status (managed or
unmanaged).


To explain a bit more, see below:

$ ceph orch ls --export osd osd.delete
service_type: osd
service_id: delete
service_name: osd.delete
placement:
  hosts:
  - test-mom02h01
unmanaged: true
spec:
  data_devices:
size: :11G
  db_devices:
size: '12G:'
  db_slots: 2
  filter_logic: AND
  objectstore: bluestore

This is what you should expect:

$ ceph orch rm osd.delete
Error EINVAL: If osd.delete is removed then the following OSDs will  
remain, --force to proceed anyway

host test-mom02h01: osd.11

$ ceph orch rm osd.delete --force   <--- ok, let's force it
Removed service osd.delete

$ ceph orch ls | grep osd
osd.delete     1  95s ago   -                <--- still here because used by 1 OSD
osd.standard  12  9m ago    8w   label:osds

$ ceph orch rm osd.delete
Invalid service 'osd.delete'. Use 'ceph orch ls' to list available
services.   <--- but not for the orchestrator


$ sed -i 's/osd.delete/osd.standard/g' /var/lib/ceph/$(ceph fsid)/osd.11/unit.meta   <--- remove this service from osd.11 configuration


$ ceph orch daemon restart osd.11
Scheduled to restart osd.11 on host 'test-mom02h01'

$ ceph orch ls | grep osd
osd.standard  13  8m ago    8w   label:osds   <--- osd.delete service finally gone


The osd.delete service is finally gone right after the last OSD  
stopped referencing it.


With the very specific post-adoption 'osd' service, when you try to
delete it, it doesn't complain about existing OSDs referencing it
(when it should...) and doesn't require you to use the --force
argument. It just deletes the service (which will finally be removed
when no more OSDs are using it).


The fact that the 'ceph orch rm' output is not consistent between
deleting the post-adoption 'osd' service and deleting any other osd
service that you create looks like a bug to me.


But anyway, that was just to say that you can delete an osd service
whatever its status (managed or unmanaged).


Cheers,
Frédéric.

It works just fine with any osd service other than this specific  
post adoption

'osd' service. Don't know why.

Frédéric.



My hunch is that some persistent state is corrupted, or there’s  
something else
preventing the orchestrator from successfully refreshing its  
device status, but

I don’t know how to troubleshoot this. Any ideas?


I don't think this is related to the 'osd' service. As suggested by Tobi,
enabling cephadm debug will tell you more.


Agreed. I’ll dig through the logs some more today to see if I can spot any
problems.

Cheers,
/rjg

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph native clients

2024-10-28 Thread Burkhard Linke

Hi,

On 10/26/24 18:45, Tim Holloway wrote:

On the whole, I prefer to use NFS for my clients to use Ceph
filesystem. It has the advantage that NFS client/mount is practically
guaranteed to be pre-installed on all my client systems.

On the other hand, there are downsides. NFS (Ceph/NFS-Ganesha) has been
known to be cranky on my network when it comes to serving resources on
occasion, and should the NFS host go down, the NFS client cannot
fallback to the alternate NFS host without assistance. Which I should
probably add, since I run keepalive anyway, but that's another matter.

My issue, then is that if I do a native/FUSE mount of Ceph FS on my
desktop it hurts Ceph. The desktop hibernates when not in use and that
causes the network communications between it and the Ceph servers to
break until it wakes up again. In the mean time, I get "laggy OSD"
errors.

In extreme cases, where the desktop doesn't wake up clean and requires
a hardware reset, I think it even ends up with orphan connections,
since at one point I recall having to forcible terminate close to 20
connections.

Ideally, then, since systemd is handling hibernation, there should be a
hook in the Ceph client to ensure it and the Ceph servers are at peace
with each other while it's down. Probably better reporting on the Ceph
admin side as to the hostnames of clients with connection issues, but
that's just icing.



Do not use a network file system with hibernating machines. It will not 
work reliably.


Both NFS and especially CephFS can delegate responsibility to the
clients. In the case of NFSv4 these are delegations; in the case of CephFS
they are CephFS capabilities. So the filesystem needs to have a back channel
to the clients in case capabilities / delegations have to be revoked.


This won't work if the client is not reachable. Access to the
files/directories protected by a capability might not be possible
depending on the type of capability granted. E.g. in the case of a write
capability, read access might be stalled; in the case of a read capability,
any other client requesting write access will be blocked.


To make things worse, the CephFS capabilities also cover cached files,
so even files not actively used by a client might be blocked.


A hook might reduce the impact of the problem, but it won't resolve it.
At least the mountpoint itself will be held as an active capability on the
client. This is the cap list of a client without any active file access
and after dropping the caches:



# cat caps
total        8193
avail        8192
used        1
reserved    0
min        8192

ino  mds  issued   implemented
--
0x1    0  pAsLsXsFs    pAsLsXsFs


Waiters:

tgid ino    need want
-

(see /sys/kernel/debug/ceph/./caps)


TL;DR: do not use cephfs with hibernating clients.


We had the same problem with our desktops a while ago and decided to
switch to an NFS re-export of the CephFS filesystem. This has proven to
be much more reliable in the case of hibernation. But as you already
mentioned, NFS has its own problems...



Regards,

Burkhard Linke

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Install on Ubuntu Noble on Arm64?

2024-10-28 Thread Robert Sander

Hi

On 10/25/24 19:57, Daniel Brown wrote:


Think I’ve asked this before but — has anyone attempted to use a cephadm type 
install with Debian Nobel running on Arm64? Have tried both Reef and Squid, 
neither gets very far. Do I need to file a request for it?


You mean Ubuntu Noble, right?

For Ceph Squid there are packages available for Ubuntu Jammy and Debian 
Bookworm: https://download.ceph.com/debian-squid/dists/


For Ceph Reef there are packages available for Ubuntu Focal, Ubuntu 
Jammy and Debian Bookworm: https://download.ceph.com/debian-reef/dists/


I do not think that the Ceph project builds packages for non-LTS 
distribution versions.
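
For reference, a minimal sketch of the repository entry for Squid on Debian Bookworm (adjust the codename for Jammy; the release key has to be imported from https://download.ceph.com/keys/release.asc):

# /etc/apt/sources.list.d/ceph.list
deb https://download.ceph.com/debian-squid/ bookworm main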


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Robert Sander

On 10/18/24 03:14, Shain Miley wrote:


I know that it really shouldn’t matter what osd number gets assigned to the 
disk but as the number of osd increases it is much easier to keep track of 
where things are if you can control the id when replacing failed disks or 
adding new nodes.


My advice: Do not try to manually number your OSDs.

Use commands like "ceph osd tree-from HOST" to list the OSDs of a host 
or "ceph osd ok-to-stop ID" to see if an OSD may be stopped or "ceph osd 
metadata ID" to see where the OSD is running.
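
For example (the host name and OSD id below are just placeholders):

ceph osd tree-from ceph-node01     # list the OSDs on one host
ceph osd ok-to-stop 12             # check whether osd.12 can be stopped safely
ceph osd metadata 12               # shows hostname, device names/paths, etc.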


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Robert Sander

On 10/28/24 17:41, Dave Hall wrote:


However, it would be nice to have something like 'ceph osd location {id}'
from the command line.  If such exists, I haven't seen it.


"ceph osd metadata {id} | jq -r .hostname" will give you the hostname

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-28 Thread Frank Schilder
Hi everybody,

apparently, I forgot to report back. The evacuation completed without problems
and we are replacing disks at the moment. This procedure worked like a charm
(please read the thread to see why we didn't just shut down the OSDs and use
recovery for the rebuild):

1.) For all OSDs: ceph osd out ID # just set them out, this is sticky and does 
what you want
2.) Wait for rebalance to finish
3.) Replace disks.
4.) Deploy OSDs with the same IDs as before per host.
5.) Start OSDs and let rebalance back.

During the evacuation you might want to consider setting "osd_delete_sleep" to
a high value to avoid the issues due to PG removal reported in this thread;
see the messages by Joshua Baergen.
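
In command form, a rough sketch of what we did (the OSD ids and the sleep value are just examples, adjust for your cluster):

# throttle PG removal during the evacuation
ceph config set osd osd_delete_sleep 10
# mark the OSDs to be replaced "out" (sticky)
for id in 10 11 12 13 14 15; do ceph osd out $id; done
# watch the data drain off them
ceph osd df tree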

The only wish I have is that, after setting the OSDs "out", it would be great to
have an option to have recovery kick in as well to speed up the movement of
data. Instead of just reading shard by shard from the out OSDs, shards should
also be reconstructed by recovery from all other OSDs. Our evacuation lasted
about 2 weeks. If recovery kicked in, this time would go down to 2-3
days.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Szabo, Istvan (Agoda) 
Sent: Monday, October 28, 2024 4:41 AM
To: Frank Schilder
Subject: Re: [ceph-users] Re: Procedure for temporary evacuation and replacement

Hi Frank,

Finally what was the best way to do this evacuation replacement?
I want to destroy all my osds node by node in my cluster due to high 
fragmentation so might follow your method.

Thank you
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW lifecycle wrongly removes NOT expired delete-markers which have a objects

2024-10-28 Thread Александр Руденко
Hi,

We are seeing very strange behavior where the LC "restores" previous versions of
objects.

It looks like:

1. We have latest object:

*object1*

2. We remove the object and get a delete marker on top of it. We can no longer
see object1 in the bucket listing:

*marker (latest)*
*object1 (not latests, version1)*

3. LC decides to remove marker as Expired:

*RGW log:
DELETED::REDACTED-bucket-REDACTED[default.1471649938.1]):REDACTED-object-REDACTED[N3gFGHIZLylFcO9RdnlChEfIAL076VC]
(delete marker expiration) wp_thrd: 0, 0*

4. and we can see object1 in bucket listing:

*object1 (latest, version1)*

But this is wrong behavior, because an ExpiredObjectDeleteMarker is a marker
that doesn't have ANY version under it.
Unfortunately we can't reproduce this behavior in our test environment
(further proof that it's not normal behavior), but it occurs regularly in our
production in one customer's bucket. It breaks the customer's application,
because to the application it looks like an "unknown file".

We are 100% sure that it's the LC, because we have logs showing when the object
was uploaded and removed (no version specified), and we have the LC log entry
that removes the DM while object1 still exists.
In this case we have unique object names (sha256-like), which means we can't
have name duplication in this bucket; it can't be a new object with the
same name.
And the mtime of the "restored" object1 is in the past, for example 20
days in the past. All our findings match up.

LC policy:
{
"Rules": [
{
"Status": "Enabled",
"Prefix": "",
"NoncurrentVersionExpiration": {
"NoncurrentDays": 30
},
"Expiration": {
"ExpiredObjectDeleteMarker": true
},
"ID": "Remove all NOT latest version"
}
]
}
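
For anyone trying to reproduce this, the applied configuration can be cross-checked like this (the bucket name and endpoint are placeholders):

# lifecycle state as seen by RGW
radosgw-admin lc list
# lifecycle configuration as seen by an S3 client
aws s3api get-bucket-lifecycle-configuration --bucket my-bucket --endpoint-url https://rgw.example.com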

Ceph 16.2.13
We have about 20 RGW instances and only two of them have the LC thread enabled.
The bucket has about 1.5M objects and 701 shards. I know that's too many
shards for this bucket, but that's another story.)

Where could the problem be?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephfs snapshot

2024-10-28 Thread s . dhivagar . cse
Hi..

If we enable CephFS snapshots, will we face performance issues? And do snapshots
take up any specific storage?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Install on Ubuntu Noble on Arm64?

2024-10-28 Thread Daniel Brown
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Anthony D'Atri


> Yes, but it's irritating. Ideally, I'd like my OSD IDs and hostnames to track
> so that if a server goes pong I can find it and fix it ASAP

`ceph osd tree down` etc. (including alertmanager rules and Grafana panels) 
arguably make that faster and easier than everyone having to memorize OSD 
numbers, especially as the clusters grow.

> But it doesn't take much maintenance to break that scheme and the only thing 
> more painful than renaming a Ceph host is re-numbering an OSD.

Yep!

>> My advice: Do not try to manually number your OSDs.

This.  I’ve been there myself, but it’s truly a sisyphean goal.  

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Destroyed OSD clinging to wrong disk

2024-10-28 Thread Dave Hall
Hello.

The following is on a Reef Podman installation:

In attempting to deal over the weekend with a failed OSD disk, I have
somehow managed to have two OSDs pointing to the same HDD, as shown below.

[image: image.png]

To be sure, the failure occurred on OSD.12, which was pointing to
/dev/sdi.

I disabled the systemd unit for OSD.12 because it kept restarting.  I then
destroyed it.

When I physically removed the failed disk and rebooted the system, the disk
enumeration changed.  So, before the reboot, OSD.12 was using /dev/sdi.
After the reboot, OSD.9 moved to /dev/sdi.

I didn't know that I had an issue until 'ceph-volume lvm prepare' failed.
It was in the process of investigating this that I found the above.  Right
now I have reinserted the failed disk and rebooted, hoping that OSD.12
would find its old disk by some other means, but no joy.
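
For completeness, a couple of generic ways I know of to cross-check the OSD-to-device mapping (a destroyed OSD won't report metadata, so this mainly confirms what OSD.9 thinks it owns):

# on the OSD host: which LV/PV each ceph-volume-created OSD claims
cephadm shell -- ceph-volume lvm list
# from the cluster side: what a running OSD reports about its device
ceph osd metadata 9 | jq '.hostname, .bluestore_bdev_dev_node, .devices'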

My concern is that if I run 'ceph osd rm' I could take out OSD.9.  I could
take the precaution of marking OSD.9 out and let it drain, but I'd rather
not.  I am, perhaps, more inclined to manually clear the lingering
configuration associated with OSD.12 if someone could send me the list of
commands. Otherwise, I'm open to suggestions.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Anthony D'Atri
Well sure, if you want to do it the EASY way :rolleyes:

> On Oct 28, 2024, at 1:02 PM, Eugen Block  wrote:
> 
> Or:
> 
> ceph osd find {ID}
> 
> :-)
> 
> Zitat von Robert Sander :
> 
>> On 10/28/24 17:41, Dave Hall wrote:
>> 
>>> However, it would be nice to have something like 'ceph osd location {id}'
>>> from the command line.  If such exists, I haven't seen it.
>> 
>> "ceph osd metadata {id} | jq -r .hostname" will give you the hostname
>> 
>> Regards
>> -- 
>> Robert Sander
>> Heinlein Consulting GmbH
>> Schwedter Str. 8/9b, 10119 Berlin
>> 
>> https://www.heinlein-support.de
>> 
>> Tel: 030 / 405051-43
>> Fax: 030 / 405051-19
>> 
>> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
>> Geschäftsführer: Peer Heinlein - Sitz: Berlin
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Dave Hall
On Mon, Oct 28, 2024 at 9:22 AM Anthony D'Atri 
wrote:

>
>
> > Yes, but it's irritating. Ideally, I'd like my OSD IDs and hostnames to
> track so that if a server going pong I can find it and fix it ASAP
>
> `ceph osd tree down` etc. (including alertmanager rules and Grafana
> panels) arguably make that faster and easier than everyone having to
> memorize OSD numbers, especially as the clusters grow.
>
> > But it doesn't take much maintenance to break that scheme and the only
> thing more painful than renaming a Ceph host is re-numbering an OSD.
>
> Yep!
>
> >> My advice: Do not try to manually number your OSDs.
>
> This.  I’ve been there myself, but it’s truly a sisyphean goal.
>

As compared to Nautilus, at least the Reef Dashboard has a 'Physical
Devices' page and a Devices tab on the Hosts page that should make it
easier to know which OSD is on which host.  That plus 'ipmitool chassis
identify' and other such tools should make it easy to track down the
correct box and drive bay.

However, it would be nice to have something like 'ceph osd location {id}'
from the command line.  If such exists, I haven't seen it.

> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Eugen Block

Or:

ceph osd find {ID}

:-)

Zitat von Robert Sander :


On 10/28/24 17:41, Dave Hall wrote:


However, it would be nice to have something like 'ceph osd location {id}'
from the command line.  If such exists, I haven't seen it.


"ceph osd metadata {id} | jq -r .hostname" will give you the hostname

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Tim Holloway
Yes, but it's irritating. Ideally, I'd like my OSD IDs and hostnames to
track so that if a server goes pong I can find it and fix it ASAP. But
it doesn't take much maintenance to break that scheme, and the only thing
more painful than renaming a Ceph host is re-numbering an OSD.


On 10/28/24 06:29, Robert Sander wrote:

On 10/18/24 03:14, Shain Miley wrote:

I know that it really shouldn’t matter what osd number gets assigned 
to the disk but as the number of osd increases it is much easier to 
keep track of where things are if you can control the id when 
replacing failed disks or adding new nodes.


My advice: Do not try to manually number your OSDs.

Use commands like "ceph osd tree-from HOST" to list the OSDs of a host 
or "ceph osd ok-to-stop ID" to see if an OSD may be stopped or "ceph 
osd metadata ID" to see where the OSD is running.


Regards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Install on Ubuntu Noble on Arm64?

2024-10-28 Thread Alex Closs
Ubuntu noble *is* an LTS release, 24.04

> On Oct 28, 2024, at 06:40, Robert Sander  wrote:
> 
> Hi
> 
>> On 10/25/24 19:57, Daniel Brown wrote:
>> Think I’ve asked this before but — has anyone attempted to use a cephadm 
>> type install with Debian Nobel running on Arm64? Have tried both Reef and 
>> Squid, neither gets very far. Do I need to file a request for it?
> 
> You mean Ubuntu Noble, right?
> 
> For Ceph Squid there are packages available for Ubuntu Jammy and Debian 
> Bookworm: https://download.ceph.com/debian-squid/dists/
> 
> For Ceph Reef there are packages available for Ubuntu Focal, Ubuntu Jammy and 
> Debian Bookworm: https://download.ceph.com/debian-reef/dists/
> 
> I do not think that the Ceph project builds packages for non-LTS distribution 
> versions.
> 
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> https://www.heinlein-support.de
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How Ceph cleans stale object on primary OSD failure?

2024-10-28 Thread Frédéric Nass


- Le 23 Oct 24, à 16:31, Maged Mokhtar mmokh...@petasan.org a écrit :

> On 23/10/2024 16:40, Gregory Farnum wrote:
>> On Wed, Oct 23, 2024 at 5:44 AM Maged Mokhtar 
>> wrote:
>>
>>
>> This is tricky but I think you are correct: B and C will keep the new
>> object copy and not revert to the old object version (or not revert to
>> object removal if it had not existed).
>> To be 100% sure you would have to dig into the code to verify this.
>>
>> My understanding..what is guaranteed is:
>> 1) On success ack to client: obviously the last version of data
>> written
>> is what will be read back.
>> 2) On either success or failure: data is consistent, if you repeat
>> reads
>> you will get the same data even if the acting set changes.
>> 3) rados objects are transactional at the OSD level, they are either
>> updated to new version or reverted to old version on failure at OSD
>> level. Objects cannot be corrupt half way due to write failure.
>>
>> I think 2) is good enough. Ideally in case of client ack failure
>> you may
>> want the old object version to be reverted back or the object
>> removed in
>> case it had not existed , this is the condition you are asking for
>> B and
>> C to revert.This would be a more complex requirement than 2) and
>> may not
>> be supported or practical to support.
>> I think if  clients want to guarantee data is reverted on failure,
>> clients need to add transactional code themselves, like database
>> journaling...etc. This is at least with block and filesystem clients.
>> These clients would typically have such transactional code anyway to
>> handle physical hardware failures..etc.
>>
>> Even if rados did provide this extra transactional level and had B
>> and C
>> do revert back to old object version, this could still corrupt client
>> data if client write spanned to 2 rados objects, one was success
>> and one
>> failed. Even though both rados objects are consistent, client data is
>> corrupt as some of it is new and some old and so transactional
>> code at
>> the client will always be needed to revert data on failures.
>>
>>
>> All of this is transparent to RADOS users. Written objects will
>> generally remain written, and librados will retry operations (and get
>> the same response on retry as they should have gotten on first submit)
>> without returning errors to the caller.
>> The only way you can get the torn write you describe here is if the
>> client had simultaneous outstanding writes and itself fails while they
>> are in-progress — and you can see this same torn write on a local FS
>> too if there is a crash while simultaneous writes are in progress.
>> -Greg
>>
>>
> Thanks Greg for the clarifications. It makes a lot of sense. So to
> answer the original question: both B and C will have the new copy of the object;
> A crashes without sending the client ack; the librados layer will keep
> retrying the same write, which B and C already have, plus the new
> member D of the acting set. The client will never receive an error, just
> experience a delay. It is also interesting that if the client itself crashes
> while retrying, the new version of the object will still be kept... I think :)
> 

Hi,

In such a scenario where the client crashes and the primary OSD could not send 
it the ACK, will the primary OSD consider the data as orphan (or stale) and 
delete it (or revert to the old copy) and ask B and C to do the same, or will 
this new object or its new version persist in the cluster without the client 
knowing?

Regards,
Frédéric.

> The case of the client receiving an error is probably limited to corner cases,
> like timeouts at levels above (iSCSI/SMB/NFS gateways) if set too low, or
> timeouts set at the application level.
> 
> 
>>
>> /maged
>>
>> On 23/10/2024 06:54, Vigneshwar S wrote:
>> > Hi Frédéric,
>> >
>> > Section 5a states that the divergent events would be tracked and
>> > deleted. But in the scenario I’ve mentioned, both B and C will have the same
>> > history.
>> >
>> > So if the new primary is B, then when peering, it peers with C
>> and looks at
>> > the history and thinks they’ve no divergent events and keep the
>> object
>> > right?
>> >
>> > Regards,
>> > Vigneshwar
>> >
>> > On Wed, 23 Oct 2024 at 9:09 AM, Frédéric Nass <
>> > frederic.n...@univ-lorraine.fr> wrote:
>> >
>> >> Hi Vigneshwar,
>> >>
>> >> You might want to check '5a' section from the peering process
>> >> documentation [1].
>> >>
>> >> Regards,
>> >> Frédéric.
>> >>
>> >> [1]
>> >>
>> 
>> https://docs.ceph.com/en/reef/dev/peering/#description-of-the-peering-process
>> >>
>> >> --
>> >> *De :* Vigneshwar S 
>> >> *Envoyé :* mardi 22 octobre 2024 11:05
>> >> *À :* ceph-users@ceph.io
>> >> *Objet :* [ceph-users] How

[ceph-users] Re: Ceph native clients

2024-10-28 Thread Tim Holloway
That's unfortunate. For one, it says "some filesystems are more equal 
than others".


Back in ancient days, when dinosaurs used time-sharing, you could mount
a remote filesystem at login, and logout, whether explicit or by timeout,
would unmount it. But virtually nobody time-shares anymore, while lots of
people hibernate, either because they want to be more energy efficient
or to prolong the life of a battery.


And unlike a login, waking from hibernation doesn't run a login script.
Offhand, I'm not entirely sure whether you can easily add any sort of wakeup
script at the user level.


My system turns off the office fan when it goes to sleep. The timeout 
interval is long enough that I'm probably no longer in the room at that 
point. In theory I could handle unmount/mount there, but I'm uncertain 
that I have the full range of system services at that level, and, as you 
pointed out, there might be open files that would expect to remain open 
when the machine came back up.


Hence my wish that the Ceph client and servers - especially linked OSDs
- could "make their peace", which is to say quiesce their I/O,
disconnect from the servers, and possibly have Ceph mark the session as
hibernating in order to come back up faster/more intelligently.


Because it's a bloody royal PITA to have to manually mount Ceph every 
time I need to edit or run provisioning or use any of the other myriad 
systems that I keep in Ceph.
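
For the record, the kind of hook I have in mind would be an untested sketch like this, dropped into /usr/lib/systemd/system-sleep/ (assuming a CephFS mount at /mnt/cephfs that is declared in /etc/fstab so it can be remounted by name):

#!/bin/bash
# /usr/lib/systemd/system-sleep/50-cephfs.sh (hypothetical)
# systemd calls scripts here with $1 = pre|post and $2 = the sleep action.
case "$1" in
    pre)
        # release CephFS caps before suspend/hibernate
        umount -l /mnt/cephfs || true
        ;;
    post)
        # bring the mount back on resume
        mount /mnt/cephfs || true
        ;;
esac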


   Tim


On 10/28/24 04:54, Burkhard Linke wrote:

[…]

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io