[ceph-users] ceph Nautilus: device health management, no info in: ceph device ls
Hello,

I run a Ceph Nautilus 14.2.22 cluster with 144 OSDs. In order to see whether a disk has hardware trouble and might fail soon, I activated device health management. The cluster runs on Ubuntu 18.04, so the first task was to install a newer smartctl version; I used smartctl 7.0. Device monitoring is activated (ceph device monitoring on).

Using "ceph device get-health-metrics" I can see the results of the smartctl runs for a device with a given ID, like this:

    "product": "ST4000NM0295",
    "revision": "DT31",
    "rotation_rate": 7200,
    "scsi_error_counter_log": {
        "read": {
            "correction_algorithm_invocations": 20,
            "errors_corrected_by_eccdelayed": 20,
            "errors_corrected_by_eccfast": 3457558131,

So this part seems to run just fine. For failure prediction I selected the "local" method (ceph config set global device_failure_prediction_mode local).

What's missing for me is the prediction output in "ceph device ls". The "LIFE EXPECTANCY" column is always empty and I have no idea why:

# ceph device ls
DEVICE                          HOST:DEV   DAEMONS  LIFE EXPECTANCY
SEAGATE_ST4000NM017A_WS23WKJ4   ceph4:sdb  osd.49
SEAGATE_ST4000NM0295_ZC13XK9P   ceph6:sdo  osd.92
SEAGATE_ST4000NM0295_ZC141B3S   ceph6:sdj  osd.89

Does anyone have an idea what might be missing in my setup? Is "LIFE EXPECTANCY" perhaps only populated when the local predictor predicts a failure, or should I see something like "good" there if the disk is OK for the moment? Recently a disk even died, but I did not see anything in "ceph device ls" for the failed OSD disk. So I am really unsure whether failure prediction is working at all on my Ceph system.

Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
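A few checks that may help narrow this down (a minimal sketch only; the module name, the devicehealth option name and the scrape behaviour mentioned in the comments are assumptions based on the stock Nautilus mgr modules, and the device ID is simply taken from the listing above):

  # The "local" predictor is a mgr module; make sure it is actually enabled.
  ceph mgr module ls | grep -i diskprediction
  ceph mgr module enable diskprediction_local    # only if it is not enabled yet

  # Ask the predictor directly for one device; an error or empty answer here
  # usually explains an empty LIFE EXPECTANCY column.
  ceph device predict-life-expectancy SEAGATE_ST4000NM0295_ZC13XK9P

  # Health metrics are only scraped periodically, so a freshly enabled
  # predictor may simply not have produced a value yet.
  ceph config get mgr mgr/devicehealth/scrape_frequency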
[ceph-users] Re: ceph on 2 servers
On 29.04.22 at 10:57, Александр Пивушков wrote:
> Hello, is there any theoretical possibility to use ceph on two servers?
> It is necessary that ceph keeps working when either one of the servers fails.
> Each server only has 2 SSDs for ceph.

With only two servers I would look at DRBD, not Ceph.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG: HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
[ceph-users] Re: RGW/S3 losing multipart upload objects
Hi,

I just tried again on a Quincy 17.2.0. Same procedure, same problem.
I just wonder if nobody else sees that problem?

Ciao, Uli

> On 18. 03 2022, at 12:18, Ulrich Klein wrote:
>
> I tried it on a mini-cluster (4 Raspberries) with 16.2.7.
> Same procedure, same effect. I just can't get rid of these objects.
>
> Is there any method that would allow me to delete these objects without
> damaging RGW?
>
> Ciao, Uli
>
>> On 17. 03 2022, at 15:30, Soumya Koduri wrote:
>>
>> On 3/17/22 17:16, Ulrich Klein wrote:
>>> Hi,
>>>
>>> My second attempt to get help with a problem I'm trying to solve for about
>>> 6 months now.
>>>
>>> I have a Ceph 16.2.6 test cluster, used almost exclusively for providing
>>> RGW/S3 service, similar to a production cluster.
>>>
>>> The problem I have is this:
>>> A client uploads (via S3) a bunch of large files into a bucket via multipart.
>>> The upload(s) get interrupted and retried.
>>> In the end, from the client's perspective, all the files are visible and
>>> everything looks fine.
>>> But on the cluster there are many more objects in the bucket.
>>> Even after cleaning out the incomplete multipart uploads there are too many
>>> objects.
>>> Even after deleting all the visible objects from the bucket there are still
>>> objects in the bucket.
>>> I have so far found no way to get rid of those left-over objects.
>>> It's screwing up space accounting and I'm afraid I'll eventually have a
>>> cluster full of those lost objects.
>>> The only way to clean up seems to be to copy the contents of a bucket to a
>>> new bucket and delete the screwed-up bucket. But on a production system
>>> that's not always a real option.
>>>
>>> I've found a variety of older threads that describe a similar problem, none
>>> of them describing a solution :(
>>>
>>> I can pretty easily reproduce the problem with this sequence:
>>>
>>> On a client system create a directory with ~30 200MB files. (On a faster
>>> system I'd probably need bigger or more files.)
>>> tstfiles/tst01 - tst29
>>>
>>> Run
>>> $ rclone mkdir tester:/test-bucket   # creates a bucket on the test system with user tester
>>> Run
>>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
>>> a couple of times (6-8), interrupting each one via CTRL-C.
>>> Eventually let one finish.
>>>
>>> Now I can use s3cmd to see all the files:
>>> $ s3cmd ls -lr s3://test-bucket/tstfiles
>>> 2022-03-16 17:11  200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD  s3://test-bucket/tstfiles/tst01
>>> ...
>>> 2022-03-16 17:13  200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD  s3://test-bucket/tstfiles/tst29
>>>
>>> ... and to list incomplete uploads:
>>> $ s3cmd multipart s3://test-bucket
>>> s3://test-bucket/
>>> Initiated                 Path                              Id
>>> 2022-03-16T17:11:19.074Z  s3://test-bucket/tstfiles/tst05   2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>> 2022-03-16T17:12:41.583Z  s3://test-bucket/tstfiles/tst28   2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
>>>
>>> I can abort the uploads with
>>> $ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>
>> On the latest master, I see that these objects are deleted immediately post
>> abortmp. I believe this issue may have been fixed as part of [1],
>> backported to v16.2.7 [2]. Maybe you could try upgrading your cluster and
>> recheck.
>>
>> Thanks,
>>
>> Soumya
>>
>> [1] https://tracker.ceph.com/issues/53222
>> [2] https://tracker.ceph.com/issues/53291
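One way to see where the leftovers actually live, i.e. whether they are still referenced by the bucket index or only exist as RADOS objects (a sketch only; "test-bucket" is the bucket from the reproduction above, and the availability of "bucket radoslist" depends on the release):

  # What the bucket index thinks is in the bucket, and its accounted size:
  radosgw-admin bucket stats --bucket=test-bucket
  radosgw-admin bucket list --bucket=test-bucket

  # Compare the index against the actual objects and optionally repair it:
  radosgw-admin bucket check --bucket=test-bucket --check-objects --fix

  # In recent releases this lists the RADOS objects backing the bucket,
  # which makes orphaned multipart parts visible:
  radosgw-admin bucket radoslist --bucket=test-bucket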
[ceph-users] Re: df shows wrong size of cephfs share when a subdirectory is mounted
On Fri, Apr 22, 2022 at 03:39:04PM +0100, Luís Henriques wrote:
> On Thu, Apr 21, 2022 at 08:53:48PM +, Ryan Taylor wrote:
> >
> > Hi Luís,
> >
> > I did just that:
> >
> > [fedora@cephtest ~]$ sudo ./debug.sh
> ...
> > [94831.006412] ceph: release inode 3bb3ccb2 dir file b0b84d82
> > [94831.006573] ceph: do_getattr inode 3bb3ccb2 mask AsXsFs mode 040755
> > [94831.006575] ceph: __ceph_caps_issued_mask ino 0x1001b45c2fa cap 0cde56f9 issued pAsLsXsFs (mask AsXsFs)
> > [94831.006576] ceph: __touch_cap 3bb3ccb2 cap 0cde56f9 mds0
> > [94831.006581] ceph: statfs
>
> OK, this was the point where I expected to see something useful.
> Unfortunately, it looks like the quota code doesn't have good enough debug
> info here :-(
>
> I've spent a lot of hours today trying to reproduce it: recompiled
> v14.4.22, tried Fedora as a client, but nothing. I should be able to
> debug this problem but I'd need to be able to reproduce the issue.
>
> I'm adding Jeff and Xiubo to CC, maybe they have some further ideas. I
> must confess I'm clueless at this point.

I *think* I've figured it out (and in between I fixed a somewhat related
bug in the kernel client quotas code). I've updated the tracker [1] with
what I've found.

The TL;DR is that this is a mix of Linux security modules and
authentication capabilities configuration. Please have a look at the
comments there and see if any of the workarounds works for you.

[1] https://tracker.ceph.com/issues/55090

Cheers,
--
Luís

> Cheers,
> --
> Luís
>
> > Thanks,
> > -rt
> >
> > Ryan Taylor
> > Research Computing Specialist
> > Research Computing Services, University Systems
> > University of Victoria
> >
> > From: Luís Henriques
> > Sent: April 21, 2022 1:35 PM
> > To: Ryan Taylor
> > Cc: Hendrik Peyerl; Ramana Venkatesh Raja; ceph-users@ceph.io
> > Subject: Re: [ceph-users] Re: df shows wrong size of cephfs share when a subdirectory is mounted
> >
> > Notice: This message was sent from outside the University of Victoria email
> > system. Please be cautious with links and sensitive information.
> >
> > On Thu, Apr 21, 2022 at 07:28:19PM +, Ryan Taylor wrote:
> > >
> > > Hi Luís,
> > >
> > > dmesg looks normal I think:
> >
> > Yep, I don't see anything suspicious either.
> >
> > > [ 265.269450] Key type ceph registered
> > > [ 265.270914] libceph: loaded (mon/osd proto 15/24)
> > > [ 265.303764] FS-Cache: Netfs 'ceph' registered for caching
> > > [ 265.305460] ceph: loaded (mds proto 32)
> > > [ 265.513616] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 265.520982] libceph: client3734313 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 265.539710] ceph: mds0 rejected session
> > > [ 265.544592] libceph: mon1 (1)10.30.202.3:6789 session established
> > > [ 265.549564] libceph: client3698116 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 265.552624] ceph: mds0 rejected session
> > > [ 316.849402] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 316.855077] libceph: client3734316 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 316.886834] ceph: mds0 rejected session
> > > [ 372.064685] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 372.068731] libceph: client3708026 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 372.071651] ceph: mds0 rejected session
> > > [ 372.074641] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 372.080435] libceph: client3734319 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 372.083270] ceph: mds0 rejected session
> > > [ 443.855530] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 443.863231] libceph: client3708029 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 555.889186] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 555.893677] libceph: client3708032 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 1361.181405] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 1361.187230] libceph: client3734325 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 1415.463391] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 1415.467663] libceph: client3708038 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 2018.707478] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 2018.712834] libceph: client3734337 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 2276.564841] libceph: mon1 (1)10.30.202.3:6789 session established
> > > [ 2276.568899] libceph: client3698128 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 2435.596579] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 2435.600599] libceph: client3708050 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [89805.777644] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [89805.782455] libceph: clien
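Separate from the root cause tracked in [1], the size that df reports for a subdirectory mount is normally driven by CephFS quotas, so a quick way to sanity-check a client is to put a quota on the directory and re-mount it. A minimal sketch (the paths, monitor address, client name and quota size below are made up for illustration):

  # On a client that has the filesystem root mounted, set a 100 GiB quota:
  setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/volumes/group/share
  getfattr -n ceph.quota.max_bytes /mnt/cephfs/volumes/group/share

  # A kernel client that mounts only that subdirectory should then report
  # the quota as the filesystem size:
  mount -t ceph 10.0.0.1:6789:/volumes/group/share /mnt/share \
      -o name=share-client,secretfile=/etc/ceph/share-client.secret
  df -h /mnt/share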
[ceph-users] Re: Upgrading Ceph from 17.0 to 17.2 with cephadm orch
Hi,

I never got a reply to my question. I can't seem to find out how to upgrade the
cephadm shell docker container.

Any ideas?

Greetings,

Dominique.

> -----Original Message-----
> From: Dominique Ramaekers
> Sent: Wednesday, 27 April 2022 11:24
> To: ceph-users@ceph.io
> Subject: [ceph-users] Upgrading Ceph from 17.0 to 17.2 with cephadm orch
>
> Hi,
>
> I've upgraded my cluster using 'ceph orch upgrade start --image
> quay.io/ceph/ceph:v17' in the cephadm shell.
>
> All went great. 'ceph tell osd.N version' reports the updated version 17.2.0
> (Quincy, stable).
>
> Only it seems that just the Ceph docker image on the host where the upgrade
> was initiated got updated; the others didn't.
>
> 'ceph -v' on host 1: ceph version 17.2.0
> (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
>
> 'ceph -v' on other hosts: ceph version 17.0.0-11466-g05d49126
> (05d4912683434694ddcdd683773ee5a3e0466249) quincy (dev)
>
> Initiating the upgrade on the other hosts doesn't upgrade the Ceph docker
> image...
>
> Please advise.
>
> Thanks in advance.
>
> Dominique.
[ceph-users] Re: Upgrading Ceph from 17.0 to 17.2 with cephadm orch
Can you check what "ceph versions" reports?

On Fri, Apr 29, 2022 at 9:15 AM Dominique Ramaekers wrote:
>
> Hi,
>
> I never got a reply to my question. I can't seem to find out how to upgrade the
> cephadm shell docker container.
>
> Any ideas?
>
> Greetings,
>
> Dominique.
>
> > -----Original Message-----
> > From: Dominique Ramaekers
> > Sent: Wednesday, 27 April 2022 11:24
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Upgrading Ceph from 17.0 to 17.2 with cephadm orch
> >
> > Hi,
> >
> > I've upgraded my cluster using 'ceph orch upgrade start --image
> > quay.io/ceph/ceph:v17' in the cephadm shell.
> >
> > All went great. 'ceph tell osd.N version' reports the updated version 17.2.0
> > (Quincy, stable).
> >
> > Only it seems that just the Ceph docker image on the host where the upgrade
> > was initiated got updated; the others didn't.
> >
> > 'ceph -v' on host 1: ceph version 17.2.0
> > (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
> >
> > 'ceph -v' on other hosts: ceph version 17.0.0-11466-g05d49126
> > (05d4912683434694ddcdd683773ee5a3e0466249) quincy (dev)
> >
> > Initiating the upgrade on the other hosts doesn't upgrade the Ceph docker
> > image...
> >
> > Please advise.
> >
> > Thanks in advance.
> >
> > Dominique.
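For what it's worth, 'ceph -v' on a host only reports the version of the locally installed ceph binaries (or of whatever image the shell happens to run from), not of the daemons the orchestrator manages, so it can stay at an old version even after a successful upgrade. A minimal sketch of what to look at instead (the v17.2.0 image tag used here is an assumption):

  # Versions of the daemons actually running in the cluster, grouped by type:
  ceph versions

  # Whether an orchestrated upgrade is still in progress:
  ceph orch upgrade status

  # Run the cephadm shell from an explicit image instead of whatever the
  # host last cached:
  cephadm shell --image quay.io/ceph/ceph:v17.2.0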