[ceph-users] ceph Nautilus: device health management, no info in: ceph device ls
Hello,

I run a Ceph Nautilus 14.2.22 cluster with 144 OSDs. In order to see whether a disk has hardware trouble and might fail soon, I activated device health management. The cluster runs on Ubuntu 18.04, so the first task was to install a newer smartctl version; I used smartctl 7.0. Device monitoring is activated (ceph device monitoring on).

Using "ceph device get-health-metrics" I can see the results of the smartctl runs for a device with a given ID, like this:

    "product": "ST4000NM0295",
    "revision": "DT31",
    "rotation_rate": 7200,
    "scsi_error_counter_log": {
        "read": {
            "correction_algorithm_invocations": 20,
            "errors_corrected_by_eccdelayed": 20,
            "errors_corrected_by_eccfast": 3457558131,

So this part seems to run just fine. For failure prediction I selected the "local" method (ceph config set global device_failure_prediction_mode local).

What's missing for me is the prediction output in "ceph device ls". The "LIFE EXPECTANCY" column is always empty and I have no idea why:

# ceph device ls
DEVICE                          HOST:DEV   DAEMONS  LIFE EXPECTANCY
SEAGATE_ST4000NM017A_WS23WKJ4   ceph4:sdb  osd.49
SEAGATE_ST4000NM0295_ZC13XK9P   ceph6:sdo  osd.92
SEAGATE_ST4000NM0295_ZC141B3S   ceph6:sdj  osd.89

Does anyone have an idea what might be missing in my setup? Is "LIFE EXPECTANCY" perhaps only populated when the local predictor predicts a failure, or should I see something like "good" there if the disk is OK for the moment? Recently a disk even died, but I did not see anything in "ceph device ls" for the failed OSD disk. So I am really unsure whether failure prediction is working at all on my Ceph system.

Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
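A few checks that may help narrow this down (a minimal sketch only; the module name, the devicehealth option name and the scrape behaviour mentioned in the comments are assumptions based on the stock Nautilus mgr modules, and the device ID is simply taken from the listing above):

  # The "local" predictor is a mgr module; make sure it is actually enabled.
  ceph mgr module ls | grep -i diskprediction
  ceph mgr module enable diskprediction_local    # only if it is not enabled yet

  # Ask the predictor directly for one device; an error or empty answer here
  # usually explains an empty LIFE EXPECTANCY column.
  ceph device predict-life-expectancy SEAGATE_ST4000NM0295_ZC13XK9P

  # Health metrics are only scraped periodically, so a freshly enabled
  # predictor may simply not have produced a value yet.
  ceph config get mgr mgr/devicehealth/scrape_frequency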
[ceph-users] Re: ceph on 2 servers
On 29.04.22 at 10:57, Александр Пивушков wrote:
> Hello, is there any theoretical possibility to use ceph on two servers?
> It is necessary that ceph keeps working when either one of the servers fails.
> Each server only has 2 SSDs for ceph.

With only two servers I would look at DRBD, not Ceph.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG: HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
[ceph-users] Re: RGW/S3 losing multipart upload objects
Hi,

I just tried again on a Quincy 17.2.0. Same procedure, same problem.
I just wonder if nobody else sees that problem?

Ciao, Uli

> On 18. 03 2022, at 12:18, Ulrich Klein wrote:
>
> I tried it on a mini-cluster (4 Raspberries) with 16.2.7.
> Same procedure, same effect. I just can't get rid of these objects.
>
> Is there any method that would allow me to delete these objects without
> damaging RGW?
>
> Ciao, Uli
>
>> On 17. 03 2022, at 15:30, Soumya Koduri wrote:
>>
>> On 3/17/22 17:16, Ulrich Klein wrote:
>>> Hi,
>>>
>>> My second attempt to get help with a problem I'm trying to solve for about
>>> 6 months now.
>>>
>>> I have a Ceph 16.2.6 test cluster, used almost exclusively for providing
>>> RGW/S3 service, similar to a production cluster.
>>>
>>> The problem I have is this:
>>> A client uploads (via S3) a bunch of large files into a bucket via multipart.
>>> The upload(s) get interrupted and retried.
>>> In the end, from the client's perspective, all the files are visible and
>>> everything looks fine.
>>> But on the cluster there are many more objects in the bucket.
>>> Even after cleaning out the incomplete multipart uploads there are too many
>>> objects.
>>> Even after deleting all the visible objects from the bucket there are still
>>> objects in the bucket.
>>> I have so far found no way to get rid of those left-over objects.
>>> It's screwing up space accounting and I'm afraid I'll eventually have a
>>> cluster full of those lost objects.
>>> The only way to clean up seems to be to copy the contents of a bucket to a
>>> new bucket and delete the screwed-up bucket. But on a production system
>>> that's not always a real option.
>>>
>>> I've found a variety of older threads that describe a similar problem, none
>>> of them describing a solution :(
>>>
>>> I can pretty easily reproduce the problem with this sequence:
>>>
>>> On a client system create a directory with ~30 200MB files. (On a faster
>>> system I'd probably need bigger or more files.)
>>> tstfiles/tst01 - tst29
>>>
>>> Run
>>> $ rclone mkdir tester:/test-bucket   # creates a bucket on the test system with user tester
>>> Run
>>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
>>> a couple of times (6-8), interrupting each one via CTRL-C.
>>> Eventually let one finish.
>>>
>>> Now I can use s3cmd to see all the files:
>>> $ s3cmd ls -lr s3://test-bucket/tstfiles
>>> 2022-03-16 17:11  200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD  s3://test-bucket/tstfiles/tst01
>>> ...
>>> 2022-03-16 17:13  200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD  s3://test-bucket/tstfiles/tst29
>>>
>>> ... and to list incomplete uploads:
>>> $ s3cmd multipart s3://test-bucket
>>> s3://test-bucket/
>>> Initiated                 Path                              Id
>>> 2022-03-16T17:11:19.074Z  s3://test-bucket/tstfiles/tst05   2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>> 2022-03-16T17:12:41.583Z  s3://test-bucket/tstfiles/tst28   2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
>>>
>>> I can abort the uploads with
>>> $ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>
>> On the latest master, I see that these objects are deleted immediately post
>> abortmp. I believe this issue may have been fixed as part of [1],
>> backported to v16.2.7 [2]. Maybe you could try upgrading your cluster and
>> recheck.
>>
>> Thanks,
>>
>> Soumya
>>
>> [1] https://tracker.ceph.com/issues/53222
>> [2] https://tracker.ceph.com/issues/53291
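One way to see where the leftovers actually live, i.e. whether they are still referenced by the bucket index or only exist as RADOS objects (a sketch only; "test-bucket" is the bucket from the reproduction above, and the availability of "bucket radoslist" depends on the release):

  # What the bucket index thinks is in the bucket, and its accounted size:
  radosgw-admin bucket stats --bucket=test-bucket
  radosgw-admin bucket list --bucket=test-bucket

  # Compare the index against the actual objects and optionally repair it:
  radosgw-admin bucket check --bucket=test-bucket --check-objects --fix

  # In recent releases this lists the RADOS objects backing the bucket,
  # which makes orphaned multipart parts visible:
  radosgw-admin bucket radoslist --bucket=test-bucket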
[ceph-users] Re: df shows wrong size of cephfs share when a subdirectory is mounted
On Fri, Apr 22, 2022 at 03:39:04PM +0100, Luís Henriques wrote:
> On Thu, Apr 21, 2022 at 08:53:48PM +, Ryan Taylor wrote:
> >
> > Hi Luís,
> >
> > I did just that:
> >
> > [fedora@cephtest ~]$ sudo ./debug.sh
> ...
> > [94831.006412] ceph: release inode 3bb3ccb2 dir file b0b84d82
> > [94831.006573] ceph: do_getattr inode 3bb3ccb2 mask AsXsFs mode 040755
> > [94831.006575] ceph: __ceph_caps_issued_mask ino 0x1001b45c2fa cap 0cde56f9 issued pAsLsXsFs (mask AsXsFs)
> > [94831.006576] ceph: __touch_cap 3bb3ccb2 cap 0cde56f9 mds0
> > [94831.006581] ceph: statfs
>
> OK, this was the point where I expected to see something useful.
> Unfortunately, it looks like the quota code doesn't have good enough debug
> info here :-(
>
> I've spent a lot of hours today trying to reproduce it: recompiled
> v14.4.22, tried Fedora as a client, but nothing. I should be able to
> debug this problem but I'd need to be able to reproduce the issue.
>
> I'm adding Jeff and Xiubo to CC, maybe they have some further ideas. I
> must confess I'm clueless at this point.

I *think* I've figured it out (and in between I fixed a somewhat related
bug in the kernel client quotas code). I've updated the tracker [1] with
what I've found.

The TL;DR is that this is a mix of Linux security modules and
authentication capabilities configuration. Please have a look at the
comments there and see if any of the workarounds works for you.

[1] https://tracker.ceph.com/issues/55090

Cheers,
--
Luís

> Cheers,
> --
> Luís
>
> > Thanks,
> > -rt
> >
> > Ryan Taylor
> > Research Computing Specialist
> > Research Computing Services, University Systems
> > University of Victoria
> >
> > From: Luís Henriques
> > Sent: April 21, 2022 1:35 PM
> > To: Ryan Taylor
> > Cc: Hendrik Peyerl; Ramana Venkatesh Raja; ceph-users@ceph.io
> > Subject: Re: [ceph-users] Re: df shows wrong size of cephfs share when a subdirectory is mounted
> >
> > Notice: This message was sent from outside the University of Victoria email
> > system. Please be cautious with links and sensitive information.
> >
> > On Thu, Apr 21, 2022 at 07:28:19PM +, Ryan Taylor wrote:
> > >
> > > Hi Luís,
> > >
> > > dmesg looks normal I think:
> >
> > Yep, I don't see anything suspicious either.
> >
> > > [ 265.269450] Key type ceph registered
> > > [ 265.270914] libceph: loaded (mon/osd proto 15/24)
> > > [ 265.303764] FS-Cache: Netfs 'ceph' registered for caching
> > > [ 265.305460] ceph: loaded (mds proto 32)
> > > [ 265.513616] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 265.520982] libceph: client3734313 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 265.539710] ceph: mds0 rejected session
> > > [ 265.544592] libceph: mon1 (1)10.30.202.3:6789 session established
> > > [ 265.549564] libceph: client3698116 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 265.552624] ceph: mds0 rejected session
> > > [ 316.849402] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 316.855077] libceph: client3734316 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 316.886834] ceph: mds0 rejected session
> > > [ 372.064685] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 372.068731] libceph: client3708026 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 372.071651] ceph: mds0 rejected session
> > > [ 372.074641] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 372.080435] libceph: client3734319 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 372.083270] ceph: mds0 rejected session
> > > [ 443.855530] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 443.863231] libceph: client3708029 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 555.889186] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 555.893677] libceph: client3708032 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 1361.181405] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 1361.187230] libceph: client3734325 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 1415.463391] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 1415.467663] libceph: client3708038 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 2018.707478] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [ 2018.712834] libceph: client3734337 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 2276.564841] libceph: mon1 (1)10.30.202.3:6789 session established
> > > [ 2276.568899] libceph: client3698128 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [ 2435.596579] libceph: mon2 (1)10.30.203.3:6789 session established
> > > [ 2435.600599] libceph: client3708050 fsid 50004482-d5e3-4b76-9a4c-abd0626c9882
> > > [89805.777644] libceph: mon0 (1)10.30.201.3:6789 session established
> > > [89805.782455] libceph: clien
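Separate from the root cause tracked in [1], the size that df reports for a subdirectory mount is normally driven by CephFS quotas, so a quick way to sanity-check a client is to put a quota on the directory and re-mount it. A minimal sketch (the paths, monitor address, client name and quota size below are made up for illustration):

  # On a client that has the filesystem root mounted, set a 100 GiB quota:
  setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/volumes/group/share
  getfattr -n ceph.quota.max_bytes /mnt/cephfs/volumes/group/share

  # A kernel client that mounts only that subdirectory should then report
  # the quota as the filesystem size:
  mount -t ceph 10.0.0.1:6789:/volumes/group/share /mnt/share \
      -o name=share-client,secretfile=/etc/ceph/share-client.secret
  df -h /mnt/share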
[ceph-users] Re: Upgrading Ceph from 17.0 to 17.2 with cephadm orch
Hi,

I never got a reply to my question. I can't seem to find out how to upgrade the
cephadm shell docker container.

Any ideas?

Greetings,

Dominique.

> -----Original Message-----
> From: Dominique Ramaekers
> Sent: Wednesday, 27 April 2022 11:24
> To: ceph-users@ceph.io
> Subject: [ceph-users] Upgrading Ceph from 17.0 to 17.2 with cephadm orch
>
> Hi,
>
> I've upgraded my cluster using 'ceph orch upgrade start --image
> quay.io/ceph/ceph:v17' in the cephadm shell.
>
> All went great. 'ceph tell osd.N version' reports the updated version 17.2.0
> (Quincy, stable).
>
> Only it seems that just the Ceph docker image on the host where the upgrade
> was initiated got updated; the others didn't.
>
> 'ceph -v' on host 1: ceph version 17.2.0
> (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
>
> 'ceph -v' on other hosts: ceph version 17.0.0-11466-g05d49126
> (05d4912683434694ddcdd683773ee5a3e0466249) quincy (dev)
>
> Initiating the upgrade on the other hosts doesn't upgrade the Ceph docker
> image...
>
> Please advise.
>
> Thanks in advance.
>
> Dominique.
[ceph-users] Re: Upgrading Ceph from 17.0 to 17.2 with cephadm orch
Can you check what "ceph versions" reports?

On Fri, Apr 29, 2022 at 9:15 AM Dominique Ramaekers wrote:
>
> Hi,
>
> I never got a reply to my question. I can't seem to find out how to upgrade the
> cephadm shell docker container.
>
> Any ideas?
>
> Greetings,
>
> Dominique.
>
> > -----Original Message-----
> > From: Dominique Ramaekers
> > Sent: Wednesday, 27 April 2022 11:24
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Upgrading Ceph from 17.0 to 17.2 with cephadm orch
> >
> > Hi,
> >
> > I've upgraded my cluster using 'ceph orch upgrade start --image
> > quay.io/ceph/ceph:v17' in the cephadm shell.
> >
> > All went great. 'ceph tell osd.N version' reports the updated version 17.2.0
> > (Quincy, stable).
> >
> > Only it seems that just the Ceph docker image on the host where the upgrade
> > was initiated got updated; the others didn't.
> >
> > 'ceph -v' on host 1: ceph version 17.2.0
> > (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
> >
> > 'ceph -v' on other hosts: ceph version 17.0.0-11466-g05d49126
> > (05d4912683434694ddcdd683773ee5a3e0466249) quincy (dev)
> >
> > Initiating the upgrade on the other hosts doesn't upgrade the Ceph docker
> > image...
> >
> > Please advise.
> >
> > Thanks in advance.
> >
> > Dominique.
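For what it's worth, 'ceph -v' on a host only reports the version of the locally installed ceph binaries (or of whatever image the shell happens to run from), not of the daemons the orchestrator manages, so it can stay at an old version even after a successful upgrade. A minimal sketch of what to look at instead (the v17.2.0 image tag used here is an assumption):

  # Versions of the daemons actually running in the cluster, grouped by type:
  ceph versions

  # Whether an orchestrated upgrade is still in progress:
  ceph orch upgrade status

  # Run the cephadm shell from an explicit image instead of whatever the
  # host last cached:
  cephadm shell --image quay.io/ceph/ceph:v17.2.0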