[ceph-users] Re: replacing OSD nodes

2022-07-20 Thread Janne Johansson
On Tue, 19 Jul 2022 at 13:09, Jesper Lykkegaard Karlsen wrote:
>
> Hi all,
> Setup: Octopus - erasure 8-3
> I had gotten to the point where I had some rather old OSD nodes that I 
> wanted to replace with new ones.
> The procedure was planned like this:
>
>   *   add new replacement OSD nodes
>   *   set all OSDs on the retiring nodes to out.
>   *   wait for everything to rebalance
>   *   remove retiring nodes

> With around 50% of misplaced objects still remaining, the OSDs started to 
> complain about backfillfull OSDs and nearfull OSDs.
> A bit of a surprise to me, as RAW size is only 47% used.
> It seems that rebalancing does not happen in a prioritized manner, where 
> planned backfill starts with the OSD with the most available space, but 
> "alphabetically" according to PG name.
> Is this really true?

I don't know if it follows any particular order, just that it certainly
doesn't send requests to the least-filled OSD first. When I have gotten
into similar situations, it just tried to run as many moves as possible
given max_backfill and all that; some/most might get stuck in toofull,
but as the rest of the slots progress, space becomes available, and at
some point those toofull ones get handled. It delays the completion but
hasn't caused me any other specific problems.

Though I will admit I have used "ceph osd reweight osd.123
" at times to force emptying of some OSDs, but that was
more my impatience than anything else.
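For later readers, the manual emptying mentioned above can be sketched
roughly like this. The reweight value and the grep pattern are
illustrative, not from the original mail; `ceph osd reweight` takes a
weight between 0 and 1:

```shell
# Identify the fullest OSDs, then temporarily lower their reweight so
# backfill prefers moving PGs off them. Values are examples.
ceph osd df                          # inspect %USE per OSD
ceph health detail | grep -i full    # list nearfull/backfillfull OSDs
ceph osd reweight osd.123 0.9        # osd.123 from the mail; 0.9 is illustrative
# Once rebalancing has caught up, restore the original weight:
ceph osd reweight osd.123 1.0
```

Note this is the temporary "override" reweight, which is reset if the
OSD is marked out and back in, as opposed to a permanent crush reweight.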


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: replacing OSD nodes

2022-07-20 Thread Jesper Lykkegaard Karlsen
Thanks for your answer, Janne.

Yes, I am also running "ceph osd reweight" on the "nearfull" osds, once they 
get too close for comfort.

But I just thought a continuous prioritization of rebalancing PGs could make 
this process smoother, with less/no need for hands-on operations.

Best,
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tel.: +45 50906203




[ceph-users] Re: rh8 krbd mapping causes no match of type 1 in addrvec problem decoding monmap, -2

2022-07-20 Thread Ilya Dryomov
On Tue, Jul 19, 2022 at 9:55 PM Wesley Dillingham  wrote:
>
>
> Thanks.
>
> Interestingly the older kernel did not have a problem with it but the newer 
> kernel does.

The older kernel can't communicate via the v2 protocol, so it doesn't
(need to) distinguish v1 and v2 addresses.

Thanks,

Ilya
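
For reference, the two monitor address formats look like this (addresses
illustrative). On kernels that support msgr2 (5.11 and later, if I
recall correctly) the mode can also be requested explicitly when mapping:

```shell
# v1-only address, as legacy monmaps store it:
#   192.168.1.1:6789/0
# v2+v1 addrvec, as msgr2-aware monmaps store it:
#   [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0]
# A msgr2-capable krbd can be asked to prefer v2 (crc mode):
rbd map rbd/myimage -o ms_mode=prefer-crc
```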


[ceph-users] Quincy: cephfs "df" used 6x higher than "du"

2022-07-20 Thread Jake Grimmett

Dear All,

We have just built a new cluster using Quincy 17.2.1

After copying ~25TB to the cluster (from a Mimic cluster), we see 152 TB 
used, a ~6x disparity.


Is this just a ceph accounting error, or is space being wasted?

[root@wilma-s1 ~]# du -sh /cephfs2/users
24T /cephfs2/users

[root@wilma-s1 ~]# ls -lhd /cephfs2/users
drwxr-xr-x 240 root root 24T Jul 19 12:09 /cephfs2/users

[root@wilma-s1 ~]# df -h /cephfs2/users
Filesystem  Size  Used Avail Use% Mounted on
(SNIP):/    7.1P  152T  6.9P   3% /cephfs2

[root@wilma-s1 ~]# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    7.0 PiB  6.9 PiB  151 TiB   151 TiB       2.10
ssd    2.7 TiB  2.7 TiB   11 GiB    11 GiB       0.38
TOTAL  7.0 PiB  6.9 PiB  151 TiB   151 TiB       2.10

--- POOLS ---
POOL             ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr             21     32   90 MiB       24  270 MiB      0    2.2 PiB
mds_ssd          22     32  1.0 GiB   73.69k  3.0 GiB   0.11    881 GiB
ec82pool         23   4096   20 TiB    6.28M   25 TiB   0.38    5.2 PiB
primary_fs_data  24     32      0 B    1.45M      0 B      0    881 GiB


cephfs is using an 8+2 erasure-coded data pool (hdd with NVMe db/wal), 
and a 3x replicated default data pool (primary_fs_data - NVMe)


bluestore_min_alloc_size_hdd is 4096
ceph osd pool set ec82pool compression_algorithm lz4
ceph osd pool set ec82pool compression_mode aggressive
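
For anyone wanting to confirm the same settings on their own cluster, a
sketch using the standard ceph CLI (OSD id and pool name substituted
from this thread; the exact metadata key may vary by release):

```shell
ceph config get osd bluestore_min_alloc_size_hdd   # cluster default for new OSDs
ceph osd metadata 52 | grep -i min_alloc           # value an existing OSD was created with
ceph osd pool get ec82pool compression_algorithm
ceph osd pool get ec82pool compression_mode
```

The per-OSD check matters because min_alloc_size is baked in at OSD
creation time; changing the config option does not affect existing OSDs.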

many thanks for any help

Jake

--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.





[ceph-users] Re: Quincy: cephfs "df" used 6x higher than "du"

2022-07-20 Thread Jake Grimmett

Dear All,

Just noticed that ceph osd df shows "Raw Use" of ~360 GiB per OSD, with 
~65 GiB of data stored; see below.


Is the disparity between du and df due to low-level OSD data structures 
(?) consuming a large proportion of space (~300 GiB per OSD, ~130 TB 
total), compared to the 25 TB of actual data?


If so, should we expect the disparity in used space to decrease as we 
store more data on the cluster?



[root@wilma-s1 ~]# ceph osd df | head -10
ID  CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE  DATA    OMAP   META     AVAIL   %USE  VAR   PGS  STATUS
52  hdd    16.66209       1.0  17 TiB  365 GiB  67 GiB  2 KiB  960 MiB  16 TiB  2.14  1.01   99  up
53  hdd    16.66209       1.0  17 TiB  365 GiB  67 GiB  2 KiB  904 MiB  16 TiB  2.14  1.01   99  up
54  hdd    16.66209       1.0  17 TiB  360 GiB  62 GiB  2 KiB  878 MiB  16 TiB  2.11  1.00   92  up
55  hdd    16.66209       1.0  17 TiB  362 GiB  64 GiB  2 KiB  838 MiB  16 TiB  2.12  1.00   96  up
56  hdd    16.66209       1.0  17 TiB  365 GiB  67 GiB  1 KiB  855 MiB  16 TiB  2.14  1.01   99  up
57  hdd    16.66209       1.0  17 TiB  359 GiB  61 GiB  2 KiB  915 MiB  16 TiB  2.11  0.99   92  up
58  hdd    16.66209       1.0  17 TiB  361 GiB  63 GiB  2 KiB  853 MiB  16 TiB  2.11  1.00   93  up
59  hdd    16.66209       1.0  17 TiB  359 GiB  61 GiB  1 KiB  815 MiB  16 TiB  2.11  0.99   91  up
60  hdd    16.66209       1.0  17 TiB  365 GiB  67 GiB  1 KiB  914 MiB  16 TiB  2.14  1.01   99  up


thanks

Jake



[ceph-users] Re: [EXTERNAL] Re: RGW Bucket Notifications and MultiPart Uploads

2022-07-20 Thread Casey Bodley
On Wed, Jul 20, 2022 at 12:57 AM Yuval Lifshitz  wrote:
>
> yes, that would work. you would get a "404" until the object is fully
> uploaded.

just note that you won't always get 404 before multipart complete,
because multipart uploads can overwrite existing objects



[ceph-users] Re: [EXTERNAL] Re: RGW Bucket Notifications and MultiPart Uploads

2022-07-20 Thread Yehuda Sadeh-Weinraub
Maybe you can leverage one of the other calls to check for upload
completion: list multipart uploads and/or list parts. The latter should
work if you have the upload id at hand.

Yehuda
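
A sketch of that approach with the AWS CLI against RGW (endpoint,
bucket, key, and upload id are placeholders):

```shell
# List in-progress multipart uploads on the bucket:
aws --endpoint-url http://rgw.example.com s3api list-multipart-uploads \
    --bucket mybucket
# List parts for one upload; once the upload completes (or is aborted),
# the upload id no longer exists and this returns a NoSuchUpload error:
aws --endpoint-url http://rgw.example.com s3api list-parts \
    --bucket mybucket --key mykey --upload-id "$UPLOAD_ID"
```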



[ceph-users] Re: [EXTERNAL] Re: RGW Bucket Notifications and MultiPart Uploads

2022-07-20 Thread Daniel Gryniewicz
Seems like the notification for a multipart upload should look different 
from the one for a normal upload?


Daniel



[ceph-users] Using cloudbase windows RBD / wnbd with pre-pacific clusters

2022-07-20 Thread Wesley Dillingham
I understand that the client-side code available from Cloudbase started
being distributed with Pacific and now Quincy client code, but is there
any particular reason it shouldn't work against, for instance, a
Nautilus cluster?

We have seen errors when trying to do I/O with mapped RBDs:

The semaphore timeout period has expired.

Just trying to rule out the cluster version theory. Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


[ceph-users] Re: replacing OSD nodes

2022-07-20 Thread Janne Johansson
On Wed, 20 Jul 2022 at 11:22, Jesper Lykkegaard Karlsen wrote:
> Thanks for your answer, Janne.
> Yes, I am also running "ceph osd reweight" on the "nearfull" osds, once they 
> get too close for comfort.
>
> But I just thought a continuous prioritization of rebalancing PGs could make 
> this process smoother, with less/no need for hands-on operations.

You are absolutely right there. I just wanted to chip in with my
experience of "it nags at me, but it will still work out", so that other
people finding these mails later on can feel a bit relieved knowing that
a few toofull warnings aren't a major disaster and that this sometimes
happens, because ceph looks at all possible moves, even those that will
run late in the rebalancing.

-- 
May the most significant bit of your life be positive.


[ceph-users] Re: [EXTERNAL] Re: RGW Bucket Notifications and MultiPart Uploads

2022-07-20 Thread Mark Selby
I have not tested with Quincy/17.x yet so I do not know which notifications are 
sent for Multipart uploads in this release set.

I know that for Pacific/16.x I needed to add some code/logic to only act 
on notifications that represented the end state of an object creation.

My tests show that when a multipart upload is in progress, if you perform 
a HEAD on the object before the final part is uploaded, you will get back 
a 200 with the size that the object will eventually be. The multiple 
notifications that occur during the multipart upload have size set to the 
chunk being uploaded, not the total size.

In order to get the behavior that I wanted (act on an object only after 
it has been fully uploaded), I had to code the following:

- get a notification and get its size
- HEAD the object in RGW and get its size
- if the sizes do not match, do nothing
- when the size in the notification matches the size from the HEAD 
request, the event type is ObjectCreated:CompleteMultipartUpload and we 
know the upload is complete
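
The steps above might look roughly like this in shell (a sketch;
endpoint, bucket, and key are placeholders, and `event.json` stands for
a saved bucket-notification payload in the S3 event format):

```shell
# Size reported in the notification:
notified_size=$(jq -r '.Records[0].s3.object.size' event.json)
# Size reported by a HEAD on the object in RGW:
head_size=$(aws --endpoint-url http://rgw.example.com s3api head-object \
    --bucket mybucket --key mykey \
    --query ContentLength --output text)
if [ "$notified_size" = "$head_size" ]; then
    echo "sizes match: upload complete, safe to act on the object"
fi
```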

This is a bunch of extra code and round trips that I wish I did not have 
to make.

Hopefully Quincy only sends two notifications for a multipart upload: 
(1) the initial POST and (2) the final PUT.



-- 


Mark Selby
Sr Linux Administrator, The Voleon Group
mse...@voleon.com 
 
 This email is subject to important conditions and disclosures that are listed 
on this web page: https://voleon.com/disclaimer/.
 
