[ceph-users] Re: A practical approach to efficiently store 100 billions small objects in Ceph

2021-03-11 Thread Loïc Dachary
Thanks for clarifying, I think I understand. The idea is that 1,000 ~4KB 
objects are packed together in RBD which stores them in a single 4MB RADOS 
object. Does that answer your question?
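(As a rough illustration, a minimal sketch assuming an image created with the default 4MB object size; the pool and image names are made up:)

$ rbd create --size 1T --object-size 4M packed/archive-001
$ rbd info packed/archive-001   # the 4 MiB object size is reported in the image header
# each backing 4 MiB RADOS object then holds on the order of 1,000 of the ~4KB payloads
# once they are packed contiguously into the image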

On 11/03/2021 08:22, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> It relates to this sentence:
> "The median object size is ~4KB, written in RBD images using the default 
> 4MB[0] object size. That will be ~100 millions RADOS objects instead of 100 
> billions."
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> -Original Message-
> From: Loïc Dachary  
> Sent: Thursday, March 11, 2021 2:10 PM
> To: Szabo, Istvan (Agoda) 
> Cc: Ceph Users 
> Subject: [ceph-users] Re: A practical approach to efficiently store 100 
> billions small objects in Ceph
>
> Email received from outside the company. If in doubt don't click links nor 
> open attachments!
> 
>
> Hi,
>
> On 11/03/2021 04:38, Szabo, Istvan (Agoda) wrote:
>> Does this mean that even in an object store the files which are smaller than
>> 4MB will be packed in one 4MB object?
> I'm not sure I understand the question. Would you be so kind as to rephrase 
> it?
>
> Cheers
>> -Original Message-
>> From: Loïc Dachary 
>> Sent: Thursday, March 11, 2021 2:13 AM
>> To: Konstantin Shalygin 
>> Cc: Ceph Users ; swh-de...@inria.fr
>> Subject: [ceph-users] Re: A practical approach to efficiently store 100 
>> billions small objects in Ceph
>>
>> Email received from outside the company. If in doubt don't click links nor 
>> open attachments!
>> 
>>
>> Hi Konstantin,
>>
>> Thanks for the advice. Luckily objects are packed together and Ceph will 
>> only see larger objects. The median object size is ~4KB, written in RBD 
>> images using the default 4MB[0] object size. That will be ~100 millions 
>> RADOS objects instead of 100 billions.
>>
>> Cheers
>>
>> [0] https://docs.ceph.com/en/latest/man/8/rbd/#cmdoption-rbd-object-size
>>
>> On 10/03/2021 17:44, Konstantin Shalygin wrote:
>>> Loic, please wait (or use shaman builds) for 14.2.17 because for clusters
>>> with billions of objects the code was not optimal [1] at the object delete step
>>>
>>>
>>> [1] https://tracker.ceph.com/issues/47044 
>>> 
>>> k
>>>
>>> Sent from my iPhone
>>>
 On 10 Mar 2021, at 17:55, Loïc Dachary  wrote:

 The next step will be to write and run benchmarks
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
>>
>>
>> 
>> This message is confidential and is for the sole use of the intended 
>> recipient(s). It may also be privileged or otherwise protected by copyright 
>> or other legal rules. If you have received it by mistake please let us know 
>> by reply email and delete it from your system. It is prohibited to copy this 
>> message or disclose its content to anyone. Any confidentiality or privilege 
>> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
>> the message. All messages sent to and from Agoda may be monitored to ensure 
>> compliance with company policies, to protect the company's interests and to 
>> remove potential malware. Electronic messages may be intercepted, amended, 
>> lost or deleted, or contain viruses.
> --
> Loïc Dachary, Artisan Logiciel Libre
>
>

-- 
Loïc Dachary, Artisan Logiciel Libre




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mon db growing. over 500Gb

2021-03-11 Thread Marc
From what I have read here in the past, growing monitor db is related to not having pg's in 'clean active' state
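
(A sketch of how to watch the store and compact it once PGs are active+clean again; the mon id is just an example:)

du -sh /var/lib/ceph/mon/*/store.db    # watch the per-monitor store size
ceph tell mon.a compact                # trigger an online RocksDB compaction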


> -Original Message-
> From: ricardo.re.azev...@gmail.com 
> Sent: 11 March 2021 00:59
> To: ceph-users@ceph.io
> Subject: [ceph-users] mon db growing. over 500Gb
> 
> Hi all,
> 
> 
> 
> I have a fairly pressing issue. I had a monitor fall out of quorum
> because
> it ran out of disk space during rebalancing from switching to upmap. I
> noticed all my monitor store.db started taking up nearly all disk space
> so I
> set noout, nobackfill and norecover and shutdown all the monitor
> daemons.
> Each store.db was at:
> 
> 
> 
> mon.a 89GB (the one that first dropped out)
> 
> mon.b 400GB
> 
> mon.c 400GB
> 
> 
> I tried setting mon_compact_on_start. This brought  mon.a down to 1GB.
> Cool.
> However, when I try it on the other monitors it increased the db size
> ~1Gb/10s so I shut them down again.
> 
> Any idea what is going on? Or how can I shrink the db back down?
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unpurgeable rbd image from trash

2021-03-11 Thread Enrico Bocchi

Thanks a lot, that fixed the problem.

For the record, we are running nautilus 14.2.11 and there is no such 
'--input-file' option for setomapval.
`#rados -p volumes setomapval rbd_trash id_5afa5e5a07b8bc < key_file` 
does the trick.


Cheers,
Enrico


On 10/03/2021 17:17, Jason Dillaman wrote:

Odd, it looks like it's stuck in the "MOVING" state. Perhaps the "rbd
trash mv" command was aborted mid-operation? The way to work around
this issue is as follows:

$ rados -p volumes getomapval rbd_trash id_5afa5e5a07b8bc key_file
$ hexedit key_file ## CHANGE LAST BYTE FROM '01' to '00'
$ rados -p volumes setomapval rbd_trash id_5afa5e5a07b8bc --input-file key_file
$ rbd trash rm --pool volumes 5afa5e5a07b8bc

I'll open a ticket to expose the ability to run "rbd trash mv
--image-id " to work around an interrupted move operation in the
future and to clean up the error text.

On Wed, Mar 10, 2021 at 8:37 AM Enrico Bocchi  wrote:

Hello Jason,

# rados -p volumes listomapvals rbd_trash
id_5afa5e5a07b8bc
value (71 bytes) :
  02 01 41 00 00 00 00 2b  00 00 00 76 6f 6c 75 6d
|..A+...volum|
0010  65 2d 30 32 64 39 35 39  66 65 2d 61 36 39 33 2d
|e-02d959fe-a693-|
0020  34 61 63 62 2d 39 35 65  32 2d 63 61 30 34 62 39
|4acb-95e2-ca04b9|
0030  36 35 33 38 39 62 12 05  2a 60 09 c5 d4 16 12 05
|65389b..*`..|
0040  2a 60 09 c5 d4 16 01 |*`.|
0047

Hope that helps.
Cheers,
Enrico


On 10/03/2021 14:31, Jason Dillaman wrote:

Can you provide the output from "rados -p volumes listomapvals rbd_trash"?

On Wed, Mar 10, 2021 at 8:03 AM Enrico Bocchi  wrote:

Hello everyone,

We have an unpurgeable image living in the trash of one of our clusters:
# rbd --pool volumes trash ls
5afa5e5a07b8bc volume-02d959fe-a693-4acb-95e2-ca04b965389b

If we try to purge the whole trash it says the image is being restored
but we have never tried to do that:
# rbd --pool volumes trash purge
Removing images: 0% complete...failed.
2021-03-10 13:58:42.849 7f78b3fc9c80 -1 librbd::api::Trash: remove:
error: image is pending restoration.

When trying to delete manually, it says there are some watchers, but
this is actually not the case:

# rbd --pool volumes trash remove 5afa5e5a07b8bc
rbd: error: image still has watchers
2021-03-10 14:00:21.262 7f93ee8f8c80 -1 librbd::api::Trash: remove: error: image is pending restoration.
This means the image is still open or the client using it crashed. Try
again after closing/unmapping it or waiting 30s for the crashed client
to timeout.
Removing image:
0% complete...failed.

# rados listwatchers -p volumes rbd_header.5afa5e5a07b8bc
#

We have tried to stat the first 10 rbd_data objects and they were all
deleted.
We know we can manually delete the omap key from rbd_trash but we thought
it would be better to understand how an image might get into this state.
Has anyone seen this before?

Many thanks!
Cheers,
Enrico


--
Enrico Bocchi
CERN European Laboratory for Particle Physics
IT - Storage Group - General Storage Services
Mailbox: G20500 - Office: 31-2-010
1211 Genève 23
Switzerland
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
Enrico Bocchi
CERN European Laboratory for Particle Physics
IT - Storage Group - General Storage Services
Mailbox: G20500 - Office: 31-2-010
1211 Genève 23
Switzerland




--
Enrico Bocchi
CERN European Laboratory for Particle Physics
IT - Storage Group - General Storage Services
Mailbox: G20500 - Office: 31-2-010
1211 Genève 23
Switzerland
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mon db growing. over 500Gb

2021-03-11 Thread Andreas John
Hello,

I also observed an excessively growing mon DB during recovery. Luckily
we were able to solve it by extending the mon db disk.

Without having had the chance to re-check: the options nobackfill and
norecover might cause that behavior. It feels like the mon holds data that
cannot be flushed to an OSD.


rgds,

j.



On 11.03.21 10:47, Marc wrote:
> From what I have read here in the past, growing monitor db is related to not 
> having pg's in  'clean active' state
>
>
>> -Original Message-
>> From: ricardo.re.azev...@gmail.com 
>> Sent: 11 March 2021 00:59
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] mon db growing. over 500Gb
>>
>> Hi all,
>>
>>
>>
>> I have a fairly pressing issue. I had a monitor fall out of quorum
>> because
>> it ran out of disk space during rebalancing from switching to upmap. I
>> noticed all my monitor store.db started taking up nearly all disk space
>> so I
>> set noout, nobackfill and norecover and shutdown all the monitor
>> daemons.
>> Each store.db was at:
>>
>>
>>
>> mon.a 89GB (the one that first dropped out)
>>
>> mon.b 400GB
>>
>> mon.c 400GB
>>
>>
>> I tried setting mon_compact_on_start. This brought  mon.a down to 1GB.
>> Cool.
>> However, when I try it on the other monitors it increased the db size
>> ~1Gb/10s so I shut them down again.
>>
>> Any idea what is going on? Or how can I shrink the db back down?
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad
Before I started the upgrade the cluster was healthy, but one OSD (osd.355)
was down; I can't remember if it was in or out.

Upgrade was started with
ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9


The upgrade started but when Ceph tried to upgrade osd.355 it paused 
with the following messages:


2021-03-11T09:15:35.638104+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Target is goharbor.example.com/library/ceph/ceph:v15.2.9 with id dfc48307963697ff48acd9dd6fda4a7a24017b9d8124f86c2a542b0802fe77ba
2021-03-11T09:15:35.639882+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mgr daemons...
2021-03-11T09:15:35.644170+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mgr daemons are up to date.
2021-03-11T09:15:35.644376+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mon daemons...
2021-03-11T09:15:35.647669+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mon daemons are up to date.
2021-03-11T09:15:35.647866+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking crash daemons...
2021-03-11T09:15:35.652035+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Setting container_image for all crash...
2021-03-11T09:15:35.653683+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: All crash daemons are up to date.
2021-03-11T09:15:35.653896+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking osd daemons...
2021-03-11T09:15:36.273345+ mgr.pech-mon-2.cjeiyc [INF] It is presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273504+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: It is presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273887+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Redeploying osd.355
2021-03-11T09:15:36.276673+ mgr.pech-mon-2.cjeiyc [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.355 on host pech-hd-009 failed.



One of the first things the upgrade did was to upgrade the mons, so they were
restarted, and now osd.355 no longer exists:


$ ceph osd info osd.355
Error EINVAL: osd.355 does not exist

But if I run a resume
ceph orch upgrade resume
it still tries to upgrade osd.355, same message as above.

I tried to stop and start the upgrade again with
ceph orch upgrade stop
ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9

it still tries to upgrade osd.355, with the same message as above.

Looking at the source code, it looks like it gets the daemons to upgrade from
the mgr cache, so I restarted both mgrs, but it still tries to upgrade
osd.355.



Does anyone know how I can get the upgrade to continue?

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: A practical approach to efficiently store 100 billions small objects in Ceph

2021-03-11 Thread Szabo, Istvan (Agoda)
Hi,

It relates to this sentence:
"The median object size is ~4KB, written in RBD images using the default 4MB[0] 
object size. That will be ~100 millions RADOS objects instead of 100 billions."

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Loïc Dachary  
Sent: Thursday, March 11, 2021 2:10 PM
To: Szabo, Istvan (Agoda) 
Cc: Ceph Users 
Subject: [ceph-users] Re: A practical approach to efficiently store 100 
billions small objects in Ceph

Email received from outside the company. If in doubt don't click links nor open 
attachments!


Hi,

On 11/03/2021 04:38, Szabo, Istvan (Agoda) wrote:
> Does this mean that even in an object store the files which are smaller than
> 4MB will be packed in one 4MB object?
I'm not sure I understand the question. Would you be so kind as to rephrase it?

Cheers
>
> -Original Message-
> From: Loïc Dachary 
> Sent: Thursday, March 11, 2021 2:13 AM
> To: Konstantin Shalygin 
> Cc: Ceph Users ; swh-de...@inria.fr
> Subject: [ceph-users] Re: A practical approach to efficiently store 100 
> billions small objects in Ceph
>
> Email received from outside the company. If in doubt don't click links nor 
> open attachments!
> 
>
> Hi Konstantin,
>
> Thanks for the advice. Luckily objects are packed together and Ceph will only 
> see larger objects. The median object size is ~4KB, written in RBD images 
> using the default 4MB[0] object size. That will be ~100 millions RADOS 
> objects instead of 100 billions.
>
> Cheers
>
> [0] https://docs.ceph.com/en/latest/man/8/rbd/#cmdoption-rbd-object-size
>
> On 10/03/2021 17:44, Konstantin Shalygin wrote:
>> Loic, please wait (or use shaman builds) for 14.2.17 because for clusters with
>> billions of objects the code was not optimal [1] at the object delete step
>>
>>
>> [1] https://tracker.ceph.com/issues/47044 
>> 
>> k
>>
>> Sent from my iPhone
>>
>>> On 10 Mar 2021, at 17:55, Loïc Dachary  wrote:
>>>
>>> The next step will be to write and run benchmarks
> --
> Loïc Dachary, Artisan Logiciel Libre
>
>
>
> 
> This message is confidential and is for the sole use of the intended 
> recipient(s). It may also be privileged or otherwise protected by copyright 
> or other legal rules. If you have received it by mistake please let us know 
> by reply email and delete it from your system. It is prohibited to copy this 
> message or disclose its content to anyone. Any confidentiality or privilege 
> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
> the message. All messages sent to and from Agoda may be monitored to ensure 
> compliance with company policies, to protect the company's interests and to 
> remove potential malware. Electronic messages may be intercepted, amended, 
> lost or deleted, or contain viruses.

--
Loïc Dachary, Artisan Logiciel Libre


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how smart is ceph recovery?

2021-03-11 Thread Marc
> >
> > 2. If a down host comes up again and it's osd are started. Is data
> still being copied, or does ceph see that checksums(?)
> 
> PG or RADOS object epoch I think. So if data hasn’t changed, the
> recovery completes without having anything to do.
> 
> > are the same and just sets a pointer(?) 

> back to the old location?
Yes I mean the osdmap having the old osd on the node that was down.

Hmmm, I currently have PGs in 'active+remapped+backfill_wait' in a pool
rbd.backup, in which I know nothing has changed, and the OSDs listed[1]
are already up. Especially towards the 'end' of recovery, where
osd_max_backfills=X has less effect and recovery takes longer, it would be
nice to have the "no work needed" PGs processed quickly.

[1]
[18,25,41]p18 [18,41,17]
[5,0,25]p5 [5,0,4
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: A practical approach to efficiently store 100 billions small objects in Ceph

2021-03-11 Thread Szabo, Istvan (Agoda)
Yeah, that makes sense and sounds like a good idea :) I've never thought about
this; I will consider it for the object stores in our clusters.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2021. Mar 11., at 15:06, Loïc Dachary  wrote:

Thanks for clarifying, I think I understand. The idea is that 1,000 ~4KB 
objects are packed together in RBD which stores them in a single 4MB RADOS 
object. Does that answer your question?

On 11/03/2021 08:22, Szabo, Istvan (Agoda) wrote:
Hi,

It relates to this sentence:
"The median object size is ~4KB, written in RBD images using the default 4MB[0] 
object size. That will be ~100 millions RADOS objects instead of 100 billions."

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Loïc Dachary 
Sent: Thursday, March 11, 2021 2:10 PM
To: Szabo, Istvan (Agoda) 
Cc: Ceph Users 
Subject: [ceph-users] Re: A practical approach to efficiently store 100 
billions small objects in Ceph

Email received from outside the company. If in doubt don't click links nor open 
attachments!


Hi,

On 11/03/2021 04:38, Szabo, Istvan (Agoda) wrote:
Does this mean that even in an object store the files which are smaller than 4MB
will be packed in one 4MB object?
I'm not sure I understand the question. Would you be so kind as to rephrase it?

Cheers
-Original Message-
From: Loïc Dachary 
Sent: Thursday, March 11, 2021 2:13 AM
To: Konstantin Shalygin 
Cc: Ceph Users ; swh-de...@inria.fr
Subject: [ceph-users] Re: A practical approach to efficiently store 100 
billions small objects in Ceph

Email received from outside the company. If in doubt don't click links nor open 
attachments!


Hi Konstantin,

Thanks for the advice. Luckily objects are packed together and Ceph will only 
see larger objects. The median object size is ~4KB, written in RBD images using 
the default 4MB[0] object size. That will be ~100 millions RADOS objects 
instead of 100 billions.

Cheers

[0] https://docs.ceph.com/en/latest/man/8/rbd/#cmdoption-rbd-object-size

On 10/03/2021 17:44, Konstantin Shalygin wrote:
Loic, please wait (or use shaman builds) for 14.2.17 because for clusters with
billions of objects the code was not optimal [1] at the object delete step


[1] https://tracker.ceph.com/issues/47044 

k

Sent from my iPhone

On 10 Mar 2021, at 17:55, Loïc Dachary  wrote:

The next step will be to write and run benchmarks
--
Loïc Dachary, Artisan Logiciel Libre




This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
--
Loïc Dachary, Artisan Logiciel Libre



--
Loïc Dachary, Artisan Logiciel Libre


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Adrian Sevcenco
Hi! After an initial bumpy bootstrapping (IMHO the defaults should be
whatever is already defined in the user's .ssh and custom values set up
with cli arguments) now i'm stuck adding any service/hosts/osds because
apparently i lack orchestration .. and the documentation shows a big
"Page does not exist"

see
https://docs.ceph.com/en/latest/docs/octopus/mgr/orchestrator

so, what is it and what options do i have?
to set up it seems that is as easy as:
ceph orch set backend

I just started with ceph and i just want to start a ceph service (i 
cannot call it a cluster) on my desktop (with 2 dedicated osds) to get 
familiar also with usage.


Thanks a lot!
Adrian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Janne Johansson
Den tors 11 mars 2021 kl 13:56 skrev Adrian Sevcenco :
> apparently i lack orchestration .. the the documentation show a big
> "Page does not exist"
> see
> https://docs.ceph.com/en/latest/docs/octopus/mgr/orchestrator
>

Where does this link come from?

Usually "latest" and an actual release name (octopus in this case)
don't appear both in an docs url, so wherever you saw that particular
url must have a broken reference.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Has anyone contact Data for Samsung Datacenter SSD Support ?

2021-03-11 Thread Christoph Adomeit
Hi,

I hope someone here can help me out with contact details, an email address or
phone number for Samsung Datacenter SSD Support? If I contact standard Samsung
Datacenter Support they tell me they are not there to support PM1735 drives.

We are planning a new Ceph cluster and we are thinking of Samsung PM1735 NVMe
U.2 SSDs.

Unfortunately the PM1735 is not available with a U.2 interface, but the PM1733 is.

Some manager from Samsung once told me that the PM1733 and PM1735 are exactly the
same hardware, only provisioned differently.

But he did not know whom to ask. Any idea whom I could contact at Samsung, or
how to provision the PM1733 (7.6TB) as a PM1735 (6.4TB)?

I want the provisioning for better endurance (3 DWPD instead of 1 DWPD).


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] how to tell balancer to balance

2021-03-11 Thread Boris Behrens
Hi,
I know this topic seems to be handled a lot (as far as I can see), but I
reached the end of my google_foo.

* We have OSDs that are near full, but there are also OSDs that are only
loaded with 50%.
* We have 4,8,16 TB rotating disks in the cluster.
* The disks that get packed are 4TB disks and very empty disks are also 4TB
* The OSD nodes are all around the same total disk space (51 - 59)
* The balancer tells me that it cannot find further optimization, or that
pg_num is decreasing.

How can I debug further before the cluster goes into a bad state?

[root@s3db1 ~]# ceph osd df tree | sort -nk 17 | head -n 30
ID  CLASS WEIGHT    REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
MIN/MAX VAR: 0.75/1.23  STDDEV: 6.96
                             TOTAL 673 TiB 474 TiB 452 TiB 100 GiB 1.2 TiB 199 TiB 70.39
 -3        58.49872       -  58 TiB  39 TiB  36 TiB 8.2 GiB  85 GiB  19 TiB 67.40 0.96    -        host s3db2
 -4        58.49872       -  58 TiB  40 TiB  35 TiB  35 GiB  81 GiB  19 TiB 67.91 0.96    -        host s3db3
-11        50.94173       -  51 TiB  35 TiB  35 TiB 3.5 GiB  94 GiB  16 TiB 68.00 0.97    -        host s3db10
-10        51.28331       -  51 TiB  35 TiB  35 TiB 4.6 GiB  93 GiB  16 TiB 69.16 0.98    -        host s3db9
 -6        58.89636       -  59 TiB  41 TiB  40 TiB 2.4 GiB 102 GiB  18 TiB 69.15 0.98    -        host s3db5
-12        50.99052       -  51 TiB  36 TiB  36 TiB 1.8 GiB  93 GiB  15 TiB 69.99 0.99    -        host s3db11
 -2        58.20561       -  58 TiB  41 TiB  37 TiB 9.6 GiB  96 GiB  17 TiB 70.00 0.99    -        host s3db1
 -1       673.44452       - 673 TiB 474 TiB 452 TiB 100 GiB 1.2 TiB 199 TiB 70.39 1.00    -        root default
 -5        58.49872       -  58 TiB  42 TiB  35 TiB 7.0 GiB  94 GiB  17 TiB 71.06 1.01    -        host s3db4
 -8        58.89636       -  59 TiB  42 TiB  42 TiB 3.6 GiB 108 GiB  17 TiB 71.91 1.02    -        host s3db7
 -7        58.89636       -  59 TiB  43 TiB  42 TiB  15 GiB 120 GiB  16 TiB 72.69 1.03    -        host s3db6
-37        58.55478       -  59 TiB  43 TiB  43 TiB 4.4 GiB 117 GiB  16 TiB 73.18 1.04    -        host s3db12
 -9        51.28331       -  51 TiB  38 TiB  38 TiB 4.9 GiB 103 GiB  13 TiB 74.18 1.05    -        host s3db8
 15   hdd   3.63689  1.0     3.6 TiB 1.9 TiB 1.7 TiB 2.1 GiB     0 B 1.7 TiB 52.87 0.75   45     up        osd.15
  6   hdd   3.63689  1.0     3.6 TiB 1.9 TiB 1.7 TiB 1.7 GiB     0 B 1.7 TiB 52.90 0.75   46     up        osd.6
 12   hdd   3.63689  1.0     3.6 TiB 1.9 TiB 1.7 TiB 570 MiB     0 B 1.7 TiB 53.04 0.75   41     up        osd.12
 81   hdd   3.63689  1.0     3.6 TiB 2.0 TiB 1.7 TiB 895 MiB     0 B 1.7 TiB 54.26 0.77   51     up        osd.81
 27   hdd   3.73630  1.0     3.7 TiB 2.1 TiB 2.0 TiB 6.8 MiB 5.8 GiB 1.6 TiB 56.12 0.80   47     up        osd.27
  3   hdd   3.63689  1.0     3.6 TiB 2.1 TiB 1.6 TiB 510 MiB     0 B 1.6 TiB 57.04 0.81   51     up        osd.3
  5   hdd   3.63689  1.0     3.6 TiB 2.1 TiB 1.5 TiB 431 MiB     0 B 1.5 TiB 57.88 0.82   49     up        osd.5
 80   hdd   3.63689  1.0     3.6 TiB 2.1 TiB 1.5 TiB 1.8 GiB     0 B 1.5 TiB 58.31 0.83   51     up        osd.80
 25   hdd   3.73630  1.0     3.7 TiB 2.2 TiB 2.1 TiB 4.1 MiB 6.1 GiB 1.5 TiB 58.91 0.84   39     up        osd.25
  0   hdd   3.73630  1.0     3.7 TiB 2.2 TiB 2.1 TiB  83 MiB 6.2 GiB 1.5 TiB 60.03 0.85   46     up        osd.0
 79   hdd   3.63689  1.0     3.6 TiB 2.3 TiB 1.4 TiB 1.8 GiB     0 B 1.4 TiB 62.53 0.89   47     up        osd.79
 61   hdd   7.32619  1.0     7.3 TiB 4.6 TiB 4.6 TiB 1.1 GiB  12 GiB 2.7 TiB 62.80 0.89  101     up        osd.61
 67   hdd   7.27739  1.0     7.3 TiB 4.6 TiB 4.6 TiB 557 MiB  13 GiB 2.7 TiB 63.29 0.90   96     up        osd.67
 72   hdd   7.32619  1.0     7.3 TiB 4.6 TiB 4.6 TiB 107 MiB  11 GiB 2.7 TiB 63.36 0.90   87     up        osd.72

[root@s3db1 ~]# ceph osd df tree | sort -nk 17 | tail
 51   hdd   7.27739  1.0     7.3 TiB 5.6 TiB 5.5 TiB 724 MiB  14 GiB 1.7 TiB 76.34 1.08  105     up        osd.51
 71   hdd   3.68750  1.0     3.7 TiB 2.8 TiB 2.8 TiB 3.7 MiB 7.8 GiB 867 GiB 77.04 1.09   47     up        osd.71
 82   hdd   3.63689  1.0     3.6 TiB 2.8 TiB 839 GiB 628 MiB     0 B 839 GiB 77.48 1.10   45     up        osd.82
 14   hdd   3.63689  1.0     3.6 TiB 2.9 TiB 777 GiB  18 GiB     0 B 777 GiB 79.14 1.12   59     up        osd.14
  4   hdd   3.63689  1.0     3.6 TiB 2.9 TiB 752 GiB 826 MiB     0 B 752 GiB 79.80 1.13   53     up        osd.4
 75   hdd   3.68750  1.0     3.7 TiB 2.9 TiB 2.9 TiB 523 MiB 8.2 GiB 757 GiB 79.95 1.14   53     up        osd.75
 76   hdd   3.68750  1.0     3.7 TiB 3.0 TiB 3.0 TiB 237 MiB 9.2 GiB 668 GiB 82.30 1.17   50     up        osd.76
 33   hdd   3.73630  1.0     3.7 TiB 3.1 TiB 3.0 TiB 380 MiB 8.5 GiB 671 GiB 82.46 1.17   57     up        osd.33
 34   hdd   3.73630  1.0     3.7 TiB 3.1 TiB 3.0 TiB 464 MiB 8.4 GiB 605 GiB 84.18 1.20   60     up        os
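
(For reference, a sketch of the balancer knobs usually checked in this situation; the deviation value is only an example:)

$ ceph balancer status                                    # is it on, and in which mode?
$ ceph osd set-require-min-compat-client luminous         # prerequisite for upmap mode
$ ceph balancer mode upmap                                # upmap usually evens out mixed-size OSDs better than crush-compat
$ ceph config set mgr mgr/balancer/upmap_max_deviation 1  # let it chase smaller per-OSD deviations
$ ceph balancer eval                                      # current distribution score; lower is better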

[ceph-users] Re: Alertmanager not using custom configuration template

2021-03-11 Thread Marc 'risson' Schmitt
Hi,

On Thu, 11 Mar 2021 11:47:44 +0100
Sebastian Wagner  wrote:
> Indeed. I just merged https://github.com/ceph/ceph/pull/39932
> which fixes the names of those config keys.
> 
> Might want to try again (with slashes instead of underscores).  

This was indeed the problem. Thanks for your fix!

Regards,

-- 
Marc 'risson' Schmitt
CRI - EPITA
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Adrian Sevcenco

On 3/11/21 3:07 PM, Sebastian Wagner wrote:

Hi Adrian,

Hi!


Am 11.03.21 um 13:55 schrieb Adrian Sevcenco:

Hi! After an initial bumpy bootstrapping (IMHO the defaults should be
whatever is already defined in .ssh of the user and custom values setup
with cli arguments) now i'm stuck adding any service/hosts/osds because
apparently i lack orchestration


did you call bootstrap with --skip-ssh? Might explain this.
yes.. i did a lot of trial and error to make it work with the already existing
keys and ~/.ssh/config .. quite frustrating ..



Don't know how you ended up with your url, but the correct one is:



https://docs.ceph.com/en/octopus/mgr/orchestrator/

this is the link presented by the dashboard when going to Hosts or Inventory
and warning me that i lack orchestration and pointing me to this
documentation link



$ ceph mgr module enable cephadm
$ ceph orch set backend cephadm

great thanks a lot!!
i can now add OSDs..


$ ceph cephadm set-ssh-config -i ...
$ ceph cephadm set-priv-key -i ...
$ ceph cephadm set-pub-key -i ...

i found that i can set these in the bootstrap...

Thanks a lot for help!
Adrian



Then


https://docs.ceph.com/en/octopus/cephadm/install/#add-hosts-to-the-cluster



should get you going again.

I'd recommend to avoid calling --skip-ssh to avoid this roundtrip.
Setting the ssh configs via



https://docs.ceph.com/en/latest/man/8/cephadm/#bootstrap


works better typically.



.. the the documentation show a big
"Page does not exist"
see
https://docs.ceph.com/en/latest/docs/octopus/mgr/orchestrator

so, what is it and what options do i have?
to set up it seems that is as easy as:
ceph orch set backend

I just started with ceph and i just want to start a ceph service (i
cannot call it a cluster) on my desktop (with 2 dedicated osds) to get
familiar also with usage.

Thanks a lot!
Adrian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io






--
--
Adrian Sevcenco, Ph.D.   |
Institute of Space Science - ISS, Romania|
adrian.sevcenco at {cern.ch,spacescience.ro} |
--



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 3 x OSD work start after host reboot

2021-03-11 Thread Andrew Walker-Brown
Hi all,

I’m just testing a new cluster and after shutting down one of the hosts, when I
bring it back up none of the OSDs will restart.

The OSD services fail to start, and systemctl status for the service
states “failed with result ‘exit-code’”.

Where to start looking for the root cause?
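
(For anyone hitting the same thing, a sketch of the usual first checks, assuming a package-based systemd deployment and osd.3 as an example id:)

systemctl status ceph-osd@3
journalctl -b -u ceph-osd@3 --no-pager      # the actual error is usually in the journal
ceph-volume lvm list                        # confirm the OSD's LVs are still visible after the reboot
tail -n 100 /var/log/ceph/ceph-osd.3.log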

Kind regards,

Andrew


Sent from Mail for Windows 10
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] NVME pool creation time :: OSD services strange state

2021-03-11 Thread Adrian Sevcenco
Hi! So, after i selected the tags to add 2 nvme ssds i declared a
replicated n=2 pool .. and for the last 30 min the progress shown in the
notification is 0%, iotop shows around 100K/s for 2 (???) ceph-mon
processes, and that is all ...


and in my service list the osd services look somehow empty:
https://prntscr.com/10iwwbh

what did i miss?

Thanks a lot!
Adrian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Alertmanager not using custom configuration template

2021-03-11 Thread Sebastian Wagner
Hi Marc,

Indeed. I just merged https://github.com/ceph/ceph/pull/39932
which fixes the names of those config keys.

Might want to try again (with slashes instead of underscores).

Thanks for reporting this,

Sebastian

Am 10.03.21 um 15:34 schrieb Marc 'risson' Schmitt:
> Hi,
> 
> I'm trying to use a custom template for Alertmanager deployed with
> Cephadm. Following its documentation[1], I set the option
> `mgr/cephadm/alertmanager_alertmanager.yml` to my own template,
> restarted the mgr, and re-deployed Alertmanager. However, Cephadm seems
> to always use its internal template.
> 
> After some debugging, I found that the mgr indeed queries for
> `mgr/cephadm/alertmanager_alertmanager.yml`, but the end template
> always is the default one.
> 
> Am I doing something wrong? Is anyone else having this issue?
> 
> [1]
> https://docs.ceph.com/en/octopus/cephadm/monitoring/#using-custom-configuration-files
> 
> Regards,
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad

Hi Sebastian

On 11.03.2021 13:13, Sebastian Wagner wrote:

looks like

$ ssh pech-hd-009
# cephadm ls

is returning this non-existent OSDs.

can you verify that `cephadm ls` on that host doesn't
print osd.355 ?


"cephadm ls" on the node does list this drive

{
"style": "cephadm:v1",
"name": "osd.355",
"fsid": "3614abcc-201c-11eb-995a-2794bcc75ae0",
"systemd_unit": "ceph-3614abcc-201c-11eb-995a-2794bcc75ae0@osd.355",
"enabled": true,
"state": "stopped",
"container_id": null,
"container_image_name": 
"goharbor.example.com/library/ceph/ceph:v15.2.5",

"container_image_id": null,
"version": null,
"started": null,
"created": "2021-01-20T09:53:22.229080",
"deployed": "2021-02-09T09:24:02.855576",
"configured": "2021-02-09T09:24:04.211587"
}


To resolve it, could I just remove it with "cephadm rm-daemon"?

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Openstack rbd image Error deleting problem

2021-03-11 Thread Konstantin Shalygin
You can enable the object-map feature online and rebuild it. This will help
with deleting the objects.
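
(A sketch of the commands involved; the pool/image spec is a placeholder:)

$ rbd feature enable volumes/volume-xyz object-map
$ rbd object-map rebuild volumes/volume-xyz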


k

Sent from my iPhone

> On 11 Mar 2021, at 04:05, Norman.Kern  wrote:
> 
> No, I use its default features like this:
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Container deployment - Ceph-volume activation

2021-03-11 Thread Cloud Guy
Hello,



TL;DR

Looking for guidance on ceph-volume lvm activate --all as it would apply to
a containerized ceph deployment (Nautilus or Octopus).



Detail:

I’m planning to upgrade my Nautilus non-container cluster to Octopus
(eventually containerized).   There’s an expanded procedure that was tested
and working in our lab, but I won’t go into the whole process here.   My
question is around existing OSD hosts.



I have to re-platform the host OS, and one of the ways the OSDs were
reactivated previously when this was done (non-containerized) was to
install ceph packages, deploy keys, config, etc., then run ceph-volume lvm
activate --all to magically bring up all OSDs.
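
(Roughly what that non-containerized sequence looks like, as a sketch:)

# 1. reinstall the ceph packages on the re-platformed host
# 2. restore /etc/ceph/ceph.conf and the OSD bootstrap keyring
# 3. then rediscover and start the existing LVM-based OSDs:
ceph-volume lvm activate --all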



Looking for a similar approach except if the OSDs are containerized, and I
re-platform the host OS (Centos -> Ubuntu), how could I reactivate all OSDs
as containers and avoid rebuilding data on the OSDs?



Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NVME pool creation time :: OSD services strange state

2021-03-11 Thread Adrian Sevcenco

On 3/11/21 4:45 PM, Adrian Sevcenco wrote:
Hi! So, after i selected the tags to add 2 nvme ssds i declared a 
replicated n=2 pool .. and for the last 30 min the progress shown in 
notification is 0% and iotop shows around 100K/s for 2 (???) ceph-mon 
processes and that all ...


and in my service list the osd services look somehow empty:
https://prntscr.com/10iwwbh
so, i found errors in the logs related to the OSDs, and i saved the logs
here: https://asevcenc.web.cern.ch/asevcenc/CEPH/osd_log.txt


it would seem that this is a python incompatibility ..
is it a problem that cephadm was downloaded with curl while locally i have:
rpm -qa | grep ceph
python3-ceph-common-15.2.9-1.fc33.x86_64
python3-ceph-argparse-15.2.9-1.fc33.x86_64
libcephfs2-15.2.9-1.fc33.x86_64
python3-cephfs-15.2.9-1.fc33.x86_64
ceph-common-15.2.9-1.fc33.x86_64

?

Thank you!
Adrian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NVME pool creation time :: OSD services strange state - SOLVED

2021-03-11 Thread Adrian Sevcenco

On 3/11/21 5:01 PM, Adrian Sevcenco wrote:

On 3/11/21 4:45 PM, Adrian Sevcenco wrote:
Hi! So, after i selected the tags to add 2 nvme ssds i declared a 
replicated n=2 pool .. and for the last 30 min the progress shown in 
notification is 0% and iotop shows around 100K/s for 2 (???) ceph-mon 
processes and that all ...


and in my service list the osd services look somehow empty:
https://prntscr.com/10iwwbh
so, i found error in the logs related to OSDs, and i saved the logs 
here: https://asevcenc.web.cern.ch/asevcenc/CEPH/osd_log.txt
So, i noticed in the log that it complained about a GPT header being
present. After deleting it with dd and re-adding the OSDs, it seems that
the pool was created.


Adrian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad

On 11.03.2021 15:47, Sebastian Wagner wrote:

yes

Am 11.03.21 um 15:46 schrieb Kai Stian Olstad:


To resolve it, could I just remove it with "cephadm rm-daemon"?


That worked like a charm, and the upgrade is resumed.
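
(For the archive, roughly what was run, using the fsid reported by `cephadm ls` earlier in the thread:)

# on pech-hd-009
cephadm rm-daemon --fsid 3614abcc-201c-11eb-995a-2794bcc75ae0 --name osd.355
# then, from a mon/mgr node
ceph orch upgrade resume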

Thank you Sebastian.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSDs crashing after server reboot.

2021-03-11 Thread Cassiano Pilipavicius
Hi, please, if someone knows how to help: I have an HDD pool in my cluster, and
after rebooting one server my OSDs have started to crash.

This pool is a backup pool and has OSD as the failure domain with a size of 2.

After rebooting one server my OSDs started to crash, and the thing is only
getting worse. I have since tried to run ceph-bluestore-tool repair, and I
receive what I think is the same error that shows in the OSD logs:

[root@cwvh13 ~]# ceph-bluestore-tool repair --path
/var/lib/ceph/osd/ceph-81 --log-level 10
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
In function 'virtual Allocator::SocketHook::~SocketHook()' thread
7f6467ffcec0 time 2021-03-11 12:13:12.121766
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
53: FAILED ceph_assert(r == 0)
 ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x7f645e1a7b27]
 2: (()+0x25ccef) [0x7f645e1a7cef]
 3: (()+0x3cd57f) [0x5642e85c457f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x5642e85f3f37]
 5: (BlueStore::_close_alloc()+0x42) [0x5642e84379d2]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x5642e84bbac8]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x293) [0x5642e84bbf13]
 8: (main()+0x13cc) [0x5642e83caaec]
 9: (__libc_start_main()+0xf5) [0x7f645ae24555]
 10: (()+0x1fae9f) [0x5642e83f1e9f]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Sebastian Wagner
Hi Kai,

looks like

$ ssh pech-hd-009
# cephadm ls

is returning this non-existent OSDs.

can you verify that `cephadm ls` on that host doesn't
print osd.355 ?

Best,
Sebastian

Am 11.03.21 um 12:16 schrieb Kai Stian Olstad:
> Before I started the upgrade the cluster was healthy but one
> OSD(osd.355) was down, can't remember if it was in or out.
> Upgrade was started with
>     ceph orch upgrade start --image
> goharbor.example.com/library/ceph/ceph:v15.2.9
> 
> The upgrade started but when Ceph tried to upgrade osd.355 it paused
> with the following messages:
> 
>     2021-03-11T09:15:35.638104+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> Target is goharbor.example.com/library/ceph/ceph:v15.2.9 with id
> dfc48307963697ff48acd9dd6fda4a7a24017b9d8124f86c2
> a542b0802fe77ba
>     2021-03-11T09:15:35.639882+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> Checking mgr daemons...
>     2021-03-11T09:15:35.644170+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> All mgr daemons are up to date.
>     2021-03-11T09:15:35.644376+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> Checking mon daemons...
>     2021-03-11T09:15:35.647669+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> All mon daemons are up to date.
>     2021-03-11T09:15:35.647866+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> Checking crash daemons...
>     2021-03-11T09:15:35.652035+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> Setting container_image for all crash...
>     2021-03-11T09:15:35.653683+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> All crash daemons are up to date.
>     2021-03-11T09:15:35.653896+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> Checking osd daemons...
>     2021-03-11T09:15:36.273345+ mgr.pech-mon-2.cjeiyc [INF] It is
> presumed safe to stop ['osd.355']
>     2021-03-11T09:15:36.273504+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> It is presumed safe to stop ['osd.355']
>     2021-03-11T09:15:36.273887+ mgr.pech-mon-2.cjeiyc [INF] Upgrade:
> Redeploying osd.355
>     2021-03-11T09:15:36.276673+ mgr.pech-mon-2.cjeiyc [ERR] Upgrade:
> Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.355 on host
> pech-hd-009 failed.
> 
> 
> One of the first ting the upgrade did was to upgrade mon, so they are
> restarted and now the osd.355 no longer exist
> 
>     $ ceph osd info osd.355
>     Error EINVAL: osd.355 does not exist
> 
> But if I run a resume
>     ceph orch upgrade resume
> it still tries to upgrade osd.355, same message as above.
> 
> I tried to stop and start the upgrade again with
>     ceph orch upgrade stop
>     ceph orch upgrade start --image
> goharbor.example.com/library/ceph/ceph:v15.2.9
> it still tries to upgrade osd.355, with the same message as above.
> 
> Looking at the source code it looks like it get daemons to upgrade from
> mgr cache, so I restarted both mgr but still it tries to upgrade osd.355.
> 
> 
> Does anyone know how I can get the upgrade to continue?
> 
> -- 
> Kai Stian Olstad
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs crashing after server reboot.

2021-03-11 Thread Igor Fedotov

Hi Cassiano,

the backtrace you've provided relates to the bug fixed by: 
https://github.com/ceph/ceph/pull/37793


This fix is going to be released with the upcoming v14.2.17.


But I doubt that your original crashes have the same root cause - this 
issue appears during shutdown only.


Anyway, you can work around it by using a different allocator: avl or bitmap.
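
(A minimal sketch of switching it, assuming the cluster config database is used; the new value takes effect when the OSD is restarted:)

$ ceph config set osd bluestore_allocator bitmap
# or in ceph.conf, which offline tools such as ceph-bluestore-tool also read:
#   [osd]
#   bluestore_allocator = bitmap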


Thanks,

Igor


On 3/11/2021 6:21 PM, Cassiano Pilipavicius wrote:

Hi, please if someone know how to help, I have an HDD pool in mycluster and
after rebooting one server,  my osds has started to crash.

This pool is a backup pool and have OSD as failure domain with an size of 2.

After rebooting one server, My osds started to crash, and the thing is only
getting worse. I have then tried to run ceph-bluestore-tool repair and I
receive what I think is the same error that shows on the osd logs:

[root@cwvh13 ~]# ceph-bluestore-tool repair --path
/var/lib/ceph/osd/ceph-81 --log-level 10
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
In function 'virtual Allocator::SocketHook::~SocketHook()' thread
7f6467ffcec0 time 2021-03-11 12:13:12.121766
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/Allocator.cc:
53: FAILED ceph_assert(r == 0)
  ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
(stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x7f645e1a7b27]
  2: (()+0x25ccef) [0x7f645e1a7cef]
  3: (()+0x3cd57f) [0x5642e85c457f]
  4: (HybridAllocator::~HybridAllocator()+0x17) [0x5642e85f3f37]
  5: (BlueStore::_close_alloc()+0x42) [0x5642e84379d2]
  6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x5642e84bbac8]
  7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x293) [0x5642e84bbf13]
  8: (main()+0x13cc) [0x5642e83caaec]
  9: (__libc_start_main()+0xf5) [0x7f645ae24555]
  10: (()+0x1fae9f) [0x5642e83f1e9f]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph osd Reweight command in octopus

2021-03-11 Thread Brent Kennedy
We have a ceph octopus cluster running 15.2.6; it's indicating a near-full
OSD which I can see is not weighted equally with the rest of the OSDs.  I
tried the usual "ceph osd reweight osd.0 0.95" to force it down a
little, but unlike on the nautilus clusters, I see no data movement when
issuing the command.  If I run ceph osd tree, it shows the reweight
setting, but no data movement appears to be occurring.

 

Is there some new thing in octopus I am missing?  I looked through the
release notes for .7, .8 and .9 and didn't see any fixes that jumped out as
resolving a bug related to this.  The Octopus cluster was deployed using
ceph-ansible and upgraded to 15.2.6.  I plan to upgrade to 15.2.9 in the
coming month.

 

Any thoughts?

 

Regards,

-Brent

 

Existing Clusters:

Test: Octopus 15.2.5 ( all virtual on nvme )

US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4
gateways, 2 iscsi gateways

UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4
gateways, 2 iscsi gateways

US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways,
2 iscsi gateways

UK Production(SSD): Octopus 15.2.6 with 5 osd servers, 3 mons, 4 gateways

 

 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Dave Hall
Istvan,

I agree that there is always risk with failure-domain < node, especially
with EC pools.  We are accepting this risk to lower the financial barrier
to entry.

In our minds, we have good power protection and new hardware, so the
greatest immediate risks for our smaller cluster (approaching 6 OSD nodes
and 48 HDDs) are NVMe write exhaustion and HDD failures.   Since we have
multiple OSDs sharing a single NVMe device it occurs to me that we might
want to get Ceph to 'map' against that.  In a way, NVMe devices are our
'nodes' at the current size of our cluster.
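
(For illustration only, a sketch of how a per-NVMe level could be expressed in CRUSH; the bucket names are hypothetical and reusing the otherwise unused 'chassis' type is just one possible choice:)

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# in crush.txt, declare one bucket per NVMe device listing the OSDs that share it, e.g.:
#   chassis node1-nvme0 {
#       id -101
#       alg straw2
#       hash 0
#       item osd.0 weight 7.277
#       item osd.1 weight 7.277
#   }
# reference these buckets from the host buckets instead of the raw OSDs, and have the
# EC rule separate chunks at that level:
#   step chooseleaf indep 0 type chassis
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new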

-Dave

--
Dave Hall
Binghamton University

On Wed, Mar 10, 2021 at 10:41 PM Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:

> Don't forget if you have server failure you might loose many objects. If
> the failure domain is osd, it means let's say you have 12 drives in each
> server, 8+2 EC in an unlucky situation can be located in 1 server also.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> -Original Message-
> From: Dave Hall 
> Sent: Wednesday, March 10, 2021 11:42 PM
> To: ceph-users 
> Subject: [ceph-users] Failure Domain = NVMe?
>
> Email received from outside the company. If in doubt don't click links nor
> open attachments!
> 
>
> Hello,
>
> In some documentation I was reading last night about laying out OSDs, it
> was suggested that if more that one OSD uses the same NVMe drive, the
> failure-domain should probably be set to node. However, for a small cluster
> the inclination is to use EC-pools and failure-domain = OSD.
>
> I was wondering if there is a middle ground - could we define
> failure-domain = NVMe?  I think the map would need to be defined manually
> in the same way that failure-domain = rack requires information about which
> nodes are in each rack.
>
> Example:  My latest OSD nodes have 8 HDDs and 3 U.2 NVMe.  I'd set up the
> WAL/DB for with HDDs per OSD  (wasted space on the 3rd NVMe).
> Across all my OSD nodes I will have 8 HDDs and either 2 or 3 NVMe
> devices per node - 15 total NVMe devices.   My preferred EC-pool profile
> is 8+2.  It seems that this profile could be safely dispersed across 15
> failure domains, resulting in protection against NVMe failure.
>
> Please let me know if this is worth pursuing.
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> 607-760-2328 (Cell)
> 607-777-4641 (Office)
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>
> 
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm (curl master)/15.2.9:: how to add orchestration

2021-03-11 Thread Sebastian Wagner
Hi Adrian,



Am 11.03.21 um 13:55 schrieb Adrian Sevcenco:
> Hi! After an initial bumpy bootstrapping (IMHO the defaults should be
> whatever is already defined in .ssh of the user and custom values setup
> with cli arguments) now i'm stuck adding any service/hosts/osds because
> apparently i lack orchestration

did you call bootstrap with --skip-ssh? Might explain this.

Don't know how you ended up with your url, but the correct one is:


> https://docs.ceph.com/en/octopus/mgr/orchestrator/

$ ceph mgr module enable cephadm
$ ceph orch set backend cephadm
$ ceph cephadm set-ssh-config -i ...
$ ceph cephadm set-priv-key -i ...
$ ceph cephadm set-pub-key -i ...

Then

> https://docs.ceph.com/en/octopus/cephadm/install/#add-hosts-to-the-cluster


should get you going again.

I'd recommend to avoid calling --skip-ssh to avoid this roundtrip.
Setting the ssh configs via


> https://docs.ceph.com/en/latest/man/8/cephadm/#bootstrap

works better typically.


> .. the the documentation show a big
> "Page does not exist"
> see
> https://docs.ceph.com/en/latest/docs/octopus/mgr/orchestrator
> 
> so, what is it and what options do i have?
> to set up it seems that is as easy as:
> ceph orch set backend
> 
> I just started with ceph and i just want to start a ceph service (i
> cannot call it a cluster) on my desktop (with 2 dedicated osds) to get
> familiar also with usage.
> 
> Thanks a lot!
> Adrian
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs crashing after server reboot.

2021-03-11 Thread Cassiano Pilipavicius
Hi, really this error was only showing up when I tried to run
ceph-bluestore-tool repair. On my 3 OSDs that keep crashing, they show the
following log... please let me know if there is something I can do to get
the pool back to a functioning state.

Uptime(secs): 0.0 total, 0.0 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 21.51 MB/s write, 0.00 GB read, 0.00
MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00
MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0
level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for
pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0
memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [default] **

   -32> 2021-03-11 14:25:55.812 7f50161b7700  5 prioritycache tune_memory
target: 4294967296 mapped: 445022208 unmapped: 59588608 heap: 504610816 old
mem: 134217728 new mem: 2564495564
   -31> 2021-03-11 14:25:55.813 7f50161b7700  5 prioritycache tune_memory
target: 4294967296 mapped: 445210624 unmapped: 59400192 heap: 504610816 old
mem: 2564495564 new mem: 2816296009
   -30> 2021-03-11 14:25:55.813 7f50161b7700  5
bluestore.MempoolThread(0x558aa9a8ea98) _trim_shards cache_size: 2816296009
kv_alloc: 956301312 kv_used: 6321600 meta_alloc: 956301312 meta_used: 11680
data_alloc: 620756992 data_used: 151552
   -29> 2021-03-11 14:25:55.838 7f502962ba80  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/cls/cephfs/cls_cephfs.cc:197:
loading cephfs
   -28> 2021-03-11 14:25:55.839 7f502962ba80  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/cls/hello/cls_hello.cc:296:
loading cls_hello
   -27> 2021-03-11 14:25:55.840 7f502962ba80  0 _get_class not permitted to
load kvs
   -26> 2021-03-11 14:25:55.840 7f502962ba80  0 _get_class not permitted to
load lua
   -25> 2021-03-11 14:25:55.852 7f502962ba80  0 _get_class not permitted to
load sdk
   -24> 2021-03-11 14:25:55.853 7f502962ba80  0 osd.80 697960 crush map has
features 283675107524608, adjusting msgr requires for clients
   -23> 2021-03-11 14:25:55.853 7f502962ba80  0 osd.80 697960 crush map has
features 283675107524608 was 8705, adjusting msgr requires for mons
   -22> 2021-03-11 14:25:55.853 7f502962ba80  0 osd.80 697960 crush map has
features 3026702624700514304, adjusting msgr requires for osds
   -21> 2021-03-11 14:25:56.814 7f50161b7700  5 prioritycache tune_memory
target: 4294967296 mapped: 501989376 unmapped: 21192704 heap: 523182080 old
mem: 2816296009 new mem: 2842012351
   -20> 2021-03-11 14:25:57.816 7f50161b7700  5 prioritycache tune_memory
target: 4294967296 mapped: 542867456 unmapped: 19210240 heap: 562077696 old
mem: 2842012351 new mem: 2844985645
   -19> 2021-03-11 14:25:58.327 7f502962ba80  0 osd.80 697960 load_pgs
   -18> 2021-03-11 14:25:58.818 7f50161b7700  5 prioritycache tune_memory
target: 4294967296 mapped: 596754432 unmapped: 229376 heap: 596983808 old
mem: 2844985645 new mem: 2845356061
   -17> 2021-03-11 14:25:59.818 7f50161b7700  5 prioritycache tune_memory
target: 4294967296 mapped: 679067648 unmapped: 753664 heap: 679821312 old
mem: 2845356061 new mem: 2845406382
   -16> 2021-03-11 14:26:00.820 7f50161b7700  5 prioritycache tune_memory
target: 4294967296 mapped: 729161728 unmapped: 991232 heap: 730152960 old
mem: 2845406382 new mem: 2845414228
   -15> 2021-03-11 14:26:00.820 7f50161b7700  5
bluestore.MempoolThread(0x558aa9a8ea98) _trim_shards cache_size: 2845414228
kv_alloc: 1174405120 kv_used: 166707616 meta_alloc: 1006632960 meta_used:
54703 data_alloc: 637534208 data_used: 745472
   -14> 2021-03-11 14:26:01.735 7f502962ba80  0 osd.80 697960 load_pgs
opened 63 pgs
   -13> 2021-03-11 14:26:01.735 7f502962ba80  0 osd.80 697960 using
weightedpriority op queue with priority op cut off at 64.
   -12> 2021-03-11 14:26:01.736 7f502962ba80 -1 osd.80 697960
log_to_monitors {default=true}
   -11> 2021-03-11 14:26:01.743 7f502962ba80 -1 osd.80 697960
mon_cmd_maybe_osd_create fail: 'osd.80 has already bound to class 'backup',
can not reset class to 'hdd'; use 'ceph osd crush rm-device-class ' to
remove old class first': (16) Device or resource busy
   -10> 2021-03-11 14:26:01.746 7f502962ba80  0 osd.80 697960 done with
init, starting boot process
-9> 2021-03-11 14:26:01.748 7f5013740700  4 mgrc handle_mgr_map Got map
version 175738
-8> 2021-03-11 14:26:01.748 7f5013740700  4 mgrc handle_mgr_map Active
mgr is now [v2:10.69.57.2:6802/3699587,v1:10.69.57.2

[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Steven Pine
One potential issue is maintenance after an NVMe failure. Depending on how
the hardware is configured, you will need to bring the whole node down to
replace the failed NVMe, which could cause PGs to become read-only if you
are close to your min threshold. I think the additional risk is not worth
it, but if you move ahead anyway you should either not use EC or use a
profile with more parity, 8+3 or 8+4; or, since you only have 48 HDDs, a
lower count such as 6+3 might be better.
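
For example, a 6+3 profile with per-host placement would look something
like this (profile and pool names here are just placeholders):

ceph osd erasure-code-profile set ec-6-3 k=6 m=3 crush-failure-domain=host
ceph osd pool create ecpool-6-3 128 128 erasure ec-6-3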

On Thu, Mar 11, 2021 at 8:03 AM Dave Hall  wrote:

> Istvan,
>
> I agree that there is always risk with failure-domain < node, especially
> with EC pools.  We are accepting this risk to lower the financial barrier
> to entry.
>
> In our minds, we have good power protection and new hardware, so the
> greatest immediate risks for our smaller cluster (approaching 6 OSD nodes
> and 48 HDDs) are NVMe write exhaustion and HDD failures.   Since we have
> multiple OSDs sharing a single NVMe device it occurs to me that we might
> want to get Ceph to 'map' against that.  In a way, NVMe devices are our
> 'nodes' at the current size of our cluster.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
>
> On Wed, Mar 10, 2021 at 10:41 PM Szabo, Istvan (Agoda) <
> istvan.sz...@agoda.com> wrote:
>
> > Don't forget if you have server failure you might loose many objects. If
> > the failure domain is osd, it means let's say you have 12 drives in each
> > server, 8+2 EC in an unlucky situation can be located in 1 server also.
> >
> > Istvan Szabo
> > Senior Infrastructure Engineer
> > ---
> > Agoda Services Co., Ltd.
> > e: istvan.sz...@agoda.com
> > ---
> >
> > -Original Message-
> > From: Dave Hall 
> > Sent: Wednesday, March 10, 2021 11:42 PM
> > To: ceph-users 
> > Subject: [ceph-users] Failure Domain = NVMe?
> >
> > Email received from outside the company. If in doubt don't click links
> nor
> > open attachments!
> > 
> >
> > Hello,
> >
> > In some documentation I was reading last night about laying out OSDs, it
> > was suggested that if more that one OSD uses the same NVMe drive, the
> > failure-domain should probably be set to node. However, for a small
> cluster
> > the inclination is to use EC-pools and failure-domain = OSD.
> >
> > I was wondering if there is a middle ground - could we define
> > failure-domain = NVMe?  I think the map would need to be defined manually
> > in the same way that failure-domain = rack requires information about
> which
> > nodes are in each rack.
> >
> > Example:  My latest OSD nodes have 8 HDDs and 3 U.2 NVMe.  I'd set up the
> > WAL/DB for with HDDs per OSD  (wasted space on the 3rd NVMe).
> > Across all my OSD nodes I will have 8 HDDs and either 2 or 3 NVMe
> > devices per node - 15 total NVMe devices.   My preferred EC-pool profile
> > is 8+2.  It seems that this profile could be safely dispersed across 15
> > failure domains, resulting in protection against NVMe failure.
> >
> > Please let me know if this is worth pursuing.
> >
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdh...@binghamton.edu
> > 607-760-2328 (Cell)
> > 607-777-4641 (Office)
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
> >
> > 
> > This message is confidential and is for the sole use of the intended
> > recipient(s). It may also be privileged or otherwise protected by
> copyright
> > or other legal rules. If you have received it by mistake please let us
> know
> > by reply email and delete it from your system. It is prohibited to copy
> > this message or disclose its content to anyone. Any confidentiality or
> > privilege is not waived or lost by any mistaken delivery or unauthorized
> > disclosure of the message. All messages sent to and from Agoda may be
> > monitored to ensure compliance with company policies, to protect the
> > company's interests and to remove potential malware. Electronic messages
> > may be intercepted, amended, lost or deleted, or contain viruses.
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Steven Pine

*E * steven.p...@webair.com  |  *P * 516.938.4100 x
*Webair* | 501 Franklin Avenue Suite 200, Garden City NY, 11530
webair.com
[image: Facebook icon]   [image:
Twitter icon]  [image: Linkedin icon]

NOTICE: This electronic mail message and all attachments transmitted with
it are intended solely for the use of the addressee and may contain legally
privileged proprietary and confidential information. If the reader of this
message is not the intended recipient, or if you are an

[ceph-users] Re: Container deployment - Ceph-volume activation

2021-03-11 Thread 胡 玮文
Hi,

Assuming you are using cephadm? Checkout this 
https://docs.ceph.com/en/latest/cephadm/osd/#activate-existing-osds


ceph cephadm osd activate ...
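
Roughly, after re-installing the OS the sequence would be something like
this (the host name is just an example; the existing OSD LVs on the disks
are left untouched):

# from a node with an admin keyring, after installing cephadm/podman on the
# re-installed host and copying the cluster's ssh key to it
ceph orch host add osd-host-01
ceph cephadm osd activate osd-host-01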

On 11 Mar 2021, at 23:01, Cloud Guy  wrote:

Hello,



TL;DR

Looking for guidance on ceph-volume lvm activate --all as it would apply to
a containerized ceph deployment (Nautilus or Octopus).



Detail:

I’m planning to upgrade my Nautilus non-container cluster to Octopus
(eventually containerized).   There’s an expanded procedure that was tested
and working in our lab, but I won’t go into the whole process here.   My
question is around existing OSD hosts.



I have to re-platform the host OS, and one of the ways the OSDs were
reactivated previously when this was done (non-containerized) was to
install the ceph packages, deploy keys, config, etc., then run ceph-volume
lvm activate --all to magically bring up all OSDs.



I’m looking for a similar approach: if the OSDs are containerized and I
re-platform the host OS (CentOS -> Ubuntu), how can I reactivate all OSDs
as containers and avoid rebuilding the data on them?



Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-11 Thread Frank Schilder
Hi Chris,

I found the problem. "ceph-volume simple activate" modifies the OSD's metadata 
in an invalid way.

On a pre lvm-converted ceph-disk OSD I had in my cupboard:

[root@ceph-adm:ceph-20 ~]# mount /dev/sdq1 mnt
[root@ceph-adm:ceph-20 ~]# ls -l mnt
[...]
lrwxrwxrwx. 1 ceph ceph  58 Mar 15  2019 block -> 
/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4
[..]
[root@ceph-adm:ceph-20 ~]# umount mnt

[root@ceph-adm:ceph-20 ~]# cat 
/etc/ceph/osd/59-9b88d6ec-87a4-4640-b80e-81d3d56fac15.json
{
"active": "ok",
"block": {
"path": "/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4",
"uuid": "a1e5ef7d-9bab-4911-abe5-9075b91d88a4"
},
"block_uuid": "a1e5ef7d-9bab-4911-abe5-9075b91d88a4",
"bluefs": 1,
"ceph_fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
"cluster_name": "ceph",
"data": {
"path": "/dev/sdq1",
"uuid": "9b88d6ec-87a4-4640-b80e-81d3d56fac15"
},
"fsid": "9b88d6ec-87a4-4640-b80e-81d3d56fac15",
"keyring": "AQBP4opcBeCYOxAA4sOpTthNE6T28WUf4Bgm3w==",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"none": "",
"ready": "ready",
"require_osd_release": "",
"type": "bluestore",
"whoami": 59
}

Now, "ceph-volume simple activate" modifies the symlink "block" to point to an 
unstable path:

[root@ceph-adm:ceph-20 ~]# ceph-volume simple activate --file 
"/etc/ceph/osd/59-9b88d6ec-87a4-4640-b80e-81d3d56fac15.json" --no-systemd
Running command: /usr/bin/mount -v /dev/sdq1 /var/lib/ceph/osd/ceph-59
 stdout: mount: /dev/sdq1 mounted on /var/lib/ceph/osd/ceph-59.
Running command: /usr/bin/ln -snf /dev/sdq2 /var/lib/ceph/osd/ceph-59/block
Running command: /usr/bin/chown -R ceph:ceph /dev/sdq2
--> Skipping enabling of `simple` systemd unit
--> Skipping masking of ceph-disk systemd units
--> Skipping enabling and starting OSD simple systemd unit because --no-systemd 
was used
--> Successfully activated OSD 59 with FSID 9b88d6ec-87a4-4640-b80e-81d3d56fac15

It's the command "/usr/bin/ln -snf /dev/sdq2 /var/lib/ceph/osd/ceph-59/block" 
that destroys the integrity of the OSD. If you reboot the machine and the 
devices get different names, the next execution of "ceph-volume simple scan" 
will produce a corrupted metadata file. This will also happen if you move a 
converted OSD to another host and try to scan+start it.

The change of the symbolic link to an unstable device path is a critical bug 
and I don't even understand why it happens in the first place. There is no 
point to it, and the only valid link target would be 
"/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4" anyway.

I can work around that by resetting the link to its correct value after 
activation. However, this should really be fixed.
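
For example, re-using the part-uuid from the json above (run after every 
"simple activate"):

ln -snf /dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4 /var/lib/ceph/osd/ceph-59/block
chown -h ceph:ceph /var/lib/ceph/osd/ceph-59/block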

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Chris Dunlop 
Sent: 03 March 2021 05:06:09
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] OSD id 241 != my id 248: conversion from "ceph-disk" 
to "ceph-volume simple" destroys OSDs

Hi Frank,

On Tue, Mar 02, 2021 at 02:58:05PM +, Frank Schilder wrote:
> Hi all,
>
> this is a follow-up on "reboot breaks OSDs converted from ceph-disk to 
> ceph-volume simple".
>
> I converted a number of ceph-disk OSDs to ceph-volume using "simple scan" and 
> "simple activate". Somewhere along the way, the OSDs meta-data gets rigged 
> and the prominent symptom is that the symlink block is changes from a 
> part-uuid target to an unstable device name target like:
>
> before conversion:
>
> block -> /dev/disk/by-partuuid/9123be91-7620-495a-a9b7-cc85b1de24b7
>
> after conversion:
>
> block -> /dev/sdj2
>
> This is a huge problem as the "after conversion" device names are unstable. I 
> have now a cluster that I cannot reboot servers on due to this problem. OSDs 
> randomly re-assigned devices will refuse to start with:
>
> 2021-03-02 15:56:21.709 7fb7c2549b80 -1 OSD id 241 != my id 248
>
> Please help me with getting out of this mess.


These paths might be coming from /etc/ceph/osd/*.json files.

Have your tried editing the files to replace /dev/sdXX path with the 
by-partuuid path?

Cheers,

Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Christian Wuerdig
For EC 8+2 you can get away with 5 hosts by ensuring each host gets 2
shards similar to this:
https://ceph.io/planet/erasure-code-on-small-clusters/
If a host dies/goes down you can still recover all data (although at that
stage your cluster is no longer available for client io).
You shouldn't just consider failures but also maintenance scenarios which
will require a node to be offline for some time. In particular a ceph upgrade
can take some time - especially if something goes wrong. You have no
breathing room left at that stage and your cluster will be dead until all
nodes are up again.


On Fri, 12 Mar 2021 at 02:03, Dave Hall  wrote:

> Istvan,
>
> I agree that there is always risk with failure-domain < node, especially
> with EC pools.  We are accepting this risk to lower the financial barrier
> to entry.
>
> In our minds, we have good power protection and new hardware, so the
> greatest immediate risks for our smaller cluster (approaching 6 OSD nodes
> and 48 HDDs) are NVMe write exhaustion and HDD failures.   Since we have
> multiple OSDs sharing a single NVMe device it occurs to me that we might
> want to get Ceph to 'map' against that.  In a way, NVMe devices are our
> 'nodes' at the current size of our cluster.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
>
> On Wed, Mar 10, 2021 at 10:41 PM Szabo, Istvan (Agoda) <
> istvan.sz...@agoda.com> wrote:
>
> > Don't forget if you have server failure you might loose many objects. If
> > the failure domain is osd, it means let's say you have 12 drives in each
> > server, 8+2 EC in an unlucky situation can be located in 1 server also.
> >
> > Istvan Szabo
> > Senior Infrastructure Engineer
> > ---
> > Agoda Services Co., Ltd.
> > e: istvan.sz...@agoda.com
> > ---
> >
> > -Original Message-
> > From: Dave Hall 
> > Sent: Wednesday, March 10, 2021 11:42 PM
> > To: ceph-users 
> > Subject: [ceph-users] Failure Domain = NVMe?
> >
> > Email received from outside the company. If in doubt don't click links
> nor
> > open attachments!
> > 
> >
> > Hello,
> >
> > In some documentation I was reading last night about laying out OSDs, it
> > was suggested that if more that one OSD uses the same NVMe drive, the
> > failure-domain should probably be set to node. However, for a small
> cluster
> > the inclination is to use EC-pools and failure-domain = OSD.
> >
> > I was wondering if there is a middle ground - could we define
> > failure-domain = NVMe?  I think the map would need to be defined manually
> > in the same way that failure-domain = rack requires information about
> which
> > nodes are in each rack.
> >
> > Example:  My latest OSD nodes have 8 HDDs and 3 U.2 NVMe.  I'd set up the
> > WAL/DB for with HDDs per OSD  (wasted space on the 3rd NVMe).
> > Across all my OSD nodes I will have 8 HDDs and either 2 or 3 NVMe
> > devices per node - 15 total NVMe devices.   My preferred EC-pool profile
> > is 8+2.  It seems that this profile could be safely dispersed across 15
> > failure domains, resulting in protection against NVMe failure.
> >
> > Please let me know if this is worth pursuing.
> >
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdh...@binghamton.edu
> > 607-760-2328 (Cell)
> > 607-777-4641 (Office)
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
> >
> > 
> > This message is confidential and is for the sole use of the intended
> > recipient(s). It may also be privileged or otherwise protected by
> copyright
> > or other legal rules. If you have received it by mistake please let us
> know
> > by reply email and delete it from your system. It is prohibited to copy
> > this message or disclose its content to anyone. Any confidentiality or
> > privilege is not waived or lost by any mistaken delivery or unauthorized
> > disclosure of the message. All messages sent to and from Agoda may be
> > monitored to ensure compliance with company policies, to protect the
> > company's interests and to remove potential malware. Electronic messages
> > may be intercepted, amended, lost or deleted, or contain viruses.
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Alertmanager not using custom configuration template

2021-03-11 Thread Marc 'risson' Schmitt
Quick follow-up on this,

On Thu, 11 Mar 2021 14:58:41 +0100
Marc 'risson' Schmitt  wrote:
> > Indeed. I just merged https://github.com/ceph/ceph/pull/39932
> > which fixes the names of those config keys.

Cephadm is supposed to include some default Prometheus configuration
for alerting[1], if this configuration is present in the container. It
gets the path to this configuration from
`mgr/cephadm/prometheus_alerts_path`, which is by default
`/etc/prometheus/ceph/ceph_default_alerts.yml` and is indeed a valid
Prometheus rules file inside the ceph-mgr container. However it doesn't
end up in `/var/lib/ceph//prometheus/etc/prometheus/alerting/` as
it should.

Is there anything in particular I have to configure for this to work?

[1]
https://github.com/ceph/ceph/blob/v15.2.8/src/pybind/mgr/cephadm/services/monitoring.py#L232

Thanks in advance,

-- 
Marc 'risson' Schmitt
CRI - EPITA
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Sebastian Wagner
yes
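
Something along these lines should do it (fsid taken from your cephadm ls
output below; double-check the daemon name first):

cephadm rm-daemon --fsid 3614abcc-201c-11eb-995a-2794bcc75ae0 --name osd.355 --force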

On 11.03.21 at 15:46, Kai Stian Olstad wrote:
> Hi Sebastian
> 
> On 11.03.2021 13:13, Sebastian Wagner wrote:
>> looks like
>>
>> $ ssh pech-hd-009
>> # cephadm ls
>>
>> is returning this non-existent OSDs.
>>
>> can you verify that `cephadm ls` on that host doesn't
>> print osd.355 ?
> 
> "cephadm ls" on the node does list this drive
> 
> {
>     "style": "cephadm:v1",
>     "name": "osd.355",
>     "fsid": "3614abcc-201c-11eb-995a-2794bcc75ae0",
>     "systemd_unit": "ceph-3614abcc-201c-11eb-995a-2794bcc75ae0@osd.355",
>     "enabled": true,
>     "state": "stopped",
>     "container_id": null,
>     "container_image_name":
> "goharbor.example.com/library/ceph/ceph:v15.2.5",
>     "container_image_id": null,
>     "version": null,
>     "started": null,
>     "created": "2021-01-20T09:53:22.229080",
>     "deployed": "2021-02-09T09:24:02.855576",
>     "configured": "2021-02-09T09:24:04.211587"
> }
> 
> 
> To resolve it, could I just remove it with "cephadm rm-daemon"?
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Steven Pine
Setting the failure domain on a per-node basis will prevent data loss in the
case of an NVMe failure; you would need multiple NVMe failures across
different hosts to lose data. If data loss is the primary concern then again,
you will want a higher-parity EC ratio, 6:3 or 6:4, but with only 6 OSD nodes
then 4:2 or even 3:3, or skip EC altogether and use 3x replication; that is
likely the safest and best-tested use case. You can also take backups of your
ceph cluster and send them elsewhere; a tool like backy2 can do this with
fairly minimal setup.

but if you have some magical insistence on using the setup you had already
determined prior to asking the mailing list then go ahead, and good luck.

On Thu, Mar 11, 2021 at 1:56 PM Dave Hall  wrote:

> Hello,
>
> While I appreciate and acknowledge the concerns regarding host failure and
> maintenance shutdowns, our main concern at this time is data loss.  Our use
> case at this time allows for suspension of client I/0 and/or for full
> cluster shutdown for maintenance, but loss of data would be catastrophic.
> It seems that with my current configuration an NVMe failure could cause
> data loss unless the shards are organized to survive this.
>
> So my question is not whether this is prudent, but actually whether this is
> possible, and if anybody could point to hints on how to implement it.
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> 607-760-2328 (Cell)
> 607-777-4641 (Office)
>
>
> On Thu, Mar 11, 2021 at 1:28 PM Christian Wuerdig <
> christian.wuer...@gmail.com> wrote:
>
> > For EC 8+2 you can get away with 5 hosts by ensuring each host gets 2
> > shards similar to this:
> > https://ceph.io/planet/erasure-code-on-small-clusters/
> > If a host dies/goes down you can still recover all data (although at that
> > stage your cluster is no longer available for client io).
> > You shouldn't just consider failure but also maintenance scenarios which
> > will require a node to offline for some time. In particular a ceph
> upgrades
> > can take some time - especially if something goes wrong. You have no
> > breathing room left at that stage and your cluster will be dead until all
> > nodes are up again
> >
> >
> > On Fri, 12 Mar 2021 at 02:03, Dave Hall  wrote:
> >
> >> Istvan,
> >>
> >> I agree that there is always risk with failure-domain < node, especially
> >> with EC pools.  We are accepting this risk to lower the financial
> barrier
> >> to entry.
> >>
> >> In our minds, we have good power protection and new hardware, so the
> >> greatest immediate risks for our smaller cluster (approaching 6 OSD
> nodes
> >> and 48 HDDs) are NVMe write exhaustion and HDD failures.   Since we have
> >> multiple OSDs sharing a single NVMe device it occurs to me that we might
> >> want to get Ceph to 'map' against that.  In a way, NVMe devices are our
> >> 'nodes' at the current size of our cluster.
> >>
> >> -Dave
> >>
> >> --
> >> Dave Hall
> >> Binghamton University
> >>
> >> On Wed, Mar 10, 2021 at 10:41 PM Szabo, Istvan (Agoda) <
> >> istvan.sz...@agoda.com> wrote:
> >>
> >> > Don't forget if you have server failure you might loose many objects.
> If
> >> > the failure domain is osd, it means let's say you have 12 drives in
> each
> >> > server, 8+2 EC in an unlucky situation can be located in 1 server
> also.
> >> >
> >> > Istvan Szabo
> >> > Senior Infrastructure Engineer
> >> > ---
> >> > Agoda Services Co., Ltd.
> >> > e: istvan.sz...@agoda.com
> >> > ---
> >> >
> >> > -Original Message-
> >> > From: Dave Hall 
> >> > Sent: Wednesday, March 10, 2021 11:42 PM
> >> > To: ceph-users 
> >> > Subject: [ceph-users] Failure Domain = NVMe?
> >> >
> >> > Email received from outside the company. If in doubt don't click links
> >> nor
> >> > open attachments!
> >> > 
> >> >
> >> > Hello,
> >> >
> >> > In some documentation I was reading last night about laying out OSDs,
> it
> >> > was suggested that if more that one OSD uses the same NVMe drive, the
> >> > failure-domain should probably be set to node. However, for a small
> >> cluster
> >> > the inclination is to use EC-pools and failure-domain = OSD.
> >> >
> >> > I was wondering if there is a middle ground - could we define
> >> > failure-domain = NVMe?  I think the map would need to be defined
> >> manually
> >> > in the same way that failure-domain = rack requires information about
> >> which
> >> > nodes are in each rack.
> >> >
> >> > Example:  My latest OSD nodes have 8 HDDs and 3 U.2 NVMe.  I'd set up
> >> the
> >> > WAL/DB for with HDDs per OSD  (wasted space on the 3rd NVMe).
> >> > Across all my OSD nodes I will have 8 HDDs and either 2 or 3 NVMe
> >> > devices per node - 15 total NVMe devices.   My preferred EC-pool
> profile
> >> > is 8+2.  It seems that this profile could be safely dispersed across
> 15
> >> > failure domains, resulting in protection against NVMe failu

[ceph-users] Re: mon db growing. over 500Gb

2021-03-11 Thread ricardo.re.azevedo
HI Andreas,

That's good to know. I managed to fix the problem! Here is my journey in
case it helps anyone:

My system drives are only 512GB, so I added spare 1TB drives to each server
and moved the mon db to the new drive. I set noout, nobackfill and norecover
and enabled only the ceph mon and osd services (disabled mgr and mds in case
they were throwing the log messages). I then let it sit. In the first hour
the db expanded:

mon.a: 1GB -> 80GB
mon.b: 500GB -> 550GB
mon.c: 500GB -> 500GB

Then after another hour mon.a increased to 100GB but mon.c dropped to 50GB.
After another hour mon.a and mon.c were down to ~10GB. By the next morning
the final mon was also ~10GB and the cluster was happy again. Thank you
ceph!

It would be great to know what caused this initial inflation, but my
takeaway is to keep the mon db on a drive separate from the OS in case of db
overinflation (and the 10GB minimum in the hardware requirements should have
an asterisk if this is a common issue). I think part of my issue was that
inflation started interfering with OS functions, exacerbating things.
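
For anyone who hits the same thing, compaction can also be triggered
explicitly; something like this should work (mon id and data path are the
defaults here, and I have not tried the offline variant myself):

# with the mon running
ceph tell mon.a compact
# or offline, with the mon daemon stopped
ceph-monstore-tool /var/lib/ceph/mon/ceph-a compact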

Thanks all for your help. Definitely helped me sort things out.

Best,
Ricardo

-Original Message-
From: Andreas John  
Sent: Thursday, March 11, 2021 2:32 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: mon db growing. over 500Gb

Hello,

I also observed an excessively growing mon DB in case of recovery. Luckily we
were able to solve it by extending the mon db disk.

Without having the chance to re-check: The options nobackfill and norecover
might cause that behavior. It feels like the mon holds data that cannot be
flushed to an OSD.


rgds,

j.



On 11.03.21 10:47, Marc wrote:
> From what I have read here in the past, growing monitor db is related 
> to not having pg's in  'clean active' state
>
>
>> -Original Message-
>> From: ricardo.re.azev...@gmail.com 
>> Sent: 11 March 2021 00:59
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] mon db growing. over 500Gb
>>
>> Hi all,
>>
>>
>>
>> I have a fairly pressing issue. I had a monitor fall out of quorum 
>> because it ran out of disk space during rebalancing from switching to 
>> upmap. I noticed all my monitor store.db started taking up nearly all 
>> disk space so I set noout, nobackfill and norecover and shutdown all 
>> the monitor daemons.
>> Each store.db was at:
>>
>>
>>
>> mon.a 89GB (the one that firt dropped out)
>>
>> mon.a 400GB
>>
>> mon.c 400GB
>>
>>
>> I tried setting mon_compact_on_start. This brought  mon.a down to 1GB.
>> Cool.
>> However, when I try it on the other monitors it increased the db size 
>> ~1Gb/10s so I shut them down again.
>>
>> Any idea what is going on? Or how can I shrik back down the db?
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
>
--
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email
to ceph-users-le...@ceph.io


[ceph-users] Re: [External Email] Re: Re: Failure Domain = NVMe?

2021-03-11 Thread Marc
> In my current hardware configurations each NVMe supports multiple OSDs.
> In
> my earlier nodes it is 8 OSDs sharing one NVMe (which is also too
> small).
> In the near term I will add NVMe to those nodes, but I'll still have 5
> OSDs
> some OSDs, and 2 or 3 on all the others.  So an NVMe failure will take
> out
> at least 2 OSDs.  Becasue of this it seems potentially worthwhile to go
> through the trouble of defining failure domain = nvme to assure maximum
> resilience.
> 

Do you have any test results for the increase in write performance from using the 
OSDs with the NVMe? Just wondering what can be expected.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Can FS snapshots cause factor 3 performance loss?

2021-03-11 Thread Frank Schilder
Hi all,

we are observing a dramatic performance drop on our ceph file system and are 
wondering if this could be related to ceph fs snapshots. We are taking rotating 
snapshots in 2 directories and have 11 snapshots in each (ls below) as of 
today. We observe the performance drop with an rsync process that writes to 
a folder on the ceph fs *without* snapshots. The performance reduction is a 
factor of 3 or even higher.

Could this possibly be caused by the snapshots being present? Has anyone else 
seen something like this?

The reason we suspect snapshots is that not much else changed on the cluster 
except that we started taking rolling snapshots on the 23rd of February. In 
addition, the kernel symbols ceph_update_snap_trace, rebuild_snap_realms and 
build_snap_context show up really high in a perf report. The performance 
reduction has been present for at least 3 days.

The ceph version is mimic 13.2.10. The kernel version of the rsync server is 
3.10.0-1127.10.1.el7.x86_64.

$ ls home/.snap
2021-02-23_183554+0100_weekly  2021-03-06_000611+0100_daily   
2021-03-09_000611+0100_daily
2021-03-01_000911+0100_weekly  2021-03-07_000611+0100_daily   
2021-03-10_000611+0100_daily
2021-03-04_000611+0100_daily   2021-03-08_000611+0100_daily   
2021-03-11_000611+0100_daily
2021-03-05_000611+0100_daily   2021-03-08_000911+0100_weekly

$ ls groups/.snap
2021-02-23_183554+0100_weekly  2021-03-06_000611+0100_daily   
2021-03-09_000611+0100_daily
2021-03-01_000912+0100_weekly  2021-03-07_000611+0100_daily   
2021-03-10_000612+0100_daily
2021-03-04_000611+0100_daily   2021-03-08_000611+0100_daily   
2021-03-11_000612+0100_daily
2021-03-05_000611+0100_daily   2021-03-08_000911+0100_weekly

Many thanks for any pointers and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [External Email] Re: Re: Failure Domain = NVMe?

2021-03-11 Thread Steven Pine
Setting the failure domain to host will accomplish nearly the same goal and
provide better results during maintenance, host reboots, and of course host
failures.

But otherwise you can try manually editing the CRUSH map to add an nvme
failure domain with the OSDs grouped under it, though the additional work and
the room for error and bugs this can cause is not recommended.
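
If you do go down that path, the rough shape is: decompile the CRUSH map, add
an "nvme" bucket type between osd and host, declare one nvme bucket per
device containing the OSDs that share it, and point the EC profile's failure
domain at that type. A sketch (bucket, profile and rule ids are made up; test
on a throwaway cluster first):

# 1. pull and decompile the current CRUSH map
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# 2. edit crush.txt: insert e.g. "type 1 nvme" between osd and host
#    (renumbering the remaining types), declare one nvme bucket per NVMe
#    device with the OSDs that share it as items, and make each host bucket
#    contain its nvme buckets instead of the OSDs directly.

# 3. recompile, sanity-check, and inject the new map
crushtool -c crush.txt -o crush.new
crushtool -i crush.new --test --show-statistics --rule 1 --num-rep 10
ceph osd setcrushmap -i crush.new

# 4. use the new type as the EC failure domain for new pools
ceph osd erasure-code-profile set ec-8-2-nvme k=8 m=2 crush-failure-domain=nvme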

On Thu, Mar 11, 2021 at 3:27 PM Dave Hall  wrote:

> Steven,
>
> In my current hardware configurations each NVMe supports multiple OSDs.
> In my earlier nodes it is 8 OSDs sharing one NVMe (which is also too
> small).  In the near term I will add NVMe to those nodes, but I'll still
> have 5 OSDs some OSDs, and 2 or 3 on all the others.  So an NVMe failure
> will take out at least 2 OSDs.  Becasue of this it seems potentially
> worthwhile to go through the trouble of defining failure domain = nvme to
> assure maximum resilience.
>
> -Dvae
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> 607-760-2328 (Cell)
> 607-777-4641 (Office)
>
>
> On Thu, Mar 11, 2021 at 2:20 PM Steven Pine 
> wrote:
>
>> Setting domain failure on a per node basis will prevent data loss in the
>> case of an nvme failure, you would need multiple nvme failures across
>> different hosts. If data loss is the primary concern then again, you will
>> want a higher EC ratio, 6:3 or 6:4 but with only 6 osds, then 4:2 or even
>> 3:3, or skip EC altogether and use 3x replication, that is likely the
>> safest and best tested use case. You can also take backups of your ceph
>> cluster and send it elsewhere, a tool like backy2 can do this with somewhat
>> minimal setup.
>>
>> but if you have some magical insistence on using the setup you had
>> already determined prior to asking the mailing list then go ahead, and good
>> luck.
>>
>> On Thu, Mar 11, 2021 at 1:56 PM Dave Hall  wrote:
>>
>>> Hello,
>>>
>>> While I appreciate and acknowledge the concerns regarding host failure
>>> and
>>> maintenance shutdowns, our main concern at this time is data loss.  Our
>>> use
>>> case at this time allows for suspension of client I/0 and/or for full
>>> cluster shutdown for maintenance, but loss of data would be catastrophic.
>>> It seems that with my current configuration an NVMe failure could cause
>>> data loss unless the shards are organized to survive this.
>>>
>>> So my question is not whether this is prudent, but actually whether this
>>> is
>>> possible, and if anybody could point to hints on how to implement it.
>>>
>>> Thanks.
>>>
>>> -Dave
>>>
>>> --
>>> Dave Hall
>>> Binghamton University
>>> kdh...@binghamton.edu
>>> 607-760-2328 (Cell)
>>> 607-777-4641 (Office)
>>>
>>>
>>> On Thu, Mar 11, 2021 at 1:28 PM Christian Wuerdig <
>>> christian.wuer...@gmail.com> wrote:
>>>
>>> > For EC 8+2 you can get away with 5 hosts by ensuring each host gets 2
>>> > shards similar to this:
>>> > https://ceph.io/planet/erasure-code-on-small-clusters/
>>> > If a host dies/goes down you can still recover all data (although at
>>> that
>>> > stage your cluster is no longer available for client io).
>>> > You shouldn't just consider failure but also maintenance scenarios
>>> which
>>> > will require a node to offline for some time. In particular a ceph
>>> upgrades
>>> > can take some time - especially if something goes wrong. You have no
>>> > breathing room left at that stage and your cluster will be dead until
>>> all
>>> > nodes are up again
>>> >
>>> >
>>> > On Fri, 12 Mar 2021 at 02:03, Dave Hall  wrote:
>>> >
>>> >> Istvan,
>>> >>
>>> >> I agree that there is always risk with failure-domain < node,
>>> especially
>>> >> with EC pools.  We are accepting this risk to lower the financial
>>> barrier
>>> >> to entry.
>>> >>
>>> >> In our minds, we have good power protection and new hardware, so the
>>> >> greatest immediate risks for our smaller cluster (approaching 6 OSD
>>> nodes
>>> >> and 48 HDDs) are NVMe write exhaustion and HDD failures.   Since we
>>> have
>>> >> multiple OSDs sharing a single NVMe device it occurs to me that we
>>> might
>>> >> want to get Ceph to 'map' against that.  In a way, NVMe devices are
>>> our
>>> >> 'nodes' at the current size of our cluster.
>>> >>
>>> >> -Dave
>>> >>
>>> >> --
>>> >> Dave Hall
>>> >> Binghamton University
>>> >>
>>> >> On Wed, Mar 10, 2021 at 10:41 PM Szabo, Istvan (Agoda) <
>>> >> istvan.sz...@agoda.com> wrote:
>>> >>
>>> >> > Don't forget if you have server failure you might loose many
>>> objects. If
>>> >> > the failure domain is osd, it means let's say you have 12 drives in
>>> each
>>> >> > server, 8+2 EC in an unlucky situation can be located in 1 server
>>> also.
>>> >> >
>>> >> > Istvan Szabo
>>> >> > Senior Infrastructure Engineer
>>> >> > ---
>>> >> > Agoda Services Co., Ltd.
>>> >> > e: istvan.sz...@agoda.com
>>> >> > ---
>>> >> >
>>> >> > -Original Message-
>>> >> > From: Dave Hall 
>>> >> > Sent: Wednesday, March 10, 2021 11

[ceph-users] v14.2.17 Nautilus released

2021-03-11 Thread David Galloway
We're happy to announce the 17th backport release in the Nautilus
series. We recommend users to update to this release. For a detailed
release notes with links & changelog please refer to the official blog
entry at https://ceph.io/releases/v14-2-17-nautilus-released

Notable Changes
---

* $pid expansion in config paths like `admin_socket` will now properly
expand to the daemon pid for commands like `ceph-mds` or `ceph-osd`.
Previously only `ceph-fuse`/`rbd-nbd` expanded `$pid` with the actual
daemon pid.
* RADOS: PG removal has been optimized in this release.
* RADOS: Memory allocations are tracked in finer detail in BlueStore and
displayed as a part of the ``dump_mempools`` command.
* cephfs: clients which acquire capabilities too quickly are throttled
to prevent instability.  See new config option
``mds_session_cap_acquisition_throttle`` to control this behavior.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.17.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 2e95b5d99e0dec516803c8a1b57fbd2c8f45fd63
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-11 Thread Chris Dunlop

Hi Frank,

I agree there's a problem there. However, to clarify: the json file 
already contains the /dev/sdq1 path (at data:path) and the "simple activate" 
is just reading the file. I.e. the problem lies with the json file creator, 
which was the "ceph-volume simple scan" step.


To fix your immediate issue I'd suggest fixing the existing json files to 
point data:path to the by-partuuid path. That should allow your "simple 
activate" to work with the stable paths.


In general it seems something in the "ceph-volume simple scan" is doing a 
"realpath()" or "readlink()" on the by-partuuid path to get to the /dev/sdq1 
path.  


Oh, it's likely this is the culprit:

src/ceph-volume/ceph_volume/devices/simple/scan.py
class Scan(object):
...
def scan_device(self, path):
...
if os.path.islink(path):
device = os.readlink(path)
else:
device = path

I'm not sure what the general fix might be - there may be good reason to 
prefer the symlink destination path, e.g. the next steps use the 'device' 
var to look for lvm info which may not work with the original symlink path.

I'll leave it up to the developers to work out a proper solution to this!

If you haven't already it's probably worth opening a ticket.

Cheers,

Chris

On Thu, Mar 11, 2021 at 02:19:57PM +, Frank Schilder wrote:

Hi Chris,

I found the problem. "ceph-volume simple activate" modifies the OSD's meta data 
in an invalid way.

On a pre lvm-converted ceph-disk OSD I had in my cupboard:

[root@ceph-adm:ceph-20 ~]# mount /dev/sdq1 mnt
[root@ceph-adm:ceph-20 ~]# ls -l mnt
[...]
lrwxrwxrwx. 1 ceph ceph  58 Mar 15  2019 block -> 
/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4
[..]
[root@ceph-adm:ceph-20 ~]# umount mnt

[root@ceph-adm:ceph-20 ~]# cat 
/etc/ceph/osd/59-9b88d6ec-87a4-4640-b80e-81d3d56fac15.json
{
   "active": "ok",
   "block": {
   "path": "/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4",
   "uuid": "a1e5ef7d-9bab-4911-abe5-9075b91d88a4"
   },
   "block_uuid": "a1e5ef7d-9bab-4911-abe5-9075b91d88a4",
   "bluefs": 1,
   "ceph_fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
   "cluster_name": "ceph",
   "data": {
   "path": "/dev/sdq1",
   "uuid": "9b88d6ec-87a4-4640-b80e-81d3d56fac15"
   },
   "fsid": "9b88d6ec-87a4-4640-b80e-81d3d56fac15",
   "keyring": "AQBP4opcBeCYOxAA4sOpTthNE6T28WUf4Bgm3w==",
   "kv_backend": "rocksdb",
   "magic": "ceph osd volume v026",
   "mkfs_done": "yes",
   "none": "",
   "ready": "ready",
   "require_osd_release": "",
   "type": "bluestore",
   "whoami": 59
}

Now, "ceph-volume simple activate" modifies the symlink "block" to point to an 
unstable path:

[root@ceph-adm:ceph-20 ~]# ceph-volume simple activate --file 
"/etc/ceph/osd/59-9b88d6ec-87a4-4640-b80e-81d3d56fac15.json" --no-systemd
Running command: /usr/bin/mount -v /dev/sdq1 /var/lib/ceph/osd/ceph-59
stdout: mount: /dev/sdq1 mounted on /var/lib/ceph/osd/ceph-59.
Running command: /usr/bin/ln -snf /dev/sdq2 /var/lib/ceph/osd/ceph-59/block
Running command: /usr/bin/chown -R ceph:ceph /dev/sdq2
--> Skipping enabling of `simple` systemd unit
--> Skipping masking of ceph-disk systemd units
--> Skipping enabling and starting OSD simple systemd unit because --no-systemd 
was used
--> Successfully activated OSD 59 with FSID 9b88d6ec-87a4-4640-b80e-81d3d56fac15

Its the command "/usr/bin/ln -snf /dev/sdq2 /var/lib/ceph/osd/ceph-59/block" that 
destroys the integrity of the OSD. If you reboot the machine and the devices get different names, 
the next execution of "ceph-volume simple scan" will produce a corrupted meta data file. 
This will also happen if you move a converted OSD to another host and try to scan+start it.

The change of the symbolic link to an unstable device path is a critical bug and I don't 
even understand why it happens in the first place. There is no point and the only valid 
link target would be 
"/dev/disk/by-partuuid/a1e5ef7d-9bab-4911-abe5-9075b91d88a4" any ways.

I can work aroud that by resetting the link to its correct value after 
activation. However, this should really be fixed.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to add OSDs - whole node or one by one?

2021-03-11 Thread Reed Dier
I'm sure there is a "correct" way, but I think it mostly relates to how busy 
your cluster is, and how tolerant it is of the added load from the backfills.

My current modus operandi is to set the noin, noout, nobackfill, norecover, and 
norebalance flags first.
This makes sure that new OSDs don't come in, current OSDs don't go out, and it 
doesn't start backfilling or try to rebalance (yet).

Add all of my OSDs.

Then unset noin and norebalance.
Mark "in" all of the new OSDs.
Let it work out the new crush map so that data isn't constantly in motion 
moving back and forth as new OSD hosts are added.
Inject osd_max_backfills and osd_recovery_max_active at 1.
Then unset norecover, nobackfill and noout.

Then it should slowly but surely chip away at recovery.
During times of lighter load I can ratchet up the max backfills and recovery 
max actives to a higher level to chug through more of it while iops aren't 
being burned.

I'm sure everyone has their own way, but I've been very comfortable with this 
approach over the last few years.

NOTE: you probably want to make sure that the balancer and the pg_autoscaler 
are set to off during this, otherwise they might throw backfills on the pile 
and you will feel like you'll never reach the bottom.
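
As a rough CLI sketch of that sequence (option names as of Nautilus/Octopus;
the pool name is a placeholder):

# before adding the new OSDs
ceph osd set noin; ceph osd set noout
ceph osd set nobackfill; ceph osd set norecover; ceph osd set norebalance
ceph balancer off
ceph osd pool set <pool> pg_autoscale_mode off

# after the new OSDs have been created
ceph osd unset noin
ceph osd unset norebalance
# once the new OSDs are in and the crush map has settled, throttle recovery
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout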

Reed

> On Mar 10, 2021, at 9:55 AM, Dave Hall  wrote:
> 
> Hello,
> 
> I am currently in the process of expanding my Nautilus cluster from 3 nodes 
> (combined OSD/MGR/MON/MDS) to 6 OSD nodes and 3 management nodes.  The old 
> and new OSD nodes all have 8 x 12TB HDDs plus NVMe.   The front and back 
> networks are 10GB.
> 
> Last Friday evening I injected a whole new OSD node, increasing the OSD HDDs 
> from 24 to 32.  As of this morning the cluster is still re-balancing - with 
> periodic warnings about degraded PGs and missed deep-scrub deadlines.   So 
> after 4.5 days my misplaced PGs are down from 33% to 2%.
> 
> My question:  For a cluster of this size, what is the best-practice procedure 
> for adding OSDs?  Should I use 'ceph-volume prepare' to layout the new OSDs, 
> but only add them a couple at a time, or should I continue adding whole nodes?
> 
> Maybe this has to do with a maximum percentage of misplaced PGs. The first 
> new node increased the OSD capacity by 33% and resulted in 33% PG 
> misplacement.  The next node will only result in 25% misplacement.  If a too 
> high percentage of misplaced PGs negatively impacts rebalancing or data 
> availability, what is a reasonable ceiling for this percentage?
> 
> Thanks.
> 
> -Dave
> 
> -- 
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> 607-760-2328 (Cell)
> 607-777-4641 (Office)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failure Domain = NVMe?

2021-03-11 Thread Dave Hall
Hello,

While I appreciate and acknowledge the concerns regarding host failure and
maintenance shutdowns, our main concern at this time is data loss.  Our use
case at this time allows for suspension of client I/O and/or for full
cluster shutdown for maintenance, but loss of data would be catastrophic.
It seems that with my current configuration an NVMe failure could cause
data loss unless the shards are organized to survive this.

So my question is not whether this is prudent, but actually whether this is
possible, and if anybody could point to hints on how to implement it.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
607-760-2328 (Cell)
607-777-4641 (Office)


On Thu, Mar 11, 2021 at 1:28 PM Christian Wuerdig <
christian.wuer...@gmail.com> wrote:

> For EC 8+2 you can get away with 5 hosts by ensuring each host gets 2
> shards similar to this:
> https://ceph.io/planet/erasure-code-on-small-clusters/
> If a host dies/goes down you can still recover all data (although at that
> stage your cluster is no longer available for client io).
> You shouldn't just consider failure but also maintenance scenarios which
> will require a node to offline for some time. In particular a ceph upgrades
> can take some time - especially if something goes wrong. You have no
> breathing room left at that stage and your cluster will be dead until all
> nodes are up again
>
>
> On Fri, 12 Mar 2021 at 02:03, Dave Hall  wrote:
>
>> Istvan,
>>
>> I agree that there is always risk with failure-domain < node, especially
>> with EC pools.  We are accepting this risk to lower the financial barrier
>> to entry.
>>
>> In our minds, we have good power protection and new hardware, so the
>> greatest immediate risks for our smaller cluster (approaching 6 OSD nodes
>> and 48 HDDs) are NVMe write exhaustion and HDD failures.   Since we have
>> multiple OSDs sharing a single NVMe device it occurs to me that we might
>> want to get Ceph to 'map' against that.  In a way, NVMe devices are our
>> 'nodes' at the current size of our cluster.
>>
>> -Dave
>>
>> --
>> Dave Hall
>> Binghamton University
>>
>> On Wed, Mar 10, 2021 at 10:41 PM Szabo, Istvan (Agoda) <
>> istvan.sz...@agoda.com> wrote:
>>
>> > Don't forget if you have server failure you might loose many objects. If
>> > the failure domain is osd, it means let's say you have 12 drives in each
>> > server, 8+2 EC in an unlucky situation can be located in 1 server also.
>> >
>> > Istvan Szabo
>> > Senior Infrastructure Engineer
>> > ---
>> > Agoda Services Co., Ltd.
>> > e: istvan.sz...@agoda.com
>> > ---
>> >
>> > -Original Message-
>> > From: Dave Hall 
>> > Sent: Wednesday, March 10, 2021 11:42 PM
>> > To: ceph-users 
>> > Subject: [ceph-users] Failure Domain = NVMe?
>> >
>> > Email received from outside the company. If in doubt don't click links
>> nor
>> > open attachments!
>> > 
>> >
>> > Hello,
>> >
>> > In some documentation I was reading last night about laying out OSDs, it
>> > was suggested that if more that one OSD uses the same NVMe drive, the
>> > failure-domain should probably be set to node. However, for a small
>> cluster
>> > the inclination is to use EC-pools and failure-domain = OSD.
>> >
>> > I was wondering if there is a middle ground - could we define
>> > failure-domain = NVMe?  I think the map would need to be defined
>> manually
>> > in the same way that failure-domain = rack requires information about
>> which
>> > nodes are in each rack.
>> >
>> > Example:  My latest OSD nodes have 8 HDDs and 3 U.2 NVMe.  I'd set up
>> the
>> > WAL/DB for with HDDs per OSD  (wasted space on the 3rd NVMe).
>> > Across all my OSD nodes I will have 8 HDDs and either 2 or 3 NVMe
>> > devices per node - 15 total NVMe devices.   My preferred EC-pool profile
>> > is 8+2.  It seems that this profile could be safely dispersed across 15
>> > failure domains, resulting in protection against NVMe failure.
>> >
>> > Please let me know if this is worth pursuing.
>> >
>> > Thanks.
>> >
>> > -Dave
>> >
>> > --
>> > Dave Hall
>> > Binghamton University
>> > kdh...@binghamton.edu
>> > 607-760-2328 (Cell)
>> > 607-777-4641 (Office)
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
>> > email to ceph-users-le...@ceph.io
>> >
>> > 
>> > This message is confidential and is for the sole use of the intended
>> > recipient(s). It may also be privileged or otherwise protected by
>> copyright
>> > or other legal rules. If you have received it by mistake please let us
>> know
>> > by reply email and delete it from your system. It is prohibited to copy
>> > this message or disclose its content to anyone. Any confidentiality or
>> > privilege is not waived or lost by any mistaken delivery or unauthorized
>> > disclosure of the message. All mess

[ceph-users] Re: Ceph server

2021-03-11 Thread Reed Dier
I'm going to echo what Stefan said.

I would ditch the 2x SATA drives to free up your slots.
Replace with an M.2 or SATADOM.

I would also recommend moving from the 2x X710-DA2 cards to 1x X710-DA4 card.
It can't saturate the x8 slot, and it frees up a PCIe slot for possibly another 
NVMe card or something else if you need it down the line.

The only other thing I would say to consider is making sure you know that the 
endurance of the 4510 is enough for your workload long term.

Reed

> On Mar 10, 2021, at 1:12 PM, Stefan Kooman  wrote:
> 
> On 3/10/21 5:43 PM, Ignazio Cassano wrote:
>> Hello, what do you think about of ceph cluster made up of 6 nodes each one
>> with the following configuration ?
>> A+ Server 1113S-WN10RT
>> Barebone
>> Supermicro A+ Server 1113S-WN10RT - 1U - 10x U.2 NVMe - 2x M.2 - Dual
>> 10-Gigabit LAN - 750W Redundant
>> Processor
>> AMD EPYC™ 7272 Processor 12-core 2.90GHz 64MB Cache (120W)
>> Memory
>> 8 x 8GB PC4-25600 3200MHz DDR4 ECC RDIMM
> 
> ^^ I would double that amount of RAM, especially (see below) if you plan on 
> adding more NVMe drives.
> 
>> U.2/U.3 NVMe Drive
>> 5 x 8.0TB Intel® SSD DC P4510 Series U.2 PCIe 3.1 x4 NVMe Solid State Drive
>> Hard Drive
> 
> ^^ Why 5 * 8.0 TB instead of 10 * 4.0 TB? Are you planning on upgrading 
> later? Ceph likes more OSDs better than fewer larger ones. Recovery will be 
> faster as well, and the impact of one NVMe dying will be lower.
> 
>> 2 x 240GB Intel® SSD D3-S4610 Series 2.5" SATA 6.0Gb/s Solid State Drive
> 
> ^^ Do you sacrifce two NVMe ports for two SATA OS disks? If so, I would 
> advise for getting (redundant, optional) U.2 NVMe or SATADOM or similar.
> 
>> Network Card
>> 2 x Intel® 10-Gigabit Ethernet Converged Network Adapter X710-DA2 (2x SFP+)
>> Server Management
> 
> ^ Why two? One for "public" and one for "cluster"? Than most probably you 
> won't need that, and one bond would suffice (see current Ceph best 
> practices). If you need 40 Gb/s in one LACP trunk: perfectly fine.
> 
> Gr. Stefan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [External Email] Re: Re: Failure Domain = NVMe?

2021-03-11 Thread Dave Hall
Steven,

In my current hardware configurations each NVMe supports multiple OSDs.  In
my earlier nodes it is 8 OSDs sharing one NVMe (which is also too small).
In the near term I will add NVMe to those nodes, but I'll still have 5 OSDs
sharing an NVMe on some nodes, and 2 or 3 on all the others.  So an NVMe
failure will take out at least 2 OSDs.  Because of this it seems potentially
worthwhile to go through the trouble of defining failure domain = nvme to
assure maximum resilience.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
607-760-2328 (Cell)
607-777-4641 (Office)


On Thu, Mar 11, 2021 at 2:20 PM Steven Pine  wrote:

> Setting domain failure on a per node basis will prevent data loss in the
> case of an nvme failure, you would need multiple nvme failures across
> different hosts. If data loss is the primary concern then again, you will
> want a higher EC ratio, 6:3 or 6:4 but with only 6 osds, then 4:2 or even
> 3:3, or skip EC altogether and use 3x replication, that is likely the
> safest and best tested use case. You can also take backups of your ceph
> cluster and send it elsewhere, a tool like backy2 can do this with somewhat
> minimal setup.
>
> but if you have some magical insistence on using the setup you had already
> determined prior to asking the mailing list then go ahead, and good luck.
>
> On Thu, Mar 11, 2021 at 1:56 PM Dave Hall  wrote:
>
>> Hello,
>>
>> While I appreciate and acknowledge the concerns regarding host failure and
>> maintenance shutdowns, our main concern at this time is data loss.  Our
>> use
>> case at this time allows for suspension of client I/0 and/or for full
>> cluster shutdown for maintenance, but loss of data would be catastrophic.
>> It seems that with my current configuration an NVMe failure could cause
>> data loss unless the shards are organized to survive this.
>>
>> So my question is not whether this is prudent, but actually whether this
>> is
>> possible, and if anybody could point to hints on how to implement it.
>>
>> Thanks.
>>
>> -Dave
>>
>> --
>> Dave Hall
>> Binghamton University
>> kdh...@binghamton.edu
>> 607-760-2328 (Cell)
>> 607-777-4641 (Office)
>>
>>
>> On Thu, Mar 11, 2021 at 1:28 PM Christian Wuerdig <
>> christian.wuer...@gmail.com> wrote:
>>
>> > For EC 8+2 you can get away with 5 hosts by ensuring each host gets 2
>> > shards similar to this:
>> > https://ceph.io/planet/erasure-code-on-small-clusters/
>> > If a host dies/goes down you can still recover all data (although at
>> that
>> > stage your cluster is no longer available for client io).
>> > You shouldn't just consider failure but also maintenance scenarios which
>> > will require a node to offline for some time. In particular a ceph
>> upgrades
>> > can take some time - especially if something goes wrong. You have no
>> > breathing room left at that stage and your cluster will be dead until
>> all
>> > nodes are up again
>> >
>> >
>> > On Fri, 12 Mar 2021 at 02:03, Dave Hall  wrote:
>> >
>> >> Istvan,
>> >>
>> >> I agree that there is always risk with failure-domain < node,
>> especially
>> >> with EC pools.  We are accepting this risk to lower the financial
>> barrier
>> >> to entry.
>> >>
>> >> In our minds, we have good power protection and new hardware, so the
>> >> greatest immediate risks for our smaller cluster (approaching 6 OSD
>> nodes
>> >> and 48 HDDs) are NVMe write exhaustion and HDD failures.   Since we
>> have
>> >> multiple OSDs sharing a single NVMe device it occurs to me that we
>> might
>> >> want to get Ceph to 'map' against that.  In a way, NVMe devices are our
>> >> 'nodes' at the current size of our cluster.
>> >>
>> >> -Dave
>> >>
>> >> --
>> >> Dave Hall
>> >> Binghamton University
>> >>
>> >> On Wed, Mar 10, 2021 at 10:41 PM Szabo, Istvan (Agoda) <
>> >> istvan.sz...@agoda.com> wrote:
>> >>
>> >> > Don't forget if you have server failure you might loose many
>> objects. If
>> >> > the failure domain is osd, it means let's say you have 12 drives in
>> each
>> >> > server, 8+2 EC in an unlucky situation can be located in 1 server
>> also.
>> >> >
>> >> > Istvan Szabo
>> >> > Senior Infrastructure Engineer
>> >> > ---
>> >> > Agoda Services Co., Ltd.
>> >> > e: istvan.sz...@agoda.com
>> >> > ---
>> >> >
>> >> > -Original Message-
>> >> > From: Dave Hall 
>> >> > Sent: Wednesday, March 10, 2021 11:42 PM
>> >> > To: ceph-users 
>> >> > Subject: [ceph-users] Failure Domain = NVMe?
>> >> >
>> >> > Email received from outside the company. If in doubt don't click
>> links
>> >> nor
>> >> > open attachments!
>> >> > 
>> >> >
>> >> > Hello,
>> >> >
>> >> > In some documentation I was reading last night about laying out
>> OSDs, it
>> >> > was suggested that if more that one OSD uses the same NVMe drive, the
>> >> > failure-domain should probably be set to node. However, for a small
>> >> cluster
>> >> > the inclination is to use EC-p

[ceph-users] Unhealthy Cluster | Remove / Purge duplicate osds | Fix daemon

2021-03-11 Thread Oliver Weinmann

Hi,

On my 3-node Octopus 15.2.5 test cluster, which I haven't used for quite 
a while, I noticed that it shows some errors:


[root@gedasvl02 ~]# ceph health detail
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config 
/var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config

INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon osd.0 on gedaopl02 is in error state
    daemon node-exporter.gedaopl01 on gedaopl01 is in error state

The error about osd.0 is strange since osd.0 is actually up and 
running, but on a different node. I guess I failed to correctly remove it 
from node gedaopl02 before adding a new OSD on a different node, 
gedaopl01, and now there are duplicate OSD ids for osd.0 and osd.2.


[root@gedasvl02 ~]# ceph orch ps
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config 
/var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config

INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
NAME HOST   STATUS    REFRESHED AGE  
VERSION    IMAGE NAME    IMAGE ID  CONTAINER ID
alertmanager.gedasvl02   gedasvl02  running (6h)  7m ago 4M   
0.20.0 docker.io/prom/alertmanager:v0.20.0 0881eb8f169f  5b80fb977a5f
crash.gedaopl01  gedaopl01  stopped   7m ago 4M   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  810cf432b6d6
crash.gedaopl02  gedaopl02  running (5h)  7m ago 4M   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  34ab264fd5ed
crash.gedaopl03  gedaopl03  running (2d)  7m ago 2d   
15.2.9 docker.io/ceph/ceph:v15 dfc483079636  233f30086d2d
crash.gedasvl02  gedasvl02  running (6h)  7m ago 4M   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  ea3d3e7c4f58
grafana.gedasvl02    gedasvl02  running (6h)  7m ago 4M   
6.6.2  docker.io/ceph/ceph-grafana:6.6.2 a0dce381714a  5a94f3e41c32
mds.cephfs.gedaopl01.zjuhem  gedaopl01  stopped   7m ago 3M   
  docker.io/ceph/ceph:v15  
mds.cephfs.gedasvl02.xsjtpi  gedasvl02  running (6h)  7m ago 3M   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  26e7c8759d89
mgr.gedaopl03.zilwbl gedaopl03  running (7h)  7m ago 7h   
15.2.9 docker.io/ceph/ceph:v15 dfc483079636  e18b6f40871c
mon.gedaopl03    gedaopl03  running (7h)  7m ago 7h   
15.2.9 docker.io/ceph/ceph:v15 dfc483079636  5afdf40e41ba
mon.gedasvl02    gedasvl02  running (6h)  7m ago 4M   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  e83dfcd864aa
node-exporter.gedaopl01  gedaopl01  error 7m ago 4M   
0.18.1 docker.io/prom/node-exporter:v0.18.1 e5a616e4b9cf  0fefcfcc9639
node-exporter.gedaopl02  gedaopl02  running (5h)  7m ago 4M   
0.18.1 docker.io/prom/node-exporter:v0.18.1 e5a616e4b9cf  f459045b7e41
node-exporter.gedaopl03  gedaopl03  running (2d)  7m ago 2d   
0.18.1 docker.io/prom/node-exporter:v0.18.1 e5a616e4b9cf  3bd9f8dd6d5b
node-exporter.gedasvl02  gedasvl02  running (6h)  7m ago 4M   
0.18.1 docker.io/prom/node-exporter:v0.18.1 e5a616e4b9cf  72e96963261e
*osd.0    gedaopl01  running (5h)  7m ago 5h   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  ed76fafb1988*
*osd.0    gedaopl02  error 7m ago 4M   
 docker.io/ceph/ceph:v15    *
osd.1    gedaopl01  running (4h)  7m ago 3d   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  41a43733e601
*osd.2    gedaopl01  stopped   7m ago 4M   
 docker.io/ceph/ceph:v15    *
*osd.2    gedaopl03  running (7h)  7m ago 7h   
15.2.9 docker.io/ceph/ceph:v15 dfc483079636  ac9e660db2fb*
osd.3    gedaopl03  running (7h)  7m ago 7h   
15.2.9 docker.io/ceph/ceph:v15 dfc483079636  bde17b5bb2fb
osd.4    gedaopl02  running (5h)  7m ago 3d   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  7cc3ef7c4469
osd.5    gedaopl02  running (5h)  7m ago 3d   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  761b96d235e4
osd.6    gedaopl02  running (5h)  7m ago 3d   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  d047b28fe2bd
osd.7    gedaopl03  running (7h)  7m ago 7h   
15.2.9 docker.io/ceph/ceph:v15 dfc483079636  3b54b01841f4
osd.8    gedaopl01  running (5h)  7m ago 5h   
15.2.5 docker.io/ceph/ceph:v15 4405f6339e35  cdd308cdc82b
prometheus.gedasvl02 gedasvl02  running (5h)  7m ago 4M   
2.18.1 docker.io/prom/prometheus:v2.18.1 de242295e225  591cef3bbaa4


Is there a way to clean / purge the stopped and error ones?
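
One possible cleanup, sketched under the assumption that the stale entries 
hold no data I still need: remove the leftover daemon records directly on 
the affected hosts with cephadm (which only touches the local host), using 
the fsid shown in the output above:

# on gedaopl02: remove the stale osd.0 record (the live osd.0 runs on gedaopl01)
cephadm rm-daemon --fsid d0920c36-2368-11eb-a5de-005056b703af --name osd.0 --force
# on gedaopl01: remove the stopped osd.2 record (the live osd.2 runs on gedaopl03)
cephadm rm-daemon --fsid d0920c36-2368-11eb-a5de-005056b703af --name osd.2 --force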

I don't know what is wrong with the node-exporter, because the output of 
podman ps -a on gedaopl01 looks ok. Maybe also a 

[ceph-users] Question about delayed write IOs, octopus, mixed storage

2021-03-11 Thread Philip Brown
I'm running some tests with mixed storage units, and Octopus.
8 nodes, each with 2 SSDs and 8 HDDs.
The SSDs are relatively small: around 100GB each.

I'm mapping 8 RBDs, striping them together, and running fio on them for testing.

# fio --filename=/./fio.testfile --size=120GB --rw=randrw --bs=8k 
--direct=1 --ioengine=libaio  --iodepth=64 --numjobs=4 --time_based 
--group_reporting --name=readwritelatency-test-job --runtime=120 --eta-newline=1


Trouble is, I'm seeing sporadic delays of IOs.

When I test ZFS, for example, it has this neat latency-histogram status check:
zpool iostat -w 20

and it shows me that some write IOs are taking over 4 seconds to complete. 
Many are taking 1s or 2s.


This kind of thing has sort of happened before (but previously, I think I was 
using SSDs exclusively). When I emailed the list, people suggested turning off 
RBD cache, which worked great in that situation.

This time, I have already done that (I believe), but still see this behavior.
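
In case it helps anyone checking the same thing, one way to confirm the cache 
really is off (a sketch; I'm assuming the option is controlled centrally 
rather than in a per-client ceph.conf):

rbd config global get global rbd_cache
rbd config global set global rbd_cache false

Although if the images are mapped with the kernel client (rbd map), the 
librbd cache shouldn't be in that data path anyway.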

Would folks have any further suggestions to smooth performance out?
The odd thing is, I read that BlueStore is supposed to smooth things out and 
provide consistent response times, but that doesn't seem to be the case.



Sample output from the zpool iostat below:

twelve   total_wait disk_waitsyncq_waitasyncq_wait
latency  read  write   read  write   read  write   read  write  scrub   trim
--  -  -  -  -  -  -  -  -  -  -
 .. snip ...

1ms 1.01K  0  1.00K  0  0  0  0  0  0  0
2ms29  0 29 18  0  0  0  1  0  0
4ms23  3 23 14  0  0  0  3  0  0
8ms64  6 64  9  0  0  0  7  0  0
16ms   74 10 74 59  0  0  0 11  0  0
33ms   24 17 24154  0  0  0 19  0  0
67ms7 25  7100  0  0  0 26  0  0
134ms   3 40  3 36  0  0  0 36  0  0
268ms   1 59  1 18  0  0  0 59  0  0
536ms   0116  0  3  0  0  0113  0  0
1s  0109  0  0  0  0  0 98  0  0
2s  0 24  0  0  0  0  0 20  0  0
4s  0  2  0  0  0  0  0  1  0  0
8s  0  0  0  0  0  0  0  0  0  0
17s 0  0  0  0  0  0  0  0  0  0







--
Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
5 Peters Canyon Rd Suite 250 
Irvine CA 92606 
Office 714.918.1310| Fax 714.918.1325 
pbr...@medata.com| www.medata.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to tell balancer to balance

2021-03-11 Thread Boris Behrens
Hi Joe,

I've tried to create a plan on my own, but I still get the same message (Error
EALREADY: Unable to find further optimization, or pool(s) pg_num is
decreasing, or distribution is already perfect).
I've also tried reweighting the three most-filled OSDs to 0.8, which
worked well. After all misplaced objects were resolved I reweighted them
back to 1, and now I am where I started.

The point about the wasted disks is something I cannot follow. If there are
two types of disks, Ceph balances pretty well between them.
 -658.89636-  59 TiB  41 TiB  40 TiB 2.4 GiB 104 GiB  18
TiB 69.20 0.98   -host s3db5
  0   hdd   3.73630  1.0 3.7 TiB 2.2 TiB 2.1 TiB  70 MiB 6.3 GiB 1.5
TiB 60.06 0.85  46 up osd.0
 25   hdd   3.73630  1.0 3.7 TiB 2.2 TiB 2.1 TiB 4.1 MiB 6.1 GiB 1.5
TiB 58.94 0.84  39 up osd.25
 26   hdd   3.73630  1.0 3.7 TiB 2.7 TiB 2.6 TiB 179 MiB 7.3 GiB 1.0
TiB 72.32 1.03  47 up osd.26
 27   hdd   3.73630  1.0 3.7 TiB 2.1 TiB 2.0 TiB 6.8 MiB 6.1 GiB 1.6
TiB 56.17 0.80  47 up osd.27
 28   hdd  14.65039  1.0  15 TiB  11 TiB  11 TiB 935 MiB  28 GiB 3.9
TiB 73.68 1.05 197 up osd.28
 29   hdd  14.65039  1.0  15 TiB 9.9 TiB 9.8 TiB 515 MiB  24 GiB 4.7
TiB 67.83 0.96 188 up osd.29
 30   hdd  14.65039  1.0  15 TiB  11 TiB  11 TiB 774 MiB  26 GiB 3.9
TiB 73.56 1.04 196 up osd.30
 -758.89636-  59 TiB  43 TiB  42 TiB  15 GiB 120 GiB  16
TiB 72.73 1.03   -host s3db6
 32   hdd   3.73630  1.0 3.7 TiB 2.7 TiB 2.6 TiB  22 MiB 7.8 GiB 1.0
TiB 72.18 1.02  60 up osd.32
 33   hdd   3.73630  1.0 3.7 TiB 3.1 TiB 3.0 TiB 381 MiB 8.1 GiB 670
GiB 82.50 1.17  57 up osd.33
 34   hdd   3.73630  1.0 3.7 TiB 3.1 TiB 3.0 TiB 444 MiB 8.5 GiB 604
GiB 84.21 1.20  60 up osd.34
 35   hdd   3.73630  1.0 3.7 TiB 3.2 TiB 3.1 TiB 296 MiB 8.6 GiB 513
GiB 86.59 1.23  53 up osd.35
 36   hdd  14.65039  1.0  15 TiB 9.9 TiB 9.8 TiB 541 MiB  24 GiB 4.8
TiB 67.55 0.96 209 up osd.36
 37   hdd  14.65039  1.0  15 TiB  10 TiB  10 TiB  12 GiB  36 GiB 4.5
TiB 69.22 0.98 191 up osd.37
 38   hdd  14.65039  1.0  15 TiB  11 TiB  11 TiB 1.1 GiB  27 GiB 4.0
TiB 72.62 1.03 209 up osd.38

[root@s3db1 ~]# ceph balancer optimize boristest
Error EINVAL: Balancer enabled, disable to optimize manually
[root@s3db1 ~]# ceph balancer off
[root@s3db1 ~]# ceph balancer optimize bbtest
Error EALREADY: Unable to find further optimization, or pool(s) pg_num is
decreasing, or distribution is already perfect
[root@s3db1 ~]# ceph balancer on
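
Something still left on my list to try (a sketch, assuming the balancer is
still in its default crush-compat mode) is switching it to upmap mode, which
is supposed to cope better with mixed OSD sizes, and tightening the allowed
deviation:

[root@s3db1 ~]# ceph osd set-require-min-compat-client luminous
[root@s3db1 ~]# ceph balancer mode upmap
[root@s3db1 ~]# ceph config set mgr mgr/balancer/upmap_max_deviation 1
[root@s3db1 ~]# ceph balancer on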



On Thu, 11 Mar 2021 at 21:59, Joe Comeau  wrote:

> I read your email
>
> maybe DO NOT use the balancer
> pools should each be on their own set of disks
>
> then you could balance within the pools
>
> disks should all be the same size
> the 4TB will fill up - the 8TB will only go to 50% (or 4 TB) - so in
> effect wasting 4TB of the 8 TB disk
>
> our cluster & our pool
> All our disks no matter what are 8 TB in size.
>
>
>
>
>
> >>> Boris Behrens  3/11/2021 5:53 AM >>>
> Hi,
> I know this topic seems to be handled a lot (as far as I can see), but I
> reached the end of my google-fu.
>
> * We have OSDs that are near full, but there are also OSDs that are only
> loaded with 50%.
> * We have 4,8,16 TB rotating disks in the cluster.
> * The disks that get packed are 4TB disks and very empty disks are also 4TB
> * The OSD nodes are all around the same total disk space (51 - 59)
> * The balancer tells me that it cannot find further optimization, or that
> pg_num is decreasing.
>
> How can I debug further before the cluster goes into a bad state?
>
> [root@s3db1 ~]# ceph osd df tree | sort -nk 17 | head -n 30
> ID  CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
>   %USE  VAR  PGS STATUS TYPE NAME
> MIN/MAX VAR: 0.75/1.23  STDDEV: 6.96
>TOTAL 673 TiB 474 TiB 452 TiB 100 GiB 1.2 TiB 199
> TiB 70.39
> -358.49872-  58 TiB  39 TiB  36 TiB 8.2 GiB  85 GiB  19
> TiB 67.40 0.96   -host s3db2
> -458.49872-  58 TiB  40 TiB  35 TiB  35 GiB  81 GiB  19
> TiB 67.91 0.96   -host s3db3
> -1150.94173-  51 TiB  35 TiB  35 TiB 3.5 GiB  94 GiB  16
> TiB 68.00 0.97   -host s3db10
> -1051.28331-  51 TiB  35 TiB  35 TiB 4.6 GiB  93 GiB  16
> TiB 69.16 0.98   -host s3db9
> -658.89636-  59 TiB  41 TiB  40 TiB 2.4 GiB 102 GiB  18
> TiB 69.15 0.98   -host s3db5
> -1250.99052-  51 TiB  36 TiB  36 TiB 1.8 GiB  93 GiB  15
> TiB 69.99 0.99   -host s3db11
> -258.20561-  58 TiB  41 TiB  37 TiB 9.6 GiB  96 GiB  17
> TiB 70.00 0.99   -host s3db1
> -1   673.44452- 673 TiB 474 TiB 452 TiB 100 GiB 1.2 TiB 199
> TiB 70.39 1.00   -root default

[ceph-users] Re: Ceph server

2021-03-11 Thread Ignazio Cassano
Many thanks
Ignazio

On Fri, 12 Mar 2021 at 00:04, Reed Dier  wrote:

> I'm going to echo what Stefan said.
>
> I would ditch the 2x SATA drives to free up your slots.
> Replace with an M.2 or SATADOM.
>
> I would also recommend moving from the 2x X710-DA2 cards to 1x X710-DA4
> card.
> It can't saturate the x8 slot, and it frees up a PCIe slot for possibly
> another NVMe card or something else if you need it down the line.
>
> The only other thing I would say to consider is making sure you know the
> endurance of the 4510 is enough for your workload long term.
>
> Reed
>
> > On Mar 10, 2021, at 1:12 PM, Stefan Kooman  wrote:
> >
> > On 3/10/21 5:43 PM, Ignazio Cassano wrote:
> >> Hello, what do you think about of ceph cluster made up of 6 nodes each
> one
> >> with the following configuration ?
> >> A+ Server 1113S-WN10RT
> >> Barebone
> >> Supermicro A+ Server 1113S-WN10RT - 1U - 10x U.2 NVMe - 2x M.2 - Dual
> >> 10-Gigabit LAN - 750W Redundant
> >> Processor
> >> AMD EPYC™ 7272 Processor 12-core 2.90GHz 64MB Cache (120W)
> >> Memory
> >> 8 x 8GB PC4-25600 3200MHz DDR4 ECC RDIMM
> >
> > ^^ I would double that amount of RAM, especially (see below) if you plan
> on adding more NVMe drives.
> >
> >> U.2/U.3 NVMe Drive
> >> 5 x 8.0TB Intel® SSD DC P4510 Series U.2 PCIe 3.1 x4 NVMe Solid State
> Drive
> >> Hard Drive
> >
> > ^^ Why 5 * 8.0 TB instead of 10 * 4.0 TB? Are you planning on upgrading
> later? Ceph likes more OSDs better than fewer larger ones. Recovery will be
> faster as well, and the impact of one NVMe dying will be lower.
> >
> >> 2 x 240GB Intel® SSD D3-S4610 Series 2.5" SATA 6.0Gb/s Solid State Drive
> >
> > ^^ Do you sacrifice two NVMe ports for two SATA OS disks? If so, I would
> > advise getting (redundant, optional) U.2 NVMe or SATADOM or similar.
> >
> >> Network Card
> >> 2 x Intel® 10-Gigabit Ethernet Converged Network Adapter X710-DA2 (2x
> SFP+)
> >> Server Management
> >
> > ^ Why two? One for "public" and one for "cluster"? Then most probably
> > you won't need that, and one bond would suffice (see current Ceph best
> > practices). If you need 40 Gb/s in one LACP trunk: perfectly fine.
> >
> > Gr. Stefan
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ERROR: S3 error: 403 (SignatureDoesNotMatch)

2021-03-11 Thread Szabo, Istvan (Agoda)
Hi,

I'm struggling with my old cluster's CNAMEd address.
The s3cmd and curl commands are working properly with the non-CNAMEd address, but 
with the CNAMEd one I get this in the civetweb log:

2021-03-12 10:24:18.812329 7f6b0c527700  1 == starting new request 
req=0x7f6b0c520f90 =
2021-03-12 10:24:18.812387 7f6b0c527700  2 req 10:0.58::HEAD 
/::initializing for trans_id = txa-00604adee2-8e4fc3-default
2021-03-12 10:24:18.812412 7f6b0c527700 10 rgw api priority: s3=5 s3website=4
2021-03-12 10:24:18.812417 7f6b0c527700 10 host=cnamedhostname
2021-03-12 10:24:18.812484 7f6b0c527700 10 handler=25RGWHandler_REST_Bucket_S3
2021-03-12 10:24:18.812490 7f6b0c527700  2 req 10:0.000163:s3:HEAD /::getting 
op 3
2021-03-12 10:24:18.812499 7f6b0c527700 10 op=25RGWStatBucket_ObjStore_S3
2021-03-12 10:24:18.812503 7f6b0c527700  2 req 10:0.000176:s3:HEAD 
/:stat_bucket:verifying requester
2021-03-12 10:24:18.812541 7f6b0c527700  2 req 10:0.000214:s3:HEAD 
/:stat_bucket:normalizing buckets and tenants
2021-03-12 10:24:18.812548 7f6b0c527700 10 s->object= s->bucket= 
cnamedhostname
2021-03-12 10:24:18.812556 7f6b0c527700  2 req 10:0.000229:s3:HEAD 
/:stat_bucket:init permissions
2021-03-12 10:24:18.812594 7f6b0c527700 10 cache get: 
name=default.rgw.meta+root+ cnamedhostname : type miss (requested=0x16, 
cached=0x0)
2021-03-12 10:24:18.813525 7f6b0c527700 10 cache put: 
name=default.rgw.meta+root+ cnamedhostname info.flags=0x0
2021-03-12 10:24:18.813554 7f6b0c527700 10 moving default.rgw.meta+root+ 
cnamedhostname to cache LRU end
2021-03-12 10:24:18.813664 7f6b0c527700 10 read_permissions on cnamedhostname 
[] ret=-2002
2021-03-12 10:24:18.813833 7f6b0c527700  2 req 10:0.001506:s3:HEAD 
/:stat_bucket:op status=0
2021-03-12 10:24:18.813848 7f6b0c527700  2 req 10:0.001520:s3:HEAD 
/:stat_bucket:http status=404
2021-03-12 10:24:18.813855 7f6b0c527700  1 == req done req=0x7f6b0c520f90 
op status=0 http_status=404 ==
2021-03-12 10:24:18.813962 7f6b0c527700  1 civetweb: 0x557d45468000: 
10.118.199.248 - - [12/Mar/2021:10:24:18 +0700] "HEAD / HTTP/1.1" 404 0 - 
curl/7.29.0

And I got this on the s3cmd verbose output:
DEBUG: s3cmd version 2.1.0
DEBUG: ConfigParser: Reading file '.s3cfg-testuser-http'
DEBUG: ConfigParser: access_key->29...17_chars...J
DEBUG: ConfigParser: secret_key->fK...37_chars...R
DEBUG: ConfigParser: host_base->cnamedhostname:80
DEBUG: ConfigParser: host_bucket->cnamedhostname:80/%(bucket)
DEBUG: ConfigParser: use_https->False
DEBUG: ConfigParser: signature_v2->True
DEBUG: Updating Config.Config cache_file ->
DEBUG: Updating Config.Config follow_symlinks -> False
DEBUG: Updating Config.Config verbosity -> 10
DEBUG: Unicodising 'ls' using UTF-8
DEBUG: Command: ls
DEBUG: CreateRequest: resource[uri]=/
DEBUG: Using signature v2
DEBUG: SignHeaders: u'GET\n\n\n\nx-amz-date:Fri, 12 Mar 2021 03:31:39 +\n/'
DEBUG: Processing request, please wait...
DEBUG: get_hostname(None): cnamedhostname
DEBUG: ConnMan.get(): creating new connection: http://cnamedhostname
DEBUG: non-proxied HTTPConnection(cnamedhostname, None)
DEBUG: format_uri(): /
DEBUG: Sending request method_string='GET', uri=u'/', headers={'Authorization': 
u'AWS 293WEU2ADWGIUO4RN39J:Q7kh7kzWXWSqMvUqqWwLOY6QKUE=', 'x-amz-date': 'Fri, 
12 Mar 2021 03:31:39 +'}, body=(0 bytes)
DEBUG: ConnMan.put(): connection put back to pool (http://cnamedhostname#1)
DEBUG: Response:
{'data': 'SignatureDoesNotMatchtxb-00604ae09b-8e4fbd-default8e4fbd-default-default',
'headers': {'accept-ranges': 'bytes',
 'content-length': '198',
 'content-type': 'application/xml',
 'date': 'Fri, 12 Mar 2021 03:31:39 GMT',
 'x-amz-request-id': 
'txb-00604ae09b-8e4fbd-default'},
'reason': 'Forbidden',
'status': 403}
DEBUG: S3Error: 403 (Forbidden)
DEBUG: HttpHeader: date: Fri, 12 Mar 2021 03:31:39 GMT
DEBUG: HttpHeader: content-length: 198
DEBUG: HttpHeader: x-amz-request-id: 
txb-00604ae09b-8e4fbd-default
DEBUG: HttpHeader: content-type: application/xml
DEBUG: HttpHeader: accept-ranges: bytes
DEBUG: ErrorXML: Code: 'SignatureDoesNotMatch'
DEBUG: ErrorXML: RequestId: 'txb-00604ae09b-8e4fbd-default'
DEBUG: ErrorXML: HostId: '8e4fbd-default-default'
ERROR: S3 error: 403 (SignatureDoesNotMatch)
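
My suspicion from the civetweb log is that RGW treats the CNAMEd hostname
itself as a bucket name (host=cnamedhostname and then s->bucket=
cnamedhostname), which would happen if the CNAME is not known to RGW as one
of its DNS names. A sketch of what I plan to try (assuming a plain ceph.conf
setup on this old cluster; the section name is a placeholder):

[client.rgw.<instance>]
rgw dns name = cnamedhostname

and then restart the radosgw instances, or alternatively add the CNAME to the
zonegroup's hostnames list (radosgw-admin zonegroup get / set).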

Any idea?

Thank you


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's