[ceph-users] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)

2021-06-07 Thread Jeremy Hansen
What’s the proper way to track down where this error is coming from?  Thanks.


6/7/21 12:40:00 AM
[WRN]
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)

6/7/21 12:40:00 AM
[WRN]
Health detail: HEALTH_WARN 1 failed cephadm daemon(s)










signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)

2021-06-07 Thread 赵贺东
Hello Jeremy Hansen,

try:
ceph log last cephadm

or see files below
/var/log/ceph/cephadm.log
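
If the log alone doesn't show which daemon failed, something like the following
may also help narrow it down (a rough sketch; output varies by release):

ceph health detail                # usually names the failing daemon and its host
ceph orch ps | grep -v running    # daemons the orchestrator does not see as running
cephadm ls                        # run on the suspect host; per-daemon state as JSON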



> On Jun 7, 2021, at 15:49, Jeremy Hansen  wrote:
> 
> What’s the proper way to track down where this error is coming from?  Thanks.
> 
> 
> 6/7/21 12:40:00 AM
> [WRN]
> [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
> 
> 6/7/21 12:40:00 AM
> [WRN]
> Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)

2021-06-07 Thread Jeremy Hansen
Thank you.  So I see this:

2021-06-07T08:41:24.133493+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 
1494 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
2021-06-07T08:44:37.650022+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 
1592 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
2021-06-07T08:47:07.039405+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 
1667 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
2021-06-07T08:51:00.094847+ mgr.cn01.ceph.la1.clx.corp.xnkoft (mgr.224161) 
1785 : cephadm [INF] Reconfiguring osd.29 (monmap changed)…

Yet…

ceph osd ls
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
16
17
18
20
22
23
24
26
27
31
33
34

So how would I approach fixing this?
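
For what it's worth, the mismatch can be seen by comparing what cephadm still
tracks against what the OSD map knows about, roughly:

ceph orch ps | grep osd.29    # the daemon cephadm still tracks, and its state
ceph osd ls | grep -w 29      # the id no longer appears in the OSD map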

> On Jun 7, 2021, at 1:10 AM, 赵贺东  wrote:
> 
> Hello Jeremy Hansen,
> 
> try:
> ceph log last cephadm
> 
> or see files below
> /var/log/ceph/cephadm.log
> 
> 
> 
>> On Jun 7, 2021, at 15:49, Jeremy Hansen  wrote:
>> 
>> What’s the proper way to track down where this error is coming from?  Thanks.
>> 
>> 
>> 6/7/21 12:40:00 AM
>> [WRN]
>> [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>> 
>> 6/7/21 12:40:00 AM
>> [WRN]
>> Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)

2021-06-07 Thread Jeremy Hansen
So I found the failed daemon:

[root@cn05 ~]# systemctl  | grep 29

● ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.29.service  
loaded failed failedCeph osd.29 
for bfa2ad58-c049-11eb-9098-3c8cf8ed728d

But I’ve already replaced this OSD, so this is perhaps left over from a 
previous osd.29 on this host.  How would I go about removing this cleanly and, 
more importantly, in a way that Ceph is aware of the change, so that the 
warning clears?

Thanks
-jeremy


> On Jun 7, 2021, at 1:54 AM, Jeremy Hansen  wrote:
> 
> Signed PGP part
> Thank you.  So I see this:
> 
> 2021-06-07T08:41:24.133493+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
> (mgr.224161) 1494 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
> 2021-06-07T08:44:37.650022+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
> (mgr.224161) 1592 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
> 2021-06-07T08:47:07.039405+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
> (mgr.224161) 1667 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
> 2021-06-07T08:51:00.094847+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
> (mgr.224161) 1785 : cephadm [INF] Reconfiguring osd.29 (monmap changed)…
> 
> Yet…
> 
> ceph osd ls
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> 11
> 12
> 13
> 14
> 16
> 17
> 18
> 20
> 22
> 23
> 24
> 26
> 27
> 31
> 33
> 34
> 
> So how would I approach fixing this?
> 
>> On Jun 7, 2021, at 1:10 AM, 赵贺东  wrote:
>> 
>> Hello Jeremy Hansen,
>> 
>> try:
>> ceph log last cephadm
>> 
>> or see files below
>> /var/log/ceph/cephadm.log
>> 
>> 
>> 
>>> On Jun 7, 2021, at 15:49, Jeremy Hansen  wrote:
>>> 
>>> What’s the proper way to track down where this error is coming from?  
>>> Thanks.
>>> 
>>> 
>>> 6/7/21 12:40:00 AM
>>> [WRN]
>>> [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>>> 
>>> 6/7/21 12:40:00 AM
>>> [WRN]
>>> Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 



signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)

2021-06-07 Thread Jeremy Hansen
cephadm rm-daemon --name osd.29

on the node with the stale daemon did the trick.
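
For anyone hitting the same thing: depending on the cephadm version, the longer
form may be needed, something like

cephadm rm-daemon --name osd.29 --fsid <cluster-fsid> --force

or the orchestrator equivalent from inside a cephadm shell:

ceph orch daemon rm osd.29 --force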

-jeremy


> On Jun 7, 2021, at 2:24 AM, Jeremy Hansen  wrote:
> 
> Signed PGP part
> So I found the failed daemon:
> 
> [root@cn05 ~]# systemctl  | grep 29
> 
> ● ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.29.service
>   loaded failed failedCeph 
> osd.29 for bfa2ad58-c049-11eb-9098-3c8cf8ed728d
> 
> But I’ve already replaced this osd, so this is perhaps left over from a 
> previous osd.29 on this host.  How would I go about removing this cleanly and 
> more important, in a way that Ceph is aware of the change, therefore clearing 
> the warning.
> 
> Thanks
> -jeremy
> 
> 
>> On Jun 7, 2021, at 1:54 AM, Jeremy Hansen  wrote:
>> 
>> Signed PGP part
>> Thank you.  So I see this:
>> 
>> 2021-06-07T08:41:24.133493+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
>> (mgr.224161) 1494 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:44:37.650022+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
>> (mgr.224161) 1592 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:47:07.039405+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
>> (mgr.224161) 1667 : cephadm [INF] Reconfiguring osd.29 (monmap changed)...
>> 2021-06-07T08:51:00.094847+ mgr.cn01.ceph.la1.clx.corp.xnkoft 
>> (mgr.224161) 1785 : cephadm [INF] Reconfiguring osd.29 (monmap changed)…
>> 
>> Yet…
>> 
>> ceph osd ls
>> 0
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>> 10
>> 11
>> 12
>> 13
>> 14
>> 16
>> 17
>> 18
>> 20
>> 22
>> 23
>> 24
>> 26
>> 27
>> 31
>> 33
>> 34
>> 
>> So how would I approach fixing this?
>> 
>>> On Jun 7, 2021, at 1:10 AM, 赵贺东  wrote:
>>> 
>>> Hello Jeremy Hansen,
>>> 
>>> try:
>>> ceph log last cephadm
>>> 
>>> or see files below
>>> /var/log/ceph/cephadm.log
>>> 
>>> 
>>> 
 On Jun 7, 2021, at 15:49, Jeremy Hansen  wrote:
 
 What’s the proper way to track down where this error is coming from?  
 Thanks.
 
 
 6/7/21 12:40:00 AM
 [WRN]
 [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
 
 6/7/21 12:40:00 AM
 [WRN]
 Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
 
 
 
 
 
 
 
 
 
 
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
>>> 
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> 
> 
> 



signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian buster nautilus 14.2.21 missing?

2021-06-07 Thread Loïc Dachary
Hi,

It is indeed not available. The package files that are present were 
unintentional: nautilus 14.2.21 is not available on download.ceph.com. You need 
to get it from https://backports.debian.org/changes/buster-backports.html once 
it is packaged there (it is currently at 14.2.20).

Cheers

On 07/06/2021 11:55, Szabo, Istvan (Agoda) wrote:
> Hi,
> 
> Am I doing something wrong or the 21 update is missing for buster?
> 
> Thank you
> 
> 
> This message is confidential and is for the sole use of the intended 
> recipient(s). It may also be privileged or otherwise protected by copyright 
> or other legal rules. If you have received it by mistake please let us know 
> by reply email and delete it from your system. It is prohibited to copy this 
> message or disclose its content to anyone. Any confidentiality or privilege 
> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
> the message. All messages sent to and from Agoda may be monitored to ensure 
> compliance with company policies, to protect the company's interests and to 
> remove potential malware. Electronic messages may be intercepted, amended, 
> lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



OpenPGP_signature
Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-07 Thread Eneko Lacunza

Hi Marc,

On 4/6/21 at 16:39, Marc wrote:

Do you use RBD images in containers residing on OSD nodes? Does this 
cause any problems? I used to have kernel-mounted CephFS on an OSD node; after a 
specific Luminous release this was giving me problems.

No, we use Ceph for VM storage. Some of the VMs host containers.

Cheers




-Original Message-
From: Eneko Lacunza 
Sent: Friday, 4 June 2021 15:49
To: ceph-users@ceph.io
Subject: *SPAM* [ceph-users] Re: Why you might want packages
not containers for Ceph deployments

Hi,

We operate a few Ceph hyperconverged clusters with Proxmox, which
provides a custom Ceph package repository. They do great work, and
deployment is a breeze.

So, even though we currently rely on Proxmox packages/distribution and
not upstream, we have a number of other projects deployed with
containers, and we even distribute some of our own development as deb and
container packages, so I will comment on our view:
On 2/6/21 at 23:26, Oliver Freyermuth wrote:
[...]

If I operate services in containers built by developers, of course
this ensures the setup works, and dependencies are well tested, and
even upgrades work well — but it also means that,
at the end of the day, if I run 50 services in 50 different containers
from 50 different upstreams, I'll have up to 50 different versions of
OpenSSL floating around my production servers.
If a security issue is found in any of the packages used in all the
container images, I now need to trust the security teams of all the 50
developer groups building these containers
(and most FOSS projects won't have the ressources, understandably...),
instead of the one security team of the disto I use. And then, I also
have to re-pull all these containers, after finding out that a
security fix has become available.
Or I need to build all these containers myself, and effectively take
over the complete job, and have my own security team.

This may scale somewhat well, if you have a team of 50 people, and
every person takes care of one service. Containers are often your
friend in this case[1],
since it allows to isolate the different responsibilities along with
the service.

But this is rarely the case outside of industry, and especially not in
academics.
So the approach we chose for us is to have one common OS everywhere,
and automate all of our deployment and configuration management with
Puppet.
Of course, that puts is in one of the many corners out there, but it
scales extremely well to all services we operate,
and I can still trust the distro maintainers to keep the base OS safe
on all our servers, automate reboots etc.

For Ceph, we've actually seen questions about security issues already
on the list[0] (never answered AFAICT).

These are the two main issues I find with containers really:

- Keeping hosts up to date is more complex (apt-get update + apt-get
dist-upgrade, plus some kind of docker pull + docker
restart / docker-compose up ...). Much of the time the second part is not
standard (I just deployed a Harbor service; the upgrade is quite simple, but I
have to know how to do it as it's specific to that service; maintenance would be
much easier if it were packaged in Debian). I won't say it's more difficult,
but it will be more diverse and complex.

- Container image quality and security support quality will vary
from upstream to upstream. You have to research each of them to know
where they stand. A distro (especially a good one like Debian, Ubuntu,
RHEL or SUSE) has known, quality security support for its repositories.
They will even fix issues not fixed by upstream (or backport them to the
distro's version...). This is more an upstream vs. distro issue, really.

About the debugging issues reported with Ceph containers, I think those are
things waiting for a fix: why are logs written inside the container image (or an
ephemeral volume, I don't really know how that is done right now)
instead of an external named volume or a local mapped dir in /var/log/ceph?

All that said, I think it makes sense for an upstream project like
Ceph to distribute container images, as it is the most generic way to
distribute (you can deploy on any system/distro supporting container
images) and it eases development. But distributing only container images
could make more users depend on third-party distributions (general-purpose or
Ceph-specific distros), which would delay feedback/bug reports to upstream.

Cheers and thanks for the great work!

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako B

[ceph-users] Re: Connect ceph to proxmox

2021-06-07 Thread Szabo, Istvan (Agoda)
Hi,

Struggling with the global_id reclaim. If it is set to true, the clients can 
connect, but if it is set to false, as the documentation recommends, it 
disconnects the old clients.
The weird thing is that one client is already updated to the newest version, and 
when the option is set to false, it disconnects that one too.

Is there any magic behind this? I've played around with it but haven't been successful :/
https://forum.proxmox.com/threads/ceph-nautilus-and-octopus-security-update-for-insecure-global_id-reclaim-cve-2021-20288.88038
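
For reference, the setting in question from the CVE advisory is what I'm
toggling, roughly like this (sketch):

ceph config set mon auth_allow_insecure_global_id_reclaim false
ceph health detail    # should list any clients still doing insecure global_id reclaim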

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

From: Martin Verges 
Sent: Sunday, June 6, 2021 12:47 AM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users 
Subject: Re: [ceph-users] Connect ceph to proxmox

Hello Szabo,

you can try it with our docs at 
https://croit.io/docs/master/hypervisors/proxmox, maybe it helps you to connect 
your Ceph cluster to Proxmox.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Sat, 5 Jun 2021 at 04:20, Szabo, Istvan (Agoda) 
mailto:istvan.sz...@agoda.com>> wrote:
Hi,

Is there a way to connect the pool that I created in my Nautilus Ceph setup 
to Proxmox? Or do I need a totally different Ceph install?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: 
istvan.sz...@agoda.com>
---


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] slow ops at restarting OSDs (octopus)

2021-06-07 Thread Manuel Lausch
Hello,

I implemented a new cluster with 48 nodes of 24 OSDs each.
I have a replicated pool with 4 replicas. The CRUSH rule distributes the
replicas across different racks.

With this cluster I tested an upgrade from Nautilus (14.2.20) to Octopus
(15.2.13). The upgrade itself worked well until I began restarting the
OSDs in the 4th rack. Since then I get slow ops while stopping OSDs. I
think something happened here once all replica partners were running on
the new version. This issue remains after completing the upgrade.

With Nautilus I had similar issues with slow ops when stopping OSDs. I
could resolve this with the option "osd_fast_shutdown = false". I kept
this option set to false while upgrading. For testing/debugging, I set
it back to true (the default value) and got better results when stopping
OSDs, but the problem has not completely vanished.
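
(For anyone wanting to try the same toggle, it can be set at runtime with
something like

ceph config set osd osd_fast_shutdown false
ceph config set osd osd_fast_shutdown true    # back to the default

though it could just as well live in ceph.conf.)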

Has anyone else had this problem and managed to fix it? What can I do to get
rid of slow ops when restarting OSDs?

All Servers are connected with 2x10G network links


Manuel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Connect ceph to proxmox

2021-06-07 Thread Szabo, Istvan (Agoda)
So the client is on 14.2.20 and the cluster is on 14.2.21. It seems the Debian 
buster repo is missing the .21 update?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Szabo, Istvan (Agoda)  
Sent: Monday, June 7, 2021 4:10 PM
To: Martin Verges ; Andrew Walker-Brown 

Cc: ceph-users 
Subject: [ceph-users] Re: Connect ceph to proxmox

Hi,

Struggling with the global id reclaim. If it is set to true the clients can 
connect, but if it set to false as in the documentation is written, it will 
disconnect the old clients.
The weird thing is that 1 client is already updated to the newest version, and 
when set to false, it disconnect that one too.

Is there any magic behind this? Played around with this but not successful :/
https://forum.proxmox.com/threads/ceph-nautilus-and-octopus-security-update-for-insecure-global_id-reclaim-cve-2021-20288.88038

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

From: Martin Verges 
Sent: Sunday, June 6, 2021 12:47 AM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users 
Subject: Re: [ceph-users] Connect ceph to proxmox

Hello Szabo,

you can try it with our docs at 
https://croit.io/docs/master/hypervisors/proxmox, maybe it helps you to connect 
your Ceph cluster to Proxmox.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 
231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Sat, 5 Jun 2021 at 04:20, Szabo, Istvan (Agoda) 
mailto:istvan.sz...@agoda.com>> wrote:
Hi,

Is there a way to connect from my nautilus ceph setup the pool that I created 
in ceph to proxmox? Or need a totally different ceph install?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: 
istvan.sz...@agoda.com>
---


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Debian buster nautilus 14.2.21 missing?

2021-06-07 Thread Szabo, Istvan (Agoda)
Hi,

Am I doing something wrong, or is the 14.2.21 update missing for buster?

Thank you


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs objets without 'parent' xattr?

2021-06-07 Thread Burkhard Linke

Hi,


during an OS upgrade from Ubuntu 18.04 to 20.04 we seem to have 
triggered a bcache bug on three OSD hosts. These hosts serve a 
6+2 EC pool used by CephFS, so a number of PGs are affected by the 
bug. We were able to restart two of the three hosts (and will run some 
extra scrubs on all PGs), but at least 7 PGs have unfound objects now. 
I'm currently trying to find out which files are affected to restore 
them from backup or inform the users about data corruption for files in 
dedicated scratch directories.



This works quite well for most of the files. The 'parent' xattr attached 
to a file's first chunk contains the complete path of the file within 
the filesystem, so locating the files should be easy with a little help 
from ceph-dencoder. But there are some files that do not have the 
'parent' xattr:



for pg in $(ceph health detail | grep active\+recovery_unfound | cut -d' ' -f 6); do
  echo $pg
  for obj in $(ceph pg $pg list_unfound | jq -r '.objects | .[] | .oid.oid' | cut -c1-11); do
    rados -p bcf_fs_data_rep getxattr $obj. parent > $obj.parent
  done
done



gives

77.1
error getting xattr bcf_fs_data_rep/1002a143927./parent: (2) No 
such file or directory

.


'bcf_fs_data_rep' is the first data pool of the filesystem which should 
contain the xattr data. But for a number of objects (58 of 928) the 
above command is not able to retrieve the information.
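
(For completeness, for the objects that do have the xattr, the saved backtrace
can be decoded with something along the lines of

ceph-dencoder type inode_backtrace_t import <obj>.parent decode dump_json

which prints the ancestor dentries and therefore the path.)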



Questions:

1. If the 'parent' xattr is not available in the first data pool (nor in 
the metadata pool), what might be the state of these objects? Can they 
be deleted by using 'mark_unfound delete'?


2. The list_unfound command only prints 256 objects; how can this limit 
be lifted, since some pools have more unfound objects?



Regards,

Burkhard
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Running ISCSI with Ubuntu 18.04 OS

2021-06-07 Thread Michel Niyoyita
Dear all ,

Is it possible to configure and run iSCSI when deploying Ceph using
Ansible on Ubuntu 18.04? Please let me know, and if possible
provide helpful links on that.

Best regards

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephfs root/boot?

2021-06-07 Thread Harry G. Coin
Has anyone added the 'conf.d' modules (and, in the CentOS/RHEL/Fedora
world, done the SELinux work) so that initramfs/dracut can 'direct kernel
boot' CephFS as a guest image root file system?  It took some work for
the NFS folks to manage being the root filesystem.

Harry



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Turning on "compression_algorithm" old pool with 500TB usage

2021-06-07 Thread Florian Pritz
On Mon, Jun 07, 2021 at 06:22:07AM +0300, Konstantin Shalygin  
wrote:
> The same. You need to rewrite all your data

You can do that without user interaction too, if you recreate all
affected OSDs. That way the data on the OSD is recreated and thus
compressed because it is "new".
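
For reference, the pool-level settings being discussed are roughly:

ceph osd pool set <pool> compression_algorithm snappy
ceph osd pool set <pool> compression_mode aggressive

Only data written (or rewritten) after they are set gets compressed, hence the
need to rewrite or recreate.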

Florian


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-07 Thread Ed Kalk
We have no one currently using containers for anything, so we run old 
Ceph code to avoid them.
If there were an option to not use containers on modern Ceph, that would 
be better for a lot of people who don't want them.


-Ed

On 6/7/2021 2:54 AM, Eneko Lacunza wrote:

Hi Marc,

El 4/6/21 a las 16:39, Marc escribió:
Do you use rbd images in containers that are residing on osd nodes? 
Does this give any problems? I used to have kernel mounted cephfs on 
a osd node, after a specific luminous release this was giving me 
problems.

No, we use Ceph for VM storage. Some of the VMs host containers.

Cheers




-Original Message-
From: Eneko Lacunza 
Sent: Friday, 4 June 2021 15:49
To: ceph-users@ceph.io
Subject: *SPAM* [ceph-users] Re: Why you might want packages
not containers for Ceph deployments

Hi,

We operate a few Ceph hyperconverged clusters with Proxmox, that
provides a custom ceph package repository. They do a great work; and
deployment is a brezee.

So, even as currently we would rely on Proxmox packages/distribution 
and

not upstream, we have a number of other projects deployed with
containers and we even distribute some of our own development in deb 
and

container packages, so I will comment on our view:

El 2/6/21 a las 23:26, Oliver Freyermuth escribió:
[...]

If I operate services in containers built by developers, of course
this ensures the setup works, and dependencies are well tested, and
even upgrades work well — but it also means that,
at the end of the day, if I run 50 services in 50 different containers
from 50 different upstreams, I'll have up to 50 different versions of
OpenSSL floating around my production servers.
If a security issue is found in any of the packages used in all the
container images, I now need to trust the security teams of all the 50
developer groups building these containers
(and most FOSS projects won't have the ressources, understandably...),
instead of the one security team of the disto I use. And then, I also
have to re-pull all these containers, after finding out that a
security fix has become available.
Or I need to build all these containers myself, and effectively take
over the complete job, and have my own security team.

This may scale somewhat well, if you have a team of 50 people, and
every person takes care of one service. Containers are often your
friend in this case[1],
since it allows to isolate the different responsibilities along with
the service.

But this is rarely the case outside of industry, and especially not in
academics.
So the approach we chose for us is to have one common OS everywhere,
and automate all of our deployment and configuration management with
Puppet.
Of course, that puts is in one of the many corners out there, but it
scales extremely well to all services we operate,
and I can still trust the distro maintainers to keep the base OS safe
on all our servers, automate reboots etc.

For Ceph, we've actually seen questions about security issues already
on the list[0] (never answered AFAICT).

These are the two main issues I find with containers really:

- Keeping hosts uptodate is more complex (apt-get update+apt-get
dist-upgrade and also some kind of docker pull+docker
restart/docker-compose up ...). Much of the time the second part is not
standard (just deployed a Harbor service, upgrade is quite simple but I
have to know how to do it as it's speciffic, maintenance would be much
easier if it was packaged in Debian). I won't say it's more difficult,
but it will be more diverse and complex.

- Container image quality and security support quality, that will vary
from upstream to upstream. You have to research each of them to know
were they stand. A distro (specially a good one like Debian, Ubuntu,
RHEL or SUSE) has known, quality security support for the repositories.
They will even fix issues not fixed by upstream (o backport them to
distro's version...). This is more an upstream vs distro issue, really.

About debugging issues reported with Ceph containers, I think those are
things waiting for a fix: why are logs writen in container image (or an
ephemeral volume, I don't know really how is that done right now)
instead of an external name volume o a local mapped dir in 
/var/log/ceph ?


All that said, I think that it makes sense for an upstream project like
Ceph, to distribute container images, as it is the most generic way to
distribute (you can deploy on any system/distro supporting container
images) and eases development. But only distributing container images
could make more users depend on third party distribution (global or
specific distros), which would delay feeback/bugreport to upstream.

Cheers and thanks for the great work!

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/

[ceph-users] Re: Connect ceph to proxmox

2021-06-07 Thread Alwin Antreich
Hi Istvan,

June 7, 2021 11:54 AM, "Szabo, Istvan (Agoda)"  wrote:

> So the client is on 14.2.20 the cluster is on 14.2.21. Seems like the Debian 
> buster repo is missing
> the 21 update?
Best to ask the Proxmox devs about a 14.2.21 build. Or you could build it 
yourself; everything needed for it is in the repo:
https://git.proxmox.com/?p=ceph.git;a=shortlog;h=refs/heads/nautilus-stable-6

The above aside, it's best to upgrade Proxmox to Ceph Octopus; Nautilus will 
soon be EOL anyway.

--
Cheers,
Alwin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Failed OSD has 29 Slow MDS Ops.

2021-06-07 Thread Dave Hall
Hello,

Nautilus 14.2.16

I had an OSD go bad about 10 days ago.  Apparently as it was going down
some MDS ops got hung up waiting for it to come back.  I was out of town
for a couple days and found the OSD 'Down and Out' when I checked in.
(Also, oddly, the cluster did not appear to initiate recovery right away -
it took until I rebooted the OSD node.)

As of right now, the damaged OSD is 'safe-to-destroy' but the slow ops are
still hanging around.  Earlier today I quiesced the clients that were
accessing the CephFS, then unmounted and re-mounted it.  However, this did
not clear the lingering ops.

When I had the node offline I verified that the HDD and NVMe associated
with the OSD seem to actually be healthy, so I plan to zap and re-deploy
using the same hardware.  I would also like to upgrade to 14.2.20 (latest
Ceph for debian 10), but I'm hesitant to do any of this until I get rid of
these 29 slow ops.
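
The redeploy itself I expect to follow the usual sequence, roughly (a sketch;
exact steps depend on the deployment tooling):

while ! ceph osd safe-to-destroy osd.<id>; do sleep 10; done
ceph osd destroy <id> --yes-i-really-mean-it
ceph-volume lvm zap --destroy <data-device> <db-device>
# then re-create the OSD on the same devices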

Can anybody suggest a path forward?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Global Recovery Event

2021-06-07 Thread Jeremy Hansen

I’m seeing this in my health status:

  progress:
Global Recovery Event (13h)
  [] (remaining: 5w)

I’m not sure how this was initiated but this is a cluster with almost zero 
objects.  Is there a way to halt this process?  Why would it estimate 5 weeks 
to recover a cluster with almost zero data?

[ceph: root@cn01 /]# ceph -s -w
  cluster:
id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
health: HEALTH_OK

  services:
mon: 2 daemons, quorum cn02,cn05 (age 13h)
mgr: cn01.ceph.la1.clx.corp.xnkoft(active, since 13h), standbys: cn02.arszct
mds: 1/1 daemons up, 1 standby
osd: 27 osds: 27 up (since 13h), 27 in (since 16h)

  data:
volumes: 1/1 healthy
pools:   3 pools, 65 pgs
objects: 22.09k objects, 86 GiB
usage:   261 GiB used, 98 TiB / 98 TiB avail
pgs: 65 active+clean

  progress:
Global Recovery Event (13h)
  [] (remaining: 5w)



Thanks
-jeremy


signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Global Recovery Event

2021-06-07 Thread Neha Ojha
On Mon, Jun 7, 2021 at 5:24 PM Jeremy Hansen  wrote:
>
>
> I’m seeing this in my health status:
>
>   progress:
> Global Recovery Event (13h)
>   [] (remaining: 5w)
>
> I’m not sure how this was initiated but this is a cluster with almost zero 
> objects.  Is there a way to halt this process?  Why would it estimate 5 weeks 
> to recover a cluster with almost zero data?

You could be running into https://tracker.ceph.com/issues/49988. You
can try to run "ceph progress clear" and see if that helps or just
turn the progress module off and turn it back on.
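
That is, roughly (command availability varies by release):

ceph progress clear
# or, where supported, turn the module off and back on:
ceph progress off
ceph progress on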

- Neha

>
> [ceph: root@cn01 /]# ceph -s -w
>   cluster:
> id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
> health: HEALTH_OK
>
>   services:
> mon: 2 daemons, quorum cn02,cn05 (age 13h)
> mgr: cn01.ceph.la1.clx.corp.xnkoft(active, since 13h), standbys: 
> cn02.arszct
> mds: 1/1 daemons up, 1 standby
> osd: 27 osds: 27 up (since 13h), 27 in (since 16h)
>
>   data:
> volumes: 1/1 healthy
> pools:   3 pools, 65 pgs
> objects: 22.09k objects, 86 GiB
> usage:   261 GiB used, 98 TiB / 98 TiB avail
> pgs: 65 active+clean
>
>   progress:
> Global Recovery Event (13h)
>   [] (remaining: 5w)
>
>
>
> Thanks
> -jeremy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Global Recovery Event

2021-06-07 Thread Jeremy Hansen
This seems to have recovered on its own.

Thank you
-jeremy

> On Jun 7, 2021, at 5:44 PM, Neha Ojha  wrote:
> 
> On Mon, Jun 7, 2021 at 5:24 PM Jeremy Hansen  > wrote:
>> 
>> 
>> I’m seeing this in my health status:
>> 
>>  progress:
>>Global Recovery Event (13h)
>>  [] (remaining: 5w)
>> 
>> I’m not sure how this was initiated but this is a cluster with almost zero 
>> objects.  Is there a way to halt this process?  Why would it estimate 5 
>> weeks to recover a cluster with almost zero data?
> 
> You could be running into https://tracker.ceph.com/issues/49988 
> . You
> can try to run "ceph progress clear" and see if that helps or just
> turn the progress module off and turn it back on.
> 
> - Neha
> 
>> 
>> [ceph: root@cn01 /]# ceph -s -w
>>  cluster:
>>id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>health: HEALTH_OK
>> 
>>  services:
>>mon: 2 daemons, quorum cn02,cn05 (age 13h)
>>mgr: cn01.ceph.la1.clx.corp.xnkoft(active, since 13h), standbys: 
>> cn02.arszct
>>mds: 1/1 daemons up, 1 standby
>>osd: 27 osds: 27 up (since 13h), 27 in (since 16h)
>> 
>>  data:
>>volumes: 1/1 healthy
>>pools:   3 pools, 65 pgs
>>objects: 22.09k objects, 86 GiB
>>usage:   261 GiB used, 98 TiB / 98 TiB avail
>>pgs: 65 active+clean
>> 
>>  progress:
>>Global Recovery Event (13h)
>>  [] (remaining: 5w)
>> 
>> 
>> 
>> Thanks
>> -jeremy
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> 


signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Only 2/5 mon services running

2021-06-07 Thread Jeremy Hansen

In an attempt to troubleshoot why only 2/5 mon services were running, I believe 
I’ve broken something:

[ceph: root@cn01 /]# ceph orch ls
NAME   PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager  1/1  81s ago9d   count:1
crash 6/6  7m ago 9d   *
grafana   1/1  80s ago9d   count:1
mds.testfs2/2  81s ago9d   
cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
mgr   2/2  81s ago9d   count:2
mon   2/5  81s ago9d   count:5
node-exporter 6/6  7m ago 9d   *
osd.all-available-devices   20/26  7m ago 9d   *
osd.unmanaged 7/7  7m ago -
prometheus2/2  80s ago9d   count:2

I tried to stop and start the mon service, but now the cluster is pretty much 
unresponsive, presumably because I stopped the mons:

[ceph: root@cn01 /]# ceph orch stop mon
Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
[ceph: root@cn01 /]# ceph orch start mon


^CCluster connection aborted


Now, even after a reboot of the cluster, it’s unresponsive.  How do I get the 
mons started again?
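
I assume the daemons can still be started directly on each mon host via systemd,
guessing at the unit naming, with something like

systemctl list-units 'ceph-*@mon.*'
systemctl start ceph-<cluster-fsid>@mon.<hostname>.service

but I'm not sure that's the right way to do it with cephadm.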

I’m going through Ceph and breaking things left and right, so I apologize for 
all the questions.  I learn best from breaking things and figuring out how to 
resolve the issues.


Thank you
-jeremy


signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Only 2/5 mon services running

2021-06-07 Thread Jeremy Hansen
It looks like the second mon server was down from my reboot.  Restarted it and 
everything is functional again, but I still can’t figure out why only 2 out of 
the 5 mon servers are running and the others won’t start.  If they were all 
functioning, I probably wouldn’t have noticed the cluster being down.

Thanks
-jeremy


> On Jun 7, 2021, at 7:53 PM, Jeremy Hansen  wrote:
> 
> Signed PGP part
> 
> In an attempt to troubleshoot why only 2/5 mon services were running, I 
> believe I’ve broke something:
> 
> [ceph: root@cn01 /]# ceph orch ls
> NAME   PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> alertmanager  1/1  81s ago9d   count:1
> crash 6/6  7m ago 9d   *
> grafana   1/1  80s ago9d   count:1
> mds.testfs2/2  81s ago9d   
> cn01.ceph.la1.clx.corp;cn02.ceph.la1.clx.corp;cn03.ceph.la1.clx.corp;cn04.ceph.la1.clx.corp;cn05.ceph.la1.clx.corp;cn06.ceph.la1.clx.corp;count:2
> mgr   2/2  81s ago9d   count:2
> mon   2/5  81s ago9d   count:5
> node-exporter 6/6  7m ago 9d   *
> osd.all-available-devices   20/26  7m ago 9d   *
> osd.unmanaged 7/7  7m ago -
> prometheus2/2  80s ago9d   count:2
> 
> I tried to stop and start the mon service, but now the cluster is pretty much 
> unresponsive, I’m assuming because I stopped mon:
> 
> [ceph: root@cn01 /]# ceph orch stop mon
> Scheduled to stop mon.cn01 on host 'cn01.ceph.la1.clx.corp'
> Scheduled to stop mon.cn02 on host 'cn02.ceph.la1.clx.corp'
> Scheduled to stop mon.cn03 on host 'cn03.ceph.la1.clx.corp'
> Scheduled to stop mon.cn04 on host 'cn04.ceph.la1.clx.corp'
> Scheduled to stop mon.cn05 on host 'cn05.ceph.la1.clx.corp'
> [ceph: root@cn01 /]# ceph orch start mon
> 
> 
> ^CCluster connection aborted
> 
> 
> Now even after a reboot of the cluster, it’s unresponsive.  How do I get mon 
> started again?
> 
> I’m going through Ceph and breaking things left and right, so I apologize for 
> all the questions.  I learn best from breaking things and figuring out how to 
> resolve the issues.
> 
> 
> Thank you
> -jeremy
> 
> 



signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to enable lazyio under kcephfs?

2021-06-07 Thread opengers
ceph: 14.2.x
kernel: 4.15

In CephFS, due to the need for cache consistency, when one client is
executing buffered IO, another client will hang when reading or writing the
same file.

It seems that lazyio can solve this problem: lazyio allows multiple clients
to execute buffered IO at the same time (relaxed consistency). But I am not
sure how to enable lazyio under the kernel mount; in my tests the
"client_force_lazyio" parameter does not work.

My end goal is to use lazyio so that multiple clients can read
and write the same file (in buffered IO mode).

Can someone explain how to enable lazyio under kcephfs? Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io