[ceph-users] Re: Ceph Mgr/Dashboard Python dependencies: a new approach

2022-09-02 Thread Marc


> 
> Isn't this one of the reasons containers were pushed, so that the
> packaging isn't as big a deal?
> Is it the continued push to support lots of distros without using
> containers that is the problem?
> 

Apache httpd and OpenSSH sshd manage to support lots of distros without using 
containers, as do many other projects. I do not see any problem with this.

>- *Use new/er Python packages:* that are not available in all distros
>(e.g.: fastapi), or which are available but in very disparate
> versions (and
>hence with different feature sets, resulting in spaghetti code and
>complicating test matrices),

>- *Improve security and quality*: contrary to common belief, the
> distro
>package approach does not intrinsically ensure a more secure,
> maintained
>environment, just a stable one. Many non-standard Python packages
> come from

Yes, but there is reasoning behind what the distros do, isn't there? You are 
presented with packages that have gone through some qualification to be stable 
and secure.

My view is that developers should use these packages and not others. Package 
selection is outside the scope of an application developer; that is the job of 
the distro maintainers. 

Currently I still have more confidence in people doing what is their core 
business than in someone hacking on some code, finding a quick-fix library, and 
jamming it into a repository.

PS.
Is this now an official Red Hat statement that the advantages of using the RHEL 
distro are in decline?






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
This looks like an old traceback you would get if you somehow ended up with a
service type that shouldn't be there. The things I'd probably check are that
"cephadm ls" on either host definitely doesn't report any strange things that
aren't actually daemons in your cluster, such as "cephadm.". Another thing
you could maybe try, as I believe the assertion it's giving is for an unknown
service type here ("AssertionError: cephadm"), is just "ceph orch rm cephadm",
which would maybe cause it to remove whatever it thinks is this "cephadm"
service that it has deployed. Lastly, you could try having the mgr you
manually deploy be a 16.2.10 one instead of 15.2.17 (I'm assuming here, but
the line numbers in that traceback suggest octopus). The 16.2.10 one is just
much less likely to have a bug that causes something like this.
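
Roughly, the checks I mean, as a sketch (the fsid/hash are placeholders, and
the manual-deploy invocation follows the troubleshooting doc quoted below, so
treat it as illustrative rather than exact):

# on each host: list what cephadm thinks is deployed and look for stray
# "cephadm.<hash>" entries that aren't real daemons
cephadm ls
# try removing the bogus "cephadm" service the orchestrator believes exists
ceph orch rm cephadm
# if redeploying a mgr by hand, prefer the newer image, roughly:
cephadm --image quay.io/ceph/ceph:v16.2.10 deploy --fsid <fsid> \
    --name mgr.<host>.<id> --config-json config-json.json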

On Fri, Sep 2, 2022 at 1:41 AM Satish Patel  wrote:

> Now when I run "ceph orch ps" it works but the following command throws an
> error.  Trying to bring up second mgr using ceph orch apply mgr command but
> didn't help
>
> root@ceph1:/ceph-disk# ceph version
> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus
> (stable)
>
> root@ceph1:/ceph-disk# ceph orch ls
> Error EINVAL: Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
> return self.handle_command(inbuf, cmd)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in
> handle_command
> return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
> return self.func(mgr, **kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in
> 
> wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in
> wrapper
> return func(*args, **kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in
> _list_services
> raise_if_exception(completion)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in
> raise_if_exception
> raise e
> AssertionError: cephadm
>
> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel  wrote:
>
> > nevermind, i found doc related that and i am able to get 1 mgr up -
> >
> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
> >
> >
> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel 
> wrote:
> >
> >> Folks,
> >>
> >> I am having little fun time with cephadm and it's very annoying to deal
> >> with it
> >>
> >> I have deployed a ceph cluster using cephadm on two nodes. Now when i
> was
> >> trying to upgrade and noticed hiccups where it just upgraded a single
> mgr
> >> with 16.2.10 but not other so i started messing around and somehow I
> >> deleted both mgr in the thought that cephadm will recreate them.
> >>
> >> Now i don't have any single mgr so my ceph orch command hangs forever
> and
> >> looks like a chicken egg issue.
> >>
> >> How do I recover from this? If I can't run the ceph orch command, I
> won't
> >> be able to redeploy my mgr daemons.
> >>
> >> I am not able to find any mgr in the following command on both nodes.
> >>
> >> $ cephadm ls | grep mgr
> >>
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Clarifications about automatic PG scaling

2022-09-02 Thread Nicola Mori

Dear Ceph users,

I'm setting up a cluster; at the moment I have 56 OSDs for a total 
available space of 109 TiB, and an erasure-coded pool with a total 
occupancy of just 90 GB. The autoscale mode for the pool is set to "on", 
but I still have just 32 PGs. As far as I understand (admittedly not 
much, I'm a Ceph beginner) the rule of thumb is roughly 100 PGs per 
OSD, so I would expect the autoscaler to increase the PG count, but 
that's not the case.
If my expectation is correct, then I can't tell whether this is a 
config issue (and if so, which config options I should tweak), or 
whether it's expected behavior, e.g. because the occupancy is very low 
and thus PGs are not scaled up.
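
For reference, I assume the place to look is something like the following (a
minimal sketch based on the stock pg_autoscaler module; the pool name is a
placeholder):

# show per-pool size, target ratio and the PG count the autoscaler would pick
ceph osd pool autoscale-status
# the per-OSD PG target the autoscaler aims for (default 100)
ceph config get mon mon_target_pg_per_osd
# optionally tell the autoscaler how full the pool is expected to become,
# since it sizes PGs on actual or target usage rather than raw capacity
ceph osd pool set <ec-pool> target_size_ratio 0.8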

Any help/hint is really appreciated.
Thanks in advance,

Nicola
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Hi Adam,

In "cephadm ls" I found the following service, but I believe it was there
before as well.

{
    "style": "cephadm:v1",
    "name": "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
    "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
    "systemd_unit": "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
    "enabled": false,
    "state": "stopped",
    "container_id": null,
    "container_image_name": null,
    "container_image_id": null,
    "version": null,
    "started": null,
    "created": null,
    "deployed": null,
    "configured": null
},

Looks like the removal didn't work:

root@ceph1:~# ceph orch rm cephadm
Failed to remove service.  was not found.

root@ceph1:~# ceph orch rm
cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
Failed to remove service.

was not found.

On Fri, Sep 2, 2022 at 8:27 AM Adam King  wrote:

> this looks like an old traceback you would get if you ended up with a
> service type that shouldn't be there somehow. The things I'd probably check
> are that "cephadm ls" on either host definitely doesn't report and strange
> things that aren't actually daemons in your cluster such as
> "cephadm.". Another thing you could maybe try, as I believe the
> assertion it's giving is for an unknown service type here ("AssertionError:
> cephadm"), is just "ceph orch rm cephadm" which would maybe cause it to
> remove whatever it thinks is this "cephadm" service that it has deployed.
> Lastly, you could try having the mgr you manually deploy be a 16.2.10 one
> instead of 15.2.17 (I'm assuming here, but the line numbers in that
> traceback suggest octopus). The 16.2.10 one is just much less likely to
> have a bug that causes something like this.
>
> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel  wrote:
>
>> Now when I run "ceph orch ps" it works but the following command throws an
>> error.  Trying to bring up second mgr using ceph orch apply mgr command
>> but
>> didn't help
>>
>> root@ceph1:/ceph-disk# ceph version
>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus
>> (stable)
>>
>> root@ceph1:/ceph-disk# ceph orch ls
>> Error EINVAL: Traceback (most recent call last):
>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>> return self.handle_command(inbuf, cmd)
>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in
>> handle_command
>> return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>> return self.func(mgr, **kwargs)
>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in
>> 
>> wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args,
>> **l_kwargs)
>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in
>> wrapper
>> return func(*args, **kwargs)
>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in
>> _list_services
>> raise_if_exception(completion)
>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in
>> raise_if_exception
>> raise e
>> AssertionError: cephadm
>>
>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel  wrote:
>>
>> > nevermind, i found doc related that and i am able to get 1 mgr up -
>> >
>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>> >
>> >
>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel 
>> wrote:
>> >
>> >> Folks,
>> >>
>> >> I am having little fun time with cephadm and it's very annoying to deal
>> >> with it
>> >>
>> >> I have deployed a ceph cluster using cephadm on two nodes. Now when i
>> was
>> >> trying to upgrade and noticed hiccups where it just upgraded a single
>> mgr
>> >> with 16.2.10 but not other so i started messing around and somehow I
>> >> deleted both mgr in the thought that cephadm will recreate them.
>> >>
>> >> Now i don't have any single mgr so my ceph orch command hangs forever
>> and
>> >> looks like a chicken egg issue.
>> >>
>> >> How do I recover from this? If I can't run the ceph orch command, I
>> won't
>> >> be able to redeploy my mgr daemons.
>> >>
>> >> I am not able to find any mgr in the following command on both nodes.
>> >>
>> >> $ cephadm ls | grep mgr
>> >>
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
Okay, I'm wondering if this is a version-mismatch issue: you previously had a
16.2.10 mgr and now have a 15.2.17 one that doesn't expect this sort of entry
to be present. Either way, I'd think just deleting this
cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d file
(and any others like it) would be the way forward to get "ceph orch ls"
working again.
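
To be concrete, that entry lives under the cluster's data directory on the
affected host, so the deletion is roughly (fsid and hash are placeholders
here):

ls /var/lib/ceph/<fsid>/
rm /var/lib/ceph/<fsid>/cephadm.<hash>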

On Fri, Sep 2, 2022 at 9:44 AM Satish Patel  wrote:

> Hi Adam,
>
> In cephadm ls i found the following service but i believe it was there
> before also.
>
> {
> "style": "cephadm:v1",
> "name":
> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
> "systemd_unit":
> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> ",
> "enabled": false,
> "state": "stopped",
> "container_id": null,
> "container_image_name": null,
> "container_image_id": null,
> "version": null,
> "started": null,
> "created": null,
> "deployed": null,
> "configured": null
> },
>
> Look like remove didn't work
>
> root@ceph1:~# ceph orch rm cephadm
> Failed to remove service.  was not found.
>
> root@ceph1:~# ceph orch rm
> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> Failed to remove service.
> 
> was not found.
>
> On Fri, Sep 2, 2022 at 8:27 AM Adam King  wrote:
>
>> this looks like an old traceback you would get if you ended up with a
>> service type that shouldn't be there somehow. The things I'd probably check
>> are that "cephadm ls" on either host definitely doesn't report and strange
>> things that aren't actually daemons in your cluster such as
>> "cephadm.". Another thing you could maybe try, as I believe the
>> assertion it's giving is for an unknown service type here ("AssertionError:
>> cephadm"), is just "ceph orch rm cephadm" which would maybe cause it to
>> remove whatever it thinks is this "cephadm" service that it has deployed.
>> Lastly, you could try having the mgr you manually deploy be a 16.2.10 one
>> instead of 15.2.17 (I'm assuming here, but the line numbers in that
>> traceback suggest octopus). The 16.2.10 one is just much less likely to
>> have a bug that causes something like this.
>>
>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel  wrote:
>>
>>> Now when I run "ceph orch ps" it works but the following command throws
>>> an
>>> error.  Trying to bring up second mgr using ceph orch apply mgr command
>>> but
>>> didn't help
>>>
>>> root@ceph1:/ceph-disk# ceph version
>>> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus
>>> (stable)
>>>
>>> root@ceph1:/ceph-disk# ceph orch ls
>>> Error EINVAL: Traceback (most recent call last):
>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in _handle_command
>>> return self.handle_command(inbuf, cmd)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in
>>> handle_command
>>> return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>>>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
>>> return self.func(mgr, **kwargs)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in
>>> 
>>> wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args,
>>> **l_kwargs)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in
>>> wrapper
>>> return func(*args, **kwargs)
>>>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in
>>> _list_services
>>> raise_if_exception(completion)
>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in
>>> raise_if_exception
>>> raise e
>>> AssertionError: cephadm
>>>
>>> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel 
>>> wrote:
>>>
>>> > nevermind, i found doc related that and i am able to get 1 mgr up -
>>> >
>>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
>>> >
>>> >
>>> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel 
>>> wrote:
>>> >
>>> >> Folks,
>>> >>
>>> >> I am having little fun time with cephadm and it's very annoying to
>>> deal
>>> >> with it
>>> >>
>>> >> I have deployed a ceph cluster using cephadm on two nodes. Now when i
>>> was
>>> >> trying to upgrade and noticed hiccups where it just upgraded a single
>>> mgr
>>> >> with 16.2.10 but not other so i started messing around and somehow I
>>> >> deleted both mgr in the thought that cephadm will recreate them.
>>> >>
>>> >> Now i don't have any single mgr so my ceph orch command hangs forever
>>> and
>>> >> looks like a chicken egg issue.
>>> >>
>>> >> How do I recover from this? If I can't run the ceph orch command, I
>>> won't
>>> >> be able to redeploy my mgr daemons.
>>> >>
>>> >> I am not able to find any mgr in the following command on both nodes.
>>> >>
>>> >> $ cephadm ls | grep mgr
>>> >>
>>> >
>>> __

[ceph-users] low available space due to unbalanced cluster(?)

2022-09-02 Thread Oebele Drijfhout
Hello,

I'm new to Ceph and recently inherited a 4-node cluster with 32 OSDs and
about 116 TB raw space. It is showing low available space, which I'm trying
to increase by enabling the balancer and lowering the priority (reweight) of
the most-used OSDs. My questions are: is what I did correct given the current
state of the cluster, can I do more to speed up rebalancing, and will we
actually make more space available this way?
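
For reference, by "enabling the balancer and lowering priority" I mean
commands along these lines (a sketch, not necessarily the exact invocations I
used; upmap mode is assumed, and osd.11 is simply the fullest OSD in the
output further down):

# allow and enable the balancer in upmap mode
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status
# temporarily lower the override weight of the fullest OSDs, e.g.:
ceph osd reweight 11 0.95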

Some background info: Earlier this week MAX AVAIL on the cluster was 0 and
it was only then that we noticed something was wrong. We removed an unused
rbd image (about 3TB) and now we have a little under 1TB in available
space. We are adding about 75GB per day on this cluster.

[xxx@ceph02 ~]$ sudo ceph --cluster xxx df
RAW STORAGE:
    CLASS     SIZE        AVAIL     USED      RAW USED    %RAW USED
    hdd       116 TiB     47 TiB    69 TiB    69 TiB      59.69
    TOTAL     116 TiB     47 TiB    69 TiB    69 TiB      59.69

POOLS:
    POOL            ID    PGS     STORED     OBJECTS    USED       %USED    MAX AVAIL
    xxx-pool        1     1024    130 B      3          192 KiB    0        992 GiB
    yyy_data        6     128     23 TiB     12.08M     69 TiB     95.98    992 GiB
    yyy_metadata    7     128     5.6 GiB    2.22M      6.1 GiB    0.21     992 GiB

Cluster status:

[xxx@ceph02 ~]$ sudo ceph --cluster xxx -s
  cluster:
id: 91ba1ea6-bfec-4ddb-a8b5-9faf842f22c3
health: HEALTH_WARN
1 backfillfull osd(s)
1 nearfull osd(s)
3 pool(s) backfillfull
Low space hindering backfill (add storage if this doesn't
resolve itself): 6 pgs backfill_toofull

  services:
mon: 5 daemons, quorum a,b,c,d,e (age 5d)
mgr: b(active, since 22h), standbys: a, c, d, e
mds: registration_docs:1 {0=b=up:active} 3 up:standby
osd: 32 osds: 32 up (since 19M), 32 in (since 3y); 36 remapped pgs

  task status:
scrub status:
mds.b: idle

  data:
pools:   3 pools, 1280 pgs
objects: 14.31M objects, 23 TiB
usage:   70 TiB used, 47 TiB / 116 TiB avail
pgs: 2587772/42925071 objects misplaced (6.029%)
 1244 active+clean
 17   active+remapped+backfilling
 13   active+remapped+backfill_wait
 4active+remapped+backfill_toofull
 2active+remapped+backfill_wait+backfill_toofull

  io:
client:   331 KiB/s wr, 0 op/s rd, 0 op/s wr
recovery: 141 MiB/s, 65 keys/s, 84 objects/s

Versions:

[xxx@ceph02 ~]$ rpm -qa | grep ceph
ceph-common-14.2.13-0.el7.x86_64
ceph-mds-14.2.13-0.el7.x86_64
ceph-osd-14.2.13-0.el7.x86_64
ceph-base-14.2.13-0.el7.x86_64
libcephfs2-14.2.13-0.el7.x86_64
python-ceph-argparse-14.2.13-0.el7.x86_64
ceph-selinux-14.2.13-0.el7.x86_64
ceph-mgr-14.2.13-0.el7.x86_64
ceph-14.2.13-0.el7.x86_64
python-cephfs-14.2.13-0.el7.x86_64
ceph-mon-14.2.13-0.el7.x86_64

It looks like the cluster is severely unbalanced, and I guess that's
expected because the balancer was set to "off":

[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd df
ID CLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
%USE  VAR  PGS STATUS
 0   hdd 3.63869  1.0 3.6 TiB 1.8 TiB 1.8 TiB 894 MiB 3.6 GiB 1.8 TiB
49.24 0.82 122 up
 1   hdd 3.63869  1.0 3.6 TiB 1.1 TiB 1.1 TiB 581 MiB 2.4 GiB 2.5 TiB
31.07 0.52 123 up
 2   hdd 3.63869  1.0 3.6 TiB 2.2 TiB 2.2 TiB 632 MiB 4.1 GiB 1.5 TiB
60.01 1.00 121 up
 3   hdd 3.63869  1.0 3.6 TiB 2.7 TiB 2.7 TiB 672 MiB 5.5 GiB 975 GiB
73.84 1.24 122 up
 4   hdd 3.63869  0.94983 3.6 TiB 2.9 TiB 2.9 TiB 478 MiB 5.2 GiB 794 GiB
78.69 1.32 111 up
 5   hdd 3.63869  1.0 3.6 TiB 2.5 TiB 2.5 TiB 900 MiB 4.7 GiB 1.1 TiB
69.52 1.16 122 up
 6   hdd 3.63869  1.0 3.6 TiB 2.7 TiB 2.7 TiB 468 MiB 5.5 GiB 929 GiB
75.08 1.26 125 up
 7   hdd 3.63869  1.0 3.6 TiB 1.6 TiB 1.6 TiB 731 MiB 3.2 GiB 2.0 TiB
44.54 0.75 122 up
 8   hdd 3.63869  1.0 3.6 TiB 1.3 TiB 1.3 TiB 626 MiB 2.6 GiB 2.4 TiB
35.41 0.59 120 up
 9   hdd 3.63869  1.0 3.6 TiB 2.5 TiB 2.5 TiB 953 MiB 4.8 GiB 1.1 TiB
69.61 1.17 122 up
10   hdd 3.63869  1.0 3.6 TiB 2.0 TiB 2.0 TiB 526 MiB 3.9 GiB 1.6 TiB
55.64 0.93 121 up
11   hdd 3.63869  0.94983 3.6 TiB 3.4 TiB 3.4 TiB 476 MiB 6.2 GiB 242 GiB
93.50 1.57 101 up
12   hdd 3.63869  1.0 3.6 TiB 1.4 TiB 1.4 TiB 688 MiB 3.0 GiB 2.2 TiB
39.44 0.66 117 up
13   hdd 3.63869  1.0 3.6 TiB 1.3 TiB 1.3 TiB 738 MiB 2.8 GiB 2.3 TiB
35.98 0.60 124 up
14   hdd 3.63869  1.0 3.6 TiB 2.8 TiB 2.8 TiB 582 MiB 5.1 GiB 879 GiB
76.40 1.28 123 up
15   hdd 3.63869  1.0 3.6 TiB 2.5 TiB 2.5 TiB 566 MiB 4.6 GiB 1.1 TiB
68.81 1.15 124 up
16   hdd 3.63869  1.0 3.6 TiB 1.5 TiB 1.5 TiB 625 MiB 3.1 GiB 2.2 TiB
40.23 0.67 121 up
17   hdd 3.63869  0.94983 3.6 TiB 3.2 TiB 3.2 TiB 704 MiB 6.1 GiB 427 GiB
88.55 1.48 112 up
18   hdd 3.63869  1.0 3.6 TiB 2.0 TiB 2.0 TiB 143 MiB 3.6 GiB 1.7 TiB
54.12 0.91 124 up
19   hdd 3.63869  1.000

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Hi Adam,

I have deleted the file located here:

rm /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d

But I'm still getting the same error. Do I need to do anything else?

On Fri, Sep 2, 2022 at 9:51 AM Adam King  wrote:

> Okay, I'm wondering if this is an issue with version mismatch. Having
> previously had a 16.2.10 mgr and then now having a 15.2.17 one that doesn't
> expect this sort of thing to be present. Either way, I'd think just
> deleting this cephadm.7ce656a8721deb5054c37b0cfb9038
> 1522d521dde51fb0c5a2142314d663f63d (and any others like it) file would be
> the way forward to get orch ls working again.
>
> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel  wrote:
>
>> Hi Adam,
>>
>> In cephadm ls i found the following service but i believe it was there
>> before also.
>>
>> {
>> "style": "cephadm:v1",
>> "name":
>> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>> "systemd_unit":
>> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>> ",
>> "enabled": false,
>> "state": "stopped",
>> "container_id": null,
>> "container_image_name": null,
>> "container_image_id": null,
>> "version": null,
>> "started": null,
>> "created": null,
>> "deployed": null,
>> "configured": null
>> },
>>
>> Look like remove didn't work
>>
>> root@ceph1:~# ceph orch rm cephadm
>> Failed to remove service.  was not found.
>>
>> root@ceph1:~# ceph orch rm
>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>> Failed to remove service.
>> 
>> was not found.
>>
>> On Fri, Sep 2, 2022 at 8:27 AM Adam King  wrote:
>>
>>> this looks like an old traceback you would get if you ended up with a
>>> service type that shouldn't be there somehow. The things I'd probably check
>>> are that "cephadm ls" on either host definitely doesn't report and strange
>>> things that aren't actually daemons in your cluster such as
>>> "cephadm.". Another thing you could maybe try, as I believe the
>>> assertion it's giving is for an unknown service type here ("AssertionError:
>>> cephadm"), is just "ceph orch rm cephadm" which would maybe cause it to
>>> remove whatever it thinks is this "cephadm" service that it has deployed.
>>> Lastly, you could try having the mgr you manually deploy be a 16.2.10 one
>>> instead of 15.2.17 (I'm assuming here, but the line numbers in that
>>> traceback suggest octopus). The 16.2.10 one is just much less likely to
>>> have a bug that causes something like this.
>>>
>>> On Fri, Sep 2, 2022 at 1:41 AM Satish Patel 
>>> wrote:
>>>
 Now when I run "ceph orch ps" it works but the following command throws
 an
 error.  Trying to bring up second mgr using ceph orch apply mgr command
 but
 didn't help

 root@ceph1:/ceph-disk# ceph version
 ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus
 (stable)

 root@ceph1:/ceph-disk# ceph orch ls
 Error EINVAL: Traceback (most recent call last):
   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in
 _handle_command
 return self.handle_command(inbuf, cmd)
   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in
 handle_command
 return dispatch[cmd['prefix']].call(self, cmd, inbuf)
   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
 return self.func(mgr, **kwargs)
   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in
 
 wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args,
 **l_kwargs)
   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in
 wrapper
 return func(*args, **kwargs)
   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in
 _list_services
 raise_if_exception(completion)
   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in
 raise_if_exception
 raise e
 AssertionError: cephadm

 On Fri, Sep 2, 2022 at 1:32 AM Satish Patel 
 wrote:

 > nevermind, i found doc related that and i am able to get 1 mgr up -
 >
 https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
 >
 >
 > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel 
 wrote:
 >
 >> Folks,
 >>
 >> I am having little fun time with cephadm and it's very annoying to
 deal
 >> with it
 >>
 >> I have deployed a ceph cluster using cephadm on two nodes. Now when
 i was
 >> trying to upgrade and noticed hiccups where it just upgraded a
 single mgr
 >> with 16.2.10 but not other so i started messing around and somehow I
 >> deleted both mgr in the thought that cephadm will recreate them.
 >>
>>>

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
Maybe also a "ceph orch ps --refresh"? It might still have the old cached
daemon inventory from before you removed the files.

On Fri, Sep 2, 2022 at 9:57 AM Satish Patel  wrote:

> Hi Adam,
>
> I have deleted file located here - rm
> /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>
> But still getting the same error, do i need to do anything else?
>
> On Fri, Sep 2, 2022 at 9:51 AM Adam King  wrote:
>
>> Okay, I'm wondering if this is an issue with version mismatch. Having
>> previously had a 16.2.10 mgr and then now having a 15.2.17 one that doesn't
>> expect this sort of thing to be present. Either way, I'd think just
>> deleting this cephadm.7ce656a8721deb5054c37b0cfb9038
>> 1522d521dde51fb0c5a2142314d663f63d (and any others like it) file would
>> be the way forward to get orch ls working again.
>>
>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel  wrote:
>>
>>> Hi Adam,
>>>
>>> In cephadm ls i found the following service but i believe it was there
>>> before also.
>>>
>>> {
>>> "style": "cephadm:v1",
>>> "name":
>>> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
>>> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
>>> "systemd_unit":
>>> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>> ",
>>> "enabled": false,
>>> "state": "stopped",
>>> "container_id": null,
>>> "container_image_name": null,
>>> "container_image_id": null,
>>> "version": null,
>>> "started": null,
>>> "created": null,
>>> "deployed": null,
>>> "configured": null
>>> },
>>>
>>> Look like remove didn't work
>>>
>>> root@ceph1:~# ceph orch rm cephadm
>>> Failed to remove service.  was not found.
>>>
>>> root@ceph1:~# ceph orch rm
>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>> Failed to remove service.
>>> 
>>> was not found.
>>>
>>> On Fri, Sep 2, 2022 at 8:27 AM Adam King  wrote:
>>>
 this looks like an old traceback you would get if you ended up with a
 service type that shouldn't be there somehow. The things I'd probably check
 are that "cephadm ls" on either host definitely doesn't report and strange
 things that aren't actually daemons in your cluster such as
 "cephadm.". Another thing you could maybe try, as I believe the
 assertion it's giving is for an unknown service type here ("AssertionError:
 cephadm"), is just "ceph orch rm cephadm" which would maybe cause it to
 remove whatever it thinks is this "cephadm" service that it has deployed.
 Lastly, you could try having the mgr you manually deploy be a 16.2.10 one
 instead of 15.2.17 (I'm assuming here, but the line numbers in that
 traceback suggest octopus). The 16.2.10 one is just much less likely to
 have a bug that causes something like this.

 On Fri, Sep 2, 2022 at 1:41 AM Satish Patel 
 wrote:

> Now when I run "ceph orch ps" it works but the following command
> throws an
> error.  Trying to bring up second mgr using ceph orch apply mgr
> command but
> didn't help
>
> root@ceph1:/ceph-disk# ceph version
> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus
> (stable)
>
> root@ceph1:/ceph-disk# ceph orch ls
> Error EINVAL: Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/mgr_module.py", line 1212, in
> _handle_command
> return self.handle_command(inbuf, cmd)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in
> handle_command
> return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>   File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
> return self.func(mgr, **kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in
> 
> wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args,
> **l_kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in
> wrapper
> return func(*args, **kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in
> _list_services
> raise_if_exception(completion)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in
> raise_if_exception
> raise e
> AssertionError: cephadm
>
> On Fri, Sep 2, 2022 at 1:32 AM Satish Patel 
> wrote:
>
> > nevermind, i found doc related that and i am able to get 1 mgr up -
> >
> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
> >
> >
> > On Fri, Sep 2, 2022 at 1:21 AM Satish Patel 
> wrote:
> >
> >> Folks,
> >>
> >> I am having little fun time with cephadm and it's very annoying to
> deal
> >> with it
> >>
> >> I have deployed a 

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
I can see that in the output but I'm not sure how to get rid of it.

root@ceph1:~# ceph orch ps --refresh
NAME
 HOST   STATUSREFRESHED  AGE  VERSIONIMAGE NAME
IMAGE ID
   CONTAINER ID
alertmanager.ceph1
 ceph1  running (9h)  64s ago2w   0.20.0
quay.io/prometheus/alertmanager:v0.20.0
   0881eb8f169f  ba804b555378
cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
 ceph2  stopped   65s ago-  
 

crash.ceph1
ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
   93146564743f
 a3a431d834fc
crash.ceph2
ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
   93146564743f
 3c963693ff2b
grafana.ceph1
ceph1  running (9h)  64s ago2w   6.7.4
quay.io/ceph/ceph-grafana:6.7.4
   557c83e11646  7583a8dc4c61
mgr.ceph1.smfvfd
 ceph1  running (8h)  64s ago8h   15.2.17
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
 93146564743f  1aab837306d2
mon.ceph1
ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
   93146564743f
 c1d155d8c7ad
node-exporter.ceph1
ceph1  running (9h)  64s ago2w   0.18.1
quay.io/prometheus/node-exporter:v0.18.1
e5a616e4b9cf  2ff235fe0e42
node-exporter.ceph2
ceph2  running (9h)  65s ago13d  0.18.1
quay.io/prometheus/node-exporter:v0.18.1
e5a616e4b9cf  17678b9ba602
osd.0
ceph1  running (9h)  64s ago13d  15.2.17quay.io/ceph/ceph:v15
   93146564743f
 d0fd73b777a3
osd.1
ceph1  running (9h)  64s ago13d  15.2.17quay.io/ceph/ceph:v15
   93146564743f
 049120e83102
osd.2
ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
   93146564743f
 8700e8cefd1f
osd.3
ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
   93146564743f
 9c71bc87ed16
prometheus.ceph1
 ceph1  running (9h)  64s ago2w   2.18.1
quay.io/prometheus/prometheus:v2.18.1
   de242295e225  74a538efd61e

On Fri, Sep 2, 2022 at 10:10 AM Adam King  wrote:

> maybe also a "ceph orch ps --refresh"? It might still have the old cached
> daemon inventory from before you remove the files.
>
> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel  wrote:
>
>> Hi Adam,
>>
>> I have deleted file located here - rm
>> /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>
>> But still getting the same error, do i need to do anything else?
>>
>> On Fri, Sep 2, 2022 at 9:51 AM Adam King  wrote:
>>
>>> Okay, I'm wondering if this is an issue with version mismatch. Having
>>> previously had a 16.2.10 mgr and then now having a 15.2.17 one that doesn't
>>> expect this sort of thing to be present. Either way, I'd think just
>>> deleting this cephadm.7ce656a8721deb5054c37b0cfb9038
>>> 1522d521dde51fb0c5a2142314d663f63d (and any others like it) file would
>>> be the way forward to get orch ls working again.
>>>
>>> On Fri, Sep 2, 2022 at 9:44 AM Satish Patel 
>>> wrote:
>>>
 Hi Adam,

 In cephadm ls i found the following service but i believe it was there
 before also.

 {
 "style": "cephadm:v1",
 "name":
 "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
 "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
 "systemd_unit":
 "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
 ",
 "enabled": false,
 "state": "stopped",
 "container_id": null,
 "container_image_name": null,
 "container_image_id": null,
 "version": null,
 "started": null,
 "created": null,
 "deployed": null,
 "configured": null
 },

 Look like remove didn't work

 root@ceph1:~# ceph orch rm cephadm
 Failed to remove service.  was not found.

 root@ceph1:~# ceph orch rm
 cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
 Failed to remove service.
 
 was not found.

 On Fri, Sep 2, 2022 at 8:27 AM Adam King  wrote:

> this looks like an old traceback you would get if you ended up with a
> service type that shouldn't be there somehow. The things I'd probably 
> check
> are that "cephadm ls" on either host definitely doesn't report and strange
> t

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Hi Adam,

Wait... wait... now it's suddenly working without my doing anything. Very odd.

root@ceph1:~# ceph orch ls
NAME  RUNNING  REFRESHED  AGE  PLACEMENTIMAGE NAME

IMAGE ID
alertmanager  1/1  5s ago 2w   count:1
quay.io/prometheus/alertmanager:v0.20.0
   0881eb8f169f
crash 2/2  5s ago 2w   *
quay.io/ceph/ceph:v15
   93146564743f
grafana   1/1  5s ago 2w   count:1
quay.io/ceph/ceph-grafana:6.7.4
   557c83e11646
mgr   1/2  5s ago 8h   count:2
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
 93146564743f
mon   1/2  5s ago 8h   ceph1;ceph2
quay.io/ceph/ceph:v15
   93146564743f
node-exporter 2/2  5s ago 2w   *
quay.io/prometheus/node-exporter:v0.18.1
e5a616e4b9cf
osd.osd_spec_default  4/0  5s ago -
quay.io/ceph/ceph:v15
   93146564743f
prometheus1/1  5s ago 2w   count:1
quay.io/prometheus/prometheus:v2.18.1

On Fri, Sep 2, 2022 at 10:13 AM Satish Patel  wrote:

> I can see that in the output but I'm not sure how to get rid of it.
>
> root@ceph1:~# ceph orch ps --refresh
> NAME
>  HOST   STATUSREFRESHED  AGE  VERSIONIMAGE NAME
> IMAGE ID
>CONTAINER ID
> alertmanager.ceph1
>  ceph1  running (9h)  64s ago2w   0.20.0
> quay.io/prometheus/alertmanager:v0.20.0
>  0881eb8f169f  ba804b555378
> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>  ceph2  stopped   65s ago-  
>  
> 
> crash.ceph1
> ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
>
>  93146564743f  a3a431d834fc
> crash.ceph2
> ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
>
>  93146564743f  3c963693ff2b
> grafana.ceph1
> ceph1  running (9h)  64s ago2w   6.7.4
> quay.io/ceph/ceph-grafana:6.7.4
>  557c83e11646  7583a8dc4c61
> mgr.ceph1.smfvfd
>  ceph1  running (8h)  64s ago8h   15.2.17
> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>  93146564743f  1aab837306d2
> mon.ceph1
> ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
>
>  93146564743f  c1d155d8c7ad
> node-exporter.ceph1
> ceph1  running (9h)  64s ago2w   0.18.1
> quay.io/prometheus/node-exporter:v0.18.1
>   e5a616e4b9cf  2ff235fe0e42
> node-exporter.ceph2
> ceph2  running (9h)  65s ago13d  0.18.1
> quay.io/prometheus/node-exporter:v0.18.1
>   e5a616e4b9cf  17678b9ba602
> osd.0
> ceph1  running (9h)  64s ago13d  15.2.17quay.io/ceph/ceph:v15
>
>  93146564743f  d0fd73b777a3
> osd.1
> ceph1  running (9h)  64s ago13d  15.2.17quay.io/ceph/ceph:v15
>
>  93146564743f  049120e83102
> osd.2
> ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
>
>  93146564743f  8700e8cefd1f
> osd.3
> ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
>
>  93146564743f  9c71bc87ed16
> prometheus.ceph1
>  ceph1  running (9h)  64s ago2w   2.18.1
> quay.io/prometheus/prometheus:v2.18.1
>  de242295e225  74a538efd61e
>
> On Fri, Sep 2, 2022 at 10:10 AM Adam King  wrote:
>
>> maybe also a "ceph orch ps --refresh"? It might still have the old cached
>> daemon inventory from before you remove the files.
>>
>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel  wrote:
>>
>>> Hi Adam,
>>>
>>> I have deleted file located here - rm
>>> /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>
>>> But still getting the same error, do i need to do anything else?
>>>
>>> On Fri, Sep 2, 2022 at 9:51 AM Adam King  wrote:
>>>
 Okay, I'm wondering if this is an issue with version mismatch. Having
 previously had a 16.2.10 mgr and then now having a 15.2.17 one that doesn't
 expect this sort of thing to be present. Either way, I'd think just
 deleting this cephadm.7ce656a8721deb5054c37b0cfb9038
 1522d521dde51fb0c5a2142314d663f63d (and any others like it) file would
 be the way forward to get orch ls working again.

 On Fri, Sep 2, 2022 at 9:44 AM Satish Patel 
 wrote:

> Hi Adam,
>
> In cephadm ls i found the following service but i believe it was there
> before also.
>
> {
> "style": "cephadm:v1",
> "name":
> "cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
> "fsid": "f270ad9e-1f6f-11ed-b6f8-a539d87379ea",
> "systemd_unit":
> "ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> ",
> "enabled": false,
> "state": "stopped"

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Let's come back to the original question: how to bring back the second mgr?

root@ceph1:~# ceph orch apply mgr 2
Scheduled mgr update...

Nothing happened with the above command, and the logs say nothing useful:

2022-09-02T14:16:20.407927+ mgr.ceph1.smfvfd (mgr.334626) 16939 :
cephadm [INF] refreshing ceph2 facts
2022-09-02T14:16:40.247195+ mgr.ceph1.smfvfd (mgr.334626) 16952 :
cephadm [INF] Saving service mgr spec with placement count:2
2022-09-02T14:16:53.106919+ mgr.ceph1.smfvfd (mgr.334626) 16961 :
cephadm [INF] Saving service mgr spec with placement count:2
2022-09-02T14:17:19.135203+ mgr.ceph1.smfvfd (mgr.334626) 16975 :
cephadm [INF] refreshing ceph1 facts
2022-09-02T14:17:20.780496+ mgr.ceph1.smfvfd (mgr.334626) 16977 :
cephadm [INF] refreshing ceph2 facts
2022-09-02T14:18:19.502034+ mgr.ceph1.smfvfd (mgr.334626) 17008 :
cephadm [INF] refreshing ceph1 facts
2022-09-02T14:18:21.127973+ mgr.ceph1.smfvfd (mgr.334626) 17010 :
cephadm [INF] refreshing ceph2 facts







On Fri, Sep 2, 2022 at 10:15 AM Satish Patel  wrote:

> Hi Adam,
>
> Wait..wait.. now it's working suddenly without doing anything.. very odd
>
> root@ceph1:~# ceph orch ls
> NAME  RUNNING  REFRESHED  AGE  PLACEMENTIMAGE NAME
>
> IMAGE ID
> alertmanager  1/1  5s ago 2w   count:1
> quay.io/prometheus/alertmanager:v0.20.0
>  0881eb8f169f
> crash 2/2  5s ago 2w   *
> quay.io/ceph/ceph:v15
>  93146564743f
> grafana   1/1  5s ago 2w   count:1
> quay.io/ceph/ceph-grafana:6.7.4
>  557c83e11646
> mgr   1/2  5s ago 8h   count:2
> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>  93146564743f
> mon   1/2  5s ago 8h   ceph1;ceph2
> quay.io/ceph/ceph:v15
>  93146564743f
> node-exporter 2/2  5s ago 2w   *
> quay.io/prometheus/node-exporter:v0.18.1
>   e5a616e4b9cf
> osd.osd_spec_default  4/0  5s ago -
> quay.io/ceph/ceph:v15
>  93146564743f
> prometheus1/1  5s ago 2w   count:1
> quay.io/prometheus/prometheus:v2.18.1
>
> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel  wrote:
>
>> I can see that in the output but I'm not sure how to get rid of it.
>>
>> root@ceph1:~# ceph orch ps --refresh
>> NAME
>>  HOST   STATUSREFRESHED  AGE  VERSIONIMAGE NAME
>> IMAGE ID
>>CONTAINER ID
>> alertmanager.ceph1
>>  ceph1  running (9h)  64s ago2w   0.20.0
>> quay.io/prometheus/alertmanager:v0.20.0
>>0881eb8f169f  ba804b555378
>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>  ceph2  stopped   65s ago-  
>>  
>> 
>> crash.ceph1
>> ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
>>
>>  93146564743f  a3a431d834fc
>> crash.ceph2
>> ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
>>
>>  93146564743f  3c963693ff2b
>> grafana.ceph1
>> ceph1  running (9h)  64s ago2w   6.7.4
>> quay.io/ceph/ceph-grafana:6.7.4
>>557c83e11646  7583a8dc4c61
>> mgr.ceph1.smfvfd
>>  ceph1  running (8h)  64s ago8h   15.2.17
>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>>  93146564743f  1aab837306d2
>> mon.ceph1
>> ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
>>
>>  93146564743f  c1d155d8c7ad
>> node-exporter.ceph1
>> ceph1  running (9h)  64s ago2w   0.18.1
>> quay.io/prometheus/node-exporter:v0.18.1
>>   e5a616e4b9cf  2ff235fe0e42
>> node-exporter.ceph2
>> ceph2  running (9h)  65s ago13d  0.18.1
>> quay.io/prometheus/node-exporter:v0.18.1
>>   e5a616e4b9cf  17678b9ba602
>> osd.0
>> ceph1  running (9h)  64s ago13d  15.2.17quay.io/ceph/ceph:v15
>>
>>  93146564743f  d0fd73b777a3
>> osd.1
>> ceph1  running (9h)  64s ago13d  15.2.17quay.io/ceph/ceph:v15
>>
>>  93146564743f  049120e83102
>> osd.2
>> ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
>>
>>  93146564743f  8700e8cefd1f
>> osd.3
>> ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
>>
>>  93146564743f  9c71bc87ed16
>> prometheus.ceph1
>>  ceph1  running (9h)  64s ago2w   2.18.1
>> quay.io/prometheus/prometheus:v2.18.1
>>de242295e225  74a538efd61e
>>
>> On Fri, Sep 2, 2022 at 10:10 AM Adam King  wrote:
>>
>>> maybe also a "ceph orch ps --refresh"? It might still have the old
>>> cached daemon inventory from before you remove the files.
>>>
>>> On Fri, Sep 2, 2022 at 9:57 AM Satish Patel 
>>> wrote:
>>>
 Hi Adam,

 I have deleted file located here - rm
 /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb9038152

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
It looks like I did it with the following command.

$ ceph orch daemon add mgr ceph2:10.73.0.192

Now I can see two mgrs with the same version, 15.x:

root@ceph1:~# ceph orch ps --daemon-type mgr
NAME  HOST   STATUS REFRESHED  AGE  VERSION  IMAGE NAME

IMAGE ID  CONTAINER ID
mgr.ceph1.smfvfd  ceph1  running (8h)   41s ago8h   15.2.17
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
 93146564743f  1aab837306d2
mgr.ceph2.huidoh  ceph2  running (60s)  110s ago   60s  15.2.17
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
 93146564743f  294fd6ab6c97

On Fri, Sep 2, 2022 at 10:19 AM Satish Patel  wrote:

> Let's come back to the original question: how to bring back the second mgr?
>
> root@ceph1:~# ceph orch apply mgr 2
> Scheduled mgr update...
>
> Nothing happened with above command, logs saying nothing
>
> 2022-09-02T14:16:20.407927+ mgr.ceph1.smfvfd (mgr.334626) 16939 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T14:16:40.247195+ mgr.ceph1.smfvfd (mgr.334626) 16952 :
> cephadm [INF] Saving service mgr spec with placement count:2
> 2022-09-02T14:16:53.106919+ mgr.ceph1.smfvfd (mgr.334626) 16961 :
> cephadm [INF] Saving service mgr spec with placement count:2
> 2022-09-02T14:17:19.135203+ mgr.ceph1.smfvfd (mgr.334626) 16975 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T14:17:20.780496+ mgr.ceph1.smfvfd (mgr.334626) 16977 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T14:18:19.502034+ mgr.ceph1.smfvfd (mgr.334626) 17008 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T14:18:21.127973+ mgr.ceph1.smfvfd (mgr.334626) 17010 :
> cephadm [INF] refreshing ceph2 facts
>
>
>
>
>
>
>
> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel  wrote:
>
>> Hi Adam,
>>
>> Wait..wait.. now it's working suddenly without doing anything.. very odd
>>
>> root@ceph1:~# ceph orch ls
>> NAME  RUNNING  REFRESHED  AGE  PLACEMENTIMAGE NAME
>>
>>   IMAGE ID
>> alertmanager  1/1  5s ago 2w   count:1
>> quay.io/prometheus/alertmanager:v0.20.0
>>0881eb8f169f
>> crash 2/2  5s ago 2w   *
>> quay.io/ceph/ceph:v15
>>93146564743f
>> grafana   1/1  5s ago 2w   count:1
>> quay.io/ceph/ceph-grafana:6.7.4
>>557c83e11646
>> mgr   1/2  5s ago 8h   count:2
>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>>  93146564743f
>> mon   1/2  5s ago 8h   ceph1;ceph2
>> quay.io/ceph/ceph:v15
>>93146564743f
>> node-exporter 2/2  5s ago 2w   *
>> quay.io/prometheus/node-exporter:v0.18.1
>>   e5a616e4b9cf
>> osd.osd_spec_default  4/0  5s ago -
>> quay.io/ceph/ceph:v15
>>93146564743f
>> prometheus1/1  5s ago 2w   count:1
>> quay.io/prometheus/prometheus:v2.18.1
>>
>> On Fri, Sep 2, 2022 at 10:13 AM Satish Patel 
>> wrote:
>>
>>> I can see that in the output but I'm not sure how to get rid of it.
>>>
>>> root@ceph1:~# ceph orch ps --refresh
>>> NAME
>>>  HOST   STATUSREFRESHED  AGE  VERSIONIMAGE NAME
>>> IMAGE ID
>>>CONTAINER ID
>>> alertmanager.ceph1
>>>  ceph1  running (9h)  64s ago2w   0.20.0
>>> quay.io/prometheus/alertmanager:v0.20.0
>>>0881eb8f169f  ba804b555378
>>> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>>>  ceph2  stopped   65s ago-  
>>>  
>>> 
>>> crash.ceph1
>>>   ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
>>>
>>>  93146564743f  a3a431d834fc
>>> crash.ceph2
>>>   ceph2  running (9h)  65s ago13d  15.2.17quay.io/ceph/ceph:v15
>>>
>>>  93146564743f  3c963693ff2b
>>> grafana.ceph1
>>>   ceph1  running (9h)  64s ago2w   6.7.4
>>> quay.io/ceph/ceph-grafana:6.7.4
>>>557c83e11646  7583a8dc4c61
>>> mgr.ceph1.smfvfd
>>>  ceph1  running (8h)  64s ago8h   15.2.17
>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>>>  93146564743f  1aab837306d2
>>> mon.ceph1
>>>   ceph1  running (9h)  64s ago2w   15.2.17quay.io/ceph/ceph:v15
>>>
>>>  93146564743f  c1d155d8c7ad
>>> node-exporter.ceph1
>>>   ceph1  running (9h)  64s ago2w   0.18.1
>>> quay.io/prometheus/node-exporter:v0.18.1
>>> e5a616e4b9cf  2ff235fe0e42
>>> node-exporter.ceph2
>>>   ceph2  running (9h)  65s ago13d  0.18.1
>>> quay.io/prometheus/node-exporter:v0.18.1
>>> e5a616e4b9cf  17678b9ba602
>>> osd.0
>>>   ceph1  running (9h)  64s ago13d  15.2.17quay.io/ceph/ceph:v15
>>>
>>>  93146564743f  d0fd73b777a3
>>> osd.1
>>>   ceph1  running (9h)  64s ago13

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Hi Adam,

I ran the following command to upgrade, but it looks like nothing is
happening.

$ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10

The status message is empty:

root@ceph1:~# ceph orch upgrade status
{
"target_image": "quay.io/ceph/ceph:v16.2.10",
"in_progress": true,
"services_complete": [],
"message": ""
}

Nothing in the logs:

root@ceph1:~# tail -f
/var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
2022-09-02T14:31:52.597661+ mgr.ceph2.huidoh (mgr.344392) 174 : cephadm
[INF] refreshing ceph2 facts
2022-09-02T14:31:52.991450+ mgr.ceph2.huidoh (mgr.344392) 176 : cephadm
[INF] refreshing ceph1 facts
2022-09-02T14:32:52.965092+ mgr.ceph2.huidoh (mgr.344392) 207 : cephadm
[INF] refreshing ceph2 facts
2022-09-02T14:32:53.369789+ mgr.ceph2.huidoh (mgr.344392) 208 : cephadm
[INF] refreshing ceph1 facts
2022-09-02T14:33:53.367986+ mgr.ceph2.huidoh (mgr.344392) 239 : cephadm
[INF] refreshing ceph2 facts
2022-09-02T14:33:53.760427+ mgr.ceph2.huidoh (mgr.344392) 240 : cephadm
[INF] refreshing ceph1 facts
2022-09-02T14:34:53.754277+ mgr.ceph2.huidoh (mgr.344392) 272 : cephadm
[INF] refreshing ceph2 facts
2022-09-02T14:34:54.162503+ mgr.ceph2.huidoh (mgr.344392) 273 : cephadm
[INF] refreshing ceph1 facts
2022-09-02T14:35:54.133467+ mgr.ceph2.huidoh (mgr.344392) 305 : cephadm
[INF] refreshing ceph2 facts
2022-09-02T14:35:54.522171+ mgr.ceph2.huidoh (mgr.344392) 306 : cephadm
[INF] refreshing ceph1 facts

The in-progress message has been stuck there for a long time:

root@ceph1:~# ceph -s
  cluster:
id: f270ad9e-1f6f-11ed-b6f8-a539d87379ea
health: HEALTH_OK

  services:
mon: 1 daemons, quorum ceph1 (age 9h)
mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd
osd: 4 osds: 4 up (since 9h), 4 in (since 11h)

  data:
pools:   5 pools, 129 pgs
objects: 20.06k objects, 83 GiB
usage:   168 GiB used, 632 GiB / 800 GiB avail
pgs: 129 active+clean

  io:
client:   12 KiB/s wr, 0 op/s rd, 1 op/s wr

  progress:
Upgrade to quay.io/ceph/ceph:v16.2.10 (0s)
  []

On Fri, Sep 2, 2022 at 10:25 AM Satish Patel  wrote:

> It Looks like I did it with the following command.
>
> $ ceph orch daemon add mgr ceph2:10.73.0.192
>
> Now i can see two with same version 15.x
>
> root@ceph1:~# ceph orch ps --daemon-type mgr
> NAME  HOST   STATUS REFRESHED  AGE  VERSION  IMAGE
> NAME
>   IMAGE ID  CONTAINER ID
> mgr.ceph1.smfvfd  ceph1  running (8h)   41s ago8h   15.2.17
> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>  93146564743f  1aab837306d2
> mgr.ceph2.huidoh  ceph2  running (60s)  110s ago   60s  15.2.17
> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>  93146564743f  294fd6ab6c97
>
> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel  wrote:
>
>> Let's come back to the original question: how to bring back the second
>> mgr?
>>
>> root@ceph1:~# ceph orch apply mgr 2
>> Scheduled mgr update...
>>
>> Nothing happened with above command, logs saying nothing
>>
>> 2022-09-02T14:16:20.407927+ mgr.ceph1.smfvfd (mgr.334626) 16939 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T14:16:40.247195+ mgr.ceph1.smfvfd (mgr.334626) 16952 :
>> cephadm [INF] Saving service mgr spec with placement count:2
>> 2022-09-02T14:16:53.106919+ mgr.ceph1.smfvfd (mgr.334626) 16961 :
>> cephadm [INF] Saving service mgr spec with placement count:2
>> 2022-09-02T14:17:19.135203+ mgr.ceph1.smfvfd (mgr.334626) 16975 :
>> cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T14:17:20.780496+ mgr.ceph1.smfvfd (mgr.334626) 16977 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T14:18:19.502034+ mgr.ceph1.smfvfd (mgr.334626) 17008 :
>> cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T14:18:21.127973+ mgr.ceph1.smfvfd (mgr.334626) 17010 :
>> cephadm [INF] refreshing ceph2 facts
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Sep 2, 2022 at 10:15 AM Satish Patel 
>> wrote:
>>
>>> Hi Adam,
>>>
>>> Wait..wait.. now it's working suddenly without doing anything.. very odd
>>>
>>> root@ceph1:~# ceph orch ls
>>> NAME  RUNNING  REFRESHED  AGE  PLACEMENTIMAGE NAME
>>>
>>>   IMAGE ID
>>> alertmanager  1/1  5s ago 2w   count:1
>>> quay.io/prometheus/alertmanager:v0.20.0
>>>0881eb8f169f
>>> crash 2/2  5s ago 2w   *
>>> quay.io/ceph/ceph:v15
>>>93146564743f
>>> grafana   1/1  5s ago 2w   count:1
>>> quay.io/ceph/ceph-grafana:6.7.4
>>>557c83e11646
>>> mgr   1/2  5s ago 8h   count:2
>>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>>>  93146564743f
>>> mon   1/2  5s ago 8h   ceph1;ceph2
>>> quay.io/ceph/ceph:v15
>>>93146564743f
>>> node-exporter 

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
hmm, at this point, maybe we should just try manually upgrading the mgr
daemons and then move from there. First, just stop the upgrade "ceph orch
upgrade stop". If you figure out which of the two mgr daemons is the
standby (it should say which one is active in "ceph -s" output) and then do
a "ceph orch daemon redeploy  quay.io/ceph/ceph:v16.2.10"
it should redeploy that specific mgr with the new version. You could then
do a "ceph mgr fail" to swap which of the mgr daemons is active, then do
another "ceph orch daemon redeploy 
quay.io/ceph/ceph:v16.2.10" where the standby is now the other mgr still on
15.2.17. Once the mgr daemons are both upgraded to the new version, run a
"ceph orch redeploy mgr" and then "ceph orch upgrade start --image
quay.io/ceph/ceph:v16.2.10" and see if it goes better.

On Fri, Sep 2, 2022 at 10:36 AM Satish Patel  wrote:

> Hi Adam,
>
> I run the following command to upgrade but it looks like nothing is
> happening
>
> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
>
> Status message is empty..
>
> root@ceph1:~# ceph orch upgrade status
> {
> "target_image": "quay.io/ceph/ceph:v16.2.10",
> "in_progress": true,
> "services_complete": [],
> "message": ""
> }
>
> Nothing in Logs
>
> root@ceph1:~# tail -f
> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
> 2022-09-02T14:31:52.597661+ mgr.ceph2.huidoh (mgr.344392) 174 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T14:31:52.991450+ mgr.ceph2.huidoh (mgr.344392) 176 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T14:32:52.965092+ mgr.ceph2.huidoh (mgr.344392) 207 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T14:32:53.369789+ mgr.ceph2.huidoh (mgr.344392) 208 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T14:33:53.367986+ mgr.ceph2.huidoh (mgr.344392) 239 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T14:33:53.760427+ mgr.ceph2.huidoh (mgr.344392) 240 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T14:34:53.754277+ mgr.ceph2.huidoh (mgr.344392) 272 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T14:34:54.162503+ mgr.ceph2.huidoh (mgr.344392) 273 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T14:35:54.133467+ mgr.ceph2.huidoh (mgr.344392) 305 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T14:35:54.522171+ mgr.ceph2.huidoh (mgr.344392) 306 :
> cephadm [INF] refreshing ceph1 facts
>
> In progress that mesg stuck there for long time
>
> root@ceph1:~# ceph -s
>   cluster:
> id: f270ad9e-1f6f-11ed-b6f8-a539d87379ea
> health: HEALTH_OK
>
>   services:
> mon: 1 daemons, quorum ceph1 (age 9h)
> mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd
> osd: 4 osds: 4 up (since 9h), 4 in (since 11h)
>
>   data:
> pools:   5 pools, 129 pgs
> objects: 20.06k objects, 83 GiB
> usage:   168 GiB used, 632 GiB / 800 GiB avail
> pgs: 129 active+clean
>
>   io:
> client:   12 KiB/s wr, 0 op/s rd, 1 op/s wr
>
>   progress:
> Upgrade to quay.io/ceph/ceph:v16.2.10 (0s)
>   []
>
> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel  wrote:
>
>> It Looks like I did it with the following command.
>>
>> $ ceph orch daemon add mgr ceph2:10.73.0.192
>>
>> Now i can see two with same version 15.x
>>
>> root@ceph1:~# ceph orch ps --daemon-type mgr
>> NAME  HOST   STATUS REFRESHED  AGE  VERSION  IMAGE
>> NAME
>>   IMAGE ID  CONTAINER ID
>> mgr.ceph1.smfvfd  ceph1  running (8h)   41s ago8h   15.2.17
>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>>  93146564743f  1aab837306d2
>> mgr.ceph2.huidoh  ceph2  running (60s)  110s ago   60s  15.2.17
>> quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
>>  93146564743f  294fd6ab6c97
>>
>> On Fri, Sep 2, 2022 at 10:19 AM Satish Patel 
>> wrote:
>>
>>> Let's come back to the original question: how to bring back the second
>>> mgr?
>>>
>>> root@ceph1:~# ceph orch apply mgr 2
>>> Scheduled mgr update...
>>>
>>> Nothing happened with above command, logs saying nothing
>>>
>>> 2022-09-02T14:16:20.407927+ mgr.ceph1.smfvfd (mgr.334626) 16939 :
>>> cephadm [INF] refreshing ceph2 facts
>>> 2022-09-02T14:16:40.247195+ mgr.ceph1.smfvfd (mgr.334626) 16952 :
>>> cephadm [INF] Saving service mgr spec with placement count:2
>>> 2022-09-02T14:16:53.106919+ mgr.ceph1.smfvfd (mgr.334626) 16961 :
>>> cephadm [INF] Saving service mgr spec with placement count:2
>>> 2022-09-02T14:17:19.135203+ mgr.ceph1.smfvfd (mgr.334626) 16975 :
>>> cephadm [INF] refreshing ceph1 facts
>>> 2022-09-02T14:17:20.780496+ mgr.ceph1.smfvfd (mgr.334626) 16977 :
>>> cephadm [INF] refreshing ceph2 facts
>>> 2022-09-02T14:18:19.502034+ mgr.ceph1.smfvfd (mgr.334626) 17008 :
>>> cephadm [INF] refreshing ceph1 facts
>>> 2022-09-02T14:18:21.127973+ mgr.ceph1.smfvfd (mgr.334626) 17010 :
>>> cephadm [INF] r

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Hi Adam,

As you said, I did the following:

$ ceph orch daemon redeploy mgr.ceph1.smfvfd quay.io/ceph/ceph:v16.2.10

I noticed the following line in the logs, but then no further activity; the
standby mgr is still running the older version.

2022-09-02T15:35:45.753093+ mgr.ceph2.huidoh (mgr.344392) 2226 :
cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
2022-09-02T15:36:17.279190+ mgr.ceph2.huidoh (mgr.344392) 2245 :
cephadm [INF] refreshing ceph2 facts
2022-09-02T15:36:17.984478+ mgr.ceph2.huidoh (mgr.344392) 2246 :
cephadm [INF] refreshing ceph1 facts
2022-09-02T15:37:17.663730+ mgr.ceph2.huidoh (mgr.344392) 2284 :
cephadm [INF] refreshing ceph2 facts
2022-09-02T15:37:18.386586+ mgr.ceph2.huidoh (mgr.344392) 2285 :
cephadm [INF] refreshing ceph1 facts

I am also not seeing any new image being downloaded:

root@ceph1:~# docker image ls
REPOSITORY TAG   IMAGE ID   CREATED
SIZE
quay.io/ceph/ceph  v15   93146564743f   3 weeks ago
1.2GB
quay.io/ceph/ceph-grafana  8.3.5 dad864ee21e9   4 months ago
 558MB
quay.io/prometheus/prometheus  v2.33.4   514e6a882f6e   6 months ago
 204MB
quay.io/prometheus/alertmanagerv0.23.0   ba2b418f427c   12 months ago
57.5MB
quay.io/ceph/ceph-grafana  6.7.4 557c83e11646   13 months ago
486MB
quay.io/prometheus/prometheus  v2.18.1   de242295e225   2 years ago
140MB
quay.io/prometheus/alertmanagerv0.20.0   0881eb8f169f   2 years ago
52.1MB
quay.io/prometheus/node-exporter   v0.18.1   e5a616e4b9cf   3 years ago
22.9MB


On Fri, Sep 2, 2022 at 11:06 AM Adam King  wrote:

> hmm, at this point, maybe we should just try manually upgrading the mgr
> daemons and then move from there. First, just stop the upgrade "ceph orch
> upgrade stop". If you figure out which of the two mgr daemons is the
> standby (it should say which one is active in "ceph -s" output) and then do
> a "ceph orch daemon redeploy  quay.io/ceph/ceph:v16.2.10"
> it should redeploy that specific mgr with the new version. You could then
> do a "ceph mgr fail" to swap which of the mgr daemons is active, then do
> another "ceph orch daemon redeploy 
> quay.io/ceph/ceph:v16.2.10" where the standby is now the other mgr still
> on 15.2.17. Once the mgr daemons are both upgraded to the new version, run
> a "ceph orch redeploy mgr" and then "ceph orch upgrade start --image
> quay.io/ceph/ceph:v16.2.10" and see if it goes better.
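
Summarised as commands, the suggestion above is roughly this (placeholder
daemon names, substitute your own):

ceph orch upgrade stop
ceph -s                                   # note which mgr is active and which is standby
ceph orch daemon redeploy mgr.<standby> quay.io/ceph/ceph:v16.2.10
ceph mgr fail                             # fail over so the remaining old mgr becomes the standby
ceph orch daemon redeploy mgr.<other-mgr> quay.io/ceph/ceph:v16.2.10
ceph orch redeploy mgr
ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
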
>
> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel  wrote:
>
>> Hi Adam,
>>
>> I run the following command to upgrade but it looks like nothing is
>> happening
>>
>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
>>
>> Status message is empty..
>>
>> root@ceph1:~# ceph orch upgrade status
>> {
>> "target_image": "quay.io/ceph/ceph:v16.2.10",
>> "in_progress": true,
>> "services_complete": [],
>> "message": ""
>> }
>>
>> Nothing in Logs
>>
>> root@ceph1:~# tail -f
>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
>> 2022-09-02T14:31:52.597661+ mgr.ceph2.huidoh (mgr.344392) 174 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T14:31:52.991450+ mgr.ceph2.huidoh (mgr.344392) 176 :
>> cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T14:32:52.965092+ mgr.ceph2.huidoh (mgr.344392) 207 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T14:32:53.369789+ mgr.ceph2.huidoh (mgr.344392) 208 :
>> cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T14:33:53.367986+ mgr.ceph2.huidoh (mgr.344392) 239 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T14:33:53.760427+ mgr.ceph2.huidoh (mgr.344392) 240 :
>> cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T14:34:53.754277+ mgr.ceph2.huidoh (mgr.344392) 272 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T14:34:54.162503+ mgr.ceph2.huidoh (mgr.344392) 273 :
>> cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T14:35:54.133467+ mgr.ceph2.huidoh (mgr.344392) 305 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T14:35:54.522171+ mgr.ceph2.huidoh (mgr.344392) 306 :
>> cephadm [INF] refreshing ceph1 facts
>>
>> In progress that mesg stuck there for long time
>>
>> root@ceph1:~# ceph -s
>>   cluster:
>> id: f270ad9e-1f6f-11ed-b6f8-a539d87379ea
>> health: HEALTH_OK
>>
>>   services:
>> mon: 1 daemons, quorum ceph1 (age 9h)
>> mgr: ceph2.huidoh(active, since 9m), standbys: ceph1.smfvfd
>> osd: 4 osds: 4 up (since 9h), 4 in (since 11h)
>>
>>   data:
>> pools:   5 pools, 129 pgs
>> objects: 20.06k objects, 83 GiB
>> usage:   168 GiB used, 632 GiB / 800 GiB avail
>> pgs: 129 active+clean
>>
>>   io:
>> client:   12 KiB/s wr, 0 op/s rd, 1 op/s wr
>>
>>   progress:
>> Upgrade to quay.io/ceph/ceph:v16.2.10 (0s)
>>   []
>>
>> On Fri, Sep 2, 2022 at 10:25 AM Satish Patel 
>> wrote:
>>
>>> It Looks like I did it with the following command.
>>>
>>> $ ceph orch daemon add mgr ceph2:10.73.0.192
>>>
>>> Now i

[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2022-09-02 Thread Ernesto Puerta
Hi Kevin,


> Isn't this one of the reasons containers were pushed, so that the
> packaging isn't as big a deal?
>

Yes, but the Ceph community has a strong commitment to provide distro
packages for those users who are not interested in moving to containers.

Is it the continued push to support lots of distros without using
> containers that is the problem?
>

If not a problem, it definitely makes it more challenging. Compiled
components often sort this out by statically linking deps whose packages
are not widely available in distros. The approach we're proposing here
would be the closest equivalent to static linking for interpreted code
(bundling).
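
As a rough illustration, the bundling could look something like this (a sketch
with hypothetical paths, not the actual dashboard layout):

# vendor the required Python deps into a directory shipped with the mgr module
pip install --target ./dashboard_vendored_deps fastapi

# at import time the module would prepend that directory to sys.path,
# so the bundled copies take precedence over whatever the distro provides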

Thanks for sharing your questions!

Kind regards,
Ernesto
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
hmm, okay. It seems like cephadm is stuck in general rather than an issue
specific to the upgrade. I'd first make sure the orchestrator isn't paused
(just running "ceph orch resume" should be enough, it's idempotent).

Beyond that, there was someone else who had an issue with things getting
stuck that was resolved in this thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
It might be worth a look.

If you haven't already, it's possible stopping the upgrade is a good idea,
as maybe that's interfering with it getting to the point where it does the
redeploy.

If none of those help, it might be worth setting the log level to debug and
seeing where things are ending up ("ceph config set mgr
mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh"), then
waiting a few minutes before running "ceph log last 100 debug cephadm" (not
100% sure on the format of that command; if it fails try just "ceph log last
cephadm"). We could maybe get more info on why it's not performing the
redeploy from those debug logs. Just remember to set the log level back
afterwards with 'ceph config set mgr mgr/cephadm/log_to_cluster_level info',
as debug logs are quite verbose.
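
For convenience, that whole round-trip as plain commands (a sketch; the exact
"ceph log last" syntax may differ):

# raise cephadm's cluster log level and force a refresh
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph orch ps --refresh

# wait a few minutes, then pull the recent cephadm debug messages
ceph log last 100 debug cephadm    # if that fails, try: ceph log last cephadm

# set it back once done, debug logging is very verbose
ceph config set mgr mgr/cephadm/log_to_cluster_level info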

On Fri, Sep 2, 2022 at 11:39 AM Satish Patel  wrote:

> Hi Adam,
>
> As you said, i did following
>
> $ ceph orch daemon redeploy mgr.ceph1.smfvfd  quay.io/ceph/ceph:v16.2.10
>
> Noticed following line in logs but then no activity nothing, still standby
> mgr running in older version
>
> 2022-09-02T15:35:45.753093+ mgr.ceph2.huidoh (mgr.344392) 2226 :
> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
> 2022-09-02T15:36:17.279190+ mgr.ceph2.huidoh (mgr.344392) 2245 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T15:36:17.984478+ mgr.ceph2.huidoh (mgr.344392) 2246 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T15:37:17.663730+ mgr.ceph2.huidoh (mgr.344392) 2284 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T15:37:18.386586+ mgr.ceph2.huidoh (mgr.344392) 2285 :
> cephadm [INF] refreshing ceph1 facts
>
> I am not seeing any image get downloaded also
>
> root@ceph1:~# docker image ls
> REPOSITORY TAG   IMAGE ID   CREATED
>   SIZE
> quay.io/ceph/ceph  v15   93146564743f   3 weeks ago
>   1.2GB
> quay.io/ceph/ceph-grafana  8.3.5 dad864ee21e9   4 months ago
>558MB
> quay.io/prometheus/prometheus  v2.33.4   514e6a882f6e   6 months ago
>204MB
> quay.io/prometheus/alertmanagerv0.23.0   ba2b418f427c   12 months ago
>   57.5MB
> quay.io/ceph/ceph-grafana  6.7.4 557c83e11646   13 months ago
>   486MB
> quay.io/prometheus/prometheus  v2.18.1   de242295e225   2 years ago
>   140MB
> quay.io/prometheus/alertmanagerv0.20.0   0881eb8f169f   2 years ago
>   52.1MB
> quay.io/prometheus/node-exporter   v0.18.1   e5a616e4b9cf   3 years ago
>   22.9MB
>
>
> On Fri, Sep 2, 2022 at 11:06 AM Adam King  wrote:
>
>> hmm, at this point, maybe we should just try manually upgrading the mgr
>> daemons and then move from there. First, just stop the upgrade "ceph orch
>> upgrade stop". If you figure out which of the two mgr daemons is the
>> standby (it should say which one is active in "ceph -s" output) and then do
>> a "ceph orch daemon redeploy 
>> quay.io/ceph/ceph:v16.2.10" it should redeploy that specific mgr with
>> the new version. You could then do a "ceph mgr fail" to swap which of the
>> mgr daemons is active, then do another "ceph orch daemon redeploy
>>  quay.io/ceph/ceph:v16.2.10" where the standby is now
>> the other mgr still on 15.2.17. Once the mgr daemons are both upgraded to
>> the new version, run a "ceph orch redeploy mgr" and then "ceph orch upgrade
>> start --image quay.io/ceph/ceph:v16.2.10" and see if it goes better.
>>
>> On Fri, Sep 2, 2022 at 10:36 AM Satish Patel 
>> wrote:
>>
>>> Hi Adam,
>>>
>>> I run the following command to upgrade but it looks like nothing is
>>> happening
>>>
>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
>>>
>>> Status message is empty..
>>>
>>> root@ceph1:~# ceph orch upgrade status
>>> {
>>> "target_image": "quay.io/ceph/ceph:v16.2.10",
>>> "in_progress": true,
>>> "services_complete": [],
>>> "message": ""
>>> }
>>>
>>> Nothing in Logs
>>>
>>> root@ceph1:~# tail -f
>>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
>>> 2022-09-02T14:31:52.597661+ mgr.ceph2.huidoh (mgr.344392) 174 :
>>> cephadm [INF] refreshing ceph2 facts
>>> 2022-09-02T14:31:52.991450+ mgr.ceph2.huidoh (mgr.344392) 176 :
>>> cephadm [INF] refreshing ceph1 facts
>>> 2022-09-02T14:32:52.965092+ mgr.ceph2.huidoh (mgr.344392) 207 :
>>> cephadm [INF] refreshing ceph2 facts
>>> 2022-09-02T14:32:53.369789+ mgr.ceph2.huidoh (mgr.344392) 208 :
>>> cephadm [INF] ref

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Adam,

I have enabled debug and my logs are flooded with the following. I am going to
try some of the suggestions from the mailing list thread you provided and see.

root@ceph1:~# tail -f
/var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
2022-09-02T18:38:21.754391+ mgr.ceph2.huidoh (mgr.344392) 211198 :
cephadm [DBG] 0 OSDs are scheduled for removal: []
2022-09-02T18:38:21.754519+ mgr.ceph2.huidoh (mgr.344392) 211199 :
cephadm [DBG] Saving [] to store
2022-09-02T18:38:21.757155+ mgr.ceph2.huidoh (mgr.344392) 211200 :
cephadm [DBG] refreshing hosts and daemons
2022-09-02T18:38:21.758065+ mgr.ceph2.huidoh (mgr.344392) 211201 :
cephadm [DBG] _check_for_strays
2022-09-02T18:38:21.758334+ mgr.ceph2.huidoh (mgr.344392) 211202 :
cephadm [DBG] 0 OSDs are scheduled for removal: []
2022-09-02T18:38:21.758455+ mgr.ceph2.huidoh (mgr.344392) 211203 :
cephadm [DBG] Saving [] to store
2022-09-02T18:38:21.761001+ mgr.ceph2.huidoh (mgr.344392) 211204 :
cephadm [DBG] refreshing hosts and daemons
2022-09-02T18:38:21.762092+ mgr.ceph2.huidoh (mgr.344392) 211205 :
cephadm [DBG] _check_for_strays
2022-09-02T18:38:21.762357+ mgr.ceph2.huidoh (mgr.344392) 211206 :
cephadm [DBG] 0 OSDs are scheduled for removal: []
2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 :
cephadm [DBG] Saving [] to store

On Fri, Sep 2, 2022 at 12:17 PM Adam King  wrote:

> hmm, okay. It seems like cephadm is stuck in general rather than an issue
> specific to the upgrade. I'd first make sure the orchestrator isn't paused
> (just running "ceph orch resume" should be enough, it's idempotent).
>
> Beyond that, there was someone else who had an issue with things getting
> stuck that was resolved in this thread
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
> 
>  that
> might be worth a look.
>
> If you haven't already, it's possible stopping the upgrade is a good idea,
> as maybe that's interfering with it getting to the point where it does the
> redeploy.
>
> If none of those help, it might be worth setting the log level to debug
> and seeing where things are ending up ("ceph config set mgr
> mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh" then
> waiting a few minutes before running "ceph log last 100 debug cephadm" (not
> 100% on format of that command, if it fails try just "ceph log last
> cephadm"). We could maybe get more info on why it's not performing the
> redeploy from those debug logs. Just remember to set the log level back
> after 'ceph config set mgr mgr/cephadm/log_to_cluster_level info' as debug
> logs are quite verbose.
>
> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel  wrote:
>
>> Hi Adam,
>>
>> As you said, i did following
>>
>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd  quay.io/ceph/ceph:v16.2.10
>>
>> Noticed following line in logs but then no activity nothing, still
>> standby mgr running in older version
>>
>> 2022-09-02T15:35:45.753093+ mgr.ceph2.huidoh (mgr.344392) 2226 :
>> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
>> 2022-09-02T15:36:17.279190+ mgr.ceph2.huidoh (mgr.344392) 2245 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T15:36:17.984478+ mgr.ceph2.huidoh (mgr.344392) 2246 :
>> cephadm [INF] refreshing ceph1 facts
>> 2022-09-02T15:37:17.663730+ mgr.ceph2.huidoh (mgr.344392) 2284 :
>> cephadm [INF] refreshing ceph2 facts
>> 2022-09-02T15:37:18.386586+ mgr.ceph2.huidoh (mgr.344392) 2285 :
>> cephadm [INF] refreshing ceph1 facts
>>
>> I am not seeing any image get downloaded also
>>
>> root@ceph1:~# docker image ls
>> REPOSITORY TAG   IMAGE ID   CREATED
>>   SIZE
>> quay.io/ceph/ceph  v15   93146564743f   3 weeks ago
>> 1.2GB
>> quay.io/ceph/ceph-grafana  8.3.5 dad864ee21e9   4 months ago
>>558MB
>> quay.io/prometheus/prometheus  v2.33.4   514e6a882f6e   6 months ago
>>204MB
>> quay.io/prometheus/alertmanagerv0.23.0   ba2b418f427c   12 months
>> ago   57.5MB
>> quay.io/ceph/ceph-grafana  6.7.4 557c83e11646   13 months
>> ago   486MB
>> quay.io/prometheus/prometheus  v2.18.1   de242295e225   2 years ago
>> 140MB
>> quay.io/prometheus/alertmanagerv0.20.0   0881eb8f169f   2 years ago
>> 52.1MB
>> quay.io/prometheus/node-exporter   v0.18.1   e5a616e4b9cf   3 years ago
>> 22.9MB
>>
>>
>> On Fri, Sep 2, 2022 at 11:06 AM Adam King  wrote:
>>
>>> hmm, at this point, maybe we should just try manually upgrading the mgr
>>> daemons and then move from there. First, just stop the upgrade "ceph orch
>>> upgrade stop". If you figure out which of the two mgr daemons is the
>>> standby (it should say which one is active in "ceph -s" output) and then do
>>> a "ceph orch daemon redeploy 
>>> quay.io/ceph/ceph:v16.2.10

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Do you think this is because I have only a single MON daemon running?  I
have only two nodes.

On Fri, Sep 2, 2022 at 2:39 PM Satish Patel  wrote:

> Adam,
>
> I have enabled debug and my logs flood with the following. I am going to
> try some stuff from your provided mailing list and see..
>
> root@ceph1:~# tail -f
> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
> 2022-09-02T18:38:21.754391+ mgr.ceph2.huidoh (mgr.344392) 211198 :
> cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.754519+ mgr.ceph2.huidoh (mgr.344392) 211199 :
> cephadm [DBG] Saving [] to store
> 2022-09-02T18:38:21.757155+ mgr.ceph2.huidoh (mgr.344392) 211200 :
> cephadm [DBG] refreshing hosts and daemons
> 2022-09-02T18:38:21.758065+ mgr.ceph2.huidoh (mgr.344392) 211201 :
> cephadm [DBG] _check_for_strays
> 2022-09-02T18:38:21.758334+ mgr.ceph2.huidoh (mgr.344392) 211202 :
> cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.758455+ mgr.ceph2.huidoh (mgr.344392) 211203 :
> cephadm [DBG] Saving [] to store
> 2022-09-02T18:38:21.761001+ mgr.ceph2.huidoh (mgr.344392) 211204 :
> cephadm [DBG] refreshing hosts and daemons
> 2022-09-02T18:38:21.762092+ mgr.ceph2.huidoh (mgr.344392) 211205 :
> cephadm [DBG] _check_for_strays
> 2022-09-02T18:38:21.762357+ mgr.ceph2.huidoh (mgr.344392) 211206 :
> cephadm [DBG] 0 OSDs are scheduled for removal: []
> 2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 :
> cephadm [DBG] Saving [] to store
>
> On Fri, Sep 2, 2022 at 12:17 PM Adam King  wrote:
>
>> hmm, okay. It seems like cephadm is stuck in general rather than an issue
>> specific to the upgrade. I'd first make sure the orchestrator isn't paused
>> (just running "ceph orch resume" should be enough, it's idempotent).
>>
>> Beyond that, there was someone else who had an issue with things getting
>> stuck that was resolved in this thread
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
>> 
>>  that
>> might be worth a look.
>>
>> If you haven't already, it's possible stopping the upgrade is a good
>> idea, as maybe that's interfering with it getting to the point where it
>> does the redeploy.
>>
>> If none of those help, it might be worth setting the log level to debug
>> and seeing where things are ending up ("ceph config set mgr
>> mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh" then
>> waiting a few minutes before running "ceph log last 100 debug cephadm" (not
>> 100% on format of that command, if it fails try just "ceph log last
>> cephadm"). We could maybe get more info on why it's not performing the
>> redeploy from those debug logs. Just remember to set the log level back
>> after 'ceph config set mgr mgr/cephadm/log_to_cluster_level info' as debug
>> logs are quite verbose.
>>
>> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel 
>> wrote:
>>
>>> Hi Adam,
>>>
>>> As you said, i did following
>>>
>>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd  quay.io/ceph/ceph:v16.2.10
>>>
>>> Noticed following line in logs but then no activity nothing, still
>>> standby mgr running in older version
>>>
>>> 2022-09-02T15:35:45.753093+ mgr.ceph2.huidoh (mgr.344392) 2226 :
>>> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
>>> 2022-09-02T15:36:17.279190+ mgr.ceph2.huidoh (mgr.344392) 2245 :
>>> cephadm [INF] refreshing ceph2 facts
>>> 2022-09-02T15:36:17.984478+ mgr.ceph2.huidoh (mgr.344392) 2246 :
>>> cephadm [INF] refreshing ceph1 facts
>>> 2022-09-02T15:37:17.663730+ mgr.ceph2.huidoh (mgr.344392) 2284 :
>>> cephadm [INF] refreshing ceph2 facts
>>> 2022-09-02T15:37:18.386586+ mgr.ceph2.huidoh (mgr.344392) 2285 :
>>> cephadm [INF] refreshing ceph1 facts
>>>
>>> I am not seeing any image get downloaded also
>>>
>>> root@ceph1:~# docker image ls
>>> REPOSITORY TAG   IMAGE ID   CREATED
>>> SIZE
>>> quay.io/ceph/ceph  v15   93146564743f   3 weeks ago
>>> 1.2GB
>>> quay.io/ceph/ceph-grafana  8.3.5 dad864ee21e9   4 months
>>> ago558MB
>>> quay.io/prometheus/prometheus  v2.33.4   514e6a882f6e   6 months
>>> ago204MB
>>> quay.io/prometheus/alertmanagerv0.23.0   ba2b418f427c   12 months
>>> ago   57.5MB
>>> quay.io/ceph/ceph-grafana  6.7.4 557c83e11646   13 months
>>> ago   486MB
>>> quay.io/prometheus/prometheus  v2.18.1   de242295e225   2 years ago
>>> 140MB
>>> quay.io/prometheus/alertmanagerv0.20.0   0881eb8f169f   2 years ago
>>> 52.1MB
>>> quay.io/prometheus/node-exporter   v0.18.1   e5a616e4b9cf   3 years ago
>>> 22.9MB
>>>
>>>
>>> On Fri, Sep 2, 2022 at 11:06 AM Adam King  wrote:
>>>
 hmm, at this point, maybe we should just try manually upgrading the mgr
 daemons 

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
I don't think the number of mons should have any effect on this. Looking at
your logs, the interesting thing is that all the messages are so close
together. Was this from before you stopped the upgrade?

On Fri, Sep 2, 2022 at 2:53 PM Satish Patel  wrote:

> Do you think this is because I have only a single MON daemon running?  I
> have only two nodes.
>
> On Fri, Sep 2, 2022 at 2:39 PM Satish Patel  wrote:
>
>> Adam,
>>
>> I have enabled debug and my logs flood with the following. I am going to
>> try some stuff from your provided mailing list and see..
>>
>> root@ceph1:~# tail -f
>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
>> 2022-09-02T18:38:21.754391+ mgr.ceph2.huidoh (mgr.344392) 211198 :
>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>> 2022-09-02T18:38:21.754519+ mgr.ceph2.huidoh (mgr.344392) 211199 :
>> cephadm [DBG] Saving [] to store
>> 2022-09-02T18:38:21.757155+ mgr.ceph2.huidoh (mgr.344392) 211200 :
>> cephadm [DBG] refreshing hosts and daemons
>> 2022-09-02T18:38:21.758065+ mgr.ceph2.huidoh (mgr.344392) 211201 :
>> cephadm [DBG] _check_for_strays
>> 2022-09-02T18:38:21.758334+ mgr.ceph2.huidoh (mgr.344392) 211202 :
>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>> 2022-09-02T18:38:21.758455+ mgr.ceph2.huidoh (mgr.344392) 211203 :
>> cephadm [DBG] Saving [] to store
>> 2022-09-02T18:38:21.761001+ mgr.ceph2.huidoh (mgr.344392) 211204 :
>> cephadm [DBG] refreshing hosts and daemons
>> 2022-09-02T18:38:21.762092+ mgr.ceph2.huidoh (mgr.344392) 211205 :
>> cephadm [DBG] _check_for_strays
>> 2022-09-02T18:38:21.762357+ mgr.ceph2.huidoh (mgr.344392) 211206 :
>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>> 2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 :
>> cephadm [DBG] Saving [] to store
>>
>> On Fri, Sep 2, 2022 at 12:17 PM Adam King  wrote:
>>
>>> hmm, okay. It seems like cephadm is stuck in general rather than an
>>> issue specific to the upgrade. I'd first make sure the orchestrator isn't
>>> paused (just running "ceph orch resume" should be enough, it's idempotent).
>>>
>>> Beyond that, there was someone else who had an issue with things getting
>>> stuck that was resolved in this thread
>>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
>>> 
>>>  that
>>> might be worth a look.
>>>
>>> If you haven't already, it's possible stopping the upgrade is a good
>>> idea, as maybe that's interfering with it getting to the point where it
>>> does the redeploy.
>>>
>>> If none of those help, it might be worth setting the log level to debug
>>> and seeing where things are ending up ("ceph config set mgr
>>> mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh" then
>>> waiting a few minutes before running "ceph log last 100 debug cephadm" (not
>>> 100% on format of that command, if it fails try just "ceph log last
>>> cephadm"). We could maybe get more info on why it's not performing the
>>> redeploy from those debug logs. Just remember to set the log level back
>>> after 'ceph config set mgr mgr/cephadm/log_to_cluster_level info' as debug
>>> logs are quite verbose.
>>>
>>> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel 
>>> wrote:
>>>
 Hi Adam,

 As you said, i did following

 $ ceph orch daemon redeploy mgr.ceph1.smfvfd
 quay.io/ceph/ceph:v16.2.10

 Noticed following line in logs but then no activity nothing, still
 standby mgr running in older version

 2022-09-02T15:35:45.753093+ mgr.ceph2.huidoh (mgr.344392) 2226 :
 cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
 2022-09-02T15:36:17.279190+ mgr.ceph2.huidoh (mgr.344392) 2245 :
 cephadm [INF] refreshing ceph2 facts
 2022-09-02T15:36:17.984478+ mgr.ceph2.huidoh (mgr.344392) 2246 :
 cephadm [INF] refreshing ceph1 facts
 2022-09-02T15:37:17.663730+ mgr.ceph2.huidoh (mgr.344392) 2284 :
 cephadm [INF] refreshing ceph2 facts
 2022-09-02T15:37:18.386586+ mgr.ceph2.huidoh (mgr.344392) 2285 :
 cephadm [INF] refreshing ceph1 facts

 I am not seeing any image get downloaded also

 root@ceph1:~# docker image ls
 REPOSITORY TAG   IMAGE ID   CREATED
 SIZE
 quay.io/ceph/ceph  v15   93146564743f   3 weeks
 ago 1.2GB
 quay.io/ceph/ceph-grafana  8.3.5 dad864ee21e9   4 months
 ago558MB
 quay.io/prometheus/prometheus  v2.33.4   514e6a882f6e   6 months
 ago204MB
 quay.io/prometheus/alertmanagerv0.23.0   ba2b418f427c   12 months
 ago   57.5MB
 quay.io/ceph/ceph-grafana  6.7.4 557c83e11646   13 months
 ago   486MB
 quay.io/prometheus/prometheus  v2.18.1   de242295e225   2 y

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Yes, I have stopped the upgrade, and those logs are from before the upgrade.

On Fri, Sep 2, 2022 at 3:27 PM Adam King  wrote:

> I don't think the number of mons should have any effect on this. Looking
> at your logs, the interesting thing is that all the messages are so close
> together. Was this before having stopped the upgrade?
>
> On Fri, Sep 2, 2022 at 2:53 PM Satish Patel  wrote:
>
>> Do you think this is because I have only a single MON daemon running?  I
>> have only two nodes.
>>
>> On Fri, Sep 2, 2022 at 2:39 PM Satish Patel  wrote:
>>
>>> Adam,
>>>
>>> I have enabled debug and my logs flood with the following. I am going to
>>> try some stuff from your provided mailing list and see..
>>>
>>> root@ceph1:~# tail -f
>>> /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
>>> 2022-09-02T18:38:21.754391+ mgr.ceph2.huidoh (mgr.344392) 211198 :
>>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>>> 2022-09-02T18:38:21.754519+ mgr.ceph2.huidoh (mgr.344392) 211199 :
>>> cephadm [DBG] Saving [] to store
>>> 2022-09-02T18:38:21.757155+ mgr.ceph2.huidoh (mgr.344392) 211200 :
>>> cephadm [DBG] refreshing hosts and daemons
>>> 2022-09-02T18:38:21.758065+ mgr.ceph2.huidoh (mgr.344392) 211201 :
>>> cephadm [DBG] _check_for_strays
>>> 2022-09-02T18:38:21.758334+ mgr.ceph2.huidoh (mgr.344392) 211202 :
>>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>>> 2022-09-02T18:38:21.758455+ mgr.ceph2.huidoh (mgr.344392) 211203 :
>>> cephadm [DBG] Saving [] to store
>>> 2022-09-02T18:38:21.761001+ mgr.ceph2.huidoh (mgr.344392) 211204 :
>>> cephadm [DBG] refreshing hosts and daemons
>>> 2022-09-02T18:38:21.762092+ mgr.ceph2.huidoh (mgr.344392) 211205 :
>>> cephadm [DBG] _check_for_strays
>>> 2022-09-02T18:38:21.762357+ mgr.ceph2.huidoh (mgr.344392) 211206 :
>>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>>> 2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 :
>>> cephadm [DBG] Saving [] to store
>>>
>>> On Fri, Sep 2, 2022 at 12:17 PM Adam King  wrote:
>>>
 hmm, okay. It seems like cephadm is stuck in general rather than an
 issue specific to the upgrade. I'd first make sure the orchestrator isn't
 paused (just running "ceph orch resume" should be enough, it's idempotent).

 Beyond that, there was someone else who had an issue with things
 getting stuck that was resolved in this thread
 https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
 
  that
 might be worth a look.

 If you haven't already, it's possible stopping the upgrade is a good
 idea, as maybe that's interfering with it getting to the point where it
 does the redeploy.

 If none of those help, it might be worth setting the log level to debug
 and seeing where things are ending up ("ceph config set mgr
 mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh" then
 waiting a few minutes before running "ceph log last 100 debug cephadm" (not
 100% on format of that command, if it fails try just "ceph log last
 cephadm"). We could maybe get more info on why it's not performing the
 redeploy from those debug logs. Just remember to set the log level back
 after 'ceph config set mgr mgr/cephadm/log_to_cluster_level info' as debug
 logs are quite verbose.

 On Fri, Sep 2, 2022 at 11:39 AM Satish Patel 
 wrote:

> Hi Adam,
>
> As you said, i did following
>
> $ ceph orch daemon redeploy mgr.ceph1.smfvfd
> quay.io/ceph/ceph:v16.2.10
>
> Noticed following line in logs but then no activity nothing, still
> standby mgr running in older version
>
> 2022-09-02T15:35:45.753093+ mgr.ceph2.huidoh (mgr.344392) 2226 :
> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
> 2022-09-02T15:36:17.279190+ mgr.ceph2.huidoh (mgr.344392) 2245 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T15:36:17.984478+ mgr.ceph2.huidoh (mgr.344392) 2246 :
> cephadm [INF] refreshing ceph1 facts
> 2022-09-02T15:37:17.663730+ mgr.ceph2.huidoh (mgr.344392) 2284 :
> cephadm [INF] refreshing ceph2 facts
> 2022-09-02T15:37:18.386586+ mgr.ceph2.huidoh (mgr.344392) 2285 :
> cephadm [INF] refreshing ceph1 facts
>
> I am not seeing any image get downloaded also
>
> root@ceph1:~# docker image ls
> REPOSITORY TAG   IMAGE ID   CREATED
>   SIZE
> quay.io/ceph/ceph  v15   93146564743f   3 weeks
> ago 1.2GB
> quay.io/ceph/ceph-grafana  8.3.5 dad864ee21e9   4 months
> ago558MB
> quay.io/prometheus/prometheus  v2.33.4   514e6a882f6e   6 months
> ago204MB
> quay.io/prometheus/alertmanager 

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Satish Patel
Adam,

Someone on Google suggested a manual upgrade using the following method, and
it seems to work, but now I am stuck on the MON redeploy.. haha

Go to the mgr host and edit the /var/lib/ceph/$fsid/mgr.$whatever/unit.run
file, change the image to ceph/ceph:v16.2.10 for both mgrs, and restart the
mgr service using systemctl restart 

After a few minutes I noticed that docker had downloaded the image, and I can
see both mgrs running with the 16.2.10 version.
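
Concretely, the manual edit was something like this on each mgr host (fsid,
daemon name and systemd unit name will differ on your cluster):

# point the daemon's unit.run file at the new image
vi /var/lib/ceph/<fsid>/mgr.<name>/unit.run   # replace the old image reference with quay.io/ceph/ceph:v16.2.10

# restart the daemon's systemd unit so the container is recreated
systemctl restart ceph-<fsid>@mgr.<name>.service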

Now I have tried to run the upgrade again and nothing happened, so I used the
same manual method on the MON node and ran ceph orch daemon redeploy
mon.ceph1, which destroyed the mon service, and now I can't do anything
because I don't have a mon. ceph -s and all other commands hang.

Trying to find out how to get the mon back :)



On Fri, Sep 2, 2022 at 3:34 PM Satish Patel  wrote:

> Yes, i have stopped upgrade and those log before upgrade
>
> On Fri, Sep 2, 2022 at 3:27 PM Adam King  wrote:
>
>> I don't think the number of mons should have any effect on this. Looking
>> at your logs, the interesting thing is that all the messages are so close
>> together. Was this before having stopped the upgrade?
>>
>> On Fri, Sep 2, 2022 at 2:53 PM Satish Patel  wrote:
>>
>>> Do you think this is because I have only a single MON daemon running?  I
>>> have only two nodes.
>>>
>>> On Fri, Sep 2, 2022 at 2:39 PM Satish Patel 
>>> wrote:
>>>
 Adam,

 I have enabled debug and my logs flood with the following. I am going
 to try some stuff from your provided mailing list and see..

 root@ceph1:~# tail -f
 /var/log/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/ceph.cephadm.log
 2022-09-02T18:38:21.754391+ mgr.ceph2.huidoh (mgr.344392) 211198 :
 cephadm [DBG] 0 OSDs are scheduled for removal: []
 2022-09-02T18:38:21.754519+ mgr.ceph2.huidoh (mgr.344392) 211199 :
 cephadm [DBG] Saving [] to store
 2022-09-02T18:38:21.757155+ mgr.ceph2.huidoh (mgr.344392) 211200 :
 cephadm [DBG] refreshing hosts and daemons
 2022-09-02T18:38:21.758065+ mgr.ceph2.huidoh (mgr.344392) 211201 :
 cephadm [DBG] _check_for_strays
 2022-09-02T18:38:21.758334+ mgr.ceph2.huidoh (mgr.344392) 211202 :
 cephadm [DBG] 0 OSDs are scheduled for removal: []
 2022-09-02T18:38:21.758455+ mgr.ceph2.huidoh (mgr.344392) 211203 :
 cephadm [DBG] Saving [] to store
 2022-09-02T18:38:21.761001+ mgr.ceph2.huidoh (mgr.344392) 211204 :
 cephadm [DBG] refreshing hosts and daemons
 2022-09-02T18:38:21.762092+ mgr.ceph2.huidoh (mgr.344392) 211205 :
 cephadm [DBG] _check_for_strays
 2022-09-02T18:38:21.762357+ mgr.ceph2.huidoh (mgr.344392) 211206 :
 cephadm [DBG] 0 OSDs are scheduled for removal: []
 2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 :
 cephadm [DBG] Saving [] to store

 On Fri, Sep 2, 2022 at 12:17 PM Adam King  wrote:

> hmm, okay. It seems like cephadm is stuck in general rather than an
> issue specific to the upgrade. I'd first make sure the orchestrator isn't
> paused (just running "ceph orch resume" should be enough, it's 
> idempotent).
>
> Beyond that, there was someone else who had an issue with things
> getting stuck that was resolved in this thread
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M/#NKKLV5TMHFA3ERGCMJ3M7BVLA5PGIR4M
> 
>  that
> might be worth a look.
>
> If you haven't already, it's possible stopping the upgrade is a good
> idea, as maybe that's interfering with it getting to the point where it
> does the redeploy.
>
> If none of those help, it might be worth setting the log level to
> debug and seeing where things are ending up ("ceph config set mgr
> mgr/cephadm/log_to_cluster_level debug; ceph orch ps --refresh" then
> waiting a few minutes before running "ceph log last 100 debug cephadm" 
> (not
> 100% on format of that command, if it fails try just "ceph log last
> cephadm"). We could maybe get more info on why it's not performing the
> redeploy from those debug logs. Just remember to set the log level back
> after 'ceph config set mgr mgr/cephadm/log_to_cluster_level info' as debug
> logs are quite verbose.
>
> On Fri, Sep 2, 2022 at 11:39 AM Satish Patel 
> wrote:
>
>> Hi Adam,
>>
>> As you said, i did following
>>
>> $ ceph orch daemon redeploy mgr.ceph1.smfvfd
>> quay.io/ceph/ceph:v16.2.10
>>
>> Noticed following line in logs but then no activity nothing, still
>> standby mgr running in older version
>>
>> 2022-09-02T15:35:45.753093+ mgr.ceph2.huidoh (mgr.344392) 2226 :
>> cephadm [INF] Schedule redeploy daemon mgr.ceph1.smfvfd
>> 2022-09-02T15:36:17.279190+ mgr.ceph2.huidoh (mgr.344392) 2245 :
>>

[ceph-users] Re: Changing the cluster network range

2022-09-02 Thread Gregory Farnum
On Mon, Aug 29, 2022 at 12:49 AM Burkhard Linke wrote:
>
> Hi,
>
>
> some years ago we changed our setup from a IPoIB cluster network to a
> single network setup, which is a similar operation.
>
>
> The OSD use the cluster network for heartbeats and backfilling
> operation; both use standard tcp connection. There is no "global view"
> on the networks involved; OSDs announce their public and private network
> (if present) via an update to the OSD map on OSD boot. OSDs expect to be
> able to create TCP connections to the announced IP addresses and ports.
> Mon and mgr instances do not use the cluster network at all.
>
> If you want to change the networks (either public or private), you need
> to ensure that during the migration TCP connectivity between the old
> networks and the new networks is possible, e.g. via a route on some
> router. Since we had an isolated IPoIB networks without any connections
> to some router, we used one of the ceph hosts as router. Worked fine for
> a migration in live production ;-)

To be a little more explicit about this: Ceph stores the IP addresses
of live OSDs and MDSes in their respective cluster maps, but otherwise
does not care about them at all — they are updated to the daemon's
current IP on every boot. The monitor IP addresses are fixed
identities, so moving them requires either adding new monitors and
removing old ones, or else doing surgery on their databases to change
the IPs by editing the monmap they store (and then updating the local
config for clients and OSDs so they point to the new locations and can
find them on bootup.)
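
The messy route looks roughly like this for each monitor (a sketch; check the
documentation for your release before trying it):

# with the mon stopped, extract its monmap, rewrite the address, and inject it back
ceph-mon -i <mon-id> --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
monmaptool --rm <mon-name> /tmp/monmap
monmaptool --add <mon-name> <new-ip>:6789 /tmp/monmap
ceph-mon -i <mon-id> --inject-monmap /tmp/monmap
# then update mon_host in ceph.conf (or your deployment tool) everywhere
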
But for the OSDs etc, you're really just worried about the local
configs or your deployment tool. (And, depending on how you arrange
things, you may need to take care to avoid the OSDs moving into new
CRUSH buckets and migrating all their data.)
-Greg

>
> Regarding the network size: I'm not sure whether the code requires an
> exact CIDR match for the interface. If in doubt, have a look at the
> source code
>
> As already mentioned in another answer, most setups do not require an
> extra cluster network. It is extra effort both in setup, maintenance and
> operating. Unless your network is the bottleneck you might want to use
> this pending configuration change to switch to a single network setup.
>
>
> Regards,
>
> Burkhard Linke
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS MDS sizing

2022-09-02 Thread Gregory Farnum
On Sun, Aug 28, 2022 at 12:19 PM Vladimir Brik wrote:
>
> Hello
>
> Is there a way to query or get an approximate value of an
> MDS's cache hit ratio without using "dump loads" command
> (which seems to be a relatively expensive operation) for
> monitoring and such?
Unfortunately, I'm not seeing one. What problem are you actually
trying to solve with that information? The expensive part of that
command should be dumping all the directories in cache, so a new admin
command to dump just the cache hit rate and related statistics should
be a pretty simple feature — PRs welcome! ;)
-Greg

>
>
> Vlad
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Help] Does MSGR2 protocol use openssl for encryption

2022-09-02 Thread Gregory Farnum
We partly rolled our own with AES-GCM. See
https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes
and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format
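
The mode is chosen per connection type via the msgr2 connection-mode options,
e.g. something like the following to require the AES-GCM "secure" mode (check
the first link above for the exact option names in your release):

ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
ceph config set global ms_client_mode secure
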
-Greg

On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu  wrote:
>
> Hi,
>
> I have a question about the MSGR protocol Ceph used for in-transit
> encryption. Does it use openssl for encryption? If not, what tools does it
> use to encrypt the data? or Ceph implemented its own encryption method?
>
> Thanks,
> Jinhao
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [cephadm] not detecting new disk

2022-09-02 Thread Satish Patel
Folks,

I have created a new lab using cephadm and installed a new 1TB spinning
disk which I am trying to add to the cluster, but somehow Ceph is not
detecting it.

$ parted /dev/sda print
Model: ATA WDC WD10EZEX-00B (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags

I'm trying the following, but no luck:

$ cephadm shell -- ceph orch daemon add osd os-ctrl-1:/dev/sda
Inferring fsid 351f8a26-2b31-11ed-b555-494149d85a01
Using recent ceph image
quay.io/ceph/ceph@sha256:c5fd9d806c54e5cc9db8efd50363e1edf7af62f101b264dccacb9d6091dcf7aa
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1446, in _handle_command
return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in
handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 414, in call
return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in

wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
 # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 803, in
_daemon_add_osd
raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in
raise_if_exception
raise e
RuntimeError: cephadm exited with an error code: 1, stderr:Inferring config
/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/mon.os-ctrl-1/config
Non-zero exit code 2 from /usr/bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
--privileged --group-add=disk --init -e CONTAINER_IMAGE=
quay.io/ceph/ceph@sha256:c5fd9d806c54e5cc9db8efd50363e1edf7af62f101b264dccacb9d6091dcf7aa
-e NODE_NAME=os-ctrl-1 -e CEPH_USE_RANDOM_NONCE=1 -e
CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e
CEPH_VOLUME_DEBUG=1 -v
/var/run/ceph/351f8a26-2b31-11ed-b555-494149d85a01:/var/run/ceph:z -v
/var/log/ceph/351f8a26-2b31-11ed-b555-494149d85a01:/var/log/ceph:z -v
/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/crash:/var/lib/ceph/crash:z
-v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
/run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
/tmp/ceph-tmpznn3t_7i:/etc/ceph/ceph.conf:z -v
/tmp/ceph-tmpun8t5_ej:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
quay.io/ceph/ceph@sha256:c5fd9d806c54e5cc9db8efd50363e1edf7af62f101b264dccacb9d6091dcf7aa
lvm batch --no-auto /dev/sda --yes --no-systemd
/usr/bin/docker: stderr usage: ceph-volume lvm batch [-h] [--db-devices
[DB_DEVICES [DB_DEVICES ...]]]
/usr/bin/docker: stderr  [--wal-devices
[WAL_DEVICES [WAL_DEVICES ...]]]
/usr/bin/docker: stderr  [--journal-devices
[JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
/usr/bin/docker: stderr  [--auto] [--no-auto]
[--bluestore] [--filestore]
/usr/bin/docker: stderr  [--report] [--yes]
/usr/bin/docker: stderr  [--format
{json,json-pretty,pretty}] [--dmcrypt]
/usr/bin/docker: stderr  [--crush-device-class
CRUSH_DEVICE_CLASS]
/usr/bin/docker: stderr  [--no-systemd]
/usr/bin/docker: stderr  [--osds-per-device
OSDS_PER_DEVICE]
/usr/bin/docker: stderr  [--data-slots
DATA_SLOTS]
/usr/bin/docker: stderr  [--block-db-size
BLOCK_DB_SIZE]
/usr/bin/docker: stderr  [--block-db-slots
BLOCK_DB_SLOTS]
/usr/bin/docker: stderr  [--block-wal-size
BLOCK_WAL_SIZE]
/usr/bin/docker: stderr  [--block-wal-slots
BLOCK_WAL_SLOTS]
/usr/bin/docker: stderr  [--journal-size
JOURNAL_SIZE]
/usr/bin/docker: stderr  [--journal-slots
JOURNAL_SLOTS] [--prepare]
/usr/bin/docker: stderr  [--osd-ids [OSD_IDS
[OSD_IDS ...]]]
/usr/bin/docker: stderr  [DEVICES [DEVICES ...]]
/usr/bin/docker: stderr ceph-volume lvm batch: error: GPT headers found,
they must be removed on: /dev/sda
Traceback (most recent call last):
  File
"/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
line 8971, in 
main()
  File
"/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
line 8959, in main
r = ctx.func(ctx)
  File
"/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
line 1902, in _infer_config
return func(ctx)
  File
"/var/lib/ceph/351f8a26-2b31-1

[ceph-users] Re: [cephadm] not detecting new disk

2022-09-02 Thread Eugen Block
It is detecting the disk, but it contains a partition table so it  
can’t use it. Wipe the disk properly first.
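
Something along these lines should clear it (destructive, double-check the
device first):

# let the orchestrator zap the device (removes partition table, LVM and FS signatures)
ceph orch device zap os-ctrl-1 /dev/sda --force

# or wipe it by hand on the host
sgdisk --zap-all /dev/sda
wipefs --all /dev/sda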


Quoting Satish Patel:


Folks,

I have created a new lab using cephadm and installed a new 1TB spinning
disk which is trying to add in a cluster but somehow ceph is not detecting
it.

$ parted /dev/sda print
Model: ATA WDC WD10EZEX-00B (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags

Trying following but no luck

$ cephadm shell -- ceph orch daemon add osd os-ctrl-1:/dev/sda
Inferring fsid 351f8a26-2b31-11ed-b555-494149d85a01
Using recent ceph image
quay.io/ceph/ceph@sha256:c5fd9d806c54e5cc9db8efd50363e1edf7af62f101b264dccacb9d6091dcf7aa
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1446, in _handle_command
return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in
handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 414, in call
return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in

wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
 # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 803, in
_daemon_add_osd
raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in
raise_if_exception
raise e
RuntimeError: cephadm exited with an error code: 1, stderr:Inferring config
/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/mon.os-ctrl-1/config
Non-zero exit code 2 from /usr/bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
--privileged --group-add=disk --init -e CONTAINER_IMAGE=
quay.io/ceph/ceph@sha256:c5fd9d806c54e5cc9db8efd50363e1edf7af62f101b264dccacb9d6091dcf7aa
-e NODE_NAME=os-ctrl-1 -e CEPH_USE_RANDOM_NONCE=1 -e
CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e
CEPH_VOLUME_DEBUG=1 -v
/var/run/ceph/351f8a26-2b31-11ed-b555-494149d85a01:/var/run/ceph:z -v
/var/log/ceph/351f8a26-2b31-11ed-b555-494149d85a01:/var/log/ceph:z -v
/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/crash:/var/lib/ceph/crash:z
-v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
/run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
/tmp/ceph-tmpznn3t_7i:/etc/ceph/ceph.conf:z -v
/tmp/ceph-tmpun8t5_ej:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
quay.io/ceph/ceph@sha256:c5fd9d806c54e5cc9db8efd50363e1edf7af62f101b264dccacb9d6091dcf7aa
lvm batch --no-auto /dev/sda --yes --no-systemd
/usr/bin/docker: stderr usage: ceph-volume lvm batch [-h] [--db-devices
[DB_DEVICES [DB_DEVICES ...]]]
/usr/bin/docker: stderr  [--wal-devices
[WAL_DEVICES [WAL_DEVICES ...]]]
/usr/bin/docker: stderr  [--journal-devices
[JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
/usr/bin/docker: stderr  [--auto] [--no-auto]
[--bluestore] [--filestore]
/usr/bin/docker: stderr  [--report] [--yes]
/usr/bin/docker: stderr  [--format
{json,json-pretty,pretty}] [--dmcrypt]
/usr/bin/docker: stderr  [--crush-device-class
CRUSH_DEVICE_CLASS]
/usr/bin/docker: stderr  [--no-systemd]
/usr/bin/docker: stderr  [--osds-per-device
OSDS_PER_DEVICE]
/usr/bin/docker: stderr  [--data-slots
DATA_SLOTS]
/usr/bin/docker: stderr  [--block-db-size
BLOCK_DB_SIZE]
/usr/bin/docker: stderr  [--block-db-slots
BLOCK_DB_SLOTS]
/usr/bin/docker: stderr  [--block-wal-size
BLOCK_WAL_SIZE]
/usr/bin/docker: stderr  [--block-wal-slots
BLOCK_WAL_SLOTS]
/usr/bin/docker: stderr  [--journal-size
JOURNAL_SIZE]
/usr/bin/docker: stderr  [--journal-slots
JOURNAL_SLOTS] [--prepare]
/usr/bin/docker: stderr  [--osd-ids [OSD_IDS
[OSD_IDS ...]]]
/usr/bin/docker: stderr  [DEVICES [DEVICES ...]]
/usr/bin/docker: stderr ceph-volume lvm batch: error: GPT headers found,
they must be removed on: /dev/sda
Traceback (most recent call last):
  File
"/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
line 8971, in 
main()
  File
"/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d",
line 8959, in main
r = ctx.func(ctx)
  File
"/var/lib/ceph/351f8a26-2b31-11ed-b555-494149d85a01/cephadm.7ce656a8721d