[ceph-users] Re: 'ceph orch upgrade...' causes an rbd outage on a proxmox cluster

2023-02-03 Thread Pierre BELLEMAIN
Hi @all,

I have good news!
Indeed, by switching the datastore in the Proxmox configuration to kernel RBD (krbd) 
and by separating the monitor IPs with commas (they were separated by spaces 
before), the VMs no longer shut down.
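For anyone hitting the same thing, the relevant entry in /etc/pve/storage.cfg now 
looks roughly like this (storage ID, pool name and monitor IPs here are placeholders):

    rbd: ceph-rbd
        content images
        krbd 1
        monhost 192.168.1.11,192.168.1.12,192.168.1.13
        pool rbd
        username admin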
I will do further testing to confirm the info.

Pierre.

- Original Message -
> From: "Pierre Bellemain" 
> To: "ceph-users" 
> Sent: Thursday, February 2, 2023 18:37:19
> Subject: [ceph-users] 'ceph orch upgrade...' causes an rbd outage on a proxmox 
> cluster

> Hi everyone,
> (sorry for the spam, apparently I was not subscribed to the ml)
> 
> I have a ceph test cluster and a proxmox test cluster (to try upgrades in test
> before production).
> My ceph cluster is made up of three servers running debian 11, with two 
> separate
> networks (cluster_network and public_network, in VLANs).
> It runs ceph version 16.2.10 (cephadm with docker).
> Each server has one MGR, one MON and 8 OSDs.
> cluster:
> id: xxx
> health: HEALTH_OK
> 
> services:
> mon: 3 daemons, quorum ceph01,ceph03,ceph02 (age 2h)
> mgr: ceph03(active, since 77m), standbys: ceph01, ceph02
> osd: 24 osds: 24 up (since 7w), 24 in (since 6M)
> 
> data:
> pools: 3 pools, 65 pgs
> objects: 29.13k objects, 113 GiB
> usage: 344 GiB used, 52 TiB / 52 TiB avail
> pgs: 65 active+clean
> 
> io:
> client: 1.3 KiB/s wr, 0 op/s rd, 0 op/s wr
> 
> The proxmox cluster is also made up of 3 servers running proxmox 7.2-7 (with
> proxmox ceph pacific, which is on version 16.2.9). The ceph storage used is RBD
> (on the ceph public_network). I added the RBD datastores simply via the GUI.
> 
> So far so good. I have several VMs, on each of the proxmox.
> 
> When I update ceph to 16.2.11, that's where things go wrong.
> I don't like it when the update does everything for me without control, so I did a
> "staggered upgrade", following the official procedure
> (https://docs.ceph.com/en/pacific/cephadm/upgrade/#staggered-upgrade). As the
> version I'm starting from doesn't support staggered upgrades, I followed this
> procedure:
> https://docs.ceph.com/en/pacific/cephadm/upgrade/#upgrading-to-a-version-that-supports-staggered-upgrade-from-one-that-doesn-t
> When I do the "ceph orch redeploy" of the two standby MGRs, everything is fine.
> I then do "sudo ceph mgr fail", and everything is still fine (it fails over
> cleanly to an MGR that was standby, so I now have an active MGR on 16.2.11).
> However, when I do "sudo ceph orch upgrade start --image
> quay.io/ceph/ceph:v16.2.11 --daemon-types mgr", it upgrades the last MGR that
> had not been updated yet (so far everything is still fine), but it finishes with
> one last restart of all the MGRs, and at that point Proxmox visibly loses the
> RBD connection and shuts down all my VMs.
> Here is the message in the proxmox syslog:
> Feb 2 16:20:52 pmox01 QEMU[436706]: terminate called after throwing an 
> instance
> of 'std::system_error'
> Feb 2 16:20:52 pmox01 QEMU[436706]: what(): Resource deadlock avoided
> Feb 2 16:20:52 pmox01 kernel: [17038607.686686] vmbr0: port 2(tap102i0) 
> entered
> disabled state
> Feb 2 16:20:52 pmox01 kernel: [17038607.779049] vmbr0: port 2(tap102i0) 
> entered
> disabled state
> Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Succeeded.
> Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Consumed 43.136s CPU time.
> Feb 2 16:20:53 pmox01 qmeventd[446872]: Starting cleanup for 102
> Feb 2 16:20:53 pmox01 qmeventd[446872]: Finished cleanup for 102
> 
> For ceph, everything is fine, it does the update, and tells me everything is 
> OK
> in the end.
> Ceph is now on 16.2.11 and the health is OK.
> 
> When I downgrade the MGRs and then start the procedure again, I hit the problem
> every time. It's very reproducible.
> According to my tests, the "sudo ceph orch upgrade" command always gives me
> trouble, even when trying a real staggered upgrade from and to version 16.2.11
> with the command:
> sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types
> mgr --hosts ceph01 --limit 1
> 
> Does anyone have an idea?
> 
> Thank you everyone!
> Pierre.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services

2023-02-03 Thread Ruidong Gao
Hi Lokendra,

To have the monitors looked up through DNS, ceph-mon also needs to be resolvable 
correctly by the DNS server, just like _ceph-mon._tcp.
And ceph-mon is the default service name, so it doesn't need to be in the conf file 
anyway.
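
A quick way to sanity-check this from the client node is something like (assuming 
the search domain is storage.com, as in your zone below):

    dig SRV _ceph-mon._tcp.storage.com
    dig AAAA storagenode1.storage.com

Both the SRV record and the AAAA records of its targets need to be answered by the 
resolver the node actually uses (/etc/resolv.conf).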

Ben
> On 3 February 2023 at 12:14, Lokendra Rathour  wrote:
> 
> Hi Robert and Team,
> 
> 
> 
> Thank you for the help. We had previously referred to the link:
> https://docs.ceph.com/en/quincy/rados/configuration/mon-lookup-dns/
> But we were not able to configure mon_dns_srv_name correctly.
> 
> 
> 
> We found the following link:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/configuration_guide/ceph-monitor-configuration
> 
> 
> 
> Which gives just a little more information about the DNS lookup.
> 
> 
> 
> After following the link, we updated the ceph.conf as follows:
> ```
> [root@storagenode3 ~]# cat /etc/ceph/ceph.conf
> [global]
> ms bind ipv6 = true
> ms bind ipv4 = false
> mon initial members = storagenode1,storagenode2,storagenode3
> osd pool default crush rule = -1
> mon dns srv name = ceph-mon
> fsid = ce479912-a277-45b6-87b1-203d3e43d776
> public network = abcd:abcd:abcd::/64
> cluster network = eff0:eff0:eff0::/64
> 
> 
> 
> [osd]
> osd memory target = 4294967296
> 
> 
> 
> [client.rgw.storagenode3.rgw0]
> host = storagenode3
> keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode3.rgw0/keyring
> log file = /var/log/ceph/ceph-rgw-storagenode3.rgw0.log
> rgw frontends = beast endpoint=[abcd:abcd:abcd::23]:8080
> rgw thread pool size = 512
> 
> 
> 
> [root@storagenode3 ~]#
> ```
> 
> We also updated the dns server as follows:
> 
> ```
> storagenode1.storage.com  IN AAAA   abcd:abcd:abcd::21
> storagenode2.storage.com  IN AAAA   abcd:abcd:abcd::22
> storagenode3.storage.com  IN AAAA   abcd:abcd:abcd::23
> 
> 
> 
> _ceph-mon._tcp.storage.com 60 IN SRV 10 60 6789 storagenode1.storage.com
> _ceph-mon._tcp.storage.com 60 IN SRV 10 60 6789 storagenode2.storage.com
> _ceph-mon._tcp.storage.com 60 IN SRV 10 60 6789 storagenode3.storage.com
> _ceph-mon._tcp.storage.com 60 IN SRV 10 60 3300 storagenode1.storage.com
> _ceph-mon._tcp.storage.com 60 IN SRV 10 60 3300 storagenode2.storage.com
> _ceph-mon._tcp.storage.com 60 IN SRV 10 60 3300 storagenode3.storage.com
> 
> 
> ```
> 
> But when we run the command ceph -s, we get the following error:
> 
> ```
> [root@storagenode3 ~]# ceph -s
> unable to get monitor info from DNS SRV with service name: ceph-mon
> 2023-02-02T15:18:14.098+0530 7f1313a58700 -1 failed for service
> _ceph-mon._tcp
> 2023-02-02T15:18:14.098+0530 7f1313a58700 -1 monclient:
> get_monmap_and_config cannot identify monitors to contact
> [errno 2] RADOS object not found (error connecting to the cluster)
> [root@storagenode3 ~]#
> ```
> 
> Could you please help us to configure the server using mon_dns_srv_name
> correctly?
> 
> 
> 
> On Wed, Jan 25, 2023 at 9:06 PM John Mulligan 
> wrote:
> 
>> On Tuesday, January 24, 2023 9:02:41 AM EST Lokendra Rathour wrote:
>>> Hi Team,
>>> 
>>> 
>>> 
>>> We have a ceph cluster with 3 storage nodes:
>>> 
>>> 1. storagenode1 - abcd:abcd:abcd::21
>>> 
>>> 2. storagenode2 - abcd:abcd:abcd::22
>>> 
>>> 3. storagenode3 - abcd:abcd:abcd::23
>>> 
>>> 
>>> 
>>> The requirement is to mount ceph using the domain name of MON node:
>>> 
>>> Note: we resolved the domain name via DNS server.
>>> 
>>> 
>>> For this we are using the command:
>>> 
>>> ```
>>> 
>>> mount -t ceph [storagenode.storage.com]:6789:/  /backup -o
>>> name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
>>> 
>>> ```
>>> 
>>> 
>>> 
>>> We are getting the following logs in /var/log/messages:
>>> 
>>> ```
>>> 
>>> Jan 24 17:23:17 localhost kernel: libceph: resolve '
>> storagenode.storage.com'
>>> (ret=-3): failed
>>> 
>>> Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip '
>>> storagenode.storage.com:6789'
>>> 
>>> ```
>>> 
>> 
>> 
>> I saw a similar log message recently when I had forgotten to install the ceph
>> mount helper.
>> Check to see if you have a 'mount.ceph' binary on the system. If you don't,
>> try to install it from packages. On Fedora I needed to install 'ceph-common'.
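>> A rough check (the package name here is the Fedora/RHEL one; it differs per distro):
>> 
>>   command -v mount.ceph || echo "mount.ceph not installed"
>>   dnf install ceph-common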
>> 
>> 
>>> 
>>> 
>>> We also tried mounting ceph storage using IP of MON which is working
>> fine.
>>> 
>>> 
>>> 
>>> Query:
>>> 
>>> 
>>> Could you please help us out with how we can mount ceph using FQDN.
>>> 
>>> 
>>> 
>>> My /etc/ceph/ceph.conf is as follows:
>>> 
>>> [global]
>>> 
>>> ms bind ipv6 = true
>>> 
>>> ms bind ipv4 = false
>>> 
>>> mon initial members = storagenode1,storagenode2,storagenode3
>>> 
>>> osd pool default crush rule = -1
>>> 
>>> fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
>>> 
>>> mon host =
>>> 
>> [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:a
>>> 
>> bcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1
>>> :[abcd:abcd:abcd::23]:6789]
>>> 
>>> public network = abcd:abcd:abcd::/64
>>> 
>>> cluster network = eff0:eff0:eff0::/64
>>> 
>>> 
>>> 
>>> [osd]
>>> 
>>> osd memory target =

[ceph-users] Exit yolo mode by increasing size/min_size does not (really) work

2023-02-03 Thread Stefan Pinter
Hi! 😊

It would be very kind of you to help us with that!

We have pools in our ceph cluster that are set to replicated size 2, min_size 1.
Obviously we want to go to size 3 / min_size 2, but we are running into problems 
with that.
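
For reference, the change itself is just the usual per-pool commands (Pool3 as an 
example):

   ceph osd pool set Pool3 size 3
   ceph osd pool set Pool3 min_size 2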

USED goes to 100% instantly and MAX AVAIL goes to 0. Write operations seem to 
stop.

   POOLS:
   NAME    ID  USED     %USED   MAX AVAIL  OBJECTS
   Pool1   24  35791G    35.04     66339G  8927762
   Pool2   25  11610G    14.89     66339G  3004740
   Pool3   26  17557G   100.00          0  2666972


Before the change it was like this:

   NAME    ID  USED     %USED   MAX AVAIL  OBJECTS
   Pool1   24  35791G    35.04     66339G  8927762
   Pool2   25  11610G    14.89     66339G  3004740
   Pool3   26  17558G    20.93     66339G  2667013


This was quite surprising to us as we’d expect USED to go to something like 30%.
Going back to 2/1 also gave us back the 20.93% usage instantly.

What’s the matter here?

Thank you and best regards
Stefan

BearingPoint GmbH
Sitz: Wien
Firmenbuchgericht: Handelsgericht Wien
Firmenbuchnummer: FN 175524z

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-fuse in infinite loop reading objects without client requests

2023-02-03 Thread Andras Pataki
We've been running into a strange issue with ceph-fuse on some nodes 
lately.  After some job runs on the node (and finishes or gets killed), 
ceph-fuse gets stuck busy requesting objects from the OSDs without any 
processes on the node using cephfs.  When this happens, ceph-fuse uses 
2-3 cores, spinning in what seems like an infinite loop making objecter 
requests to various OSDs of files that were perhaps requested by some 
process that is long gone.  The same object gets requested every few 
seconds, cycling through a list of objects.  This comes close to saturating the 
25Gbps NIC of the node.  I have a one-minute log file 
with debug_objecter/objectcacher/client/context/finisher set to 20 that 
traces this, but I don't see exactly why this is happening.  Can someone 
with a better understanding of the client->cache->OSD flow share some 
insights into what is going wrong here?
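
(In case someone wants to reproduce the capture: debug levels like these can be 
bumped on a running ceph-fuse via its admin socket, roughly as follows; the asok 
path is just an example:

    ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_objecter 20
    ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_objectcacher 20
    ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_client 20)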


Here is an example:

2023-02-03T14:42:47.593-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
4227 osd.4349
2023-02-03T14:43:00.056-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42778921 osd.4349
2023-02-03T14:43:02.048-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42779104 osd.4349
2023-02-03T14:43:05.392-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42779408 osd.4349
2023-02-03T14:43:10.076-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42779847 osd.4349
2023-02-03T14:43:13.288-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42780156 osd.4349
2023-02-03T14:43:18.908-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42780697 osd.4349
2023-02-03T14:43:29.056-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42781660 osd.4349
2023-02-03T14:43:33.707-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002b '@9' '@9' [read 0~25165824] tid 
42782079 osd.4349
2023-02-03T14:42:41.609-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002c '@9' '@9' [read 0~25165824] tid 
42777194 osd.3251
2023-02-03T14:42:49.809-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002c '@9' '@9' [read 0~25165824] tid 
42777954 osd.3251
2023-02-03T14:43:07.884-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002c '@9' '@9' [read 0~25165824] tid 
42779646 osd.3251
2023-02-03T14:43:16.736-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002c '@9' '@9' [read 0~25165824] tid 
42780500 osd.3251
2023-02-03T14:43:22.160-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002c '@9' '@9' [read 0~25165824] tid 
42781009 osd.3251
2023-02-03T14:43:31.603-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002c '@9' '@9' [read 0~25165824] tid 
42781892 osd.3251
2023-02-03T14:43:35.503-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit oid 10097d5e07f.002c '@9' '@9' [read 0~25165824] tid 
42782258 osd.3251


Taking a specific object as an example: 100dbad3fce.00b3

2023-02-03T14:42:46.293-0500 7fffdbfff700 10 objectcacher readx 
extent(100dbad3fce.00b3 (179) in @9 0~25165824 -> [729808896,25165824])
2023-02-03T14:42:46.293-0500 7fffdbfff700 10 
objectcacher.object(100dbad3fce.00b3/head) map_read 
100dbad3fce.00b3 0~25165824
2023-02-03T14:42:46.293-0500 7fffdbfff700 20 
objectcacher.object(100dbad3fce.00b3/head) map_read miss 25165824 
left, bh[ 0x7fffb0f461a0 0~25165824 0x7fffb0e4b720 (0) v 0 missing] 
waiters = {}

... a few times the above ...
... then an OSD read ...
2023-02-03T14:42:48.557-0500 7fffdbfff700 10 objectcacher readx 
extent(100dbad3fce.00b3 (179) in @9 0~25165824 -> [50331648,25165824])
2023-02-03T14:42:48.557-0500 7fffdbfff700 10 
objectcacher.object(100dbad3fce.00b3/head) map_read 
100dbad3fce.00b3 0~25165824
2023-02-03T14:42:48.557-0500 7fffdbfff700 20 
objectcacher.object(100dbad3fce.00b3/head) map_read miss 25165824 
left, bh[ 0x7fffb123acd0 0~25165824 0x7fffb0e4b720 (0) v 0 missing] 
waiters = {}
2023-02-03T14:42:48.557-0500 7fffdbfff700  7 objectcacher bh_read on bh[ 
0x7fffb123acd0 0~25165824 0x7fffb0e4b720 (0) v 0 missing] waiters = {} 
outstanding reads 170
2023-02-03T14:42:48.557-0500 7fffdbfff700 10 client.151672236.objecter 
_op_submit op 0x7fffb12de170
2023-02-03T14:42:48.557-0500 7fffdbfff700 20 client.151672236.objecter 
_calc_target epoch 2609365 base 100dbad3fce.00b3 @9 precalc_pgid 0 
pgid 0.0 is_read
2023-02-03T14:42:48.557-0500 7f

[ceph-users] cephadm and the future

2023-02-03 Thread Christopher Durham


Question:
What does the future hold with regard to cephadm vs rpm/deb packages? If it is 
now suggested to use cephadm, and thus containers, to deploy new clusters, is 
there an intent, at some time in the future, to no longer support rpm/deb 
packages for Linux systems and only support the cephadm container method?
I am not asking to argue containers vs traditional bare metal installs. I am 
just trying to plan for the future. Thanks
-Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Any ceph consultants available?

2023-02-03 Thread Thomas Cannon

Hello Ceph community.

The company that recently hired me has a 3 node ceph cluster that has been 
running and stable. I am the new lone administrator here, I do not know ceph, 
and this is my first experience with it.

The issue is that it is running out of space, which is why I built a 4th 
node and attempted to add it into the cluster. Along the way, things have begun 
to break. The manager daemon on boreal-01 failed over to boreal-02, and I tried 
to get it to fail back to boreal-01 but was unable. While working on it 
yesterday I realized that the nodes in the cluster are all running different 
versions of the software. I suspect that might be a huge part of why things 
aren't working as expected.

Boreal-01 - the host - 17.2.5:

root@boreal-01:/home/kadmin# ceph -v
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
root@boreal-01:/home/kadmin# 

Boreal-01 - the admin docker instance running on the host 17.2.1:

root@boreal-01:/home/kadmin# cephadm shell
Inferring fsid 951fa730-0228-11ed-b1ef-f925f77b75d3
Inferring config 
/var/lib/ceph/951fa730-0228-11ed-b1ef-f925f77b75d3/mon.boreal-01/config
Using ceph image with id 'e5af760fa1c1' and tag 'v17' created on 2022-06-23 
19:49:45 + UTC
quay.io/ceph/ceph@sha256:d3f3e1b59a304a280a3a81641ca730982da141dad41e942631e4c5d88711a66b
 

root@boreal-01:/# ceph -v
ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
root@boreal-01:/# 

Boreal-02 - 15.2.16:

root@boreal-02:/home/kadmin# ceph -v
ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
root@boreal-02:/home/kadmin# 


Boreal-03 - 15.2.18:

root@boreal-03:/home/kadmin# ceph -v
ceph version 15.2.18 (f2877ae32a72fc25acadef57597f44988b805c38) octopus (stable)
root@boreal-03:/home/kadmin# 

And the host I added - Boreal-04 - 17.2.5:

root@boreal-04:/home/kadmin# ceph -v
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
root@boreal-04:/home/kadmin# 

The cluster isn't rebalancing data, and drives are filling up unevenly, despite 
auto balancing being on. I can run a df and see that it isn't working; however, 
the balancer says it is:

root@boreal-01:/# ceph balancer status 
{
"active": true,
"last_optimize_duration": "0:00:00.011905",
"last_optimize_started": "Fri Feb  3 18:39:02 2023",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or pool(s) pg_num 
is decreasing, or distribution is already perfect",
"plans": []
}
root@boreal-01:/# 

root@boreal-01:/# ceph -s
  cluster:
id: 951fa730-0228-11ed-b1ef-f925f77b75d3
health: HEALTH_WARN
There are daemons running an older version of ceph
6 nearfull osd(s)
3 pgs not deep-scrubbed in time
3 pgs not scrubbed in time
4 pool(s) nearfull
1 daemons have recently crashed
 
  services:
mon: 4 daemons, quorum boreal-01,boreal-02,boreal-03,boreal-04 (age 22h)
mgr: boreal-02.lqxcvk(active, since 19h), standbys: boreal-03.vxhpad, 
boreal-01.ejaggu
mds: 2/2 daemons up, 2 standby
osd: 89 osds: 89 up (since 5d), 89 in (since 45h)
 
  data:
volumes: 2/2 healthy
pools:   7 pools, 549 pgs
objects: 227.23M objects, 193 TiB
usage:   581 TiB used, 356 TiB / 937 TiB avail
pgs: 533 active+clean
 16  active+clean+scrubbing+deep
 
  io:
client:   55 MiB/s rd, 330 KiB/s wr, 21 op/s rd, 45 op/s wr
 
root@boreal-01:/# 

Part of me suspects that I exacerbated the problems by spending several days 
monkeying with boreal-04, trying to get the drives inside the machine turned 
into OSDs so that they would be used. One thing I did was attempt to upgrade 
the code on that machine, and I may have triggered a cluster-wide upgrade 
that failed everywhere except nodes 1 and 4. With 2 and 3 not even running the 
same major release, if I did make that mistake, I can see why things got worse 
instead of upgraded.

According to the documentation, I should be able to upgrade the entire cluster 
by running a single command on the admin node, but when I go to run commands I 
get errors that even google can’t solve:

root@boreal-01:/# ceph orch host ls
Error ENOENT: Module not found
root@boreal-01:/# 
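
(For reference, the single-command upgrade referred to would be roughly of this 
form, per the cephadm upgrade docs:

    ceph orch upgrade start --ceph-version 17.2.5

assuming the orchestrator module were responding at all.)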

Consequently, I have very little faith that running commands to upgrade 
everything so that it's all running the same code will work. I think upgrading 
each host individually could fix things, but I do not feel confident doing so 
and risking our data.

Hopefully that gives a better idea of the problems I am facing. I am hoping for 
some professional services hours with someone who is a true expert with this 
software, to get us to a stable and sane deployment that can be managed without 
it being a terrifying guessing game, trying to get it to work.

If that is you, or if you know someone who can help — please 

[ceph-users] Re: [EXTERNAL] Any ceph consultants available?

2023-02-03 Thread Beaman, Joshua (Contractor)
Congrats landing a fun new job!  That’s quite the mess you have to untangle 
there.

I’d suggest, since all of those versions will support orchestrator/cephadm, 
running through the cephadm conversion process here: 
https://docs.ceph.com/en/latest/cephadm/adoption/
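
Roughly, after installing cephadm on each host, adoption is done per legacy daemon 
along these lines (daemon names here are placeholders; the doc above has the full 
procedure):

    cephadm adopt --style legacy --name mon.boreal-02
    cephadm adopt --style legacy --name mgr.boreal-02
    cephadm adopt --style legacy --name osd.0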

That should get you to the point that you can use ceph orch commands to get 
your versions aligned.

As for why the balancer isn’t working: first, is 89 the correct number of OSDs 
after you added the 4th host?  I’d wonder if your new host is in the correct 
root of the crush map.  Check `ceph osd tree` to ensure that all storage hosts 
are equal and subordinate to the same root (probably default).
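
For example (host name is just an example):

    ceph osd tree
    # only if the host bucket ended up outside the intended root:
    ceph osd crush move boreal-04 root=default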

At 62% raw utilization you should be OK to rebalance, but things get more 
challenging above 70% full, and downright painful above 80%.

You should also check your pool pg_nums with `ceph osd pool autoscale-status`.  
 If the autoscaler isn’t enabled, some pg_num adjustments might bump loose the 
balancer.

It’s concerning that you have 4 pools warning nearfull, but 7 pools in the 
cluster.  This may imply that the pools are not distributed equally among your 
osds and buckets in your crush map.  Check `ceph osd pool ls detail` and see 
what crush_rule is assigned to each pool.  If they’re not all the same, you’re 
going to need to do some digging into your crush map to figure out why and if 
it’s for a good reason, or poor design or implementation.
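
E.g., to see which rule each pool uses and what each rule actually selects:

    ceph osd pool ls detail | grep crush_rule
    ceph osd crush rule ls
    ceph osd crush rule dump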


Best of luck,
Josh

From: Thomas Cannon 
Date: Friday, February 3, 2023 at 5:02 PM
To: ceph-users@ceph.io 
Subject: [EXTERNAL] [ceph-users] Any ceph consultants available?

Hello Ceph community.

The company that recently hired me has a 3 node ceph cluster that has been 
running and stable. I am the new lone administrator here, I do not know ceph, 
and this is my first experience with it.

The issue is that it is running out of space, which is why I built a 4th 
node and attempted to add it into the cluster. Along the way, things have begun 
to break. The manager daemon on boreal-01 failed over to boreal-02, and I tried 
to get it to fail back to boreal-01 but was unable. While working on it 
yesterday I realized that the nodes in the cluster are all running different 
versions of the software. I suspect that might be a huge part of why things 
aren't working as expected.

Boreal-01 - the host - 17.2.5:

root@boreal-01:/home/kadmin# ceph -v
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
root@boreal-01:/home/kadmin#

Boreal-01 - the admin docker instance running on the host 17.2.1:

root@boreal-01:/home/kadmin# cephadm shell
Inferring fsid 951fa730-0228-11ed-b1ef-f925f77b75d3
Inferring config 
/var/lib/ceph/951fa730-0228-11ed-b1ef-f925f77b75d3/mon.boreal-01/config
Using ceph image with id 'e5af760fa1c1' and tag 'v17' created on 2022-06-23 
19:49:45 + UTC
quay.io/ceph/ceph@sha256:d3f3e1b59a304a280a3a81641ca730982da141dad41e942631e4c5d88711a66b
 

root@boreal-01:/# ceph -v
ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
root@boreal-01:/#

Boreal-02 - 15.2.16:

root@boreal-02:/home/kadmin# ceph -v
ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
root@boreal-02:/home/kadmin#


Boreal-03 - 15.2.18:

root@boreal-03:/home/kadmin# ceph -v
ceph version 15.2.18 (f2877ae32a72fc25acadef57597f44988b805c38) octopus (stable)
root@boreal-03:/home/kadmin#

And the host I added - Boreal-04 - 17.2.5:

root@boreal-04:/home/kadmin# ceph -v
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
root@boreal-04:/home/kadmin#

The cluster isn't rebalancing data, and drives are filling up unevenly, despite 
auto balancing being on. I can run a df and see that it isn't working; however, 
the balancer says it is:

root@boreal-01:/# ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.011905",
"last_optimize_started": "Fri Feb  3 18:39:02 2023",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or pool(s) pg_num 
is decreasing, or distribution is already perfect",
"plans": []
}
root@boreal-01:/#

root@boreal-01:/# ceph -s
  cluster:
id: 951fa730-0228-11ed-b1ef-f925f77b75d3
health: HEALTH_WARN
There are daemons running an older version of ceph
6 nearfull osd(s)
3 pgs not deep-scrubbed in time
3 pgs not scrubbed in time
4 pool(s) nearfull
1 daemons have recently crashed

  services:
mon: 4 daemons, quorum boreal-01,boreal-02,boreal-03,boreal-04 (age 22h)
mgr: boreal-02.lqxcvk(active, since 19h), standbys: boreal-03.vxhpad, 
boreal-01.ejaggu
mds: 2/2 daemons up, 2 standby
osd: 89 osds: 89 up (since 5d), 8

[ceph-users] Re: [EXTERNAL] Any ceph consultants available?

2023-02-03 Thread Thomas Cannon
> 
> As for why the balancer isn’t working, first is 89 the correct number of OSDs 
> after you added the 4th host? 

OSDs 0-88 are on hosts 1-3, for a total of 89 OSDs. Host #4 has 20 drives, and 
while ceph is trying to add them, it gets as far as trying to mkfs and then it 
errors out — you can see the whole error here:

https://pastebin.com/STg4t8FJ

> I’d wonder if your new host is in the correct root of the crush map.  Check 
> `ceph osd tree` to ensure that all storage hosts are equal and subordinate to 
> the same root (probably default).

No OSDs, so it isn't in the crush map yet. Or am I doing that wrong?

The results of the command:

https://pastebin.com/kwCcKJ5f
>  
> At 62% raw utilization you should be OK to rebalance, but things get more 
> challenging above 70% full, and downright painful above 80%.
>  
> You should also check your pool pg_nums with `ceph osd pool 
> autoscale-status`.   If the autoscaler isn’t enabled, some pg_num adjustments 
> might bump loose the balancer.

Here it gets very odd.

root@boreal-01:/var/lib/ceph# ceph osd pool autoscale-status
root@boreal-01:/var/lib/ceph# 

Nothing?

It seems to be on?

root@boreal-01:/var/lib/ceph# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins 
pg_num 1 pgp_num 1 autoscale_mode on last_change 26582 flags 
hashpspool,nearfull stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'fs_pool' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins 
pg_num 512 pgp_num 512 autoscale_mode on last_change 26582 lfor 0/0/6008 flags 
hashpspool,nearfull,selfmanaged_snaps stripe_width 0 application cephfs,rbd
pool 5 'fs_metadata_pool' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 26582 lfor 0/0/300 
flags hashpspool,nearfull stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 
recovery_priority 5 application cephfs,rbd
pool 10 'mlScratchStorage_metadata' replicated size 2 min_size 1 crush_rule 2 
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 10500 
flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 
recovery_priority 5 application cephfs
pool 11 'mlScratchStorage' replicated size 2 min_size 1 crush_rule 2 
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 10498 
flags hashpspool stripe_width 0 application cephfs
pool 12 'RBD_block_HDD_slow' replicated size 3 min_size 2 crush_rule 3 
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 26582 
flags hashpspool,nearfull stripe_width 0 application rbd
pool 13 'RBD_block_SSD_fast' replicated size 3 min_size 2 crush_rule 2 
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 11287 
flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd


I see there are different rules applied to the pools but honestly have no idea 
what that means. Sadly, I am learning on the job here and the curve is pretty 
steep. Are the drives not balancing because of rules being misapplied?

Thank you for all of your help here.

Thomas

>  
> It’s concerning that you have 4 pools warning nearfull, but 7 pools in the 
> cluster.  This may imply that the pools are not distributed equally among 
> your osds and buckets in your crush map.  Check `ceph osd pool ls detail` and 
> see what crush_rule is assigned to each pool.  If they’re not all the same, 
> you’re going to need to do some digging into your crush map to figure out why 
> and if it’s for a good reason, or poor design or implementation.
>  
>  
> Best of luck,
> Josh 
>  
> From: Thomas Cannon <thomas.can...@pronto.ai>
> Date: Friday, February 3, 2023 at 5:02 PM
> To: ceph-users@ceph.io
> Subject: [EXTERNAL] [ceph-users] Any ceph consultants available?
> 
> 
> Hello Ceph community.
> 
> The company that recently hired me has a 3 node ceph cluster that has been 
> running and stable. I am the new lone administrator here, I do not know ceph, 
> and this is my first experience with it.
> 
> The issue is that it is running out of space, which is why I built a 4th 
> node and attempted to add it into the cluster. Along the way, things have 
> begun to break. The manager daemon on boreal-01 failed over to boreal-02, and 
> I tried to get it to fail back to boreal-01 but was unable. While working on 
> it yesterday I realized that the nodes in the cluster are all running 
> different versions of the software. I suspect that might be a huge part of 
> why things aren't working as expected. 
> 
> Boreal-01 - the host - 17.2.5:
> 
> root@boreal-01:/home/kadmin# ceph -v
> ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
> root@boreal-01:/home/kadmin# 
> 
> Boreal-01 - the admin docker instance running on the host 17.2.1:
> 
> root@boreal-01:/home/kadmin# cephadm shell
> Inferring fsid 951fa730-0228-11ed-b1ef-f925f77b75d3
> Inferring