Sam, see https://tracker.ceph.com/issues/49938 and https://github.com/ceph/ceph/pull/40334
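For anyone following along, here is a minimal sketch of what those fixes are about. This is not the actual ceph code (that lives in src/common/ipaddr.cc); it is just a self-contained illustration, built on getifaddrs(), of why a "lo:" prefix test fails to skip the plain "lo" entry the kernel reports, while the earlier "lo" prefix test skipped too much:

    #include <ifaddrs.h>
    #include <cstdio>
    #include <cstring>

    // Illustrative only: mimics the two backported skip checks quoted further
    // down the thread, not ceph's real implementation.

    // The narrower check matches virtual aliases like "lo:0" but not the bare
    // loopback interface, so "lo" is considered again when picking a bind
    // address.
    static bool skip_lo_colon(const char* ifa_name) {
      return strncmp(ifa_name, "lo:", 3) == 0;   // misses plain "lo"
    }

    // The earlier backport skipped anything starting with "lo", which also
    // broke setups that deliberately bind OSDs to lo.
    static bool skip_lo_prefix(const char* ifa_name) {
      return strncmp(ifa_name, "lo", 2) == 0;
    }

    int main() {
      ifaddrs* ifas = nullptr;
      if (getifaddrs(&ifas) != 0)
        return 1;
      for (ifaddrs* p = ifas; p != nullptr; p = p->ifa_next)
        printf("%-12s skip(\"lo:\")=%d skip(\"lo\")=%d\n", p->ifa_name,
               skip_lo_colon(p->ifa_name), skip_lo_prefix(p->ifa_name));
      freeifaddrs(ifas);
      return 0;
    }

On hosts like the ones shown below, lo is enumerated first and only the broader "lo" variant marks it as skipped, which lines up with Dan's guess below about why 127.0.0.1 can end up winning.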
On Tue, Mar 23, 2021 at 8:29 AM Dan van der Ster <d...@vanderster.com> wrote:
>
> Hi Sam,
>
> Yeah somehow `lo:` is not getting skipped, probably due to those
> patches. (I guess it is because the 2nd patch looks for `lo:` but in
> fact the ifa_name is probably just `lo` without the colon)
>
> https://github.com/ceph/ceph/blob/master/src/common/ipaddr.cc#L110
>
> I don't know why this impacts you but not us -- we already upgraded
> one of our clusters to 14.2.18 on Centos 8, and ceph is choosing the
> correct interface without needing any network options. And lo: is the
> first interface [1] here too.
> Could it be as simple as the iface names being sorted alphabetically?
> Here we have ens785f0 which would come before lo, but your interface
> `p2p2` would come after.
>
> -- dan
>
> [1]
> # ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
> valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
> valid_lft forever preferred_lft forever
> 2: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> DOWN group default qlen 1000
> link/ether a4:bf:01:60:67:a0 brd ff:ff:ff:ff:ff:ff
> 3: ens785f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state
> UP group default qlen 1000
> link/ether 0c:42:a1:ad:36:9a brd ff:ff:ff:ff:ff:ff
> inet 10.116.6.8/26 brd 10.116.6.63 scope global dynamic
> noprefixroute ens785f0
> valid_lft 432177sec preferred_lft 432177sec
> inet6 fd01:1458:e00:1e::100:5/128 scope global dynamic noprefixroute
> valid_lft 513502sec preferred_lft 513502sec
> inet6 fe80::bdbd:76be:63fd:a4c2/64 scope link noprefixroute
> valid_lft forever preferred_lft forever
> 4: ens785f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq
> state DOWN group default qlen 1000
> link/ether 0c:42:a1:ad:36:9b brd ff:ff:ff:ff:ff:ff
> 5: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> DOWN group default qlen 1000
> link/ether a4:bf:01:60:67:a1 brd ff:ff:ff:ff:ff:ff
>
> On Mon, Mar 22, 2021 at 8:35 PM Sam Skipsey <aoa...@gmail.com> wrote:
> >
> > Hi Dan:
> >
> > Aha - I think the first commit is probably it - before that commit, the
> > fact that lo is highest in the interfaces enumeration didn't matter for us
> > [since it would always be skipped].
> >
> > This actually almost certainly also is associated with that other site with
> > a similar problem (OSDs drop out until you restart the network interface),
> > since I imagine that would reorder the interface list.
> >
> > Playing with our public and cluster bind address explicitly does seem to
> > help, so we'll iterate on that and get to a suitable ceph.conf.
> >
> > Thanks for the help [and it was the network all along]!
> >
> >
> > Sam
> >
> > On Mon, 22 Mar 2021 at 19:12, Dan van der Ster <d...@vanderster.com> wrote:
> >>
> >> There are two commits between 14.2.16 and 14.2.18 related to loopback
> >> network. Perhaps one of these is responsible for your issue [1].
> >>
> >> I'd try playing with the options like cluster/public bind addr and
> >> cluster/public bind interface until you can convince the osd to bind to
> >> the correct listening IP.
> >>
> >> (That said, i don't know which version you're running on the logs shared
> >> earlier. But I think you should try to get 14.2.18 working anyway).
> >>
> >> ..
dan > >> > >> [1] > >> > >> > git log v14.2.18...v14.2.16 ipaddr.cc commit > >> > 89321762ad4cfdd1a68cae467181bdd1a501f14d > >> Author: Thomas Goirand <z...@debian.org> > >> Date: Fri Jan 15 10:50:05 2021 +0100 > >> > >> common/ipaddr: Allow binding on lo > >> > >> Commmit 5cf0fa872231f4eaf8ce6565a04ed675ba5b689b, solves the issue that > >> the osd can't restart after seting a virtual local loopback IP. > >> However, > >> this commit also prevents a bgp-to-the-host over unumbered Ipv6 > >> local-link is setup, where OSD typically are bound to the lo interface. > >> > >> To solve this, this single char patch simply checks against "lo:" to > >> match only virtual interfaces instead of anything that starts with > >> "lo". > >> > >> Fixes: https://tracker.ceph.com/issues/48893 > >> Signed-off-by: Thomas Goirand <z...@debian.org> > >> (cherry picked from commit 201b59204374ebdab91bb554b986577a97b19c36) > >> > >> commit b52cae90d67eb878b3ddfe547b8bf16e0d4d1a45 > >> Author: lijaiwei1 <lijiaw...@chinatelecom.cn> > >> Date: Tue Dec 24 22:34:46 2019 +0800 > >> > >> common: skip interfaces starting with "lo" in find_ipv{4,6}_in_subnet() > >> > >> This will solve the issue that the osd can't restart after seting a > >> virtual local loopback IP. > >> In find_ipv4_in_subnet() and find_ipv6_in_subnet(), I use > >> boost::starts_with(addrs->ifa_name, "lo") to ship the interfaces > >> starting with "lo". > >> > >> Fixes: https://tracker.ceph.com/issues/43417 > >> Signed-off-by: Jiawei Li <lijiaw...@chinatelecom.cn> > >> (cherry picked from commit 5cf0fa872231f4eaf8ce6565a04ed675ba5b689b) > >> > >> > >> > >> > >> > >> On Mon, Mar 22, 2021, 7:42 PM Sam Skipsey <aoa...@gmail.com> wrote: > >>> > >>> I don't think we explicitly set any ms settings in the OSD host ceph.conf > >>> [all the OSDs ceph.confs are identical across the entire cluster]. > >>> > >>> ip a gives: > >>> > >>> ip a > >>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group > >>> default qlen 1000 > >>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > >>> inet 127.0.0.1/8 scope host lo > >>> valid_lft forever preferred_lft forever > >>> inet6 ::1/128 scope host > >>> valid_lft forever preferred_lft forever > >>> 2: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN > >>> group default qlen 1000 > >>> link/ether 4c:d9:8f:55:92:f6 brd ff:ff:ff:ff:ff:ff > >>> 3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN > >>> group default qlen 1000 > >>> link/ether 4c:d9:8f:55:92:f7 brd ff:ff:ff:ff:ff:ff > >>> 4: p2p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN > >>> group default qlen 1000 > >>> link/ether b4:96:91:3f:62:20 brd ff:ff:ff:ff:ff:ff > >>> 5: p2p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP > >>> group default qlen 1000 > >>> link/ether b4:96:91:3f:62:22 brd ff:ff:ff:ff:ff:ff > >>> inet 10.1.50.21/8 brd 10.255.255.255 scope global noprefixroute p2p2 > >>> valid_lft forever preferred_lft forever > >>> inet6 fe80::b696:91ff:fe3f:6222/64 scope link noprefixroute > >>> valid_lft forever preferred_lft forever > >>> > >>> (where here p2p2 is the only active network link, and is also the private > >>> and public network for the ceph cluster) > >>> > >>> The output is similar on other hosts - with p2p2 either at position 3 or > >>> 5 depending on the order the interfaces were enumerated. 
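As a concrete example of the workaround discussed above (making the address choice explicit rather than leaving it to interface enumeration), something along these lines in ceph.conf on the OSD hosts is the sort of thing to iterate on. The values here are illustrative, reusing the 10.1.50.21/8 address from the p2p2 output above, and the exact option names/behaviour are worth checking against the docs for your release:

    [global]
        # state the public/cluster networks explicitly ...
        public network = 10.0.0.0/8
        cluster network = 10.0.0.0/8

    [osd]
        # ... and/or, per host, pin the bind address outright
        public addr = 10.1.50.21
        cluster addr = 10.1.50.21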
> >>> > >>> Sam > >>> > >>> On Mon, 22 Mar 2021 at 17:34, Dan van der Ster <d...@vanderster.com> > >>> wrote: > >>>> > >>>> Which `ms` settings do you have in the OSD host's ceph.conf or the ceph > >>>> config dump? > >>>> > >>>> And how does `ip a` look on one of these hosts where the osd is > >>>> registering itself as 127.0.0.1? > >>>> > >>>> > >>>> You might as well set nodown again now. This will make ops pile up, but > >>>> that's the least of your concerns at the moment. > >>>> (With osds flapping the osdmaps churn and that inflates the mon store) > >>>> > >>>> .. Dan > >>>> > >>>> On Mon, Mar 22, 2021, 6:28 PM Sam Skipsey <aoa...@gmail.com> wrote: > >>>>> > >>>>> Hm, yes it does [and I was wondering why loopbacks were showing up > >>>>> suddenly in the logs]. This wasn't happening with 14.2.16 so what's > >>>>> changed about how we specify stuff? > >>>>> > >>>>> This might correlate with the other person on the IRC list who has > >>>>> problems with 14.2.18 and their OSDs deciding they don't work sometimes > >>>>> until they forcibly restart their network links... > >>>>> > >>>>> > >>>>> Sam > >>>>> > >>>>> On Mon, 22 Mar 2021 at 17:20, Dan van der Ster <d...@vanderster.com> > >>>>> wrote: > >>>>>> > >>>>>> What's with the OSDs having loopback addresses? E.g. > >>>>>> v2:127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667 > >>>>>> > >>>>>> Does `ceph osd dump` show those same loopback addresses for each OSD? > >>>>>> > >>>>>> This sounds familiar... I'm trying to find the recent ticket. > >>>>>> > >>>>>> .. dan > >>>>>> > >>>>>> > >>>>>> On Mon, Mar 22, 2021, 6:07 PM Sam Skipsey <aoa...@gmail.com> wrote: > >>>>>>> > >>>>>>> hi Dan: > >>>>>>> > >>>>>>> So, unsetting nodown results in... almost all of the OSDs being > >>>>>>> marked down. (231 down out of 328). > >>>>>>> Checking the actual OSD services, most of them were actually up and > >>>>>>> active on the nodes, even when the mons had marked them down. > >>>>>>> (On a few nodes, the down services corresponded to OSDs that had been > >>>>>>> flapping - but increasing osd_max_markdown locally to keep them up > >>>>>>> despite the previous flapping, and restarting the services... didn't > >>>>>>> help.) > >>>>>>> > >>>>>>> In fact, starting up the few OSD services which had actually stopped, > >>>>>>> resulted in a different set of OSDs being marked down, and some > >>>>>>> others coming up. > >>>>>>> We currently have a sort of "rolling OSD outness" passing through the > >>>>>>> cluster - there's always ~230 OSDs marked down now, but which ones > >>>>>>> those are changes (we've had everything from 1 HOST down to 4 HOSTS > >>>>>>> down over the past 14 minutes as things fluctuate. 
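(An aside for anyone reproducing the diagnosis: Dan's `ceph osd dump` question above can be checked quickly with a grep, for example

    ceph osd dump | grep 127.0.0.1

A healthy osd.N entry lists the host's real public/cluster addresses, so any osd lines that show up here have registered themselves on the loopback.)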
> >>>>>>> > >>>>>>> A log from one of the "down" OSDs [which is actually running, and on > >>>>>>> the same host as OSDs which are marked up] shows this worrying snippet > >>>>>>> > >>>>>>> 2021-03-22 17:01:45.298 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:45.298 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:46.340 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:46.340 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:47.376 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:47.376 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:48.395 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:48.395 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:49.407 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:49.407 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:50.400 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:50.400 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:50.922 7f6c9f088700 -1 --2- 10.1.50.21:0/23673 >> > >>>>>>> [v2:127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667] > >>>>>>> conn(0x56010903e400 0x56011a71fc00 unknown :-1 s=BANNER_CONNECTING > >>>>>>> pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer > >>>>>>> [v2:127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667] is using msgr > >>>>>>> V1 protocol > >>>>>>> 2021-03-22 17:01:50.922 7f6c9f889700 -1 --2- 10.1.50.21:0/23673 >> > >>>>>>> [v2:127.0.0.1:6821/13015214,v1:127.0.0.1:6831/13015214] > >>>>>>> conn(0x5600df434000 0x56011718e000 unknown :-1 s=BANNER_CONNECTING > >>>>>>> pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer > >>>>>>> [v2:127.0.0.1:6821/13015214,v1:127.0.0.1:6831/13015214] is using msgr > >>>>>>> V1 protocol > >>>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >> > >>>>>>> [v2:127.0.0.1:6826/11091658,v1:127.0.0.1:6828/11091658] > >>>>>>> conn(0x5600f85ed800 0x560109df2a00 unknown :-1 s=BANNER_CONNECTING > >>>>>>> pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer > >>>>>>> [v2:127.0.0.1:6826/11091658,v1:127.0.0.1:6828/11091658] is using msgr > >>>>>>> V1 protocol > >>>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >> > >>>>>>> [v2:127.0.0.1:6859/2683393,v1:127.0.0.1:6862/2683393] > >>>>>>> conn(0x5600f22ea000 0x560117182300 unknown :-1 s=BANNER_CONNECTING > >>>>>>> pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer > >>>>>>> [v2:127.0.0.1:6859/2683393,v1:127.0.0.1:6862/2683393] is using msgr > >>>>>>> V1 protocol > >>>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >> > >>>>>>> [v2:127.0.0.1:6901/15090566,v1:127.0.0.1:6907/15090566] > >>>>>>> conn(0x5600df435c00 0x560139370300 unknown :-1 s=BANNER_CONNECTING > >>>>>>> pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer > >>>>>>> [v2:127.0.0.1:6901/15090566,v1:127.0.0.1:6907/15090566] is using msgr > >>>>>>> V1 protocol > >>>>>>> 2021-03-22 17:01:51.377 
7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:51.377 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:52.370 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:52.370 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:53.377 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:53.377 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:54.385 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:54.385 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:55.385 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:55.385 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:56.362 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:56.362 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> 2021-03-22 17:01:57.324 7f6c9c883700 1 osd.127 253515 is_healthy > >>>>>>> false -- only 0/10 up peers (less than 33%) > >>>>>>> 2021-03-22 17:01:57.324 7f6c9c883700 1 osd.127 253515 not healthy; > >>>>>>> waiting to boot > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Any suggestions? > >>>>>>> > >>>>>>> Sam > >>>>>>> > >>>>>>> P.S. an example ceph status as it is now [with everything now on > >>>>>>> 14.2.18, since we had to restart osds anyway]: > >>>>>>> > >>>>>>> cluster: > >>>>>>> id: a1148af2-6eaf-4486-a27e-a05a78c2b378 > >>>>>>> health: HEALTH_WARN > >>>>>>> pauserd,pausewr,noout,nobackfill,norebalance flag(s) set > >>>>>>> 230 osds down > >>>>>>> 4 hosts (80 osds) down > >>>>>>> Reduced data availability: 2048 pgs inactive > >>>>>>> 8 slow ops, oldest one blocked for 901 sec, mon.cephs01 > >>>>>>> has slow ops > >>>>>>> > >>>>>>> services: > >>>>>>> mon: 3 daemons, quorum cephs01,cephs02,cephs03 (age 2h) > >>>>>>> mgr: cephs01(active, since 77m) > >>>>>>> osd: 329 osds: 98 up (since 4s), 328 in (since 4d) > >>>>>>> flags pauserd,pausewr,noout,nobackfill,norebalance > >>>>>>> > >>>>>>> data: > >>>>>>> pools: 3 pools, 2048 pgs > >>>>>>> objects: 0 objects, 0 B > >>>>>>> usage: 0 B used, 0 B / 0 B avail > >>>>>>> pgs: 100.000% pgs unknown > >>>>>>> 2048 unknown > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Mon, 22 Mar 2021 at 14:57, Dan van der Ster <d...@vanderster.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I would unset nodown (hiding osd failures) and norecover (blcoking > >>>>>>>> PGs > >>>>>>>> from recovering degraded objects), then start starting osds. > >>>>>>>> As soon as you have some osd logs reporting some failures, then > >>>>>>>> share those... > >>>>>>>> > >>>>>>>> - Dan > >>>>>>>> > >>>>>>>> On Mon, Mar 22, 2021 at 3:49 PM Sam Skipsey <aoa...@gmail.com> wrote: > >>>>>>>> > > >>>>>>>> > So, we started the mons and mgr up again, and here's the relevant > >>>>>>>> > logs, including also ceph versions. 
We've also turned off all of > >>>>>>>> > the firewalls on all of the nodes so we know that there can't be > >>>>>>>> > network issues [and, indeed, all of our management of the OSDs > >>>>>>>> > happens via logins from the service nodes or to each other] > >>>>>>>> > > >>>>>>>> > > ceph status > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > cluster: > >>>>>>>> > id: a1148af2-6eaf-4486-a27e-a05a78c2b378 > >>>>>>>> > health: HEALTH_WARN > >>>>>>>> > > >>>>>>>> > pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover > >>>>>>>> > flag(s) set > >>>>>>>> > 1 nearfull osd(s) > >>>>>>>> > 3 pool(s) nearfull > >>>>>>>> > Reduced data availability: 2048 pgs inactive > >>>>>>>> > mons cephs01,cephs02,cephs03 are using a lot of disk > >>>>>>>> > space > >>>>>>>> > > >>>>>>>> > services: > >>>>>>>> > mon: 3 daemons, quorum cephs01,cephs02,cephs03 (age 61s) > >>>>>>>> > mgr: cephs01(active, since 76s) > >>>>>>>> > osd: 329 osds: 329 up (since 63s), 328 in (since 4d); 466 > >>>>>>>> > remapped pgs > >>>>>>>> > flags > >>>>>>>> > pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover > >>>>>>>> > > >>>>>>>> > data: > >>>>>>>> > pools: 3 pools, 2048 pgs > >>>>>>>> > objects: 0 objects, 0 B > >>>>>>>> > usage: 0 B used, 0 B / 0 B avail > >>>>>>>> > pgs: 100.000% pgs unknown > >>>>>>>> > 2048 unknown > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > ceph health detail > >>>>>>>> > > >>>>>>>> > HEALTH_WARN > >>>>>>>> > pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover > >>>>>>>> > flag(s) set; 1 nearfull osd(s); 3 pool(s) nearfull; Reduced data > >>>>>>>> > availability: 2048 pgs inactive; mons cephs01,cephs02,cephs03 are > >>>>>>>> > using a lot of disk space > >>>>>>>> > OSDMAP_FLAGS > >>>>>>>> > pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover > >>>>>>>> > flag(s) set > >>>>>>>> > OSD_NEARFULL 1 nearfull osd(s) > >>>>>>>> > osd.63 is near full > >>>>>>>> > POOL_NEARFULL 3 pool(s) nearfull > >>>>>>>> > pool 'dteam' is nearfull > >>>>>>>> > pool 'atlas' is nearfull > >>>>>>>> > pool 'atlas-localgroup' is nearfull > >>>>>>>> > PG_AVAILABILITY Reduced data availability: 2048 pgs inactive > >>>>>>>> > pg 13.1ef is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f0 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f1 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f2 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f3 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f4 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f5 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f6 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f7 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f8 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1f9 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1fa is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1fb is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1fc is stuck 
inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1fd is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1fe is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 13.1ff is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1ec is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f0 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f1 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f2 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f3 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f4 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f5 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f6 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f7 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f8 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1f9 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1fa is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1fb is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1fc is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1fd is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1fe is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 14.1ff is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1ed is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f0 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f1 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f2 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f3 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f4 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f5 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f6 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f7 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f8 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1f9 is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1fa is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1fb is stuck inactive for 89.322981, 
current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1fc is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1fd is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1fe is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > pg 15.1ff is stuck inactive for 89.322981, current state > >>>>>>>> > unknown, last acting [] > >>>>>>>> > MON_DISK_BIG mons cephs01,cephs02,cephs03 are using a lot of disk > >>>>>>>> > space > >>>>>>>> > mon.cephs01 is 96 GiB >= mon_data_size_warn (15 GiB) > >>>>>>>> > mon.cephs02 is 96 GiB >= mon_data_size_warn (15 GiB) > >>>>>>>> > mon.cephs03 is 96 GiB >= mon_data_size_warn (15 GiB) > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > ceph versions > >>>>>>>> > > >>>>>>>> > { > >>>>>>>> > "mon": { > >>>>>>>> > "ceph version 14.2.18 > >>>>>>>> > (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3 > >>>>>>>> > }, > >>>>>>>> > "mgr": { > >>>>>>>> > "ceph version 14.2.18 > >>>>>>>> > (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1 > >>>>>>>> > }, > >>>>>>>> > "osd": { > >>>>>>>> > "ceph version 14.2.10 > >>>>>>>> > (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)": 1, > >>>>>>>> > "ceph version 14.2.15 > >>>>>>>> > (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 188, > >>>>>>>> > "ceph version 14.2.16 > >>>>>>>> > (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)": 18, > >>>>>>>> > "ceph version 14.2.18 > >>>>>>>> > (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 122 > >>>>>>>> > }, > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > >>>>>> > >>>>>>>> > > >>>>>>>> > As a note, the log where the mgr explodes (which precipitated all > >>>>>>>> > of this) definitely shows the problem occurring on the 12th [when > >>>>>>>> > 14.2.17 dropped], but things didn't "break" until we tried > >>>>>>>> > upgrading OSDs to 14.2.18... > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > Sam > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > On Mon, 22 Mar 2021 at 12:20, Sam Skipsey <aoa...@gmail.com> wrote: > >>>>>>>> >> > >>>>>>>> >> Hi Dan: > >>>>>>>> >> > >>>>>>>> >> Thanks for the reply - at present, our mons and mgrs are off > >>>>>>>> >> [because of the unsustainable nature of the filesystem usage]. > >>>>>>>> >> We'll try putting them on again for long enough to get "ceph > >>>>>>>> >> status" out of them, but because the mgr was unable to actually > >>>>>>>> >> talk to anything, and reply at that point. > >>>>>>>> >> > >>>>>>>> >> (And thanks for the link to the bug tracker - I guess this > >>>>>>>> >> mismatch of expectations is why the devs are so keen to move to > >>>>>>>> >> containerised deployments where there is no co-location of > >>>>>>>> >> different types of server, as it means they don't need to worry > >>>>>>>> >> as much about the assumptions about when it's okay to restart a > >>>>>>>> >> service on package update. Disappointing that it seems stale > >>>>>>>> >> after 2 years...) 
> >>>>>>>> >> > >>>>>>>> >> Sam > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> On Mon, 22 Mar 2021 at 12:11, Dan van der Ster > >>>>>>>> >> <d...@vanderster.com> wrote: > >>>>>>>> >>> > >>>>>>>> >>> Hi Sam, > >>>>>>>> >>> > >>>>>>>> >>> The daemons restart (for *some* releases) because of this: > >>>>>>>> >>> https://tracker.ceph.com/issues/21672 > >>>>>>>> >>> In short, if the selinux module changes, and if you have selinux > >>>>>>>> >>> enabled, then midway through yum update, there will be a > >>>>>>>> >>> systemctl > >>>>>>>> >>> restart ceph.target issued. > >>>>>>>> >>> > >>>>>>>> >>> For the rest -- I think you should focus on getting the PGs all > >>>>>>>> >>> active+clean as soon as possible, because the degraded and > >>>>>>>> >>> remapped > >>>>>>>> >>> states are what leads to mon / osdmap growth. > >>>>>>>> >>> This kind of scenario is why we wrote this tool: > >>>>>>>> >>> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py > >>>>>>>> >>> It will use pg-upmap-items to force the PGs to the OSDs where > >>>>>>>> >>> they are > >>>>>>>> >>> currently residing. > >>>>>>>> >>> > >>>>>>>> >>> But there is some clarification needed before you go ahead with > >>>>>>>> >>> that. > >>>>>>>> >>> Could you share `ceph status`, `ceph health detail`? > >>>>>>>> >>> > >>>>>>>> >>> Cheers, Dan > >>>>>>>> >>> > >>>>>>>> >>> > >>>>>>>> >>> On Mon, Mar 22, 2021 at 12:05 PM Sam Skipsey <aoa...@gmail.com> > >>>>>>>> >>> wrote: > >>>>>>>> >>> > > >>>>>>>> >>> > Hi everyone: > >>>>>>>> >>> > > >>>>>>>> >>> > I posted to the list on Friday morning (UK time), but > >>>>>>>> >>> > apparently my email > >>>>>>>> >>> > is still in moderation (I have an email from the list bot > >>>>>>>> >>> > telling me that > >>>>>>>> >>> > it's held for moderation but no updates). > >>>>>>>> >>> > > >>>>>>>> >>> > Since this is a bit urgent - we have ~3PB of storage offline - > >>>>>>>> >>> > I'm posting > >>>>>>>> >>> > again. > >>>>>>>> >>> > > >>>>>>>> >>> > To save retyping the whole thing, I will direct you to a copy > >>>>>>>> >>> > of the email > >>>>>>>> >>> > I wrote on Friday: > >>>>>>>> >>> > > >>>>>>>> >>> > http://aoanla.pythonanywhere.com/Logs/EmailToCephUsers.txt > >>>>>>>> >>> > > >>>>>>>> >>> > (Since that was sent, we did successfully add big SSDs to the > >>>>>>>> >>> > MON hosts so > >>>>>>>> >>> > they don't fill up their disks with store.db s). > >>>>>>>> >>> > > >>>>>>>> >>> > I would appreciate any advice - assuming this also doesn't get > >>>>>>>> >>> > stuck in > >>>>>>>> >>> > moderation queues. 
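(Another aside on the restart behaviour described above: on an RPM-based install you can inspect for yourself whether the installed packages carry that kind of scriptlet, and whether SELinux is actually enforcing, with for example

    rpm -q --scripts ceph-selinux
    getenforce

That only shows what a package update could trigger; whether the restart actually fires still depends on the selinux module changing between versions, as in the tracker above.)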
> >>>>>>>> >>> > > >>>>>>>> >>> > -- > >>>>>>>> >>> > Sam Skipsey (he/him, they/them) > >>>>>>>> >>> > _______________________________________________ > >>>>>>>> >>> > ceph-users mailing list -- ceph-users@ceph.io > >>>>>>>> >>> > To unsubscribe send an email to ceph-users-le...@ceph.io > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> -- > >>>>>>>> >> Sam Skipsey (he/him, they/them) > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > -- > >>>>>>>> > Sam Skipsey (he/him, they/them) > >>>>>>>> > > >>>>>>>> > > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Sam Skipsey (he/him, they/them) > >>>>>>> > >>>>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Sam Skipsey (he/him, they/them) > >>>>> > >>>>> > >>> > >>> > >>> -- > >>> Sam Skipsey (he/him, they/them) > >>> > >>> > > > > > > -- > > Sam Skipsey (he/him, they/them) > > > > _______________________________________________ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io