ok, now I understand, thanks for all these helpful answers!

On Sat, Apr 7, 2018, 15:26 David Turner <drakonst...@gmail.com> wrote:
> I'm seconding what Greg is saying. There is no reason to set nobackfill
> and norecover just for restarting OSDs. That will only cause the problems
> you're seeing without giving you any benefit. There are reasons to use
> norecover and nobackfill, but unless you're manually editing the crush map,
> having OSDs consistently segfault, or for some other reason really just
> need to stop the IO from recovery, then they aren't the flags for you. Even
> then, nobackfill is most likely what you need and norecover is still
> probably not helpful.
>
> On Wed, Apr 4, 2018, 6:59 PM Gregory Farnum <gfar...@redhat.com> wrote:
>
>> On Thu, Mar 29, 2018 at 3:17 PM Damian Dabrowski <scoot...@gmail.com> wrote:
>>
>>> Greg, thanks for your reply!
>>>
>>> I think your idea makes sense. I've done some tests and it's quite hard
>>> for me to understand, so I'll try to explain my situation in a few
>>> steps below.
>>> I think Ceph shows progress in recovery, but it can only handle objects
>>> which haven't really changed. It won't try to repair objects which are
>>> really degraded, because of the norecover flag. Am I right?
>>> After a while I see blocked requests (as you can see below).
>>>
>>
>> Yeah, so the implementation of this is a bit funky. Basically, when the
>> OSD gets a map specifying norecovery, it will prevent any new recovery
>> ops from starting once it processes that map. But it doesn't change the
>> state of the PGs out of recovery; they just won't queue up more work.
>>
>> So probably the existing recovery IO was from OSDs that weren't
>> up-to-date yet. Or maybe there's a bug in the norecover implementation;
>> it definitely looks a bit fragile.
>>
>> But really I just wouldn't use that command. It's an expert flag that
>> you shouldn't use except in some extreme wonky cluster situations (and
>> even those may no longer exist in modern Ceph). For the use case you
>> shared in your first email, I'd just stick with noout.
>> -Greg
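
For reference, the noout-only approach Greg recommends comes down to a
handful of commands; a minimal sketch, assuming a systemd-managed node and
osd.12 as an example OSD id (both assumptions, adjust to your environment):

    # prevent the stopped OSD from being marked out and triggering rebalancing
    ceph osd set noout

    # stop the OSD, do the maintenance, then bring it back
    systemctl stop ceph-osd@12
    systemctl start ceph-osd@12

    # wait until all PGs report active+clean again
    ceph -s

    # re-enable normal out-marking
    ceph osd unset noout
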
>>
>>
>>> ----- FEW SEC AFTER OSD START -----
>>> # ceph status
>>>     cluster 848b340a-be27-45cb-ab66-3151d877a5a0
>>>      health HEALTH_WARN
>>>             140 pgs degraded
>>>             1 pgs recovering
>>>             92 pgs recovery_wait
>>>             140 pgs stuck unclean
>>>             recovery 942/5772119 objects degraded (0.016%)
>>>             noout,nobackfill,norecover flag(s) set
>>>      monmap e10: 3 mons at
>>>             {node-19=172.31.0.2:6789/0,node-20=172.31.0.8:6789/0,node-21=172.31.0.6:6789/0}
>>>             election epoch 724, quorum 0,1,2 node-19,node-21,node-20
>>>      osdmap e18727: 36 osds: 36 up, 30 in
>>>             flags noout,nobackfill,norecover
>>>       pgmap v20851644: 1472 pgs, 7 pools, 8510 GB data, 1880 kobjects
>>>             25204 GB used, 17124 GB / 42329 GB avail
>>>             942/5772119 objects degraded (0.016%)
>>>                 1332 active+clean
>>>                   92 active+recovery_wait+degraded
>>>                   47 active+degraded
>>>                    1 active+recovering+degraded
>>>   recovery io 31608 kB/s, 4 objects/s
>>>     client io 73399 kB/s rd, 80233 kB/s wr, 1218 op/s
>>>
>>> ----- 1 MIN AFTER OSD START, RECOVERY STUCK, BLOCKED REQUESTS -----
>>> # ceph status
>>>     cluster 848b340a-be27-45cb-ab66-3151d877a5a0
>>>      health HEALTH_WARN
>>>             140 pgs degraded
>>>             1 pgs recovering
>>>             109 pgs recovery_wait
>>>             140 pgs stuck unclean
>>>             80 requests are blocked > 32 sec
>>>             recovery 847/5775929 objects degraded (0.015%)
>>>             noout,nobackfill,norecover flag(s) set
>>>      monmap e10: 3 mons at
>>>             {node-19=172.31.0.2:6789/0,node-20=172.31.0.8:6789/0,node-21=172.31.0.6:6789/0}
>>>             election epoch 724, quorum 0,1,2 node-19,node-21,node-20
>>>      osdmap e18727: 36 osds: 36 up, 30 in
>>>             flags noout,nobackfill,norecover
>>>       pgmap v20851812: 1472 pgs, 7 pools, 8520 GB data, 1881 kobjects
>>>             25234 GB used, 17094 GB / 42329 GB avail
>>>             847/5775929 objects degraded (0.015%)
>>>                 1332 active+clean
>>>                  109 active+recovery_wait+degraded
>>>                   30 active+degraded            <---- degraded objects count got stuck
>>>                    1 active+recovering+degraded
>>>   recovery io 3743 kB/s, 0 objects/s            <---- depending on when the command is run,
>>>                                                       this line shows 0 objects/s or is absent
>>>     client io 26521 kB/s rd, 64211 kB/s wr, 1212 op/s
>>>
>>> ----- FEW SECONDS AFTER UNSETTING FLAGS NOOUT, NORECOVER, NOBACKFILL -----
>>> # ceph status
>>>     cluster 848b340a-be27-45cb-ab66-3151d877a5a0
>>>      health HEALTH_WARN
>>>             134 pgs degraded
>>>             134 pgs recovery_wait
>>>             134 pgs stuck degraded
>>>             134 pgs stuck unclean
>>>             recovery 591/5778179 objects degraded (0.010%)
>>>      monmap e10: 3 mons at
>>>             {node-19=172.31.0.2:6789/0,node-20=172.31.0.8:6789/0,node-21=172.31.0.6:6789/0}
>>>             election epoch 724, quorum 0,1,2 node-19,node-21,node-20
>>>      osdmap e18730: 36 osds: 36 up, 30 in
>>>       pgmap v20851909: 1472 pgs, 7 pools, 8526 GB data, 1881 kobjects
>>>             25252 GB used, 17076 GB / 42329 GB avail
>>>             591/5778179 objects degraded (0.010%)
>>>                 1338 active+clean
>>>                  134 active+recovery_wait+degraded
>>>   recovery io 191 MB/s, 26 objects/s
>>>     client io 100654 kB/s rd, 184 MB/s wr, 6303 op/s
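
The third snapshot corresponds to clearing the flags and letting recovery
drain on its own; roughly, and assuming you want to follow progress from the
same admin node (the 5-second interval is just an example):

    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout

    # the degraded / recovery_wait counts should now shrink steadily
    watch -n 5 'ceph -s'
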
>>>
>>>
>>> 2018-03-29 18:22 GMT+02:00 Gregory Farnum <gfar...@redhat.com>:
>>> >
>>> > On Thu, Mar 29, 2018 at 7:27 AM Damian Dabrowski <scoot...@gmail.com> wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> A few days ago I had a very strange situation.
>>> >>
>>> >> I had to turn off a few OSDs for a while. So I set the flags noout,
>>> >> nobackfill, norecover and then turned off the selected OSDs.
>>> >> All was OK, but when I started these OSDs again, all VMs went down due
>>> >> to the recovery process (even though recovery priority was very low).
>>> >
>>> >
>>> > So you forbade the OSDs from doing any recovery work, but then you
>>> > turned on old ones that required recovery work to function properly?
>>> >
>>> > And your cluster stopped functioning?
>>> >
>>> >
>>> >>
>>> >> Here are the more important config values:
>>> >> "osd_recovery_threads": "1",
>>> >> "osd_recovery_thread_timeout": "30",
>>> >> "osd_recovery_thread_suicide_timeout": "300",
>>> >> "osd_recovery_delay_start": "0",
>>> >> "osd_recovery_max_active": "1",
>>> >> "osd_recovery_max_single_start": "5",
>>> >> "osd_recovery_max_chunk": "8388608",
>>> >> "osd_client_op_priority": "63",
>>> >> "osd_recovery_op_priority": "1",
>>> >> "osd_recovery_op_warn_multiple": "16",
>>> >> "osd_backfill_full_ratio": "0.85",
>>> >> "osd_backfill_retry_interval": "10",
>>> >> "osd_backfill_scan_min": "64",
>>> >> "osd_backfill_scan_max": "512",
>>> >> "osd_kill_backfill_at": "0",
>>> >> "osd_max_backfills": "1",
>>> >>
>>> >>
>>> >> I don't know why Ceph started the recovery process when the
>>> >> norecover & nobackfill flags were enabled, but the fact is that it
>>> >> killed all VMs.
>>> >
>>> >
>>> > Did it actually start recovering? Or did you just see client IO pause?
>>> > I confess I don't know what the behavior will be like with that
>>> > combined set of flags, but I rather suspect it did what you told it
>>> > to, and some PGs went down as a result.
>>> > -Greg
>>> >
>>> >
>>> >>
>>> >> Next, I turned off the noout, nobackfill, norecover flags and it
>>> >> started to look better. VMs went back online and the recovery process
>>> >> was still going. I didn't see a performance impact on the SSD disks,
>>> >> but there was a huge impact on the spinners.
>>> >> Normally %util is about 25%, but during recovery it was nearly 100%.
>>> >> CPU load increased on HDD-based VMs by ~400%.
>>> >>
>>> >> iostat fragment (during recovery):
>>> >> Device: rrqm/s wrqm/s    r/s   w/s    rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>>> >> sdh       0.30   1.00 150.90 36.00 13665.60 954.60   156.45    10.63 56.88   25.60  188.02  5.34 99.80
>>> >>
>>> >>
>>> >> Now I'm a little lost, and I don't know the answers to a few questions:
>>> >> 1. Why did Ceph start recovery even though the nobackfill & norecover
>>> >> flags were enabled?
>>> >> 2. Why did recovery cause a much bigger performance impact while the
>>> >> norecover & nobackfill flags were enabled?
>>> >> 3. Why, once norecover & nobackfill were turned off, did the cluster
>>> >> start to look better, yet %util on the HDD disks became so high (while
>>> >> recovery_op_priority=1 and client_op_priority=63)? 25% is normal, but
>>> >> it increased to 100% during recovery.
>>> >>
>>> >>
>>> >> Cluster information:
>>> >> ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>>> >> 3x nodes (CPU E5-2630, 32GB RAM, 6x 2TB HDD with SSD journal, 3x 1TB SSD
>>> >> with NVMe journal), triple replication
>>> >>
>>> >>
>>> >> I would be very grateful if somebody could help me.
>>> >> Sorry if I've done something the wrong way - this is my first time
>>> >> writing on a mailing list.
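
As a side note on question 3: the throttling options quoted in the config
dump above can be checked and adjusted at runtime. A rough sketch, assuming
you run the daemon-socket query on the OSD's host and that osd.7 and the
values shown are only examples, not a guaranteed fix for the %util spike:

    # confirm what an individual OSD is actually running with
    ceph daemon osd.7 config show | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority'

    # push a change to all OSDs without restarting them
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
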
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com