hi - thank you for closing Bug#1027846, my turn to apologize for the delay. february was busy...
On Tue, 2023-02-07 at 18:13 -0500, Nicholas D Steeves wrote:
> [...]
> > regarding balance recommendations, my initial thought would be to
> > make the language in the README.Debian stronger.
>
> Would this be enough to guard naive users from potential data loss?
> You know, the people who read docs as an afterthought, or the "it
> will be fine; it won't happen to me" crowd...  At the same time, I
> worry that there might be a social cost to using stronger language
> (it could demotivate developers or scare people off of trying
> btrfs).  What do you think?

one question would be: should the btrfsmaintenance package provide a
convenient way to use an (unnecessary?) feature that might cause data
loss?

are there cases in which it is recommended for the average (or even
somewhat adventurous) user to regularly schedule a btrfs balance? if
so, maybe spell out those cases explicitly in the README? from this
conversation it does not sound like something that i would regard as
"maintenance" in the vein of max-mount-counts/interval-between-checks
driven fsck's in the ext{2,3,4} world or periodic btrfs scrubs. otoh,
see below...

> I agree, the docs should be updated; however, I'll also need to be
> prepared to provide citations and reasoning.  To be honest, I'm
> trusting a recurring upstream statements on the question of metadata
> rebalancing.

from a user's perspective it's really hard to judge. for instance,
https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Balancing
states: "It is _quite_ [their emphasis] useful to balance periodically
any Btrfs volume subject to updates."

> > i had read that "Some advocate not running it at all" but to me
> > that implies "may be unnecessary" rather than "may be actively
> > harmful." on the basis of the latter i am certainly considering
> > setting the balance.timer back to disabled.
>
> Btw, I'm curious to learn why you've enabled balancing!

been a long time, but i _think_ that at one point (related to a disk
failure/replacement in raid1 that may not have gone 100% smoothly?) i
ended up in an unbalanced situation where i couldn't write even though
there was apparently space available. iirc `balance` was needed to get
me out of that corner.

and then i found Marc MERLIN's `btrfs-scrub` script which included
`btrfs balance` commands and he seemed like a knowledgeable source.
that in combination w/the btrfs wiki led me to believe it was a safe
operation. e.g., in addition to the above,
https://btrfs.wiki.kernel.org/index.php/Status lists it as "stable"
with only a note about performance.

> The "may be actively harmful" bit is tricky, because Btrfs gets way
> too complicated way too fast...  My hope to put in place safe
> defaults that work for most people, that don't bait users/sysadmins
> into taking on risk, and that are good enough for the general case.
> Of course you know that a solid enough replication and backup
> strategy makes it ok to take risks for optimisation, but that's the
> "educated/experienced sysadmin" class of cases ;)  If rebalancing
> does something like keeping database performance from degrading,
> then it would be worth documenting this somewhere, along with the
> fact that several core devs upstream have written statements to the
> effect of this: never balance metadata, unless necessary.
> Rebalancing to a new profile counts as "necessary", obviously, and
> the only other corner case I'm aware is noted two paragraphs below.

yes, it sounds like one viable approach may be to explicitly document
cases where balance is recommended.

> > alternatively/additionally are there default options that might
> > make it safe(r)?
>
> I believe that [metadata] balancing should probably be disabled in
> upstream btrfsmaintenance.
> If I remember correctly, it's enabled by default because upstream
> targets 10year LTS releases such as SUSE's 11 series, which uses
> linux 3.0.76, where the harm reduction of metadata balancing is
> significant enough to make the risk worthwhile on server-grade
> hardware.

based on what has come up in this thread, my feedback would be that
the debian pkg should diverge from upstream wrt balancing.

> > e.g., Marc MERLIN's `btrfs-scrub`[1] (which i used previously)
> > suggested that "a null [metadata] rebalance should help corner
> > cases."
>
> Has that corner case existed since linux-3.18?
>
>   https://btrfs.wiki.kernel.org/index.php/Balance_Filters
>
> Or are you referring to cleaning up inefficient use of metadata
> after a batch deletion of thousands of snapshots or subvolumes (all
> at once)?

sorry, i don't know any details beyond the comments in the script.

> > from my pov i'd still like to see the values harmonized. i
> > originally noticed the inconsistency because what i believed was
> > the default setting was creating a seemingly unnecessary systemd
> > override file.
>
> Sorry, what is this "unnecessary systemd override file"?

when i enabled the balance component i initially left
`BTRFS_BALANCE_PERIOD="weekly"` which is noted as the default in
`/etc/default/btrfsmaintenance`. but then it created a systemd
override file with `OnCalendar=weekly` in
`/etc/systemd/system/btrfs-balance.timer.d/schedule.conf`. this
surprised me because i wouldn't expect this to be necessary when using
the default. then i checked `/lib/systemd/system/btrfs-balance.timer`
and found that the _actual_ default is `OnCalendar=monthly` and, well,
here we are :)

[...]

> > > Thus, if I do anything, I'm inclined to set the period for
> > > balance to "none" everywhere.
> >
> > given that one already needs to manually enable the service via
> > `systemctl enable btrfs-balance.timer` i don't think it's
> > necessary to set the default value to "none."
> > this would result in the (imho) counter-intuitive behavior of
> > enabling something only to have it do nothing. although an add'l
> > comment re: the potential for harm in `/etc/default/btrfs` that
> > one would hopefully see when changing the value from "none" to
> > e.g., "monthly" may be the best way to ensure that they are an
> > informed user. so i could go either way on that.
>
> Good point, and yes, I agree that two knobs seems silly; however,
> there is already precedent for this with
> '"BTRFS_TRIM_PERIOD="none"'.  As there doesn't seem to be any
> interest in #887461, I'm wondering if might be time to follow the
> consensus of the General Resolution in favour of systemd, and soon
> start shipping systemd timers in an enabled state, and disable
> everything in the config file, but provide suggested values...

sure, that would eliminate the "two knobs" and makes sense from my
pov.

> > > Also, what do you think about enabling the systemd patch
> > > watcher, so that the timers are updated automatically when
> > > /etc/default/btrfsmaintenance is modified?
> >
> > as a sysadmin the steps of modifying a file and then running a
> > command for it to take effect is a normal part of my workflow. so
> > i'm fine leaving it disabled by default. other users might feel
> > differently.
>
> I guess keeping it the way it is right now can act as a kind of
> safety gate, so if I install the timers in an enabled state, but
> disabled, then only users who read the docs will be able to update
> those times.  Maybe that, plus some "NOT RECOMMENDED" comments in
> the config file would be a nice compromise?

also makes sense to me, with a pointer to the README for add'l
information.

best,
andy

-- 
andy <and...@diatribes.org>
diatribes
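
p.s. in case it helps anyone reproduce the weekly/monthly mismatch
described above: `systemctl cat btrfs-balance.timer` prints the
shipped unit followed by any drop-ins. the python below is only a
rough sketch of how the effective `OnCalendar=` ends up being
resolved. the file contents are stand-ins based on the values
mentioned in this mail, and the empty `OnCalendar=` line in the
drop-in is my assumption about what the generated `schedule.conf`
looks like; `effective_on_calendar` is a made-up helper, not part of
btrfsmaintenance or systemd.

```python
def effective_on_calendar(*unit_texts):
    """Apply the unit file and its drop-ins in order; return the OnCalendar list."""
    values = []
    for text in unit_texts:
        section = None
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # skip blank lines and comments
            if line.startswith("[") and line.endswith("]"):
                section = line[1:-1]
                continue
            if section != "Timer" or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() != "OnCalendar":
                continue
            value = value.strip()
            if value:
                values.append(value)  # repeated assignments accumulate
            else:
                values = []  # an empty assignment resets the list
    return values


# stand-in for /lib/systemd/system/btrfs-balance.timer (only the relevant part)
SHIPPED = "[Timer]\nOnCalendar=monthly\n"

# assumed content of btrfs-balance.timer.d/schedule.conf
DROPIN = "[Timer]\nOnCalendar=\nOnCalendar=weekly\n"

print(effective_on_calendar(SHIPPED))
print(effective_on_calendar(SHIPPED, DROPIN))
```

the point being that the override file is only redundant if the
packaged timer and the documented default agree, which they currently
don't.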