On 10/17/2016 18:32, Steven Hartland wrote:
>
> On 17/10/2016 22:50, Karl Denninger wrote:
>> I will make some effort on the sandbox machine to see if I can come up
>> with a way to replicate this.  I do have plenty of spare larger drives
>> laying around that used to be in service and were obsolesced due to
>> capacity -- but what I don't know is whether the system will misbehave
>> if the source is all spinning rust.
>>
>> In other words:
>>
>> 1. Root filesystem is mirrored spinning rust (production is mirrored
>> SSDs)
>>
>> 2. Backup is mirrored spinning rust (of approximately the same size)
>>
>> 3. Set up auto-snapshot exactly as the production system has now (which
>> the sandbox is NOT, since I don't care about incremental recovery on
>> that machine; it's a sandbox!)
>>
>> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-builds for
>> the Pi2s I have here, etc.) to generate a LOT of filesystem entropy
>> across lots of snapshots.
>>
>> 5. Back that up.
>>
>> 6. Export the backup pool.
>>
>> 7. Re-import it and "zfs destroy -r" the backup filesystem.
>>
>> That is what got me in a reboot loop after the *first* panic; I was
>> simply going to destroy the backup filesystem and re-run the backup, but
>> as soon as I issued that zfs destroy the machine panic'd, and as soon as
>> I re-attached it after a reboot it panic'd again.  Repeat until I set
>> trim=0.
>>
>> But... if I CAN replicate it, that still shouldn't be happening, and the
>> system should *certainly* survive attempting to TRIM on a vdev that
>> doesn't support TRIM, even if the removal covers a large amount of
>> space and/or files on the target, without blowing up.
>>
>> BTW I bet it isn't that rare -- if you're taking timed snapshots on an
>> active filesystem (with lots of entropy) and then make the mistake of
>> trying to remove those snapshots (as is the case with a zfs destroy -r,
>> or a zfs recv of an incremental copy that attempts to sync against a
>> source), you can hit this on any pool that has been imported before the
>> system realizes that TRIM is unavailable on those vdevs.
>>
>> Noting this:
>>
>>> Yes, need to find some time to have a look at it, but given how rare
>>> this is, and with TRIM being re-implemented upstream in a totally
>>> different manner, I'm reticent to spend any real time on it.
>>
>> What's in-process in this regard, if you happen to have a reference?
>
> Looks like it may be still in review: https://reviews.csiden.org/r/263/
>

Initial attempts to provoke the panic have failed on the sandbox machine --
it appears that I need a materially fragmented backup volume (which makes
sense, as that would greatly increase the number of TRIMs queued).
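For anyone else who wants to try, here is a rough sketch of the command
sequence behind steps 1-7 above.  Pool names, device names and snapshot
names are placeholders, not the actual sandbox configuration:

  # backup pool on spare spinning rust (placeholder device names)
  zpool create backup mirror /dev/da2 /dev/da3

  # generate filesystem churn with snapshots taken in between, e.g.:
  zfs snapshot -r zroot@pre-build
  make -C /usr/src -j8 buildworld
  zfs snapshot -r zroot@post-build

  # replicate the whole tree, snapshots included, to the backup pool
  zfs send -R zroot@post-build | zfs receive -Fdu backup

  # export, re-import, then bulk-remove the received filesystems --
  # the zfs destroy is the step that panic'd here
  zpool export backup
  zpool import backup
  zfs destroy -r backup/zroot

The zfs destroy -r of the received tree is what queues the flood of TRIMs
against the spinning-rust vdevs.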
Running a bunch of builds with snapshots taken in between generates a metric
ton of entropy in the filesystem, but it appears that the number of TRIMs
actually issued when you bulk-remove them (with zfs destroy -r) is small
enough not to cause it -- probably because the system issues one TRIM per
contiguous area of freed disk, and since there is no interleaving with other
(non-removed) data, that number stays "reasonable"; there is little
fragmentation of the freed space.  The TRIMs *are* attempted, and they *do*
fail, however.....

I'm running with the 6 pages of kstack now on the production machine, and
we'll see if I get another panic...
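For reference, the knobs and counters involved look roughly like this on my
boxes (sysctl/tunable names as I understand them on stable/10 -- please
double-check on your own system):

  # watch the TRIM counters; "failed"/"unsupported" climb when the
  # backup pool is spinning rust
  sysctl kstat.zfs.misc.zio_trim

  # the workaround that stopped the reboot loop: disable TRIM at boot
  # (in /boot/loader.conf)
  vfs.zfs.trim.enabled=0

  # the larger kernel stack I'm now running with (in /boot/loader.conf)
  kern.kstack_pages=6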
--
Karl Denninger
k...@denninger.net <mailto:k...@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/