On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:
> Mateusz Guzik <mjgu...@gmail.com> writes:
> > Going through the list may or may not reveal other threads doing
> > something in the area, and it may very well be that they are
> > deadlocked, which then results in other processes hanging on them.
> > Just like in your case, the process reported as hung is a random
> > victim, and the real culprit, whatever it is, lies deeper.
> We already know the real culprit, see upthread.

Dag, I looked through the thread once more, and while I thank you for
tracing it, you never went beyond txg_wait_synced() in the `zfs revert`
thread. If you are saying that thread is holding the lock, then the
question is why the transaction commit is stuck. I need to see the
stacks of the ZFS sync threads, or better, all kernel stacks, just in
case. Without that information I can only speculate.
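
For reference, here is a minimal sketch of one way to collect that
information, assuming FreeBSD's procstat(1) is available and the script
is run as root. The output file name and the filter strings ("txg",
"zil") are illustrative assumptions, not something established in this
thread.

```python
import subprocess


def filter_stacks(procstat_output, needles=("txg", "zil")):
    """Return the lines of procstat output that mention any of the needles.

    The default needles are an assumption: they are meant to catch the
    ZFS sync and ZIL related threads, and may need adjusting.
    """
    return [
        line
        for line in procstat_output.splitlines()
        if any(needle in line for needle in needles)
    ]


def collect_kernel_stacks():
    """Run `procstat -kk -a` and return its output as text.

    -kk prints kernel stacks with function offsets, -a covers all
    processes. Usually needs root to see every thread.
    """
    result = subprocess.run(
        ["procstat", "-kk", "-a"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main(path="kernel-stacks.txt"):
    """Save all kernel stacks to `path` (hypothetical name) and print
    the subset that matches the filter."""
    output = collect_kernel_stacks()
    with open(path, "w") as f:
        f.write(output)
    for line in filter_stacks(output):
        print(line)
```

Calling main() on the hung system saves the full dump, which is the
part worth attaching to a reply; the filtered lines are only a quick
first look.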

Trying to run your test (so far without reproducing the problem), I see
it producing a substantial amount of ZIL writes. The range of commits
you have narrowed the scope to so far includes my ZIL locking
refactoring, where I know for sure there are some deadlocks. I have
been waiting for 3 weeks now for reviews and tests of the PR that
should fix them: https://github.com/openzfs/zfs/pull/15122 . It would
be good if you could test it, though it seems to depend on a few
earlier patches not yet merged to FreeBSD.
--
Alexander Motin