it definitely was not me. My bet would be on rsc, geoff, richard, forsyth, quanstrom or djc.

On Tue, Apr 4, 2023 at 11:05 AM Steve Simon <st...@quintile.net> wrote:
>
> was this hard to reproduce?
>
> i have not seen fossil deadlocking and have used it since i installed my
> first home server in 2004.
>
> there definitely _was_ a problem in the snapshot code which was finally
> resolved around 2015 (roughly), i think perhaps skip, or forsyth found it - i
> apologise if i have the attribution wrong.
>
> fossil is also unhelpful if it runs out of space - i don’t believe brucee
> ever forgave it for that.
> this is less of a problem when it is run with venti of course.
>
> -Steve
>
> On 4 Apr 2023, at 6:16 pm, n...@pixelhero.dev wrote:
>
> I've sporadically encountered a deadlock in fossil. Naturally, when your root
> file system crashes, it can be hard to debug. My solution: stop having a root
> file system. Was able to attach acid using mycroft's tooling from ANTS, and
> get a clean stack trace
> (https://pixelhero.dev/notebook/fossil/stacks/2023-04-03.1).
>
> After a few hours yesterday
> (https://pixelhero.dev/notebook/fossil/2023-04-03.html), I eventually tracked
> down the deadlock. When blockWrite is told to flush a clean block to disk -
> i.e. one which is already flushed - it removes the block from the cache's
> free list, locks the block, detects that it's clean, and then... drops the
> reference. While keeping the block locked. And in the cache.
>
> This leak of the lock, of course, means that the *next* access to the block -
> which is still in the cache! - hangs indefinitely. This is seen exactly in
> the stack trace:
>
> _cacheLocal grabs the block from the cache, tries to lock it, and hangs
> indefinitely. Worse, it does so under a call to fileWalk, which holds a
> different lock, so the effect spreads out and makes even more of the file
> system inaccessible as well (the fileMetaFlush proc hangs waiting on this
> file lock).
>
> This patch just ensures we call blockPut on the BioClean path as well, thus
> unlocking the block and readding it to the cache's free lists.
>
> The patch is on my branch -
> https://git.sr.ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b12073c9e0f
> - but I figured I should probably post it here as well.
>
> If anyone has any other patches that weren't in the 9legacy download as of
> ~2018, please let me know! :)
>
> ---
> sys/src/cmd/fossil/cache.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/fossil/cache.c
> index f473d211e..2fec44949 100644
> --- a/sys/src/cmd/fossil/cache.c
> +++ b/sys/src/cmd/fossil/cache.c
> @@ -1203,8 +1203,10 @@ blockWrite(Block *b, int waitlock)
>  			fprint(2, "%s: %d:%x:%d iostate is %d in blockWrite\n",
>  				argv0, bb->part, bb->addr, bb->l.type, bb->iostate);
>  			/* probably BioWriting if it happens? */
> -			if(bb->iostate == BioClean)
> +			if(bb->iostate == BioClean){
> +				blockPut(bb);
>  				goto ignblock;
> +			}
>  		}
>
>  		blockPut(bb);
> --

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Mc25a40069de1a1f118f53839
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
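
For anyone who wants to see the shape of the bug without booting fossil, here is a minimal sketch of the lock leak described in the quoted report. It is not fossil's code: it uses POSIX threads rather than Plan 9 locks, and Block, blockinit, blockget, blockput, flushblock and wouldhang are hypothetical stand-ins invented for illustration. The only thing it is meant to show is that dropping the reference on the clean path without unlocking leaves the block locked for the next accessor, and that doing a full put there avoids it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for fossil's block states; not the real definitions. */
enum { BioClean, BioDirty };

/* Hypothetical, heavily reduced block: one lock, one state, one refcount. */
typedef struct Block {
	pthread_mutex_t lk;
	int iostate;
	int ref;
} Block;

static void
blockinit(Block *b, int iostate)
{
	pthread_mutex_init(&b->lk, NULL);
	b->iostate = iostate;
	b->ref = 0;
}

/* Lock the block and take a reference, as a cache lookup would. */
static void
blockget(Block *b)
{
	pthread_mutex_lock(&b->lk);
	b->ref++;
}

/* Drop the reference and unlock: the step the clean path skipped. */
static void
blockput(Block *b)
{
	assert(b->ref > 0);
	b->ref--;
	pthread_mutex_unlock(&b->lk);
}

/*
 * Flush one block. With fixed == false the clean path mimics the bug:
 * the reference is dropped but the lock is never released.
 */
static void
flushblock(Block *b, bool fixed)
{
	blockget(b);
	if(b->iostate == BioClean){
		if(fixed)
			blockput(b);
		else
			b->ref--;	/* reference dropped, lock leaked */
		return;
	}
	b->iostate = BioClean;	/* pretend the write happened */
	blockput(b);
}

/* Would the next accessor hang? trylock lets us ask without blocking. */
static bool
wouldhang(Block *b)
{
	if(pthread_mutex_trylock(&b->lk) != 0)
		return true;
	pthread_mutex_unlock(&b->lk);
	return false;
}
```

Calling flushblock(&b, false) on a block initialised as BioClean leaves wouldhang(&b) true, which is the situation a later cache lookup would run into; with fixed set to true, or with a dirty block, the lock is released as expected.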