it definitely was not me. My bet would be on rsc, geoff, richard,
forsyth, quanstrom or djc.

On Tue, Apr 4, 2023 at 11:05 AM Steve Simon <st...@quintile.net> wrote:
>
>
> was this hard to reproduce?
>
> i have not seen fossil deadlocking and have used it since i installed my 
> first home server in 2004.
>
> there definitely _was_ a problem in the snapshot code which was finally 
> resolved around 2015 (roughly), i think perhaps skip, or forsyth found it - i 
> apologise if i have the attribution wrong.
>
> fossil is also unhelpful if it runs out of space - i don’t believe brucee 
> ever forgave it for that.
> this is less of a problem when it is run with venti of course.
>
> -Steve
>
>
> On 4 Apr 2023, at 6:16 pm, n...@pixelhero.dev wrote:
>
> 
> I've sporadically encountered a deadlock in fossil. Naturally, when your root 
> file system crashes, it can be hard to debug. My solution: stop having a root 
> file system. Was able to attach acid using mycroft's tooling from ANTS, and 
> get a clean stack trace 
> (https://pixelhero.dev/notebook/fossil/stacks/2023-04-03.1).
>
> After a few hours yesterday 
> (https://pixelhero.dev/notebook/fossil/2023-04-03.html), I eventually tracked 
> down the deadlock. When blockWrite is told to flush a clean block to disk - 
> i.e. one which is already flushed - it removes the block from the cache's 
> free list, locks the block, detects that it's clean, and then... drops the 
> reference. While keeping the block locked. And in the cache.
>
> This leak of the lock, of course, means that the *next* access to the block - 
> which is still in the cache! - hangs indefinitely. This is seen exactly in 
> the stack trace:
>
> _cacheLocal grabs the block from the cache, tries to lock it, and hangs 
> indefinitely. Worse, it does so under a call to fileWalk, which holds a 
> different lock, so the effect spreads out and makes even more of the file 
> system inaccessible as well (the fileMetaFlush proc hangs waiting on this 
> file lock).
>
> This patch just ensures we call blockPut on the BioClean path as well, thus 
> unlocking the block and re-adding it to the cache's free lists.
>
> The patch is on my branch - 
> https://git.sr.ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b12073c9e0f
>  - but I figured I should probably post it here as well.
>
> If anyone has any other patches that weren't in the 9legacy download as of 
> ~2018, please let me know! :)
>
> ---
> sys/src/cmd/fossil/cache.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/fossil/cache.c
> index f473d211e..2fec44949 100644
> --- a/sys/src/cmd/fossil/cache.c
> +++ b/sys/src/cmd/fossil/cache.c
> @@ -1203,8 +1203,10 @@ blockWrite(Block *b, int waitlock)
>  			fprint(2, "%s: %d:%x:%d iostate is %d in blockWrite\n",
>  				argv0, bb->part, bb->addr, bb->l.type, bb->iostate);
>  			/* probably BioWriting if it happens? */
> -			if(bb->iostate == BioClean)
> +			if(bb->iostate == BioClean){
> +				blockPut(bb);
>  				goto ignblock;
> +			}
>  		}
>
>  		blockPut(bb);
> --
>
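
For anyone following along without the fossil source to hand, here is a minimal sketch of the lock leak described in the quoted message, as I read it. It is a toy model, not fossil's code: the Block struct, blockCanLock, writeDependency and nextAccessSucceeds are hypothetical stand-ins, the lock is a plain flag, and blockPut here only models "unlock and drop the reference". The lock attempt is non-blocking so the sketch reports the hang rather than reproducing it.

```c
#include <assert.h>

/* Hypothetical stand-ins for fossil's Block and its lock; not the real types. */
enum { BioClean, BioDirty };

typedef struct Block Block;
struct Block {
	int locked;	/* 1 while some proc holds the block lock */
	int ref;	/* references held by callers */
	int iostate;	/* BioClean or BioDirty */
};

/* Non-blocking lock attempt: returns 1 on success, 0 if already locked. */
static int
blockCanLock(Block *b)
{
	if(b->locked)
		return 0;
	b->locked = 1;
	return 1;
}

/* Model of blockPut: unlock the block and drop the caller's reference. */
static void
blockPut(Block *b)
{
	assert(b->locked && b->ref > 0);
	b->locked = 0;
	b->ref--;
}

/*
 * Model of the clean-block path in blockWrite.
 * fixed == 0: the reference is dropped but the lock is kept (the leak).
 * fixed == 1: blockPut releases both, as the patch does.
 */
static void
writeDependency(Block *bb, int fixed)
{
	bb->ref++;
	if(!blockCanLock(bb)){
		bb->ref--;
		return;
	}
	if(bb->iostate == BioClean){
		if(fixed)
			blockPut(bb);
		else
			bb->ref--;	/* leak: block stays locked */
		return;
	}
	bb->iostate = BioClean;	/* pretend the dirty block was written out */
	blockPut(bb);
}

/* Returns 1 if a later cache lookup could lock the block, 0 if it would hang. */
int
nextAccessSucceeds(int fixed)
{
	Block b = { 0, 0, BioClean };

	writeDependency(&b, fixed);
	return blockCanLock(&b);
}
```

In this model nextAccessSucceeds(0) returns 0, which corresponds to _cacheLocal waiting on the block lock indefinitely, and nextAccessSucceeds(1) returns 1 once the clean path also goes through blockPut.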

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Mc25a40069de1a1f118f53839
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
