I've sporadically encountered a deadlock in fossil. Naturally, when your root 
file system crashes, it can be hard to debug. My solution: stop having a root 
file system. Was able to attach acid using mycroft's tooling from ANTS, and get 
a clean stack trace (https://pixelhero.dev/notebook/fossil/stacks/2023-04-03.1).

After a few hours yesterday 
(https://pixelhero.dev/notebook/fossil/2023-04-03.html 
<https://pixelhero.dev/2023-04-03-fossil.html>), I eventually tracked down the 
deadlock. When blockWrite is told to flush a clean block to disk - i.e. one 
which is already flushed - it removes the block from the cache's free list, 
locks the block, detects that it's clean, and then... drops the reference. 
While keeping the block locked. And in the cache.

This leak of the lock, of course, means that the *next* access to the block - 
which is still in the cache! - hangs indefinitely. This is seen exactly in the 
stack trace:

_cacheLocal grabs the block from the cache, tries to lock it, and hangs 
indefinitely. Worse, it does so under a call to fileWalk, which holds a 
different lock, so the effect spreads out and makes even more of the file 
system inaccessible as well (the fileMetaFlush proc hangs waiting on this file 
lock).

This patch just ensures we call blockPut on the BioClean path as well, thus 
unlocking the block and readding it to the cache's free lists.

The patch is on my branch - 
https://git.sr.ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b12073c9e0f
 - but I figured I should probably post it here as well.

If anyone has any other patches that weren't in the 9legacy download as of 
~2018, please let me know! :)

---
sys/src/cmd/fossil/cache.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/fossil/cache.c
index f473d211e..2fec44949 100644
--- a/sys/src/cmd/fossil/cache.c
+++ b/sys/src/cmd/fossil/cache.c
@@ -1203,8 +1203,10 @@ blockWrite(Block *b, int waitlock)
fprint(2, "%s: %d:%x:%d iostate is %d in blockWrite\n",
argv0, bb->part, bb->addr, bb->l.type, bb->iostate);
/* probably BioWriting if it happens? */
-                       if(bb->iostate == BioClean)
+                       if(bb->iostate == BioClean){
+                               blockPut(bb);
goto ignblock;
+                       }
}

blockPut(bb);
--

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Ma4bc3ee3ea80defbb239f382
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

Reply via email to