Has anyone else been unsettled by the occasional messages from fossil saying (1) "could not write super block; waiting 10 seconds" and (2) "blistAlloc: called on clean block"?
Patch fossil-superblock-write gets rid of them.

(1) When taking a snapshot, blockWrite in cache.c is called to write an updated super block S, which has a pointer to the root block R for the new epoch. To maintain consistency on the disk, R must be written before S, so blockWrite checks whether R is still in the cache and marked dirty. Very rarely, blockWrite finds R locked (eg because the flush thread is just now writing it), so it gives up and returns zero. The zero return is OK when blockWrite is called by the flush thread, because the flush thread can get on with writing out other blocks before coming back to try the failed block again. But when blockWrite is called by superWrite, there's nothing else to do; hence the 10 second sleep and warning message. The solution is to add a waitlock parameter to blockWrite, so superWrite can tell it to wait for a locked dependent block.

(2) After the new super block S is sent to the disk write queue, superWrite removes the previous epoch's root block R' from the active file system. This is normally done by attaching a BList entry to S in the cache, noting that R' must be marked closed after S actually goes to the disk. Rarely, S has already been written by the time blistAlloc is called. In this case the correct thing was being done (just close R' immediately), but a spurious warning was produced.
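
For anyone who wants to see the shape of the waitlock change in (1), here is a rough sketch using POSIX threads rather than fossil's own locks. It is not the code from cache.c: the Block struct, blockwrite, flusher and superdemo are hypothetical stand-ins, and returning zero when the dependency is unlocked but still dirty is a simplification of mine. It only shows the difference between giving up on a locked dependent block and waiting for it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached block; not fossil's real Block. */
typedef struct Block Block;
struct Block {
	pthread_mutex_t lk;
	int dirty;	/* 1 until the block has been written */
	Block *dep;	/* block that must be written first, or NULL */
};

/*
 * Model of a blockWrite with a waitlock parameter.  Returns 1 if b was
 * written, 0 if the caller must retry.  With waitlock == 0 a locked
 * dependency means "give up now"; otherwise we block on its lock.
 */
static int
blockwrite(Block *b, int waitlock)
{
	Block *d;
	int depdirty;

	assert(b != NULL);
	d = b->dep;
	if(d != NULL){
		if(waitlock)
			pthread_mutex_lock(&d->lk);
		else if(pthread_mutex_trylock(&d->lk) != 0)
			return 0;	/* dependency busy: caller retries later */
		depdirty = d->dirty;
		pthread_mutex_unlock(&d->lk);
		if(depdirty)
			return 0;	/* simplification: dependency not written yet */
	}
	b->dirty = 0;	/* stands in for queueing the disk write */
	return 1;
}

static atomic_int held, release;

/* Plays the flush thread: holds R's lock while "writing" it. */
static void*
flusher(void *arg)
{
	Block *r = arg;

	pthread_mutex_lock(&r->lk);
	atomic_store(&held, 1);
	while(!atomic_load(&release))
		sched_yield();
	r->dirty = 0;
	pthread_mutex_unlock(&r->lk);
	return NULL;
}

/* Tries both variants on a super block S whose root R is locked. */
void
superdemo(int *nowait, int *waited)
{
	Block r, s;
	pthread_t t;

	pthread_mutex_init(&r.lk, NULL);
	r.dirty = 1;
	r.dep = NULL;
	pthread_mutex_init(&s.lk, NULL);
	s.dirty = 1;
	s.dep = &r;
	atomic_store(&held, 0);
	atomic_store(&release, 0);

	pthread_create(&t, NULL, flusher, &r);
	while(!atomic_load(&held))
		sched_yield();
	*nowait = blockwrite(&s, 0);	/* R is locked: gives up at once */
	atomic_store(&release, 1);	/* let the flusher finish R */
	*waited = blockwrite(&s, 1);	/* waits for R's lock, then writes S */
	pthread_join(t, NULL);
}
```

Calling superdemo from a small main and printing the two results shows the old behaviour (a zero return that superWrite could only answer by sleeping and retrying) next to the new one, where the caller simply waits for the lock on R.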