>> My imagination is probably not as good, but suppose at time A you WAL-log 
>> the complete map, and at A+1 you update a tuple so the visibility bit is 
>> cleared, but the map bit change never reaches disk due to a crash. At WAL 
>> replay time you restore the map from time A, and if the tuple change at 
>> A+1 is represented in the WAL stream, you also update the visibility map. 
>> Is this the situation where the heap tuple hit disk but the map is left 
>> in a broken state, or is it a different but similar-looking situation?
> 
> The problem is when a bit is *set* in the visibility map. Clearing a bit is 
> not a problem, we already handle that reliably. If you set the flag on the 
> heap page and set the bit on the visibility map page, and you don't emit a 
> WAL record on either of those operations, the VM page might be flushed to 
> disk before the heap page.

Ah, got it. I thought there was an implicit WAL record representing the 
change, which there isn't. 
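
Just to make sure I follow, the risky sequence is roughly the one below. 
This is only a toy sketch with made-up names, not actual PostgreSQL code; 
it just simulates one "page" for the heap flag and one for the map bit:

    /*
     * Toy simulation of the hazard (invented names, not PostgreSQL code).
     * "heap" stands for the heap page with its all-visible flag,
     * "vm" for the visibility map page with the matching bit.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct sim_page {
        bool in_memory_bit;     /* state in shared buffers */
        bool on_disk_bit;       /* state that survives a crash */
    };

    static void set_bit(struct sim_page *p) { p->in_memory_bit = true; }
    static void flush(struct sim_page *p)   { p->on_disk_bit = p->in_memory_bit; }

    int main(void)
    {
        struct sim_page heap = {false, false};
        struct sim_page vm   = {false, false};

        set_bit(&heap);     /* set the flag on the heap page ...             */
        set_bit(&vm);       /* ... and the bit on the VM page, no WAL record */

        flush(&vm);         /* background writer flushes the VM page first   */
                            /* crash here: the heap page never reaches disk  */

        /* With no WAL record there is nothing to replay at recovery, so the
         * surviving on-disk state is inconsistent: the VM says all-visible,
         * the heap page does not. */
        printf("heap flag on disk: %d, VM bit on disk: %d\n",
               heap.on_disk_bit, vm.on_disk_bit);
        return 0;
    }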

My initial suggestion was actually to trash the map on recovery, write it 
out safely at shutdown, and let it be lazily created/updated on reads.  But 
I can see that the different performance characteristics between normal 
operation and a fresh recovery can be hard to accept, although it would be 
sufficient/acceptable for many of us. 

It is nice that recovery brings the database into the same state as before 
the crash in all respects, but in the real world the application would still 
see a huge performance drop due to cold shared buffers and a cold page cache 
on the OS.

A database-wide select would then be needed to bring the map up to date. 
Would it be OK to update the in-memory bitmap as a side effect of selects?
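
Roughly what I have in mind is sketched below. Again this is only an 
illustration with invented names and types, not the real buffer manager or 
visibility map interfaces; whether the actual scan code paths could do 
something like this cheaply is exactly my question:

    /*
     * Illustrative sketch only (made-up names, not the PostgreSQL API):
     * during a sequential scan, if every tuple on a block turns out to be
     * visible to all transactions, remember that in an in-memory-only
     * bitmap so later work can skip those blocks.
     */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned int BlockNumber;

    struct inmem_vm {
        BlockNumber nblocks;
        bool       *all_visible;   /* memory only, never WAL-logged */
    };

    /* Hook the scan would call after it has checked every tuple on 'blkno'. */
    static void
    scan_page_sideeffect(struct inmem_vm *vm, BlockNumber blkno, bool all_visible)
    {
        if (all_visible && blkno < vm->nblocks)
            vm->all_visible[blkno] = true;
    }

    int main(void)
    {
        struct inmem_vm vm = { 8, calloc(8, sizeof(bool)) };

        /* A database-wide select would visit every block and repopulate the map. */
        for (BlockNumber blk = 0; blk < vm.nblocks; blk++)
            scan_page_sideeffect(&vm, blk, true /* pretend all tuples were visible */);

        printf("block 0 marked all-visible: %d\n", vm.all_visible[0]);
        free(vm.all_visible);
        return 0;
    }

The point would be that the bitmap is only ever a hint rebuilt from scans, 
so losing it at a crash costs performance but never correctness.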


Jesper
