On 20 September 2016 at 22:46, Robert Haas <robertmh...@gmail.com> wrote:
> I did some cleanup of 0001 (see attached) and was all set to commit it > when I realized what I think is a problem: holding XidGenLock doesn't > seem to help with the race condition between this function and CLOG > truncation, because vac_truncate_clog() updates the shared memory > limits AFTER performing the truncation. Thanks ... and drat. > If the order of those > operations were reversed, we'd be fine, because it would get stuck > trying to update the shared memory limits and wouldn't be able to > truncate until it did - and conversely, if it updated the shared > memory limits before we examined them, that would be OK, too, because > we'd be sure not to consult the pages that are about to be truncated. I'm hesitant to mess with something that fundamental for what I was hoping was a low-impact feature, albeit one that seems to be trying hard not to be at every turn. It looks pretty reasonable to update oldestXid before clog truncation but I don't want to be wrong and create some obscure race or crash recovery issue related to wraparound and clog truncation. It could well be a problem if we're very close to wraparound. So far nothing has had any reason to care about this, since there's no way to attempt to access an older xid in a normally functioning system. The commit timestamp lookup code doesn't care whether the xid is still in clog or not and nothing else does lookups based on xids supplied by the user. If anything else did or does in future it will have the same problem as txid_status. We've already ruled out releasing the slru LWLock in SlruReportIOError then PG_TRY / PG_CATCH()ing clog access errors in txid_status() per my original approach to this issue. So I see a few options now: * Do nothing. Permit this race to exist, document the error, and expect apps to cope. I'm pretty tempted to go for exactly this since it pushes the cost onto users of the feature and doesn't make a mess elsewhere. People who use this will typically be invoking it as a standalone toplevel function anyway, so it's mostly just a bit of noise in the error logs if you lose the race - and we have plenty of other sources of that already. * Rather than calling TransactionIdDidCommit / TransactionIdDidAbort, call clog.c's TransactionIdGetStatus with a new missing_ok flag. That sets a bool* missing param added to SimpleLruReadPage_ReadOnly(...) and in turn to SimpleLruReadPage(...) that's set instead of calling SlruReportIOError(...). This seems rather intrusive and will add little-used params and paths to fairly hot slru.c code so I'm not keen. * Take CLogControlLock LW_SHARED in txid_status() to prevent truncation before reading oldestXid. We'd need a way to pass an "already locked" state through TransactionIdGetStatus(...) to SimpleLruReadPage_ReadOnly(...), which isn't great since again it's pretty hot code. * Don't release the slru LWLock in SlruReportIOError; instead release CLogControlLock from txid_status on clog access failure. As before means PG_TRY / PG_CATCH without PG_RE_THROW, but it means the only place it affects is callers of txid_status(...). For added safety, restrict txid_status() to being called in a toplevel virtual xact so we know we'll finish up promptly. It's a horrible layering violation having txid_status(...) release the clog control lock though, and seems risky. The only non-horrible one of those is IMO to just let the caller see an error if they lose the race. It's a function that's intended for use when you're dealing with indeterminate transaction state after a server or application error anyway, so it's part of an error path already. So I say we just document the behaviour. Because slru.c doesn't release its LWLock on error we also need to ensure txid_status(...) is also only called from a toplevel xact so the user doesn't attempt to wrap it in plpgsql BEGIN ... EXCEPTION block and it causes the xact to abort. Will follow up with just that. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers