On Wed, Apr 1, 2020 at 2:40 AM Andres Freund <and...@anarazel.de> wrote: > I added some debug output to print the mapping before/after changes by > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative > to the server start in minutes/seconds to make it easier to interpret). > > And the output turns out to be something like: > > WARNING: old snapshot mapping at "before update" with head ts: 7, current > entries: 8 max entries: 15, offset: 0 > entry 0 (ring 0): min 7: xid 582921233 > entry 1 (ring 1): min 8: xid 654154155 > entry 2 (ring 2): min 9: xid 661972949 > entry 3 (ring 3): min 10: xid 666899382 > entry 4 (ring 4): min 11: xid 644169619 > entry 5 (ring 5): min 12: xid 644169619 > entry 6 (ring 6): min 13: xid 644169619 > entry 7 (ring 7): min 14: xid 644169619 > > WARNING: head 420 s: updating existing bucket 4 for sec 660 with xmin > 666899382 > > WARNING: old snapshot mapping at "after update" with head ts: 7, current > entries: 8 max entries: 15, offset: 0 > entry 0 (ring 0): min 7: xid 582921233 > entry 1 (ring 1): min 8: xid 654154155 > entry 2 (ring 2): min 9: xid 661972949 > entry 3 (ring 3): min 10: xid 666899382 > entry 4 (ring 4): min 11: xid 666899382 > entry 5 (ring 5): min 12: xid 644169619 > entry 6 (ring 6): min 13: xid 644169619 > entry 7 (ring 7): min 14: xid 644169619 > > It's pretty obvious that the xids don't make a ton of sense, I think: > They're not monotonically ordered. The same values exist multiple times, > despite xids being constantly used. Also, despite the ringbuffer > supposedly having 15 entries (that's snapshot_too_old = 5min + the 10 we > always add), and the workload having run for 14min, we only have 8 > entries.
The function header comment for MaintainOldSnapshotTimeMapping could hardly be more vague, as it's little more than a restatement of the function name. However, it looks to me like the idea is that this function might get called multiple times for the same or similar values of whenTaken. I suppose that's the point of this code: else if (ts <= (oldSnapshotControl->head_timestamp + ((oldSnapshotControl->count_used - 1) * USECS_PER_MINUTE))) { /* existing mapping; advance xid if possible */ int bucket = (oldSnapshotControl->head_offset + ((ts - oldSnapshotControl->head_timestamp) / USECS_PER_MINUTE)) % OLD_SNAPSHOT_TIME_MAP_ENTRIES; if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket], xmin)) oldSnapshotControl->xid_by_minute[bucket] = xmin; } What I interpret this to be doing is saying - if we got a new call to this function with a rounded-to-the-minute timestamp that we've seen previously and for which we still have an entry, and if the XID passed to this function is newer than the one passed by the previous call, then advance the xid_by_minute[] bucket to the newer value. Now that begs the question - what does this XID actually represent? The comments don't seem to answer that question, not even the comments for OldSnapshotControlData, which say that we should "Keep one xid per minute for old snapshot error handling." but don't say which XIDs we should keep or how they'll be used. However, the only call to MaintainOldSnapshotTimeMapping() is in GetSnapshotData(). It appears that we call this function each time a new snapshot is taken and pass the current time (modulo some fiddling) and snapshot xmin. Given that, one would expect that any updates to the map would be tight races, i.e. a bunch of processes that all took their snapshots right around the same time would all update the same map entry in quick succession, with the newest value winning. And that make the debugging output which I quoted from your message above really confusing. At this point, the "head timestamp" is 7 minutes after this facility started up. The first we entry we have is for minute 7, and the last is for minute 14. But the one we're updating is for minute 11. How the heck can that happen? I might suspect that you'd stopped a process inside GetSnapshotData() with a debugger, but that can't explain it either, because GetSnapshotData() gets the xmin first and only afterwards gets the timestamp - so if you'd stopped for ~3 minutes it just before the call to MaintainOldSnapshotTimeMapping(), it would've been updating the map with an *old* XID. In reality, though, it changed the XID from 644169619 to 666899382, advancing over 22 million XIDs. I don't understand what's going on there. How is this function getting called with a 4-minute old value of whenTaken? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company