Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Andres Freund
On 2013-02-04 17:21:50 +, Simon Riggs wrote: > On 4 February 2013 17:02, Andres Freund wrote: > > > I unfortunately don't yet see a robust way without storing the last used > > horizon :(. > > We can't go backwards, but we can go forwards. > > We can move the next xid forwards by an amount

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Simon Riggs
On 4 February 2013 17:02, Andres Freund wrote: > I unfortunately don't yet see a robust way without storing the last used > horizon :(. We can't go backwards, but we can go forwards. We can move the next xid forwards by an amount equal to the increase in vacuum_defer_cleanup_age. -- Simon Ri

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Andres Freund
On 2013-02-04 11:52:05 -0500, Tom Lane wrote: > Andres Freund writes: > > I absolutely hate to suggest it, but what about storing the last > > vacuum's xmin horizon in the main table's pg_class.options in the back > > branches? > > Not workable. This would require a non-in-place update of the ta

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Tom Lane
Andres Freund writes: > I absolutely hate to suggest it, but what about storing the last > vacuum's xmin horizon in the main table's pg_class.options in the back > branches? Not workable. This would require a non-in-place update of the table's pg_class row (at least in cases where the option was

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Andres Freund
On 2013-02-04 11:07:17 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2013-02-01 19:24:02 -0500, Tom Lane wrote: > >> And as for that, it's been pretty clear for awhile that allowing > >> vacuum_defer_cleanup_age to change on the fly was a bad idea we'd > >> eventually have to undo. The da

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Simon Riggs
On 4 February 2013 16:07, Tom Lane wrote: > Simon, would you revert the vacuum_defer_cleanup_age changes? Will do -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@po

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Tom Lane
Andres Freund writes: > On 2013-02-01 19:24:02 -0500, Tom Lane wrote: >> And as for that, it's been pretty clear for awhile that allowing >> vacuum_defer_cleanup_age to change on the fly was a bad idea we'd >> eventually have to undo. The day of reckoning has arrived: it needs >> to be PGC_POSTMA

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-04 Thread Andres Freund
On 2013-02-01 19:24:02 -0500, Tom Lane wrote: > Robert Haas writes: > > Having said that, I agree that a fix in GetOldestXmin() would be nice > > if we could find one, but since the comment describes at least three > > different ways the value can move backwards, I'm not sure that there's > > real

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 18:38, Andres Freund wrote: > On 2013-02-02 18:32:44 +, Simon Riggs wrote: >> On 2 February 2013 14:24, Andres Freund wrote: >> >> > b) We don't assign the xmin early enough, we only set it when the first >> > feedback message arrives, but we should set it when walsender st

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Andres Freund
On 2013-02-02 18:32:44 +, Simon Riggs wrote: > On 2 February 2013 14:24, Andres Freund wrote: > > > b) We don't assign the xmin early enough, we only set it when the first > > feedback message arrives, but we should set it when walsender starts > > streaming. > > That's easy to fix. Not tri

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 00:24, Tom Lane wrote: > * if allDbs is FALSE and there are no transactions running in the current > * database, GetOldestXmin() returns latestCompletedXid. If a transaction > * begins after that, its xmin will include in-progress transactions in other > * databases that st

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 00:24, Tom Lane wrote: > * The return value is also adjusted with vacuum_defer_cleanup_age, so > * increasing that setting on the fly is another easy way to make > * GetOldestXmin() move backwards, with no consequences for data integrity. > > And as for that, it's been prett

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 14:24, Andres Freund wrote: > b) We don't assign the xmin early enough, we only set it when the first > feedback message arrives, but we should set it when walsender starts > streaming. That's easy to fix. > c) After a disconnect the feedback message will rather likely ask fo

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Robert Haas
On Fri, Feb 1, 2013 at 6:56 PM, Tom Lane wrote: >> That is admittedly kind of weird behavior, but I think one could >> equally blame this on CLUSTER. This is hardly the first time we've >> had to patch CLUSTER's handling of TOAST tables (cf commits >> 21b446dd0927f8f2a187d9461a0d3f11db836f77, >>

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Robert Haas
On Fri, Feb 1, 2013 at 7:24 PM, Tom Lane wrote: > Robert Haas writes: >> Having said that, I agree that a fix in GetOldestXmin() would be nice >> if we could find one, but since the comment describes at least three >> different ways the value can move backwards, I'm not sure that there's >> reall

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Andres Freund
On 2013-02-01 19:24:02 -0500, Tom Lane wrote: > * There are also replication-related effects: a walsender > * process can set its xmin based on transactions that are no longer running > * in the master but are still being replayed on the standby, thus possibly > * making the GetOldestXmin readi

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 1 February 2013 23:56, Tom Lane wrote: > Robert Haas writes: >> On Fri, Feb 1, 2013 at 2:35 PM, Tom Lane wrote: >>> In any case, I no longer have much faith in the idea that letting >>> GetOldestXmin go backwards is really safe. > >> That is admittedly kind of weird behavior, but I think one

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 1 February 2013 23:56, Tom Lane wrote: > Well, if we were tracking the latest value in shared memory, we could > certainly clamp to that to ensure it didn't go backwards. The problem > is where to find storage for a per-DB value. Adding new data columns to catalogs in backbranches seems like

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-01 Thread Tom Lane
Robert Haas writes: > Having said that, I agree that a fix in GetOldestXmin() would be nice > if we could find one, but since the comment describes at least three > different ways the value can move backwards, I'm not sure that there's > really a practical solution there, especially if you want so

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-01 Thread Tom Lane
Robert Haas writes: > On Fri, Feb 1, 2013 at 2:35 PM, Tom Lane wrote: >> In any case, I no longer have much faith in the idea that letting >> GetOldestXmin go backwards is really safe. > That is admittedly kind of weird behavior, but I think one could > equally blame this on CLUSTER. This is ha

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-01 Thread Robert Haas
On Fri, Feb 1, 2013 at 2:35 PM, Tom Lane wrote: > In any case, I no longer have much faith in the idea that letting > GetOldestXmin go backwards is really safe. That is admittedly kind of weird behavior, but I think one could equally blame this on CLUSTER. This is hardly the first time we've had

[HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-01 Thread Tom Lane
I've been able to reproduce the problem reported by Pius Chan in bug #7819. With some logging added that prints the OldestXmin values used by vacuum and cluster operations, the reason is fairly clear: 2013-02-01 13:41:12 EST 8011 LOG: vacuuming "test" with OldestXmin 1760160 FreezeLimit 4246727