On Thu, 17 Jun 2021 at 23:49, Noah Misch wrote:
>
> On Wed, Jun 16, 2021 at 12:00:57PM -0400, Tom Lane wrote:
> > I agree that's a great use-case. I don't like this implementation though.
> > I think if you want to set things up like that, you should draw a line
> > between the tables it's okay f
On Wed, Apr 15, 2020 at 2:21 PM Thomas Munro wrote:
> On Mon, Apr 13, 2020 at 2:58 PM Thomas Munro wrote:
> > On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote:
> > > I'm thinking of a version of "snapshot too old" that amounts to a
> > > statement timeout that gets applied for xmin horizon t
On Wed, Jun 16, 2021 at 12:00:57PM -0400, Tom Lane wrote:
> Greg Stark writes:
> > I think Andres's point earlier is the one that stands out the most for me:
> >
> > > I still think that's the most reasonable course. I actually like the
> > > feature, but I don't think a better implementation of
Greetings,
* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Jun 16, 2021 at 12:06 PM Andres Freund wrote:
> > > I would think that it wouldn't really matter inside VACUUM -- it would
> > > only really need to be either an opportunistic pruning or an
> > > opportunistic index deletion thing -- pr
On Wed, Jun 16, 2021 at 12:06 PM Andres Freund wrote:
> > I would think that it wouldn't really matter inside VACUUM -- it would
> > only really need to be either an opportunistic pruning or an
> > opportunistic index deletion thing -- probably both. Most of the time
> > VACUUM doesn't seem to end
On Wed, Jun 16, 2021 at 11:27 AM Andres Freund wrote:
> 2) Modeling when it is safe to remove row versions. It is easy to remove
>    a tuple that was inserted and deleted within one "not needed" xid
>    range, but it's far less obvious when it is safe to remove row
>    versions where prior/late
Hi,
On 2021-06-16 10:44:49 -0700, Peter Geoghegan wrote:
> On Wed, Jun 16, 2021 at 10:04 AM Tom Lane wrote:
> > Of course, there's still the question of how VACUUM could cheaply
> > apply such info to decide what could be purged.
> I would think that it wouldn't really matter inside VACUUM -- it
On Wed, Jun 16, 2021 at 11:06 AM Andres Freund wrote:
> > 2) (a) Some hackers want the feature gone so they can implement changes
> >    without making those changes cooperate with this feature. (b) Bugs in
> >    this feature make such cooperation materially harder.
>
> I think the a) part
Hi,
On 2021-06-16 13:04:07 -0400, Tom Lane wrote:
> Yeah, I think this scenario of a few transactions with old snapshots
> and the rest with very new ones could be improved greatly if we exposed
> more info about backends' snapshot state than just "oldest xmin". But
> that might be expensive to d
Hi,
On 2021-06-15 21:59:45 -0700, Noah Misch wrote:
> Hackers are rather wise, but the variety of PostgreSQL use is enormous. We
> see that, among other ways, when regression fixes spike in each vN.1. The
> $SUBJECT feature was born in response to a user experience; a lack of hacker
> interest d
On Wed, Jun 16, 2021 at 10:04 AM Tom Lane wrote:
> I remember that Heikki was fooling with a patch that reduced snapshots
> to LSNs. If we got that done, it'd be practical to expose complete
> info about backends' snapshot state in a lot of cases (i.e., anytime
> you had less than N live snapshot
Stephen Frost writes:
> I've long felt that the appropriate approach to addressing that is to
> improve on VACUUM and find a way to do better than just having the
> conditional of 'xmax < global min' drive if we can mark a given tuple as
> no longer visible to anyone.
Yeah, I think this scenario
Greetings,
* Greg Stark (st...@mit.edu) wrote:
> I think Andres's point earlier is the one that stands out the most for me:
>
> > I still think that's the most reasonable course. I actually like the
> > feature, but I don't think a better implementation of it would share
> > much if any of the cu
Greg Stark writes:
> Fwiw I too think the basic idea of the feature is actually awesome.
> There are tons of use cases where you might have one long-lived
> transaction working on a dedicated table (or even a schema) that will
> never look at the rapidly mutating tables in another schema and never
I think Andres's point earlier is the one that stands out the most for me:
> I still think that's the most reasonable course. I actually like the
> feature, but I don't think a better implementation of it would share
> much if any of the current infrastructure.
That makes me wonder whether rippin
On Tue, Jun 15, 2021 at 11:24 PM Noah Misch wrote:
> When I say "some hackers", I don't mean that specific people think such
> thoughts right now. I'm saying that the expected cost of future cooperation
> with the feature is nonzero, and bugs in the feature raise that cost.
I see.
> > > A hacke
On Tue, Jun 15, 2021 at 10:47:45PM -0700, Peter Geoghegan wrote:
> On Tue, Jun 15, 2021 at 9:59 PM Noah Misch wrote:
> > Hackers are rather wise, but the variety of PostgreSQL use is enormous. We
> > see that, among other ways, when regression fixes spike in each vN.1. The
> > $SUBJECT feature w
On Tue, Jun 15, 2021 at 9:59 PM Noah Misch wrote:
> Hackers are rather wise, but the variety of PostgreSQL use is enormous. We
> see that, among other ways, when regression fixes spike in each vN.1. The
> $SUBJECT feature was born in response to a user experience; a lack of hacker
> interest doe
On Tue, Jun 15, 2021 at 02:32:11PM -0700, Peter Geoghegan wrote:
> What I had in mind was this: a committer adopting the feature
> themselves. The committer would be morally obligated to maintain the
> feature on an ongoing basis, just as if they were the original
> committer. This seems like the o
On Wed, Jun 16, 2021 at 7:17 AM Robert Haas wrote:
> Progress has been pretty limited, but not altogether nonexistent.
> 55b7e2f4d78d8aa7b4a5eae9a0a810601d03c563 fixed, or at least seemed to
> fix, the time->XID mapping, which is one of the main things that
> Andres said was broken originally. Als
Hi,
On 2021-06-15 15:17:05 -0400, Robert Haas wrote:
> But that's not clear to me. I'm not clear exactly how many
> problems we know about and need to fix in order to keep the feature,
> and I'm also not clear how deep the hole goes. Like, if we need to get
> a certain number of specific bugs
Peter Geoghegan writes:
> What I had in mind was this: a committer adopting the feature
> themselves. The committer would be morally obligated to maintain the
> feature on an ongoing basis, just as if they were the original
> committer. This seems like the only sensible way of resolving this
> iss
On Wed, Apr 1, 2020 at 4:59 PM Andres Freund wrote:
> The primary issue here is that there is no TestForOldSnapshot() in
> heap_hot_search_buffer(). Therefore index fetches will never even try to
> detect that tuples it needs actually have already been pruned away.
This is still true, right? Nobo
On Tue, Jun 15, 2021 at 12:17 PM Robert Haas wrote:
> My general point here is that I would like to know whether we have a
> finite number of reasonably localized bugs or a three-ring disaster
> that is unrecoverable no matter what we do. Andres seems to think it
> is the latter, and I *think* Pet
On Tue, Jun 15, 2021 at 12:49 PM Tom Lane wrote:
> Robert Haas writes:
> > My general point here is that I would like to know whether we have a
> > finite number of reasonably localized bugs or a three-ring disaster
> > that is unrecoverable no matter what we do. Andres seems to think it
> > is t
Robert Haas writes:
> My general point here is that I would like to know whether we have a
> finite number of reasonably localized bugs or a three-ring disaster
> that is unrecoverable no matter what we do. Andres seems to think it
> is the latter, and I *think* Peter Geoghegan agrees, but I think
On Tue, Jun 15, 2021 at 12:51 PM Tom Lane wrote:
> So, it's well over a year later, and so far as I can see exactly
> nothing has been done about snapshot_too_old's problems.
Progress has been pretty limited, but not altogether nonexistent.
55b7e2f4d78d8aa7b4a5eae9a0a810601d03c563 fixed, or at le
On Tue, Jun 15, 2021 at 11:01 AM Tom Lane wrote:
> The goal I have in mind is for snapshot_too_old to be fixed or gone
> in v15. I don't feel a need to force the issue sooner than that, so
> there's plenty of time for someone to step up, if anyone wishes to.
Seems more than reasonable to me. A y
Peter Geoghegan writes:
> On Tue, Jun 15, 2021 at 9:51 AM Tom Lane wrote:
>> So, it's well over a year later, and so far as I can see exactly
>> nothing has been done about snapshot_too_old's problems.
> I propose that the revert question be explicitly timeboxed. If the
> issues haven't been fix
On Tue, Jun 15, 2021 at 9:51 AM Tom Lane wrote:
> So, it's well over a year later, and so far as I can see exactly
> nothing has been done about snapshot_too_old's problems.
FWIW I think that the concept itself is basically reasonable. The
implementation is very flawed, though, so it hardly enter
Hi,
On 2021-06-15 12:51:28 -0400, Tom Lane wrote:
> Robert Haas writes:
> > Oh, maybe I'm the one who misunderstood...
>
> So, it's well over a year later, and so far as I can see exactly
> nothing has been done about snapshot_too_old's problems.
>
> I never liked that feature to begin with, an
Robert Haas writes:
> Oh, maybe I'm the one who misunderstood...
So, it's well over a year later, and so far as I can see exactly
nothing has been done about snapshot_too_old's problems.
I never liked that feature to begin with, and I would be very
glad to undertake the task of ripping it out.
On Fri, Apr 17, 2020 at 4:40 PM Thomas Munro wrote:
> I understood that you'd forked a new thread to discuss one particular
> problem among the many that Andres nailed to the door, namely the xid
> map's failure to be monotonic, and here I was responding to other
> things from his list, namely the
On Sat, Apr 18, 2020 at 12:19 AM Robert Haas wrote:
> On Thu, Apr 16, 2020 at 11:37 PM Thomas Munro wrote:
> > Then of course frozenXID can be advanced with eg update pg_database
> > set datallowconn = 't' where datname = 'template0', then vacuumdb
> > --freeze --all, and checked before and afte
On Thu, Apr 16, 2020 at 11:37 PM Thomas Munro wrote:
> Then of course frozenXID can be advanced with eg update pg_database
> set datallowconn = 't' where datname = 'template0', then vacuumdb
> --freeze --all, and checked before and after with Robert's
> pg_old_snapshot_time_mapping() SRF to see t
On Fri, Apr 17, 2020 at 3:37 PM Thomas Munro wrote:
> On Mon, Apr 13, 2020 at 5:14 PM Andres Freund wrote:
> > FWIW, I think the part that is currently harder to fix is the time->xmin
> > mapping and some related pieces. Second comes the test
> > infrastructure. Compared to those, adding addition
On Mon, Apr 13, 2020 at 5:14 PM Andres Freund wrote:
> FWIW, I think the part that is currently harder to fix is the time->xmin
> mapping and some related pieces. Second comes the test
> infrastructure. Compared to those, adding additional checks for old
> snapshots wouldn't be too hard - although
On Mon, Apr 13, 2020 at 2:58 PM Thomas Munro wrote:
> On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote:
> > I think that it's worth considering whether or not there are a
> > significant number of "snapshot too old" users that rarely or never
> > rely on old snapshots used by new queries. Kev
Hi,
On 2020-04-13 14:58:34 +1200, Thomas Munro wrote:
> On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote:
> > I think that it's worth considering whether or not there are a
> > significant number of "snapshot too old" users that rarely or never
> > rely on old snapshots used by new queries. K
On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote:
> I think that it's worth considering whether or not there are a
> significant number of "snapshot too old" users that rarely or never
> rely on old snapshots used by new queries. Kevin said that this
> happens "in some cases", but how many cas
On Sat, Apr 4, 2020 at 12:33 AM Andres Freund wrote:
>
> On 2020-04-03 14:32:09 +0530, Amit Kapila wrote:
> >
> > Agreed, but OTOH, not giving time to Kevin or others who might be
> > interested to support this work is also not fair. I think once
> > somebody comes up with patches for problems we
Hi,
On 2020-04-03 14:32:09 +0530, Amit Kapila wrote:
> On Fri, Apr 3, 2020 at 6:52 AM Peter Geoghegan wrote:
> >
> > On Thu, Apr 2, 2020 at 5:17 PM Andres Freund wrote:
> > > Since this is a feature that can result in wrong query results (and
> > > quite possibly crashes / data corruption), I do
On Fri, Apr 3, 2020 at 6:52 AM Peter Geoghegan wrote:
>
> On Thu, Apr 2, 2020 at 5:17 PM Andres Freund wrote:
> > Since this is a feature that can result in wrong query results (and
> > quite possibly crashes / data corruption), I don't think we can just
> > leave this unfixed. But given the amo
On Thu, Apr 2, 2020 at 5:17 PM Andres Freund wrote:
> Since this is a feature that can result in wrong query results (and
> quite possibly crashes / data corruption), I don't think we can just
> leave this unfixed. But given the amount of code / infrastructure
> changes required to get this into
Hi,
On 2020-04-01 12:02:18 -0400, Robert Haas wrote:
> I have no objection to the idea that *if* the feature is hopelessly
> broken, it should be removed.
I don't think we have a real choice here at this point, at least for the
back branches.
Just about nothing around old_snapshot_threshold work
Hi,
I just spent a good bit more time improving my snapshot patch, so it
could work well with a fixed version of the old_snapshot_threshold
feature. Mostly so there's no unnecessary dependency on the resolution
of the issues in that patch.
When testing my changes, for quite a while, I could not
On Thu, Apr 2, 2020 at 11:28 AM Peter Geoghegan wrote:
> In conclusion, I share Andres' concerns here. There are glaring
> problems with how we manipulate the data structure that controls the
> effective horizon for pruning. Maybe they can be fixed while leaving
> the code that manages the OldSnap
On Tue, Mar 31, 2020 at 11:40 PM Andres Freund wrote:
> The problem, as far as I can tell, is that
> oldSnapshotControl->head_timestamp appears to be intended to be the
> oldest value in the ring. But we update it unconditionally in the "need
> a new bucket, but it might not be the very next one"
Hi,
On April 2, 2020 9:36:32 AM PDT, Kevin Grittner wrote:
>On Wed, Apr 1, 2020 at 7:17 PM Andres Freund wrote:
>
>> FWIW, with autovacuum=off the query does not get killed until a manual
>> vacuum, nor if fewer rows are deleted and the table has previously been
>> vacuumed.
>>
>> The vacuum
On Wed, Apr 1, 2020 at 7:17 PM Andres Freund wrote:
> FWIW, with autovacuum=off the query does not get killed until a manual
> vacuum, nor if fewer rows are deleted and the table has previously been
> vacuumed.
>
> The vacuum in the second session isn't required. There just needs to be
> somethin
On Wed, Apr 1, 2020 at 6:59 PM Andres Freund wrote:
> index fetches will never even try to
> detect that tuples it needs actually have already been pruned away.
>
I looked at this flavor of problem today and from what I saw:
(1) This has been a problem all the way back to 9.6.0.
(2) The behavio
Hi,
On 2020-04-01 17:54:06 -0700, Andres Freund wrote:
> * Check whether the given snapshot is too old to have safely read the given
> * page from the given table. If so, throw a "snapshot too old" error.
> *
> * This test generally needs to be performed after every BufferGetPage() call
> *
On Wed, Apr 1, 2020 at 5:54 PM Andres Freund wrote:
> As far as I can tell there's not sufficient in-tree explanation of when
> code needs to test for an old snapshot. There's just the following
> comment above TestForOldSnapshot():
> * Check whether the given snapshot is too old to have safely r
Hi,
On 2020-04-01 16:59:51 -0700, Andres Freund wrote:
> The primary issue here is that there is no TestForOldSnapshot() in
> heap_hot_search_buffer(). Therefore index fetches will never even try to
> detect that tuples it needs actually have already been pruned away.
bitmap heap scan doesn't hav
On Wed, Apr 1, 2020 at 4:59 PM Andres Freund wrote:
> Thanks, that's super helpful.
Glad I could help.
> I got a bit confused here - you seemed to have switched session 1 and 2
> around? Doesn't seem to matter much though, I was able to reproduce this.
Yeah, I switched the session numbers becau
Hi,
On 2020-04-01 16:59:51 -0700, Andres Freund wrote:
> The primary issue here is that there is no TestForOldSnapshot() in
> heap_hot_search_buffer(). Therefore index fetches will never even try to
> detect that tuples it needs actually have already been pruned away.
FWIW, with autovacuum=off th
Hi,
On 2020-04-01 15:30:39 -0700, Peter Geoghegan wrote:
> On Wed, Apr 1, 2020 at 3:00 PM Peter Geoghegan wrote:
> > I like that idea. I think that I've spotted what may be an independent
> > bug, but I have to wait around for a minute or two to reproduce it
> > each time. Makes it hard to get to
Hi,
On 2020-04-01 14:11:11 -0700, Andres Freund wrote:
> As far as I can tell, with a large old_snapshot_threshold, it can take a
> very long time to get to a head_timestamp that's old enough for
> TransactionIdLimitedForOldSnapshots() to do anything. Look at this
> trace of a pgbench run with ol
On Wed, Apr 1, 2020 at 3:00 PM Peter Geoghegan wrote:
> I like that idea. I think that I've spotted what may be an independent
> bug, but I have to wait around for a minute or two to reproduce it
> each time. Makes it hard to get to a minimal test case.
I now have simple steps to reproduce a bug
On Wed, Apr 1, 2020 at 1:25 PM Robert Haas wrote:
> Maybe that contrib module could even have some functions to simulate
> aging without the passage of any real time. Like, say you have a
> function or procedure old_snapshot_pretend_time_has_passed(integer),
> and it moves oldSnapshotControl->head
Hi,
On 2020-04-01 15:11:52 -0500, Kevin Grittner wrote:
> On Wed, Apr 1, 2020 at 2:43 PM Andres Freund wrote:
>
> > The thing that makes me really worried is that the contents of the time
> > mapping seem very wrong. I've reproduced query results in a REPEATABLE
> > READ transaction changing (pru
On Wed, Apr 1, 2020 at 3:43 PM Andres Freund wrote:
> The thing that makes me really worried is that the contents of the time
> mapping seem very wrong. I've reproduced query results in a REPEATABLE
> READ transaction changing (pruned without triggering an error). And I've
> reproduced rows not ge
On Wed, Apr 1, 2020 at 2:43 PM Andres Freund wrote:
> The thing that makes me really worried is that the contents of the time
> mapping seem very wrong. I've reproduced query results in a REPEATABLE
> READ transaction changing (pruned without triggering an error).
That is a very big problem. O
Hi,
Nice to have you back for a bit! Even if the circumstances aren't
great...
It's very understandable that the lists are past your limits, I barely
keep up these days. Without any health issues.
On 2020-04-01 14:10:09 -0500, Kevin Grittner wrote:
> Perhaps the lack of evidence for usage in th
On Wed, Apr 1, 2020 at 10:09 AM Andres Freund wrote:
First off, many thanks to Andres for investigating this, and apologies for
the bugs. Also thanks to Michael for making sure I saw the thread. I must
also apologize for not being able to track the community lists
consistently due to healt
On Wed, Apr 1, 2020 at 2:37 PM Andres Freund wrote:
> Just continuing is easier said than done. Especially with the background
> of knowing that several users had hit the bug that allowed all of the
> above to be hit, and that advancing relfrozenxid further would make it
> worse.
Fair point, but
Hi,
On 2020-04-01 13:27:56 -0400, Robert Haas wrote:
> Perhaps "irresponsible" is the wrong word, but it's certainly caused
> problems for multiple EnterpriseDB customers, and in my view, those
> problems weren't necessary. Either a WARNING or an ERROR would have
> shown up in the log, but an ERRO
Hi,
On 2020-04-01 11:04:43 -0700, Peter Geoghegan wrote:
> On Wed, Apr 1, 2020 at 10:28 AM Robert Haas wrote:
> > Is there any chance that you're planning to look into the details?
> > That would certainly be welcome from my perspective.
+1
This definitely needs more eyes. I am not even close t
On Wed, Apr 1, 2020 at 10:28 AM Robert Haas wrote:
> Sure, but not all levels of risk are equal. Jumping out of a plane
> carries some risk of death whether or not you have a parachute, but
> that does not mean that we shouldn't worry about whether you have one
> or not before you jump.
>
> In thi
Hi,
On 2020-04-01 12:02:18 -0400, Robert Haas wrote:
> On Wed, Apr 1, 2020 at 11:09 AM Andres Freund wrote:
> > There's really no reason at all to have bins of one minute. As it's a
> > PGC_POSTMASTER GUC, it should just have divided time into bins of
> > (old_snapshot_threshold * USEC_PER_SEC) /
On Wed, Apr 1, 2020 at 1:03 PM Peter Geoghegan wrote:
> I don't think that it's fair to characterize Andres' actions in that
> situation as in any way irresponsible. We had an extremely complicated
> data corruption bug that he went to great lengths to fix, following
> two other incorrect fixes. H
On Wed, Apr 1, 2020 at 9:02 AM Robert Haas wrote:
> I complained
> when you added those error checks to vacuum in back-branches, and
> since that release went out people are regularly tripping those checks
> and taking prolonged outages for a problem that wasn't making them
> unhappy before. I kno
On Wed, Apr 1, 2020 at 11:09 AM Andres Freund wrote:
> That doesn't exist in all the back branches. Think it'd be easier to add
> code to explicitly prune it during MaintainOldSnapshotTimeMapping().
That's reasonable.
> There's really no reason at all to have bins of one minute. As it's a
> PGC_
Hi,
On 2020-04-01 11:15:14 -0400, Robert Haas wrote:
> On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote:
> > I added some debug output to print the mapping before/after changes by
> > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative
> > to the server start in minutes/secon
On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote:
> I added some debug output to print the mapping before/after changes by
> MaintainOldSnapshotTimeMapping() (note that I used timestamps relative
> to the server start in minutes/seconds to make it easier to interpret).
>
> And the output turns o
Hi,
On 2020-04-01 10:01:07 -0400, Robert Haas wrote:
> On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote:
> > The problem is that there's no protection against the xids in the
> > ringbuffer getting old enough to wrap around. Given that practical uses
> > of old_snapshot_threshold are likely to be
Hi,
On 2020-03-31 23:40:08 -0700, Andres Freund wrote:
> I added some debug output to print the mapping before/after changes by
> MaintainOldSnapshotTimeMapping() (note that I used timestamps relative
> to the server start in minutes/seconds to make it easier to interpret).
Now attached.
Greetin
On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote:
> The problem is that there's no protection against the xids in the
> ringbuffer getting old enough to wrap around. Given that practical uses
> of old_snapshot_threshold are likely to be several hours to several
> days, that's not particularly hard
78 matches