> Why isn't vacuum_defer_cleanup_age listed in postgresql.conf.sample?
> Though I also tried to test the effect of it, I was unable to find it
> in the conf file.
Using it has some bugs we need to clean up, apparently.
--Josh Berkus
Fujii Masao wrote:
> On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus wrote:
> > I've been playing with vacuum_defer_cleanup_age in reference to the
> > query cancel problem. It really seems to me that this is the way
> > forward in terms of dealing with query cancel for normal operation
> > rather than wal_standby_delay, or maybe in combination with it.
On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus wrote:
> I've been playing with vacuum_defer_cleanup_age in reference to the
> query cancel problem. It really seems to me that this is the way
> forward in terms of dealing with query cancel for normal operation
> rather than wal_standby_delay, or maybe in combination with it.
On 3/10/10 3:38 AM, Greg Stark wrote:
> I think that means that a
> vacuum_defer_cleanup of up to about 100 or so (it depends on the width
> of your counter record) might be reasonable as a general suggestion
> but anything higher will depend on understanding the specific system.
100 wouldn't be …
On Wed, Mar 10, 2010 at 6:29 AM, Josh Berkus wrote:
> Then I increased vacuum_defer_cleanup_age to 10, which represents
> about 5 minutes of transactions on the test system. This eliminated all
> query cancels for the reporting query, which takes an average of 10s.
>
> Next is a database bloat …
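The bloat check the preview cuts off can be approximated with a quick look
at the stats collector; a sketch, not taken from the original mail
(9.0-era catalog columns):

    -- Watch dead-tuple buildup during the test; n_dead_tup rises while
    -- vacuum is deferred, and table size growth shows real bloat.
    SELECT relname, n_live_tup, n_dead_tup,
           pg_size_pretty(pg_relation_size(relid)) AS table_size
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;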
All,
I've been playing with vacuum_defer_cleanup_age in reference to the
query cancel problem. It really seems to me that this is the way
forward in terms of dealing with query cancel for normal operation
rather than wal_standby_delay, or maybe in combination with it.
As a first test, I set up a …
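For readers of the archive, the setting under test lives in postgresql.conf
on the master; a minimal sketch (the value is only an example, echoing Greg
Stark's ~100 suggestion above, not a recommendation):

    # postgresql.conf (master): keep dead rows for this many extra
    # transactions so standby queries can still see them
    vacuum_defer_cleanup_age = 100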
Greg Smith wrote:
> Bruce Momjian wrote:
> >> Right now you can't choose "master bloat", but you can choose the other
> >> two. I think that is acceptable for 9.0, assuming the other two don't
> >> have the problems that Tom foresees.
> >
> > I was wrong. You can choose "master bloat" with
> > vacuum_defer_cleanup_age, but only crude…
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus wrote:
> > I don't think that defer_cleanup_age is a long-term solution. But we
> > need *a* solution which does not involve delaying 9.0.
>
> So I think the primary solution currently is to raise max_standby_age.
>
> However there is a concern with max_standby_age. …
On Sun, 2010-02-28 at 16:56 +0100, Joachim Wieland wrote:
> Now let's take a look at both scenarios from the administrators' point
> of view:
Well argued, agree with all of your points.
--
Simon Riggs www.2ndQuadrant.com
On Mon, 2010-03-01 at 14:43 -0500, Tom Lane wrote:
> Speaking of which, does the current HS+SR code have a
> provision to force the standby to stop tracking WAL and come up live,
> even when there's more WAL available?
Yes, trigger file.
--
Simon Riggs www.2ndQuadrant.com
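The trigger file Simon mentions is configured in recovery.conf on the
standby; a minimal sketch (the path is just an example):

    # recovery.conf (standby, 9.0): creating the trigger file ends
    # recovery and brings the standby up as a live server
    standby_mode = 'on'
    trigger_file = '/tmp/pgsql.trigger'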
On Mon, 2010-03-01 at 12:04 -0800, Josh Berkus wrote:
> does anyone dispute his analysis? Simon?
No dispute. I think I've discussed this before.
--
Simon Riggs www.2ndQuadrant.com
Bruce Momjian wrote:
> Right now you can't choose "master bloat", but you can choose the other
> two. I think that is acceptable for 9.0, assuming the other two don't
> have the problems that Tom foresees.
I was wrong. You can choose "master bloat" with
vacuum_defer_cleanup_age, but only crude…
On 3/2/10 10:30 AM, Bruce Momjian wrote:
> Right now you can't choose "master bloat", but you can choose the other
> two. I think that is acceptable for 9.0, assuming the other two don't
> have the problems that Tom foresees.
Actually, if vacuum_defer_cleanup_age can be used, "master bloat" is an …
Bruce Momjian wrote:
> > 'max_standby_delay = -1' is really only a reasonable idea if you are
> > absolutely certain all queries are going to be short, which we can't
> > dismiss as an unfounded use case so it has value. I would expect you
> > have to also combine it with a matching reasonable …
Greg Smith wrote:
> > I assumed they would set max_standby_delay = -1 and be happy.
>
> The admin in this situation might be happy until the first time the
> primary fails and a failover is forced, at which point there is an
> unbounded amount of recovery data to apply that was stuck waiting …
Robert Haas wrote:
> I just read through the current documentation and it doesn't really
> seem to explain very much about how HS decides which queries to kill.
> Can someone try to flesh that out a bit?
I believe it just launches on a mass killing spree once things like
max_standby_delay expire. …
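A sketch of the knob being discussed, as it stood in the 9.0 development
tree at the time (the released 9.0 later split it into
max_standby_archive_delay and max_standby_streaming_delay):

    # postgresql.conf (standby): how long recovery waits for conflicting
    # standby queries before cancelling them
    max_standby_delay = 30s    # -1 waits forever; 0 cancels immediately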
Bruce Momjian wrote:
Joachim Wieland wrote:
> 1) With the current implementation they will see better performance on
> the master and more aggressive vacuum (!), since they have less
> long-running queries now on the master and autovacuum can kick in and
> clean up with less delay than before. On the …
Josh Berkus wrote:
> HS+SR is still a tremendous improvement over the options available
> previously. We never thought it was going to work for everyone
> everywhere, and shouldn't let our project's OCD tendencies run away from us.
OCD (Obsessive-Compulsive Disorder) --- good one. :-)
--
Bruce Momjian
Joachim Wieland wrote:
> 1) With the current implementation they will see better performance on
> the master and more aggressive vacuum (!), since they have less
> long-running queries now on the master and autovacuum can kick in and
> clean up with less delay than before. On the other hand their …
* Tom Lane [100301 20:04]:
> Greg Stark writes:
> > josh, nobody is talking about it because it doesn't make sense. you could
> > only retry if it was the first query in the transaction and only if you
> > could prove there were no side-effects outside the database and then you
> > would have no reason to think the retry would be any more likely to work.
On Mon, Mar 1, 2010 at 5:32 PM, Josh Berkus wrote:
> On 2/28/10 7:12 PM, Robert Haas wrote:
>>> However, I'd still like to hear from someone with the requisite
>>> technical knowledge whether capturing and retrying the current query in
>>> a query cancel is even possible.
>>
>> I'm not sure who you want to hear from here, but I think that's a dead end…
Josh Berkus wrote:
> However, this leaves aside Greg's point about snapshot age and
> successive queries; does anyone dispute his analysis? Simon?
There's already a note on the Hot Standby TODO about unexpectedly bad
max_standby_delay behavior being possible on an idle system, with no
suggested …
Greg Stark writes:
> josh, nobody is talking about it because it doesn't make sense. you could
> only retry if it was the first query in the transaction and only if you
> could prove there were no side-effects outside the database and then you
> would have no reason to think the retry would be any more likely to work.
josh, nobody is talking about it because it doesn't make sense. you could
only retry if it was the first query in the transaction and only if you
could prove there were no side-effects outside the database and then you
would have no reason to think the retry would be any more likely to work.
greg
Josh Berkus wrote:
> It's undeniable that auto-retry would be better from a user's
> perspective than a user-visible cancel. So if it's *reasonable*
> to implement, I think we should be working on it. I'm also very
> puzzled as to why nobody else wants to even discuss it; it's like
> some weird …
On 2/28/10 7:12 PM, Robert Haas wrote:
>> However, I'd still like to hear from someone with the requisite
>> technical knowledge whether capturing and retrying the current query in
>> a query cancel is even possible.
>
> I'm not sure who you want to hear from here, but I think that's a dead end…
On 3/1/10 11:43 AM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> Greg Stark wrote:
>>> For what it's worth Oracle has an option to have your standby
>>> intentionally hold back n minutes behind and I've seen that set to 5
>>> minutes.
>
>> yeah a lot of people are doing that intentionally...
Stefan Kaltenbrunner writes:
> Greg Stark wrote:
>> For what it's worth Oracle has an option to have your standby
>> intentionally hold back n minutes behind and I've seen that set to 5
>> minutes.
> yeah a lot of people are doing that intentionally...
It's the old DBA screwup safety valve …
Greg Stark wrote:
On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus wrote:
> Completely aside from that, how many users are going to be happy with a
> slave server which is constantly 5 minutes behind?
Uhm, well all the ones who are happy with our current warm standby
setup for one?
And all the ones who are looking for …
On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus wrote:
> Completely aside from that, how many users are going to be happy with a
> slave server which is constantly 5 minutes behind?
>
Uhm, well all the ones who are happy with our current warm standby
setup for one?
And all the ones who are looking for …
> So I think the primary solution currently is to raise max_standby_age.
>
> However there is a concern with max_standby_age. If you set it to,
> say, 300s. Then run a 300s query on the slave which causes the slave
> to fall 299s behind. Now you start a new query on the slave -- it gets
> a snapshot …
On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus wrote:
> I don't think that defer_cleanup_age is a long-term solution. But we
> need *a* solution which does not involve delaying 9.0.
So I think the primary solution currently is to raise max_standby_age.
However there is a concern with max_standby_age. …
Josh Berkus wrote:
> And I think we can measure bloat in a pgbench test, no? When I get a
> chance, I'll run one for a couple hours and see the difference that
> cleanup_age makes.
The test case I attached at the start of this thread runs just the
UPDATE to the tellers table. Running something …
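The single-statement test Greg describes can be reproduced with a custom
pgbench script; a sketch under assumptions (the file name is invented,
\setrandom is the 9.0-era syntax, and 10 tellers corresponds to scale
factor 1):

    $ cat tellers-update.sql
    \setrandom tid 1 10
    UPDATE pgbench_tellers SET tbalance = tbalance + 1 WHERE tid = :tid;
    $ pgbench -n -f tellers-update.sql -c 4 -T 600 pgbench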
On 2/28/10 7:00 PM, Greg Smith wrote:
> The main problem with setting vacuum_defer_cleanup_age high isn't
> showing it works, it's a pretty simple bit of code. It's when you
> recognize that it penalizes all cleanup all the time, whether or not the
> standby is actually executing a long-running query …
On Sun, Feb 28, 2010 at 5:38 PM, Josh Berkus wrote:
> Greg, Joachim,
>
>> As I see it, the main technical obstacle here is that a subset of a
>> feature already on the SR roadmap needs to get built earlier than
>> expected to pull this off. I don't know about Tom, but I have no
>> expectation it's possible for me to get up to speed on that code fast …
Josh Berkus wrote:
> Well, we could throw this on the user if we could get them some
> information on how to calculate that number. For example, some way for
> them to calculate the number of XIDs per minute via a query, and then
> set vacuum_defer_cleanup_age appropriately on the master. Sure, it's …
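A sketch of the kind of query Josh is asking for (txid_current() is
available in the 8.4/9.0 releases; the one-minute gap and the arithmetic
are only an example):

    -- Sample the XID counter twice, one minute apart:
    SELECT txid_current();            -- t1
    -- ... wait 60 seconds ...
    SELECT txid_current();            -- t2
    -- XIDs per minute ~= t2 - t1 (txid_current itself burns one XID,
    -- which is negligible), so for 5 minutes of slack:
    --   vacuum_defer_cleanup_age = 5 * (t2 - t1)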
Greg, Joachim,
> As I see it, the main technical obstacle here is that a subset of a
> feature already on the SR roadmap needs to get built earlier than
> expected to pull this off. I don't know about Tom, but I have no
> expectation it's possible for me to get up to speed on that code fast
> enough …
On Sun, Feb 28, 2010 at 8:47 PM, Josh Berkus wrote:
> 1) Automated retry of cancelled queries on the slave. I have no idea
> how hard this would be to implement, but it makes the difference between
> writing lots of exception-handling code for slave connections
> (unacceptable) to just slow responses …
Josh Berkus writes:
> 2) A more usable vacuum_defer_cleanup_age. If it was feasible for a
> user to configure the master to not vacuum records less than, say, 5
> minutes dead, then that would again offer the choice to the user of
> slightly degraded performance on the master (acceptable) vs. lots …
Josh Berkus wrote:
> First, from the nature of the arguments, we need to eventually have both
> versions of SR: delay-based and xmin-pub. And it would be fantastic if
> Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
> ready as well.
As I see it, the main technical obstacle here is that a subset of a
feature already on the SR roadmap needs to get built earlier than
expected …
Joachim Wieland wrote:
> Instead, I assume that most people who will grab 9.0 and use HS+SR do
> already have a database with a certain query profile. Now with HS+SR
> they will try to put the most costly and longest read-only queries to
> the standby but in the end will run the same number of queries …
All,
First, from the nature of the arguments, we need to eventually have both
versions of SR: delay-based and xmin-pub. And it would be fantastic if
Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
ready as well.
I also think, based on the discussion and Greg's test case, …
On Sun, Feb 28, 2010 at 2:54 PM, Greg Stark wrote:
> Really? I think we get lots of suprised wows from the field from the
> idea that a long-running read-only query can cause your database to
> bloat. I think the only reason that's obvious to us is that we've been
> grappling with that problem for …
On Sun, Feb 28, 2010 at 6:07 AM, Greg Smith wrote:
> Not forced to--have the option of. There are obviously workloads where you
> wouldn't want this. At the same time, I think there are some pretty common
> ones people are going to expect HS+SR to work on transparently where this
> would obviously …
Robert Haas wrote:
> It seems to me that if we're forced to pass the xmin from the
> slave back to the master, that would be a huge step backward in terms
> of both scalability and performance, so I really hope it doesn't come
> to that.
Not forced to--have the option of. There are obviously workloads
where you wouldn't want this. …
On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith wrote:
> The idea of the workaround is that if you have a single long-running query
> to execute, and you want to make sure it doesn't get canceled because of a
> vacuum cleanup, you just have it connect back to the master to keep an open
> snapshot …
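A sketch of that workaround (in 9.0, SERIALIZABLE still meant plain
snapshot isolation, so one open transaction on the master pins its xmin
for the duration):

    -- On the *master*, just before starting the long query on the standby:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT 1;   -- take a snapshot, holding back global xmin
    -- ... run the long report on the standby ...
    COMMIT;     -- release the snapshot so vacuum can clean up again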
On Fri, Feb 26, 2010 at 1:53 PM, Tom Lane wrote:
> Greg Stark writes:
>> In the model you describe any long-lived queries on the slave cause
>> tables in the master to bloat with dead records.
>
> Yup, same as they would do on the master.
>
>> I think this model is on the roadmap but it's not appropriate for everyone …
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing. It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to …
Josh Berkus wrote:
>> That is exactly the core idea I was trying to suggest in my rambling
>> message. Just that small additional bit of information transmitted and
>> published to the master via that route, and it's possible to optimize
>> this problem in a way not available now. And it's a way that I believe will …
Greg Stark wrote:
> On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane wrote:
>> There's *definitely* not going to be enough information in the WAL
>> stream coming from a master that doesn't think it has HS slaves.
>> We can't afford to record all that extra stuff in installations for
>> which it's just useless overhead. …
On Fri, Feb 26, 2010 at 9:44 PM, Tom Lane wrote:
> Greg Stark writes:
>
>> What extra entries?
>
> Locks, just for starters. I haven't read enough of the code yet to know
> what else Simon added. In the past it's not been necessary to record
> any transient information in WAL, but now we'll have …
> That is exactly the core idea I was trying to suggest in my rambling
> message. Just that small additional bit of information transmitted and
> published to the master via that route, and it's possible to optimize
> this problem in a way not available now. And it's a way that I believe
> will …
Tom Lane wrote:
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
> That …
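For the archive: the closed loop Tom describes here is essentially what
later shipped in PostgreSQL 9.1 as a standby-side GUC (not present in the
9.0 code this thread discusses):

    # postgresql.conf (standby, 9.1+): send the standby's oldest xmin back
    # to the master so vacuum defers cleanup of rows still visible there
    hot_standby_feedback = on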
On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane wrote:
> There's *definitely* not going to be enough information in the WAL
> stream coming from a master that doesn't think it has HS slaves.
> We can't afford to record all that extra stuff in installations for
> which it's just useless overhead. BTW, …
On Fri, Feb 26, 2010 at 8:30 PM, Tom Lane wrote:
> How's it going to do that, when it has no queries at the instant
> of startup?
>
Why shouldn't it have any queries at walreceiver startup? It has any
xlog segments that were copied from the master and any it can find in
the archive, it could easily …
* Greg Stark [100226 15:10]:
> On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane wrote:
> > I don't see a "substantial additional burden" there. What I would
> > imagine is needed is that the slave transmits a single number back
> > --- its current oldest xmin --- and the walsender process publishes
> > that number as its transaction xmin in its PGPROC entry on the master. …
On Fri, 2010-02-26 at 12:02 -0800, Josh Berkus wrote:
> > I don't see a "substantial additional burden" there. What I would
> > imagine is needed is that the slave transmits a single number back
> > --- its current oldest xmin --- and the walsender process publishes
> > that number as its transaction xmin in its PGPROC entry on the master. …
Heikki Linnakangas writes:
> I don't actually understand how tight synchronization on its own would
> solve the problem. What if the connection to the master is lost? Do you
> kill all queries in the standby before reconnecting?
Sure. So what? They'd have been killed if they individually lost …
On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane wrote:
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master. …
Tom Lane wrote:
> Josh Berkus writes:
>> On 2/26/10 10:53 AM, Tom Lane wrote:
>>> I think that what we are going to have to do before we can ship 9.0
>>> is rip all of that stuff out and replace it with the sort of closed-loop
>>> synchronization Greg Smith is pushing. It will probably be several months …
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
If the main purpose …
Josh Berkus writes:
> On 2/26/10 10:53 AM, Tom Lane wrote:
>> I think that what we are going to have to do before we can ship 9.0
>> is rip all of that stuff out and replace it with the sort of closed-loop
>> synchronization Greg Smith is pushing. It will probably be several
>> months before everyone is forced to accept that …
On 2/26/10 10:53 AM, Tom Lane wrote:
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing. It will probably be several
> months before everyone is forced to accept that …
Tom Lane wrote:
> I'm going to make an unvarnished assertion here. I believe that the
> notion of synchronizing the WAL stream against slave queries is
> fundamentally wrong and we will never be able to make it work.
> The information needed isn't available in the log stream and can't be
> made available …
Greg Stark writes:
> In the model you describe any long-lived queries on the slave cause
> tables in the master to bloat with dead records.
Yup, same as they would do on the master.
> I think this model is on the roadmap but it's not appropriate for
> everyone and I think one of the benefits of
On Fri, Feb 26, 2010 at 4:43 PM, Richard Huxton wrote:
> Let's see if I've got the concepts clear here, and hopefully my thinking it
> through will help others reading the archives.
>
> There are two queues:
I don't see two queues. I only see the one queue of operations which
have been executed on …
On Fri, Feb 26, 2010 at 8:33 AM, Greg Smith wrote:
>
> I'm not sure what you might be expecting from the above combination, but
> what actually happens is that many of the SELECT statements on the table
> *that isn't even being updated* are canceled. You see this in the logs:
Well I proposed that …
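The log excerpt itself is lost to the preview; the 9.0 recovery-conflict
cancellation reads like this (reproduced from the release's wording, not
from the original mail):

    ERROR:  canceling statement due to conflict with recovery
    DETAIL:  User query might have needed to see row versions that must be removed.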