On Friday, June 07, 2013 2:10 AM Noah Misch wrote:
> On Thu, Jun 06, 2013 at 07:02:27PM +0530, Amit Kapila wrote:
> > On Tuesday, June 04, 2013 12:37 AM Noah Misch wrote:
>
> > This patch can give a good performance gain in the scenario you
> > described.
> > In fact I had taken the readings with the patch; it shows a similar gain.
On Thu, Jun 6, 2013 at 9:30 PM, Jeff Janes wrote:
> I would oppose that as the solution, either an unconditional one, or
> configurable with it as the default. Those segments are not unneeded. I
> need them. That is why I set up archiving in the first place. If you need
> to shut down the d
On 06/06/2013 09:30 PM, Jeff Janes wrote:
Archiving
---------
In some ways, this is the simplest case. Really, we just need a way to
know when the available WAL space has become 90% full, and abort
archiving at that stage. Once we stop attempting to archive, we can
cl
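A minimal C sketch of the 90%-full trigger described above; the function name and the byte-counting interface are assumptions for illustration, not existing PostgreSQL API:

    #include <stdbool.h>
    #include <stdint.h>

    /* Return true once WAL space crosses the 90%-full threshold at which
     * archiving would be aborted.  Integer arithmetic avoids float rounding. */
    static bool
    wal_space_nearly_full(uint64_t used_bytes, uint64_t total_bytes)
    {
        return used_bytes * 10 >= total_bytes * 9;
    }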
On Thursday, June 6, 2013, Josh Berkus wrote:
> Let's talk failure cases.
>
> There's actually three potential failure cases here:
>
> - One Volume: WAL is on the same volume as PGDATA, and that volume is
> completely out of space.
>
> - XLog Partition: WAL is on its own partition/volume, and fill
On Thursday, June 06, 2013 10:22 PM Robert Haas wrote:
> On Wed, Jun 5, 2013 at 7:24 AM, Amit Kapila
> wrote:
> > On Monday, May 27, 2013 4:17 PM Amit Kapila wrote:
> >> On Wednesday, April 03, 2013 11:55 AM Amit Kapila wrote:
> >> > On Tuesday, April 02, 2013 9:49 PM Peter Eisentraut wrote:
> >>
>
On 6/6/13 4:41 AM, Heikki Linnakangas wrote:
I was thinking of letting the estimate
decrease like a moving average, but react to any increases immediately.
Same thing we do in bgwriter to track buffer allocations:
Combine what your submitted patch does and this idea, and you'll have
something
On 6/6/13 4:42 AM, Joshua D. Drake wrote:
On 6/6/2013 1:11 AM, Heikki Linnakangas wrote:
(I'm sure you know this, but:) If you perform a checkpoint as fast and
short as possible, the sudden burst of writes and fsyncs will
overwhelm the I/O subsystem, and slow down queries. That's what we saw
before spread checkpoints: when a checkpo
Greg Stark writes:
> On Thu, Jun 6, 2013 at 10:46 PM, Tom Lane wrote:
>> : This rule guarantees that tuples on page M will have no children on page N,
>> : since (M+1) mod 3 != N mod 3.
> Even if the invariant was maintained, why doesn't that just mean you
> need three concurrent inserts to create the deadlock?
On Wed, 2013-06-05 at 08:43 +0800, amul sul wrote:
> Just want to ask: is what you want something like this?
> 1. You create the symbolic ref with _git symbolic-ref
> "refs/heads/REL9_3_STABLE" "refs/heads/master"_,
> 2. which will then show in _git branch_ as
> REL9_3_STABLE -> master
> * master
>
On Thu, Jun 6, 2013 at 4:28 PM, Christian Ullrich wrote:
> * Heikki Linnakangas wrote:
>
>> The current situation is that if you run out of disk space while writing
>> WAL, you get a PANIC, and the server shuts down. That's awful. We can
>
>
>> So we need to somehow stop new WAL insertions from happening, before
>> it's too late.
Let's talk failure cases.
There's actually three potential failure cases here:
- One Volume: WAL is on the same volume as PGDATA, and that volume is
completely out of space.
- XLog Partition: WAL is on its own partition/volume, and fills it up.
- Archiving: archiving is failing or too slow, cau
>> Given the behavior of xlog, I'd want to adjust the
>> algo so that peak usage on a 24-hour basis would affect current
>> preallocation. That is, if a site regularly has a peak from 2-3pm where
>> they're using 180 segments/cycle, then they should still be somewhat
>> higher at 2am than a datab
Given the recent ideas being thrown about changing how freezing and
clog is handled and MVCC catalog access I thought I would write out
the ideas that I have had about speeding up snapshots in case there is
an interesting tie in with the current discussions.
To refresh your memory the basic idea i
On Thu, Jun 6, 2013 at 10:38 PM, Andres Freund wrote:
> That's not a bad technique. I wonder how reliable it would be in
> postgres. Do all filesystems allow a rename() to succeed if there isn't
> actually any space left? E.g. on btrfs I wouldn't be sure. We need to
> rename because WAL files nee
On Thu, Jun 6, 2013 at 10:46 PM, Tom Lane wrote:
> To prevent
> : deadlocks we introduce a concept of "triple parity" of pages: if inner tuple
> : is on page with BlockNumber N, then its child tuples should be placed on the
> : same page, or else on a page with BlockNumber M where (N+1) mod 3 == M mod 3.
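To make the quoted rule concrete, a tiny C predicate capturing the invariant (a sketch for illustration, not code from the patch under discussion):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t BlockNumber;

    /* A child of an inner tuple on block 'parent' may live on the same
     * block, or on a block 'child' where (parent + 1) mod 3 == child mod 3.
     * Consequently a page N can never hold children of page M when
     * (M + 1) mod 3 != N mod 3, which rules out two-page lock cycles. */
    static bool
    parity_allows_child(BlockNumber parent, BlockNumber child)
    {
        return parent == child || (parent + 1) % 3 == child % 3;
    }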
On Thu, Jun 6, 2013 at 1:39 PM, Heikki Linnakangas
wrote:
> That will keep OldestXmin from advancing. Which will keep vacuum from
> advancing relfrozenxid/datfrozenxid. Which will first trigger the warnings
> about wrap-around, then stop new XIDs from being generated, and finally
> force a shutdown.
I've been looking into the problem reported at
http://www.postgresql.org/message-id/519a5917.40...@qunar.com
and what I find is that we have spgist insertion operations deadlocking
against each other because one is descending from page A to page B while
the other descends from page B to page A. Ac
On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote:
> * Heikki Linnakangas wrote:
>
> >The current situation is that if you run out of disk space while writing
> >WAL, you get a PANIC, and the server shuts down. That's awful. We can
>
> >So we need to somehow stop new WAL insertions from happening, before
> >it's too late.
On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer wrote:
> ->
> "I'll whack in some manual VACUUM cron jobs during low load maintenance
> hours and hope that keeps the worst of the problem away, that's what
> random forum posts on the Internet say to do".
> -> "oh my, why did my DB just do an emergenc
* Heikki Linnakangas wrote:
The current situation is that if you run out of disk space while writing
WAL, you get a PANIC, and the server shuts down. That's awful. We can
So we need to somehow stop new WAL insertions from happening, before
it's too late.
A naive idea is to check if there's
On 2013-06-06 12:34:01 -0700, Jeff Janes wrote:
> On Fri, May 24, 2013 at 11:51 AM, Greg Smith wrote:
>
> > On 5/24/13 9:21 AM, Robert Haas wrote:
> >
> > But I wonder if we wouldn't be better off coming up with a little more
> >> user-friendly API. Instead of exposing a cost delay, a cost limi
On Mon, Jun 3, 2013 at 6:34 AM, Kevin Grittner wrote:
>
>
> Where I hit a nightmare scenario with an anti-wraparound
> autovacuum, personally, was after an upgrade using pg_dump piped to
> psql. At a high OLTP transaction load time (obviously the most
> likely time for it to kick in, because it
On Thu, Jun 06, 2013 at 07:02:27PM +0530, Amit Kapila wrote:
> On Tuesday, June 04, 2013 12:37 AM Noah Misch wrote:
> This patch can give a good performance gain in the scenario you described.
> In fact I had taken the readings with the patch; it shows a similar gain.
Thanks for testing.
> This patch
On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer wrote:
> On 06/02/2013 05:56 AM, Robert Haas wrote:
>
> > (b) users
> > making ridiculous settings changes to avoid the problems caused by
> > anti-wraparound vacuums kicking in at inconvenient times and eating up
> > too many resources.
>
> Some rec
On 06.06.2013 20:24, Josh Berkus wrote:
Yeah, something like that :-). I was thinking of letting the estimate
decrease like a moving average, but react to any increases immediately.
Same thing we do in bgwriter to track buffer allocations:
Seems reasonable.
Here's a patch implementing that. D
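A sketch of the asymmetric estimate described above, modeled loosely on how bgwriter smooths buffer-allocation counts (the smoothing constant is an assumption):

    /* Let the estimate decay like a moving average, but jump immediately
     * when usage increases. */
    #define SMOOTHING_FACTOR 0.90

    static double estimated_usage = 0;

    static void
    update_estimate(double usage_this_cycle)
    {
        if (usage_this_cycle > estimated_usage)
            estimated_usage = usage_this_cycle;   /* react to increases at once */
        else
            estimated_usage = SMOOTHING_FACTOR * estimated_usage +
                              (1.0 - SMOOTHING_FACTOR) * usage_this_cycle;
    }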
On Thu, Jun 6, 2013 at 3:34 PM, Jeff Janes wrote:
> On Fri, May 24, 2013 at 11:51 AM, Greg Smith wrote:
>>
>> On 5/24/13 9:21 AM, Robert Haas wrote:
>>
>>> But I wonder if we wouldn't be better off coming up with a little more
>>> user-friendly API. Instead of exposing a cost delay, a cost limit
On Thu, Jun 6, 2013 at 1:42 AM, Joshua D. Drake wrote:
>
>
> I may be confused but it is my understanding that bgwriter writes out the
> data from the shared buffer cache that is dirty based on an interval and a
> max pages written.
It primarily writes out based on how many buffers have recently
On Fri, May 24, 2013 at 11:51 AM, Greg Smith wrote:
> On 5/24/13 9:21 AM, Robert Haas wrote:
>
> But I wonder if we wouldn't be better off coming up with a little more
>> user-friendly API. Instead of exposing a cost delay, a cost limit,
>> and various charges, perhaps we should just provide li
On 18 September 2012 10:32, Hitoshi Harada wrote:
> As the wiki says, BERNOULLI relies on the statistics of the table, which
> doesn't sound good to me. Of course we could say this is our
> restriction and say good-bye to users who haven't run ANALYZE first,
> but it is too hard for normal users to
On 6/5/13 3:49 PM, Robert Haas wrote:
Now, I did find a couple that I thought should probably stick with
SnapshotNow, specifically pgrowlocks and pgstattuple.
FWIW, I've often wished for a way to make all stat access transactional, across all the
stats views. Perhaps that couldn't be done by d
On Wed, Jun 5, 2013 at 8:20 PM, Joshua D. Drake wrote:
>
> On 06/05/2013 05:37 PM, Robert Haas wrote:
>
> - If it looks like we're going to exceed limit #3 before the
>> checkpoint completes, we start exerting back-pressure on writers by
>> making them wait every time they write WAL, probably in
>> Then I suggest we not use exactly that name. I feel quite sure we
>> would get complaints from people if something labeled as "max" was
>> exceeded -- especially if they set that to the actual size of a
>> filesystem dedicated to WAL files.
>
> You're probably right. Any suggestions for a better name?
According to http://www.postgresql.org/docs/9.2/static/libpq-async.html
"Even when PQresultStatus indicates a fatal error, PQgetResult should be
called
until it returns a null pointer, to allow libpq to process the error
information completely."
In libpq/fe-exec.c:PQexecFinish() error messages merg
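For reference, the drain loop the quoted documentation calls for looks roughly like this (a minimal sketch using only documented libpq calls):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Keep calling PQgetResult() until it returns NULL, even after a fatal
     * error, so libpq can finish processing the error information. */
    static void
    drain_results(PGconn *conn)
    {
        PGresult *res;

        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_FATAL_ERROR)
                fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
            PQclear(res);
        }
    }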
Daniel,
So your suggestion is that if archiving is falling behind, we should
introduce delays on COMMIT in order to slow down the rate of WAL writing?
Just so I'm clear.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Wed, Jun 5, 2013 at 10:46 AM, Andrew Dunstan wrote:
> In 9.2, the JSON parser didn't check the validity of the use of unicode
> escapes other than that it required 4 hex digits to follow '\u'. In 9.3,
> that is still the case. However, the JSON accessor functions and operators
> also try to tur
On Wed, Jun 5, 2013 at 7:24 AM, Amit Kapila wrote:
> On Monday, May 27, 2013 4:17 PM Amit Kapila wrote:
>> On Wednesday, April 03, 2013 11:55 AM Amit Kapila wrote:
>> > On Tuesday, April 02, 2013 9:49 PM Peter Eisentraut wrote:
>>
>
> There are 2 options to proceed for this patch for 9.4
>
> 1. Upl
On Thu, Jun 6, 2013 at 5:30 AM, Andres Freund wrote:
>> + * XXX: Now that we have MVCC catalog access, the reasoning above is
>> + * no longer true. Are there other good reasons to hard-code this, or
>> + * should we revisit that decision?
>
> We could just the function by looking in the
2013/6/6 Thom Brown :
> Hi,
>
> When a statement is cancelled due to it running for long enough for
> statement_timeout to take effect, it logs a message:
>
> ERROR: canceling statement due to statement timeout
>
> However, it doesn't log what the timeout was at the time of the
> cancellation. Th
On Tue, Jun 4, 2013 at 2:50 PM, Kohei KaiGai wrote:
> Also, I don't think ExecNodeExtender is a good name, because it is a
> bit long and its abbreviation (ENE?) makes the feature hard to guess.
> Please give this feature a cool and easily understandable name.
I agree that "Extender" doesn't s
On Mon, May 27, 2013 at 10:32 AM, Christopher Browne wrote:
> On Mon, May 27, 2013 at 1:42 AM, Gurjeet Singh wrote:
>
>>
>>
>>> Joking about "640K" aside, it doesn't seem reasonable to expect a truly
>>> enormous query as is generated by the broken forms of this logic to turn
>>> out happily. I'
Hi,
When a statement is cancelled due to it running for long enough for
statement_timeout to take effect, it logs a message:
ERROR: canceling statement due to statement timeout
However, it doesn't log what the timeout was at the time of the
cancellation. This may be set in postgresql.conf, the
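One shape the requested change could take, as a sketch (the errdetail wording is an assumption, not an agreed design):

    #include "postgres.h"
    #include "storage/proc.h"       /* StatementTimeout */

    /* Report the active timeout value alongside the cancellation error. */
    static void
    report_statement_timeout(void)
    {
        ereport(ERROR,
                (errcode(ERRCODE_QUERY_CANCELED),
                 errmsg("canceling statement due to statement timeout"),
                 errdetail("statement_timeout is currently %d ms.",
                           StatementTimeout)));
    }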
On 06.06.2013 17:17, Andres Freund wrote:
On 2013-06-06 17:00:30 +0300, Heikki Linnakangas wrote:
A more workable idea is to sprinkle checks in higher-level code, before you
hold any critical locks, to check that there is enough preallocated WAL.
Like, at the beginning of heap_insert, heap_update, etc., and all similar
indexam entry points.
On 2013-06-06 10:22:14 -0400, Robert Haas wrote:
> On Thu, May 30, 2013 at 2:29 AM, Andres Freund wrote:
> >> Yeah, I think it's fine. The patch also looks fine, although I think
> >> the comments could use a bit of tidying. I guess we need to
> >> back-patch this all the way back to 8.4? It wi
Bruce Momjian wrote:
> In a private bug report, I have realized that if you are eventually
> going to be using link mode with pg_upgrade, and you run --check mode,
> you should use --link with --check to check that both clusters are on
> the same file system.
Would it make sense to run the filesys
On Thu, May 30, 2013 at 2:29 AM, Andres Freund wrote:
>> Yeah, I think it's fine. The patch also looks fine, although I think
>> the comments could use a bit of tidying. I guess we need to
>> back-patch this all the way back to 8.4? It will require some
>> adjustments for the older branches.
>
On 2013-06-06 17:00:30 +0300, Heikki Linnakangas wrote:
> A more workable idea is to sprinkle checks in higher-level code, before you
> hold any critical locks, to check that there is enough preallocated WAL.
> Like, at the beginning of heap_insert, heap_update, etc., and all similar
> indexam entry points.
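A sketch of what such a high-level check might look like at the top of heap_insert() and friends; XLogHaveFreeSpace() is a hypothetical name, not an existing function:

    #include "postgres.h"

    extern bool XLogHaveFreeSpace(void);    /* hypothetical, for illustration */

    /* Fail cleanly with a plain ERROR while no critical locks are held,
     * instead of PANICking inside the WAL-insert critical section. */
    static void
    check_wal_space_or_error(void)
    {
        if (!XLogHaveFreeSpace())
            ereport(ERROR,
                    (errcode(ERRCODE_DISK_FULL),
                     errmsg("could not reserve WAL space")));
    }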
In a private bug report, I have realized that if you are eventually
going to be using link mode with pg_upgrade, and you run --check mode,
you should use --link with --check to check that both clusters are on
the same file system.
I have documented this with the attached, applied patch, and backpa
In the "Redesigning checkpoint_segments" thread, many people opined that
there should be a hard limit on the amount of disk space used for WAL:
http://www.postgresql.org/message-id/CA+TgmoaOkgZb5YsmQeMg8ZVqWMtR=6s4-ppd+6jiy4oq78i...@mail.gmail.com.
I'm starting a new thread on that, because that
On Tuesday, June 04, 2013 12:37 AM Noah Misch wrote:
> A colleague, Korry Douglas, observed a table partitioning scenario
> where deserializing pg_constraint.ccbin is a hot spot. The following
> test case, a simplification of a typical partitioning setup, spends 28%
> of its time in
> stringToNode
On 06/05/2013 09:13:45 PM, Peter Eisentraut wrote:
> On Tue, 2013-06-04 at 22:27 -0500, Karl O. Pinc wrote:
> > On 06/04/2013 10:16:20 PM, Peter Eisentraut wrote:
> > > On Tue, 2013-05-07 at 23:18 -0400, Alvaro Herrera wrote:
> > > > Peter Eisentraut wrote:
> > > > > On Tue, 2013-05-07 at 00:32 -05
On 06.06.2013 15:31, Kevin Grittner wrote:
Heikki Linnakangas wrote:
On 05.06.2013 22:18, Kevin Grittner wrote:
Heikki Linnakangas wrote:
I was not thinking of making it a hard limit. It would be just
like checkpoint_segments from that point of view - if a
checkpoint takes a long time, max_wal_size might still be exceeded.
On 06.06.2013 15:16, Greg Stark wrote:
On Fri, May 31, 2013 at 3:04 AM, Robert Haas wrote:
Even at a more modest 10,000 tps, with default
settings, you'll do anti-wraparound vacuums of the entire cluster
about every 8 hours. That's not fun.
I've forgotten now. What happens if you have a long
Heikki Linnakangas wrote:
> On 05.06.2013 22:18, Kevin Grittner wrote:
>> Heikki Linnakangas wrote:
>>
>>> I was not thinking of making it a hard limit. It would be just
>>> like checkpoint_segments from that point of view - if a
>>> checkpoint takes a long time, max_wal_size might still be
>>> exceeded.
On Fri, May 31, 2013 at 3:04 AM, Robert Haas wrote:
> Even at a more modest 10,000 tps, with default
> settings, you'll do anti-wraparound vacuums of the entire cluster
> about every 8 hours. That's not fun.
I've forgotten now. What happens if you have a long-lived transaction
still alive from >
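For scale, a rough back-of-envelope on the figure quoted above, assuming one XID consumed per transaction and the default autovacuum_freeze_max_age of 200 million: 200,000,000 / 10,000 tps = 20,000 seconds, or about 5.6 hours between anti-wraparound scans, the same order of magnitude as the 8 hours quoted.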
On 05.06.2013 22:24, Fujii Masao wrote:
On Thu, Jun 6, 2013 at 3:35 AM, Heikki Linnakangas
wrote:
The checkpoint spreading code already tracks if the checkpoint is "on
schedule", and it takes into account both checkpoint_timeout and
checkpoint_segments. Ie. if you consume segments faster than
On 05.06.2013 22:18, Kevin Grittner wrote:
Heikki Linnakangas wrote:
I was not thinking of making it a hard limit. It would be just
like checkpoint_segments from that point of view - if a
checkpoint takes a long time, max_wal_size might still be
exceeded.
Then I suggest we not use exactly that name.
On 06.06.2013 11:42, Joshua D. Drake wrote:
On 6/6/2013 1:11 AM, Heikki Linnakangas wrote:
Yes, checkpoint_segments is awkward. We shouldn't have to set it at all.
It should be gone.
The point of having checkpoint_segments or max_wal_size is to put a
limit (albeit a soft one) on the amount of d
2013/6/6 Tatsuo Ishii
> > Hi.
> >
> > At the moment libpq doesn't seem to support asynchronous and
> > non-blocking support for large objects, in the style of
> > PQsendQuery/PQgetResult. This makes large objects hardly suited for
> > single-threaded programs based on some variant of select().
>
Hi Robert,
Took a quick look through the patch to understand what your current
revision is actually doing and to facilitate thinking about possible
pain points.
Here are the notes I made during my reading:
On 2013-06-03 14:57:12 -0400, Robert Haas wrote:
> +++ b/src/backend/catalog/catalog.c
> @
On 6/6/2013 1:11 AM, Heikki Linnakangas wrote:
(I'm sure you know this, but:) If you perform a checkpoint as fast and
short as possible, the sudden burst of writes and fsyncs will
overwhelm the I/O subsystem, and slow down queries. That's what we saw
before spread checkpoints: when a checkpo
On 05.06.2013 23:16, Josh Berkus wrote:
For limiting the time required to recover after crash,
checkpoint_segments is awkward because it's difficult to calculate how
long recovery will take, given checkpoint_segments=X. A bulk load can
use up segments really fast, and recovery will be fast, while
On 2013-06-05 18:56:28 -0400, Tom Lane wrote:
> Robert Haas writes:
> > Now, I did find a couple that I thought should probably stick with
> > SnapshotNow, specifically pgrowlocks and pgstattuple. Those are just
> > gathering statistical information, so there's no harm in having the
> > snapshot c
On 06.06.2013 06:20, Joshua D. Drake wrote:
3. The spread checkpoints have always confused me. If anything we want a
checkpoint to be fast and short because:
(I'm sure you know this, but:) If you perform a checkpoint as fast and
short as possible, the sudden burst of writes and fsyncs will overwhelm
the I/O subsystem, and slow down queries.
On Jun 6, 2013 4:14 AM, "Peter Eisentraut" wrote:
>
> On Tue, 2013-06-04 at 22:27 -0500, Karl O. Pinc wrote:
> > On 06/04/2013 10:16:20 PM, Peter Eisentraut wrote:
> > > On Tue, 2013-05-07 at 23:18 -0400, Alvaro Herrera wrote:
> > > > Peter Eisentraut wrote:
> > > > > On Tue, 2013-05-07 at 00:32 -
On 6/5/2013 11:31 PM, Peter Geoghegan wrote:
On Wed, Jun 5, 2013 at 11:28 PM, Joshua D. Drake wrote:
I have zero doubt that in your case it is true and desirable. I just don't
know that it is a positive solution to the problem as a whole. Your case is
rather limited to your environment, which
On 6/5/2013 11:25 PM, Harold Giménez wrote:
Instead of "running out of disk space PANIC" we should just write
to an emergency location within PGDATA
This merely buys you some time, but with aggressive and sustained
write throughput you are left in the same spot. Practically speaking
On 5 June 2013 08:59, Dean Rasheed wrote:
> I'm still not happy with pg_view_is_updatable() et al. and the
> information_schema views. I accept that the information_schema views
> have to be the way they are because that's what's defined in the
> standard, but as it stands, the distinction between
On Tuesday, May 28, 2013 6:54 PM Robert Haas wrote:
> >> Instead, I suggest modifying BgBufferSync, specifically this part
> >> right here:
> >>
> >>     else if (buffer_state & BUF_REUSABLE)
> >>         reusable_buffers++;
> >>
> >> What I would suggest is that if the BUF_REUSABLE flag