Re: [GENERAL] checkpoint and recovering process use too much memory

2017-11-05 Thread tao tony
Thank you, Justin Pryzby. I reset shared_buffers to 16GB, and the memory usage of the checkpoint and recovery processes stayed at 16GB. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 192956 postgres 20 0 18.5g 16g 16g S 1.3 25.9 19:44.69 postgres: startup process   recover

Re: [GENERAL] checkpoint and recovering process use too much memory

2017-11-02 Thread Andres Freund
Hi, On 2017-11-03 01:43:32 +, tao tony wrote: > I had an asynchronous streaming replication HA cluster. Each node had 64G > memory. pg is 9.6.2 and deployed on centos 6. > > > Last month the database was killed by the OS kernel for OOM; the checkpoint > process was killed. > > > I noticed checkp

Re: [GENERAL] checkpoint and recovering process use too much memory

2017-11-02 Thread Justin Pryzby
On Fri, Nov 03, 2017 at 01:43:32AM +, tao tony wrote: > I had an asynchronous streaming replication HA cluster. Each node had 64G > memory. pg is 9.6.2 and deployed on centos 6. > > Last month the database was killed by the OS kernel for OOM; the checkpoint > process was killed. If you still have l

[GENERAL] checkpoint and recovering process use too much memory

2017-11-02 Thread tao tony
Hi dears, I had an asynchronous streaming replication HA cluster. Each node had 64G memory. pg is 9.6.2 and deployed on centos 6. Last month the database was killed by the OS kernel for OOM; the checkpoint process was killed. I noticed the checkpoint process occupied more than 20GB of memory, and it wa

Re: [GENERAL] Checkpoint write time - anything unusual?

2017-10-03 Thread Laurenz Albe
pinker wrote: > I've just run pgBadger on my pg logs and wonder if those checkpoint > statistics are something I should worry about or not? > The highest write time is about 47 minutes but I'm not sure if that's > checkpoint_completion_target*checkpoint_target value or real time between > sending th

[GENERAL] Checkpoint write time - anything unusual?

2017-10-02 Thread pinker
I've just run pgBadger on my pg logs and wonder if those checkpoint statistics are something I should worry about or not? The highest write time is about 47 minutes but I'm not sure if that's checkpoint_completion_target*checkpoint_target value or real time between sending the command to write and g

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
> > Hi, I had already read that doc but I can't answer clearly to my >> questions 2,4 and 5. >> > > The answer would seem to depend on what you consider 'a consistency state > position'. Is it possible to be more explicit about what you mean? > >> >> Hi, I meant a position such that, if you replay

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
> > > Hi, > > so let's suppose that the WAL is: > > LSN 10: start transaction 123 > > LSN 11: update tuple 100 > >checkpoint position here (not a record but just for understanding) > > LSN 12: update tuple 100 > > LSN 13: update tuple 100 > > LSN 14: checkpoint record ( postion=11) > > LSN 15:

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Alvaro Herrera
Tom DalPozzo wrote: > Hi, > so let's suppose that the WAL is: > LSN 10: start transaction 123 > LSN 11: update tuple 100 >checkpoint position here (not a record but just for understanding) > LSN 12: update tuple 100 > LSN 13: update tuple 100 > LSN 14: checkpoint record ( postion=11) > LSN 15:

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
> > Whether any individual tuple in the data files is visible or not depends > not only on the data itself, but also on the commit status of the > transactions that created it (and deleted it, if any). Replaying WAL > also updates the commit status of transactions, so if you're in the > middle of

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Alvaro Herrera
Tom DalPozzo wrote: > 2) I see that a checkpoint position can be right in the middle of a group > of records related to a transaction (in the example, transaction id 10684). > So a checkpoint position is NOT a consistency state point, right? > 4) If I'm right at 2) then, between the checkpoint po

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver
On 01/09/2017 01:10 PM, Tom DalPozzo wrote: Reread your original post and realized you were also asking about transaction consistency and WALs. The thumbnail version is that Postgres writes transactions to the WALs before they are written to the data files on disk

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
> > Reread your original post and realized you were also asking about >> transaction consistency and WALs. The thumbnail version is that Postgres >> writes transactions to the WALs before they are written to the data files >> on disk. A checkpoint represents a point in the sequence when it is know

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver
On 01/09/2017 06:47 AM, Tom DalPozzo wrote: https://www.postgresql.org/docs/9.5/static/wal-internals.html "After a checkpoint has been made and the log flushed, the checkpoint's position is saved in the file

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver
On 01/09/2017 06:47 AM, Tom DalPozzo wrote: https://www.postgresql.org/docs/9.5/static/wal-internals.html "After a checkpoint has been made and the log flushed, the checkpoint's position is saved in the file

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
> > https://www.postgresql.org/docs/9.5/static/wal-internals.html >> > > "After a checkpoint has been made and the log flushed, the checkpoint's > position is saved in the file pg_control. Therefore, at the start of > recovery, the server first reads pg_control and then the checkpoint record; > the

Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver
On 01/09/2017 06:14 AM, Tom DalPozzo wrote: Hi, I need some clarifications about checkpoints. Below is a log from my standby server at startup, followed by some relevant parts of the WAL from the master's cluster, obtained by pg_xlogdump, just to have an example to discuss. 1) I see: "LOG: redo

[GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
Hi, I need some clarifications about checkpoints. Below is a log from my standby server at startup, followed by some relevant parts of the WAL from the master's cluster, obtained by pg_xlogdump, just to have an example to discuss. 1) I see: "LOG: redo starts at 1/F00A7448" . I was expecting a che
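A note for readers following the LSN notation in this thread: a WAL position such as the "redo starts at 1/F00A7448" quoted above is written as two hexadecimal halves of a 64-bit byte offset. The following is only a minimal Python sketch of that convention; the helper names lsn_to_int and wal_bytes_between are hypothetical and are not part of PostgreSQL or pg_xlogdump.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN written as 'hi/lo' (both hexadecimal) to one integer.

    The part before the slash is the high 32 bits, the part after it
    the low 32 bits of the WAL byte position.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_bytes_between(start: str, end: str) -> int:
    """Number of WAL bytes between two LSNs (negative if end precedes start)."""
    return lsn_to_int(end) - lsn_to_int(start)


if __name__ == "__main__":
    redo = "1/F00A7448"  # value taken from the log line quoted in the post
    print(redo, "->", lsn_to_int(redo))
```

Comparing the redo position with the position of the checkpoint record in this way shows how much WAL has to be replayed before the checkpoint record itself is reached.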

Re: [GENERAL] checkpoint write errors ( getting worse )

2016-10-24 Thread CS DBA
Understood, thanks. This is a new server fired up for our client by Rackspace. Not really impressed so far: for the first several days we had major performance issues even though the new HW had more memory, more/faster CPUs and faster IO. It turned out Rackspace had turned on cpu throttling l

Re: [GENERAL] checkpoint write errors ( getting worse )

2016-10-23 Thread Michael Paquier
On Sun, Oct 23, 2016 at 12:45 PM, CS DBA wrote: > would a dump/restore correct these issues? Not directly, but it would give a logical representation of your data, or a good start image that you could deploy on a server that has fewer problems. You seem to be facing advanced issues with your hardw

Re: [GENERAL] checkpoint write errors ( getting worse )

2016-10-22 Thread CS DBA
also, any thoughts on what could be causing these issues? On 10/22/2016 05:59 PM, Tom Lane wrote: CS DBA writes: So I ran REINDEX on all the db's and the errors went away for a bit. Now I'm seeing this: Log entries like this:FATAL: could not read block 0 of relation base/1311892067/2687: rea

Re: [GENERAL] checkpoint write errors ( getting worse )

2016-10-22 Thread CS DBA
would a dump/restore correct these issues? On 10/22/2016 05:59 PM, Tom Lane wrote: CS DBA writes: So I ran REINDEX on all the db's and the errors went away for a bit. Now I'm seeing this: Log entries like this:FATAL: could not read block 0 of relation base/1311892067/2687: read only 0 of 81

Re: [GENERAL] checkpoint write errors ( getting worse )

2016-10-22 Thread Tom Lane
CS DBA writes: > So I ran REINDEX on all the db's and the errors went away for a bit. Now > I'm seeing this: > Log entries like this:FATAL: could not read block 0 of relation > base/1311892067/2687: read only 0 of 8192 bytes You have a problem there, because: regression=# select 2687::regcla

Re: [GENERAL] checkpoint write errors ( getting worse )

2016-10-22 Thread CS DBA
So I ran REINDEX on all the db's and the errors went away for a bit. Now I'm seeing log entries like this: FATAL: could not read block 0 of relation base/1311892067/2687: read only 0 of 8192 bytes So I checked which db it is: $ psql -h localhost psql (8.4.20) Type "help" for help. p

Re: [GENERAL] checkpoint write errors

2016-10-22 Thread CS DBA
Thanks the REINDEX fixed it, it's a client of ours and we're pushing to get them to move to 9.5 On 10/21/2016 06:33 PM, Tom Lane wrote: CS DBA writes: we're seeing the below errors over and over in the logs of one of our postgres databases. Version 8.4.22 [ you really oughta get off 8.4, b

Re: [GENERAL] checkpoint write errors

2016-10-21 Thread Tom Lane
CS DBA writes: > we're seeing the below errors over and over in the logs of one of our > postgres databases. Version 8.4.22 [ you really oughta get off 8.4, but you knew that right? ] > Anyone have any thoughts on correcting/debugging it? > ERROR: xlog flush request 2571/9C141530 is not satis

[GENERAL] checkpoint write errors

2016-10-21 Thread CS DBA
Hi all; we're seeing the below errors over and over in the logs of one of our postgres databases. Version 8.4.22 Anyone have any thoughts on correcting/debugging it? Maybe I need to run a REINDEX on whatever table equates to "base/1029860192/1029863651"? If so how do I determine the db and

Re: [GENERAL] Checkpoint Err on Startup of Rsynced System

2016-06-01 Thread Jim Longwill
Jeff Janes, Ok. I checked this further and found that the pg_xlog area is symlinked to another area, and indeed that other area was not being rsynced (!) although I thought it was. So I fixed this, re-ran it, and now it is working. Now I believe I have a stable postgres running on M2.

Re: [GENERAL] Checkpoint Err on Startup of Rsynced System

2016-06-01 Thread Jeff Janes
On Tue, May 31, 2016 at 10:13 AM, Jim Longwill wrote: > I am trying to setup a 2nd, identical, db server (M2) for development and > I've run into a problem with starting up the 2nd Postgres installation. > > Here's what I've done: > 1) did a 'clone' of 1st (production) machine M1 (so both machin

Re: [GENERAL] Checkpoint Err on Startup of Rsynced System

2016-05-31 Thread Jim Longwill
Scott, Thanks. If I understand you correctly... Actually, we did have M1 shut down when the initial clone was done (some weeks ago). That was done using the VMWare system, not rsync. My main problem is that I don't have WAL archiving set up yet (I've not changed the Postgres defaults on this

Re: [GENERAL] Checkpoint Err on Startup of Rsynced System

2016-05-31 Thread Venkata Balaji N
On Wed, Jun 1, 2016 at 3:13 AM, Jim Longwill wrote: > I am trying to setup a 2nd, identical, db server (M2) for development and > I've run into a problem with starting up the 2nd Postgres installation. > > Here's what I've done: > 1) did a 'clone' of 1st (production) machine M1 (so both machine

Re: [GENERAL] Checkpoint Err on Startup of Rsynced System

2016-05-31 Thread Alan Hodgson
On Tuesday, May 31, 2016 10:13:14 AM Jim Longwill wrote: > I am trying to setup a 2nd, identical, db server (M2) for development > and I've run into a problem with starting up the 2nd Postgres installation. > > Here's what I've done: >1) did a 'clone' of 1st (production) machine M1 (so both ma

Re: [GENERAL] Checkpoint Err on Startup of Rsynced System

2016-05-31 Thread Scott Mead
On Tue, May 31, 2016 at 1:13 PM, Jim Longwill wrote: > I am trying to setup a 2nd, identical, db server (M2) for development and > I've run into a problem with starting up the 2nd Postgres installation. > > Here's what I've done: > 1) did a 'clone' of 1st (production) machine M1 (so both machin

[GENERAL] Checkpoint Err on Startup of Rsynced System

2016-05-31 Thread Jim Longwill
I am trying to set up a 2nd, identical db server (M2) for development and I've run into a problem with starting up the 2nd Postgres installation. Here's what I've done: 1) did a 'clone' of 1st (production) machine M1 (so both machines on CentOS 7.2) 2) set up an rsync operation, did a comp

Re: [GENERAL] checkpoint

2014-07-10 Thread Yves Dorfsman
On 2014-07-10 13:02, Guillaume Lelarge wrote: > 2014-07-10 20:56 GMT+02:00 Yves Dorfsman: > > > Hi, > > If I run checkpoint from psql, is it applied to all the databases? > > What if I do it through an API? When connecting with psycopg2, I'm forced > to >

Re: [GENERAL] checkpoint

2014-07-10 Thread Guillaume Lelarge
2014-07-10 20:56 GMT+02:00 Yves Dorfsman: > > Hi, > > If I run checkpoint from psql, is it applied to all the databases? > > What if I do it through an API? When connecting with psycopg2, I'm forced to > specify a database name, if I use "dbname=postgres", and execute > "checkpoint;", is it applie

[GENERAL] checkpoint

2014-07-10 Thread Yves Dorfsman
Hi, If I run checkpoint from psql, is it applied to all the databases? What if I do it through an API? When connecting with psycopg2, I'm forced to specify a database name. If I use "dbname=postgres" and execute "checkpoint;", is it applied to all the databases? Thanks. -- Yves.

Re: [GENERAL] checkpoint logs

2011-09-07 Thread Tomas Vondra
On 7 September 2011, 21:26, Martín Marqués wrote: > I'm logging checkpoints to see how the background writer is working, > and I bumped into log information that I don't fully understand: > > LOG: checkpoint complete: wrote 5015 buffers (15.1%); 0 transaction > log file(s) added, 0 removed, 15 recycle

[GENERAL] checkpoint logs

2011-09-07 Thread Martín Marqués
I'm logging checkpoints to see how the background writer is working, and I bumped into log information that I don't fully understand: LOG: checkpoint complete: wrote 5015 buffers (15.1%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=1004.333 s, sync=0.106 s, total=1004.571 s 5
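To make the fields of that log line easier to read, here is a minimal Python sketch that splits a "checkpoint complete" message into named values. It assumes the exact wording quoted in the post (other PostgreSQL versions word this message differently), and it assumes the default 8 kB block size when converting buffers to megabytes; the function name parse_checkpoint_line is hypothetical.

```python
import re

# Pattern for the message format quoted in the post; an assumption, since
# other server versions use different wording for the same message.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers "
    r"\((?P<pct>[\d.]+)%\); "
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, (?P<recycled>\d+) recycled; "
    r"write=(?P<write_s>[\d.]+) s, sync=(?P<sync_s>[\d.]+) s, "
    r"total=(?P<total_s>[\d.]+) s"
)

BLOCK_SIZE = 8192  # bytes; assumes the default PostgreSQL block size


def parse_checkpoint_line(line: str) -> dict:
    """Return the numeric fields of a 'checkpoint complete' log line."""
    m = CHECKPOINT_RE.search(line)
    if m is None:
        raise ValueError("not a 'checkpoint complete' line in the expected format")
    d = m.groupdict()
    result = {k: int(d[k]) for k in ("buffers", "added", "removed", "recycled")}
    result.update({k: float(d[k]) for k in ("pct", "write_s", "sync_s", "total_s")})
    # Volume written during the checkpoint, in MB.
    result["written_mb"] = result["buffers"] * BLOCK_SIZE / (1024 * 1024)
    return result


if __name__ == "__main__":
    line = ("LOG: checkpoint complete: wrote 5015 buffers (15.1%); "
            "0 transaction log file(s) added, 0 removed, 15 recycled; "
            "write=1004.333 s, sync=0.106 s, total=1004.571 s")
    for key, value in parse_checkpoint_line(line).items():
        print(key, value)
```

Read this way, the quoted line describes roughly 39 MB written over about 1004 seconds: the write phase was spread out over a long interval and the final sync took only about a tenth of a second.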

Re: [GENERAL] checkpoint spikes

2010-06-14 Thread Janning
Hi Martijn, hi Greg, thank you very much for your help. We finally got rid of these annoying spikes. First we tried to set checkpoint_segments = 3 (before: 16) and checkpoint_timeout = 5min (before: 60min), which didn't really help. We had the same spikes but more often. Then we tried to lo

Re: [GENERAL] checkpoint spikes

2010-06-11 Thread Martijn van Oosterhout
On Thu, Jun 10, 2010 at 04:00:54PM -0400, Greg Smith wrote: >> 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we >> have 12 GB RAM and rather slow disks, 0.5% would result in a maximum of >> 61MB dirty pages. > > Nope. Linux has absolutely terrible controls for this critic
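The 61MB figure in the quoted question can be checked with simple arithmetic. The sketch below is a simplification: it applies the percentage to total RAM, as the poster did, whereas the kernel applies it to the memory it considers available for dirty pages. The function name dirty_limit_mb is hypothetical.

```python
def dirty_limit_mb(ram_gb: float, ratio_percent: float) -> float:
    """Approximate dirty-page threshold in MB for a given ratio.

    Simplification: uses total RAM, as in the quoted post; the kernel
    bases the percentage on available memory, so real values are lower.
    """
    return ram_gb * 1024 * ratio_percent / 100


if __name__ == "__main__":
    # Figures from the thread: 12 GB RAM, a ratio of 2 (the value set
    # earlier in the thread) versus the 0.5 the poster asks about.
    for ratio in (2, 0.5):
        print(f"ratio {ratio}% of 12 GB -> {dirty_limit_mb(12, ratio):.2f} MB")
```

With 12 GB, 0.5% comes to 61.44 MB, which matches the poster's number, while the ratio of 2 set earlier in the thread corresponds to about 246 MB.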

Re: [GENERAL] checkpoint spikes

2010-06-11 Thread Greg Smith
Janning wrote: most docs I found relate to 8.2 and 8.3. In terms of checkpoints, is 8.4 comparable to 8.3? It would be nice if you updated your article to reflect 8.4. There haven't been any changes made in this area since 8.3, that's why there's been no update. 8.4 and 9.0 have exactly the s

Re: [GENERAL] checkpoint spikes

2010-06-11 Thread Janning
On Thursday 10 June 2010 22:00:54 Greg Smith wrote: > Janning wrote: > > 1. With raising checkpoint_timeout, is there any downgrade other than > > slower after-crash recovery? > > Checkpoint spikes happen when too much I/O has been saved up for > checkpoint time than the server can handle. While t

Re: [GENERAL] checkpoint spikes

2010-06-10 Thread Greg Smith
Janning wrote: 1. With raising checkpoint_timeout, is there any downside other than slower after-crash recovery? Checkpoint spikes happen when more I/O has been saved up for checkpoint time than the server can handle. While this is normally handled by the checkpoint spreading logic,

Re: [GENERAL] checkpoint spikes

2010-06-10 Thread Vick Khera
On Thu, Jun 10, 2010 at 12:49 PM, Janning wrote: > 1. With raising checkpoint_timeout, is there any downgrade other than slower > after-crash recovery? Depends on how busy your DB is, and how many checkpoint segments you have. All the timeout does is say, "if we have not done a checkpoint this l

Re: [GENERAL] checkpoint spikes

2010-06-10 Thread Janning
Hi again, nobody answered my question :-(, so I did some research. I was convinced to set: checkpoint_segments = 16, checkpoint_timeout = 60min, and echo 2 > /proc/sys/vm/dirty_background_ratio. This helped a lot. Our freeze time was reduced from 10 seconds to 5 seconds. But this is still way to

[GENERAL] checkpoint spikes

2010-06-09 Thread Janning
Hi, we currently encounter an increasing load on our website. With the increasing load we see some problems on our database. so we checked what happens and we saw spikes in our load when checkpoints are about to finish. Our configuration: max_connections = 125 ssl = false shared_buffers = 500M

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Cory Isaacson
pg_xlog -rw--- postgres postgres root:object_r:postgresql_db_t 0001 drwx-- postgres postgres root:object_r:postgresql_db_t archive_status > From: Tom Lane > Date: Mon, 14 Sep 2009 12:09:48 -0400 > To: Cory Isaacson > Cc: > Subject: Re: [GENERAL] Che

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Cory Isaacson
postgres: itt itt_dev 127.0.0.1(49593) idle postgres 26879 25752 0 00:22 ?00:00:00 postgres: itt itt_dev 127.0.0.1(49595) idle > From: Tom Lane > Date: Mon, 14 Sep 2009 10:29:54 -0400 > To: Cory Isaacson > Cc: > Subject: Re: [GENERAL] Checkpoint request failed, permission d

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Tom Lane
Cory Isaacson writes: > Here are the permissions on pg_xlog: > drwx-- 3 postgres postgres 4096 Sep 13 22:19 pg_xlog Well, that certainly looks right. I'm back to suspecting selinux ... have you tried "ls -Z"? I'm not totally sure about RHEL5, but in recent Fedora it should look like drwx-

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Cory Isaacson
To: Cory Isaacson > Cc: > Subject: Re: [GENERAL] Checkpoint request failed, permission denied > > Cory Isaacson writes: >> They look right to me. Below are the permissions and process list. I ended >> up rebuilding the data directory since it was just a test database, s

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Scott Marlowe
On Mon, Sep 14, 2009 at 8:52 AM, Cory Isaacson wrote: > [root@ittdev1 data]# ls -l pg_xlog > total 16416 > -rw--- 1 postgres postgres 16777216 Sep 13 23:16 > 0001 > drwx-- 2 postgres postgres 4096 Sep 13 22:19 archive_status What does ls -ld pg_xlog say?

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Tom Lane
Cory Isaacson writes: > They look right to me. Below are the permissions and process list. I ended > up rebuilding the data directory since it was just a test database, so far > so good. The permissions and setup were exactly the same before I did this. > [root@ittdev1 data]# ls -l pg_xlog This

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Tom Lane
Cory Isaacson writes: > I think you may be right. There were some audit access denied messages. I > had SELinux in permissive mode, but it's tricky to work with. > I generated a new SELinux rule using audit2allow, here is what it looks like > now. Do you think this is adequate? If you're keeping

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Cory Isaacson
I should note that this came up when I tried to drop a database. It was not allowed with the checkpoint failed message. Cory From: Cory Isaacson Date: Sun, 13 Sep 2009 21:57:50 -0600 To: Subject: [GENERAL] Checkpoint request failed, permission denied When I try and manually perform a

Re: [GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Tom Lane
Cory Isaacson writes: > When I try and manually perform a checkpoint with version 8.3 on CentOS 5 I > get this error: > ERROR: could not link file "pg_xlog/0001" to > "pg_xlog/00010002" (initialization of log file 0, segment > 2): Permission denied > Any idea

[GENERAL] Checkpoint request failed, permission denied

2009-09-14 Thread Cory Isaacson
When I try and manually perform a checkpoint with version 8.3 on CentOS 5 I get this error: ERROR: could not link file "pg_xlog/0001" to "pg_xlog/00010002" (initialization of log file 0, segment 2): Permission denied ERROR: could not link file "pg_xlog/000

Re: [GENERAL] Checkpoint Tuning Question

2009-07-22 Thread tomrevam
Dan Armbrust wrote: > > All of my testing to date has been done with synchronous_commit=off > > I just tried setting full_page_writes=off - and like magic, the entire > hiccup went away. > Why is the full_page_write happening before the commit returns when synchronous_commit is set to off? I

Re: [GENERAL] Checkpoint Tuning Question

2009-07-20 Thread Dan Armbrust
On Mon, Jul 13, 2009 at 3:53 PM, Dan Armbrust wrote: >> So this thought leads to a couple of other things Dan could test. >> First, see if turning off full_page_writes makes the hiccup go away. >> If so, we know the problem is in this area (though still not exactly >> which reason); if not we need

Re: [GENERAL] Checkpoint Tuning Question

2009-07-14 Thread Dan Armbrust
> > Propose a DTrace probe immediately after the "goto begin" at line 740 of > xlog.c, so we can start tracing from the first backend following > checkpoint, and turn off tracing when all backends have completed a > transaction. > That's greek to me. But I'm happy to test things if you send me pa

Re: [GENERAL] Checkpoint Tuning Question

2009-07-14 Thread Simon Riggs
On Mon, 2009-07-13 at 15:53 -0500, Dan Armbrust wrote: > > So this thought leads to a couple of other things Dan could test. > > First, see if turning off full_page_writes makes the hiccup go away. > > If so, we know the problem is in this area (though still not exactly > > which reason); if not w

Re: [GENERAL] Checkpoint Tuning Question

2009-07-13 Thread Dan Armbrust
> So this thought leads to a couple of other things Dan could test. > First, see if turning off full_page_writes makes the hiccup go away. > If so, we know the problem is in this area (though still not exactly > which reason); if not we need another idea.  That's not a good permanent > fix though,

Re: [GENERAL] Checkpoint Tuning Question

2009-07-12 Thread Simon Riggs
On Sun, 2009-07-12 at 13:10 -0400, Tom Lane wrote: > It's hard to see how it could have continuing effects over several > seconds, especially in a system that has CPU to spare. Any queueing situation takes a while to resolve and over-damped systems can take a long time to resolve themselves. We

Re: [GENERAL] Checkpoint Tuning Question

2009-07-12 Thread Tom Lane
Simon Riggs writes: > This causes us to queue for the WALInsertLock twice at exactly the time > when every caller needs to calculate the CRC for complete blocks. So we > queue twice when the lock-hold-time is consistently high, causing queue > lengths to go ballistic. You keep saying that, and it

Re: [GENERAL] Checkpoint Tuning Question

2009-07-12 Thread Simon Riggs
On Fri, 2009-07-10 at 14:25 -0500, Dan Armbrust wrote: > > Hm, I'm not sure I believe any of that except the last bit, seeing that > > he's got plenty of excess CPU capability. But the last bit fits with > > the wimpy-I/O problem, and it also offers something we could test. > > Dan, please see wh

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Dan Armbrust
> Hm, I'm not sure I believe any of that except the last bit, seeing that > he's got plenty of excess CPU capability.  But the last bit fits with > the wimpy-I/O problem, and it also offers something we could test. > Dan, please see what happens when you vary the wal_buffers setting. > (Note you ne

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Tom Lane
Simon Riggs writes: > I think it's a traffic jam. > After checkpoint in XLogInsert(), we discover that we now have to back up > a block that we didn't think so previously. So we have to drop the lock > and then re-access WALInsertLock. So every backend has to go through the > queue twice the first

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Simon Riggs
On Fri, 2009-07-10 at 10:27 -0400, Tom Lane wrote: > Simon Riggs writes: > > ISTM more likely to be a problem with checkpointing clog or subtrans. > > That would block everybody and the scale of the problem is about right. > > That's what I had been thinking too, but the log_checkpoint output >

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Tom Lane
Simon Riggs writes: > ISTM more likely to be a problem with checkpointing clog or subtrans. > That would block everybody and the scale of the problem is about right. That's what I had been thinking too, but the log_checkpoint output conclusively disproves it: those steps are taking less than 20ms

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Simon Riggs
On Wed, 2009-07-08 at 18:22 -0400, Tom Lane wrote: > As Greg commented upthread, we seem to be getting forced to the > conclusion that the initial buffer scan in BufferSync() is somehow > causing this. There are a couple of things it'd be useful to try > here: Not sure why you're forced to that

Re: [GENERAL] Checkpoint Tuning Question

2009-07-09 Thread Dan Armbrust
> As Greg commented upthread, we seem to be getting forced to the > conclusion that the initial buffer scan in BufferSync() is somehow > causing this.  There are a couple of things it'd be useful to try > here: > > * see how the size of the hiccup varies with shared_buffers; I tried decreasing sha

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Dan Armbrust writes: > Almost all of the slow query log messages are logged within about 3 > seconds of the checkpoint starting message. > LOG: checkpoint complete: wrote 9975 buffers (77.9%); 0 transaction > log file(s) added, 0 removed, 15 recycled; write=156.576 s, sync=0.065 > s, total=156.6

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
> However, the latest report says that he > managed that, and yet there's still a one-or-two-second transient of > some sort.  I'm wondering what's causing that.  If it were at the *end* > of the checkpoint, it might be the disk again (failing to handle a bunch > of fsyncs, perhaps).  But if it rea

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Greg Smith
On Wed, 8 Jul 2009, Tom Lane wrote: He's only got 100MB of shared buffers, which doesn't seem like much considering it's apparently a fairly beefy system. I definitely don't see how one CPU spinning over the buffer headers in BufferSync is going to create the sort of hiccup he's describing. A

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
John R Pierce writes: > a beefy system with... >> Harddrive is just a simple, run-of-the-mill desktop drive. > which is going to severely limit random write throughput True, which is why he's having to flail so hard to keep the checkpoint from saturating his I/O. However, the latest report s

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread John R Pierce
Tom Lane wrote: > He's only got 100MB of shared buffers, which doesn't seem like much considering it's apparently a fairly beefy system. a beefy system with... >> Harddrive is just a simple, run-of-the-mill desktop drive. which is going to severely limit random write throughput

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Greg Smith writes: On Wed, 8 Jul 2009, Dan Armbrust wrote: >> What I observe now is that I get a short (1-2 second) period where I >> get slow queries - I'm running about 30 queries in parallel at any >> given time - it appears that all 30 queries get paused for a couple of >> seconds at the momen

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Greg Smith
On Wed, 8 Jul 2009, Dan Armbrust wrote: My takeaway is that starting the checkpoint process is really expensive - so I don't want to start it very frequently. And the only downside to longer intervals between checkpoints is a longer recovery time if the system crashes? And additional disk spa

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Greg Smith
On Wed, 8 Jul 2009, Dan Armbrust wrote: With checkpoint_segments set to 10, the checkpoints appear to be happening due to checkpoint_timeout - which I've left at the default of 5 minutes. OK, then that's as far upwards as you probably need to tweak that for your workload, even though most sys

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
>> Wouldn't increasing the length between checkpoints result in the >> checkpoint process taking even longer to complete? > > You don't really care how long it takes.  What you want is for it not to > be chewing a bigger fraction of your I/O bandwidth than you can spare. > Hence, you want it to tak

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Dan Armbrust writes: > On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane wrote: >> Well, you could increase both those settings so as to put the >> checkpoints further apart, and/or increase checkpoint_completion_target >> to spread the checkpoint I/O over a larger fraction of the cycle. > Wouldn't increa

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane wrote: > Dan Armbrust writes: >> With checkpoint_segments set to 10, the checkpoints appear to be >> happening due to checkpoint_timeout - which I've left at the default >> of 5 minutes. > > Well, you could increase both those settings so as to put the > ch

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
On Wed, Jul 8, 2009 at 12:50 PM, Tom Lane wrote: > Dan Armbrust writes: >> However, once the checkpoint process begins, I get a whole flood of >> queries that take between 1 and 10 seconds to complete.  My throughput >> crashes to near nothing.  The checkpoint takes between 45 seconds and >> a min

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Dan Armbrust writes: > However, once the checkpoint process begins, I get a whole flood of > queries that take between 1 and 10 seconds to complete. My throughput > crashes to near nothing. The checkpoint takes between 45 seconds and > a minute to complete. You sure this is 8.3? It should spre

[GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
I'm running a steady state test where I am pushing about 600 queries per second through a Postgres 8.3 system on an 8 CPU Linux system. It's a mix of inserts, updates, and deletes on a few tables - the two biggest ones probably have about 200,000 rows. Hard drive is just a simple, run-of-the-mill de

Re: [GENERAL] Checkpoint segments too small

2007-08-06 Thread Magnus Hagander
On Mon, Aug 06, 2007 at 11:26:18AM +0200, Henrik Zagerholm wrote: > Hi list, > > I'm running 8.2.4 and I've started to get these messages and even > though I googled for some answers I couldn't find any good info > against the 8.2 code base. > > 2007-08-05 04:00:58.815 CEST LOG: checkpoints

[GENERAL] Checkpoint segments too small

2007-08-06 Thread Henrik Zagerholm
Hi list, I'm running 8.2.4 and I've started to get these messages and even though I googled for some answers I couldn't find any good info against the 8.2 code base. 2007-08-05 04:00:58.815 CEST LOG: checkpoints are occurring too frequently (17 seconds apart) 2007-08-05 04:00:58.815 CES

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Andrew Dunstan
Tom Lane wrote: Magnus Hagander <[EMAIL PROTECTED]> writes: And actually, when I look at the API docs, our case now seems to be documented. Or am I misreading our situation. I have: "If you call CreateFile on a file that is pending deletion as a result of a previous call to DeleteF

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Magnus Hagander
On Tue, Jan 16, 2007 at 11:11:59AM -0500, Tom Lane wrote: > Magnus Hagander <[EMAIL PROTECTED]> writes: > > And actually, when I look at the API docs, our case now seems to be > > documented. Or am I misreading our situation. I have: > > > "If you call CreateFile on a file that is pending deletion

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Tom Lane
Magnus Hagander <[EMAIL PROTECTED]> writes: > And actually, when I look at the API docs, our case now seems to be > documented. Or am I misreading our situation. I have: > "If you call CreateFile on a file that is pending deletion as a result > of a previous call to DeleteFile, the function fails.

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Magnus Hagander
On Tue, Jan 16, 2007 at 10:20:04AM +0900, Takayuki Tsunakawa wrote: > From: "Magnus Hagander" <[EMAIL PROTECTED]> > > But yeah, that's probably a good idea. A quick look at the code says > we > > should at least ask people who have this problem to give it a run > with > > logging at DEBUG5 which sh

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Tom Lane
"Takayuki Tsunakawa" <[EMAIL PROTECTED]> writes: > BTW, why does the bgwriter try to open and write the pages of already > dropped relations? It does not; the problem is with stale fsync requests. > If the relation being dropeed has > already been registered in the list of files to be fsynced, is

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Takayuki Tsunakawa
From: "Magnus Hagander" <[EMAIL PROTECTED]> > But yeah, that's probably a good idea. A quick look at the code says we > should at least ask people who have this problem to give it a run with > logging at DEBUG5 which should then log exactly what the errorcode was. > Or are you seeing more places th

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Tom Lane
Magnus Hagander <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> DEBUG5 is going to be a bit voluminous, but let's try that if we can. > Perhaps we should switch down the DEBUG level of it, at least until we > know what happens? That would have to wait on another update release, or at least someo

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Magnus Hagander
Tom Lane wrote: > Magnus Hagander <[EMAIL PROTECTED]> writes: >> But yeah, that's probably a good idea. A quick look at the code says we >> should at least ask people who have this problem to give it a run with >> logging at DEBUG5 which should then log exactly what the errorcode was. >> Or are you

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Tom Lane
Magnus Hagander <[EMAIL PROTECTED]> writes: > But yeah, that's probably a good idea. A quick look at the code says we > should at least ask people who have this problem to give it a run with > logging at DEBUG5 which should then log exactly what the errorcode was. > Or are you seeing more places th

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Magnus Hagander
Tom Lane wrote: > Magnus Hagander <[EMAIL PROTECTED]> writes: >> Tom Lane wrote: >>> pg_control is certainly not ever deleted or renamed, and in fact I >>> believe there's an LWLock enforcing that only one PG process at a time >>> is even touching it. So we need another theory to explain this one

Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-13 Thread Jim C. Nasby
On Thu, Jan 11, 2007 at 06:04:56PM -0500, Andrew Dunstan wrote: > Please don't. At least not on the PostgreSQL web site nor in the docs. > And no, I don't run my production servers on Windows either. > > For good or ill, we made a decision years ago to do a proper Windows > port. I think that it

Re: [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Tom Lane
Scott Ribe <[EMAIL PROTECTED]> writes: > Note when it happens, and if it doesn't succeed for some value of "too > long", at least escalate to ERROR message, possibly fail. ERROR and "fail" are the same thing. We could do this, and it wouldn't even be much code, but it doesn't seem to address the

Re: [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Scott Ribe
> Comments? Note when it happens, and if it doesn't succeed for some value of "too long", at least escalate to ERROR message, possibly fail. -- Scott Ribe [EMAIL PROTECTED] http://www.killerbytes.com/ (303) 722-0567 voice ---(end of broadcast)--
