Re: [GENERAL] Strange replication problem - segment restored from archive but still requested from master

2015-05-21 Thread Fujii Masao
On Fri, May 22, 2015 at 5:52 AM, Piotr Gasidło  wrote:
> Got a strange problem. Unable to reproduce it, but I have logs.
>
> Simple master-slave using streaming replication.
> Master is running. Slave is down.
> Segment 00044C4D0090 was successfully archived and sent
> from master to slave.
>
> Now I've started slave, and:
>
> May 21 21:23:37 d8 postgres[50645]: [3-1] 2015-05-21 21:23:37.033 CEST
> @ 50645   LOG:  database system was shut down in recovery at
> 2015-05-21 21:22:03 CEST
> May 21 21:23:37 d8 postgres[50645]: [4-1] 2015-05-21 21:23:37.034 CEST
> @ 50645   LOG:  entering standby mode
> May 21 21:23:37 d8 postgres[50645]: [5-1] 2015-05-21 21:23:37.058 CEST
> @ 50645   LOG:  restored log file "00044C4D0088" from
> archive
> May 21 21:23:37 d8 postgres[50645]: [6-1] 2015-05-21 21:23:37.120 CEST
> @ 50645   LOG:  redo starts at 4C4D/88493B50
> May 21 21:23:37 d8 postgres[50645]: [7-1] 2015-05-21 21:23:37.226 CEST
> @ 50645   LOG:  restored log file "00044C4D0089" from
> archive
> May 21 21:23:37 d8 postgres[50645]: [8-1] 2015-05-21 21:23:37.426 CEST
> @ 50645   LOG:  restored log file "00044C4D008A" from
> archive
> May 21 21:23:37 d8 postgres[50645]: [9-1] 2015-05-21 21:23:37.750 CEST
> @ 50645   LOG:  restored log file "00044C4D008B" from
> archive
> May 21 21:23:38 d8 postgres[50645]: [10-1] 2015-05-21 21:23:38.376
> CEST @ 50645   LOG:  restored log file "00044C4D008C" from
> archive
> May 21 21:23:38 d8 postgres[50645]: [11-1] 2015-05-21 21:23:38.690
> CEST @ 50645   LOG:  restored log file "00044C4D008D" from
> archive
> May 21 21:23:38 d8 postgres[50645]: [12-1] 2015-05-21 21:23:38.855
> CEST @ 50645   LOG:  restored log file "00044C4D008E" from
> archive
> May 21 21:23:39 d8 postgres[50645]: [13-1] 2015-05-21 21:23:39.275
> CEST @ 50645   LOG:  restored log file "00044C4D008F" from
> archive
> May 21 21:23:39 d8 postgres[50645]: [14-1] 2015-05-21 21:23:39.654
> CEST @ 50645   LOG:  restored log file "00044C4D0090" from
> archive
> May 21 21:23:40 d8 postgres[50645]: [15-1] 2015-05-21 21:23:40.222
> CEST @ 50645   LOG:  consistent recovery state reached at
> 4C4D/90FFF9C8
> May 21 21:23:40 d8 postgres[50644]: [3-1] 2015-05-21 21:23:40.222 CEST
> @ 50644   LOG:  database system is ready to accept read only
> connections
> May 21 21:23:40 d8 postgres[50645]: [16-1] 2015-05-21 21:23:40.223
> CEST @ 50645   LOG:  unexpected pageaddr 4C46/E00 in log segment
> 00044C4D0091, offset 0
> May 21 21:23:40 d8 postgres[50699]: [4-1] 2015-05-21 21:23:40.232 CEST
> @ 50699   LOG:  started streaming WAL from primary at 4C4D/9000 on
> timeline 4
> May 21 21:23:40 d8 postgres[50699]: [5-1] 2015-05-21 21:23:40.232 CEST
> @ 50699   FATAL:  could not receive data from WAL stream: ERROR:
> requested WAL segment 00044C4D0090 has already been
> removed
> May 21 21:23:40 d8 postgres[50645]: [17-1] 2015-05-21 21:23:40.255
> CEST @ 50645   LOG:  restored log file "00044C4D0090" from
> archive
> May 21 21:23:40 d8 postgres[50703]: [4-1] 2015-05-21 21:23:40.268 CEST
> @ 50703   LOG:  started streaming WAL from primary at 4C4D/9000 on
> timeline 4
> May 21 21:23:40 d8 postgres[50703]: [5-1] 2015-05-21 21:23:40.268 CEST
> @ 50703   FATAL:  could not receive data from WAL stream: ERROR:
> requested WAL segment 00044C4D0090 has already been
> removed
> May 21 21:23:40 d8 postgres[50703]: [5-2]
> ...
> (and so on)
>
> So, as I understand:
> - slave was started and entered restore,
> - slave restored 00044C4D0090 from archive, reached
> consistent recovery state
> - now, it connected to master and noticed that it has an
> incomplete 00044C4D0091 segment
> - and then started yelling about the missing segment on master (which
> had already been deleted there after archiving)
>
> Why?

Thanks for the report! This seems to be a bug.

This problem happens when a WAL record is split across two WAL files and
the standby has no valid copy of the latter file. In your case, the former
file is 00044C4D0090 and the latter is 00044C4D0091.

In this case, the first half of the WAL record can be read from the former
WAL file, but the remaining half cannot, because no valid latter file exists
in the standby. The standby then tries to retrieve the latter WAL file via
replication. The problem here is that the standby tries to start replication
from the starting point of the WAL record, i.e., a location inside the former
WAL file. So the already-read WAL file is requested via replication.
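As an illustration (my example, not from the original report): you can map
an LSN to the WAL segment that contains it with pg_xlogfile_name(), which
shows why the record starting at 4C4D/90FFF9C8 lives in the ...0090 file:

    -- run on the master; WAL control functions such as this are not
    -- available during recovery, and the name uses the server's timeline
    SELECT pg_xlogfile_name('4C4D/90FFF9C8');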


Re: [GENERAL] Problem with Postgresql 9.0 streaming replication on Solaris 10 (x86)

2010-10-20 Thread Fujii Masao
On Wed, Oct 20, 2010 at 10:31 PM, dan.m.harris
 wrote:
> LOG: entering standby mode
> WARNING: WAL was generated with wal_level=minimal, data may be missing
> HINT: This happens if you temporarily set wal_level=minimal without taking a
> new base backup.

Did you set wal_level to archive or hot_standby on the master?
If not, you have to do that and restart the master. Then you
need to take a new base backup from the master and restart the
standby from that base backup.
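A quick way to verify this (a sketch; the value names are the 9.0 ones):

    -- on the master:
    SHOW wal_level;   -- must report 'archive' or 'hot_standby', not 'minimal'

Note that changing wal_level in postgresql.conf requires a server restart;
a reload is not enough.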

> FATAL: could not connect to the primary server: invalid connection option
> "replication"
> (etc)
>
> Does anyone have any ideas what the problem may be? I suspect I may be
> missing a library somewhere - I can't believe that streaming replication
> just doesn't work on Solaris 10.

I guess the version of the libpq library you are using is not 9.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Failover on Windows

2010-11-01 Thread Fujii Masao
On Fri, Oct 29, 2010 at 9:58 PM, Norberto Delle  wrote:
> I'm testing a warm standby setup using PostgreSQL 9 x64 on Windows 2008 R2.

What command (pg_standby? cp?) is supplied in restore_command for warm standby?
Or are you testing streaming replication + hot standby?

> The problem is that when I put the trigger file on the location specified in
> the parameter
> 'trigger_file' of the recovery.conf, nothing happens. No log entries, the
> recovery just continues
> as if nothing has happened.
> Any clues of what may be wrong?

At least if you use pg_standby, you have to create the trigger file at the
location specified by the -t option of pg_standby.
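For example, a recovery.conf along these lines keeps the two locations in
sync (a sketch; the paths are placeholders):

    restore_command = 'pg_standby -t /tmp/pgsql.trigger.5432 /path/to/archive %f %p %r'

and then creating /tmp/pgsql.trigger.5432 ends the standby's recovery.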

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Failed archive_command copy - number of attempts configurable?

2010-11-09 Thread Fujii Masao
On Tue, Nov 9, 2010 at 4:01 AM, dan.m.harris
 wrote:
> But then the primary retries this another 49 times! So 150 attempts in all.
>
> What I need to know is whether these numbers are configurable?

No.

> Can they be
> timed? How long before the primary stops retrying altogether?

It retries forever, until the archive becomes available again.

BTW, since the primary cannot remove unarchived WAL files from the
pg_xlog directory, unless you fix the archive soon the primary might
run out of disk space and hit a PANIC error.
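To watch for that situation, one hedged monitoring sketch is to count the
WAL segments still waiting to be archived (requires superuser):

    SELECT count(*)
      FROM pg_ls_dir('pg_xlog/archive_status') AS t(fname)
     WHERE fname LIKE '%.ready';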

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Postgresql 9.1 pg_last_xact_replay_timestamp limitations

2010-12-07 Thread Fujii Masao
On Wed, Dec 8, 2010 at 1:31 AM, Gabi Julien  wrote:
> pg_last_xact_replay_timestamp() returns null when the server is restarted 
> until a new transaction is streamed to the hot standby server. It might take 
> a long time before this happens. Because of this, we can't rely on this
> function completely.

I couldn't reproduce this. Could you provide a self-contained test case?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Postgresql 9.1 pg_last_xact_replay_timestamp limitations

2010-12-08 Thread Fujii Masao
On Thu, Dec 9, 2010 at 1:37 AM, Gabi Julien  wrote:
> slave# /etc/init.d/postgresql start
> slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), 
> now() as not_modified_since;"
>  pg_last_xact_replay_timestamp |      not_modified_since
> ---+---
>                               | 2010-12-08 16:06:09.920219+00

pg_last_xact_replay_timestamp returns the timestamp of the last *replayed*
transaction, so it returns NULL until at least one transaction has been
replayed.

In your case, I guess that you started the master and standby from the same
initial database cluster, or from a cleanly shut-down one. Since the standby
then has no transactions to replay right after startup, you got NULL until
you executed the write query on the master.
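If you need a non-NULL value for monitoring in the meantime, a hedged
workaround is to fall back to another timestamp, treating NULL as "nothing
replayed yet" rather than as an error:

    SELECT coalesce(pg_last_xact_replay_timestamp(), now()) AS last_replay;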

Should we return the timestamp of the last valid checkpoint rather than
NULL in that case?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Postgresql 9.1 pg_last_xact_replay_timestamp limitations

2010-12-09 Thread Fujii Masao
On Fri, Dec 10, 2010 at 1:24 AM, Gabi Julien  wrote:
> On Wednesday 08 December 2010 21:58:46 you wrote:
>> On Thu, Dec 9, 2010 at 1:37 AM, Gabi Julien  
>> wrote:
>> > slave# /etc/init.d/postgresql start
>> > slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), 
>> > now() as not_modified_since;"
>> >  pg_last_xact_replay_timestamp |      not_modified_since
>> > ---+---
>> >                               | 2010-12-08 16:06:09.920219+00
>
>> We should return the timestamp of last valid checkpoint rather than NULL in 
>> that
>> case?
>
> Well, I think this behavior would be more appreciated by postgresql users in 
> general. The case where the slave can be restarted after a clean shutdown is 
> rare but we need to consider it nonetheless. In my case I implemented a 
> custom function that reads the last returned timestamp from a new file on 
> disk. This is not a perfect solution since the value returned might be older
> than the actual state of the replication but it's good enough for my needs.

The second question is: what should be returned when the server has been
started normally, without recovery? NULL? The timestamp of the last valid
checkpoint?

The third question is: what should be returned while replaying the WAL records
which exist between the REDO starting point and the checkpoint? Here it seems
bad to return the timestamp of the checkpoint whenever no transaction has been
replayed, since the result would jump backwards once at least one transaction
is replayed before the checkpoint record is reached.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Hot Standby pg_xlog problem

2010-12-14 Thread Fujii Masao
On Wed, Dec 15, 2010 at 7:24 AM, Michael Blake  wrote:
> I'm trying to set up a master/slave server, which initially worked
> fine, but recently started failing with the following error:
>
> ==
> LOG:  database system was interrupted; last known up at [time]
> LOG:  could not open file "pg_xlog/0001002B" (log file
> 0, segment 43): No such file or directory
> LOG:  invalid checkpoint record
> PANIC:  could not locate required checkpoint record
> HINT:  If you are not restoring from a backup, try removing the file
> "/var/lib/postgresql/9.0/main/backup_label".
> LOG:  startup process (PID 31489) was terminated by signal 6: Aborted
> LOG:  aborting startup due to startup process failure
> ==

If you set standby_mode to on in recovery.conf, you should get
the following message before the PANIC, but it isn't in your
log messages.

LOG:  entering standby mode

So I guess you forgot to place recovery.conf on the hot-standby server.
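For reference, a minimal recovery.conf for a 9.0 standby looks like this
(a sketch; the connection string and paths are placeholders):

    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com port=5432 user=replication'
    restore_command = 'cp /path/to/archive/%f %p'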

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Understanding PG9.0 streaming replication feature

2010-12-21 Thread Fujii Masao
On Wed, Dec 22, 2010 at 8:31 AM, Satoshi Nagayasu
 wrote:
> My blog entry would be a good entry point for you.  :)
>
> 5 steps to implement a PostgreSQL replication system
> http://pgsnaga.blogspot.com/2010/05/5-steps-to-implement-postgresql.html

Or
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] did freese in creating a database cluster

2010-12-27 Thread Fujii Masao
On Tue, Dec 28, 2010 at 11:21 AM, Tsutomu Nakajima  wrote:
> I add the information of freeze status in creating a database cluster.
> The info is that the following process exists.
>
> postgres 1331432  708812   0 11:08:31 pts/10  0:00 /bin/sh
> /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data -d
> postgres 1347694 1331432 120 11:08:32 pts/10  0:05
> /usr/local/pgsql/bin/postgres -boot -x0 -F -D/usr/local/pgsql/data -c
> shared_buffers=500 -c max_connections=100 template1
> * TIME is now.

Though I'm not sure what the cause is, I recommend that you upgrade
your database, since 7.4 is already end-of-life.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Startup Process Initiated by init proc (Unix)

2010-12-27 Thread Fujii Masao
On Tue, Dec 28, 2010 at 2:24 PM, aaliya zarrin  wrote:
> I am new to postgres. I am using 9.0.1 of postgres.
> I am using Switch over functionality of Postgres and want to reduce the
> switch over time.
> I have implemented this using Signaling. After signaling the postmaster
> (startup process) once the trigger file is created and making PG to check
> for trigger file
> and do the switch over. PG is receiving the signal and doing the switch over
> as well but two startup processes are running on standby node one initiated
> by postmaster and the other by init (unix process).
>
> The steps I am following are as follows,
>  - search for trigger file.
> - Complete the recovery- rename recovery.conf file.
> any step is missing I am not sure.

The backtrace of that orphan process would be helpful for the analysis.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Startup Process Initiated by init proc (Unix)

2010-12-27 Thread Fujii Masao
On Tue, Dec 28, 2010 at 3:07 PM, aaliya zarrin  wrote:
> I didn't get? Do you want the syslog files or some more information?

Yeah, I want more information. You can get the backtrace by using gdb.
For details, please see the following:
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Maintenance commands on standby servers

2011-02-08 Thread Fujii Masao
On Tue, Feb 8, 2011 at 4:04 AM, Sylvain Rabot  wrote:
> Is it possible to run maintenance commands like ANALYZE, VACUUM, CLUSTER
> on a standby server ?

No.

Since the effect of the maintenance command on the primary server
is also replicated, you don't need to do that on the standby.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Trying out replication: cp cannot stat log file during recovery

2011-04-12 Thread Fujii Masao
On Wed, Apr 13, 2011 at 4:45 AM, Henry C.  wrote:
> Greets,
>
> Pg 9.0.3.
>
> I'm trying out Pg's built-in replication for the first time and noticed
> something odd.
>
> On the slave I see the following in the logs (after rsyncing all from master
> to slave and firing up Pg on the slave):
>
> ...
> restored log file "0001018E000E" from archive
> restored log file "0001018E000F" from archive
> consistent recovery state reached at 18E/1000
> restored log file "0001018E0010" from archive
> cp: cannot stat `/home/arc/0001018E0011': No such file or 
> directory
> unexpected pageaddr 18D/9100 in log file 398, segment 17, offset 0
> cp: cannot stat `/home/arc/0001018E0011': No such file or 
> directory
> streaming replication successfully connected to primary
> ...
>
> /home/arc is an NFS mount from master and is where the WAL archive is kept
> (yes, I'll move it eventually; for now I'm just testing).
>
> Things seem to run fine up until (and after) log file
> 0001018E0011.  That particular file is definitely present.  Why
> would cp(1) fail to stat the file when it worked fine for all the others?

I guess that file didn't exist in the archive at the moment cp failed;
it was archived after that, which is why you now see it in the archive.

> I notice from another mailing list post that 'unexpected pageaddr' is possibly
> not that serious and is probably unrelated to the cp/stat error above.
>
> However, since recovery seems to have skipped a log file, what would that mean
> in terms of the slave being a true copy of master and integrity of the data?

When the standby fails to read a WAL file from the archive, it tries to read
it from the master via the replication connection. So the standby would not
skip that file.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] [Solved] Corrupt indexes on slave when using pg_bulkload on master

2013-01-10 Thread Fujii Masao
On Thu, Jan 10, 2013 at 6:51 AM, Tom Lane  wrote:
> James Cowell  writes:
>> But pg_bulkload only puts the index updates into WAL if you also have
>>
>> archive_mode = on
>>
>> I guess it needs to test wal_level rather than archive mode now?  It looks 
>> like changes to the project have been minimal for some time, which is a 
>> shame because it's a very useful tool.
>
> Oh, the code in question is in pg_bulkload not the backend?  Yeah, it
> sounds like it hasn't tracked some core-code changes.  In particular
> you might point the author at
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=40f908bdcdc726fc11912cd95dfd2f89603d1f37#patch10
> which shows how it needs to be done in 9.0 and later.

I reported this problem to the author off-list. Thanks for the bug report!

Regards,

-- 
Fujii Masao




Re: [GENERAL] Streaming Replication

2013-01-15 Thread Fujii Masao
On Wed, Jan 16, 2013 at 1:22 AM, ning chan  wrote:
> Hi Albe,
> Thanks for your kind response.
> The wordings is confusing, it gives me impression that the recovery is going
> on.

Yes. Walreceiver receives the WAL records from the master server, and writes
them to the pg_xlog directory. Then the startup process reads the WAL records
from the pg_xlog directory, and replays them.
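You can observe both steps from the standby (9.x function names):

    SELECT pg_last_xlog_receive_location() AS received,
           pg_last_xlog_replay_location()  AS replayed;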

Regards,

-- 
Fujii Masao




Re: [GENERAL] Strange message from pg_receivexlog

2013-08-21 Thread Fujii Masao
On Tue, Aug 20, 2013 at 3:17 PM, Sergey Konoplev  wrote:
> Hi all,
>
> My WAL archiving script based on pg_receivexlog reported the following
> error several days ago (just ignore everything before
> 'pg_receivexlog', it's a message my script generates).
>
> Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured
> during WAL archiving: pg_receivexlog: could not send feedback packet:
> server closed the connection unexpectedly
>
> At the same time postgres reported this error in log:
>
> 2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local]
> [vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG:  terminating
> walsender process due to replication timeout
>
> Both pg_receivexlog and postgres run on the same machine,
> pg_receivexlog connects to postgres locally. /var/log/messages has
> absolutely nothing about it. I also have a hot standby on another
> machine connecting to the same master, but there is nothing strange in
> its logs either.
>
> Any thoughts what it was?

Is the value of replication_timeout sufficiently larger than the
status interval of pg_receivexlog?
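For illustration (a sketch; replication_timeout is the 9.2-era name of the
parameter, renamed wal_sender_timeout in 9.3):

    # postgresql.conf on the server pg_receivexlog connects to:
    #   replication_timeout = 60s
    # pg_receivexlog should then report status well within that window:
    pg_receivexlog -D /path/to/walarchive --status-interval=10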

Regards,

-- 
Fujii Masao




Re: [GENERAL] Re: postmaster.pid still exists after pacemaker stopped postgresql - how to remove

2013-08-26 Thread Fujii Masao
On Mon, Aug 26, 2013 at 9:53 PM, Mistina Michal  wrote:
> Hi there.
>
> I didn’t find out why this issue happened. Only backup and format of the
> filesystem where corrupted postmaster.pid file existed helped to get rid of
> it. Hopefully the file won’t appear in the future.

I have encountered a similar problem when I broke the filesystem by
a double mount. You may have hit the same problem.

>  Master/Slave Set: ms_drbd_pg [drbd_pg]
>
>  Masters: [ tstcaps01 ]
>
>  Slaves: [ tstcaps02 ]

Why do you use DRBD with streaming replication? If you locate
the database cluster on DRBD, it's better to check the status of
the DRBD filesystem.

Regards,

-- 
Fujii Masao




Re: [GENERAL] Re: postmaster.pid still exists after pacemaker stopped postgresql - how to remove

2013-08-26 Thread Fujii Masao
On Mon, Aug 26, 2013 at 11:02 PM, Mistina Michal
 wrote:
> Hi Masao.
> Thank you for the suggestion. Indeed that could occur. Most probably while I
> was testing split-brain situation. In that case I turned off network card on
> one node and on both nodes DRBD was in primary role. But after the
> split-brain occurred I resync DRBD so from two primaries I promoted one as
> "primary" (winner) and second one as "secondary" (victim). Data should be
> consistent by that moment. But probably it wasn't consistent.
>
> I am using DRBD only in one technical center. Data are syncing by streaming
> replication to the secondary technical center where is another DRBD
> instance.
>
> It's like this:
>
> TC1:
> --- node1: DRBD (primary), pgsql
> --- node2: DRBD (secondary), pgsql
>
> TC2:
> --- node1: DRBD (primary), pgsql
> --- node2: DRBD (secondary), pgsql
>
> Within one technical center only one pgsql runs only on one node. This is
> done by pacemaker/corosync.
> From the outside perspective it looks like only one postgresql server is
> running in one TC.
> TC1 (master)  streaming replication => TC2 (slave)
>
> If one node in technical center fails, the fail-over to secondary node is
> really quick. It's because fast network within technical center.
> Between TC1 and TC2 there is a WAN link. If something goes wrong and TC1
> became unavailable I can switch manually / automatically to TC2.
>
> Is there more appropriate solution? Would you use something else?

Nope. I've heard of a similar configuration, though it used a shared-disk
failover solution instead of DRBD.

Regards,

-- 
Fujii Masao




Re: [GENERAL] Last insert/update/delete time for a table

2008-04-25 Thread Fujii Masao
2008/4/25 Aleksander Kmetec - INTERA <[EMAIL PROTECTED]>:
>  Is there a way to get the time of the last insert, update or delete
> statement for a specific table?

You can check the timestamp of the file corresponding to the table
after a checkpoint. The relationship between the table name and
the file name is in pg_class:

   SELECT relfilenode FROM pg_class WHERE relname = 'tablename';
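A hedged sketch combining the two steps (superuser only; assumes the table
lives in the default tablespace, and 16384 stands in for your database's OID):

    SELECT (pg_stat_file('base/16384/' || relfilenode)).modification
      FROM pg_class
     WHERE relname = 'tablename';

Keep in mind the file's mtime only reflects writes flushed by a checkpoint,
as noted above.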



Re: [GENERAL] clustering without locking

2008-05-02 Thread Fujii Masao
On Fri, May 2, 2008 at 9:12 AM, fschmidt <[EMAIL PROTECTED]> wrote:
>
>  An implementation of clustering without locking would start by comparing the
>  index to the table from the beginning to find the first mismatch.  Rows
>  before the mismatch are fine, and can be left alone.  From here on, go
>  through the index and rewrite each row in order.  This will put the rows at
>  the end of the table in cluster order.  When done, vacuum the table.  This
>  will result in a clustered table without any locking needed.  Those few
>  records that were updated while clustering was happening will be out of
>  order, but that should only be a few.

In your proposal, a large number of dead tuples can be generated in both
the table and the index. This is a serious problem that spoils the effect
of clustering.

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Replication and fsync

2013-10-24 Thread Fujii Masao
On Thu, Oct 24, 2013 at 10:39 AM,   wrote:
> Newb question.
>
> I'm running 9.1 with a slave using streaming replication. A coworker wants
> to turn off fsync on the master and insists that the slave will still be in
> a usable state if there is a failure on the master. We all know that turning
> off fsync is a bad idea, but I was under the impression that the fsync
> setting would be replicated to the slave, making it useless as a backup in
> this scenario.

No. The setting of fsync in the master is not replicated to the standby.

> Am I wrong? If I'm wrong, is there still danger to the slave
> in this kind of setup?

I don't think so.

Regards,

-- 
Fujii Masao




Re: [GENERAL] replay doesn't catch up with receive on standby

2011-04-18 Thread Fujii Masao
On Tue, Apr 19, 2011 at 9:00 AM, Steven Parkes  wrote:
> This is on 9.0.3: I've got two dbs running as standby to a main db. They 
> start up fine and seem to think they're all caught up (by /var/log logs), but
>
> SELECT pg_last_xlog_receive_location() AS receive, 
> pg_last_xlog_replay_location() AS replay;
>
> reports replay behind receive and it doesn't change. This is on both dbs.
>
> Notably the main db isn't (wasn't) doing anything, so no new commits were 
> causing things to move forward. I did a write to it and both slaves moved 
> both their recieved and replay serial numbers up.
>
> Is there a valid situation where an idle master/standby setup could remain 
> with replay behind received indefinitely? (My nagios monitor isn't very happy 
> with that (at present)) and before changing that I'd like to understand 
> better what's going on.)

Did you run queries on the standby? If so, I guess that a query conflict
prevented the replay location from advancing.
http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] replay doesn't catch up with receive on standby

2011-04-18 Thread Fujii Masao
On Tue, Apr 19, 2011 at 10:28 AM, Steven Parkes  wrote:
>> Did you run query on the standby?
>
> Yup. Both standbys. They both responded the same way.
>
>> If yes, I guess that query conflict prevented
>> the reply location from advancing.
>> http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT
>
> The standbys were idle and this was a persistent state. I restarted the 
> standbys and they stayed in this state. Am I missing something? I thought 
> these conflicts were related to queries against the standbys but there 
> shouldn't have been any that I'm aware. Certainly none should survive a 
> restart ...
>
> Am I missing something about the conflict?
>
> It also seems notable that a new commit on the master cleared the issue ... 
> Does that seem like the hot standby conflict case?

Probably not.

Was there an idle-in-transaction session on the master when the problem
happened? If so, this can happen. In that case, only the first half of a
WAL record may have been written to disk by the walwriter and sent to the
standby by the walsender; the rest is written and sent only after the
transaction finishes. The receive location then indicates the end of that
WAL record, obviously. OTOH, since that half-written WAL record cannot be
replayed, the replay location cannot advance and still has to indicate the
end of the previous WAL record.

When you issue a new commit, the whole WAL record is flushed to the standby,
so it gets replayed and the replay location advances. I guess that is what
you observed.
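To check for such sessions on the master (a sketch using the 9.0/9.1
pg_stat_activity columns; from 9.2 on, use WHERE state = 'idle in transaction'):

    SELECT procpid, usename, current_query
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction';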

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] pg_rman in Windows - is it possible?

2011-07-01 Thread Fujii Masao
On Fri, Jul 1, 2011 at 6:18 PM, AI Rumman  wrote:
> Could anyone please tell me whether I can use pg_rman in my Windows
> environment?

http://code.google.com/p/pg-rman/wiki/Platforms

According to the above page, you cannot use it on Windows.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] streaming replication does not work across datacenter with 20ms latency?

2011-07-25 Thread Fujii Masao
On Mon, Jul 25, 2011 at 10:56 AM, Yan Chunlu  wrote:
> I think the problem is still "invalid record length" and "invalid
> magic number", it start showing right after I complete sync data and
> start slave.  If I stop slave later and restart, yes it could show
> xlog not found and can not catch master. but why the "invalid" things
> in the first place?

You might have the same problem which was reported before.
http://archives.postgresql.org/pgsql-hackers/2011-06/msg00661.php

That problem was fixed, and the fix will be included in the next minor
update (i.e., 9.0.5).
http://archives.postgresql.org/pgsql-committers/2011-06/msg00101.php

Of course, you can avoid the problem by building PostgreSQL with
gcc != 4.6.0, I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Timeline Conflict

2011-08-02 Thread Fujii Masao
On Tue, Aug 2, 2011 at 2:59 PM, senthilnathan  wrote:
> We have system(Cluster) with Master replicating to 2 stand by servers.
>
> i.e
>
> M   |---> S1
>
>      |---> S2
>
> If master failed, we do a trigger file at S1 to take over as master. Now we
> need to re-point the standby S2 as slave for the new master (i.e S1)
>
> While trying to start standby S2,there is a conflict in timelines, since on
> recovery it generates a new line.
>
> Is there any way to solve this issue?

Basically you need to take a fresh backup from the new master and restart
the standby using it. But if S1 and S2 share the archive, S1 is ahead of S2
(i.e., the replay location of S1 is greater than or equal to that of S2),
and recovery_target_timeline is set to 'latest' in S2's recovery.conf, you
can skip taking a fresh backup from the new master. In this case, you can
re-point S2 as a standby just by changing primary_conninfo in S2's
recovery.conf and restarting S2. When S2 restarts, it reads the timeline
history file which S1 created at failover and adjusts its timeline ID to
S1's, so the timeline conflict doesn't happen.
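So S2's recovery.conf only needs something like this (a sketch; the
connection string is a placeholder):

    recovery_target_timeline = 'latest'
    primary_conninfo = 'host=s1.example.com port=5432 user=replication'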

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] streaming replication: one problem & several questions

2011-08-15 Thread Fujii Masao
On Thu, Aug 11, 2011 at 7:19 AM, Lonni J Friedman  wrote:
> First the problem.  On *only* one of the two standby servers, I'm
> seeing errors like the following whenever I issue any SQL commands on
> the master which write (insert, update, etc) to the database:
> LOG:  invalid record length at 8/7A20
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  invalid record length at 8/7AB0
> LOG:  streaming replication successfully connected to primary
> LOG:  invalid record length at 8/7B20
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  record with zero length at 8/7BB0
> LOG:  streaming replication successfully connected to primary
> LOG:  record with incorrect prev-link 8/7958 at 8/7DB0
> LOG:  streaming replication successfully connected to primary

Did you use gcc 4.6 or later to build PostgreSQL 9.0? If so, you may be
facing the same problem reported before:
http://archives.postgresql.org/pgsql-hackers/2011-06/msg00661.php

This problem was fixed, and the fix will be included in the next minor
update (i.e., 9.0.5).
http://archives.postgresql.org/pgsql-committers/2011-06/msg00101.php

Of course, you can avoid the problem by building PostgreSQL with
pre-4.6 gcc.

> 0) I've successfully setup the WAL archiving on the master, and set
> archive_timeout=61.  However, what I'm seeing is that new files are
> not getting generated every 61 seconds, but instead only when some
> kind of SQL is invoked which writes to the database, or every 305
> seconds (whichever comes first).

First of all, you don't need to set archive_timeout. Without archive_timeout,
streaming replication transfers WAL records from the master to the standby
in almost real time.

archive_timeout doesn't always generate a new WAL file on each timeout:
if there is no write workload, the WAL file switch is skipped.

OTOH, a checkpoint generates a write workload, so the first archive_timeout
after a checkpoint always creates a new WAL file. Since (I guess) you set
checkpoint_timeout to 5min, you observed a new WAL file about every five
minutes.

> 1) Both of the wiki links above comment that the restore_command may
> not be necessary if wal_keep_segments is large enough (mine is set to
> 128).  I was going to setup the restore_command anyway, as I'm not yet
> confident enough about streaming replication and failover with
> postgresql to take chances, although the fact that i have two standby
> servers makes this setup a bit more complex.  However, can anyone
> comment about whether its ever truly safe 100% of the time to run
> without a restore_command ?

Specifically, what problem are you concerned about?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Change master to standby

2011-08-17 Thread Fujii Masao
On Tue, Aug 16, 2011 at 6:55 PM, Alexander Perepelica
 wrote:
> Can I change server mode master to slave (standby) without restarting?

Which replication tool do you use? If streaming replication, the answer is "No".
You need to shut down the master, take a fresh base backup from the new
master, create recovery.conf, and start the server as the standby from that
backup.

I have no idea about other replication methods.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] streaming replication: one problem & several questions

2011-08-17 Thread Fujii Masao
On Thu, Aug 18, 2011 at 4:26 AM, Lonni J Friedman  wrote:
> I wish I knew.  All the documentation out there always focuses on
> setting up a restore command, as if there would be a huge disaster if
> it wasn't done.  Is it safe to simply make wal_keep_segments really
> large, and skip the restore_command altogether?

There are pros and cons to a replication setup that does NOT use
restore_command. Please weigh them to decide whether it's safe for you.

Pros:
* You don't need to prepare an archive area shared between the master
   and the standby, nor purchase a new server for it.

* If you use restore_command and a shared archive area, archive_command
   is a bit more likely to fail because it copies WAL files over the
   network. Failure of archive_command might fill up the pg_xlog
   directory on the master, which might cause a PANIC error, so you need
   to consider how to handle that failure case. OTOH, you don't need to
   do that if you don't use restore_command.

Cons:
* When setting up the standby, if the base backup takes very long because
   the database is quite large, some WAL files required by the backup
   might be deleted from the master during the backup. If this happens,
   the standby started from that backup will fail to begin replication.
   To avoid such an unexpected deletion of WAL files on the master,
   you need to increase wal_keep_segments enough, but it might not
   be easy to determine an appropriate value.

* You need to provision large disk space for the pg_xlog directory
   if wal_keep_segments is large, because a large number of WAL files
   can accumulate there (see the rough sizing after this list).

* When the replication connection is terminated, no WAL data is streamed
   to the standby, so the standby cannot advance recovery at all. OTOH,
   if you set restore_command on the standby and have a shared archive
   area, the standby can read new WAL files from it using restore_command
   and continue recovery.
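As a rough sizing illustration (my numbers, not from the thread): WAL
segments are 16MB by default, so

    wal_keep_segments = 128  ->  128 x 16MB = 2048MB (about 2GB)

of pg_xlog space can be pinned by that setting alone.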

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] synchronous_standby_names with '*'

2011-08-19 Thread Fujii Masao
On Fri, Aug 19, 2011 at 7:06 PM, alexondi  wrote:
> Hi!
> I try to create a master for replication, and in its config I set
> synchronous_standby_names = '*'
> so I can connect other slaves (whose names I don't know at this
> moment).
> But if I try to execute some commands (I hung on 'create database') my program
> hangs and after some time I see
> in the process list
> ... create database waiting for 0/XX
> and when I strace to wal I see
> ...
> select ... timeout
> getpid()
> select ... timeout
> getpid()
> ...
> and so on.
> What does '*' mean in this GUC?

Setting synchronous_standby_names forces the master to perform
synchronous replication. If synchronous_commit is set to 'on',
all transactions must wait for their WAL to be replicated to
the standby. Since you set synchronous_standby_names to '*',
"create database" hung, i.e., it was waiting for its WAL to be
replicated to the standby.

Only a standby whose application_name matches synchronous_standby_names
can run as the synchronous standby. '*' matches any application_name, so
'*' means that any standby can run as the synchronous one.
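Once a standby has connected, you can check whether the master treats it
as synchronous (9.1+):

    SELECT application_name, state, sync_state
      FROM pg_stat_replication;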

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] synchronous_standby_names with '*'

2011-08-19 Thread Fujii Masao
On Fri, Aug 19, 2011 at 8:04 PM, alexondi  wrote:
> but if I don't have any slave at this time, why does my command hang?

Because there is no standby at that time ;)

If synchronous replication is enabled but there is no standby connecting
to the master, a write transaction waits until at least one synchronous
standby has appeared and its WAL has been replicated to it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] synchronous_standby_names with '*'

2011-08-19 Thread Fujii Masao
On Fri, Aug 19, 2011 at 8:39 PM, alexondi  wrote:
> so can I somehow disable|enable synchronous replication at runtime (not only
> for a session but for the whole server)?

Yes.

To disable:
empty synchronous_standby_names, or set synchronous_commit to "local" or "off".

To enable:
set synchronous_standby_names, or set synchronous_commit to "on".

Then reload the configuration file.
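For example, to disable it without a restart (a sketch):

    -- in postgresql.conf:  synchronous_standby_names = ''
    -- then, as superuser:
    SELECT pg_reload_conf();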

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] FATAL: terminating connection due to conflict with recovery

2011-08-30 Thread Fujii Masao
On Wed, Aug 31, 2011 at 5:51 AM, Jeff Ross  wrote:
> Is there a setting in this or something else that I should tweak so this
> query can complete against the replica?  Google turned up some threads on
> the error code associated with the error but I didn't find much else that
> seems applicable.

Increasing max_standby_archive_delay and max_standby_streaming_delay
would be helpful to make the query complete. Please see the following manual
for details.
http://www.postgresql.org/docs/9.0/interactive/hot-standby.html#HOT-STANDBY-CONFLICT
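For example (the values are illustrative, not a recommendation):

    # postgresql.conf on the standby:
    max_standby_archive_delay   = 300s    # -1 means wait forever
    max_standby_streaming_delay = 300s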

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] master-side counterpart of pg_last_xact_replay_timestamp?

2011-09-07 Thread Fujii Masao
On Thu, Sep 8, 2011 at 7:06 AM, Chris Redekop  wrote:
> Is there anything available to get the last time a transaction
> occurred?like say "pg_last_xact_timestamp"?

No.

> In order to accurately
> calculate how far behind my slave is I need to do something like
> master::pg_last_xact_timestamp() -
> slave::pg_last_xact_replay_timestamp()currently I'm using now() instead
> of the pg_last_xact_timestamp() call, but then when the master is not busy
> the slave appears to lag behind.  I'm considering writing a C module to get
> the last modified file time of the xlog, but I'm hoping there is a better
> alternative that I haven't found yet

Your complaint makes sense. I'll implement something like
pg_last_xact_timestamp() for 9.2. But unfortunately there is
no way to know such a timestamp on the master in 9.1.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] master-side counterpart of pg_last_xact_replay_timestamp?

2011-09-08 Thread Fujii Masao
On Thu, Sep 8, 2011 at 3:19 PM, Simon Riggs  wrote:
> On Thu, Sep 8, 2011 at 6:43 AM, Fujii Masao  wrote:
>> Your complaint makes sense. I'll implement something like
>> pg_last_xact_timestamp() for 9.2. But unfortunately there is
>> no way to know such a timestamp on the master, in 9.1..
>
>
> I see the reason, but would be against that change.
>
> We don't currently generate a timestamp for each WAL record. Doing so
> would be a performance drain and a contention hotspot.

Each commit/abort record already has a timestamp. So I'm thinking of
implementing pg_last_xact_insert_timestamp() so that it returns the
timestamp of the last inserted commit/abort record. Since we don't
need to generate a new timestamp, I guess this would not degrade
performance.

pg_last_xact_replay_timestamp() also returns the timestamp of the last
replayed commit/abort record. So pg_last_xact_insert_timestamp() only
needs to return the timestamps of commit/abort records for the two to
be comparable when calculating the replication delay.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Streaming Replication and Firewall

2011-09-29 Thread Fujii Masao
On Fri, Sep 30, 2011 at 1:35 AM, Ian Harding  wrote:
> I updated the firewall rules on a streaming replication standby server
> and thought nothing of it.  I later happened to notice on the primary
> that ps aux | grep stream didn't show streaming to that server
> anymore.  On the standby that command still showed the wal receiver
> patiently waiting for new data.
>
> I know I broke it, but would anything have eventually happened, or
> would the wal receiver keep patiently waiting as the world passed it
> by?

I guess that walreceiver got stuck because it could not detect the
termination of the connection for a while. By default, keepalive is
enabled on the replication connection from the standby to the master,
but how long it takes walreceiver to notice the termination of the
connection depends on the kernel parameters related to keepalive.
With default settings, it takes about two hours.

Setting keepalives parameters in primary_conninfo would help to fix
such a problem.
http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html
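For example (a sketch; the host and values are placeholders):

    primary_conninfo = 'host=master.example.com port=5432 keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3'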

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] How to avoid TimeLine increase / Change on recovery?

2011-10-04 Thread Fujii Masao
On Mon, Oct 3, 2011 at 11:39 PM, senthilnathan
 wrote:
> Is there any way to avoid time line increase /change on recovery

No, there is no way to prevent the timeline ID from being incremented
at the end of archive recovery.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Disable Streaming Replication without restarting either master or slave

2012-05-29 Thread Fujii Masao
On Tue, May 29, 2012 at 7:18 PM, Samba  wrote:
> Hi,
>
>
> Is it possible stop/disable streaming replication without stopping or
> restarting either master or slave servers?
>
> Since stopping or restarting the postgres servers would involve complete
> invalidation of the connection pool [Java/JEE app server pool] that may take
> a few minutes before the application becomes usable, it would be great if
> there is a way we can disable replication [for maintenance reasons like
> applying patches or upgrades, etc].

There is no clean way to disable streaming replication, but you can do it
with the following steps:

1. change pg_hba.conf on the master so that it no longer accepts new
replication connections
2. reload pg_hba.conf on the master
3. send the SIGTERM signal to the currently-running walsender processes, e.g., by
"select pg_terminate_backend(pid) from pg_stat_replication"
(the column is named procpid before 9.2)

Then the replication connection is terminated. The standby keeps trying to
reconnect to the master, which will continue to fail until you change
pg_hba.conf back.
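Putting the steps together (a hedged sketch; the pg_hba.conf entry is a
placeholder for whatever replication line your file actually contains):

    # 1. in pg_hba.conf on the master, comment out the replication entry:
    #      host  replication  repuser  192.168.0.0/24  md5
    # 2.+3. then, as superuser on the master:
    SELECT pg_reload_conf();
    SELECT pg_terminate_backend(procpid) FROM pg_stat_replication;  -- 'pid' in 9.2+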

Regards,

-- 
Fujii Masao



Re: [GENERAL] Disable Streaming Replication without restarting either master or slave

2012-05-29 Thread Fujii Masao
On Tue, May 29, 2012 at 10:17 PM, François Beausoleil
 wrote:
> Hi!
>
> Le 2012-05-29 à 06:18, Samba a écrit :
>
> Is it possible stop/disable streaming replication without stopping or
> restarting either master or slave servers?
>
> Since stopping or restarting the postgres servers would involve complete
> invalidation of the connection pool [Java/JEE app server pool] that may take
> a few minutes before the application becomes usable, it would be great if
> there is a way we can disable replication [for maintenance reasons like
> applying patches or upgrades, etc].
>
>
> Are per-chance looking for pg_xlog_replay_pause() and
> pg_xlog_replay_resume() ?

Those can pause and resume WAL replay in the standby, but not streaming
replication. Even while WAL replay is being paused, WAL can be streamed
from the master to the standby.
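For completeness, those functions are run on the standby, and
pg_is_xlog_replay_paused() reports the current state:

    SELECT pg_xlog_replay_pause();       -- replay stops, streaming continues
    SELECT pg_is_xlog_replay_paused();   -- returns true
    SELECT pg_xlog_replay_resume();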

Regards,

-- 
Fujii Masao



Re: [GENERAL] Disable Streaming Replication without restarting either master or slave

2012-05-29 Thread Fujii Masao
On Wed, May 30, 2012 at 2:38 AM, Michael Nolan  wrote:
>
>
> -- Forwarded message --
> From: Michael Nolan 
> Date: Tue, May 29, 2012 at 1:37 PM
> Subject: Re: [GENERAL] Disable Streaming Replication without restarting
> either master or slave
> To: Fujii Masao 
>
>
>
>
> On Tue, May 29, 2012 at 1:15 PM, Fujii Masao  wrote:
>>
>> On Tue, May 29, 2012 at 10:17 PM, François Beausoleil
>>  wrote:
>
>
>>
>> > Are per-chance looking for pg_xlog_replay_pause() and
>> > pg_xlog_replay_resume() ?
>>
>> Those can pause and resume WAL replay in the standby, but not streaming
>> replication. Even while WAL replay is being paused, WAL can be streamed
>> from the master to the standby.
>>
>> Regards,
>>
>> --
>> Fujii Masao
>
>
> So, that means that the only ways to stop streaming replication are to stop
> the slave server, to disable access to the master via the pg_hba.conf file
> (requiring the master configs be reloaded) or to set the trigger file on the
> slave to tell it to stop replicating the master.
>
> And if the master/slave are set to synchronous streaming replication, your
> options are more limited, since the master has to know to stop waiting for
> the synchronous slave to respond.
>
> Once the slave has gone out of asynchronous replication mode, wuld it be
> possible to resume asynchronous replication by stopping the slave server,
> removing the trigger file, and restarting it in asynchronous streaming
> replication mode?  This would, at a minimum, depend on how many updates have
> occurred on the master during the time streaming replication was disabled
> and having all the WAL files available, right?

You'd like to restart the *promoted* standby server as a standby again?
To do that, a fresh base backup must be taken from the master onto
the standby before restarting it, even if there has been no update since
the standby was promoted.

Regards,

-- 
Fujii Masao



Re: [GENERAL] Disable Streaming Replication without restarting either master or slave

2012-05-29 Thread Fujii Masao
On Wed, May 30, 2012 at 3:04 AM, Raghavendra
 wrote:
>> > Since stopping or restarting the postgres servers would involve complete
>> > invalidation of the connection pool [Java/JEE app server pool] that may
>> > take
>> > a few minutes before the application becomes usable, it would be great
>> > if
>> > there is a way we can disable replication [for maintenance reasons like
>> > applying patches or upgrades, etc].
>>
>
> I think even applying patches or upgrades needs restart.

Yep.

>> 3. send SIGTERM signal to currently-running walsender process, e.g., by
>>    "select pg_terminate_backend(pid) from pg_stat_replication".
>
>
> Will it be helpful here sending SIGINT instead of killing ?

No, walsender ignores the SIGINT signal.

Regards,

-- 
Fujii Masao



Re: [GENERAL] can we avoid pg_basebackup on planned switches?

2012-08-05 Thread Fujii Masao
On Sat, Jul 28, 2012 at 2:00 AM, Ben Chobot  wrote:
> We make heavy use of streaming replication on PG 9.1 and it's been great for
> us. We do have one issue with it, though, and that's when we switch master
> nodes - currently, the documentation says that you must run pg_basebackup on
> your old master to turn it into a slave. That makes sense when the old
> master had crashed, but it seems that in the case of a planned switch, we
> could do better. Here's what we tried that seemed to work... are we shooting
> ourselves in the foot?
>
> 1. Cleanly shut down the current master.
> 2. Pick a slave, turn it into the new master.

Before promoting the standby, you have to confirm that all the WAL files
the old master generated have been shipped to the standby you'll promote,
because the standby might terminate replication before receiving all of
them. Note that there is no clean way to confirm that. For example, you
would need to execute CHECKPOINT on the standby, run pg_controldata on both
the old master and the standby, and check whether their latest checkpoint
locations are the same. You might think to compare the latest checkpoint
location on the old master with pg_last_xlog_replay_location() on the
standby, but the former indicates the *starting* location of the last WAL
record (i.e., the shutdown checkpoint record) while the latter indicates
its *ending* location, so you should not compare them without taking that
mismatch into account.

If the standby failed to receive some WAL files, you need to manually copy
them into pg_xlog from the old master to the standby.
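A hedged sketch of that check (paths are placeholders):

    -- on the standby, after the old master has shut down cleanly:
    CHECKPOINT;
    -- then, from the shell, compare "Latest checkpoint location" between:
    --   pg_controldata /path/to/old_master_data
    --   pg_controldata /path/to/standby_data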

> 3. Copy the new pg_xlog history file over to the old master.
> 4. On any other slaves (many of our clusters are 3 nodes), we already have
> "recovery_target_timeline=latest" and wal archiving, so they should already
> be working as slaves of the new master.
> 5. Set up recovery.conf on the old master to be like the other slaves.
> 6. Start up the old master.
>
> Have we just avoided running pg_basebackup, or have we just given ourselves
> data corruption?


If you change your procedure in the above-mentioned way, I think you can
avoid pg_basebackup on a planned switch. I've not tested it myself,
so please test it carefully before applying it to your system.

> Because we're using wal archiving, can we simplify and
> leave out step 3?

Yes.

Regards,

-- 
Fujii Masao



Re: [GENERAL] can we avoid pg_basebackup on planned switches?

2012-08-07 Thread Fujii Masao
On Mon, Aug 6, 2012 at 3:29 AM, Ben Chobot  wrote:
>
> On Aug 5, 2012, at 11:12 AM, Fujii Masao wrote:
>
>> On Sat, Jul 28, 2012 at 2:00 AM, Ben Chobot  wrote:
>>> We make heavy use of streaming replication on PG 9.1 and it's been great for
>>> us. We do have one issue with it, though, and that's when we switch master
>>> nodes - currently, the documentation says that you must run pg_basebackup on
>>> your old master to turn it into a slave. That makes sense when the old
>>> master had crashed, but it seems that in the case of a planned switch, we
>>> could do better. Here's what we tried that seemed to work... are we shooting
>>> ourselves in the foot?
>>>
>>> 1. Cleanly shut down the current master.
>>> 2. Pick a slave, turn it into the new master.
>>
>> Before promoting the standby, you have to confirm that all WAL files
>> the old master generated have been shipped to the standby which you'll 
>> promote. Because the
>> standby might terminate the replication before receiving all WAL
>> files. Note that there is no clean way to confirm that. For example, to 
>> confirm that, you need to
>> execute CHECKPOINT in the standby, run pg_controldata in both old master and
>> standby, and check whether their latest checkpoint locations are the same. 
>> You
>> may think to compare the latest checkpoint location in the old master and
>> pg_last_xlog_replay_location in the standby. But the former indicates
>> the *starting* location of the last WAL record (i.e., shutdown checkpoint 
>> WAL record). OTOH,
>> the latter indicates the *ending* location of it. So you should not compare 
>> them
>> without taking into consideration the above mismatch.
>>
>> If the standby failed to receive some WAL files, you need to manually copy 
>> them
>> in pg_xlog from the old master to the standby.
>
> Oh, I would have thought that doing a clean shutdown of the old master (step 
> 1) would have made sure that all the unstreamed wal records would be flushed 
> to any connected slaves as part of the master shutting down. In retrospect, I 
> don't remember reading that anywhere, so I must have made that up because I 
> wanted it to be that way. Is it wishful thinking?

When a clean shutdown is requested, the master sends all WAL records to the
standby, but it doesn't wait for the standby to receive them. So there is no
guarantee that all WAL records have been flushed to the standby: the
walreceiver process on the standby might detect the termination of the
replication connection and exit before receiving all WAL records.
Unfortunately I've encountered that case several times.

Regards,

-- 
Fujii Masao



Re: [GENERAL] When do archived WAL files gets removed after wal_keep_segments changed?

2012-11-02 Thread Fujii Masao
On Thu, Nov 1, 2012 at 1:21 AM, Christian Hammers  wrote:
> Hello
>
> I run two PostgreSQL servers in a master-slave setup and set
> wal_keep_segments=1000 on the master to allow long downtimes on the slave.
>
> Meanwhile the disk got fuller than I estimated and I changed the config
> to wal_keep_segments=500 and restarted the server afterwards.
>
> Yet, the number of WAL segments in my archive dir was still at 1000!
>
> I tried a random mix of VACUUM, CHECKPOINT, pg_start_backup(),
> pg_stop_backup(), server restarts, googling and just plain waiting
> but the number of archived WAL segments just increased to 1018.

wal_keep_segments affects the WAL files in the pg_xlog directory, not the
archive directory. When and how to remove archived files is basically the
user's responsibility.
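If the archive only serves standby catch-up, a minimal cleanup sketch on the
archive host could be a cron job like this (the path and retention period
are illustrative; make sure the retained files still cover your oldest base
backup):

    # drop archived WAL segments older than 7 days
    find /path/to/archive -type f -mtime +7 -delete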

Regards,

-- 
Fujii Masao




Re: [GENERAL] Archiver not picking up changes to archive_command

2010-05-10 Thread Fujii Masao
On Tue, May 11, 2010 at 9:50 AM, Tom Lane  wrote:
> bricklen  writes:
>> Due to some heavy processing today, we have been falling behind on
>> shipping log files (by about a 1000 logs or so), so wanted to up our
>> bwlimit like so:
>
>> rsync -a %p postg...@192.168.80.174:/WAL_Archive/ && rsync
>> --bwlimit=1875 -az %p postg...@14.121.70.98:/WAL_Archive/
>
>> The db is showing the change.
>> SHOW archive_command:
>> rsync -a %p postg...@192.168.80.174:/WAL_Archive/ && rsync
>> --bwlimit=1875 -az %p postg...@14.121.70.98:/WAL_Archive/
>
>> Yet, the running processes never get above the original bwlimit of
>> 1250. Have I missed a step? Would "kill -HUP " help?
>> (I'm leery of trying that untested though)
>
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case.  Possibly we should change
> it to check for SIGHUP after each archive_command execution.

+1

Here is a simple patch to do so.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgarch_check_sighup_v1.patch
Description: Binary data



Re: [GENERAL] archive_command

2010-06-01 Thread Fujii Masao
On Wed, Jun 2, 2010 at 3:47 AM, Jun Wang  wrote:
> The document recommends a command as below:
>
> archive_command = 'test ! -f /var/lib/pgsql/backup_in_progress || cp
> -i %p /var/lib/pgsql/archive/%f < /dev/null'
>
> How can this be done using windows batch? I tried the following
> command, but it did not work:
>
> archive_command = 'if not exist C:\\pgsql\\backup_in_progress || copy
> %p C:\\pgsqlarchive\\%f'

Do you want to take a "standalone hot backup" instead of a normal one? If
not, you don't need to check for the file "backup_in_progress".

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] LINE 1: IDENTIFY_SYSTEM error infinitum

2010-06-30 Thread Fujii Masao
On Thu, Jul 1, 2010 at 11:35 AM, Zoid  wrote:
> Hmm.  I tried the replication=1 switch but I was prompted with the below but
> I noticed the "local" requirement assumes a UNIX socket which i'm not using.
>  And both databases are actually on the same box (just different ports).
>
>   psql: FATAL:  no pg_hba.conf entry for replication connection from host
> "[local]", user "postgres"
>
>
> I've been using this to start the replication database process, which does
> start ok, spit out to syslog then barfs on the IDENTIFY_SYSTEM returned from
> the primary.     /usr/local/pgsql/bin/postgres -p5433 -D
> replication_database  -o -d5
>
> Can I be sure that the Primary even has the capability to answer the
> IDENTIFY_SYSTEM command?   I'm beginning to think maybe this is zero'd out
> until the beta is finalized.   It would take a code search but if the
> functionality is linked to Major Version and Minor version variables, I
> guess its possible that it won't work until the beta2 is normalized to an
> normal release.    So I'll look for an alternate means to extract the System
> Identity on the primary.

Hmm... you'd like to get the system identifier from the postgres server via
SQL rather than by starting replication? If so, you can do that by adding a
replication entry to pg_hba.conf and running the following:
$ psql replication=1 -c "IDENTIFY_SYSTEM"
      systemid       | timeline
---------------------+----------
 5488763631978937207 |        1
(1 row)
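For reference, the pg_hba.conf entries could look something like this (the
authentication methods are assumptions; pick whatever fits your setup):

    # TYPE  DATABASE     USER      ADDRESS        METHOD
    local   replication  postgres                 trust
    host    replication  postgres  127.0.0.1/32   md5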

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] LINE 1: IDENTIFY_SYSTEM error infinitum

2010-06-30 Thread Fujii Masao
On Thu, Jul 1, 2010 at 1:02 PM, Tom Lane  wrote:
> Fujii Masao  writes:
>> Hmm... you'd like to get the system identifier from the postgres
>> server via SQL rather than starting replication? If so, you can do
>> that by adding replication entry into pg_hba.conf and performing the
>> following
>
>>     $ psql replication=1 -c "IDENTIFY_SYSTEM"
>
> Cute, but all Zoid wants is to get his replication slave to start ;-)
> Can you figure out what's going wrong?  I'm wondering about some sort of
> incompatibility between various pre-beta states of the code, but don't
> have a clear answer.

I'm suspicious of that incompatibility, too. So reinstalling postgres
9.0beta2 would fix the problem, I guess. Though I'm not convinced...

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Queries about PostgreSQL PITR

2010-07-11 Thread Fujii Masao
On Fri, Jul 9, 2010 at 6:47 PM, Jayadevan M
 wrote:
> So recovery happened to a point after I dropped the first table and before
> I dropped
> the second table. Why ?

Because you didn't disable recovery_target_inclusive, I guess.
http://www.postgresql.org/docs/8.4/static/continuous-archiving.html#RECOVERY-TARGET-INCLUSIVE

> Is there a way in which I can now go back a bit further, and ensure I am
> back to the
> time line before I dropped either of the tables? From documentation, I
> think the answer is 'No'.
> Of course, I could try the entire recovery process once more, and provide
> a couple of minutes
> earlier time as recovery_target_time.

How about setting recovery_target_timeline to the old timeline ID (= 1)?
http://www.postgresql.org/docs/8.4/static/continuous-archiving.html#RECOVERY-TARGET-TIMELINE

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Queries about PostgreSQL PITR

2010-07-12 Thread Fujii Masao
On Mon, Jul 12, 2010 at 5:29 PM, Jayadevan M
 wrote:
> Hi,
>>Because you didn't disable recovery_target_inclusive, I guess.
>>
> http://www.postgresql.org/docs/8.4/static/continuous-archiving.html#RECOVERY-TARGET-INCLUSIVE
> Thanks. I was almost sure this will fix it. But the issue seems to be
> something else. Even if I give a time that is a few more minutes before
> what I got from select now(), it is always moving upto/or just before
> (depending on the above parameter) transaction id 676. The ooutput reads
>  LOG:  recovery stopping before commit of transaction 676, time 2010-07-09
> 07:49:26.580518+05:30

Recovery stops when it reaches a commit whose timestamp is > (or >=,
depending on recovery_target_inclusive) recovery_target_time. So unless it
replays up to a commit newer than recovery_target_time, it cannot stop.

> Is there a way to find out the transaction ids and corresponding SQLs,
> timeline etc? May be doing the recovery in debug/logging mode or something
> like that?

xlogviewer reads WAL files and displays their contents. But it's been
inactive for several years, so I'm not sure whether it's still usable.
http://pgfoundry.org/projects/xlogviewer/

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Application name and psql in 9.0

2010-08-05 Thread Fujii Masao
2010/8/6 Devrim GÜNDÜZ :
> Why can't I see psql there? Is it just because that logging is performed
> just before detecting application name?

Yes. The backend checks whether the target database exists before
processing the application name.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] pg 9.0, streaming replication, fail over and fail back strategies

2010-08-09 Thread Fujii Masao
On Tue, Aug 10, 2010 at 7:10 AM, Kyle R. Burton  wrote:
> Is there any way to get PostgreSQL to bind to a new ip address and
> interface without actually shutting it down?  If it could, would I
> need to break all the current (read only) client connections to get
> them to reconnect and have the ability to write?  (am I confused about
> this?)

What about setting listen_addresses to '*'? Then you would be able to
connect to the new master as soon as the VIP has been moved to it.
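For instance, in postgresql.conf on both nodes:

    listen_addresses = '*'   # bind to all interfaces, including the VIP

Clients always connect to the VIP, so whichever node currently holds it
serves them without a postgres restart.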

> Now that the master+slave configuration is up and running again, I'm
> looking for advice on how to monitor for faults: I can fail over
> manually, which is fine for now.  What aspects of the postgres system
> should be monitored to watch for faults and what are the kinds of
> faults that should lead to a fail over?  The machine crashing (OS/HW)
> is an obvious one, which will be recognized by corosync and I can
> script the initiation of failover (including using ipmi to power down
> the master).

Probably crashes of the postgres and corosync processes, VIP trouble, and a
network outage between the master and the clients should be monitored, I
think. Since any of these prevents the master from serving queries from the
clients, each of them should trigger a failover.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Checking for stale reads on hot standby

2010-09-26 Thread Fujii Masao
On Mon, Sep 27, 2010 at 9:09 AM, Gurjeet Singh  wrote:
> See the nuggets hidden in section 25.2.5.2. "Monitoring" at
> http://www.postgresql.org/docs/9.0/static/warm-standby.html#STREAMING-REPLICATION
>
> After an UPDATE, your application can cache the info from
> 'pg_current_xlog_location()' result on the primary and then compare that
> with the result of  'pg_last_xlog_receive_location()' on the standby to see
> if it is seeing fresh enough data.

Yep, but since recovery might fall behind WAL receipt,
pg_last_xlog_replay_location should be called instead of
pg_last_xlog_receive_location.
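A minimal sketch of that check (the caching and the comparison happen in
your application):

    -- on the primary, immediately after the UPDATE commits
    SELECT pg_current_xlog_location();

    -- on the standby: the data is fresh enough once this value has
    -- advanced past the location cached from the primary
    SELECT pg_last_xlog_replay_location();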

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] [9.0] hot standby plus streaming replication

2010-10-01 Thread Fujii Masao
On Fri, Oct 1, 2010 at 4:02 PM, Michele Petrazzo - Unipex
 wrote:
> - why in my tests, _without_ common directory, master and slave keep in
> sync also if I shutdown slave, add (in my last tests) something about
> 100k record (although little ones) on the master and then after woke up
> the slave in about 2/3 seconds I have all the dbs in sync?

Because the master still had the WAL files containing those 100k records in
its pg_xlog directory. If those WAL files had been removed from the master
before you started the standby, the standby would not have been able to get
in sync with the master.

You can specify how many WAL files to keep on the master by using the
wal_keep_segments parameter.
http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#RUNTIME-CONFIG-REPLICATION
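For example, in postgresql.conf on the master (the value is illustrative):

    # keep at least this many past WAL segments (16MB each) in pg_xlog
    wal_keep_segments = 64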

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] streaming replication question

2010-10-04 Thread Fujii Masao
On Sun, Oct 3, 2010 at 1:02 AM, Rajesh Kumar Mallah
 wrote:
> As an enduser i setup SR based on
> http://wiki.postgresql.org/wiki/Streaming_Replication
> The master & standby are both powerful machines and are on same gigabit
> switch.
> Things worked as expected I updated 1000's of records in single txn and saw
> them
> appearing on the standby instantly.
>
> Then i wanted  to see the impact of shutting down the standby momentarily .
> This apparently
> failed with following messages in standby logs:
>
> DETAIL:  last completed transaction was at log time 2010-10-02
> 20:57:26.17677+05:30
> LOG:  restartpoint starting: time
> LOG:  received smart shutdown request
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  restartpoint complete: wrote 1568 buffers (0.3%); write=146.237 s,
> sync=0.251 s, total=146.489 s
> LOG:  recovery restart point at 25/EA87BA18
> DETAIL:  last completed transaction was at log time 2010-10-02
> 21:02:21.89303+05:30
> LOG:  shutting down <--- - - - - - - - - - - - - -- - -  shutdown command
> issued here
> LOG:  database system is shut down
> LOG:  database system was shut down in recovery at 2010-10-02 21:02:22 IST
> < first attempt to start was made
> LOG:  entering standby mode
> LOG:  restored log file "0001002500EA" from archive
> LOG:  redo starts at 25/EA87BA18
> FATAL:  too many KnownAssignedXids
> CONTEXT:  xlog redo insert: rel 1663/16399/303892827; tid 1503/119
> LOG:  startup process (PID 20527) exited with exit code 1
> LOG:  terminating any other active server processes
>
> LOG:  database system was interrupted while in recovery at log time
> 2010-10-02 20:57:12 IST <--- second attempt was made.
> HINT:  If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> LOG:  entering standby mode
> LOG:  restored log file "0001002500EA" from archive
> LOG:  redo starts at 25/EA87BA18
> FATAL:  too many KnownAssignedXids
> CONTEXT:  xlog redo insert: rel 1663/16399/303892827; tid 1503/119
> LOG:  startup process (PID 20693) exited with exit code 1
> LOG:  terminating any other active server processes

This looks like a bug in Hot Standby. But I'm not sure why
KnownAssignedXids overflowed.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] streaming replication question

2010-10-04 Thread Fujii Masao
On Tue, Oct 5, 2010 at 2:57 AM, Rajesh Kumar Mallah
 wrote:
> I am currently aiming to setup only SR  between 2 servers only.

"only SR" means that you don't need Hot Standby (i.e., you don't need to
run any query on the standby server)? If so, you can set wal_level to
archive instead of hot_standby, and disable hot_standby parameter on the
standby.

> My question is in SR setup do i need to keep transferring the archived WAL
> files also ?

No, as long as wal_keep_segments is high enough that the WAL files required
by the standby are kept in the master's pg_xlog directory.

> when the slave requests WAL records from the master does the master consult
> the archived
> wal files also for sending the records ?

No. In 9.0, the master doesn't read and send archived WAL files. But I
agree that it should be able to, and I'm now proposing that.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg02040.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] streaming replication question

2010-10-05 Thread Fujii Masao
On Tue, Oct 5, 2010 at 4:04 PM, Rajesh Kumar Mallah
 wrote:
>> > I am currently aiming to setup only SR  between 2 servers only.
>>
>> "only SR" means that you don't need Hot Standby (i.e., you don't need to
>> run any query on the standby server)? If so, you can set wal_level to
>> archive instead of hot_standby, and disable hot_standby parameter on the
>> standby.
>
> Thanks for the kind replies,
> May be I am mistaken in understanding of the terms. I do issue read only
> queries to the standby server. So I think its HS. But I also stream the
> wal records.

SR is the capability to stream WAL records from the master to the standby
and to keep the database on the standby up to date by applying them. HS is
the capability to run read-only queries on the standby. You seem to need
both SR and HS, so you don't need to change the wal_level and hot_standby
parameters.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[GENERAL] How to analyze the output of debug_print_plan

2008-08-13 Thread Fujii Masao
Hi,

In order to check the plan of a PreparedStatement, I'm going to analyze the
output of debug_print_plan. But since its output is very complicated, it's
difficult to analyze.

Please let me know of any method or tool which makes analyzing it easier.

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Switch off PITR

2008-12-03 Thread Fujii Masao
On Thu, Dec 4, 2008 at 3:00 AM, Joey K. <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I have been doing PITR backups for a while to a backup server via NFS. It's
> working great.
>
> I have to shutdown the backup server for hardware/OS upgrade. I expect this
> upgrade will last a week.
>
> How do I turn off PITR in the mean time? I commented archive_command and
> issued a pg_ctl reload and postgres is *still* archiving logs to the backup
> server.

Unfortunately there is a delay before the archiver actually reloads the
configuration change. If there are 10 .ready files in
pg_xlog/archive_status, you might have to wait for the reload until the
archiver finishes archiving those 10 WAL files.
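To see how big that backlog is, you can count the pending .ready files,
e.g. (assuming a default data-directory layout):

    $ ls $PGDATA/pg_xlog/archive_status/*.ready | wc -l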

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] in transaction - safest way to kill

2008-12-05 Thread Fujii Masao
On Fri, Dec 5, 2008 at 11:25 PM, Glyn Astill <[EMAIL PROTECTED]> wrote:
>
> select pg_cancel_backend();

No, pg_cancel_backend() cancels only the running *query*; it doesn't kill a
backend that is idle in transaction. I think killing the backend (idle in
transaction) with SIGTERM is the better way.
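For example, a minimal sketch against the 8.x catalogs (on 9.2 and later
the columns are pid and state instead):

    $ psql -Atc "SELECT procpid FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';"
    $ kill -TERM 12345   # PID reported above; 12345 is illustrative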

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] timestams in the the pg_standby output

2010-01-04 Thread Fujii Masao
On Tue, Jan 5, 2010 at 11:20 AM, Tim Uckun  wrote:
> Is there a way to get pg_standby to put timestamps in the output it
> generates?

No.

> I am currently piping the output to a log fie but since it
> contains no timestamps it's of limited use to me.

You can create a script that prepends a timestamp to each line of the
output by using awk, perl, etc.
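For example, a minimal wrapper sketch (paths are illustrative; strftime()
in awk requires gawk, and pipefail keeps pg_standby's exit status visible
to the server):

    #!/bin/bash
    # used as: restore_command = '/usr/local/bin/pg_standby_ts.sh %f %p'
    set -o pipefail
    pg_standby /path/to/archive "$1" "$2" 2>&1 \
      | gawk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush() }' \
      >> /var/log/pg_standby.log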

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] WAL archiving is stuck on an old file that was deleted -- how to get it going again? (8.4.2)

2010-01-06 Thread Fujii Masao
On Thu, Jan 7, 2010 at 11:29 AM, Alvaro Herrera
 wrote:
> Aleksey Tsalolikhin wrote:
>
>> I do have a cron job that cleans files older than 2 days out of the
>> pg_xlog directory;
>
> Bad, bad idea.  Get rid of that.  Perfect way to corrupt your system.
> Postgres removes pg_xlog files automatically when they are no longer
> necessary.  If it doesn't remove them, something is happening and you
> need to fix *that*.  Deleting files by hand only works around the
> wasted-disk-space symptom in a bad way.

Completely agreed.

>> How do I get Postgres to stop trying to rsync
>> 00010035006E, and to do rsync all the WAL files that ARE
>> there?
>
> You're screwed.  You need to get a new base backup; all the files
> you have archived previous to 00010035006E are useless.
>
> You can get out of the problem by creating a dummy file with that name
> in pg_xlog, but keep in mind that the archive is now completely useless
> and unrecoverable.

Or remove pg_xlog/archive_status/00010035006E.ready instead of
creating a dummy file. Postgres tries to archive the WAL files whose .ready
files exist in the archive_status directory.
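That is, something like this (assuming a default data-directory layout):

    $ rm $PGDATA/pg_xlog/archive_status/00010035006E.ready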

And note that you must get out of the archiving problem *before* taking a
new base backup, because pg_stop_backup() waits until the last WAL file
filled during the backup has been archived. Otherwise, pg_stop_backup()
would get stuck.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] equivalent to "replication_timeout" on standby server

2011-11-03 Thread Fujii Masao
On Thu, Nov 3, 2011 at 12:25 AM, Samba  wrote:
> The postgres manual explains the "replication_timeout" to be used to
>
> "Terminate replication connections that are inactive longer than the
> specified number of milliseconds. This is useful for the primary server to
> detect a standby crash or network outage"
>
> Is there a similar configuration parameter that helps the WAL receiver
> processes to terminate the idle connections on the standby servers?

No.

But setting the keepalive libpq parameters in primary_conninfo might be
useful for detecting termination of the connection on the standby side.
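For example, in recovery.conf on the standby (host and timing values are
illustrative):

    primary_conninfo = 'host=192.0.2.10 port=5432 user=replicator keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3'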

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] equivalent to "replication_timeout" on standby server

2011-11-06 Thread Fujii Masao
On Fri, Nov 4, 2011 at 10:58 PM, Samba  wrote:
> although both master(with replication_timeout)  and slave (with tcp timeout
> option in primary_conninfo parameter) closes the connection in quick time
> (based on tcp idle connection  timeout), as of now they do not log such
> information. It would be really helpful if such disconnects are logged with
> appropriate severity so that the problem can identified early and help in
> keeping track of patterns and history of such issues.

Oh, really? Unless I'm missing something, when a replication timeout
happens, the following log message is logged on the master:

terminating walsender process due to replication timeout

OTOH, something like the following is logged on the standby:

could not receive data from WAL stream..

> Presently, neither master nor standby server attempts to resume streaming
> replication when they happen to see each other after some prolonged
> disconnect. It would be better if either master or slave or both the servers
> makes periodic checks to find if the other is reachable and resume the
> replication( if possible, or else log the message that a full sync may be
> required).

The standby periodically tries to reconnect to the master after it detects
the termination of the replication connection. So even after a prolonged
disconnect, replication can resume automatically.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Streaming Replication woes

2011-11-06 Thread Fujii Masao
On Mon, Nov 7, 2011 at 8:41 AM, Konstantin Gredeskoul  wrote:
> Does that mean "postgres" superuser is no longer able to also be a
> replication user in 9.1?  If so this is not yet updated on the
> Replication wiki.

No. Superuser has the replication privilege by default. But if you granted
the "postgres" user the superuser privilege by using ALTER ROLE, the
replication privilege would not be granted.

You can check whether the "postgres" user has the replication privilege
by executing "\du" command.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Streaming Replication woes

2011-11-08 Thread Fujii Masao
On Tue, Nov 8, 2011 at 3:45 AM, Konstantin Gredeskoul  wrote:
> The user 'postgres' was created using standard database installation
> procedure.  It has superuser, but does not include an explicit replication
> role:
>
> my_db=# \du
>                         List of roles
>  Role name  |            Attributes             | Member of
> ------------+-----------------------------------+-----------
>  postgres   | Superuser, Create role, Create DB | {}
>  replicator | Replication                       | {}

Did you restore the database from a dump file created by pg_dumpall from
9.0 instead of 9.1? If so, that dump file would contain an "ALTER ROLE
postgres" command that effectively revokes the replication privilege, I
guess.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] recovery_target_timeline and multiple slave behavior when master fails

2011-12-18 Thread Fujii Masao
On Fri, Dec 16, 2011 at 3:59 AM, Rick Pufky  wrote:
> Any thoughts on the above snippets? Am I interpreting the documentation
> correctly? Is there any further information needed to debug this?

You need to share the archive directory between all three nodes to use that
trick.

To follow the timeline change that occurs at failover to another standby,
the standby needs to read the timeline history file. This file is created
and archived at failover by the new master (i.e., the other standby). It is
not shipped via replication, so the standby needs to read it from the
archive. That's why you need a shared archive directory.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] [ADMIN] PG synchronous replication and unresponsive slave

2012-01-16 Thread Fujii Masao
On Tue, Jan 17, 2012 at 3:51 AM, Manoj Govindassamy
 wrote:
>>> 1. Transaction which was stuck right when slave going away never went
>>> thru even after I reloaded master's config with local commit on. I do see
>>> all new transactions on master are going thru fine, except the one which was
>>> stuck initially. How to get this stuck transaction complete or return with
>>> error.

Changing synchronous_commit doesn't affect such a transaction. Instead,
empty synchronous_standby_names and reload the configuration file to
resume that transaction.
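For example, on the master (the data directory path is illustrative):

    # postgresql.conf
    synchronous_standby_names = ''

    $ pg_ctl reload -D /path/to/data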

>>> 2. Whenever there is a problem with slave, I have to manually reload
>>> master's config with local commit turned on to get master go forward. Is
>>> there any automated way to reload this config with local commit on on
>>> slave's unresponsiveness ? tcp connection timeouts, replication timeouts all
>>> detect the failures, but i want to run some corrective action on these
>>> failure detection.

PostgreSQL doesn't have such a capability, but pgpool-II might. Could you
ask about that on the pgpool-II mailing list?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] Does Version 9.1 Streaming Replication Supports Multi-Master?

2012-01-17 Thread Fujii Masao
On Wed, Jan 18, 2012 at 3:09 AM, Jerry Richards
 wrote:
> I know PostgreSQL version 9.1 supports  built-in streaming replication.
> Just wondering if that supports only a single-master or also multi-master
> implementation?

Only single-master. If you want a multi-master solution, see Postgres-XC.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] [ADMIN] PG synchronous replication and unresponsive slave

2012-01-17 Thread Fujii Masao
On Wed, Jan 18, 2012 at 6:37 AM, Manoj Govindassamy
 wrote:
> (2) We are not comfortable moving to PGPool just for automatic failback mode
> on hot-standby failure.

Hmm... my reply might have been misleading. What I meant was to use
pgpool-II as clusterware for PostgreSQL's built-in replication, not as the
replication mechanism itself. You can health-check, fail over if necessary,
and manage PostgreSQL replication by using pgpool-II; AFAIK pgpool-II has
such an operation mode. Are you still not comfortable using pgpool-II in
that way?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] [ADMIN] PG synchronous replication and unresponsive slave

2012-01-17 Thread Fujii Masao
On Wed, Jan 18, 2012 at 10:54 AM, Manoj Govindassamy
 wrote:
> I am aware of pgpool-II and its features. Just that my requirements are
> little different.  I have a System (PG runs on it) which already has
> Failover mechanism to another System and I want PG to be part of this
> cluster and not clustered on its own. Mean, PG has to be running in Master
> system and in synchronous replication mode with another slave system, but
> the failover is driven from the higher level and not just on PG's failure.
>
> So, whenever PG's slave node is unresponsive, we better let the replication
> cutoff and run the master system independently. So, we need better mechanism
> to detect when Master PG's synchronous replication not working as expected
> or when the slave PG is going unresponsive.  If not, master PG is held back
> by the slave PG and so the whole clustered system is stuck. Hope, I am
> making some sense here. Let me know if there are easy ways to detect Master
> PG's replication not working (via libpq would be more preferable).

You can detect that by checking whether information about the synchronous
standby is still present in pg_stat_replication. But I have no good idea how
to automatically run some action, such as reloading the configuration file,
upon failure detection. Maybe you need to implement that on your own...
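For example, a minimal check on the master; zero rows means no connected
synchronous standby:

    SELECT application_name, state, sync_state
      FROM pg_stat_replication
     WHERE sync_state = 'sync';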

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] synchronous replication: blocking commit on the master

2012-02-28 Thread Fujii Masao
On Wed, Feb 29, 2012 at 3:22 AM, Jameison Martin  wrote:
> i don't think i've explained things very clearly. the implied contradiction
> is that i'd be using asynchronous replication to catch up a slave after a
> slave failure and thus i'm losing the transactional consistency that i
> suggest i need.  if a slave fails and is brought back on line i am indeed
> proposing that it catch up with the master asynchronously; however,  the
> slave wouldn't be promoted to a hot standby until it is completely caught up
> and could be reestablished as a synchronous replica (at least that is what
> i'd like to do in theory). so i'm proposing that a slave would never be a
> candidate for a HA failover unless it is completely in sync with a master:
> if there is no slave that is in sync with the master at the time the master
> fails, then the master would have to be recovered from the filesystem via
> traditional recovery. the fact that i envision 'catching up' a slave to a
> master using asychronous replication is not particularly relevant to the
> transactional guarantees of the system as a whole if the slave is
> effectively unavailable while catching up.
>
> similarly, any slave that isn't caught up to its master would also not be
> eligible for queries.
>
> i can understand why the master might hang when there is no reachable
> replica during synchronous commit, this is exactly the right thing to do if
> you want to guarantee that you have at least 2 distinct spheres of
> durability. but i'd prefer to sacrifice the extra durability guarantee in
> favor of availability in this case given that recovery from the file system
> is still an option should the master subsequently fail. my availability
> issue is that the master would clearly be hung/unavailable for an unbounded
> amount of time without a strong guarantee about the time it might take to
> bring a replica back up which is not acceptable in my case.
>
> if the master hangs commits because there is no active slave, i believe that
> an administrator would have to
>
> detect that there are no active slaves
> shut the master down
> disable synchronous replication
> bring the master back up

You don't need to restart the server to disable sync replication. You can
do that by emptying synchronous_standby_names in postgresql.conf and
reloading it (i.e., pg_ctl reload).

BTW, though you can also disable sync replication by setting
synchronous_commit to local in postgresql.conf, you should use
synchronous_standby_names for that purpose instead. Setting
synchronous_commit to local prevents new transactions (those executed after
the setting change) from being blocked, but cannot resume already-blocked
transactions.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] restore_command is not running on my standby

2012-03-13 Thread Fujii Masao
On Wed, Mar 14, 2012 at 11:07 AM, Joseph Shraibman  wrote:
> I have twice set up pg hot standbys ala the docs at
> http://www.postgresql.org/docs/9.1/interactive/hot-standby.html
>
> The third time I'm trying this I'm running into trouble.  The first two
> times were with actual servers.  This time I'm trying to set up two pg
> instances on my desktop for testing.
>
> First I create the secondary:
>
> [jks@jks-desktop ~/work/pgmon]{f15}$ time pg_basebackup -D repl-db -P -h
> localhost -U replicator
> 59303/59303 kB (100%), 1/1 tablespace
> NOTICE:  pg_stop_backup complete, all required WAL segments have been
> archived
>
> real    0m1.725s
> user    0m0.061s
> sys     0m0.265s
>
> Then I copy in recovery.conf and the replacement postgresql.conf and try to
> start up.  I get:
>
>
> LOG:  database system was interrupted; last known up at 2012-03-13 21:29:32
> EDT
> LOG:  could not open file "pg_xlog/0001003D" (log file 0,
> segment 61): No such file or directory
> LOG:  invalid checkpoint record
> FATAL:  could not locate required checkpoint record
> HINT:  If you are not restoring from a backup, try removing the file
> "/home/jks/work/pgmon/repl-db/backup_label".
> LOG:  startup process (PID 28220) exited with exit code 1
> LOG:  aborting startup due to startup process failure
>
> Now the file 0001003D does exist in the archive directory,
> but it appears that restore_command is not being run. Originally it was:
> 'cp /home/jks/work/pgmon/wal_drop/%f %p'
>
> Then I changed it to:
>
> restore_command = 'echo f %f p %p >> /tmp/rc.log ; cp
> /home/jks/work/pgmon/wal_drop/%f %p'
>
> /tmp/rc.log was never created, so I assume the whole thing isn't being run
> for some reason.  Any clues where I should look?

Confirm that recovery.conf is properly located under the data directory. If
recovery.conf is located properly and standby_mode is enabled there, you
should get the following log message at the start of recovery:

LOG:  entering standby mode

But you didn't get that message, so I guess PostgreSQL failed to read
recovery.conf, and hence never ran restore_command, because recovery.conf
is in the wrong location.
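For reference, a minimal sketch using the paths from your session (the file
must sit directly in the data directory):

    # /home/jks/work/pgmon/repl-db/recovery.conf
    standby_mode = 'on'
    restore_command = 'cp /home/jks/work/pgmon/wal_drop/%f %p'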

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?

2012-04-10 Thread Fujii Masao
On Mon, Apr 9, 2012 at 7:33 PM, 乔志强  wrote:
> Question:
> Why does the master delete the WAL segment before sending it to the standby in
> synchronous mode?

Otherwise the master might fill up with lots of unsent WAL files, which
could cause a PANIC error on the master when there is no standby. IOW, the
master tries to avoid a PANIC error rather than avoid termination of
replication.

> It is a streaming replication bug ?

No. It's intentional.

> If you use synchronous_standby_names for a sync standby and no standby is online,
> all commits will be delayed until a standby connects to the master.
> So wal_keep_segments is actually only for offline async standbys.

What if synchronous_commit is set to local or async?

Regards,

-- 
Fujii Masao



Re: [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?

2012-04-10 Thread Fujii Masao
On Wed, Apr 11, 2012 at 10:06 AM, 乔志强  wrote:
> synchronous_commit is not set, default is "on" ?
> #synchronous_commit = on                # synchronization level; on, off, or local

Yes.

>>Otherwise the master might be filled up with lots of unsent WAL files and 
>>which might cause PANIC error in the master, when there is no standby.
>>IOW, the master tries to avoid a PANIC error rather than termination of 
>>replication.
>
> Can we have a config option to keep unsent WAL files for replication?

No. We discussed such a feature before, but it failed to be committed. I
think it's useful, so I hope it'll be available in a future release.

> What can I do when I need a backup standby server but keep wal_keep_segments = 3
> to save disk usage on the master? (Under heavy load, the master now deletes WAL
> before sending it to the standby. Do I need to modify some config?)

Yes, increase wal_keep_segments. Even if you set wal_keep_segments to 64,
the disk space used for those WAL files is only 1GB (64 x 16MB), so there
is no need to worry so much, I think. No?

> #checkpoint_segments = 3                # in logfile segments, min 1, 16MB each

Also increase checkpoint_segments. With this setting, I guess checkpoints
run too frequently under heavy load, and WAL files are removed too
aggressively.
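For example, in postgresql.conf on the master (values are illustrative):

    wal_keep_segments   = 64   # about 1GB of WAL retained for the standby
    checkpoint_segments = 16   # allow more WAL between checkpoints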

Regards,

-- 
Fujii Masao



Re: [GENERAL] 9.1.3 Standby catchup mode

2012-04-10 Thread Fujii Masao
On Fri, Apr 6, 2012 at 1:35 AM, hans wulf  wrote:
> I am wondering how the catchup mode of a hot synchronous slave server works on
> 9.1.3 if there is no WAL archive.
>
> Can the slave only request WALs that are still in the xlog directory of the 
> master server? Or does the master regenerate some kind of fake log for the 
> catchup mode?

No. If a WAL file which the standby requests doesn't exist in the master's
pg_xlog directory, replication just fails. In this case, you need to take a
fresh base backup and start the standby from it.

> E.g. in case of a slave failure I could use a weekly backup and let the 
> catchup mode do the rest? Or does that only work if you use WAL archive?

Alternatively, set wal_keep_segments high enough that all WAL files the
standby may request are guaranteed to exist in the master's pg_xlog
directory.

Regards,

-- 
Fujii Masao



Re: [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?

2012-04-11 Thread Fujii Masao
On Wed, Apr 11, 2012 at 3:31 PM, 乔志强  wrote:
> So in sync streaming replication, if the master deletes WAL before it is sent to
> the only standby, all transactions will fail forever. "The master tries to avoid
> a PANIC error rather than termination of replication", but in sync replication,
> termination of replication is THE bigger PANIC error.

I see your point. When there are backends waiting for replication, the WAL
files which the standby might not have received yet must not be removed. If
they are removed, replication keeps failing forever because the required
WAL files don't exist on the master, and then the waiting backends will
never be released unless the replication mode is changed to async. This
should be avoided.

To fix this issue, we should prevent the master from deleting WAL files
containing the minimum waiting LSN or anything newer. I'll think about it
more and implement a patch.

Regards,

-- 
Fujii Masao



Re: [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?

2012-04-11 Thread Fujii Masao
On Thu, Apr 12, 2012 at 12:56 AM, Fujii Masao  wrote:
> On Wed, Apr 11, 2012 at 3:31 PM, 乔志强  wrote:
>> So in sync streaming replication, if master delete WAL before sent to the 
>> only standby, all transaction will fail forever,
>> "the master tries to avoid a PANIC error rather than termination of 
>> replication." but in sync replication, termination of replication is THE 
>> bigger PANIC error.
>
> I see your point. When there are backends waiting for replication, the WAL 
> files
> which the standby might not have received yet must not be removed. If they are
> removed, replication keeps failing forever because required WAL files don't
> exist in the master, and then waiting backends will never be released unless
> replication mode is changed to async. This should be avoided.

On second thought, we can avoid the issue just by increasing
wal_keep_segments enough. Even if the issue happens and some backends get
stuck waiting for replication, we can release them by taking a fresh backup
and restarting the standby from it. This is the basic procedure for
restarting replication after it is terminated because required WAL files
were removed from the master. So this issue might not be worth implementing
a patch for now (though I'm not against improving things in the future); it
seems to be just a tuning problem of wal_keep_segments.

Regards,

-- 
Fujii Masao



Re: [HACKERS] [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?

2012-04-12 Thread Fujii Masao
On Thu, Apr 12, 2012 at 4:09 AM, Michael Nolan  wrote:
> -- Forwarded message --
> From: Michael Nolan 
> Date: Wed, 11 Apr 2012 14:48:18 -0400
> Subject: Re: [HACKERS] [GENERAL] [streaming replication] 9.1.3
> streaming replication bug ?
> To: Robert Haas 
>
> On Wed, Apr 11, 2012 at 2:14 PM, Robert Haas  wrote:
>
>>
>>
>> We've talked about teaching the master to keep track of how far back
>> all of its known standbys are, and retaining WAL back to that specific
>> point, rather than the shotgun approach that is wal_keep_segments.
>> It's not exactly clear what the interface to that should look like,
>> though.
>>
>>
> Moreover, how does the database decide when to drop a known standby from
> the queue because it has failed or the DBA notify the database that a
> particular standby should no longer be included?

Probably the latter. So, as Robert pointed out, we need a neat API to
register and drop standbys. Though I have no good idea about this...

BTW, I have another idea about wal_keep_segments problem.
http://archives.postgresql.org/message-id/AANLkTinN=xspooaxzvfsp1okfmdab1f_d-f91xjez...@mail.gmail.com

Regards,

-- 
Fujii Masao



Re: [GENERAL] pgstat wait timeout

2012-04-20 Thread Fujii Masao
On Thu, Apr 19, 2012 at 3:32 AM, Efraín Déctor
 wrote:
> Hello list:
>
> Today I started to see this messages on the PostgreSQL log:
>
> 2012-04-18 00:01:05 UTC : @  :WARNING:  01000: pgstat wait timeout
> 2012-04-18 00:01:05 UTC : @  :LOCATION:  backend_read_statsfile,
> pgstat.c:3807
>
> I searched, and to fix it it was recommended to disable autovacuum. I did that
> and it worked, but how can I fix it without disabling autovacuum?

This might be alleviated by setting stats_temp_directory to point to a ramdisk.
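For example (the mount point is an assumption; any tmpfs/ramdisk location
works):

    # postgresql.conf
    stats_temp_directory = '/var/run/pgsql_stats_tmp'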

Regards,

-- 
Fujii Masao



Re: [GENERAL] Streaming replication: sequences on slave seemingly ahead of sequences on master

2012-05-07 Thread Fujii Masao
On Mon, May 7, 2012 at 10:33 PM, Tom Lane  wrote:
> Simon Riggs  writes:
>> On 7 May 2012 09:01, Vincent de Phily  
>> wrote:
>>> Would be nice to see it added to the documentation (unless I just didn't 
>>> find
>>> it ?), as it is quite surprising, and might lead to problems if people 
>>> expect
>>> to be able to read sequence values from the slave.
>
>> If you think so, please submit a patch. That's how it works here.
>
> Any documentation patch should be written by somebody who's actually
> researched the behavior a bit; in particular I believe this can be
> adjusted with the sequence CACHE setting.

No. That behavior is caused by the hard-coded value SEQ_LOG_VALS (= 32 in
sequence.c) rather than the CACHE setting: each WAL record written for a
sequence logs 32 values in advance, so after replay the sequence on the
slave can appear ahead of the master.

Regards,

-- 
Fujii Masao



Re: [GENERAL] tcp_keepalives_idle setting

2008-12-10 Thread Fujii Masao
Hi,

On Wed, Dec 10, 2008 at 7:26 PM, Maximilian Tyrtania
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> recently Tom Lane wrote:
>
> "henry" writes:
>>> I have tried setting tcp_keepalives_idle = 120 (eg), then restarting PG,
>>> but SHOW ALL; shows tcp_keepalives_idle=0 (ignoring my setting).
>
>>Just FYI, this is the expected behavior on platforms where the kernel
>>doesn't allow adjustment of the TCP keepalive parameters from
>>userspace.
>
> I'm running pg 8.3.3 on Mac OS 10.4.11 and I'm seeing the same behaviour as
> Henry.
> I'm trying to find out if Mac OS belongs to those platforms that doesn't
> allow adjustment of the TCP keepalive parameters from userspace, and if so,
> how i can change the systems settings as root.

Are you using a TCP/IP socket when you execute SHOW ALL? Those parameters
are ignored (and read as zero) over a Unix-domain socket.
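For example, assuming the server listens on 127.0.0.1:

    $ psql -h 127.0.0.1 -c "SHOW tcp_keepalives_idle;"   # TCP/IP socket
    $ psql -c "SHOW tcp_keepalives_idle;"                # Unix socket: reads as 0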

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] New 8.4 hot standby feature

2009-01-27 Thread Fujii Masao
Hi,

On Wed, Jan 28, 2009 at 4:28 AM, Gabi Julien  wrote:
> Yes, the logs are shipped every minute but the recovery is 3 or 4 times
> longer.

Are you disabling full_page_writes? That can slow down recovery several-fold.

> Thanks I will take a look at it. Also, I came across the record log shipping
> feature too in my research:
>
> http://www.postgresql.org/docs/current/static/warm-standby.html#WARM-STANDBY-RECORD
>
> Could this help? If the logs are smaller then I could potentially afford
> shipping then at a higher frequency.

No. Even if the logs are shipped frequently, they cannot be applied until
the log file fills.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] New 8.4 hot standby feature

2009-01-28 Thread Fujii Masao
Hi,

On Thu, Jan 29, 2009 at 12:23 AM, Jason Long
 wrote:
> Is pg_clearxlogtail going to be in contrib or integrated in some other way?

I also hope so. The related topic was discussed before.
http://archives.postgresql.org/pgsql-hackers/2009-01/msg00639.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] speaking of 8.4...

2009-02-24 Thread Fujii Masao
Hi,

On Tue, Feb 24, 2009 at 5:16 AM, John R Pierce  wrote:
> is it looking like the simple replication will make it into 8.4?

You mean the built-in synchronous replication feature? If so, no. It was
decided that sync rep would be postponed to 8.5.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] PITR - warm standby switchover question

2009-04-15 Thread Fujii Masao
Hi,

On Wed, Apr 15, 2009 at 9:23 AM, Dan Hayes  wrote:
> Excellent!  Thanks.  One other quick question...  What would happen if I
> didn't delete the recovery.conf file?  Is that step just to prevent
> accidentally restarting the server with it there?

recovery.conf is automatically renamed to recovery.done by postgres at the
end of recovery, which prevents accidentally restarting the warm standby.
So you don't need to delete or rename recovery.conf by hand after recovery.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
