Re: [GENERAL] WAL receive process dies

Chris Hundt Sat, 30 Aug 2014 07:05:54 -0700

(One of Patrick's coworkers here; thanks a lot for your assistance)

Just in case you wanted this as well, I ran


psql 'replication=1 dbname=XXX host=127.0.0.1 port=5432 user=XXX
password=XXX' -c 'IDENTIFY_SYSTEM;'

(5432 is the stuck replica and 5445 is the pipe to the working replica)

      systemid       | timeline |   xlogpos
---------------------+----------+--------------
 5964163898407843711 |        1 | 174/B76D16A8


On Fri, Aug 29, 2014 at 7:26 PM, Patrick Krecker <patr...@judicata.com>
wrote:

>
>
>
> On Fri, Aug 29, 2014 at 3:46 PM, Andres Freund <and...@2ndquadrant.com>
> wrote:
>
>> [FWIW: proper quoting makes answering easier and thus more likely]
>>
>> On 2014-08-29 15:37:51 -0700, Patrick Krecker wrote:
>> > I ran the following on the local endpoint of spiped:
>> >
>> > while [ true ]; do psql -h localhost -p 5445 judicata -U marbury -c
>> "select
>> > modtime, pg_last_xlog_receive_location(), pg_last_xlog_replay_location()
>> > from replication_time";  done;
>> >
>> > And the same command on production and I was able to verify that the
>> xlogs
>> > for a given point in time were the same (modtime is updated every
>> second by
>> > an upstart job):
>> >
>> > spiped from office -> production:
>> >           modtime           | pg_last_xlog_receive_location |
>> > pg_last_xlog_replay_location
>> >
>> ----------------------------+-------------------------------+------------------------------
>> >  2014-08-29 15:23:25.563766 | 177/2E80C9F8                  |
>> 177/2E80C9F8
>> >
>> > Ran directly on production replica:
>> >           modtime           | pg_last_xlog_receive_location |
>> > pg_last_xlog_replay_location
>> >
>> ----------------------------+-------------------------------+------------------------------
>> >  2014-08-29 15:23:25.563766 | 177/2E80C9F8                  |
>> 177/2E80C9F8
>> >
>> > To me, this is sufficient proof that spiped is indeed talking to the
>> > machine I think it's talking to (also lsof reports the correct
>> hostname).
>> >
>> > I created another basebackup from the currently stuck postgres intance
>> on
>> > another machine and I also get this error:
>> >
>> > 2014-08-29 15:27:30 PDT FATAL:  could not receive data from WAL stream:
>> > ERROR:  requested starting point 177/2D000000 is ahead of the WAL flush
>> > position of this server 174/B76D16A8
>>
>> Uh. this indicates that the machine you're talking to is *not* one of
>> the above as it has a flush position of '174/B76D16A8' - not something
>> that's really possible when the node actually is at '177/2E80C9F8'.
>>
>> Could you run, on the standby that's having problems, the following
>> command:
>> psql 'host=127.0.0.1 port=5445 user=XXX password=XXX' -c
>> 'IDENTIFY_SYSTEM;'
>>
>> Greetings,
>>
>> Andres Freund
>>
>> --
>>  Andres Freund                     http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>
> RE: quoting, I wonder if Gmail is messing it up somehow? Or am I doing
> something else wrong? Sorry :(
>
>  First, I apologize for the misleading information, but when I made
> another basebackup and tried to use it, I configured the machine to cascade
> from the stuck replica, *not* from the spiped endpoint. When I properly
> connected it to the spiped endpoint it synced up fine, giving this log line:
>
> 2014-08-29 16:16:21 PDT LOG:  started streaming WAL from primary at
> 177/4F000000 on timeline 1
>
> The command as you gave reported a syntax error as is, but I googled a
> little bit and run this one:
>
> psql 'replication=1 dbname=XXX host=127.0.0.1 port=5445 user=XXX
> password=XXX' -c 'IDENTIFY_SYSTEM;'
>
> And it gave me this output:
>
>       systemid       | timeline |   xlogpos
> ---------------------+----------+--------------
>  5964163898407843711 |        1 | 177/53091990
>

Re: [GENERAL] WAL receive process dies

Reply via email to