On 25.09.2012 14:10, Amit Kapila wrote:
On Tuesday, September 25, 2012 12:39 PM Heikki Linnakangas wrote:
On 24.09.2012 16:33, Amit Kapila wrote:
On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
I've been working on the often-requested feature to handle timeline
changes over streaming replication. At the moment, if you kill the
master and promote a standby server, and you have another standby
server that you'd like to keep following the new master server, you
need a WAL archive in addition to streaming replication to make it
cross the timeline change. Streaming replication will just error
out.
Having a WAL archive is usually a good idea in complex replication
scenarios anyway, but it would be good to not require it.
Please confirm my understanding of this feature:
This feature is for the case where standby-1, which is going to be promoted
to master, has archive mode 'on'.
No. This is for the case where there is no WAL archive.
archive_mode='off' on all servers.
Or to be precise, you can also have a WAL archive, but this patch
doesn't affect that in any way. This is strictly about streaming
replication.
As in that case only its timeline will change.
The timeline changes whenever you promote a standby. It's not related to
whether you have a WAL archive or not.
Yes, that is correct. I thought a timeline change happens only when somebody
does PITR.
Can you please tell me why we change the timeline after promotion? The
original timeline concept was for PITR, and I am not able to trace from the
code why it is required on promotion.
Bumping the timeline helps to avoid confusion if, for example, the
master crashes, and the standby isn't fully in sync with it. In that
situation, there are some WAL records in the master that are not in the
standby, so promoting the standby is effectively the same as doing PITR.
If you promote the standby, and later try to turn the old master into a
standby server that connects to the new master, things will go wrong.
Assigning the new master a new timeline ID helps the system and the
administrator to notice that.
It's not bulletproof; for example, you can easily avoid the timeline
change if you just remove recovery.conf and restart the server, but the
timelines help to manage such situations.
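
To make the divergence case above concrete, here is a toy model in Python. It is only a sketch, not PostgreSQL code: WalPosition, promote and has_diverged are hypothetical names, WAL positions are simplified to plain integers, and the new timeline ID is assumed to be the old one plus one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalPosition:
    timeline: int  # timeline ID
    lsn: int       # WAL position, simplified to a plain integer


def promote(standby: WalPosition) -> tuple[WalPosition, int]:
    """Promotion starts a new timeline at the standby's current position.

    Returns the new master's position and the LSN of the switch point.
    """
    return WalPosition(standby.timeline + 1, standby.lsn), standby.lsn


def has_diverged(server: WalPosition, switch_lsn: int) -> bool:
    """True if a server still on the old timeline has WAL past the switch point."""
    return server.lsn > switch_lsn


# The master crashed holding WAL that the standby never received.
old_master = WalPosition(timeline=1, lsn=120)
standby = WalPosition(timeline=1, lsn=100)

new_master, switch_lsn = promote(standby)
print(new_master)                            # position on the new timeline
print(has_diverged(old_master, switch_lsn))  # old master has extra WAL
```

Without the new timeline ID, both servers would claim to be on timeline 1 with different contents after LSN 100, and nothing would flag the mismatch.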
If the above is right, then there can be other similar scenarios where it
can be used:
Scenario-1 (1 Master, 1 Stand-by)
1. Master (archive_mode=on) goes down.
2. Master again comes up
3. Stand-by tries to follow it
In the above scenario, too, the timeline mismatch causes an error, but your
patch should fix it.
If the master simply crashes or is shut down, and then restarted, the
timeline doesn't change. The standby will reconnect / poll the archive,
and sync up just fine, even without this patch.
What about when the master does PITR when it comes up again?
Then the timeline will be bumped and this patch will be helpful.
Assuming the standby is behind the point in time that the master was
recovered to, it will be able to follow the master to the new timeline.
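
The "standby is behind the switch point" condition can be sketched as a walk over the timeline history. This is an illustration under assumptions, not the patch's logic: can_follow is a hypothetical helper, the history is modelled as a dict from timeline ID to (parent timeline, switch LSN), and LSNs are plain integers.

```python
def can_follow(standby_tli, standby_lsn, history, target_tli):
    """Return True if the standby can follow the master to target_tli.

    history maps a timeline ID to (parent_tli, switch_lsn), i.e. the
    timeline it branched from and the WAL position of the branch.
    """
    tli = target_tli
    while tli != standby_tli:
        if tli not in history:
            # The standby's timeline is not an ancestor of the target.
            return False
        parent_tli, switch_lsn = history[tli]
        if parent_tli == standby_tli:
            # The standby must not have replayed past the branch point.
            return standby_lsn <= switch_lsn
        tli = parent_tli
    # Already on the target timeline.
    return True


# Master did PITR to LSN 100 on timeline 1, creating timeline 2.
history = {2: (1, 100)}
print(can_follow(1, 80, history, 2))   # standby is behind the switch point
print(can_follow(1, 120, history, 2))  # standby has replayed past it
```

A standby that fails this check has WAL that is not part of the new timeline's history, so in this model it cannot simply continue streaming.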
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)