On 23 March 2012 14:03, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Thu, Mar 22, 2012 at 12:56 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> Hi,
>>>
>>> In streaming replication, after failover, the new master might have lots
>>> of un-applied WAL files with the old timeline ID. They are the WAL files
>>> which were recycled as future ones while the server was running as a
>>> standby. Since they will never be used later, they don't need to be
>>> archived after failover. But since they have neither a .ready nor a .done
>>> file in archive_status, checkpoints after failover newly create .ready
>>> files for them, and then they are finally archived. This might cause a
>>> disk I/O spike in both WAL and archive storage.
>>>
>>> To avoid the above problem, I think that un-applied WAL files with the old
>>> timeline ID should be marked as already archived and recycled immediately
>>> at the end of recovery. Thoughts?
>>
>> I'm not an expert on this, but that makes sense to me.
>
> Thanks for agreeing with my idea.
>
> On second thought, I found other issues with WAL archiving after
> failover, so let me clarify the issues again.
>
> Just after failover, there can be three kinds of WAL files in the new
> master's pg_xlog directory:
>
> (1) WAL files which were recycled by restartpoint
>
> I've already explained upthread the issue which these WAL files cause
> after failover.
This might be a problem, or it might be archiving important data in case you
have a corrupt WAL file/CRC. I'd rather take the hit than delete potentially
useful data. It also avoids the risk of a bug that deletes useful segments.

> (2) WAL files which were restored from the archive
>
> In 9.1 or before, the restored WAL files don't remain after failover
> because they are always restored onto the temporary filename
> "RECOVERYXLOG". So the issue I describe below doesn't exist in 9.1
> or before.
>
> In 9.2dev, as a result of supporting cascade replication, an archived
> WAL file is restored under its correct file name so that the cascading
> walsender can send it to another standby. This restored WAL file has
> neither a .ready nor a .done archive status file. After failover, a
> checkpoint checks the archive status file of the restored WAL file to
> attempt to recycle it, finds that it has neither .ready nor .done, and
> creates .ready. Because .ready exists, the file will be archived again
> even though it obviously already exists in the archival storage :(
>
> To prevent a restored WAL file from being archived again, I think
> that .done should be created whenever a WAL file is successfully
> restored (of course this should happen only when archive_mode is
> enabled). Thoughts?

Agreed.

> Since this is an oversight in cascade replication, I'm planning to
> implement the patch for 9.2dev.

Very much so.

> (3) WAL files which were streamed from the master
>
> These WAL files also don't have any archive status, so a checkpoint
> creates .ready for them after failover. Then all or many of them will
> be archived at once, which would cause an I/O spike on both WAL and
> archival storage.
>
> To avoid this problem, I think that we should change walreceiver
> so that it creates .ready as soon as it completes a WAL file. We
> should also change the archiver process so that it starts up even in
> standby mode and archives the WAL files.
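The fixes proposed for cases (2) and (3) amount to deciding a segment's archive status when it arrives on the standby, instead of leaving it unset until the first checkpoint after failover. The C sketch below models that decision under the assumptions stated in the thread. `status_on_arrival` and `status_file_path` are hypothetical names, not PostgreSQL functions. The path layout `pg_xlog/archive_status/<segment>.ready|.done` follows the thread's description, and the segment name in the comment is only an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { STATUS_NONE, STATUS_READY, STATUS_DONE } ArchiveStatus;

/* How a segment reached the standby's pg_xlog: case (2) or case (3). */
typedef enum { SRC_RESTORED, SRC_STREAMED } SegmentSource;

/*
 * Hypothetical sketch of the proposed fixes:
 *   - restored from the archive: it is already archived, so mark .done
 *   - completed by walreceiver: mark .ready at once, so an archiver
 *     running in standby mode can copy segments one at a time
 * No marker is written when archive_mode is off.
 */
static ArchiveStatus
status_on_arrival(SegmentSource src, bool archive_mode)
{
    assert(src == SRC_RESTORED || src == SRC_STREAMED);
    if (!archive_mode)
        return STATUS_NONE;
    return (src == SRC_RESTORED) ? STATUS_DONE : STATUS_READY;
}

/*
 * Build the marker file path, e.g.
 * pg_xlog/archive_status/000000010000000000000003.done
 * Returns the path length (as snprintf does), or -1 if there is no
 * marker to write.
 */
static int
status_file_path(char *buf, size_t len, const char *segname, ArchiveStatus st)
{
    if (st == STATUS_NONE)
        return -1;
    return snprintf(buf, len, "pg_xlog/archive_status/%s.%s",
                    segname, st == STATUS_DONE ? "done" : "ready");
}
```

Writing .ready as each streamed segment completes spreads the archiving work over time, rather than leaving it all for the checkpoint that follows failover.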
>
> If each server has its own archival storage, the above solution would
> work fine. But if all servers share the archival storage, multiple
> archiver processes on those servers might archive the same WAL file to
> the shared area at the same time. Is this OK? If not, to avoid this,
> we might need to split archive_mode into two settings: one for normal
> mode (i.e., master) and another for standby mode. If the archive is
> shared, we can ensure that only the archiver on the master copies the
> WAL file, by disabling WAL archiving in standby mode but enabling it in
> normal mode. Thoughts?

Use %s as an option to be passed to the archive command.

> Invoking the archiver process in standby mode is a new feature,
> not a bug fix. It's too late to propose a new feature for 9.2, so I'll
> propose this for 9.3.

Yep, good idea.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)