Hi,

On Sat, Apr 11, 2009 at 1:31 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>
> Fujii-san,
>
> I like the new patch using the content of the file to determine the
> mode. Much easier to use at failover time.
>
> On Fri, 2009-04-10 at 12:47 +0900, Fujii Masao wrote:
>
>> > One problem with this patch is that in smart mode, the trigger file is not
>> > deleted. That's different from current pg_standby behavior, and makes
>> > accidental failovers after one failover more likely.
>>
>> Yes, it's because pg_standby cannot be sure when the trigger file
>> can be removed in smart mode. If the trigger file is deleted as soon
>> as it's found, just like in fast mode, pg_standby may keep waiting
>> for WAL file again.
>
> My understanding of smart mode is fairly simple:
>
>   if (triggered)
>   {
>        if (smartMode && nextWALfile+1 exists)
>                exit(0);
>        else
>        {
>                delete trigger file
>                exit(1);
>        }
>   }
>
> If you perform a file lookahead (the +1) as shown above then you avoid
> the problem Heikki observes.

Thanks for the suggestion!

A lookahead at nextWALfile+1 could leave pg_standby stuck, as follows.
Am I missing something?

1. the trigger file containing "smart" is created.
2. pg_standby is executed.
    2-1. nextWALfile is restored.
    2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.
5. pg_standby gets stuck because the trigger file and nextWALfile+1
    don't exist.

But a lookahead at nextWALfile itself seems to work fine.

if (triggered)
{
    if (smartMode && nextWALfile exists)
        exit(0)
    else
    {
        delete trigger file
        exit(1)
    }
}

1. the trigger file containing "smart" is created.
2. pg_standby is executed.
    2-1. nextWALfile is restored.
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.
    4-1. the trigger file is deleted because nextWALfile+1 doesn't exist.
5. the startup process fails to read nextWALfile+1.
6. pg_standby is executed again to re-fetch nextWALfile.
    6-1. nextWALfile is restored.
    6-2. pg_standby doesn't get stuck because nextWALfile exists.
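
To make the two traces above easier to compare, here is a minimal toy
model in C. It is only a sketch of my reading of the steps, not code from
pg_standby.c: WAL segments are modeled as plain integers, the archive is
just "the highest segment number available", and every name (Standby,
Outcome, run_pg_standby, the lookahead parameter) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; none are taken from pg_standby.c. */
typedef enum
{
    RESTORED = 0,       /* exit(0): the requested file was restored */
    END_RECOVERY = 1,   /* exit(1): the requested file was not restored */
    WAITING = 2         /* no trigger, no file: keeps waiting for the file */
} Outcome;

typedef struct
{
    bool trigger_exists;    /* is the trigger file present? */
    bool smart_mode;        /* does it contain "smart"? */
    int  last_archived;     /* highest WAL segment number in the archive */
} Standby;

/*
 * Model of one pg_standby invocation asked to restore segment "requested".
 * lookahead = 1 probes requested+1; lookahead = 0 probes the requested
 * file itself.
 */
Outcome
run_pg_standby(Standby *s, int requested, int lookahead)
{
    bool requested_ok = requested <= s->last_archived;

    assert(lookahead == 0 || lookahead == 1);

    /* Not triggered: restore the file if it is there, otherwise wait. */
    if (!s->trigger_exists)
        return requested_ok ? RESTORED : WAITING;

    /* Triggered in smart mode and the probed file exists: keep the trigger. */
    if (s->smart_mode && requested + lookahead <= s->last_archived)
        return RESTORED;

    s->trigger_exists = false;      /* "delete trigger file" */

    /*
     * Steps 2-1 and 2-2 of the first trace: with the +1 probe, the
     * requested file can still be restored by the same invocation that
     * removes the trigger file.
     */
    return (s->smart_mode && requested_ok) ? RESTORED : END_RECOVERY;
}
```

With last_archived = 5 and a "smart" trigger, calling run_pg_standby for
segment 5 and then 6 with lookahead = 1 ends in WAITING (the stuck case),
while the same calls with lookahead = 0 end in END_RECOVERY, and the
following re-fetch of segment 5 returns RESTORED.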

Furthermore, pg_standby may have to check whether nextWALfile exists
not only in archiveLocation but also in pg_xlog. This is because, when
pg_xlog of the primary server is still readable at failover, the WAL files
in it may be copied to pg_xlog of the standby server to be applied
(though I'm not sure whether it's better to copy such files to pg_xlog
rather than to archiveLocation in this case).
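
As a rough illustration of the two-directory check, something like the
following might do. This is only a sketch under my own assumptions: the
function names and the fixed-size path buffer are hypothetical, it relies
on POSIX stat(), and it is not taken from pg_standby.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: true if dir/fname exists as a regular file. */
static bool
wal_file_in_dir(const char *dir, const char *fname)
{
    char        path[1024];
    struct stat sb;
    int         n;

    assert(dir != NULL && fname != NULL);

    n = snprintf(path, sizeof(path), "%s/%s", dir, fname);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;           /* path did not fit; treat as missing */

    return stat(path, &sb) == 0 && S_ISREG(sb.st_mode);
}

/* Hypothetical check: look in the archive first, then in pg_xlog. */
static bool
next_wal_file_exists(const char *archive_location,
                     const char *xlog_dir,
                     const char *fname)
{
    return wal_file_in_dir(archive_location, fname) ||
           wal_file_in_dir(xlog_dir, fname);
}
```

The smart-mode test would then call next_wal_file_exists() in place of a
check against archiveLocation alone.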

Comments?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
