Hi,

On Sat, Apr 11, 2009 at 1:31 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>
> Fujii-san,
>
> I like the new patch using the content of the file to determine the
> mode. Much easier to use at failover time.
>
> On Fri, 2009-04-10 at 12:47 +0900, Fujii Masao wrote:
>
>> > One problem with this patch is that in smart mode, the trigger file is not
>> > deleted. That's different from current pg_standby behavior, and makes
>> > accidental failovers after one failover more likely.
>>
>> Yes, it's because pg_standby cannot be sure when the trigger file
>> can be removed in smart mode. If the trigger file is deleted as soon
>> as it's found, just like in fast mode, pg_standby may keep waiting
>> for WAL file again.
>
> My understanding of smart mode is fairly simple:
>
> if (triggered)
> {
>     if (smartMode && nextWALfile+1 exists)
>         exit(0);
>     else
>     {
>         delete trigger file
>         exit(1);
>     }
> }
>
> If you perform a file lookahead (the +1) as shown above then you avoid
> the problem Heikki observes.

Thanks for the suggestion! A lookahead (the +1) may cause pg_standby to
get stuck as follows. Am I missing something?

1. the trigger file containing "smart" is created.
2. pg_standby is executed.
    2-1. nextWALfile is restored.
    2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.
5. pg_standby gets stuck because neither the trigger file nor
    nextWALfile+1 exists.

But a lookahead on nextWALfile itself seems to work fine.

    if (triggered)
    {
        if (smartMode && nextWALfile exists)
            exit(0)
        else
        {
            delete trigger file
            exit(1)
        }
    }

1. the trigger file containing "smart" is created.
2. pg_standby is executed.
    2-1. nextWALfile is restored.
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.
    4-1. the trigger file is deleted because nextWALfile+1 doesn't exist.
5. the startup process fails to read nextWALfile+1.
6. pg_standby is executed again to re-fetch nextWALfile.
    6-1. nextWALfile is restored.
    6-2. pg_standby doesn't get stuck because nextWALfile exists.

Furthermore, pg_standby may have to check whether nextWALfile exists not
only in archiveLocation but also in pg_xlog. This is because, when pg_xlog
of the primary server can still be read at failover, the WAL files in it
may be copied to pg_xlog of the standby server to be applied. (I'm not
sure whether it's better to copy such files to pg_xlog instead of
archiveLocation in this case.)

Comments?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
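
For illustration only, the second variant above (checking nextWALfile
itself rather than nextWALfile+1) could be sketched in C roughly as
follows. This is not code from the patch: TriggerAction,
decide_on_trigger() and file_exists() are hypothetical names, and the
exit_code values simply mirror the exit(0)/exit(1) of the pseudocode.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical result type: what pg_standby should do once triggered. */
typedef struct
{
    int  exit_code;       /* 0 or 1, mirroring exit(0)/exit(1) in the pseudocode */
    bool delete_trigger;  /* true if the trigger file should be removed first */
} TriggerAction;

/* Hypothetical helper: true if something exists at the given path. */
bool
file_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/*
 * Decision taken after the trigger file has been found.
 *
 * In smart mode the trigger file is kept for as long as the requested
 * WAL file is still available, so a later invocation can see it again.
 * In every other case the trigger file is deleted and recovery ends.
 */
TriggerAction
decide_on_trigger(bool smart_mode, bool next_wal_exists)
{
    TriggerAction action;

    if (smart_mode && next_wal_exists)
    {
        action.exit_code = 0;
        action.delete_trigger = false;
    }
    else
    {
        action.exit_code = 1;
        action.delete_trigger = true;
    }
    return action;
}
```

A caller would pass file_exists() on the path of nextWALfile as
next_wal_exists, and unlink the trigger file before exiting whenever
delete_trigger is set. Keeping the decision separate from the file
checks would also make it easy to extend the existence test to pg_xlog.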