On Mon, Jul 16, 2012 at 10:58 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> BTW, one little detail that I don't think has been mentioned in this
> thread before: Even though the master currently knows whether a standby
> is connected or not, and you could write a patch to act based on that,
> there are other failure scenarios where you would still not be happy.
> For example, imagine that the standby has a disk failure. It stays
> connected to the master, but fails to fsync anything to disk. Would you
> want to fall back to degraded mode and just do asynchronous replication
> in that case? How do you decide when to do that in the master? Or what
> if the standby keeps making progress, but becomes incredibly slow for
> some reason, like disk failure in a RAID array? I'd rather outsource
> all that logic to external monitoring software - software that you
> should be running anyway.

I would like to express some support for the non-edge nature of this
case. Outside of simple loss of availability of a server, losing access
to a block device is probably the second-most-common cause of loss of
availability for me. It's especially insidious because simple "select 1"
checks may continue to return for quite some time, so instead we rely on
parsing Linux diskstats to see whether write progress hits zero for "a
while" (a sketch of such a check appears at the end of this mail). In
cases like these, the overhead of a shell command for rapid back-and-forth
with a decision-making process can be prohibitive -- it is already a
pretty big waster of time for me in WAL archiving and dearchiving, where
process startup, SSL negotiation, and the lack of parallelization can be
quite slow. Whatever mechanism is chosen here may exhibit the same
problem.

I would like to plead that whatever is done would be most useful if it
were controllable entirely via non-GUC means -- arguably that is already
the case, since one can write a replication protocol client to do the
job by faking the standby status update messages (also sketched at the
end of this mail), but perhaps there is a more lucid way if some
accommodation is made.

In particular, the awkwardness of using pg_receivexlog[0] or a similar
tool to replace archive_command is something that I feel should be
addressed eventually, so as not to be a second-class citizen. Although
that is already being worked on[1], the archive command has no
backpressure either, other than running out of disk.

The case of restore_command is even more sore: remastering or
archive-recovery via streaming protocol actions is kind of a pain at the
moment. I haven't thoroughly explored this yet and I don't think it is
documented, but it can be hard for something that is dearchiving from
WAL segments stored somewhere to find exactly the right record to start
replaying at: the WAL record format is not stable, and it need not be,
if the server helps by ignoring records that predate what it requires,
or can inform the process feeding it WAL that it got things wrong.
Maybe that is the case, but it is not documented.

I also don't think there are any guarantees about the maximum size or
alignment of WAL shipped by the streaming protocol in XLogData messages,
and that's too bad. Also, the endianness of the WAL position fields in
XLogData is host-byte-order-dependent, which sucks if you are forwarding
WAL around but need to know what range is contained in a message (a
parsing sketch follows as well). In practice many people can say "all I
have is little-endian," but it is somewhat unpleasant and not
necessarily the case.

Correct me if I'm wrong, I'd be glad for it.

[0]: see the notes section,
http://www.postgresql.org/docs/devel/static/app-pgreceivexlog.html

[1]: http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php

--
fdr
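
To make the diskstats check mentioned above concrete, here is a minimal
sketch; the device name, poll interval, and stall threshold are
assumptions for illustration, not recommendations. It only reports when
the total sectors written for a device stop advancing for a while.

    # Minimal sketch: report when a device stops making write progress,
    # judged from /proc/diskstats.  DEVICE, the poll interval, and
    # STALL_SECONDS are hypothetical values, not recommendations.
    import time

    DEVICE = "sda"
    STALL_SECONDS = 60

    def sectors_written(device):
        # Field 10 (1-indexed) of a /proc/diskstats line is the total
        # number of sectors written for that device.
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == device:
                    return int(fields[9])
        raise LookupError("device %s not found" % device)

    last = sectors_written(DEVICE)
    stalled_since = None
    while True:
        time.sleep(5)
        cur = sectors_written(DEVICE)
        if cur != last:
            last, stalled_since = cur, None
        elif stalled_since is None:
            stalled_since = time.time()
        elif time.time() - stalled_since >= STALL_SECONDS:
            print("no write progress on %s for %ds"
                  % (DEVICE, STALL_SECONDS))
            stalled_since = time.time()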
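
As for the replication-protocol-client idea above: a standby status
update is just a small binary message sent as CopyData on a
START_REPLICATION connection. A minimal sketch of composing one follows;
the transport (libpq, or a raw socket speaking the frontend protocol) is
omitted, and the LSN values passed in are entirely up to the caller.

    # Minimal sketch of composing a standby status update ('r') message,
    # as sent inside CopyData on a START_REPLICATION connection.
    import struct
    import time

    PG_EPOCH_DELTA = 946684800  # seconds from 1970-01-01 to 2000-01-01

    def standby_status_update(write_lsn, flush_lsn, apply_lsn,
                              reply_requested=0):
        # Byte1('r'), Int64 write, Int64 flush, Int64 apply,
        # Int64 timestamp (microseconds since 2000-01-01),
        # Byte1 reply-requested.
        now_us = int((time.time() - PG_EPOCH_DELTA) * 1000000)
        return struct.pack("!cqqqqB", b"r", write_lsn, flush_lsn,
                           apply_lsn, now_us, reply_requested)

Acknowledging positions the client has not actually flushed anywhere is
exactly the "faking" referred to above.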
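
Finally, for the XLogData framing: a sketch of pulling the position
fields out of a received 'w' message, written assuming big-endian
(network byte order) fields. That assumption is precisely what the
byte-order complaint above is about, so verify it against the server
you actually run.

    # Minimal sketch of pulling apart an XLogData ('w') message received
    # as a CopyData payload.  Assumes big-endian position fields; as
    # noted above, that assumption may not hold everywhere.
    import struct

    def parse_xlogdata(payload):
        if payload[0:1] != b"w":
            raise ValueError("not an XLogData message")
        # Int64 WAL start, Int64 current end of WAL on the server,
        # Int64 send timestamp, then the WAL bytes themselves.
        wal_start, wal_end, send_time = struct.unpack("!qqq",
                                                      payload[1:25])
        return wal_start, wal_end, send_time, payload[25:]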