On 5/23/13 12:51 PM, Pavan Deolasee wrote:



On Thu, May 23, 2013 at 11:10 PM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:

    On 23.05.2013 07:55, Robert Haas wrote:

        On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
        <hlinnakan...@vmware.com> wrote:

            1. Scan the WAL log of the old cluster, starting from the point
            where the new cluster's timeline history forked off from the old
            cluster. For each WAL record, make a note of the data blocks that
            are touched. This yields a list of all the data blocks that were
            changed in the old cluster, after the new cluster forked off.


        Suppose that a transaction is open and has written tuples at the point
        where WAL forks.  After WAL forks, the transaction commits.  Then, it
        hints some of the tuples that it wrote.  There is no record in WAL
        that those blocks are changed, but failing to revert them leads to
        data corruption.


    Bummer, you're right. Hmm, if you have checksums enabled, however, we'll
    WAL-log a full page every time a page is dirtied for setting a hint bit,
    which fixes the problem. So, there's a caveat with pg_rewind: you must
    have checksums enabled.


I was quite impressed with the idea, but hint bits are indeed a problem. I
realised the same issue also applies to the other idea that Fujii-san and
others have suggested, of waiting for dirty buffers to be written until the
WAL is received at the standby. But since that idea would need to be
implemented in core anyway, we could teach SetHintBits() to return false
unless the corresponding commit WAL records have been written to the standby
first.

Would it be useful to turn this problem around? Heikki's proposal is based on being able 
to track (without fail) all blocks that have been modified; could we instead track blocks 
that we know for certain have NOT been modified? The difference there is that we can be 
more conservative in stating "we know this block is the same"; worst case we 
just do some extra copying.

<thinking out loud...>
One possibility would be to use file timestamps. For files that are past a 
certain age on both master and slave, if we force the timestamp on the slave to 
match the timestamp from the master, rsync will be able to safely ignore that 
file. I realize that's not as good as block-level detection, but it's probably 
a tremendous improvement over what we have today. The critical thing in this 
case would be to *guarantee* that the timestamps did not match on modified 
files.

Of course, screwing around with FS timestamps in this manner is pretty grotty, 
at least on a live system. Perhaps there's some way to track that info 
separately and then use it to change file timestamps before running rsync. Or 
if we are able to define a list of files that we think may have changed, we 
just feed that list to rsync and let it do the heavy lifting.
--
Jim C. Nasby, Data Architect                       j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers