On Tue, Sep 20, 2016 at 10:27 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Sep 20, 2016 at 10:24 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > On Thu, Sep 15, 2016 at 11:42 PM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >> Okay, Thanks for pointing out the same.  I have fixed it.  Apart from
> >> that, I have changed _hash_alloc_buckets() to initialize the page
> >> instead of making it completely Zero because of problems discussed in
> >> another related thread [1].  I have also updated README.
> >>
> >
> > with v7 of the concurrent hash patch and v4 of the write ahead log patch
> > and the latest relcache patch (I don't know how important that is to
> > reproducing this, I suspect it is not), I once got this error:
> >
> >
> > 38422  00000 2016-09-19 16:25:50.055 PDT:LOG:  database system was interrupted; last known up at 2016-09-19 16:25:49 PDT
> > 38422  00000 2016-09-19 16:25:50.057 PDT:LOG:  database system was not properly shut down; automatic recovery in progress
> > 38422  00000 2016-09-19 16:25:50.057 PDT:LOG:  redo starts at 3F/2200DE90
> > 38422  01000 2016-09-19 16:25:50.061 PDT:WARNING:  page verification failed, calculated checksum 65067 but expected 21260
> > 38422  01000 2016-09-19 16:25:50.061 PDT:CONTEXT:  xlog redo at 3F/22053B50 for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T
> > 38422  XX001 2016-09-19 16:25:50.071 PDT:FATAL:  invalid page in block 9 of relation base/16384/17334
> > 38422  XX001 2016-09-19 16:25:50.071 PDT:CONTEXT:  xlog redo at 3F/22053B50 for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T
> >
> >
> > The original page with the invalid checksum is:
> >
>
> I think this is an example of the torn page problem, which seems to be
> happening because of the below code in your test.
>
> ! if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress()) {
> !     nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);
> !     ereport(FATAL,
> !             (errcode(ERRCODE_DISK_FULL),
> !              errmsg("could not write block %u of relation %s: wrote only %d of %d bytes",
> !                     blocknum,
> !                     relpath(reln->smgr_rnode, forknum),
> !                     nbytes, BLCKSZ),
> !              errhint("JJ is screwing with the database.")));
> ! } else {
> !     nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
> ! }
>
> If you are running the above test by disabling JJ_torn_page, then it
> is a different matter and we need to investigate it, but I assume you
> are running by enabling it.
>
> I think this could happen if the actual change in page is in 2/3 part
> of page which you are not writing in above code.  The checksum in page
> header which is written as part of partial page write (1/3 part of
> page) would have considered the actual change you have made whereas
> after restart when it again read the page to apply redo, the checksum
> calculation won't include the change being made in 2/3 part.
>

Correct.  But any torn page write must be covered by the restoration of a
full page image during replay, shouldn't it?  And that restoration should
happen blindly, without first reading in the old page and verifying the
checksum.  Failure to restore the page from a FPI would be a bug.  (That
was the purpose for which I wrote this testing harness in the first place,
to verify that the restoration of FPI happens correctly; although most of
the bugs it happens to uncover have been unrelated to that.)

>
> Today, Ashutosh has shared the logs of his test run where he has shown
> similar problem for HEAP page.  I think this could happen though
> rarely for any page with the above kind of tests.
>

I think Ashutosh's examples are of warnings, not errors.  I think the
warnings occur when replay needs to read in the block (for reasons I
don't understand yet) but then doesn't care if it passes the checksum or
not because it will just be blown away by the replay anyway.

> Does this explanation explain the reason for the problem you are seeing?
>

If it can't survive artificial torn page writes, then it probably can't
survive real ones either.  So I am pretty sure it is a bug of some sort.
Perhaps the bug is that it is generating an ERROR when it should just be a
WARNING?

> >
> > If I ignore the checksum failure and re-start the system, the page gets
> > restored to be a bitmap page.
> >
>
> Okay, but have you ensured in some way that redo is applied to bitmap page?
>

I haven't done that yet.  I can't start the system without destroying the
evidence, and I haven't figured out yet how to import a specific block from
a shut-down server into a bytea of a running server, in order to inspect it
using pageinspect.

> Today, while thinking on this problem, I realized that currently in
> patch we are using REGBUF_NO_IMAGE for bitmap page for one of the
> problem reported by you [1].  That change will fix the problem
> reported by you, but it will expose bitmap pages for torn-page
> hazards.  I think the right fix there is to make pd_lower equal to
> pd_upper for bitmap page, so that full page writes don't exclude the
> data in bitmap page.
>

I'm afraid that is over my head.  I can study it until it makes sense, but
it will take me a while.

Cheers,

Jeff

> [1] - https://www.postgresql.org/message-id/CAA4eK1KJOfVvFUmi6dcX9Y2-0PFHkomDzGuyoC%3DaD3Qj9WPpFA%40mail.gmail.com
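
For anyone following along, here is a minimal standalone sketch of the torn-write scenario discussed above. It is not PostgreSQL code: toy_checksum() is a made-up Fletcher-style sum standing in for the real page checksum, CKSUM_OFF is a hypothetical header offset, and torn_write() and restore_fpi() are illustrative helpers. It only shows why a page whose first third was written fails verification when read back, and why a blind full-page-image restore does not care.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ    8192   /* PostgreSQL's default block size */
#define CKSUM_OFF 8      /* hypothetical offset of a 16-bit checksum in the header */

/* Toy checksum (NOT PostgreSQL's algorithm): Fletcher-style sum over the
 * whole page, skipping the two bytes that store the checksum itself. */
static uint16_t toy_checksum(const uint8_t *page)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < BLCKSZ; i++) {
        if (i == CKSUM_OFF || i == CKSUM_OFF + 1)
            continue;
        a = (a + page[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (uint16_t) (a ^ b);
}

/* Stamp the page header with the checksum of the current contents. */
static void set_checksum(uint8_t *page)
{
    uint16_t c = toy_checksum(page);

    memcpy(page + CKSUM_OFF, &c, sizeof c);
}

/* Recompute the checksum and compare it with the one stored in the header. */
static bool page_is_valid(const uint8_t *page)
{
    uint16_t stored;

    memcpy(&stored, page + CKSUM_OFF, sizeof stored);
    return stored == toy_checksum(page);
}

/* Mimic the test harness: only the first BLCKSZ/3 bytes reach "disk". */
static void torn_write(uint8_t *disk, const uint8_t *buffer)
{
    memcpy(disk, buffer, BLCKSZ / 3);
}

/* Replay of a full page image: overwrite blindly, without reading or
 * verifying whatever is currently on disk. */
static void restore_fpi(uint8_t *disk, const uint8_t *fpi)
{
    memcpy(disk, fpi, BLCKSZ);
}

/* Returns true if the torn page fails verification and the blind
 * full-page-image restore makes it valid again. */
static bool torn_page_demo(void)
{
    static uint8_t disk[BLCKSZ], buffer[BLCKSZ];
    bool torn_invalid;

    memset(disk, 0, BLCKSZ);
    set_checksum(disk);              /* old page on disk, valid */
    assert(page_is_valid(disk));

    memcpy(buffer, disk, BLCKSZ);
    buffer[BLCKSZ - 100] = 0xAB;     /* the change lands in the last third */
    set_checksum(buffer);            /* header checksum now covers the change */

    torn_write(disk, buffer);        /* new header, old tail */
    torn_invalid = !page_is_valid(disk);

    restore_fpi(disk, buffer);       /* buffer stands in for the logged image */
    return torn_invalid && page_is_valid(disk);
}
```

Calling torn_page_demo() from a small main() should return true under these assumptions: the torn page fails the check because the header checksum was computed over a change that never reached disk, and the restore fixes it without ever looking at the old contents.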