Re:Re: some question about _bt_getbuf

Imai, Yoshikazu Tue, 26 Jun 2018 01:10:26 -0700

Hi,

> At 2018-05-15 01:49:41, "Tom Lane" <[email protected]> wrote:
> >=?GBK?B?19S8ug==?= <[email protected]> writes:
> >> i run test using pg10.0 on my machine, and the program crashed on 
> >> _bt_getbuf.
> >> And i found the following code:
> >> the routine _bt_page_recyclable say maybe the page is all-zero page,
> >> if so then the code run (BTPageOpaque) PageGetSpecialPointer(page);
> >> it will be failed because it access invalid memory.
> >> I don't know whether it is so. Look forward t your reply, thanks.
> >
> >This code's clearly broken, as was discussed before:
> >
> >https://www.postgresql.org/message-id/flat/2628.1474272158%40localhost
> >
> >but nothing was done about it, perhaps partly because we didn't have a
> >reproducible test case.  Do you have one?
> >
> >                     regards, tom lane
> 
> Unfortunately, I don't have a complete test case.


I recently looked at this code and the previous discussion, and tried to 
trigger a crash.
I describe how to trigger it at the end of this mail, but I don't know whether 
it is useful: I had to use gdb to get there, so it is not really a 
reproducible test case.

As discussed before, the crash happens when an all-zeroes page in an index is 
recycled.
According to the code comment below, an all-zeroes page is left behind when a 
backend goes down during a page split, after it has extended the index 
relation to get a new page but before it has made a WAL entry for that page.

        bool
        _bt_page_recyclable(Page page)
        {
                BTPageOpaque opaque;

                /*
                 * It's possible to find an all-zeroes page in an index --- for example, a
                 * backend might successfully extend the relation one page and then crash
                 * before it is able to make a WAL entry for adding the page. If we find a
                 * zeroed page then reclaim it.
                 */
                if (PageIsNew(page))
                        return true;
                ...
        }

After the backend goes down at that point, the new page is never initialized: 
there is no WAL entry for it, so crash recovery does nothing with it. 
The page stays all-zeroes and becomes recyclable when vacuum runs.
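
To make the failure mode concrete, here is a minimal sketch in C. It is a 
simplified model, not the actual PostgreSQL definitions: ModelPageHeader, 
MODEL_BLCKSZ and the model_* functions are hypothetical names invented for 
illustration. It assumes that a page counts as new when its "upper" offset is 
zero, and that the special space is only meaningful when its offset lies 
between the end of the header and the end of the block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BLCKSZ 8192       /* assumed block size for this model */

/* Hypothetical, simplified stand-in for a page header. */
typedef struct ModelPageHeader
{
    uint16_t lower;             /* offset to start of free space */
    uint16_t upper;             /* offset to end of free space; 0 if never initialized */
    uint16_t special;           /* offset to start of the special space */
} ModelPageHeader;

/* Models the idea behind PageIsNew(): a zero "upper" means uninitialized. */
static int
model_page_is_new(const char *page)
{
    ModelPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return hdr.upper == 0;
}

/* Is the special-space offset inside the block and past the header? */
static int
model_special_is_valid(const char *page)
{
    ModelPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return hdr.special >= sizeof(ModelPageHeader) &&
           hdr.special <= MODEL_BLCKSZ;
}

/* Initialize a page, reserving special_size bytes at the end of the block. */
static void
model_page_init(char *page, uint16_t special_size)
{
    ModelPageHeader hdr;

    assert(special_size <= MODEL_BLCKSZ - sizeof(ModelPageHeader));
    memset(page, 0, MODEL_BLCKSZ);
    hdr.lower = (uint16_t) sizeof(ModelPageHeader);
    hdr.upper = (uint16_t) (MODEL_BLCKSZ - special_size);
    hdr.special = (uint16_t) (MODEL_BLCKSZ - special_size);
    memcpy(page, &hdr, sizeof(hdr));
}

/* Guard: only an initialized page has special-space contents worth reading. */
static int
model_can_read_opaque(const char *page)
{
    return !model_page_is_new(page) && model_special_is_valid(page);
}
```

On a zeroed buffer, model_page_is_new() returns 1 and model_can_read_opaque() 
returns 0; after model_page_init(page, 16) both results flip. In this model, a 
caller that learns a page is recyclable because it is new has no valid special 
space to read from it.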


With those conditions in mind, I reproduced a crash as follows.
I tested on master (11beta1), compiled with --enable-cassert and 
--enable-debug, with hot standby enabled.

<<how to create a recyclable new page>>
(psql) CREATE TABLE mytab (id int, val int);
(psql) CREATE INDEX idx_val ON mytab(val);
(gdb) b nbtinsert.c:1467   (at XLogBeginInsert(); in _bt_split())
(gdb) c
while (breakpoint is not hit) {
    (psql) INSERT INTO mytab SELECT t, t FROM generate_series(1, 3000) t;
}
(bash) kill -s SIGKILL (backend pid)
(psql) VACUUM;

<<how to trigger the crash>>
while (crash has not occurred) {
    (psql) INSERT INTO mytab SELECT t, t FROM generate_series(1, 3000) t;
}


Yoshikazu Imai
