On Wed, Sep 5, 2018 at 4:59 AM, R, Siva <sivas...@amazon.com> wrote:
> Hi,
>
> We recently encountered an issue where the opaque data flags on a gin data
> leaf page were corrupted while replaying a gin insert WAL record. Upon
> further examination of the redo code, we found a bug in the
> ginRedoRecompress code, which extracts the WAL information and updates
> the page.
>
> Specifically, when a new segment is inserted in the middle of a page, a
> memmove operation is performed [1] at the current point in the page to make
> room for the new segment. If this segment insertion is followed by delete
> segment actions that are yet to be processed and the total data size is very
> close to GinDataPageMaxDataSize, then we may move the data portion beyond
> the boundary, causing the opaque data to be corrupted.
>
> One way of solving this problem is to perform the replay work in a scratch
> space and perform a sanity check on the total size of the data portion
> before copying it back to the actual page. While it involves additional
> memory allocation and memcpy operations, it is safer and similar to the
> 'do' code path, where we make sure to copy all segments past the first
> modified segment before placing them back on the page [2].
>
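
To make the scenario above concrete, here is a minimal sketch of the
scratch-space idea on a toy page. It is not the PostgreSQL code: ToyPage,
ToyAction, toy_replay and PAGE_DATA_MAX are hypothetical stand-ins (the last
one for GinDataPageMaxDataSize), and real segments are modelled as plain byte
ranges. It only shows that an insert followed by a pending delete can exceed
the limit transiently while the final size still fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_DATA_MAX 16        /* hypothetical stand-in for GinDataPageMaxDataSize */

typedef struct {
    unsigned char  data[PAGE_DATA_MAX]; /* segment area */
    unsigned short flags;               /* stand-in for the opaque data */
    size_t         used;                /* bytes of data[] in use */
} ToyPage;

typedef enum { ACT_INSERT, ACT_DELETE } ToyActionKind;

typedef struct {
    ToyActionKind        kind;
    size_t               offset; /* offset in the current state of the data */
    size_t               len;    /* bytes to insert or delete */
    const unsigned char *bytes;  /* payload, used by ACT_INSERT only */
} ToyAction;

/*
 * Replay all actions in a scratch buffer that may grow past the page limit,
 * then copy the result back only if the final size fits. On failure the
 * page is left untouched.
 */
static bool
toy_replay(ToyPage *page, const ToyAction *acts, size_t nacts)
{
    unsigned char scratch[2 * PAGE_DATA_MAX];
    size_t        used = page->used;

    assert(page->used <= PAGE_DATA_MAX);
    memcpy(scratch, page->data, used);

    for (size_t i = 0; i < nacts; i++)
    {
        const ToyAction *a = &acts[i];

        if (a->offset > used)
            return false;

        if (a->kind == ACT_INSERT)
        {
            if (used + a->len > sizeof(scratch))
                return false;
            /* open a gap; the tail may pass PAGE_DATA_MAX here, safely */
            memmove(scratch + a->offset + a->len, scratch + a->offset,
                    used - a->offset);
            memcpy(scratch + a->offset, a->bytes, a->len);
            used += a->len;
        }
        else
        {
            if (a->len > used - a->offset)
                return false;
            /* close the gap left by the deleted range */
            memmove(scratch + a->offset, scratch + a->offset + a->len,
                    used - a->offset - a->len);
            used -= a->len;
        }
    }

    /* sanity check on the final size before touching the real page */
    if (used > PAGE_DATA_MAX)
        return false;

    memcpy(page->data, scratch, used);
    page->used = used;
    return true;
}
```

With 14 of 16 bytes in use, a 4-byte insert followed by a 4-byte delete peaks
at 18 bytes in the scratch buffer and ends at 14, so the copy back succeeds
and flags is never written. Doing the same memmove directly on data[] would
have written past the end of the array.
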

Hmm, could you share the sequence and kinds of WAL records that were applied
to the broken page? I suspect the segment list contains GIN_SEGMENT_REPLACE
before GIN_SEGMENT_INSERT.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center