Also, the line of code added by the bug fix is the same as existing code in
the same function, _bt_first(), at lines 898, 1096, 1132, and 1367. The calls
to _bt_parallel_readpage() (line 903) and _bt_steppage() (line 1416) also
ultimately call _bt_parallel_done(). So the bug seems to be a pretty simple
oversight: in 6 out of 7 cases in _bt_first(), we call _bt_parallel_done()
before returning "false", but in the 7th case (the one fixed here) we do not.
The fix is to make case #7 the same as the other 6.
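
To illustrate the pattern, here is a minimal toy sketch. It is not PostgreSQL
code: ToyParallelScan, toy_parallel_done(), toy_first(), and the outcome names
are all hypothetical stand-ins. It only models the idea that every early
"return false" path should report the scan as done first, and that one path
was overlooked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the shared parallel-scan state (not PostgreSQL's struct). */
typedef struct {
    bool done;  /* set once a worker has reported that the scan is finished */
} ToyParallelScan;

/* Toy stand-in for _bt_parallel_done(): tell other workers we are finished. */
static void toy_parallel_done(ToyParallelScan *ps)
{
    assert(ps != NULL);
    ps->done = true;
}

/* Hypothetical outcomes: two early exits and one success path. */
typedef enum {
    TOY_EXIT_ALREADY_HANDLED,  /* models the early exits that were correct */
    TOY_EXIT_OVERLOOKED,       /* models the one early exit that was missed */
    TOY_FOUND_TUPLE            /* models the normal "return true" path */
} ToyOutcome;

/* Toy stand-in for _bt_first(); with_fix selects old or patched behavior. */
static bool toy_first(ToyParallelScan *ps, ToyOutcome outcome, bool with_fix)
{
    switch (outcome) {
    case TOY_EXIT_ALREADY_HANDLED:
        toy_parallel_done(ps);      /* notify, then give up */
        return false;
    case TOY_EXIT_OVERLOOKED:
        if (with_fix)
            toy_parallel_done(ps);  /* the one added line */
        return false;               /* without the fix: shared state never updated */
    case TOY_FOUND_TUPLE:
        return true;
    }
    return false;
}
```

Calling toy_first(&ps, TOY_EXIT_OVERLOOKED, false) returns false but leaves
ps.done unset, while the same call with with_fix set to true returns false and
sets ps.done, matching the other early exits.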

James

On 9/9/20, 7:11 AM, "Jameson, Hunter 'James'" <hunj...@amazon.com> wrote:

    Hi, I spent some time trying to create a repro (other than testing it on
    the production instance where we encountered the bug), but was unable to
    create one within a reasonable time.

    The tricky part is that the symptoms only show up at run time. First, you
    need to satisfy conditions (1), (2), and (3) without the query optimizer
    optimizing them away. Second, you need a query that runs long enough for
    one or more of the parallel workers' state machines to get confused. (This
    wasn't a problem on the production instance where we encountered the bug
    and where I tested the fix.)

    Third, passing InvalidBlockNumber to ReadBuffer() generally just appends a
    new block to the relation, so the bug doesn't even result in an error
    condition on an RW instance. (The production instance was RO.) So the bug,
    although very small, is annoying!
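
A toy sketch of why the symptom differs between RW and RO instances. This is
not PostgreSQL's buffer manager: ToyRelation, toy_read_buffer(), and
TOY_INVALID_BLOCK are hypothetical stand-ins. The sketch assumes that a
read-only instance rejects the attempt to extend the relation, which the
paragraph above implies but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for InvalidBlockNumber, which doubles as "give me a new block". */
#define TOY_INVALID_BLOCK UINT32_MAX

/* Toy relation: just a block count and a read-only flag. */
typedef struct {
    uint32_t nblocks;
    bool     read_only;
} ToyRelation;

typedef enum { TOY_READ_OK, TOY_EXTENDED, TOY_ERROR } ToyReadResult;

/* Toy stand-in for ReadBuffer(): the invalid block number means "append". */
static ToyReadResult toy_read_buffer(ToyRelation *rel, uint32_t blkno)
{
    assert(rel != NULL);
    if (blkno == TOY_INVALID_BLOCK) {
        if (rel->read_only)
            return TOY_ERROR;   /* assumption: RO instance cannot extend */
        rel->nblocks++;         /* RW instance silently grows the relation */
        return TOY_EXTENDED;
    }
    /* Ordinary read: the block must already exist. */
    return blkno < rel->nblocks ? TOY_READ_OK : TOY_ERROR;
}
```

On a read-write toy relation the stray TOY_INVALID_BLOCK request succeeds and
grows nblocks by one, so nothing flags the bug; on a read-only one the same
request fails.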

    James

    On 9/9/20, 6:14 AM, "Amit Kapila" <amit.kapil...@gmail.com> wrote:

        On Tue, Sep 8, 2020 at 11:55 PM Jameson, Hunter 'James'
        <hunj...@amazon.com> wrote:
        >
        > Hi, I ran across a small (but annoying) bug in initializing parallel
        > BTree scans, which causes the parallel-scan state machine to get
        > confused.
        >
        > To reproduce, you need a query that:
        >
        > 1. Executes parallel BTree index scan;
        > 2. Has an IN-list of size > 1;
        > 3. Has an additional index filter that makes it impossible to satisfy
        >    the first IN-list condition.
        >
        > (We encountered such a query, and therefore the bug, on a production
        > instance.)
        >
        >

        I think I can understand what you are pointing out here but it would
        be great if you can have a reproducible test case because that will
        make it apparent and we might want to include that in the regression
        tests if possible.

        --
        With Regards,
        Amit Kapila.

