On Mon, Sep 17, 2018 at 10:38 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> While performing some benchmarks on REL_11_STABLE (at 444455c2d9), I've
> repeatedly hit an apparent infinite loop on TPC-H query 4. I don't know
> what exactly are the triggering conditions, but the symptoms are these:
>
> 1) A parallel worker process is consuming 100% CPU, with perf
> reporting profile like this:
>
>     34.66%  postgres   [.] get_segment_by_index
>     29.44%  postgres   [.] get_best_segment
>     29.22%  postgres   [.] unlink_segment.isra.2
>      6.66%  postgres   [.] fls
>      0.02%  [unknown]  [k] 0xffffffffb10014b0
>
> So all the time seems to be spent within get_best_segment.
>
> 2) The backtrace looks like this (full backtrace attached):
>
> #0  0x0000561a748c4f89 in get_segment_by_index
> #1  0x0000561a748c5653 in get_best_segment
> #2  0x0000561a748c67a9 in dsa_allocate_extended
> #3  0x0000561a7466ddb4 in ExecParallelHashTupleAlloc
> #4  0x0000561a7466e00a in ExecParallelHashTableInsertCurrentBatch
> #5  0x0000561a7466fe00 in ExecParallelHashJoinNewBatch
> #6  ExecHashJoinImpl
> #7  ExecParallelHashJoin
> #8  ExecProcNode
> ...
>
> 3) The infinite loop seems to be pretty obvious - after setting
> breakpoint on get_segment_by_index we get this:
>
> Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
> (gdb) c
> Continuing.
>
> Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
> (gdb) c
> Continuing.
>
> Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
> (gdb) c
> Continuing.
>
> That is, we call the function with the same index over and over.
>
> Why is that? Well:
>
> (gdb) print *area->segment_maps[3].header
> $1 = {magic = 216163851, usable_pages = 512, size = 2105344, prev = 3,
> next = 3, bin = 0, freed = false}
>
> So, we loop forever.
>
> I don't know what exactly are the triggering conditions here. I've only
> ever observed the issue on TPC-H with scale 16GB, partitioned lineitem
> table and work_mem set to 8MB and query #4. And it seems I can reproduce
> it pretty reliably.

Urgh. Thanks Tomas. I will investigate.

--
Thomas Munro
http://www.enterprisedb.com
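
To illustrate the symptom in the gdb output above (a segment header whose prev and next both point back at its own index, 3), here is a minimal sketch in C. It is not the PostgreSQL DSA code: the toy_segment type, the TOY_NONE marker and the toy_bin_terminates() helper are hypothetical names invented for this example, and only the list links are modelled. The point is that an unbounded walk along next never ends once a segment links to itself, whereas a walk capped at the number of segments can report the corruption instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "no segment" marker; an assumption of this sketch. */
#define TOY_NONE ((size_t) -1)

/* Toy stand-in for a segment header: only the list links are modelled. */
typedef struct
{
    size_t prev;
    size_t next;
} toy_segment;

/*
 * Follow 'next' links starting at 'head'.
 *
 * Returns true if the walk reaches TOY_NONE.  Returns false if a link is
 * out of range, or if the walk takes more than nsegs steps, which can only
 * happen when the links form a cycle (for example next == own index).
 */
static bool
toy_bin_terminates(const toy_segment *segs, size_t nsegs, size_t head)
{
    size_t steps = 0;
    size_t cur = head;

    assert(segs != NULL);

    while (cur != TOY_NONE)
    {
        if (cur >= nsegs || steps++ >= nsegs)
            return false;
        cur = segs[cur].next;
    }
    return true;
}
```

With four toy segments where 0 -> 1 -> 2 ends the list and segment 3 has prev = next = 3, a walk starting at 0 terminates, while a walk starting at 3 is reported as non-terminating after at most nsegs steps. A bounded walk of this kind is only a diagnostic aid; it does not explain how the links became self-referential in the first place.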