On Sat, Mar 14, 2020 at 12:36 PM James Coleman <jtc...@gmail.com> wrote: > > On Sat, Mar 14, 2020 at 12:24 PM James Coleman <jtc...@gmail.com> wrote: > > > > On Sat, Mar 14, 2020 at 12:07 PM James Coleman <jtc...@gmail.com> wrote: > > > > > > On Fri, Mar 13, 2020 at 8:23 PM James Coleman <jtc...@gmail.com> wrote: > > > > > > > > On Friday, March 13, 2020, Tomas Vondra <tomas.von...@2ndquadrant.com> > > > > wrote: > > > >> > > > >> On Fri, Mar 13, 2020 at 04:31:16PM -0400, James Coleman wrote: > > > >>> > > > >>> On Fri, Mar 13, 2020 at 2:23 PM James Coleman <jtc...@gmail.com> > > > >>> wrote: > > > >>>> > > > >>>> > > > >>>> On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra > > > >>>> <tomas.von...@2ndquadrant.com> wrote: > > > >>>> > 3) Most of the execution plans look reasonable, except that some > > > >>>> > of the > > > >>>> > plans look like this: > > > >>>> > > > > >>>> > > > > >>>> > QUERY PLAN > > > >>>> > --------------------------------------------------------- > > > >>>> > Limit > > > >>>> > -> GroupAggregate > > > >>>> > Group Key: t.a, t.b, t.c, t.d > > > >>>> > -> Incremental Sort > > > >>>> > Sort Key: t.a, t.b, t.c, t.d > > > >>>> > Presorted Key: t.a, t.b, t.c > > > >>>> > -> Incremental Sort > > > >>>> > Sort Key: t.a, t.b, t.c > > > >>>> > Presorted Key: t.a, t.b > > > >>>> > -> Index Scan using t_a_b_idx on t > > > >>>> > (10 rows) > > > >>>> > > > > >>>> > i.e. there are two incremental sorts on top of each other, with > > > >>>> > different prefixes. But this this is not a new issue - it happens > > > >>>> > with > > > >>>> > queries like this: > > > >>>> > > > > >>>> > SELECT a, b, c, d, count(*) FROM ( > > > >>>> > SELECT * FROM t ORDER BY a, b, c > > > >>>> > ) foo GROUP BY a, b, c, d limit 1000; > > > >>>> > > > > >>>> > i.e. there's a subquery with a subset of pathkeys. Without > > > >>>> > incremental > > > >>>> > sort the plan looks like this: > > > >>>> > > > > >>>> > QUERY PLAN > > > >>>> > --------------------------------------------- > > > >>>> > Limit > > > >>>> > -> GroupAggregate > > > >>>> > Group Key: t.a, t.b, t.c, t.d > > > >>>> > -> Sort > > > >>>> > Sort Key: t.a, t.b, t.c, t.d > > > >>>> > -> Sort > > > >>>> > Sort Key: t.a, t.b, t.c > > > >>>> > -> Seq Scan on t > > > >>>> > (8 rows) > > > >>>> > > > > >>>> > so essentially the same plan shape. What bugs me though is that > > > >>>> > there > > > >>>> > seems to be some sort of memory leak, so that this query consumes > > > >>>> > gigabytes os RAM before it gets killed by OOM. But the memory > > > >>>> > seems not > > > >>>> > to be allocated in any memory context (at least MemoryContextStats > > > >>>> > don't > > > >>>> > show anything like that), so I'm not sure what's going on. > > > >>>> > > > > >>>> > Reproducing it is fairly simple: > > > >>>> > > > > >>>> > CREATE TABLE t (a bigint, b bigint, c bigint, d bigint); > > > >>>> > INSERT INTO t SELECT > > > >>>> > 1000*random(), 1000*random(), 1000*random(), 1000*random() > > > >>>> > FROM generate_series(1,10000000) s(i); > > > >>>> > CREATE INDEX idx ON t(a,b); > > > >>>> > ANALYZE t; > > > >>>> > > > > >>>> > EXPLAIN ANALYZE SELECT a, b, c, d, count(*) > > > >>>> > FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d > > > >>>> > LIMIT 100; > > > >>>> > > > >>>> While trying to reproduce this, instead of lots of memory usage, I > > > >>>> got > > > >>>> the attached assertion failure instead. > > > >>> > > > >>> > > > >>> And, without the EXPLAIN ANALYZE was able to get this one, which will > > > >>> probably be a lot more helpful. > > > >>> > > > >> > > > >> Hmmm, I'll try reproducing it, but can you investigate the values in > > > >> the > > > >> Assert? I mean, it fails on this: > > > >> > > > >> Assert(total_allocated == context->mem_allocated); > > > >> > > > >> so can you get a core or attach to the process using gdb, and see > > > >> what's > > > >> the expected / total value? > > > > > > I've reproduced this on multiple machines (though all are Ubuntu or > > > Debian derivatives...I don't think that's likely to matter). A core > > > dump is ~150MB, so I've uploaded to Dropbox [1]. > > > > > > I didn't find an obvious first-level member of Tuplesortstate that was > > > covered by either of the two blocks in the AllocSet (both are 8KB in > > > size). > > > > > > James > > > > > > [1]: > > > https://www.dropbox.com/s/jwndwp4634hzywk/aset_assertion_failure.core?dl=0 > > > > And...I think I might have found out the issue (though haven't proved > > it 100% yet or fixed it): > > > > The incremental sort node calls `tuplesort_puttupleslot`, which > > switches the memory context to `sortcontext`. It then calls > > `puttuple_common`. `puttuple_common` may then call `grow_memtuples` > > which reallocs space for `sortstate->memtuples`, but `memtuples` is > > elsewhere allocated in the memory context maincontext. > > > > I had earlier in this debugging process noticed that `sortcontext` was > > allocated in `maincontext`, which seemed conceptually odd if our goal > > is to reuse the sort state, and I also found a comment that needed to > > be changed relative to cleaning up the per-sort context (that talks > > about it freeing the sort state itself), but the `memtuples` array was > > in fact freed additionally at reset, so it seemed safe. > > > > Given this issue though, I think I'm going to go ahead and rework so > > that the `memtuples` array lies within the `sortcontext` instead. > > Perhaps I spoke too soon: I didn't realize repalloc(_huge) didn't need > a memory context switch, so this likely isn't the issue.
It looks like the issue is actually into the `tuplecontext`, which is currently a child context of `sortcontext`: #3 0x0000558cd153b565 in AllocSetCheck (context=context@entry=0x558cd28e0b70) at aset.c:1573 1573 Assert(total_allocated == context->mem_allocated); (gdb) p total_allocated $1 = 16384 (gdb) p context->mem_allocated $2 = 8192 (gdb) p context->name $3 = 0x558cd16c8ccd "Caller tuples" I stuck in several more AllocSetCheck calls in aset.c and got the attached backtrace. James
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007fd9fcac7801 in __GI_abort () at abort.c:79 #2 0x000055e09eb918a6 in ExceptionalCondition (conditionName=conditionName@entry=0x55e09ed427d0 "total_allocated == context->mem_allocated", errorType=errorType@entry=0x55e09ebe9e08 "FailedAssertion", fileName=fileName@entry=0x55e09ed42553 "aset.c", lineNumber=lineNumber@entry=1573) at assert.c:67 #3 0x000055e09ebb8565 in AllocSetCheck (context=context@entry=0x55e0a0414730) at aset.c:1573 #4 0x000055e09ebb8b54 in AllocSetAlloc (context=0x55e0a0414730, size=48) at aset.c:751 #5 0x000055e09ebc0436 in palloc (size=size@entry=48) at mcxt.c:963 #6 0x000055e09e6c89ab in minimal_tuple_from_heap_tuple (htup=0x55e0a03eec08) at heaptuple.c:1487 #7 0x000055e09e8d3edb in tts_buffer_heap_copy_minimal_tuple (slot=0x55e0a03eebb8) at execTuples.c:825 #8 0x000055e09ebc7a86 in ExecCopySlotMinimalTuple (slot=<optimized out>) at ../../../../src/include/executor/tuptable.h:465 #9 copytup_heap (state=0x55e0a040e820, stup=0x7ffdea921280, tup=<optimized out>) at tuplesort.c:3718 #10 0x000055e09ebcf0dc in tuplesort_puttupleslot (state=state@entry=0x55e0a040e820, slot=slot@entry=0x55e0a03eebb8) at tuplesort.c:1581 #11 0x000055e09e8ec28b in ExecIncrementalSort (pstate=0x55e0a03ee6f0) at nodeIncrementalSort.c:670 #12 0x000055e09e8cec81 in ExecProcNodeFirst (node=0x55e0a03ee6f0) at execProcnode.c:450 #13 0x000055e09e8ec264 in ExecProcNode (node=0x55e0a03ee6f0) at ../../../src/include/executor/executor.h:245 #14 ExecIncrementalSort (pstate=0x55e0a03ee4d8) at nodeIncrementalSort.c:638 #15 0x000055e09e8cec81 in ExecProcNodeFirst (node=0x55e0a03ee4d8) at execProcnode.c:450 #16 0x000055e09e8da237 in ExecProcNode (node=0x55e0a03ee4d8) at ../../../src/include/executor/executor.h:245 #17 fetch_input_tuple (aggstate=aggstate@entry=0x55e0a03ee0d8) at nodeAgg.c:411 #18 0x000055e09e8dce0f in agg_retrieve_direct (aggstate=aggstate@entry=0x55e0a03ee0d8) at nodeAgg.c:1782 #19 0x000055e09e8dd128 in ExecAgg (pstate=0x55e0a03ee0d8) at nodeAgg.c:1597 #20 0x000055e09e8cec81 in ExecProcNodeFirst (node=0x55e0a03ee0d8) at execProcnode.c:450 #21 0x000055e09e8eff77 in ExecProcNode (node=0x55e0a03ee0d8) at ../../../src/include/executor/executor.h:245 #22 ExecLimit (pstate=0x55e0a03eded0) at nodeLimit.c:95 #23 0x000055e09e8cec81 in ExecProcNodeFirst (node=0x55e0a03eded0) at execProcnode.c:450 #24 0x000055e09e8c6cfa in ExecProcNode (node=0x55e0a03eded0) at ../../../src/include/executor/executor.h:245 #25 ExecutePlan (estate=estate@entry=0x55e0a03edc90, planstate=0x55e0a03eded0, use_parallel_mode=<optimized out>, operation=operation@entry=CMD_SELECT, sendTuples=sendTuples@entry=true, numberTuples=numberTuples@entry=0, direction=ForwardScanDirection, dest=0x55e0a03fafd8, execute_once=true) at execMain.c:1646 #26 0x000055e09e8c79c7 in standard_ExecutorRun (queryDesc=0x55e0a0344b20, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:364 #27 0x000055e09e8c7a82 in ExecutorRun (queryDesc=queryDesc@entry=0x55e0a0344b20, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>) at execMain.c:308 #28 0x000055e09ea66fbd in PortalRunSelect (portal=portal@entry=0x55e0a0387b70, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x55e0a03fafd8) at pquery.c:912 #29 0x000055e09ea686b1 in PortalRun (portal=portal@entry=0x55e0a0387b70, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55e0a03fafd8, altdest=altdest@entry=0x55e0a03fafd8, qc=0x7ffdea9217d0) at pquery.c:756 #30 0x000055e09ea646af in exec_simple_query (query_string=query_string@entry=0x55e0a03211e0 "SELECT a, b, c, d, count(*)\n FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d\n LIMIT 100;") at postgres.c:1236 #31 0x000055e09ea6694a in PostgresMain (argc=<optimized out>, argv=argv@entry=0x55e0a034bc48, dbname=<optimized out>, username=<optimized out>) at postgres.c:4295 #32 0x000055e09e9d2d73 in BackendRun (port=port@entry=0x55e0a0341d70) at postmaster.c:4510 #33 0x000055e09e9d6039 in BackendStartup (port=port@entry=0x55e0a0341d70) at postmaster.c:4202 #34 0x000055e09e9d6353 in ServerLoop () at postmaster.c:1727 #35 0x000055e09e9d7785 in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1400 #36 0x000055e09e9205ae in main (argc=3, argv=0x55e0a031b900) at main.c:210