On Sat, Mar 14, 2020 at 12:07 PM James Coleman <jtc...@gmail.com> wrote: > > On Fri, Mar 13, 2020 at 8:23 PM James Coleman <jtc...@gmail.com> wrote: > > > > On Friday, March 13, 2020, Tomas Vondra <tomas.von...@2ndquadrant.com> > > wrote: > >> > >> On Fri, Mar 13, 2020 at 04:31:16PM -0400, James Coleman wrote: > >>> > >>> On Fri, Mar 13, 2020 at 2:23 PM James Coleman <jtc...@gmail.com> wrote: > >>>> > >>>> > >>>> On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra > >>>> <tomas.von...@2ndquadrant.com> wrote: > >>>> > 3) Most of the execution plans look reasonable, except that some of the > >>>> > plans look like this: > >>>> > > >>>> > > >>>> > QUERY PLAN > >>>> > --------------------------------------------------------- > >>>> > Limit > >>>> > -> GroupAggregate > >>>> > Group Key: t.a, t.b, t.c, t.d > >>>> > -> Incremental Sort > >>>> > Sort Key: t.a, t.b, t.c, t.d > >>>> > Presorted Key: t.a, t.b, t.c > >>>> > -> Incremental Sort > >>>> > Sort Key: t.a, t.b, t.c > >>>> > Presorted Key: t.a, t.b > >>>> > -> Index Scan using t_a_b_idx on t > >>>> > (10 rows) > >>>> > > >>>> > i.e. there are two incremental sorts on top of each other, with > >>>> > different prefixes. But this this is not a new issue - it happens with > >>>> > queries like this: > >>>> > > >>>> > SELECT a, b, c, d, count(*) FROM ( > >>>> > SELECT * FROM t ORDER BY a, b, c > >>>> > ) foo GROUP BY a, b, c, d limit 1000; > >>>> > > >>>> > i.e. there's a subquery with a subset of pathkeys. Without incremental > >>>> > sort the plan looks like this: > >>>> > > >>>> > QUERY PLAN > >>>> > --------------------------------------------- > >>>> > Limit > >>>> > -> GroupAggregate > >>>> > Group Key: t.a, t.b, t.c, t.d > >>>> > -> Sort > >>>> > Sort Key: t.a, t.b, t.c, t.d > >>>> > -> Sort > >>>> > Sort Key: t.a, t.b, t.c > >>>> > -> Seq Scan on t > >>>> > (8 rows) > >>>> > > >>>> > so essentially the same plan shape. What bugs me though is that there > >>>> > seems to be some sort of memory leak, so that this query consumes > >>>> > gigabytes os RAM before it gets killed by OOM. But the memory seems not > >>>> > to be allocated in any memory context (at least MemoryContextStats > >>>> > don't > >>>> > show anything like that), so I'm not sure what's going on. > >>>> > > >>>> > Reproducing it is fairly simple: > >>>> > > >>>> > CREATE TABLE t (a bigint, b bigint, c bigint, d bigint); > >>>> > INSERT INTO t SELECT > >>>> > 1000*random(), 1000*random(), 1000*random(), 1000*random() > >>>> > FROM generate_series(1,10000000) s(i); > >>>> > CREATE INDEX idx ON t(a,b); > >>>> > ANALYZE t; > >>>> > > >>>> > EXPLAIN ANALYZE SELECT a, b, c, d, count(*) > >>>> > FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d > >>>> > LIMIT 100; > >>>> > >>>> While trying to reproduce this, instead of lots of memory usage, I got > >>>> the attached assertion failure instead. > >>> > >>> > >>> And, without the EXPLAIN ANALYZE was able to get this one, which will > >>> probably be a lot more helpful. > >>> > >> > >> Hmmm, I'll try reproducing it, but can you investigate the values in the > >> Assert? I mean, it fails on this: > >> > >> Assert(total_allocated == context->mem_allocated); > >> > >> so can you get a core or attach to the process using gdb, and see what's > >> the expected / total value? > > I've reproduced this on multiple machines (though all are Ubuntu or > Debian derivatives...I don't think that's likely to matter). A core > dump is ~150MB, so I've uploaded to Dropbox [1]. > > I didn't find an obvious first-level member of Tuplesortstate that was > covered by either of the two blocks in the AllocSet (both are 8KB in > size). > > James > > [1]: > https://www.dropbox.com/s/jwndwp4634hzywk/aset_assertion_failure.core?dl=0
And...I think I might have found out the issue (though haven't proved it 100% yet or fixed it): The incremental sort node calls `tuplesort_puttupleslot`, which switches the memory context to `sortcontext`. It then calls `puttuple_common`. `puttuple_common` may then call `grow_memtuples` which reallocs space for `sortstate->memtuples`, but `memtuples` is elsewhere allocated in the memory context maincontext. I had earlier in this debugging process noticed that `sortcontext` was allocated in `maincontext`, which seemed conceptually odd if our goal is to reuse the sort state, and I also found a comment that needed to be changed relative to cleaning up the per-sort context (that talks about it freeing the sort state itself), but the `memtuples` array was in fact freed additionally at reset, so it seemed safe. Given this issue though, I think I'm going to go ahead and rework so that the `memtuples` array lies within the `sortcontext` instead. James