On Sat, Mar 14, 2020 at 12:24 PM James Coleman <jtc...@gmail.com> wrote: > > On Sat, Mar 14, 2020 at 12:07 PM James Coleman <jtc...@gmail.com> wrote: > > > > On Fri, Mar 13, 2020 at 8:23 PM James Coleman <jtc...@gmail.com> wrote: > > > > > > On Friday, March 13, 2020, Tomas Vondra <tomas.von...@2ndquadrant.com> > > > wrote: > > >> > > >> On Fri, Mar 13, 2020 at 04:31:16PM -0400, James Coleman wrote: > > >>> > > >>> On Fri, Mar 13, 2020 at 2:23 PM James Coleman <jtc...@gmail.com> wrote: > > >>>> > > >>>> > > >>>> On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra > > >>>> <tomas.von...@2ndquadrant.com> wrote: > > >>>> > 3) Most of the execution plans look reasonable, except that some of > > >>>> > the > > >>>> > plans look like this: > > >>>> > > > >>>> > > > >>>> > QUERY PLAN > > >>>> > --------------------------------------------------------- > > >>>> > Limit > > >>>> > -> GroupAggregate > > >>>> > Group Key: t.a, t.b, t.c, t.d > > >>>> > -> Incremental Sort > > >>>> > Sort Key: t.a, t.b, t.c, t.d > > >>>> > Presorted Key: t.a, t.b, t.c > > >>>> > -> Incremental Sort > > >>>> > Sort Key: t.a, t.b, t.c > > >>>> > Presorted Key: t.a, t.b > > >>>> > -> Index Scan using t_a_b_idx on t > > >>>> > (10 rows) > > >>>> > > > >>>> > i.e. there are two incremental sorts on top of each other, with > > >>>> > different prefixes. But this this is not a new issue - it happens > > >>>> > with > > >>>> > queries like this: > > >>>> > > > >>>> > SELECT a, b, c, d, count(*) FROM ( > > >>>> > SELECT * FROM t ORDER BY a, b, c > > >>>> > ) foo GROUP BY a, b, c, d limit 1000; > > >>>> > > > >>>> > i.e. there's a subquery with a subset of pathkeys. Without > > >>>> > incremental > > >>>> > sort the plan looks like this: > > >>>> > > > >>>> > QUERY PLAN > > >>>> > --------------------------------------------- > > >>>> > Limit > > >>>> > -> GroupAggregate > > >>>> > Group Key: t.a, t.b, t.c, t.d > > >>>> > -> Sort > > >>>> > Sort Key: t.a, t.b, t.c, t.d > > >>>> > -> Sort > > >>>> > Sort Key: t.a, t.b, t.c > > >>>> > -> Seq Scan on t > > >>>> > (8 rows) > > >>>> > > > >>>> > so essentially the same plan shape. What bugs me though is that there > > >>>> > seems to be some sort of memory leak, so that this query consumes > > >>>> > gigabytes os RAM before it gets killed by OOM. But the memory seems > > >>>> > not > > >>>> > to be allocated in any memory context (at least MemoryContextStats > > >>>> > don't > > >>>> > show anything like that), so I'm not sure what's going on. > > >>>> > > > >>>> > Reproducing it is fairly simple: > > >>>> > > > >>>> > CREATE TABLE t (a bigint, b bigint, c bigint, d bigint); > > >>>> > INSERT INTO t SELECT > > >>>> > 1000*random(), 1000*random(), 1000*random(), 1000*random() > > >>>> > FROM generate_series(1,10000000) s(i); > > >>>> > CREATE INDEX idx ON t(a,b); > > >>>> > ANALYZE t; > > >>>> > > > >>>> > EXPLAIN ANALYZE SELECT a, b, c, d, count(*) > > >>>> > FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d > > >>>> > LIMIT 100; > > >>>> > > >>>> While trying to reproduce this, instead of lots of memory usage, I got > > >>>> the attached assertion failure instead. > > >>> > > >>> > > >>> And, without the EXPLAIN ANALYZE was able to get this one, which will > > >>> probably be a lot more helpful. > > >>> > > >> > > >> Hmmm, I'll try reproducing it, but can you investigate the values in the > > >> Assert? I mean, it fails on this: > > >> > > >> Assert(total_allocated == context->mem_allocated); > > >> > > >> so can you get a core or attach to the process using gdb, and see what's > > >> the expected / total value? > > > > I've reproduced this on multiple machines (though all are Ubuntu or > > Debian derivatives...I don't think that's likely to matter). A core > > dump is ~150MB, so I've uploaded to Dropbox [1]. > > > > I didn't find an obvious first-level member of Tuplesortstate that was > > covered by either of the two blocks in the AllocSet (both are 8KB in > > size). > > > > James > > > > [1]: > > https://www.dropbox.com/s/jwndwp4634hzywk/aset_assertion_failure.core?dl=0 > > And...I think I might have found out the issue (though haven't proved > it 100% yet or fixed it): > > The incremental sort node calls `tuplesort_puttupleslot`, which > switches the memory context to `sortcontext`. It then calls > `puttuple_common`. `puttuple_common` may then call `grow_memtuples` > which reallocs space for `sortstate->memtuples`, but `memtuples` is > elsewhere allocated in the memory context maincontext. > > I had earlier in this debugging process noticed that `sortcontext` was > allocated in `maincontext`, which seemed conceptually odd if our goal > is to reuse the sort state, and I also found a comment that needed to > be changed relative to cleaning up the per-sort context (that talks > about it freeing the sort state itself), but the `memtuples` array was > in fact freed additionally at reset, so it seemed safe. > > Given this issue though, I think I'm going to go ahead and rework so > that the `memtuples` array lies within the `sortcontext` instead.
Perhaps I spoke too soon: I didn't realize repalloc(_huge) didn't need a memory context switch, so this likely isn't the issue. James