Wow.. thanks guys, really appreciate the detailed analysis.

Tim
On Wed, Jul 24, 2013 at 4:08 AM, Noah Misch <n...@leadboat.com> wrote:
> On Tue, Jul 23, 2013 at 01:06:26PM +0100, Tim Kane wrote:
> > I haven't given this a lot of thought, but it struck me that when
> > rebuilding tables (be it for a restore process, or some other
> > operational activity) there is more often than not a need to build an
> > index or two, sometimes many indexes, against the same relation.
> >
> > It strikes me that in order to build just one index, we probably need
> > to perform a full table scan (in a lot of cases).  If we are building
> > multiple indexes sequentially against that same table, then we're
> > probably performing multiple sequential scans in succession, once for
> > each index.
>
> Check.
>
> > Could we architect a mechanism that allowed multiple index creation
> > statements to execute concurrently, with all of their inputs fed
> > directly from a single sequential scan against the full relation?
> >
> > From a language construct point of view, this may not be trivial to
> > implement for raw/interactive SQL - but possibly this is a candidate
> > for the custom format restore?
>
> As Greg Stark mentioned, pg_restore can already issue index build
> commands in parallel.  Where applicable, that's probably superior to
> having one backend build multiple indexes during a single heap scan.
> Index builds are CPU-intensive, and the pg_restore approach takes
> advantage of additional CPU cores in addition to possibly saving I/O.
>
> However, the pg_restore method is not applicable if you want CREATE
> INDEX CONCURRENTLY, and it's not applicable for implicit index building
> such as happens for ALTER TABLE rewrites and for VACUUM FULL.
> Backend-managed concurrent index builds could shine there.
>
> > I presume this would substantially increase the memory overhead
> > required to build those indexes, though the performance gains may be
> > advantageous.
>
> The multi-index-build should respect maintenance_work_mem overall.
> Avoiding cases where that makes concurrent builds slower than
> sequential builds is a key challenge for such a project:
>
> - If the index builds each fit in maintenance_work_mem when run
>   sequentially and some spill to disk when run concurrently, expect
>   concurrency to lose.
> - If the heap is small enough to stay in cache from one index build to
>   the next, performing the builds concurrently is probably a wash or a
>   loss.
> - Concurrency should help when a wide-row table large enough to exhaust
>   the OS cache has narrow indexes that all fit in maintenance_work_mem.
>   I don't know whether concurrency would help for a huge-table scenario
>   where the indexes do overspill maintenance_work_mem.  You would have
>   N indexes' worth of external merge files competing for disk
>   bandwidth; that could cancel out the heap I/O savings.
>
> Overall, it's easy to end up with a loss.  We could punt by having an
> index_build_concurrency GUC, much like pg_restore relies on the user to
> discover a good "-j" value.  But if finding cases where concurrency
> helps is too hard, leaving the GUC at one would become the standard
> advice.
>
> > Apologies in advance if this is not the correct forum for
> > suggestions..
>
> It's the right forum.
>
> Thanks,
> nm
>
> --
> Noah Misch
> EnterpriseDB                       http://www.enterprisedb.com
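
For reference, the parallel-restore approach Noah describes looks
roughly like this in practice (database, table, and index names below
are illustrative, not from the thread):

    # Dump in custom format, then restore with 4 parallel jobs;
    # pg_restore issues independent CREATE INDEX commands concurrently.
    pg_dump -Fc -f mydb.dump mydb
    pg_restore -j 4 -d mydb_restored mydb.dump

    # A manual equivalent: plain CREATE INDEX takes a SHARE lock, which
    # does not conflict with itself, so separate sessions can build
    # indexes on the same table at once -- though each session still
    # performs its own heap scan, which is exactly the cost the
    # single-scan proposal aims to avoid.
    psql -d mydb -c "CREATE INDEX idx_a ON big_table (col_a);" &
    psql -d mydb -c "CREATE INDEX idx_b ON big_table (col_b);" &
    wait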
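
On the memory point: each build's sort is capped by
maintenance_work_mem, so a single budget shared across N concurrent
builds can push sorts to disk that would have fit in memory when run
sequentially. A sketch of the sequential case, with an illustrative
setting:

    # Run sequentially, each build gets the full budget; under the
    # proposed scheme, four concurrent builds would share it, so a sort
    # that fits at 1GB might spill to disk at roughly 256MB apiece.
    psql -d mydb -c "SET maintenance_work_mem = '1GB'; CREATE INDEX idx_a ON big_table (col_a);"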