On Thu, Oct 25, 2018 at 4:26 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Firstly, I took Robert's advice and removed the CONCURRENTLY keyword
> from the syntax.  We just do it that way always.  When there's a default
> partition, only that partition is locked with an AEL; all the rest is
> locked with ShareUpdateExclusive only.
Check.

> Then there are a few other implementation details worth mentioning:
>
> 3. parallel query: when a worker starts on a snapshot that has a
> partition descriptor cache, we need to transmit those partdescs from
> leader via shmem ... but we cannot send the full struct, so we just send
> the OID list of partitions, then rebuild the descriptor in the worker.
> Side effect: if a partition is detached right between the leader taking
> the partdesc and the worker starting, the partition loses its
> relpartbound column, so it's not possible to reconstruct the partdesc.
> In this case, we raise an error.  Hopefully this should be rare.

I don't think it's a good idea for parallel query to just randomly fail
in cases where a non-parallel query would have worked.  I tried pretty
hard to avoid that while working on the feature, and it would be a shame
to see that work undone.

It strikes me that it would be a good idea to break this work into two
phases.  In phase 1, let's support ATTACH and CREATE TABLE .. PARTITION
OF without requiring AccessExclusiveLock.  In phase 2, think about
concurrency for DETACH (and possibly DROP).

I suspect phase 1 actually isn't that hard.  It seems to me that the
only thing we REALLY need to ensure is that the executor doesn't blow up
if a relcache reload occurs.  There are probably a few different
approaches to that problem, but I think it basically boils down to
(1) making sure that the executor is holding onto pointers to the exact
objects it wants to use and not re-finding them through the relcache,
and (2) making sure that the relcache doesn't free and rebuild those
objects but rather holds onto the existing copies.  With this approach,
already-running queries won't take into account the fact that new
partitions have been added, but that seems at least tolerable and
perhaps desirable.

For phase 2, we're not just talking about adding stuff that need not be
used immediately, but about removing stuff which may already be in use.
Your email doesn't seem to describe what we want the *behavior* to be in
that case.  Leave aside for a moment the issue of not crashing: what are
the desired semantics?

I think it would be pretty strange if you had a COPY running targeting a
partitioned table, detached a partition, and the COPY continued to route
tuples to the detached partition even though it was now an independent
table.  It also seems pretty strange if the tuples just get thrown away.
If the COPY isn't trying to send any tuples to the now-detached
partition, then it's fine, but if it is, then I have trouble seeing any
behavior other than an error as sane, unless perhaps a new partition has
been attached or created for that part of the key space.

If you adopt that proposal, then the problem of parallel query behaving
differently from non-parallel query goes away.  You just get an error in
both cases, probably to the effect that there is no (longer) a partition
matching the tuple you are trying to insert (or update).

If you're not hacking on this patch set too actively right at the
moment, I'd like to spend some time hacking on the CREATE/ATTACH side of
things and see if I can produce something committable for that portion
of the problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company