On Thu, Oct 25, 2018 at 4:26 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Firstly, I took Robert's advice and removed the CONCURRENTLY keyword
> from the syntax.  We just do it that way always.  When there's a default
> partition, only that partition is locked with an AEL; all the rest is
> locked with ShareUpdateExclusive only.
Check.

> Then there are a few other implementation details worth mentioning:
>
> 3. parallel query: when a worker starts on a snapshot that has a
> partition descriptor cache, we need to transmit those partdescs from
> leader via shmem ... but we cannot send the full struct, so we just send
> the OID list of partitions, then rebuild the descriptor in the worker.
> Side effect: if a partition is detached right between the leader taking
> the partdesc and the worker starting, the partition loses its
> relpartbound column, so it's not possible to reconstruct the partdesc.
> In this case, we raise an error.  Hopefully this should be rare.

I don't think it's a good idea for parallel query to just randomly fail
in cases where a non-parallel query would have worked.  I tried pretty
hard to avoid that while working on the feature, and it would be a shame
to see that work undone.

It strikes me that it would be a good idea to break this work into two
phases.  In phase 1, let's support ATTACH and CREATE TABLE .. PARTITION
OF without requiring AccessExclusiveLock.  In phase 2, think about
concurrency for DETACH (and possibly DROP).

I suspect phase 1 actually isn't that hard.  It seems to me that the
only thing we REALLY need to ensure is that the executor doesn't blow up
if a relcache reload occurs.  There are probably a few different
approaches to that problem, but I think it basically boils down to
(1) making sure that the executor is holding onto pointers to the exact
objects it wants to use and not re-finding them through the relcache,
and (2) making sure that the relcache doesn't free and rebuild those
objects but rather holds onto the existing copies.  With this approach,
already-running queries won't take into account the fact that new
partitions have been added, but that seems at least tolerable and
perhaps desirable.

For phase 2, we're not just talking about adding stuff that need not be
used immediately, but about removing stuff which may already be in use.
Your email doesn't seem to describe what we want the *behavior* to be in
that case.  Leave aside for a moment the issue of not crashing: what are
the desired semantics?

I think it would be pretty strange if you had a COPY running targeting a
partitioned table, detached a partition, and the COPY continued to route
tuples to the detached partition even though it was now an independent
table.  It also seems pretty strange if the tuples just get thrown away.
If the COPY isn't trying to send any tuples to the now-detached
partition, then it's fine, but if it is, then I have trouble seeing any
behavior other than an error as sane, unless perhaps a new partition has
been attached or created for that part of the key space.

If you adopt that proposal, then the problem of parallel query behaving
differently from non-parallel query goes away.  You just get an error in
both cases, probably to the effect that there is no (longer) a partition
matching the tuple you are trying to insert (or update).

If you're not hacking on this patch set too actively right at the
moment, I'd like to spend some time hacking on the CREATE/ATTACH side of
things and see if I can produce something committable for that portion
of the problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company