Re: 回复：how to create index concurrently on partitioned table

Justin Pryzby Mon, 05 Oct 2020 20:08:12 -0700

On Tue, Oct 06, 2020 at 11:42:55AM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 03:08:32PM -0500, Justin Pryzby wrote:
> > It means that we might do N catalog updates for a partition heirarchy 
> > that's N
> > levels deep.  Normally, N=2, and we'd clear indisclustered for the index as
> > well as its parent.  This is not essential, though.
> 
> Hmm.  I got to think more about this one, and being able to ensure a
> consistent state of indisclustered for partitioned tables and all
> their partitions across multiple transactions at all times would be a
> nice property, as we could easily track down if an operation has
> failed in-flight.  The problem here is that we are limited by
> indisclustered being a boolean, so we may set indisclustered one way
> or another on some partitions, or on some parent partitioned tables,
> but we are not sure if what's down is actually clustered for the
> leaves of a partitioned index, or not.  Or maybe we even have an
> inconsistent state, so this does not provide a good user experience.
> The consistency of a partition tree is a key thing, and we have such 
> guarantees with REINDEX (with or without concurrently), so what I'd
> like to think that what makes sense for indisclustered on a
> partitioned index is to have the following set of guarantees, and
> relying on indisvalid as being true iff an index partition tree is
> complete on a given table partition tree is really important:
> - If indisclustered is set to true for a partitioned index, then we
> have the guarantee that all its partition and partitioned indexes have
> indisclustered set to true, across all the layers down to the leaves.
> - If indisclustered is set to false for a partitioned index, then we
> have the guarantee that all its partition and partitioned indexes have
> indisclustered set to false.
> 
> If we happen to attach a new partition to a partitioned table of such
> a tree, I guess that it is then logic to have indisclustered set to
> the same value as the partitioned index when the partition inherits an
> index.  So, it seems to me that with the current catalogs we are
> limited by indisclustered as being a boolean to maintain some kind of
> consistency across transactions of CLUSTER, as we would use one
> transaction per leaf for the work.  In order to make things better
> here, we could switch indisclustered to a char, able to use three
> states:
> - 'e' or enabled, equivalent to indisclustered == true.  There should
> be only one index on a table with 'e' set at a given time.
> - 'd' or disabled, equivalent to indisclustered == false.
> - a new third state, for an equivalent of work-in-progress, let's call
> it 'w', or whatever.
> 
> Then, when we begin a CLUSTER on a partitioned table, I would imagine
> the following:
> - Switch all the indexes selected to 'w' in a first transaction, and
> do not reset the state of other indexes if there is one 'e'.  For
> CLUSTER without USING, we switch the existing 'e' to 'w' if there is
> such an index.  If there are no indexes select-able, issue an error.
> If we find an index with 'w', we have a candidate from a previous
> failed command, so this gets used.  I don't think that this design
> requires a new DDL either as we could reset all 'w' and 'e' to 'd' if
> using ALTER TABLE SET WITHOUT CLUSTER on a partitioned table.  For
> CLUSTER with USING, the partitioned index selected and its leaves are
> switched to 'w', similarly.
> - Then do the work for all the partitions, one partition per
> transaction, keeping 'w'.
> - In a last transaction, switch all the partitions from 'w' to 'e',
> resetting any existing 'e'.


Honestly, I think you're over-thinking and over-engineering indisclustered.

If "clusteredness" was something we offered to maintain across DML, I think
that might be important to provide stronger guarantees.  As it is now, I don't
think this patch is worth changing the catalog definition.

> ALTER TABLE CLUSTER ON should also be a supported operation, where 'e'
> gets switched for all the partitions in a single transaction.  Of
> course, this should not work for an invalid index.  Even with such a
> design, I got to wonder if there could be cases where a user does
> *not* want to cluster the leaves of a partition tree with the same
> index.  If that were to happen, the partition tree may need a redesign
> so making things work so as we force partitions to inherit what's
> wanted for the partitioned table makes the most sense to me.

I wondered the same thing: should clustering a parent index cause the the child
indexes to be marked as clustered ?  Or should it be possible for an
intermediate child to say (marked as) clustered on some other index.  I don't
have strong feelings, but the patch currently sets indisclustered as a natural
consequence of the implementation, so I tentatively think that's what's right.
After all, CLUSTER sets indisclustered without having to also say
"ALTER..CLUSTER ON".

This is relevant to the question I raised about unsetting indisclustered for
each indexes *parent* when clustering on a different index.

I think it would be strange if we refused "ALTER..CLUSTER ON" for a partition
just because a different partitioned index was set clustered.  We'd clear that,
like always, and then (in my proposal) also clear its parents "indisclustered".
I still don't think that's essential, though.

> By the way, I mentioned that previously, but this thread got used for
> REINDEX that is finished, and now we have a discussion going on with
> CLUSTER.  There is also stuff related to CIC and DIC.  There is also
> only one CF entry for all that.  I really think that this work had
> better be split into separate threads, with separate CF entries. Do
> you mind if I change the contents of the CF app so as the existing 
> item is renamed for REINDEX, that got committed?  Then we could create
> a new entry for CIC/DIC (it may make sense to split these two as
> well, but I am not if there are any overlaps between the APIs of the
> two), and a new entry for CLUSTER, depending on the state of the work.
> The original subject of the thread is also unrelated, this makes the
> review process unnecessarily complicated, and that's really
> confusing.  Each discussion should go into its own, independent,
> thread.

I didn't think it's worth the overhead of closing and opening more CFs.
But I don't mind.

-- 
Justin

Re: 回复：how to create index concurrently on partitioned table

Reply via email to