On Tue, Oct 06, 2020 at 11:42:55AM +0900, Michael Paquier wrote: > On Mon, Oct 05, 2020 at 03:08:32PM -0500, Justin Pryzby wrote: > > It means that we might do N catalog updates for a partition heirarchy > > that's N > > levels deep. Normally, N=2, and we'd clear indisclustered for the index as > > well as its parent. This is not essential, though. > > Hmm. I got to think more about this one, and being able to ensure a > consistent state of indisclustered for partitioned tables and all > their partitions across multiple transactions at all times would be a > nice property, as we could easily track down if an operation has > failed in-flight. The problem here is that we are limited by > indisclustered being a boolean, so we may set indisclustered one way > or another on some partitions, or on some parent partitioned tables, > but we are not sure if what's down is actually clustered for the > leaves of a partitioned index, or not. Or maybe we even have an > inconsistent state, so this does not provide a good user experience. > The consistency of a partition tree is a key thing, and we have such > guarantees with REINDEX (with or without concurrently), so what I'd > like to think that what makes sense for indisclustered on a > partitioned index is to have the following set of guarantees, and > relying on indisvalid as being true iff an index partition tree is > complete on a given table partition tree is really important: > - If indisclustered is set to true for a partitioned index, then we > have the guarantee that all its partition and partitioned indexes have > indisclustered set to true, across all the layers down to the leaves. > - If indisclustered is set to false for a partitioned index, then we > have the guarantee that all its partition and partitioned indexes have > indisclustered set to false. > > If we happen to attach a new partition to a partitioned table of such > a tree, I guess that it is then logic to have indisclustered set to > the same value as the partitioned index when the partition inherits an > index. So, it seems to me that with the current catalogs we are > limited by indisclustered as being a boolean to maintain some kind of > consistency across transactions of CLUSTER, as we would use one > transaction per leaf for the work. In order to make things better > here, we could switch indisclustered to a char, able to use three > states: > - 'e' or enabled, equivalent to indisclustered == true. There should > be only one index on a table with 'e' set at a given time. > - 'd' or disabled, equivalent to indisclustered == false. > - a new third state, for an equivalent of work-in-progress, let's call > it 'w', or whatever. > > Then, when we begin a CLUSTER on a partitioned table, I would imagine > the following: > - Switch all the indexes selected to 'w' in a first transaction, and > do not reset the state of other indexes if there is one 'e'. For > CLUSTER without USING, we switch the existing 'e' to 'w' if there is > such an index. If there are no indexes select-able, issue an error. > If we find an index with 'w', we have a candidate from a previous > failed command, so this gets used. I don't think that this design > requires a new DDL either as we could reset all 'w' and 'e' to 'd' if > using ALTER TABLE SET WITHOUT CLUSTER on a partitioned table. For > CLUSTER with USING, the partitioned index selected and its leaves are > switched to 'w', similarly. > - Then do the work for all the partitions, one partition per > transaction, keeping 'w'. > - In a last transaction, switch all the partitions from 'w' to 'e', > resetting any existing 'e'.
Honestly, I think you're over-thinking and over-engineering indisclustered. If "clusteredness" was something we offered to maintain across DML, I think that might be important to provide stronger guarantees. As it is now, I don't think this patch is worth changing the catalog definition. > ALTER TABLE CLUSTER ON should also be a supported operation, where 'e' > gets switched for all the partitions in a single transaction. Of > course, this should not work for an invalid index. Even with such a > design, I got to wonder if there could be cases where a user does > *not* want to cluster the leaves of a partition tree with the same > index. If that were to happen, the partition tree may need a redesign > so making things work so as we force partitions to inherit what's > wanted for the partitioned table makes the most sense to me. I wondered the same thing: should clustering a parent index cause the the child indexes to be marked as clustered ? Or should it be possible for an intermediate child to say (marked as) clustered on some other index. I don't have strong feelings, but the patch currently sets indisclustered as a natural consequence of the implementation, so I tentatively think that's what's right. After all, CLUSTER sets indisclustered without having to also say "ALTER..CLUSTER ON". This is relevant to the question I raised about unsetting indisclustered for each indexes *parent* when clustering on a different index. I think it would be strange if we refused "ALTER..CLUSTER ON" for a partition just because a different partitioned index was set clustered. We'd clear that, like always, and then (in my proposal) also clear its parents "indisclustered". I still don't think that's essential, though. > By the way, I mentioned that previously, but this thread got used for > REINDEX that is finished, and now we have a discussion going on with > CLUSTER. There is also stuff related to CIC and DIC. There is also > only one CF entry for all that. I really think that this work had > better be split into separate threads, with separate CF entries. Do > you mind if I change the contents of the CF app so as the existing > item is renamed for REINDEX, that got committed? Then we could create > a new entry for CIC/DIC (it may make sense to split these two as > well, but I am not if there are any overlaps between the APIs of the > two), and a new entry for CLUSTER, depending on the state of the work. > The original subject of the thread is also unrelated, this makes the > review process unnecessarily complicated, and that's really > confusing. Each discussion should go into its own, independent, > thread. I didn't think it's worth the overhead of closing and opening more CFs. But I don't mind. -- Justin