On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > [ latest patch set ]
Reviewing 0003: + This form attaches an existing table (partitioned or otherwise) as (which might itself be partitioned) + partition of the target table. Partition bound specification must + correspond with the partition method and the key of the target table. The partition bound specification must correspond to the partitioning method and partitioning key of the target table. + The table being attached must have all the columns of the target table + with matching types and no more. Also, it must have all the matching The table to be attached must have all of the same columns as the target table and no more; moreover, the column types must also match. + with matching types and no more. Also, it must have all the matching + constraints as the target table. That includes both <literal>NOT NULL</> + and <literal>CHECK</> constraints. If some <literal>CHECK</literal> + constraint of the table being attached is marked <literal>NO INHERIT</literal>, + the command will fail; such constraints must either be dropped or + recreated without the <literal>NO INHERIT</literal> mark. Why all of these requirements? We could instead perform a scan to validate that the constraints are met. I think the way this should work is: 1. ATTACH PARTITION works whether matching NOT NULL and CHECK constraints are present or not. 2. If all of the constraints are present, and a validated constraint matching the implicit partitioning constraint is also present, then ATTACH PARTITION does not scan the table to validate constraints; otherwise, it does. 3. NO VALIDATE is not an option. + Currently <literal>UNIQUE</literal>, <literal>PRIMARY KEY</literal>, and + <literal>FOREIGN KEY</literal> constraints are not considered, but that + might change in the future. Really? Changing that sounds impractical to me. + This form detaches specified partition of the target table. The + detached partition continues to exist as a standalone table with no ties + remaining with the target table. continues to exist as a standalone table, but no longer has any ties to the table from which it was detached. + Note that if a partition being detached is itself a partitioned table, + it continues to exist as such. You don't really need to say this, I think. All of the properties of the detached table are retained, not only its partitioning status. You wouldn't like it if I told you to document "note that if a partition being detached is unlogged, it will still be unlogged". To add the table as a new child of a parent table, you must own the - parent table as well. + parent table as well. That applies to both adding the table as a + inheritance child of a parent table and attaching a table as partition to + the table. To add the table as a new child of a parent table, or as a new partition of a partitioned table, you must own the parent table as well. + The name of the table to attach as a new partition to or detach from this table. s/to or/or to/ + <literal>NO VALIDATE</> option is spcified. Typo, but see comments above about nuking this altogether. A recursive <literal>DROP COLUMN</literal> operation will remove a descendant table's column only if the descendant does not inherit that column from any other parents and never had an independent - definition of the column. A nonrecursive <literal>DROP + definition of the column (which always holds if the descendant table + is a partition). A nonrecursive <literal>DROP COLUMN</literal> (i.e., <command>ALTER TABLE ONLY ... DROP COLUMN</command>) never removes any descendant columns, but - instead marks them as independently defined rather than inherited. + instead marks them as independently defined rather than inherited, + unless the descendant table is a partition. This is a hairy explanation. I suggest that the documentation say this (and that the code implement it): A nonrecursive DROP TABLE command will fail for a partitioned table, because all partitions of a table must have the same columns as the partitioning root. - that are not marked <literal>NO INHERIT</>. + that are not marked <literal>NO INHERIT</> which are unsupported if + the table is a partitioned table. I think you can omit this hunk. + If <literal>PARTITION OF</literal> clause is specified then the table is + created as a partition of <literal>parent_table</literal> with specified + bounds. However, unlike regular tables, one cannot specify + <literal>PARTITION BY</literal> clause which means foreign tables can + only be created as leaf partitions. I'd delete the sentence beginning with "However". + Create foreign table <structname>measurement_y2016m07</>, which will be + accessed through the server <structname>server_07</>, that is partition + of the range partitioned table <structname>measurement</>: s/, that is/ as a/ +<phrase>and <replaceable class="PARAMETER">partition_bound_spec</replaceable> is:</phrase> + +FOR VALUES { <replaceable class="PARAMETER">list_spec</replaceable> | <replaceable class="PARAMETER">range_spec</replaceable> } I think you can inline the definitions of list_spec and range_spec here instead of making them separate productions, and I think that would be preferable. FOR VALUES { IN ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) | START <replaceable class="PARAMETER">lower-bound</replaceable> [ INCLUSIVE | EXCLUSIVE ] END <replaceable class="PARAMETER">upper-bound</replaceable> [ INCLUSIVE | EXCLUSIVE ] } + parent table (name optionally schema-qualified). Parenthetical phrase isn't needed. + A partition bound specification must be present and must correspond with + partition method and key of the parent table. It is checked using the + specification that the new partition does not overlap with any existing + partitions of the parent. The partition bound specification must correspond to the partitioning method and partitioning key of the parent table, and must not overlap with any existing partition of that parent. + clause, if any. Defaults and constraints can optionally be specified + for each of the inherited columns, which override those in the parent. Surely not. You can't "override" an inherited constraint or an inherited default. The child may have extra constraints not present in the parent, and may have different defaults when it is directly targeted by an insertion, but it can't possibly override the parent defaults. + One can also specify table constraints, in addition to those inherited + from the parent. Note that all subsequent schema modifications to the + parent propagate to partition. The first part of this seems right, but then what's going on with the reference to constraints in the previous sentence? (See previous review comment.) The second sentence I would delete (but see below). + <para> + Any data row subsequently inserted into the parent table is mapped to + and stored in the partition, provided partition key of the row falls + within the partition bounds. + </para> How about: Rows inserted into a partitioned table will be automatically routed to the correct partition. If no suitable partition exists, an error will occur. + <para> + A partition is dropped or truncated when the parent table is dropped or + truncated. Dropping it directly using <literal>DROP TABLE</literal> + will fail; it must first be <firstterm>detached</> from the parent. + However, truncating a partition directly works. + </para> How about: A partition must have the same column names and types as the table of which it is a partition. Therefore, modifications the column names or types of the partitioned table will automatically propagate to all children, as will operations such as TRUNCATE which normally affect a table and all of its inheritance children. It is also possible to TRUNCATE a partition individually, just as for an inheritance child. Insisting that you can't drop a child without detaching it first seems wrong to me. If I already made this comment and you responded to it, please point me back to whatever you said. However, my feeling is this is flat wrong and absolutely must be changed. - if (is_no_inherit) + + /* Discard the NO INHERIT flag if the relation is a partition */ + if (is_no_inherit && !rel->rd_rel->relispartition) Something about this seems fishy. In StoreRelCheck(), you disallow the addition of a NO INHERIT constraint on a partition, but here you seem happy to accept it and ignore the NO INHERIT property. That doesn't seem consistent. We should adopt a consistent policy about what to do about such constraints, and I submit that throwing an error is better than silently changing things, unless you have some reason for doing that which I'm not seeing. Anyway, we should have the same behavior in both cases. We should also standardize on what value of conislocal and coninhcount children end up with; presumably the latter should be 1, but I'm not sure if the former should be true or false. In either case, anything that can vary between children probably needs to be dumped, so let's enforce that it doesn't so we don't have to dump it. I'm not sure whether the code here achieves those objectives, though, and note the comment in the function header about making sure the logic here matches MergeConstraintsIntoExisting. I think the overriding principle here should be: If you attach a table as a partition, it must not be part of a standard inheritance hierarchy, and it must not be a partition of any other table. It can, however, be partitioned itself. If you later detach a partition, it ends up as a standalone table with a copy of each constraint it had as a partition - probably including the implicit partition constraint. The DBA can drop those constraints if they're not wanted. I wonder if it's really a good idea for the partition constraints to be implicit; what is the benefit of leaving those uncatalogued? + * Depending on whether the relation in question is list or range + * partitioned, one of the fields is set. + */ +typedef struct BoundCollectionData +{ + struct ListInfo *listinfo; + struct RangeInfo *rangeinfo; +} BoundCollectionData; This seems like an odd design. First, when you have a pointer to either of two things, the normal tool for that in C would be a union, not a struct. Second, in PostgreSQL we typically handle that by making both of the things nodes and then you can use IsA() or switch on nodeTag() to figure out what you've got. Third, the only place this is used at least in 0003 is as part of PartitionDescData, which only has 3 members, so if you were going to do it with 2 members, you could just include these two members directly. Considering all of the foregoing, I'd suggest making this a union and including partstrategy in PartitionDescData. I think that the names ListInfo and RangeInfo are far too generic for something that's specific to partitioning. +/* + * Range bound collection - sorted array of ranges of partitions of a range + * partitioned table + */ +typedef struct RangeInfo +{ + struct PartitionRange **ranges; +} RangeInfo; + +/* One partition's range */ +typedef struct PartitionRange +{ + struct PartitionRangeBound *lower; + struct PartitionRangeBound *upper; +} PartitionRange; This representation doesn't seem optimal, because in most cases the lower bound of one partition will be the upper bound of the next. I suggest that you flatten this representation into a single list of relevant bounds, each flagged as to whether it is exclusive and whether it is finite; and a list with one more element of bounds. For example, suppose the partition bounds are [10, 20), [20, 30), (30, 40), and [50, 60). You first construct a list of all of the distinct bounds, flipping inclusive/exclusive for the lefthand bound in each case. So you get: 10 EXCLUSIVE 20 EXCLUSIVE 30 EXCLUSIVE 30 INCLUSIVE 40 EXCLUSIVE 50 EXCLUSIVE 60 EXCLUSIVE When ordering items for this list, if the same item appears twice, the EXCLUSIVE copy always appears before INCLUSIVE. When comparing against an EXCLUSIVE item, we move to the first half of the array if we are searching for a value strictly less than that value; when comparing against an INCLUSIVE item, we move to the first half of the array if we are searching for a value less than or equal to that value. This is a list of seven items, so a binary search will return a position between 0 (less than all items) and 7 (greater than all items). So we need a list of 8 partition mappings, which in this case will look like this: -1, 0, 1, -1, 2, -1, 3, -1. In this particular example, there are only two adjacent partitions, so we end up with 7 bounds with this representation vs. 8 with yours, but in general I think the gains will be bigger. If you've got 1000 partitions and they're all adjacent, which is likely, you store 1000 bounds instead of 2000 bounds by doing it this way. + * Note: This function should be called only when it is known that 'relid' + * is a partition. Why? How about "Because this function assumes that the relation whose OID is passed as an argument will have precisely one parent, it should only been called when it is known that the relation is a partition." + /* + * Translate vars in the generated expression to have correct attnos. + * Note that the vars in my_qual bear attnos dictated by key which carries + * physical attnos of the parent. We must allow for a case where physical + * attnos of a partition can be different from the parent. + */ + partexprs_item = list_head(key->partexprs); + for (i = 0; i < key->partnatts; i++) + { + AttrNumber attno = key->partattrs[i], + new_attno; + char *attname; + + if (attno != 0) + { + /* Simple column reference */ + attname = get_attname(RelationGetRelid(parent), attno); + new_attno = get_attnum(RelationGetRelid(rel), attname); + + if (new_attno != attno) + my_qual = (List *) translate_var_attno((Node *) my_qual, + attno, + new_attno); It can't really be safe to do this one attribute number at a time, or even if by some good fortune it can't be broken, it at least it seems extremely fragile. Suppose that you translate 0 -> 3 and then 3 -> 5; now the result is garbage. It's not very efficient to do this one attno at a time, either. + if (classform->relispartition) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is a partition of \"%s\"", rel->relname, + get_rel_name(get_partition_parent(relOid))), + errhint("Use ALTER TABLE DETACH PARTITION to be able to drop it."))); + RangeVarCallbackForDropRelation should do only minimal sanity checks; defer this until after we have a relation lock. I didn't get all the way through this patch, so this is a pretty incomplete review, but it's late here and I'm out of steam for tonight. Some general comments: 1. I think that this patch seems to introduce an awful lot of new structures with confusingly similar names and purposes: PartitionBoundList, PartitionBoundRange, ListInfo, RangeInfo, PartitionRange, PartitionList, ListValue, RangePartition. You've got 4 different structures here that all have "Partition" and "Range" in the name someplace, including both PartitionRange and RangePartition. Maybe there's no way around that kind of duplication; after all there are quite a few moving parts here. But it seems like it would be good to try to simplify it. 2. I'm also a bit concerned about the fairly large amount of apparently-boilerplate code in partition.c, all having to do with how we create all of these data structures and translate between different forms of them. I haven't understood that stuff thoroughly enough to have a specific idea about how we might be able to get rid of any of it, and maybe there's no way. But that too seems like a topic for futher investigation. One idea is that maybe some of these things should be nodes that piggyback on the existing infrastructure in src/backend/nodes instead of inventing a new way to do something similar. 3. There are a lot of places where you have separate code for the range and list partitioning cases, and I'm suspicious that there are ways that code could be unified. For example, with list partitioning, you have a bunch of Datums which represent the specific values that can appear in the various partitions, and with range partitioning, you have a bunch of Datums that represent the edges of partitions. Well, if you used the same array for both purposes, you could use the same code to copy it. That would involve flattening away a lot of the subsidiary structure under PartitionDescData and pulling up stuff that is buried lower down into the main structure, but I think that's likely a good idea anyway - see also point #1. 4. I'm somewhat wondering if we ought to just legislate that the lower bound is always inclusive and the upper bound is always exclusive. The decision to support both inclusive and exclusive partition bounds is responsible for an enormous truckload of complexity in this patch, and I have a feeling it is also going to be a not-infrequent cause of user error. 5. I wonder how well this handles multi-column partition keys. You've just got one Datum flag and one is-finite flag per partition, but I wonder if you don't need to track is-finite on a per-column basis, so that you could partition on (a, b) and have the first partition go up to (10, 10), the second to (10, infinity), the third to (20, 10), the fourth to (20, infinity), and the last to (infinity, infinity). FWIW, Oracle supports this sort of thing, so perhaps we should, too. On a related note, I'm not sure it's going to work to treat a composite partition key as a record type. The user may want to specify a separate opfamily and collation for each column, not just inherit whatever the record behavior is. I'm not sure if that's what you are doing, but the relcache structures don't seem adapted to storing one Datum per partitioning column per partition, but rather just one Datum per partition, and I'm not sure that's going to work very well. Thanks for working on this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers