On 2015/12/18 3:56, Robert Haas wrote: > On Mon, Dec 14, 2015 at 2:14 AM, Amit Langote > <langote_amit...@lab.ntt.co.jp> wrote: >> Syntax to create a partitioned table (up to 2 levels of partitioning): >> >> CREATE TABLE foo ( >> ... >> ) >> PARTITION BY R/L ON (key0) >> SUBPARTITION BY R/L ON (key1) >> [(PARTITION foo_1 FOR VALUES <val> [<storage_params>] [<tblspc>] >> [(SUBPARTITION foo_1_1 FOR VALUES <val> [<storage_params>] [<tblspc>], >> ...)], ...)]; >> >> The above creates two pg_partitioned_rel entries for foo with partlevel 0 >> and 1, for key0 and key1, respectively. For foo_1 and foo_1_1, this >> creates pg_partition entries, with foo and foo_1 as partparent, >> respectively. >> >> Why just 2 levels? - it seems commonplace and makes the syntax more >> intuitive? I guess it might be possible to generalize the syntax for >> multi-level partitioning. Ideas? If we want to support the notion of >> sub-partition template in future, that would require some thought, more >> importantly proper catalog organization for the same. > > I do not think this is a particularly good idea. You're going to need > to dump each partition separately at least in --binary-upgrade mode, > because each is going to have its own magic OIDs that need to be > restored, and also because there will most likely be at least some > properties that are going to vary between partitions. You could > require that every partition have exactly the same set of columns, > constraints, rules, triggers, policies, attribute defaults, comments, > column comments, and everything else that might be different from one > partition to another, and further require that they have exactly > matching indexes. It would take a fair amount of code to prohibit all > that, but it could be done. However, do we really want that? There > may well be some things were we want to enforce that the parent and > the child are exactly identical, but I doubt we want that for > absolutely every property, current and future, of the partition. And > even if you did, because of the --binary-upgrade stuff, you still need > to to be able to dump them separately. > > Therefore, I believe it is a whole lot better to make the primary > syntax for table partitioning something where you issue a CREATE > statement for the parent and then a CREATE statement for each child. > If we want to also have a convenience syntax so that people who want > to create a parent and a bunch of children in one fell swoop can do > so, fine.
Regarding --binary-upgrade dump mode, how about we teach pg_dump to dump each partition separately using ALTER TABLE parent ADD PARTITION especially for the "magic OIDs" reason? It may very well be a CREATE PARTITION-style command though. Note that each such command could specify properties that can be different per partition. I said in my email, perhaps not so clearly, that "only" WITH options, tablespace and relpersistence can be different per partition. But I can see why that may be severely restrictive at this stage. By the way, what do you think about SUBPARTITION keyword-based syntax for multi-level partitioning? Should we instead require that each partition has its own PARTITION BY in its creation command? > > I would not choose to model the syntax for creating partitions on > Oracle. I don't find that syntax particularly nice or easy to > remember. I say PARTITION BY RANGE, and then inside the parentheses I > use the PARTITION keyword for each partition? Really? But I think > copying the style while having the details be incompatible is an even > worse idea. As for the convenience syntax (if at all), how about: CREATE TABLE foo ( ... ) PARTITION BY ... ON (...) SUBPARTITION BY ... ON (...) opt_partition_list; where opt_partition_list is: PARTITIONS ( partname FOR VALUES ... [WITH] [TABLESPACE] opt_subpart_list [, ...] ) where opt_subpart_list is: SUBPARTITIONS ( subpartname FOR VALUES ... [WITH] [ TABLESPACE] [, ...] ) PARTITIONS, SUBPARTITIONS would be new unreserved keywords. Or we can do away with them. >> What about ALTER TABLE? - Instead of allowing ALTER TABLE to be applied >> directly to partitions on case-by-case basis (they are tables under the >> hood after all), we should restrict AT to the master table. Most of the AT >> changes implicitly propagate from the master table to its partitions. Some >> of them could be directly applied to partitions and/or sub-partitions such >> as rename, storage manipulations like - changing tablespace, storage >> parameters (reloptions), etc.: >> >> ALTER TABLE foo >> RENAME PARTITION <partition-name> TO <new-name>; >> >> ALTER TABLE foo >> RENAME SUBPARTITION <sub-partition-name> TO <new-name>; >> >> ALTER TABLE foo >> SET TABLESPACE ... [DEFAULT] FOR PARTITION <partition-name>; >> >> ALTER TABLE foo >> SET TABLESPACE ... FOR SUBPARTITION <sub-partition-name>; >> >> ALTER TABLE foo >> SET (storage_parameter = value) [DEFAULT] FOR PARTITION <partition-name>; >> >> ALTER TABLE foo >> SET (storage_parameter = value) FOR SUBPARTITION <sub-partition-name>; > > I don't think this is a very good idea. This is basically proposing > that for every DDL command that you can apply to a table, you have to > spell it differently for a partition. That seems like a lot of extra > work for no additional functionality. I didn't mean to propose the special syntax for every available DDL command. Just those that modify properties that could be different per partition. I thought there would be very few such properties and hence not a lot of special commands. But then again, I may be wrong about what those properties are. > >> By the way, should we also allow changing the logging of >> partitions/sub-partitions as follows? > > Again, I think you're coming at this from the wrong direction. > Instead of saying we're going to disallow all changes to the > partitions and then deciding we need to allow certain changes after > all, I think we should allow everything that is currently allowed for > an inherited table and then decide which of those things we need to > prohibit, and why. For example, if you insist that a child table has > to have a tuple descriptor that matches the parent, that can improve > efficiency: Append won't need to project, and so on. But it now > becomes very difficult to support taking a stand-alone table and > making it a partition of an existing partitioned table, because the > set of dropped columns might not match. Having to give an error in > that case amounts to "we're sorry, we can't attach your partition to > the partitioning hierarchy because of some invisible state that you > can't see" isn't very nice. Now I'm not saying that isn't the right > decision, but I think the design choices here need to be carefully > thought about. Yeah, I am concerned about the ATTACH PARTITION USING TABLE case for the very point you mention. And I can see how it may result from the restrictive model I propose. FWIW, other databases impose a number of restrictions on the partition roll-in case but not sure if for the internal reasons we might want to. > > Stepping away from that particular example, a blanket prohibition on > changing any attribute of a child table seems like it will prohibit a > lot of useful things that really ought to work. And again, I don't > think it's a good idea to implement separate syntax for changing a > partition vs. changing a table. If I want to set a partition as > unlogged, I should be able to say ALTER TABLE partition_name UNLOGGED > or maybe ALTER PARTITION partition_name UNLOGGED, not be forced to use > some new grammar production that looks completely different. Okay. ALTER PARTITION may be the way to go. > >> What about index constraints, ie, PRIMARY KEY, UNIQUE and EXCLUSION >> constraints - 2 things must be clear here: cannot create these constraints >> on individual partitions and all partition columns (key0 + key1) must be >> the leading columns of the key. On a related note, creating index on the >> master table should create the index on all "leaf" partitions. The index >> on the mater table itself would be just a logical index. Should we allow >> creating or dropping indexes on partitions directly? > > I don't find this to be particularly clear. You are assuming that > nobody wants to create a constraint that a certain value is unique > within a partition. That might not be a real common thing to want to > do, but it could certainly be useful to somebody, and the current > system with table inheritance allows it. For example, suppose that we > have orders partitioned on the order_date column, by month. The user > might want to create a UNIQUE index on order_id on each partition. > Maybe they start over with order_id 1 at the beginning of each month. > But even if, as is more likely, the order IDs keep counting up from > month to month, they don't want to be forced to include the whole > partitioning key in the index in order to have it marked UNIQUE. That > may be enough, in practice, to ensure the global uniqueness of order > IDs even though the system doesn't technically enforce it in all > cases. Okay, I didn't consider that one may want to create UNIQUE constraint per partition. Do they need to be allowed to be different per partition? Sorry, I could not understand your last sentence regarding global uniqueness. Given the restrictions on its definition, in what ways could the system fail to enforce it? > > If you want an index created on the parent to cascade down to all > children, that's a big project to get right. Suppose I create an > index on the parent table. After a while, I notice that it's getting > bloated, so I created another index with the same definition. Now, I > drop one of the two indexes. One of the two indexes from each child > table needs to go away, and moreover it can't be picked arbitrarily - > it has to be the one that was created at the same time as the parent > index I'm dropping. If you want it to behave like this, you need a > whole system of bookkeeping to make it work right. > > For version 1, I'd go the other way and prohibit index definitions on > the empty parent rels. Let people create indexes on the children as > they wish. In a future version, we can add stuff to cascade from > parents to children. Okay, I may be missing the subtleties of global uniqueness case. I agree we can leave it out of version 1. > >> It would be interesting to talk about features like SPLIT, MERGE, EXCHANGE >> but I'm inclined to leave them as future enhancements. For a functionality >> similar to EXCHANGE, there are commands like ATTACH/DETACH in the latest >> patch. We could extend them to also consider sub-partitions: > > We don't need to have these in the first version, but we have to make > some architectural decisions that affect how feasible they are to > implement and in which cases, as noted above. > >> One cannot define rules, triggers, and RLS policies on them. Although, >> AR triggers defined on a partitioned master table are propagated to the >> "leaf" partitions. > > What value do you see us getting out of restricting these particular things? Perhaps, I'm wrong in thinking that we should limit these to be associated with only the top-level partitioned table. Thanks a lot for the comments! Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers