On Fri, May 12, 2017 at 2:45 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Yeah, that isn't really appetizing at all. If we were doing physical
> partitioning below the user-visible level, we could make it fly.
> But the existing design makes the partition boundaries user-visible
> which means we have to insist that the partitioning rule is immutable
> (in the strongest sense of being invariant across all installations
> you could possibly wish to transport data between).
I think you're right.

> Now, we already have some issues of that sort with range partitioning;
> if say you try to transport data to another system with a different
> sort ordering, you have probably got problems. But that issue already
> existed with the old manual partitioning approach, ie if you have a
> CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer
> to such a system could fail already. So I'm okay with that. But it
> seems like hash partitioning is going to introduce new, exciting, and
> hard-to-document-or-avoid portability gotchas. Do we really need
> to go there?

That is a good question. I think it basically amounts to this: is
hash partitioning useful, and if so, for what?

Range and list partitioning inherently provide management benefits.
For example, if you range-partition your data by year, then when you
want to get rid of the oldest year's worth of data, you can drop the
entire partition at once, which is likely to be much faster than a
DELETE statement. Hash partitioning provides no such benefit because,
by design, the distribution of the data among the partitions is
essentially random. Dropping one partition will not usually be a
useful thing to do, because the rows in that partition are logically
unconnected.

So, if you have a natural way of partitioning a table by range or
list, that's probably going to be superior to hash partitioning. But
sometimes there isn't an obviously useful way of breaking up the
data, and you still want to partition it in order to reduce the size
of the individual tables. That, in turn, allows maintenance
operations to be performed on each partition separately. For
example, suppose your table is big enough that running CLUSTER or
VACUUM FULL on it takes eight hours, and you can't really afford an
eight-hour maintenance window when the table gets bloated -- but you
can afford, say, a one-hour window on Sunday nights when activity is
lowest. If you hash-partition the table into N pieces, you can
CLUSTER or VACUUM FULL one partition a week until you get to them
all. Similarly, if you need to create an index, you can build it on
one partition at a time. You can even add the index to a single
partition first to see how well it works -- for example, does it turn
too many HOT updates into non-HOT updates, causing bloat? -- before
you go do it across the board. And while an unpartitioned table can
only accommodate one VACUUM process at a time, a partitioned table
can have one per partition.

Partitioning a table also means that you don't need a single disk
volume large enough to hold the whole thing; instead, you can put
each partition in a separate tablespace or (in some exciting future
world where PostgreSQL looks more like a distributed system) perhaps
even on another server. For a table that is a few tens of gigabytes,
none of this amounts to a hill of beans, but for a table that is a
few tens of terabytes, the time it takes to perform administrative
operations can become a really big problem.
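To make that concrete, here is a sketch of what the DDL and the
maintenance workflow might look like. The hash partitioning syntax
is taken from the pending patches, and the table and tablespace
names are made up (the tablespaces would have to exist already), so
treat the details as illustrative rather than settled:

    -- A table with no natural range or list partitioning key,
    -- hash-partitioned four ways on its id column.
    CREATE TABLE events (
        event_id   bigint NOT NULL,
        payload    text
    ) PARTITION BY HASH (event_id);

    -- Each partition can live in its own tablespace.
    CREATE TABLE events_p0 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 0) TABLESPACE ts0;
    CREATE TABLE events_p1 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 1) TABLESPACE ts1;
    CREATE TABLE events_p2 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 2) TABLESPACE ts2;
    CREATE TABLE events_p3 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 3) TABLESPACE ts3;

    -- Maintenance can now be done one partition per window:
    VACUUM FULL events_p0;       -- week 1
    -- VACUUM FULL events_p1;    -- week 2, and so on

    -- Try an index on a single partition before committing to
    -- building it everywhere:
    CREATE INDEX ON events_p0 (payload);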
A good example of such a table is, say, a table full of orders.
Imagine a high-velocity business like Amazon or UPS that has a
constant stream of new orders, or a mobile app that has a constant
stream of new user profiles. If that data grows fast enough that
the table needs to be partitioned, how should it be done? It's
often desirable to create partitions of about equal size and about
equal hotness, and range-partitioning on the creation date or order
ID number means continually creating new partitions and having all
of the hot data concentrated in the same partition. Hash
partitioning avoids both problems.

In my experience, it is *definitely* the case that users with very
large tables are experiencing significant pain right now. The
freeze map changes were a good step towards ameliorating some of
that pain, but I think hash partitioning is another step in that
direction, and I think there will be other applications as well.

Even for users who don't have data that large, there are also
interesting query-performance implications of hash partitioning.
Joins between very large tables tend to get implemented as merge
joins, which often means sorting all the data, which is O(n lg n)
and not easy to parallelize. But if those very large tables happen
to be compatibly partitioned on the join key, you could do a
partitionwise join per the patch Ashutosh proposed, which would be
cheaper and easier to do in parallel (there's a sketch of this in
the postscript below).

Maybe a shorter argument for hash partitioning is that not one but
two different people proposed patches for it within months of the
initial partitioning patch going in. When multiple people are
thinking about implementing the same feature almost immediately
after the prerequisite patches land, that's a good clue that it's a
desirable feature. So I think we should try to solve the problems,
rather than giving up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
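P.S. Here is the partitionwise-join sketch promised above. The GUC
name and the hash partitioning syntax are taken from the pending
patches as I understand them, and the table names are invented, so
the specifics are illustrative only:

    -- Two big tables, compatibly hash-partitioned on the join key.
    CREATE TABLE orders (
        order_id    bigint NOT NULL,
        placed_at   timestamptz
    ) PARTITION BY HASH (order_id);
    CREATE TABLE orders_p0 PARTITION OF orders
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE orders_p1 PARTITION OF orders
        FOR VALUES WITH (MODULUS 2, REMAINDER 1);

    CREATE TABLE order_lines (
        order_id    bigint NOT NULL,
        line_no     int,
        item        text
    ) PARTITION BY HASH (order_id);
    CREATE TABLE order_lines_p0 PARTITION OF order_lines
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE order_lines_p1 PARTITION OF order_lines
        FOR VALUES WITH (MODULUS 2, REMAINDER 1);

    -- With partitionwise join enabled, the planner can join
    -- orders_p0 to order_lines_p0 and orders_p1 to order_lines_p1
    -- independently: smaller sorts, and the per-partition joins
    -- are natural units of parallelism.
    SET enable_partitionwise_join = on;
    EXPLAIN SELECT o.order_id, l.item
    FROM orders o JOIN order_lines l USING (order_id);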