On Tue, May 16, 2017 at 4:25 PM, Jeff Davis <pg...@j-davis.com> wrote:
> Why can't hash partitions be stored in tables the same way as we do TOAST?
> That should take care of the naming problem.

Hmm, yeah, something like that could be done, but every place where you are currently allowed to refer to a partition by name would have to be changed to accept some other syntax for specifying the partition. Even with all the things you propose to disallow, things like CLUSTER, VACUUM, ANALYZE, etc. would still have to be accepted on individual partitions. I suspect even CREATE INDEX and DROP INDEX would need to be accepted on individual partitions, because if an index on one partition somehow becomes bloated while the corresponding indexes on other partitions are OK, you'll want to create a replacement index concurrently and drop the old one. Of course, for similar reasons, you'd need some way for \d on the parent to display information on indexes on all the children, and all of that output would have to be frobbed to use whatever syntax is now required, in lieu of a name, to refer to an individual partition. Error messages would have to be adjusted in probably quite a few places to use the new notation, too. And on and on. It's not impossible to do, but we could end up chasing down loose ends for a very long time.
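To make that concrete, here is the kind of per-partition index swap I have in mind, assuming partitions keep ordinary names; the table and index names below are made up purely for illustration:

    -- Rebuild a bloated index on one partition without touching the
    -- corresponding indexes on the other partitions, and without
    -- blocking writes to that partition:
    CREATE INDEX CONCURRENTLY orders_p3_customer_id_idx2
        ON orders_p3 (customer_id);
    DROP INDEX CONCURRENTLY orders_p3_customer_id_idx;

Every step of that cycle, plus the \d output that tells you the index is bloated in the first place, would need some replacement spelling if the partition can't be referred to directly.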
Beyond that, I think it's a bad idea to make hash partitions behave completely differently from list and range partitions. That's a lot of extra code to maintain, and a lot of extra notional complexity for users, for really not a lot of benefit.

I think we're taking a significant but not overwhelming problem -- our current hash functions aren't portable -- and, through a chain of logical links, eventually arriving at the conclusion that the design of partitioning needs to be totally overhauled. I want to resist that conclusion. I'm not saying that the problem isn't a problem, or that there's not some logic to each step in the chain, but it's not that hard to blow a small problem up into a huge one by assuming the worst possible consequences or the tightest possible requirements at each step. http://tvtropes.org/pmwiki/pmwiki.php/Main/ForWantOfANail is not an argument for stricter regulation of the blacksmith industry.

>> If Java has portable hash functions, why can't we?
>
> Java standardizes on a particular unicode encoding (utf-16). Are you
> suggesting that we do the same? Or is there another solution that I am
> missing?

Well, I've already said several times (and Peter Eisentraut has agreed) that we don't really need the hash functions to be portable across encodings. I think there are at least three good arguments for that position.

First, as Peter Geoghegan points out, the world is increasingly standardizing on Unicode, and that trend seems likely to continue. I strongly suspect that UTF-8 is the most common database encoding by a wide margin. There may occasionally be reasons to avoid it if, for example, you're using one of the Eastern languages that doesn't play entirely nicely with UTF-8, or if you happen to be storing a large number of characters that can be represented in a single byte in some other encoding but which require multiple bytes in UTF-8, but for an awful lot of people UTF-8 just works and there's no need to think any further. So, a lot of people will never hit the problem of needing to migrate a database between encodings, because they'll just use UTF-8.
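Just to spell out what hitting that problem means: the same character is stored as different bytes under different encodings, so a hash computed over the stored bytes changes when the database encoding changes, and rows get routed to different partitions. A quick illustration -- the byte values are the standard ones, the query is just a sketch:

    -- 'é' is one byte in LATIN1 but two bytes in UTF-8, so a hash function
    -- that works on the stored bytes sees different input in each case:
    SELECT convert_to('é', 'LATIN1');   -- \xe9
    SELECT convert_to('é', 'UTF8');     -- \xc3a9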
Second, if the previous argument turns out to be wrong and the world abandons UTF-8 in favor of some new and better system (UTF-9?), or if users frequently want to make some other encoding conversion like Tom's original example of LATIN1 -> UTF-8, we've already got a proposed workaround for that case which seems like it will work just fine. Just dump your data with pg_dump --insert-hash-partitioned-data-into-parent and reload on the new system. This isn't absolutely guaranteed to work if you've done something silly that will make the load of a particular row work on one partition and fail on some other one, but you probably won't do that because it would be dumb. Also, it will be a bit slower than a regular dump-and-reload cycle because tuple routing isn't free. Neither of these problems really sounds very bad.

If we're going to start fixing things that could cause database migrations/upgrades to occasionally fail in corner cases or run more slowly than expected, there's a long list of things that you can do in your DDL that will make pg_upgrade bomb out, and many of them are things that bite users with some regularity (e.g. tablespaces inside the data directory or inside other tablespaces, dependencies on system objects that are changed in the new version, ALTER USER .. SET ROLE). For whatever reason, we haven't viewed those warts as really high-priority items in need of fixing; in some cases, instead of actually trying to improve usability, we've all but mocked the people reporting those issues for having the temerity to configure the system in a way that we think isn't very smart. How can we then turn around and say "it's absolutely unacceptable for there to be any case where dump and reload fails even if there's a workaround switch for pg_dump that will almost always solve the problem"? That's holding hash partitioning to a stricter standard than our existing features, which seems unjustified.
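For clarity, here's roughly what that switch would change in the dump output, as I understand the proposal; the table, column, and partition names are invented:

    -- A plain dump restores rows straight into the leaf partitions they
    -- were dumped from, which only works if the old and new hash
    -- assignments agree:
    COPY orders_p3 (order_id, customer_id) FROM stdin;

    -- With the proposed switch, the data would instead be loaded through
    -- the parent, so tuple routing re-hashes each row under the
    -- destination encoding and sends it to whichever partition it now
    -- belongs in:
    INSERT INTO orders (order_id, customer_id) VALUES (42, 7);

The second form is what makes an encoding change survivable: the partition each row ends up in is recomputed at load time rather than baked into the dump.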
Third, the fact that we support multiple encodings ought to be a strong point of PostgreSQL, not an excuse for running away from features. This same issue came up with JSON support, and the question was, well, what do we do about the places where JSON assumes that everything is UTF-8, like in \uXXXX escape sequences, given that we support multiple encodings? We could have answered that question by giving up and saying that JSON support is just too hard in our multi-encoding environment, but we found some solution that let us move forward. I think the result is that JSON is not really quite per-spec unless you are using UTF-8 as your encoding, but I am confident that it's better to have made that trade-off than to just not support JSON at all.

Similarly, I think that if we use the fact that we support multiple encodings either as an excuse for why we shouldn't have hash partitioning at all or for why the design needs to be vastly more complex than would otherwise be required, we've made a liability of what should be an asset. Tom's willing to make those arguments, but he doesn't think hash partitioning is worthwhile in the first place. If we believe it is worthwhile (and I do), then we should be looking for a set of design decisions that make implementing it reasonably practical, maybe with some trade-offs, rather than a set of decisions that increase the complexity to the point where it will take us 5-10 years to unravel all the problems.

Also, it's worth noting that our decision here doesn't need to be monolithic. If we want, we can ship one opfamily that hashes strings in the obvious way - i.e. fast but not portable across encodings - and another that converts the input string to UTF-8 and then hashes the result. The latter will be somewhat slow, but it will be portable across encodings, and I don't have any problem at all with providing it if somebody wants to do the legwork. It'll also fail for any encoding that can't be converted to UTF-8, but it seems a bit much to insist that hash partitioning must solve a problem that the Unicode consortium hasn't managed to overcome. Anyway, if we do something like that, then users can choose whether they want the portability for which Tom advocates or the speed which I believe most users will prefer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company