On Tue, Apr 5, 2016 at 2:52 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> Now, let's suppose that the user sets up a sharded table and then
>> says: SELECT a, SUM(b), AVG(c) FROM sometab. At this point, what we'd
>> like to have happen is that for each child foreign table, we go and
>> fetch partially aggregated results. Those children might be running
>> any version of PostgreSQL - I was not assuming that we'd insist on
>> matching major versions, although of course that could be done - and
>> there would probably need to be a minimum version of PostgreSQL
>> anyway. They could even be running some other database. As long as
>> they can spit out partial aggregates in a format that we can
>> understand, we can deserialize those aggregates and run combine
>> functions on them. But if the remote side is, say, MariaDB, it's
>> probably much easier to get it to spit out something that looks like a
>> PostgreSQL array than it is to make it spit out some bytea blob that's
>> in an entirely PostgreSQL-specific format.
>
> Basing parts of the Postgres sharding mechanism on FDWs sounds
> acceptable. Trying to design things so that *any* FDW can be part of a
> shard, so that you have some shards in Postgres and other shards in
> MariaDB, seems ludicrous to me. Down that path lies madness.
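
To make the partial-aggregate flow in the quoted text a bit more concrete, here is a minimal Python sketch of the idea: each shard computes a small per-group state, the frontend combines the states, and only then produces SUM(b) and AVG(c). This is only an illustration under stated assumptions. The function names (partial_aggregate, combine, finalize) and the (sum_b, sum_c, count_c) tuple layout are hypothetical and are not PostgreSQL APIs; the sketch assumes the query groups by a, and it ignores NULL handling.

```python
def partial_aggregate(rows):
    """Run on one shard. rows are (a, b, c) tuples.

    Returns {a: (sum_b, sum_c, count_c)}: a partial state that is simple
    enough to ship as a plain array-like value (hypothetical layout).
    """
    state = {}
    for a, b, c in rows:
        sum_b, sum_c, count_c = state.get(a, (0, 0, 0))
        state[a] = (sum_b + b, sum_c + c, count_c + 1)
    return state


def combine(left, right):
    """Merge two partial states, group by group."""
    merged = dict(left)
    for a, (sum_b, sum_c, count_c) in right.items():
        l_sum_b, l_sum_c, l_count_c = merged.get(a, (0, 0, 0))
        merged[a] = (l_sum_b + sum_b, l_sum_c + sum_c, l_count_c + count_c)
    return merged


def finalize(state):
    """Turn the combined state into (SUM(b), AVG(c)) per group."""
    return {a: (sum_b, sum_c / count_c)
            for a, (sum_b, sum_c, count_c) in state.items()}


# Two hypothetical shards holding rows of sometab as (a, b, c).
shard1 = [("x", 1, 10.0), ("y", 2, 20.0)]
shard2 = [("x", 3, 30.0)]

result = finalize(combine(partial_aggregate(shard1),
                          partial_aggregate(shard2)))
```

Note that AVG(c) cannot be combined from per-shard averages alone; the state has to carry the sum and the count separately, which is why the format of the partial state matters to whatever sits on the remote side.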

I'm doubtful that anyone wants to do the work to make that happen, but
I don't agree that we shouldn't care about whether it's possible.
Extensibility is a feature of the system that we've worked hard for,
and we shouldn't casually surrender it. For example, postgres_fdw now
implements join pushdown, and I suspect few other FDW authors will
care to do the work to add similar support to their implementations.
But some may, and it's good that the code is structured in such a way
that they have the option.

Actually, though, MariaDB is a bad example. What somebody's much more
likely to want to do is have PostgreSQL as a frontend accessing data
that's actually stored in Hadoop. There are lots of SQL interfaces to
Hadoop already, so it's clearly a thing people want, and our SQL is
the best SQL (right?) so if you could put that on top of Hadoop
somebody'd probably like it. I'm not planning to try it myself, but I
think it would be cool if somebody else did. I have been very pleased
to see that many of the bits and pieces of infrastructure that I
created for parallel query (parallel background workers, DSM, shm_mq)
have attracted quite a bit of interest from other developers for
totally unrelated purposes, and I think anything we do around
horizontal scalability should be designed the same way: the goal
should be to work with PostgreSQL on the other side, but the bits that
can be made reusable for other purposes should be so constructed.

> In fact, trying to ensure cross-major-version compatibility already
> sounds like asking for too much. Requiring matching major versions
> sounds a perfectly acceptable restriction to me.

I disagree. One of the motivations (not the only one, by any means)
for wanting logical replication in PostgreSQL is precisely to get
around the fact that physical replication requires matching major
versions.
That restriction turns out to be fairly onerous, not least
because when you've got a cluster of several machines you'd rather
upgrade them one at a time than all at once. That's going to be even
more true with a sharded cluster, which will probably tend to involve
more machines than a replication cluster, maybe a lot more. If you say
that the user has got to shut the entire thing down, upgrade all the
machines, and turn it all back on again, and just hope it works,
that's going to be really painful.

I think that we should treat this more like we do with libpq, where
each major release can add new capabilities that new applications can
use, but the old stuff has got to keep working essentially forever.
Maybe the requirements here are not quite so tight, because it's
probably reasonable to say, e.g., that you must upgrade every machine
in the cluster to at least release 11.1 before upgrading any machine
to 11.3 or higher, but the fewer such requirements we have, the
better. Getting people to upgrade to new major releases is already
fairly hard, and anything that makes it harder is an unwelcome
development from my point of view.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)