On Sun, Nov 8, 2015 at 6:35 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> Sure. Now imagine that the pg_twophase entry is corrupted for this
> transaction on one node. This would trigger a PANIC on it, and
> transaction would not be committed everywhere.

If the database is corrupted, there's no way to guarantee that anything works as planned. This is like criticizing somebody's disaster recovery plan on the basis that it will be inadequate if the entire planet earth is destroyed.

> I am aware of the fact
> that by definition PREPARE TRANSACTION ensures that a transaction will
> be committed with COMMIT PREPARED, just trying to see any corner cases
> with the approach proposed. The DTM approach is actually rather close
> to what a GTM in Postgres-XC does :)

Yes. I think that we should try to learn as much as possible from the XC experience, but that doesn't mean we should incorporate XC's fuzzy thinking about 2PC into PG. We should not.

One point I'd like to mention is that it's absolutely critical to design this in a way that minimizes network roundtrips without compromising correctness. XC's GTM proxy suggests that they failed to do that. I think we really need to look at what's going to be on the other sides of the proposed APIs and think about whether it's going to be possible to have a strong local caching layer that keeps network roundtrips to a minimum. We should consider whether the need for such a caching layer has any impact on what the APIs should look like.

For example, consider a 10-node cluster where each node has 32 cores and 32 clients, and each client is running lots of short-running SQL statements. The demand for snapshots will be intense. If every backend separately requests a snapshot for every SQL statement from the coordinator, that's probably going to be terrible. We can make it the problem of the stuff behind the DTM API to figure out a way to avoid that, but maybe that's going to result in every DTM needing to solve the same problems.
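To make the caching concern concrete, here's a rough sketch in C of what a process-local snapshot cache sitting below a DTM-style API might look like. Every name in it (DtmSnapshot, DtmGetSnapshot, the coordinator stub, the staleness window) is invented purely for illustration; this is not a proposal, and the staleness-window policy itself is one of the correctness questions a real design would have to answer:

/*
 * Rough sketch (not a proposal): a process-local snapshot cache that
 * might sit below a hypothetical DtmGetSnapshot() call.  All names and
 * the staleness-window policy are invented for illustration.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef struct DtmSnapshot
{
    uint64_t    csn;            /* hypothetical global commit sequence number */
    struct timespec taken_at;   /* when it was fetched from the coordinator */
} DtmSnapshot;

/* How long a cached snapshot may be reused, in microseconds. */
#define DTM_SNAPSHOT_MAX_AGE_USEC   500

static DtmSnapshot cached_snapshot;
static bool cache_valid = false;

/*
 * Stand-in for the expensive network round-trip to the coordinator.
 * In a real system this would be an RPC; here it just bumps a counter.
 */
static DtmSnapshot
dtm_fetch_snapshot_from_coordinator(void)
{
    static uint64_t fake_csn = 0;
    DtmSnapshot snap;

    snap.csn = ++fake_csn;
    clock_gettime(CLOCK_MONOTONIC, &snap.taken_at);
    return snap;
}

static int64_t
usec_since(const struct timespec *then)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (int64_t) (now.tv_sec - then->tv_sec) * 1000000 +
        (now.tv_nsec - then->tv_nsec) / 1000;
}

/*
 * The call a backend would make once per SQL statement.  Repeated calls
 * within the staleness window are served from the process-local cache,
 * so a node full of backends running short statements doesn't turn into
 * one coordinator round-trip per statement.
 */
DtmSnapshot
DtmGetSnapshot(void)
{
    if (!cache_valid ||
        usec_since(&cached_snapshot.taken_at) > DTM_SNAPSHOT_MAX_AGE_USEC)
    {
        cached_snapshot = dtm_fetch_snapshot_from_coordinator();
        cache_valid = true;
    }
    return cached_snapshot;
}

int
main(void)
{
    for (int i = 0; i < 5; i++)
        printf("statement %d sees csn %llu\n", i,
               (unsigned long long) DtmGetSnapshot().csn);
    return 0;
}

Of course, reusing a snapshot within a staleness window is itself a correctness trade-off (at the very least a backend would have to invalidate the cache when it commits locally), which is exactly the sort of question that makes me wonder whether the caching layer needs to influence the shape of the API rather than being hidden entirely behind it.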
Another thing that I think we need to consider is fault-tolerance. For example, suppose that transaction commit and snapshot coordination services are being provided by a central server which keeps track of the global commit ordering. When that server gets hit by a freak bolt of lightning and melted into a heap of slag, somebody else needs to take over. Again, this would live below the API proposed here, but I think it really deserves some thought before we get too far down the path. XC didn't start thinking about how to add fault-tolerance until quite late in the project, I think, and the same could be said of PostgreSQL itself: some newer systems have easier-to-use fault-tolerance mechanisms because fault tolerance was designed in from the beginning. Distributed systems by nature need high availability to a far greater degree than single systems, because when there are more nodes, node failures are more frequent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company