On Mon, May 14, 2018 at 7:20 AM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> Summarising, I think, that introducing some permanent connections to
> postgres_fdw node will put too much burden on this patch set and that it will
> be possible to address that later (in a long run such connection will be anyway
> needed at least for a deadlock detection). However, if you think that current
> behavior + STO analog isn't good enough, then I'm ready to pursue that track.

I don't think I'd be willing to commit to a particular approach at this point. I think the STO analog is an interesting idea and worth more investigation, and I think the idea of a permanent connection with chatter that can be used to resolve deadlocks, coordinate shared state, etc. is also an interesting idea. But there are probably lots of ideas that somebody could come up with in this area that would sound interesting but ultimately not work out, and an awful lot depends on quality of implementation. If you come up with an implementation of a permanent connection for coordination "chatter" and the patch gets rejected, it's almost certainly not a sign that we don't want that thing in general. It means we don't want yours. :-)

Actually, I think if we're going to pursue that approach, we ought to back off a bit from thinking about global snapshots and think about what kind of general mechanism we want. For example, maybe you can imagine it like a message bus, where there are a bunch of named channels on which the server publishes messages and you can listen to the ones you care about. There could, for example, be a channel that publishes the new system-wide globalxmin every time it changes, and another channel that publishes the wait graph every time the deadlock detector runs, and so on. In fact, perhaps we should consider implementing it using the existing LISTEN/NOTIFY framework: have a bunch of channels that are predefined by PostgreSQL itself, and set things up so that the server automatically begins publishing to those channels as soon as anybody starts listening to them. I have to imagine that if we had a good mechanism for this, we'd get all sorts of proposals for things to publish. As long as they don't impose overhead when nobody's listening, we should be able to be fairly accommodating of such requests.
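To make that concrete, here's roughly what the consumer side could look like with the libpq API we already have today. To be clear, the pg_global_xmin channel and its payload are invented for illustration; only the LISTEN/NOTIFY plumbing is real:

/*
 * Sketch of a client consuming a hypothetical predefined channel.  The
 * channel name "pg_global_xmin" and its payload are invented; the
 * LISTEN/NOTIFY machinery below is stock libpq.
 */
#include <stdio.h>
#include <sys/select.h>
#include "libpq-fe.h"

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* In this scheme, subscribing is what makes the server start publishing. */
    PQclear(PQexec(conn, "LISTEN pg_global_xmin"));

    for (;;)
    {
        PGnotify   *note;
        int         sock = PQsocket(conn);
        fd_set      mask;

        FD_ZERO(&mask);
        FD_SET(sock, &mask);
        if (select(sock + 1, &mask, NULL, NULL, NULL) < 0)
            break;

        PQconsumeInput(conn);
        while ((note = PQnotifies(conn)) != NULL)
        {
            /* Imagined payload: the new system-wide globalxmin, as text. */
            printf("globalxmin is now %s\n", note->extra);
            PQfreemem(note);
        }
    }

    PQfinish(conn);
    return 0;
}

The client side is the easy part, of course; the design work is all in deciding what the server publishes, and when.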
Or maybe that model is too limiting, either because we don't want to broadcast to everyone but rather send specific messages to specific connections, or else because we need a request-and-response mechanism rather than what is in some sense a one-way communication channel. Regardless, we should start by coming up with the right model for the protocol first, bearing in mind how it's going to be used and the other things for which somebody might want to use it (deadlock detection, failover, leader election), and then implement whatever we need for global snapshots on top of it. I don't think that writing the code here is going to be hugely difficult, but coming up with a good design is going to require some thought and discussion.

And, for that matter, I think the same thing is true for global snapshots. The coding is a lot harder for that than it is for some new subprotocol, I'd imagine, but it's still easier than coming up with a good design.

As far as I can see, and everybody can decide for themselves how far they think that is, the proposal you're making now sounds like a significant improvement over the XTM proposal. In particular, the provisioning and deprovisioning issues sound like they have been thought through a lot more. I'm happy to call that progress. At the same time, progress on a journey is not synonymous with arrival at the destination, and I guess it seems to me that you have some further research to do along the lines you've described:

1. Can we hold back xmin only when necessary, and only to the extent necessary, instead of all the time?

2. Can we use something like an STO analog, maybe as an optional feature, rather than actually holding back xmin?

And I'd add:

3. Is there another approach altogether that doesn't rely on holding back xmin at all? For example, if you constructed the happens-after graph between transactions in shared memory, including actions on all nodes, and looked for cycles, you could abort transactions that would complete a cycle. (We say A happens-after B if A reads or writes data previously written by B.) If no cycle exists, then all is well. I'm pretty sure it's been well-established that a naive implementation of this algorithm performs terribly, but SSI, for example, works on this principle: it reduces the bookkeeping involved by being willing to abort transactions that aren't really creating a cycle if they look like they *might* create one. Now, that's an implementation *on top of* snapshots for the purpose of getting true serializability rather than a way of getting global snapshots per se, so it's not suitable for what you're trying to do here, but I think it shows that algorithms based on cycle detection can be made practical in some cases, and so maybe this is another such case. On the other hand, this whole line of thinking could also be a dead end...
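Just to illustrate the shape of the idea -- and only the shape, since nothing here resembles the bookkeeping a real implementation like SSI has to do -- the naive cycle check is a depth-first search for a back edge in the happens-after graph:

/*
 * Toy illustration only: find a cycle in a happens-after graph with a
 * depth-first search.  Nodes are transactions; edge[a][b] means A
 * happens-after B, i.e. A read or wrote data previously written by B.
 * A real implementation has to maintain these edges cheaply, across
 * nodes, and with bounded memory, which is where all the difficulty is.
 */
#include <stdbool.h>
#include <stdio.h>

#define NXACTS 4

static bool edge[NXACTS][NXACTS];
static int  color[NXACTS];      /* 0 = unvisited, 1 = on path, 2 = done */

static bool
has_cycle(int a)
{
    color[a] = 1;
    for (int b = 0; b < NXACTS; b++)
    {
        if (!edge[a][b])
            continue;
        if (color[b] == 1)      /* back edge: a cycle would be completed */
            return true;
        if (color[b] == 0 && has_cycle(b))
            return true;
    }
    color[a] = 2;
    return false;
}

int
main(void)
{
    /* T0 happens-after T1, T1 happens-after T2, T2 happens-after T0:
     * no serial order exists, so something has to abort. */
    edge[0][1] = edge[1][2] = edge[2][0] = true;

    for (int a = 0; a < NXACTS; a++)
        if (color[a] == 0 && has_cycle(a))
            printf("cycle through transaction %d: abort something\n", a);
    return 0;
}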
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company