On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>> The question I have is how we would deal with a foreign server that is
>> not available for a longer duration due to a crash, a longer network
>> outage, etc. For example, the foreign server crashed or got disconnected
>> after PREPARE but before COMMIT/ROLLBACK was issued. The backend will
>> remain blocked for a much longer duration without the user having any
>> idea of what's going on. Maybe we should add some timeout.
>
> After more thought, I agree with adding some timeout. I can imagine
> there are users who want the timeout, for example, users who cannot
> accept even a few seconds of latency. If the timeout occurs, the backend
> unlocks the foreign transactions and breaks the loop. The resolver
> process will continue to resolve foreign transactions at a certain
> interval.
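To make the trade-off around the proposed timeout concrete, here is a minimal toy simulation in Python. It is not PostgreSQL code: ForeignServer, resolve_once and commit_and_wait are hypothetical names, "ticks" stand in for wall-clock time, and the model assumes PREPARE already succeeded so that only the final COMMIT PREPARED is delayed by an outage. It shows that without a timeout the backend returns only once the foreign commit is visible, while with a timeout it can return while its own write is still invisible on the foreign server.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ForeignServer:
    """Toy model of one foreign server taking part in two-phase commit."""
    prepared: set = field(default_factory=set)
    committed: set = field(default_factory=set)
    available: bool = True

    def read_visible(self, xid: str) -> bool:
        # A later read sees the write only after COMMIT PREPARED has run.
        return xid in self.committed


def resolve_once(server: ForeignServer, xid: str) -> bool:
    """One resolution attempt; True once the foreign commit is done."""
    if server.available and xid in server.prepared:
        server.prepared.discard(xid)
        server.committed.add(xid)
    return xid in server.committed


def commit_and_wait(server: ForeignServer, xid: str,
                    timeout_ticks: Optional[int], outage_ticks: int) -> bool:
    """Backend side of COMMIT after the local commit (hypothetical model).

    PREPARE is assumed to have succeeded; the server is then unreachable
    for outage_ticks ticks. Returns True if the backend waited until the
    foreign commit completed, False if it gave up because of the timeout.
    With timeout_ticks=None and a server that never comes back, this
    loops forever: the blocking behaviour under discussion.
    """
    server.prepared.add(xid)
    tick = 0
    while True:
        server.available = tick >= outage_ticks
        if resolve_once(server, xid):
            return True
        if timeout_ticks is not None and tick >= timeout_ticks:
            # The backend returns to the client, but the foreign part is
            # still in doubt and is left to the background resolver.
            return False
        tick += 1


if __name__ == "__main__":
    blocking = ForeignServer()
    print(commit_and_wait(blocking, "t1", timeout_ticks=None, outage_ticks=10),
          blocking.read_visible("t1"))  # True True

    with_timeout = ForeignServer()
    print(commit_and_wait(with_timeout, "t2", timeout_ticks=3, outage_ticks=10),
          with_timeout.read_visible("t2"))  # False False
    # Later, once the server is reachable again, the resolver finishes the job.
    with_timeout.available = True
    resolve_once(with_timeout, "t2")
    print(with_timeout.read_visible("t2"))  # True
```

In the timeout case the backend has already returned while a read of its own write on that server still comes back empty; the write becomes visible only after the resolver succeeds later, and the session has no direct way to tell which of the two cases it hit.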
I don't think a timeout is a very good idea. There is no timeout for
synchronous replication, and the issues here are similar. I will not
try to block a patch adding a timeout, but I think it had better be
disabled by default and have very clear documentation explaining why
it's really dangerous. And this is why: with no timeout, you can count
on being able to see the effects of your own previous transactions,
unless at some point you sent a query cancel or got disconnected. With
a timeout, you may or may not see the effects of your own previous
transactions depending on whether or not you hit the timeout, which you
have no sure way of knowing.

>>> transactions after the coordinator server recovered. On the other
>>> hand, to get consistent results from subsequent reads in such a
>>> situation, we could, for example, disallow backends from issuing SQL
>>> to the foreign server while a foreign transaction on that server
>>> remains unresolved.
>>
>> +1 for the last sentence. If we do that, we don't need the backend to
>> be blocked by the resolver, since a subsequent read accessing that
>> foreign server would get an error and not inconsistent data.
>
> Yeah, however, the disadvantage of this is that we manage foreign
> transactions per foreign server. If a transaction that modified even
> one table remains as an in-doubt transaction, we cannot issue any
> SQL that touches that foreign server. Can we raise an error in
> ExecInitForeignScan()?

I really feel strongly that we shouldn't complicate the initial patch
with this kind of thing. Let's make it enough for this patch to
guarantee that either all parts of the transaction commit eventually
or they all abort eventually. Ensuring consistent visibility is a
different and hard project, and if we try to do that now, this patch
is not going to be done any time soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company