Hi,

> > As far as I can think of, it should probably be a single background task
> > checking whether the server is down. If so, sending an invalidation message
> > to all the backends such that related backends could act on the
> > invalidation and throw an error. This is to cover the use-case you
> > described on [1].
>
> Indeed your approach covers the use case I said, but I'm not sure whether
> it is really good.
> In your approach, once the background worker process will manage all
> foreign servers.
> It may be OK if there are a few servers, but if there are hundreds of
> servers,
> the time interval during checks will be longer.
>
I expect users will typically have far more backends than servers. We could
add a threshold for spinning up a new bg worker (e.g., one additional bg
worker for every 10 servers). Still, I think that would be an optimization
that the majority of users probably do not need.

> Currently, each FDW can decide whether we do health checks or not per the
> backend.
> For example, we can skip health checks if the foreign server is not used
> now.
> The background worker cannot control such a way.
> Based on the above, I do not agree that we introduce a new background
> worker and make it to do a health check.
>

Again, my definition of "health check" is probably different. I'd expect the
health check to happen continuously, ideally keeping track of how many
consecutive times it succeeded, when it last failed or succeeded, and so on.

A transaction failing with a bad error message (or holding some resources
locally until the transaction is committed) does not sound like an essential
problem to solve. Is there a specific workload you have in mind that would
benefit from rolling back a transaction earlier when a remote server dies?
Maybe there is, but it is not clear to me and I haven't seen it discussed on
the thread (sorry if I missed it). I'm trying to understand whether we are
trying to solve a problem that does not really exist.

I'm bringing this up because I often deal with architectures where there is
a local node and remote transactions on different Postgres servers, and I
have not encountered many (or any) patterns that would benefit much from
this change. In fact, I think that, on the contrary, this might add
significant overhead for OLTP-type systems with high query throughput.

> Moreover, methods to connect to foreign servers and check health are
> different per FDW.
> In terms of mysql_fdw [1], we must do mysql_init() and
> mysql_real_connect().
> About file_fdw, we do not have to connect, but developers may want to
> calculate checksum and compare.
> Therefore, we must provide callback functions anyway.
>

I think providing callback functions is useful in any case. Each FDW (or,
more generally, each extension) should be able to provide its own "health
check" function.

Thanks,
Onder KALACI
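
P.S. To make the "health check" notion above more concrete, here is a rough
sketch of a per-server record that calls an FDW-supplied callback and tracks
consecutive successes/failures and the last success/failure times. This is
only an illustration, not a proposed API: every name in it
(HealthCheckCallback, ServerHealth, run_health_check, flag_health_check) is
hypothetical and does not exist in PostgreSQL or postgres_fdw.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-FDW callback: returns true if the server looks healthy. */
typedef bool (*HealthCheckCallback) (void *server_state);

/* Hypothetical bookkeeping kept for each foreign server. */
typedef struct ServerHealth
{
	HealthCheckCallback check;	/* supplied by the FDW or extension */
	void	   *server_state;	/* opaque, FDW-specific data */
	unsigned	consecutive_successes;
	unsigned	consecutive_failures;
	time_t		last_success;	/* 0 if the check never succeeded */
	time_t		last_failure;	/* 0 if the check never failed */
} ServerHealth;

/* Run one check, update the counters, and return the check's result. */
bool
run_health_check(ServerHealth *h, time_t now)
{
	bool		ok = h->check(h->server_state);

	if (ok)
	{
		h->consecutive_successes++;
		h->consecutive_failures = 0;
		h->last_success = now;
	}
	else
	{
		h->consecutive_failures++;
		h->consecutive_successes = 0;
		h->last_failure = now;
	}
	return ok;
}

/* Toy callback for illustration: healthy while the pointed-to flag is true. */
bool
flag_health_check(void *server_state)
{
	return *(bool *) server_state;
}
```

A real callback would do whatever is appropriate for the FDW (for example,
connecting to the remote server, or comparing a checksum for a file-based
FDW); the bookkeeping around it could stay the same.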