Hi Tom,

Thanks for the feedback. I should clarify the use case: we're not mixing read-write and read-only hosts under one DNS name by accident. This is intentional, for HA failover.

We run PostgreSQL clusters with streaming replication. After a failover, the old primary becomes a standby and vice versa. The challenge is: how do clients find the new primary?

Current options:
1. Update DNS on every failover: operationally complex, TTL delays, requires automation.
2. Consul/etcd: adds operational complexity and another failure domain.
3. Multiple hosts in the connection string: requires application changes whenever the cluster topology changes (e.g., adding a new standby).

The proposed approach:

* One DNS name (db.internal) with an A record for each cluster member's IP
* Clients connect with host=db.internal target_session_attrs=read-write
* libpq tries each IP until it finds the primary

IIUC, this is how JDBC's targetServerType=primary works; it iterates through all resolved addresses. The "useless connection attempts" are actually the feature: libpq is probing to find the right server, the same as when you specify multiple hosts explicitly. The only difference from host=pg1,pg2,pg3 is that DNS provides the list instead of the connection string. From libpq's perspective, why should it matter where the address list came from?

________________________________
From: Tom Lane <[email protected]>
Sent: Thursday, March 5, 2026 2:55 PM
To: Evgeny Kuzin <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

Evgeny Kuzin <[email protected]> writes:
> We've been running into an issue with "target_session_attrs" when using
> dns-based service discovery. Currently, when libpq connects to a host with
> multiple A-records and the connection succeeds but is rejected due to
> target_session_attrs mismatch (e.g., connecting to a read-only server with
> target_session_attrs=read-write), it skips all remaining addresses for that
> hostname and moves directly to the next host in the connection string.
>
> Looking at git history, I found this was a deliberate choice by Robert Haas
> in commit 721f7bd3cbc (2016), where he noted "I changed Mithun's patch to
> skip all remaining IPs for a host if we reject a connection based on this new
> parameter."
> The original mailing list discussion is at [1], though I wasn't
> able to find a clear explanation of why this approach was preferred over
> trying all addresses.
>
> This makes it impractical to use a single multi-A-record DNS name pointing to
> all cluster members with target_session_attrs=read-write to find the primary
> - only the first responding IP is tried before giving up on that hostname.
>
> The attached patch changes the behavior to try all addresses for a hostname
> before moving to the next host, matching the existing behavior for connection
> failures. This would enable simpler DNS-based service discovery without
> requiring external tools like Consul or explicit multi-host connection
> strings.

TBH, I'd say that your DNS setup is broken and you should fix it. It makes no sense to have the same DNS entry pointing to both read-write and read-only hosts. The proposed patch will mainly result in useless connection attempts in more-sanely-constructed setups.

regards, tom lane
