Looks like this needed another rebase to account for the oauth commit. Rebase attached.
On Mon, Feb 24, 2025 at 9:38 AM Andrew Jackson <andrewjackson...@gmail.com> wrote:
> Hi,
>
> Thank you for the review!
>
> Review Response
>
> - Made a first pass at a real commit message
> - Fixed the condition on the if statement to use strcmp
> - Added a test suite in the files `src/interfaces/libpq/t/006_target_session_attr_dns.pl`
>   and `src/interfaces/libpq/t/007_load_balance_dns_check_all_addrs.pl`, which checks
>   target_session_attrs when used with and without load balancing.
>
> Regarding the name of the variable itself, I am definitely open to opinions
> on this. I didn't put much thought into it initially and just chose
> `check_all_addrs`. Since it modifies the behaviour of `target_session_attrs`,
> ideally the name should reference that, but that would make the variable name
> very long: something akin to `target_session_attrs_check_all_addrs`.
>
> Context
>
> I tested some drivers as well and found that pgx, psycopg, and rust-postgres
> all traverse every IP address when looking for a matching target_session_attrs.
> Asyncpg and psycopg2, on the other hand, follow libpq and terminate additional
> attempts after the first failure. Given this, there seems to be a decent amount
> of fragmentation in the ecosystem as to how exactly to implement this feature.
> I believe some drivers choose to traverse all addresses because their users
> target the same use case outlined above.
>
> Thanks again,
> Andrew Jackson
>
> On Sun, Feb 16, 2025 at 6:03 AM Andrey Borodin <x4...@yandex-team.ru> wrote:
>> Hi Andrew!
>>
>> cc Jelte, I suspect he might be interested.
>>
>> > On 20 Nov 2024, at 20:51, Andrew Jackson <andrewjackson...@gmail.com> wrote:
>> >
>> > Would appreciate any feedback on the applicability/relevancy of the
>> > goal here or the implementation.
>>
>> Thank you for raising the issue. Following our discussion in Discord I'm
>> putting my thoughts to the list.
>>
>>
>> Context
>>
>> A DNS record might return several IPs. Consider a connection string with
>> "host=A,B", where A resolves to 1.1.1.1,2.2.2.2 and B to 3.3.3.3,4.4.4.4.
>> If we connect with "target_session_attrs=read-write", IPs 1.1.1.1 and
>> 3.3.3.3 will be probed, but 2.2.2.2 and 4.4.4.4 won't be (if 1.1.1.1 and
>> 3.3.3.3 responded).
>>
>> If we enable libpq load balancing, some random two IPs will be probed.
>>
>> IMO it's a bug, at least when load balancing is enabled. Let's consider
>> whether we can change the default behavior here. I suspect we can't do it
>> for "load_balance_hosts=disable". And even for load balancing this might
>> be too unexpected a change for someone.
>>
>> From here on I consider the proposal not as a bug fix, but as a feature.
>>
>> In Discord we surveyed some other drivers.
>> pgx treats all IPs as different servers [1]. npgsql always goes through
>> all IPs one by one [2]. PGJDBC is still somewhat in a decision process [3]
>> (cc Dave and Vladimir, if they would like to provide some input).
>>
>>
>> Review
>>
>> The patch needs a rebase. It's trivial, so please find it attached. The
>> patch needs a real commit message; that's not trivial :)
>>
>> We definitely need to adjust tests [0]. We need to change
>> 004_load_balance_dns.pl so that it tests target_session_attrs too.
>>
>> Some documentation would be nice.
>>
>> I do not like how this check is performed:
>> + if (conn->check_all_addrs && conn->check_all_addrs[0] == '1')
>> Let's make it like load balancing is done [4].
>>
>> Finally, let's think about naming alternatives for "check_all_addrs".
>>
>> I think that's enough for a first round of review. If it's not a bug, but
>> a feature, it's a very narrow window to get into 18. But we might be
>> lucky...
>>
>> Thank you!
>>
>>
>> Best regards, Andrey Borodin.
>>
>> [0] https://github.com/postgres/postgres/commit/7f5b19817eaf38e70ad1153db4e644ee9456853e#diff-b05b74d2a97d7f74d4311ba1702d732f0df1b101c6ac99c146b51215174cf3ffR94
>> [1] https://github.com/jackc/pgx/blob/master/pgconn/pgconn.go#L177
>> [2] https://github.com/npgsql/npgsql/blob/7f1a59fa8dc1ccc34a70154f49a768e1abf826ba/src/Npgsql/Internal/NpgsqlConnector.cs#L986
>> [3] https://github.com/pgjdbc/pgjdbc/pull/3012#discussion_r1408069450
>> [4] https://github.com/postgres/postgres/commit/7f5b19817eaf38e70ad1153db4e644ee9456853e#diff-8d819454e061b9d4cdae9c8922ded05753a629d70f2ac1de1d4f6d5a4aeb7f68R1660
>>
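Regarding the quoted suggestion to handle this the way load balancing is
handled: for illustration only, here is a rough standalone sketch, not part
of the attached patch (fake_conn, check_all_addrs_flag and
parse_check_all_addrs are made-up names), of validating the option string
once into a boolean instead of comparing the raw string inside the
connection state machine.

/*
 * Standalone sketch only: validate a "check_all_addrs"-style option string
 * once, up front, into a boolean -- analogous to how load_balance_hosts is
 * validated into an internal value -- rather than comparing the raw string
 * at every state-machine step.  All names here are illustrative.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct fake_conn
{
    char   *check_all_addrs;        /* raw option string, may be NULL */
    bool    check_all_addrs_flag;   /* parsed form used by the state machine */
} fake_conn;

/* Returns false on an unrecognized value, mimicking option validation. */
static bool
parse_check_all_addrs(fake_conn *conn)
{
    if (conn->check_all_addrs == NULL ||
        strcmp(conn->check_all_addrs, "0") == 0)
        conn->check_all_addrs_flag = false;
    else if (strcmp(conn->check_all_addrs, "1") == 0)
        conn->check_all_addrs_flag = true;
    else
    {
        fprintf(stderr, "invalid check_all_addrs value: \"%s\"\n",
                conn->check_all_addrs);
        return false;
    }
    return true;
}

int
main(void)
{
    fake_conn   conn = {.check_all_addrs = "1"};

    if (!parse_check_all_addrs(&conn))
        return 1;

    /* Later, the attrs-mismatch path only consults the boolean. */
    printf("on target_session_attrs mismatch: %s\n",
           conn.check_all_addrs_flag ? "try next address" : "try next host");
    return 0;
}
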
From f2a3b1e51824da7b4a3eb096ed2215d64e617abd Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amboro...@acm.org>
Date: Fri, 14 Feb 2025 23:13:55 +0500
Subject: [PATCH v4] Add option to check all addrs for target_session_attrs.

Currently, when libpq searches a host's list of addresses for one with a
matching target_session_attrs, no further address is considered once it has
successfully connected to a server whose session attributes do not match the
requested target_session_attrs. This behaviour is inconvenient in
environments where the user wants a high-availability setup without having
to modify DNS records or a proxy server configuration.

This patch adds a client-side option called check_all_addrs. When set to 1,
it tells libpq to continue checking any remaining addresses even if there
was a target_session_attrs mismatch on one of them.

Author: Andrew Jackson
Reviewed-by: Andrey Borodin
Discussion: https://www.postgresql.org/message-id/flat/CAKK5BkESSc69sp2TiTWHvvOHCUey0rDWXSrR9pinyRqyfamUYg%40mail.gmail.com
---
 doc/src/sgml/libpq.sgml                            |  33 +++++
 src/interfaces/libpq/fe-connect.c                  |  25 ++--
 src/interfaces/libpq/libpq-int.h                   |   1 +
 .../libpq/t/006_target_session_attr_dns.pl         | 129 ++++++++++++++++++
 .../t/007_load_balance_dns_check_all_addrs.pl      | 128 +++++++++++++++++
 5 files changed, 306 insertions(+), 10 deletions(-)
 create mode 100644 src/interfaces/libpq/t/006_target_session_attr_dns.pl
 create mode 100644 src/interfaces/libpq/t/007_load_balance_dns_check_all_addrs.pl

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index ddb3596df83..c5d29d7a0f9 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2483,6 +2483,39 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
       </listitem>
      </varlistentry>
 
+     <varlistentry id="libpq-connect-check-all-addrs" xreflabel="check_all_addrs">
+      <term><literal>check_all_addrs</literal></term>
+      <listitem>
+       <para>
+        Controls whether all addresses of a host are checked when trying to
+        find a connection with a matching <xref linkend="libpq-connect-target-session-attrs"/>.
+
+        There are two modes:
+        <variablelist>
+         <varlistentry>
+          <term><literal>0</literal> (default)</term>
+          <listitem>
+           <para>
+            If a successful connection is made but that connection has a
+            mismatching <xref linkend="libpq-connect-target-session-attrs"/>, do not check
+            any additional addresses and move on to the next host, if one was provided.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>1</literal></term>
+          <listitem>
+           <para>
+            If a successful connection is made but that connection has a
+            mismatching <xref linkend="libpq-connect-target-session-attrs"/>, proceed
+            to check any additional addresses for this host.
+           </para>
+          </listitem>
+         </varlistentry>
+        </variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
    </variablelist>
   </para>
  </sect2>
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d5051f5e820..bd0265c553e 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -373,6 +373,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 	{"scram_server_key", NULL, NULL, NULL,
 		"SCRAM-Server-Key", "D", SCRAM_MAX_KEY_LEN * 2,
 		offsetof(struct pg_conn, scram_server_key)},
+	{"check_all_addrs", "PGCHECKALLADDRS",
+		"0", NULL,
+		"Check-All-Addrs", "", 1,
+		offsetof(struct pg_conn, check_all_addrs)},
 
 	/* OAuth v2 */
 	{"oauth_issuer", NULL, NULL, NULL,
@@ -4362,11 +4366,11 @@ keep_going: /* We will come back to here until there is
 				conn->status = CONNECTION_OK;
 				sendTerminateConn(conn);
 
-				/*
-				 * Try next host if any, but we don't want to consider
-				 * additional addresses for this host.
-				 */
-				conn->try_next_host = true;
+				if (strcmp(conn->check_all_addrs, "1") == 0)
+					conn->try_next_addr = true;
+				else
+					conn->try_next_host = true;
+
 				goto keep_going;
 			}
 		}
@@ -4417,11 +4421,11 @@ keep_going: /* We will come back to here until there is
 				conn->status = CONNECTION_OK;
 				sendTerminateConn(conn);
 
-				/*
-				 * Try next host if any, but we don't want to consider
-				 * additional addresses for this host.
-				 */
-				conn->try_next_host = true;
+				if (strcmp(conn->check_all_addrs, "1") == 0)
+					conn->try_next_addr = true;
+				else
+					conn->try_next_host = true;
+
 				goto keep_going;
 			}
 		}
@@ -5047,6 +5051,7 @@ freePGconn(PGconn *conn)
 	free(conn->oauth_client_id);
 	free(conn->oauth_client_secret);
 	free(conn->oauth_scope);
+	free(conn->check_all_addrs);
 
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index f36f7f19d58..27f162c1c29 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -432,6 +432,7 @@ struct pg_conn
 	char	   *load_balance_hosts; /* load balance over hosts */
 	char	   *scram_client_key;	/* base64-encoded SCRAM client key */
 	char	   *scram_server_key;	/* base64-encoded SCRAM server key */
+	char	   *check_all_addrs;	/* whether to check all of a host's addresses on a session-attrs mismatch */
 
 	bool		cancelRequest;	/* true if this connection is used to send a
 								 * cancel request, instead of being a normal
diff --git a/src/interfaces/libpq/t/006_target_session_attr_dns.pl b/src/interfaces/libpq/t/006_target_session_attr_dns.pl
new file mode 100644
index 00000000000..914f6c472f4
--- /dev/null
+++ b/src/interfaces/libpq/t/006_target_session_attr_dns.pl
@@ -0,0 +1,129 @@
+
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+use strict;
+use warnings FATAL => 'all';
+use Config;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/)
+{
+	plan skip_all =>
+	  'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA';
+}
+
+# Cluster setup which is shared for testing both load balancing methods
+my $can_bind_to_127_0_0_2 =
+  $Config{osname} eq 'linux' || $PostgreSQL::Test::Utils::windows_os;
+
+# Checks for the requirements for testing load balancing method 2
+if (!$can_bind_to_127_0_0_2)
+{
+	plan skip_all => 'load_balance test only supported on Linux and Windows';
+}
+
+my $hosts_path;
+if ($windows_os)
+{
+	$hosts_path = 'c:\Windows\System32\Drivers\etc\hosts';
+}
+else
+{
+	$hosts_path = '/etc/hosts';
+}
+
+my $hosts_content = PostgreSQL::Test::Utils::slurp_file($hosts_path);
+
+my $hosts_count = () =
+  $hosts_content =~ /127\.0\.0\.[1-3] pg-loadbalancetest/g;
+if ($hosts_count != 3)
+{
+	# Host file is not prepared for this test
+	plan skip_all => "hosts file was not prepared for DNS load balance test";
+}
+
+$PostgreSQL::Test::Cluster::use_tcp = 1;
+$PostgreSQL::Test::Cluster::test_pghost = '127.0.0.1';
+my $port = PostgreSQL::Test::Cluster::get_free_port();
+
+my $node_primary1 = PostgreSQL::Test::Cluster->new('primary1', port => $port);
+$node_primary1->init(has_archiving => 1, allows_streaming => 1);
+
+# Start it
+$node_primary1->start;
+
+# Take backup from which all operations will be run
+$node_primary1->backup('my_backup');
+
+my $node_standby = PostgreSQL::Test::Cluster->new('standby', port => $port, own_host => 1);
+$node_standby->init_from_backup($node_primary1, 'my_backup',
+	has_restoring => 1);
+$node_standby->start();
+
+my $node_primary2 = PostgreSQL::Test::Cluster->new('node1', port => $port, own_host => 1);
+$node_primary2->init();
+$node_primary2->start();
+
+# While the first primary is up, matching connections should land on it,
+# and standby-targeting connections should land on the standby.
+$node_primary1->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=primary check_all_addrs=1",
+	"target_session_attrs=primary connects to the first primary",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_primary1->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=read-write check_all_addrs=1",
+	"target_session_attrs=read-write connects to the first primary",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_primary1->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=any check_all_addrs=1",
+	"target_session_attrs=any connects to the first primary",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_standby->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=standby check_all_addrs=1",
+	"target_session_attrs=standby connects to the standby node",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_standby->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=read-only check_all_addrs=1",
+	"target_session_attrs=read-only connects to the standby node",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+
+$node_primary1->stop();
+
+# With the first primary down, matching connections should fall through to
+# the remaining addresses.
+$node_primary2->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=primary check_all_addrs=1",
+	"target_session_attrs=primary connects to the remaining primary",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_primary2->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=read-write check_all_addrs=1",
+	"target_session_attrs=read-write connects to the remaining primary",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_standby->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=any check_all_addrs=1",
+	"target_session_attrs=any connects to the standby node",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_standby->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=standby check_all_addrs=1",
+	"target_session_attrs=standby connects to the standby node",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+$node_standby->connect_ok(
+	"host=pg-loadbalancetest port=$port target_session_attrs=read-only check_all_addrs=1",
+	"target_session_attrs=read-only connects to the standby node",
+	sql => "SELECT 'connect1'",
+	log_like => [qr/statement: SELECT 'connect1'/]);
+
+$node_primary2->stop();
+$node_standby->stop();
+
+done_testing();
diff --git a/src/interfaces/libpq/t/007_load_balance_dns_check_all_addrs.pl b/src/interfaces/libpq/t/007_load_balance_dns_check_all_addrs.pl
new file mode 100644
index 00000000000..d3405598e67
--- /dev/null
+++ b/src/interfaces/libpq/t/007_load_balance_dns_check_all_addrs.pl
@@ -0,0 +1,128 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+use strict;
+use warnings FATAL => 'all';
+use Config;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/)
+{
+	plan skip_all =>
+	  'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA';
+}
+
+my $can_bind_to_127_0_0_2 =
+  $Config{osname} eq 'linux' || $PostgreSQL::Test::Utils::windows_os;
+
+# Checks for the requirements for testing load balancing method 2
+if (!$can_bind_to_127_0_0_2)
+{
+	plan skip_all => 'load_balance test only supported on Linux and Windows';
+}
+
+my $hosts_path;
+if ($windows_os)
+{
+	$hosts_path = 'c:\Windows\System32\Drivers\etc\hosts';
+}
+else
+{
+	$hosts_path = '/etc/hosts';
+}
+
+my $hosts_content = PostgreSQL::Test::Utils::slurp_file($hosts_path);
+
+my $hosts_count = () =
+  $hosts_content =~ /127\.0\.0\.[1-3] pg-loadbalancetest/g;
+if ($hosts_count != 3)
+{
+	# Host file is not prepared for this test
+	plan skip_all => "hosts file was not prepared for DNS load balance test";
+}
+
+$PostgreSQL::Test::Cluster::use_tcp = 1;
+$PostgreSQL::Test::Cluster::test_pghost = '127.0.0.1';
+
+my $port = PostgreSQL::Test::Cluster::get_free_port();
+local $Test::Builder::Level = $Test::Builder::Level + 1;
+my $node_primary1 = PostgreSQL::Test::Cluster->new("primary1", port => $port);
+$node_primary1->init(has_archiving => 1, allows_streaming => 1);
+
+# Start it
+$node_primary1->start();
+
+# Take backup from which all operations will be run
+$node_primary1->backup("my_backup");
+
+my $node_standby = PostgreSQL::Test::Cluster->new("standby", port => $port, own_host => 1);
+$node_standby->init_from_backup($node_primary1, "my_backup",
+	has_restoring => 1);
+$node_standby->start();
+
+my $node_primary2 = PostgreSQL::Test::Cluster->new("node1", port => $port, own_host => 1);
+$node_primary2->init();
+$node_primary2->start();
+
+sub test_target_session_attr
+{
+	my $target_session_attrs = shift;
+	my $test_num = shift;
+	my $primary1_expect_traffic = shift;
+	my $standby_expect_traffic = shift;
+	my $primary2_expect_traffic = shift;
+
+	# Statistically the following loop with load_balance_hosts=random will
+	# almost certainly connect at least once to each of the eligible nodes.
+	# The chance of that not happening is so small that it's negligible:
+	# (2/3)^50 = 1.56832855e-9
+	foreach my $i (1 .. 50)
+	{
+		$node_primary1->connect_ok(
+			"host=pg-loadbalancetest port=$port load_balance_hosts=random target_session_attrs=${target_session_attrs} check_all_addrs=1",
+			"repeated connections with random load balancing",
+			sql => "SELECT 'connect${test_num}'");
+	}
+
+	my $node_primary1_occurrences = () =
+	  $node_primary1->log_content() =~ /statement: SELECT 'connect${test_num}'/g;
+	my $node_standby_occurrences = () =
+	  $node_standby->log_content() =~ /statement: SELECT 'connect${test_num}'/g;
+	my $node_primary2_occurrences = () =
+	  $node_primary2->log_content() =~ /statement: SELECT 'connect${test_num}'/g;
+
+	my $total_occurrences =
+	  $node_primary1_occurrences + $node_standby_occurrences + $node_primary2_occurrences;
+
+	if ($primary1_expect_traffic == 1)
+	{
+		ok($node_primary1_occurrences > 0,
+			"received at least one connection on primary1");
+	}
+	else
+	{
+		ok($node_primary1_occurrences == 0,
+			"received no connections on primary1");
+	}
+
+	if ($standby_expect_traffic == 1)
+	{
+		ok($node_standby_occurrences > 0,
+			"received at least one connection on standby");
+	}
+	else
+	{
+		ok($node_standby_occurrences == 0,
+			"received no connections on standby");
+	}
+
+	if ($primary2_expect_traffic == 1)
+	{
+		ok($node_primary2_occurrences > 0,
+			"received at least one connection on primary2");
+	}
+	else
+	{
+		ok($node_primary2_occurrences == 0,
+			"received no connections on primary2");
+	}
+
+	ok($total_occurrences == 50, "received 50 connections across all nodes");
+}
+
+test_target_session_attr('any', 1, 1, 1, 1);
+test_target_session_attr('read-only', 2, 0, 1, 0);
+test_target_session_attr('read-write', 3, 1, 0, 1);
+test_target_session_attr('primary', 4, 1, 0, 1);
+test_target_session_attr('standby', 5, 0, 1, 0);
+
+$node_primary1->stop();
+$node_primary2->stop();
+$node_standby->stop();
+
+done_testing();
--
2.47.2
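
For anyone who wants to try the behaviour out by hand, a minimal client
along the following lines should exercise it. This is only a sketch and
assumes the attached patch is applied; "pg-loadbalancetest" is the
multi-address hosts entry used by the TAP tests above and stands in for any
DNS name that resolves to several addresses, and the port and dbname are
placeholders.

/*
 * Minimal manual check of the proposed option (requires the patch to be
 * applied).  Host, port and dbname below are placeholders.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb(
		"host=pg-loadbalancetest port=5432 dbname=postgres "
		"target_session_attrs=read-write check_all_addrs=1");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return EXIT_FAILURE;
	}

	printf("connected to a read-write server\n");
	PQfinish(conn);
	return EXIT_SUCCESS;
}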