Hi,

On Tue, Nov 29, 2022 at 02:57:08PM +0000, Jelte Fennema wrote:
> Attached is a new version with the tests cleaned up a bit (more
> comments mostly).
> 
> @Michael, did you have a chance to look at the last version? Because I
> feel that the patch is pretty much ready for a committer to look at,
> at this point.

I had another look; it still applies and tests pass. I still find the
distribution over three hosts a bit skewed (even when OpenSSL is
enabled, which wasn't the case when I first tested it), but couldn't
find a general fault and it worked well enough in my testing.

I wonder whether making the parameter a boolean will paint us into a
corner, and whether maybe additional modes could be envisioned in the
future, but I can't think of some right now (you can pretty neatly
restrict load-balancing to standbys by setting
target_session_attrs=standby in addition). Maybe a way to change the
behaviour if a dns hostname is backed by multiple entries?

Some further (mostly nitpicking) comments on the patch:

> From 6e20bb223012b666161521b5e7249c066467a5f3 Mon Sep 17 00:00:00 2001
> From: Jelte Fennema <github-t...@jeltef.nl>
> Date: Mon, 12 Sep 2022 09:44:06 +0200
> Subject: [PATCH v5] Support load balancing in libpq
> 
> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
> 
> Both JBDC (Java) and Npgsql (C#) already support client level load
> balancing (option #1). This patch implements client level load balancing
> for libpq as well. To stay consistent with the JDBC and Npgsql part of
> the  ecosystem, a similar implementation and name for the option are
> used. 

I still think all of the above has no business in the commit message,
though maybe the first paragraph can stay as introduction.

> It contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
>     one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
>     before trying to connect to them one-by-one.

That's fine.

What should be in the commit message is at least a mention of what the
new connection parameter is called and possibly what is done to
accomplish it.

But the committer will pick this up if needed I guess.

> diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
> index f9558dec3b..6ce7a0c9cc 100644
> --- a/doc/src/sgml/libpq.sgml
> +++ b/doc/src/sgml/libpq.sgml
> @@ -1316,6 +1316,54 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry id="libpq-load-balance-hosts" 
> xreflabel="load_balance_hosts">
> +      <term><literal>load_balance_hosts</literal></term>
> +      <listitem>
> +       <para>
> +        Controls whether the client load balances connections across hosts 
> and
> +        adresses. The default value is 0, meaning off, this means that hosts 
> are
> +        tried in order they are provided and addresses are tried in the order
> +        they are received from DNS or a hosts file. If this value is set to 
> 1,
> +        meaning on, the hosts and addresses that they resolve to are tried in
> +        random order. Subsequent queries once connected will still be sent to
> +        the same server. Setting this to 1, is mostly useful when opening
> +        multiple connections at the same time, possibly from different 
> machines.
> +        This way connections can be load balanced across multiple Postgres
> +        servers.
> +       </para>
> +       <para>
> +        When providing multiple hosts, these hosts are resolved in random 
> order.
> +        Then if that host resolves to multiple addresses, these addresses are
> +        connected to in random order. Only once all addresses for a single 
> host
> +        have been tried, the addresses for the next random host will be
> +        resolved. This behaviour can lead to non-uniform address selection in
> +        certain cases. Such as when not all hosts resolve to the same number 
> of
> +        addresses, or when multiple hosts resolve to the same address. So if 
> you
> +        want uniform load balancing, this is something to keep in mind. 
> However,
> +        non-uniform load balancing also has usecases, e.g. providing the
> +        hostname of a larger server multiple times in the host string so it 
> gets
> +        more requests.
> +       </para>
> +       <para>
> +        When using this setting it's recommended to also configure a 
> reasonable
> +        value for <xref linkend="libpq-connect-connect-timeout"/>. Because 
> then,
> +        if one of the nodes that are used for load balancing is not 
> responding,
> +        a new node will be tried.
> +       </para>
> +      </listitem>
> +     </varlistentry>

I think this whole section is generally fine, but needs some
copyediting.

> +     <varlistentry id="libpq-random-seed" xreflabel="random_seed">
> +      <term><literal>random_seed</literal></term>
> +      <listitem>
> +       <para>
> +        Sets the random seed that is used by <xref 
> linkend="libpq-load-balance-hosts"/>
> +        to randomize the host order. This option is mostly useful when 
> running
> +        tests that require a stable random order.
> +       </para>
> +      </listitem>
> +     </varlistentry>

I wonder whether this needs to be documented if it is mostly a
development/testing parameter?

> diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
> index fcf68df39b..39e93b1392 100644
> --- a/src/include/libpq/pqcomm.h
> +++ b/src/include/libpq/pqcomm.h
> @@ -27,6 +27,12 @@ typedef struct
>       socklen_t       salen;
>  } SockAddr;
>  
> +typedef struct
> +{
> +     int                     family;
> +     SockAddr        addr;
> +}                    AddrInfo;

That last line looks weirdly indented compared to SockAddr; in the
struct above.

>  /* Configure the UNIX socket location for the well known port. */
>  
>  #define UNIXSOCK_PATH(path, port, sockdir) \
> diff --git a/src/interfaces/libpq/fe-connect.c 
> b/src/interfaces/libpq/fe-connect.c
> index f88d672c6c..b4d3613713 100644
> --- a/src/interfaces/libpq/fe-connect.c
> +++ b/src/interfaces/libpq/fe-connect.c
> @@ -241,6 +241,14 @@ static const internalPQconninfoOption 
> PQconninfoOptions[] = {
>               "Fallback-Application-Name", "", 64,
>       offsetof(struct pg_conn, fbappname)},
>  
> +     {"load_balance_hosts", NULL, NULL, NULL,
> +             "Load-Balance", "", 1,  /* should be just '0' or '1' */
> +     offsetof(struct pg_conn, loadbalance)},
> +
> +     {"random_seed", NULL, NULL, NULL,
> +             "Random-Seed", "", 10,  /* strlen(INT32_MAX) == 10 */
> +     offsetof(struct pg_conn, randomseed)},
> +
>       {"keepalives", NULL, NULL, NULL,
>               "TCP-Keepalives", "", 1,        /* should be just '0' or '1' */
>       offsetof(struct pg_conn, keepalives)},
> @@ -379,6 +387,7 @@ static bool fillPGconn(PGconn *conn, PQconninfoOption 
> *connOptions);
>  static void freePGconn(PGconn *conn);
>  static void closePGconn(PGconn *conn);
>  static void release_conn_addrinfo(PGconn *conn);
> +static bool store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist);
>  static void sendTerminateConn(PGconn *conn);
>  static PQconninfoOption *conninfo_init(PQExpBuffer errorMessage);
>  static PQconninfoOption *parse_connection_string(const char *connstr,
> @@ -424,6 +433,9 @@ static void pgpassfileWarning(PGconn *conn);
>  static void default_threadlock(int acquire);
>  static bool sslVerifyProtocolVersion(const char *version);
>  static bool sslVerifyProtocolRange(const char *min, const char *max);
> +static int   loadBalance(PGconn *conn);
> +static bool parse_int_param(const char *value, int *result, PGconn *conn,
> +                                                     const char *context);
>  
>  
>  /* global variable because fe-auth.c needs to access it */
> @@ -1007,6 +1019,46 @@ parse_comma_separated_list(char **startptr, bool *more)
>       return p;
>  }
>  
> +/*
> + * Initializes the prng_state field of the connection. We want something
> + * unpredictable, so if possible, use high-quality random bits for the
> + * seed. Otherwise, fall back to a seed based on timestamp and PID.
> + */
> +static bool
> +libpq_prng_init(PGconn *conn)
> +{
> +     if (unlikely(conn->randomseed))
> +     {
> +             int                     rseed;
> +
> +             if (!parse_int_param(conn->randomseed, &rseed, conn, 
> "random_seed"))
> +             {
> +                     return false;
> +             };

I think it's project policy to drop the braces for single statements in
if blocks.

> +             pg_prng_seed(&conn->prng_state, rseed);
> +     }
> +     else if (unlikely(!pg_prng_strong_seed(&conn->prng_state)))
> +     {
> +             uint64          rseed;
> +             time_t          now = time(NULL);
> +
> +             /*
> +              * Since PIDs and timestamps tend to change more frequently in 
> their
> +              * least significant bits, shift the timestamp left to allow a 
> larger
> +              * total number of seeds in a given time period.  Since that 
> would
> +              * leave only 20 bits of the timestamp that cycle every ~1 
> second,
> +              * also mix in some higher bits.
> +              */
> +             rseed = ((uint64) getpid()) ^
> +                     ((uint64) now << 12) ^
> +                     ((uint64) now >> 20);
> +
> +             pg_prng_seed(&conn->prng_state, rseed);
> +     }
> +     return true;
> +}
> +
> +
>  /*

Additional newline.

> @@ -1164,6 +1217,36 @@ connectOptions2(PGconn *conn)
>               }
>       }
>  
> +     if (loadbalancehosts < 0)
> +     {
> +             appendPQExpBufferStr(&conn->errorMessage,
> +                                                      
> libpq_gettext("loadbalance parameter must be an integer\n"));
> +             return false;
> +     }
> +
> +     if (loadbalancehosts)
> +     {
> +             if (!libpq_prng_init(conn))
> +             {
> +                     return false;
> +             }
> +
> +             /*
> +              * Shuffle connhost with a Durstenfeld/Knuth version of the
> +              * Fisher-Yates shuffle. Source:
> +              * 
> https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
> +              */
> +             for (i = conn->nconnhost - 1; i > 0; i--)
> +             {
> +                     int                     j = 
> pg_prng_uint64_range(&conn->prng_state, 0, i);
> +                     pg_conn_host temp = conn->connhost[j];
> +
> +                     conn->connhost[j] = conn->connhost[i];
> +                     conn->connhost[i] = temp;
> +             }
> +     }
> +
> +
>       /*

Additional newline.

> @@ -1726,6 +1809,27 @@ connectFailureMessage(PGconn *conn, int errorno)
>               libpq_append_conn_error(conn, "\tIs the server running on that 
> host and accepting TCP/IP connections?");
>  }
>  
> +/*
> + * Should we load balance across hosts? Returns 1 if yes, 0 if no, and -1 if
> + * conn->loadbalance is set to a value which is not parseable as an integer.
> + */
> +static int
> +loadBalance(PGconn *conn)
> +{
> +     char       *ep;
> +     int                     val;
> +
> +     if (conn->loadbalance == NULL)
> +     {
> +             return 0;
> +     }

Another case of additional braces.

> +     val = strtol(conn->loadbalance, &ep, 10);
> +     if (*ep)
> +             return -1;
> +     return val != 0 ? 1 : 0;
> +}
> +
> +
>  /*

Additional newline.

> @@ -4041,6 +4154,63 @@ freePGconn(PGconn *conn)
>       free(conn);
>  }
>  
> +
> +/*

Additional newline.

> + * Copies over the AddrInfos from addrlist to the PGconn.
> + */
> +static bool
> +store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist)
> +{
> +     struct addrinfo *ai = addrlist;
> +
> +     conn->whichaddr = 0;
> +
> +     conn->naddr = 0;
> +     while (ai)
> +     {
> +             ai = ai->ai_next;
> +             conn->naddr++;
> +     }
> +
> +     conn->addr = calloc(conn->naddr, sizeof(AddrInfo));
> +     if (conn->addr == NULL)
> +     {
> +             return false;
> +     }

Additional braces.

> @@ -4048,11 +4218,10 @@ freePGconn(PGconn *conn)
>  static void
>  release_conn_addrinfo(PGconn *conn)
>  {
> -     if (conn->addrlist)
> +     if (conn->addr)
>       {
> -             pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
> -             conn->addrlist = NULL;
> -             conn->addr_cur = NULL;  /* for safety */
> +             free(conn->addr);
> +             conn->addr = NULL;
>       }
>  }
>  
> diff --git a/src/interfaces/libpq/libpq-int.h 
> b/src/interfaces/libpq/libpq-int.h
> index 512762f999..76ee988038 100644
> --- a/src/interfaces/libpq/libpq-int.h
> +++ b/src/interfaces/libpq/libpq-int.h
> @@ -82,6 +82,8 @@ typedef struct
>  #endif
>  #endif                                                       /* USE_OPENSSL 
> */
>  
> +#include "common/pg_prng.h"
> +
>  /*
>   * POSTGRES backend dependent Constants.
>   */
> @@ -373,6 +375,8 @@ struct pg_conn
>       char       *pgpassfile;         /* path to a file containing 
> password(s) */
>       char       *channel_binding;    /* channel binding mode
>                                                                        * 
> (require,prefer,disable) */
> +     char       *loadbalance;        /* load balance over hosts */
> +     char       *randomseed;         /* seed for randomization of load 
> balancing */
>       char       *keepalives;         /* use TCP keepalives? */

A bit unclear why you put the variables at this point in the list, but
the list doesn't seem to be ordered strictly anyway; still, maybe they
would fit best at the bottom below target_session_attrs?

>       char       *keepalives_idle;    /* time between TCP keepalives */
>       char       *keepalives_interval;        /* time between TCP keepalive
> @@ -461,8 +465,10 @@ struct pg_conn
>       PGTargetServerType target_server_type;  /* desired session properties */
>       bool            try_next_addr;  /* time to advance to next 
> address/host? */
>       bool            try_next_host;  /* time to advance to next connhost[]? 
> */
> -     struct addrinfo *addrlist;      /* list of addresses for current 
> connhost */
> -     struct addrinfo *addr_cur;      /* the one currently being tried */
> +     int                     naddr;                  /* number of addrs 
> returned by getaddrinfo */
> +     int                     whichaddr;              /* the addr currently 
> being tried */

Address(es) is always spelt out in the comments, those two addr(s)
should also I think.

> diff --git a/src/interfaces/libpq/t/003_loadbalance.pl 
> b/src/interfaces/libpq/t/003_loadbalance.pl
> new file mode 100644
> index 0000000000..07eddbe9cc
> --- /dev/null
> +++ b/src/interfaces/libpq/t/003_loadbalance.pl
> @@ -0,0 +1,167 @@
> +# Copyright (c) 2022, PostgreSQL Global Development Group

Copyright bump needed.


Cheers,

Michael


Reply via email to