Hello Tom,

Some points for discussion and review:

1.  The bulk of patch 0001 is indeed just
s/printfPQExpBuffer/appendPQExpBuffer/g, though I also made an attempt
to use appendPQExpBufferStr wherever possible.  There are two categories
of printfPQExpBuffer calls I didn't change:

1a. Calls reporting an out-of-memory situation.  There was already a
policy in some places that we'd intentionally printf not append such
messages, and I made that explicit.  The idea is that we might not
have room to add more text to errorMessage, so resetting the buffer
provides more certainty of success.  However, this makes for a pretty
weird inconsistency in the code; there are a lot of functions now in
which half of the messages are printf'd and half are append'd, so I'm
afraid that future patches will get it wrong as to which to use.  Also,
in reality there often *would* be room to append "out of memory" without
enlarging errorMessage's buffer, so that this could just be pointless
destruction of useful error history.  I didn't do anything about it here,
but I'm tempted to replace all the printf's for OOM messages with a
subroutine that will append if it can do so without enlarging the existing
buffer, else replace.

1b. There are a few places where it didn't seem worth changing anything
because it'd just result in needing to add a resetPQExpBuffer in the same
function or very nearby.  Mostly in fe-lobj.c.

2. I had to drop some code (notably in pqPrepareAsyncResult) that
attempted to force conn->errorMessage to always have the same contents
as the current PGresult's PQresultErrorMessage; as it stood, server
errors such as "role does not exist" wiped out preceding contents of
errorMessage, breaking cases such as the second example above.  This is
slightly annoying, but in the new dispensation it's possible that
conn->errorMessage contains a mix of libpq-generated errors and text
received from the server, so I'm not sure that preserving the old
equivalence is a useful goal anyway.  We shouldn't cram pre-existing
errorMessage text back into a server-generated error; that would
invalidate the server's auxiliary error info such as the SQLSTATE.
I don't know if it's possible to do better here.  One idea is to tweak
pqPrepareAsyncResult so that it does the message synchronization only
when in CONNECTION_OK state, allowing conn->errorMessage to be a
historical record only for connection processing.  But that seems like
a hack.

3. In some places I failed to resist the temptation to copy-edit error
messages that didn't meet style guidelines, or to add libpq_gettext()
annotation where it seemed to have been missed.

4. connectDBComplete() cleared errorMessage after a successful
completion, but that's the wrong place to do it; we have to do that
at the success exits from PQconnectPoll, else applications that
use PQconnectPoll directly will see leftover garbage there.

5. There were places in PQconnectPoll that temporarily set
conn->status = CONNECTION_OK to prevent PQsendQueryStart from
complaining.  A better idea is to leave the status alone and change
that test in PQsendQueryStart to be status != CONNECTION_BAD.  This
not only simplifies PQconnectPoll, but allows PQsendQueryStart
to tell whether we're still in the connection sequence.  I fixed
it to not clear errorMessage when we are, thus solving the problem
of PQsendQuery destroying prior connection error history.  That allows
getting rid of saveErrorMessage/restoreErrorMessage, which were
(a) an incredibly ugly and inefficient hack, and (b) inadequate,
since there were other paths through PQconnectPoll that could reach
PQsendQueryStart but were not protected with calls to those.

6. In the 0002 patch, I have it printing the annotation only when
there's more than one connhost[], and only once per connhost not
once per IP address.  The motivation for doing it like that was to
not change the behavior for simple cases with only one host name.
There is certainly room to argue that we should make an annotation
once per IP address, but that'd change the behavior for pretty common
cases like "localhost" resolving to both 127.0.0.1 and ::1, so I don't
know if it'd be a good idea or not.  There's also an interaction to
consider with my patch at
https://www.postgresql.org/message-id/4913.1533827102%40sss.pgh.pa.us
namely that we have to be sure the labeling behaves sanely for
connhosts that fail to resolve any IP addresses.

I'll put this into the September commitfest.

* About the *first* patch

It applies cleanly and compiles.

Alas, global "make check" is ok although there is no change on regression tests, which suggest that the multiple error message reporting is not tested anywhere. Should there be at least one test?

Getting rid of somehow ugly "let's save the current message and restore it later" logic in places looks like a definite improvement.

There are still 86 instances of "printfPQExpBuffer", which seems like a lot. They are mostly OOM messages, but not only. This make checking the patch quite cumbersome.

I'd rather there would be only one rule, and clear reset with a comment when needded (eg "fe-lobj.c").

I agree with your suggestion of adding a function trying to preserve messages on OOM errors if possible. Maybe "appendPQExpBufferOOM"?

It is unclear to me why the "printf" variant is used in "PQencryptPasswordConn" and "connectOptions2", it looks that you skipped these functions.

In passing, some OOM message are rather terse: "out of memory\n", other are more precise "out of memory allocating GSSAPI buffer (%d)\n".

I do not think it is worth creating a l10n exception on a should not happen message, eg (there are a few):

 -  /* shouldn't happen */
 -  printfPQExpBuffer(&conn->errorMessage,
 -                    libpq_gettext("invalid SCRAM exchange state\n"));
 +  /* shouldn't happen, so we don't translate the msg */
 +  appendPQExpBufferStr(&conn->errorMessage,
 +                       "invalid SCRAM exchange state\n");

I would have let "libpq_gettext" it as it was.

I'm ok of libpq & server messages are mixed, because I do not see why one may take precedent on the others, and when some strange & combined errors, every clue is worthy.

I have not seen anything which raises a "risky" alarm about undesired consequences, but who knows?

Comments update seem ok. It is possible that some were missed.


* About the second patch:

It applies cleanly on top of previous and compiles. Yet again, global "make check" is ok although there were no test changes, which just show the abysmal coverage of psql regression tests.

Should there be at least one test?

The feature answers some of my issues with the "libpq should not look up all host addresses at once" patch.

* "server \"%s\" port %s:\n"

I do not like the resulting output

server:
problem found
  more details

I'd rather have

server: problem found
  more details

which would require to append a '\n' at the beginning of the line from the second attempt.

ISTM that both the hostname and ip should be shown to avoid confusion about hosts with multiple ips, esp. as ips are given in any order by the dns.

The patch misses the one server with multiple ips case. ISTM that the introductory line should be shown in such case as well.

 sh> psql "host=local.coelho.net,localhost port=5434"
 psql: server "local.coelho.net" port 5434:
   ...
 server "localhost" port 5434:
  ...

But:

 sh> psql "host=local2.coelho.net port=5434"
 psql: could not connect to server: Connection refused
    ...
 could not connect to server: Connection refused
    ...

Even if the hint message in this case is explicit enough, ISTM that the server line is deserved.

Also for homogeneity, I'd suggest to always add the server line. If the server introduction is inserted in all cases, including when only one host is used, hints become partially redundant:

  server "local.coelho.net" port 5434:
  could not connect to server: Connection refused
        Is the server running on host "local.coelho.net" (127.0.0.1) and 
accepting
        TCP/IP connections on port 5434?

This would allow to simplify more hints, which you seem to have done on "open read-write session" and "SHOW transaction_read_only" but not others.

--
Fabien.

Reply via email to