Hi Alvaro-san and Horiguchi-san,
CC) Iwata-san, Tsunakawa-san

Attached is the V23 of the libpq trace patch.

(1)
From: Álvaro Herrera <alvhe...@alvh.no-ip.org>
> It appears that something is still wrong.  I applied lipq pipeline v27 from 
> [1]
> and ran src/test/modules/test_libpq/pipeline singlerow, after patching it to
> do PQtrace() after PQconn(). Below is the output I get from that. The
> noteworthy point is that "ParseComplete" messages appear multiple times
> for some reason ... but that's quite odd, because if I look at the network 
> traffic
> with Wireshark I certainly do not see the ParseComplete message being sent
> three times.

I applied Alvaro's libpq pipeline patch (v33) [1] on top of this libpq trace 
patch,
then traced the src/test/modules/libpq_pipeline/libpq_pipeline.c 
test_singlerowmode.
It's still the same trace output even after applying the fix recommendations
of Horiguchi-san.

        >       Parse   38 "" "SELECT generate_series(42, $1)" #0
        >       Bind    20 "" "" #0 #1 2 '44' #1 #0
        >       Describe        6 P ""
        >       Execute 9 "" 0
        >       Parse   38 "" "SELECT generate_series(42, $1)" #0
        >       Bind    20 "" "" #0 #1 2 '45' #1 #0
        >       Describe        6 P ""
        >       Execute 9 "" 0
        >       Parse   38 "" "SELECT generate_series(42, $1)" #0
        >       Bind    20 "" "" #0 #1 2 '46' #1 #0
        >       Describe        6 P ""
        >       Execute 9 "" 0
        >       Sync    4
        <       ParseComplete   4
        <       BindComplete    4
        <       RowDescription  40 #1 "generate_series" 0 #0 23 #4 -1 #0
        <       DataRow 12 #1 2 '42'
        <       DataRow 12 #1 2 '43'
        <       DataRow 12 #1 2 '44'
        <       CommandComplete 13 "SELECT 3"
        <       ParseComplete   4
        <       ParseComplete   4
        <       ParseComplete   4
        <       BindComplete    4
        <       RowDescription  40 #1 "generate_series" 0 #0 23 #4 -1 #0
        <       DataRow 12 #1 2 '42'
        <       DataRow 12 #1 2 '43'
        <       DataRow 12 #1 2 '44'
        <       DataRow 12 #1 2 '45'
        <       CommandComplete 13 "SELECT 4"
        <       ParseComplete   4
        <       ParseComplete   4
        <       ParseComplete   4
        <       BindComplete    4
        <       RowDescription  40 #1 "generate_series" 0 #0 23 #4 -1 #0
        <       DataRow 12 #1 2 '42'
        <       DataRow 12 #1 2 '43'
        <       DataRow 12 #1 2 '44'
        <       DataRow 12 #1 2 '45'
        <       DataRow 12 #1 2 '46'
        <       CommandComplete 13 "SELECT 5"
        <       ReadyForQuery   5 I

ParseComplete still appears 3 times, but I wonder if that is expected only for 
pipeline's singlerowmode test.
From my observation, when I traced the whole regression test output
by putting PQtrace() in fe-connect.c: connectDBComplete(),
ParseComplete appears only once or twice, etc. for other commands.
In other words, ISTM, it's a case-to-case basis:
Some examples from the whole regression trace output is below:
a.
    >   Describe    6 P ""
    >   Execute 9 "" 0
    >   Sync    4
    <   ParseComplete   4
    <   BindComplete    4
b.
    <   ErrorResponse   217 S "ERROR" V "ERROR" C "42883" M "function 
no_such_function(integer) does not exist" H "No function matches the given name 
and argument types. You might need to add explicit type casts." P "8" F 
"parse_func.c" L "629" R "ParseFuncOrColumn" \x00
    <   ReadyForQuery   5 I
    <   ParseComplete   4
    <   ParseComplete   4
c.
    >   Bind    31 "" "my_insert" #0 #1 4 '9652' #1 #0
    >   Describe    6 P ""
    >   Execute 9 "" 0
    <   ParseComplete   4
    <   ParseComplete   4
    <   ParseComplete   4
    <   BindComplete    4
    <   NoData  4
d.
    <   CommandComplete 15 "INSERT 0 2"
    <   ReadyForQuery   5 T
    >   Parse   89 "i" "insert into test (name, amount, letter) select name, 
amount+$1, letter from test" #0
    >   Sync    4
    <   ParseComplete   4
    <   ReadyForQuery   5 T

As you can see from the examples above, ParseComplete appears in multiples/once 
depending on the test.
As for libpq_pipeline's test_singlerowmode, it appears 3x for the 
CommandComplete SELECT.
I am wondering now whether or not it's a bug. Maybe it's not.
Thoughts?


How I traced the whole regression test: fe-connect.c: connectDBComplete()

@@ -2114,6 +2137,7 @@ connectDBComplete(PGconn *conn)
        int                     timeout = 0;
        int                     last_whichhost = -2;    /* certainly different 
from whichhost */
        struct addrinfo *last_addr_cur = NULL;
+       FILE            *trace_file;
 
        if (conn == NULL || conn->status == CONNECTION_BAD)
                return 0;
@@ -2171,6 +2195,9 @@ connectDBComplete(PGconn *conn)
                switch (flag)
                {
                        case PGRES_POLLING_OK:
+                               trace_file = 
fopen("/home/postgres/tracelog/regression_tracepipeline_a.out","a");
+                               PQtrace(conn, trace_file);
+                               PQtraceSetFlags(conn, 
PQTRACE_SUPPRESS_TIMESTAMPS);


(2)
> From: Kyotaro Horiguchi <horikyota....@gmail.com>
> > > This inhibits logCursor being updated. What is worse, I find that
> > > logCursor movement is quite dubious.
> >
> > Using (inCursor - inStart) as logCursor doesn't work correctly if
> > tracing state desyncs.  Once desync happens inStart can be moved at
> > the timing that the tracing code doesn't expect.
> -                                                   This requires (as I
> - mentioned upthread) pqReadData to actively reset logCursor, though.
> + So pgReadData needs to avtively reset logCursor.  If logCursor is
> + actively reset, we no longer need to use (inCursor - inStart) as
> + logCursor and it is enough that logCursor follows inCursor.
> >
> > logCursor should move when bytes are fed to the tracing functoins even
> > when theyare the last bytes of a message.

Following the advice of Horiguchi-san above, of updating logCursor through 
pqReadData,
I have updated pqReadData and pqCheckInBufferSpace.
In pqTraceOutputBeByte1 and pqTraceOutputBeInt.
logCursor now uses the inCursor value because pqReadData actively resets the 
logCursor. 

In addition, I also fixed the following:

(3) 
In pqTraceOutputBeByte1, I added the missing update of logCursor for the 
LOG_FIRST_BYTE:

+        case LOG_FIRST_BYTE:
+            {
+                ...
+                conn->be_msg->logCursor = conn->inCursor;
+                break;
+            }


(4)
In pqTraceOutputBeInt():

I changed 0 to false:
-    bool        logfinish = 0;
+    bool        logfinish = false;

I followed this advice and applied to changes for LOG_LENGTH:

>  +        case LOG_LENGTH:
>  +            fprintf(conn->Pfdebug, "%d", v);
> -+            conn->be_msg->length = v - length;
> ++            conn->be_msg->length = v;
>  +            /* Next, log the message contents */
>  +            conn->be_msg->state = LOG_CONTENTS;
> -+            logfinish = pqTraceMaybeBreakLine(0, conn);
> ++            logfinish = pqTraceMaybeBreakLine(length, conn);


(5)
Because of the changes in pqReadData, I exposed again the
following and moved from libpq-trace.c to libq-trace.h:
+typedef enum PGLogState
+typedef struct pqBackendMessage


(6)
As reported abovethread, the Assert in pqTraceMaybeBreakLine() still breaks. So 
I removed it
>psql: libpq-trace.c:320: pqTraceMaybeBreakLine: Assertion 
>`conn->be_msg->length > 0' failed.


(7) Not included in this patch is a fix for the bug I found from the whole 
regression test output.
There are many instances of Invalid Protocol, but only for FunctionCallResponse 
   

    <   FunctionCallResponse    8 0
'<  :::Invalid Protocol
'

See fe-protocol3.c: pqFunctionCall3().
Perhaps it has something to do when the actual_result_len == 0, and it's not 
integer (pqGetnchar).
And somewhere among pqTraceOutputBeInt(), pqTraceMaybeBreakLine(), etc.
probably breaks something when message is V, 8, 0 \0.
(Referring to (6) above, maybe the reason why Assert(conn->be_msg->length > 0) 
fails
in pqTraceMaybeBreakLine() is because the length may possibly be 0.)

Thoughts?

fe-protocol3.c: pqFunctionCall3():
case 'V':                       /* function result */
    if (pqGetInt(actual_result_len, 4, conn))
        continue;
    if (*actual_result_len != -1)
    {
        if (result_is_int)
        {
            if (pqGetInt(result_buf, *actual_result_len, conn))
                continue;
        }
        else
        {
            if (pqGetnchar((char *) result_buf,
                            *actual_result_len,
                            conn))
                continue;
        }
    }



(8) Regarding the exchange of discussions between Tom Lane and Tsunakawa-san
 about the improved tracing design, Iwata-san is now working on it and
 will submit the updated patch soon.


Regards,
Kirk Jamison


[1] 
https://www.postgresql.org/message-id/flat/20210304214017.GA22915%40alvherre.pgsql#bd8f9b26a50256bb6b0d9f273fc2889e

Attachment: v23-0001-Improve-libpq-tracing-capabilities.patch
Description: v23-0001-Improve-libpq-tracing-capabilities.patch

Reply via email to