Hi Alvaro-san and Horiguchi-san, CC) Iwata-san, Tsunakawa-san Attached is the V23 of the libpq trace patch.
(1) From: Álvaro Herrera <alvhe...@alvh.no-ip.org> > It appears that something is still wrong. I applied lipq pipeline v27 from > [1] > and ran src/test/modules/test_libpq/pipeline singlerow, after patching it to > do PQtrace() after PQconn(). Below is the output I get from that. The > noteworthy point is that "ParseComplete" messages appear multiple times > for some reason ... but that's quite odd, because if I look at the network > traffic > with Wireshark I certainly do not see the ParseComplete message being sent > three times. I applied Alvaro's libpq pipeline patch (v33) [1] on top of this libpq trace patch, then traced the src/test/modules/libpq_pipeline/libpq_pipeline.c test_singlerowmode. It's still the same trace output even after applying the fix recommendations of Horiguchi-san. > Parse 38 "" "SELECT generate_series(42, $1)" #0 > Bind 20 "" "" #0 #1 2 '44' #1 #0 > Describe 6 P "" > Execute 9 "" 0 > Parse 38 "" "SELECT generate_series(42, $1)" #0 > Bind 20 "" "" #0 #1 2 '45' #1 #0 > Describe 6 P "" > Execute 9 "" 0 > Parse 38 "" "SELECT generate_series(42, $1)" #0 > Bind 20 "" "" #0 #1 2 '46' #1 #0 > Describe 6 P "" > Execute 9 "" 0 > Sync 4 < ParseComplete 4 < BindComplete 4 < RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0 < DataRow 12 #1 2 '42' < DataRow 12 #1 2 '43' < DataRow 12 #1 2 '44' < CommandComplete 13 "SELECT 3" < ParseComplete 4 < ParseComplete 4 < ParseComplete 4 < BindComplete 4 < RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0 < DataRow 12 #1 2 '42' < DataRow 12 #1 2 '43' < DataRow 12 #1 2 '44' < DataRow 12 #1 2 '45' < CommandComplete 13 "SELECT 4" < ParseComplete 4 < ParseComplete 4 < ParseComplete 4 < BindComplete 4 < RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0 < DataRow 12 #1 2 '42' < DataRow 12 #1 2 '43' < DataRow 12 #1 2 '44' < DataRow 12 #1 2 '45' < DataRow 12 #1 2 '46' < CommandComplete 13 "SELECT 5" < ReadyForQuery 5 I ParseComplete still appears 3 times, but I wonder if that is expected only for pipeline's singlerowmode test. From my observation, when I traced the whole regression test output by putting PQtrace() in fe-connect.c: connectDBComplete(), ParseComplete appears only once or twice, etc. for other commands. In other words, ISTM, it's a case-to-case basis: Some examples from the whole regression trace output is below: a. > Describe 6 P "" > Execute 9 "" 0 > Sync 4 < ParseComplete 4 < BindComplete 4 b. < ErrorResponse 217 S "ERROR" V "ERROR" C "42883" M "function no_such_function(integer) does not exist" H "No function matches the given name and argument types. You might need to add explicit type casts." P "8" F "parse_func.c" L "629" R "ParseFuncOrColumn" \x00 < ReadyForQuery 5 I < ParseComplete 4 < ParseComplete 4 c. > Bind 31 "" "my_insert" #0 #1 4 '9652' #1 #0 > Describe 6 P "" > Execute 9 "" 0 < ParseComplete 4 < ParseComplete 4 < ParseComplete 4 < BindComplete 4 < NoData 4 d. < CommandComplete 15 "INSERT 0 2" < ReadyForQuery 5 T > Parse 89 "i" "insert into test (name, amount, letter) select name, amount+$1, letter from test" #0 > Sync 4 < ParseComplete 4 < ReadyForQuery 5 T As you can see from the examples above, ParseComplete appears in multiples/once depending on the test. As for libpq_pipeline's test_singlerowmode, it appears 3x for the CommandComplete SELECT. I am wondering now whether or not it's a bug. Maybe it's not. Thoughts? How I traced the whole regression test: fe-connect.c: connectDBComplete() @@ -2114,6 +2137,7 @@ connectDBComplete(PGconn *conn) int timeout = 0; int last_whichhost = -2; /* certainly different from whichhost */ struct addrinfo *last_addr_cur = NULL; + FILE *trace_file; if (conn == NULL || conn->status == CONNECTION_BAD) return 0; @@ -2171,6 +2195,9 @@ connectDBComplete(PGconn *conn) switch (flag) { case PGRES_POLLING_OK: + trace_file = fopen("/home/postgres/tracelog/regression_tracepipeline_a.out","a"); + PQtrace(conn, trace_file); + PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS); (2) > From: Kyotaro Horiguchi <horikyota....@gmail.com> > > > This inhibits logCursor being updated. What is worse, I find that > > > logCursor movement is quite dubious. > > > > Using (inCursor - inStart) as logCursor doesn't work correctly if > > tracing state desyncs. Once desync happens inStart can be moved at > > the timing that the tracing code doesn't expect. > - This requires (as I > - mentioned upthread) pqReadData to actively reset logCursor, though. > + So pgReadData needs to avtively reset logCursor. If logCursor is > + actively reset, we no longer need to use (inCursor - inStart) as > + logCursor and it is enough that logCursor follows inCursor. > > > > logCursor should move when bytes are fed to the tracing functoins even > > when theyare the last bytes of a message. Following the advice of Horiguchi-san above, of updating logCursor through pqReadData, I have updated pqReadData and pqCheckInBufferSpace. In pqTraceOutputBeByte1 and pqTraceOutputBeInt. logCursor now uses the inCursor value because pqReadData actively resets the logCursor. In addition, I also fixed the following: (3) In pqTraceOutputBeByte1, I added the missing update of logCursor for the LOG_FIRST_BYTE: + case LOG_FIRST_BYTE: + { + ... + conn->be_msg->logCursor = conn->inCursor; + break; + } (4) In pqTraceOutputBeInt(): I changed 0 to false: - bool logfinish = 0; + bool logfinish = false; I followed this advice and applied to changes for LOG_LENGTH: > + case LOG_LENGTH: > + fprintf(conn->Pfdebug, "%d", v); > -+ conn->be_msg->length = v - length; > ++ conn->be_msg->length = v; > + /* Next, log the message contents */ > + conn->be_msg->state = LOG_CONTENTS; > -+ logfinish = pqTraceMaybeBreakLine(0, conn); > ++ logfinish = pqTraceMaybeBreakLine(length, conn); (5) Because of the changes in pqReadData, I exposed again the following and moved from libpq-trace.c to libq-trace.h: +typedef enum PGLogState +typedef struct pqBackendMessage (6) As reported abovethread, the Assert in pqTraceMaybeBreakLine() still breaks. So I removed it >psql: libpq-trace.c:320: pqTraceMaybeBreakLine: Assertion >`conn->be_msg->length > 0' failed. (7) Not included in this patch is a fix for the bug I found from the whole regression test output. There are many instances of Invalid Protocol, but only for FunctionCallResponse < FunctionCallResponse 8 0 '< :::Invalid Protocol ' See fe-protocol3.c: pqFunctionCall3(). Perhaps it has something to do when the actual_result_len == 0, and it's not integer (pqGetnchar). And somewhere among pqTraceOutputBeInt(), pqTraceMaybeBreakLine(), etc. probably breaks something when message is V, 8, 0 \0. (Referring to (6) above, maybe the reason why Assert(conn->be_msg->length > 0) fails in pqTraceMaybeBreakLine() is because the length may possibly be 0.) Thoughts? fe-protocol3.c: pqFunctionCall3(): case 'V': /* function result */ if (pqGetInt(actual_result_len, 4, conn)) continue; if (*actual_result_len != -1) { if (result_is_int) { if (pqGetInt(result_buf, *actual_result_len, conn)) continue; } else { if (pqGetnchar((char *) result_buf, *actual_result_len, conn)) continue; } } (8) Regarding the exchange of discussions between Tom Lane and Tsunakawa-san about the improved tracing design, Iwata-san is now working on it and will submit the updated patch soon. Regards, Kirk Jamison [1] https://www.postgresql.org/message-id/flat/20210304214017.GA22915%40alvherre.pgsql#bd8f9b26a50256bb6b0d9f273fc2889e
v23-0001-Improve-libpq-tracing-capabilities.patch
Description: v23-0001-Improve-libpq-tracing-capabilities.patch