On Mon, 9 Oct 2023 at 21:17, David Rowley <dgrowle...@gmail.com> wrote:
> Here are some more thoughts on how we could improve this:
>
> 1. Adjust the definition of StringInfoData.maxlen to define that -1
> means the StringInfoData's buffer is externally managed.
> 2. Adjust enlargeStringInfo() to add a check for maxlen = -1 and have
> it palloc, say, pg_next_pow2(str->len * 2) bytes and memcpy the
> existing (externally managed string) into the newly palloc'd buffer.
> 3. Add a new function along the lines of what I originally proposed to
> allow init of a StringInfoData using an existing allocated string
> which sets maxlen = -1.
> 4. Update all the existing places, including the ones I just committed
> (plus the ones you committed in ba1e066e4) to make use of the function
> added in #3.

I just spent the past few hours playing around with the attached WIP
patch to try to clean up the various places where we manually build
StringInfoDatas around the tree.

While working on this, I added an Assert in the new
initStringInfoFromStringWithLen function to ensure that data[len] ==
'\0' per the "There is guaranteed to be a terminating '\0' at
data[len]" comment in stringinfo.h.  It looks like we have some
existing breakers of this rule.

If you apply the attached patch to 608fd198de~1 and ignore the
rejected hunks from the deserial functions, you'll see an Assert
failure during 023_twophase_stream.pl

023_twophase_stream_subscriber.log indicates:
TRAP: failed Assert("data[len] == '\0'"), File:
"../../../../src/include/lib/stringinfo.h", Line: 97, PID: 1073141
postgres: subscriber: logical replication parallel apply worker for
subscription 16396 (ExceptionalCondition+0x70)[0x56160451e9d0]
postgres: subscriber: logical replication parallel apply worker for
subscription 16396 (ParallelApplyWorkerMain+0x53c)[0x5616043618cc]
postgres: subscriber: logical replication parallel apply worker for
subscription 16396 (StartBackgroundWorker+0x20b)[0x56160434452b]

So it seems like we have some existing issues with
LogicalParallelApplyLoop(). The code there does not properly NUL
terminate the StringInfoData.data field. There are some examples in
exec_bind_message() of how that could be fixed. I've CC'd Amit to let
him know about this.

I'll also need to revert 608fd198 as this also highlights that setting
the StringInfoData.data to point to a bytea Datum can't be done either
as those aren't NUL terminated strings.

If people think it's worthwhile having something like the attached to
try to eliminate our need to manually build StringInfoDatas then I can
spend more time on it once LogicalParallelApplyLoop() is fixed.

David
diff --git a/src/backend/replication/logical/applyparallelworker.c 
b/src/backend/replication/logical/applyparallelworker.c
index 82f48a488e..fdfbd45e2c 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -774,10 +774,7 @@ LogicalParallelApplyLoop(shm_mq_handle *mqh)
                        if (len == 0)
                                elog(ERROR, "invalid message length");
 
-                       s.cursor = 0;
-                       s.maxlen = -1;
-                       s.data = (char *) data;
-                       s.len = len;
+                       initStringInfoFromStringWithLen(&s, data, len);
 
                        /*
                         * The first byte of messages sent from leader apply 
worker to
diff --git a/src/backend/replication/logical/proto.c 
b/src/backend/replication/logical/proto.c
index d52c8963eb..0d65e2836b 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -879,6 +879,7 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData 
*tuple)
        /* Read the data */
        for (i = 0; i < natts; i++)
        {
+               char       *buff;
                char            kind;
                int                     len;
                StringInfo      value = &tuple->colvalues[i];
@@ -899,19 +900,17 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData 
*tuple)
                                len = pq_getmsgint(in, 4);      /* read length 
*/
 
                                /* and data */
-                               value->data = palloc(len + 1);
-                               pq_copymsgbytes(in, value->data, len);
+                               buff = palloc(len + 1);
+                               pq_copymsgbytes(in, buff, len);
 
                                /*
                                 * Not strictly necessary for 
LOGICALREP_COLUMN_BINARY, but
                                 * per StringInfo practice.
                                 */
-                               value->data[len] = '\0';
+                               buff[len] = '\0';
 
-                               /* make StringInfo fully valid */
-                               value->len = len;
-                               value->cursor = 0;
-                               value->maxlen = len;
+                               /* make StringInfo */
+                               initStringInfoFromStringWithLen(value, buff, 
len);
                                break;
                        default:
                                elog(ERROR, "unrecognized data representation 
type '%c'", kind);
diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index 597947410f..b4bf3111cf 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3582,10 +3582,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
                                        /* Ensure we are reading the data into 
our memory context. */
                                        
MemoryContextSwitchTo(ApplyMessageContext);
 
-                                       s.data = buf;
-                                       s.len = len;
-                                       s.cursor = 0;
-                                       s.maxlen = -1;
+                                       initStringInfoFromStringWithLen(&s, 
buf, len);
 
                                        c = pq_getmsgbyte(&s);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 21b9763183..17407d9edb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1816,23 +1816,18 @@ exec_bind_message(StringInfo input_message)
 
                        if (!isNull)
                        {
-                               const char *pvalue = 
pq_getmsgbytes(input_message, plength);
+                               char *pvalue = unconstify(char *, 
pq_getmsgbytes(input_message, plength));
 
                                /*
-                                * Rather than copying data around, we just set 
up a phony
-                                * StringInfo pointing to the correct portion 
of the message
-                                * buffer.  We assume we can scribble on the 
message buffer so
-                                * as to maintain the convention that 
StringInfos have a
-                                * trailing null.  This is grotty but is a big 
win when
-                                * dealing with very large parameter strings.
+                                * Rather than copying data around, we just 
initialize a StringInfo
+                                * pointing to the correct portion of the 
message buffer.  We assume we
+                                * can scribble on the message buffer so as to 
maintain the convention
+                                * that StringInfos have a trailing null.  This 
is grotty but is a big win
+                                * when dealing with very large parameter 
strings.
                                 */
-                               pbuf.data = unconstify(char *, pvalue);
-                               pbuf.maxlen = plength + 1;
-                               pbuf.len = plength;
-                               pbuf.cursor = 0;
-
-                               csave = pbuf.data[plength];
-                               pbuf.data[plength] = '\0';
+                               csave = pvalue[plength];
+                               pvalue[plength] = '\0';
+                               initStringInfoFromStringWithLen(&pbuf, pvalue, 
plength);
                        }
                        else
                        {
diff --git a/src/backend/utils/adt/array_userfuncs.c 
b/src/backend/utils/adt/array_userfuncs.c
index 7f87df45df..7731a6a76f 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -722,14 +722,9 @@ array_agg_deserialize(PG_FUNCTION_ARGS)
 
        sstate = PG_GETARG_BYTEA_PP(0);
 
-       /*
-        * Fake up a StringInfo pointing to the bytea's value so we can 
"receive"
-        * the serialized aggregate state value.
-        */
-       buf.data = VARDATA_ANY(sstate);
-       buf.len = VARSIZE_ANY_EXHDR(sstate);
-       buf.maxlen = 0;
-       buf.cursor = 0;
+       initStringInfoFromStringWithLen(&buf,
+                                                                       
VARDATA_ANY(sstate),
+                                                                       
VARSIZE_ANY_EXHDR(sstate));
 
        /* element_type */
        element_type = pq_getmsgint(&buf, 4);
@@ -1133,14 +1128,9 @@ array_agg_array_deserialize(PG_FUNCTION_ARGS)
 
        sstate = PG_GETARG_BYTEA_PP(0);
 
-       /*
-        * Fake up a StringInfo pointing to the bytea's value so we can 
"receive"
-        * the serialized aggregate state value.
-        */
-       buf.data = VARDATA_ANY(sstate);
-       buf.len = VARSIZE_ANY_EXHDR(sstate);
-       buf.maxlen = 0;
-       buf.cursor = 0;
+       initStringInfoFromStringWithLen(&buf,
+                                                                       
VARDATA_ANY(sstate),
+                                                                       
VARSIZE_ANY_EXHDR(sstate));
 
        /* element_type */
        element_type = pq_getmsgint(&buf, 4);
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index f4b885005f..6e7826631f 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -5189,14 +5189,9 @@ numeric_avg_deserialize(PG_FUNCTION_ARGS)
 
        init_var(&tmp_var);
 
-       /*
-        * Fake up a StringInfo pointing to the bytea's value so we can 
"receive"
-        * the serialized aggregate state value.
-        */
-       buf.data = VARDATA_ANY(sstate);
-       buf.len = VARSIZE_ANY_EXHDR(sstate);
-       buf.maxlen = 0;
-       buf.cursor = 0;
+       initStringInfoFromStringWithLen(&buf,
+                                                                       
VARDATA_ANY(sstate),
+                                                                       
VARSIZE_ANY_EXHDR(sstate));
 
        result = makeNumericAggStateCurrentContext(false);
 
@@ -5305,14 +5300,9 @@ numeric_deserialize(PG_FUNCTION_ARGS)
 
        init_var(&tmp_var);
 
-       /*
-        * Fake up a StringInfo pointing to the bytea's value so we can 
"receive"
-        * the serialized aggregate state value.
-        */
-       buf.data = VARDATA_ANY(sstate);
-       buf.len = VARSIZE_ANY_EXHDR(sstate);
-       buf.maxlen = 0;
-       buf.cursor = 0;
+       initStringInfoFromStringWithLen(&buf,
+                                                                       
VARDATA_ANY(sstate),
+                                                                       
VARSIZE_ANY_EXHDR(sstate));
 
        result = makeNumericAggStateCurrentContext(false);
 
@@ -5676,14 +5666,9 @@ numeric_poly_deserialize(PG_FUNCTION_ARGS)
 
        init_var(&tmp_var);
 
-       /*
-        * Fake up a StringInfo pointing to the bytea's value so we can 
"receive"
-        * the serialized aggregate state value.
-        */
-       buf.data = VARDATA_ANY(sstate);
-       buf.len = VARSIZE_ANY_EXHDR(sstate);
-       buf.maxlen = 0;
-       buf.cursor = 0;
+       initStringInfoFromStringWithLen(&buf,
+                                                                       
VARDATA_ANY(sstate),
+                                                                       
VARSIZE_ANY_EXHDR(sstate));
 
        result = makePolyNumAggStateCurrentContext(false);
 
@@ -5867,14 +5852,9 @@ int8_avg_deserialize(PG_FUNCTION_ARGS)
 
        init_var(&tmp_var);
 
-       /*
-        * Fake up a StringInfo pointing to the bytea's value so we can 
"receive"
-        * the serialized aggregate state value.
-        */
-       buf.data = VARDATA_ANY(sstate);
-       buf.len = VARSIZE_ANY_EXHDR(sstate);
-       buf.maxlen = 0;
-       buf.cursor = 0;
+       initStringInfoFromStringWithLen(&buf,
+                                                                       
VARDATA_ANY(sstate),
+                                                                       
VARSIZE_ANY_EXHDR(sstate));
 
        result = makePolyNumAggStateCurrentContext(false);
 
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index ad176651d8..fafd32c116 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -623,21 +623,19 @@ record_recv(PG_FUNCTION_ARGS)
                }
                else
                {
+                       char       *strbuff;
+
                        /*
-                        * Rather than copying data around, we just set up a 
phony
-                        * StringInfo pointing to the correct portion of the 
input buffer.
-                        * We assume we can scribble on the input buffer so as 
to maintain
+                        * Initalize a new StringInfo using the correct portion 
of the input
+                        * buffer.  We assume we can scribble on the input 
buffer so as to maintain
                         * the convention that StringInfos have a trailing null.
                         */
-                       item_buf.data = &buf->data[buf->cursor];
-                       item_buf.maxlen = itemlen + 1;
-                       item_buf.len = itemlen;
-                       item_buf.cursor = 0;
-
+                       strbuff = &buf->data[buf->cursor];
                        buf->cursor += itemlen;
+                       buf->data[buf->cursor] = '\0';
+                       initStringInfoFromStringWithLen(&item_buf, strbuff, 
itemlen);
 
                        csave = buf->data[buf->cursor];
-                       buf->data[buf->cursor] = '\0';
 
                        bufptr = &item_buf;
                        nulls[i] = false;
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 1aff04fa77..ade96bbdc8 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5288,14 +5288,9 @@ string_agg_deserialize(PG_FUNCTION_ARGS)
 
        sstate = PG_GETARG_BYTEA_PP(0);
 
-       /*
-        * Fake up a StringInfo pointing to the bytea's value so we can 
"receive"
-        * the serialized aggregate state value.
-        */
-       buf.data = VARDATA_ANY(sstate);
-       buf.len = VARSIZE_ANY_EXHDR(sstate);
-       buf.maxlen = 0;
-       buf.cursor = 0;
+       initStringInfoFromStringWithLen(&buf,
+                                                                       
VARDATA_ANY(sstate),
+                                                                       
VARSIZE_ANY_EXHDR(sstate));
 
        result = makeStringAggState(fcinfo);
 
diff --git a/src/common/stringinfo.c b/src/common/stringinfo.c
index 05b22b5c53..b87073c924 100644
--- a/src/common/stringinfo.c
+++ b/src/common/stringinfo.c
@@ -30,6 +30,7 @@
 #endif
 
 #include "lib/stringinfo.h"
+#include "port/pg_bitutils.h"
 
 
 /*
@@ -320,6 +321,24 @@ enlargeStringInfo(StringInfo str, int needed)
        if (needed <= str->maxlen)
                return;                                 /* got enough space 
already */
 
+
+       /*
+        * maxlen may be -1 if the string is externally managed.  Check for 
that and
+        * palloc a new buffer instead of reallocating the existing buffer
+        */
+       if (unlikely(str->maxlen == -1))
+       {
+               char *newdata;
+
+               /* make the string no longer read-only */
+               newlen = pg_nextpower2_32(str->len * 2);
+               newdata = (char *) palloc(newlen);
+               memcpy(newdata, str->data, str->len + 1);
+               str->data = newdata;
+               str->maxlen = newlen;
+               return;
+       }
+
        /*
         * We don't want to allocate just a little more space with each append;
         * for efficiency, double the buffer size each time it overflows.
diff --git a/src/include/lib/stringinfo.h b/src/include/lib/stringinfo.h
index 36a416f8e0..b9161d6680 100644
--- a/src/include/lib/stringinfo.h
+++ b/src/include/lib/stringinfo.h
@@ -27,7 +27,10 @@
  *             maxlen  is the allocated size in bytes of 'data', i.e. the 
maximum
  *                             string size (including the terminating '\0' 
char) that we can
  *                             currently store in 'data' without having to 
reallocate
- *                             more space.  We must always have maxlen > len.
+ *                             more space.  We must always have maxlen > len 
except in the
+ *                             case when maxlen is set to -1 in which case 
'data' is classed
+                               "read-only".  Before appending to a read-only 
StringInfoData we
+                               must always copy the 'data' out into a new 
allocation.
  *             cursor  is initialized to zero by makeStringInfo or 
initStringInfo,
  *                             but is not otherwise touched by the 
stringinfo.c routines.
  *                             Some routines use it to scan through a 
StringInfo.
@@ -79,6 +82,26 @@ extern StringInfo makeStringInfo(void);
  */
 extern void initStringInfo(StringInfo str);
 
+/*------------------------
+ * initStringInfoFromStringWithLen
+ * Initialize a StringInfoData struct from an existing string without copying
+ * the string.  The caller is responsible for ensuring the given string
+ * remains valid as long as the StringInfo is.  The given string must be
+ * NUL terminated at 'len' bytes.  Calls to this are used in performance
+ * critical locations where allocating a new buffer and copying would be too
+ * costly.
+ */
+static inline void
+initStringInfoFromStringWithLen(StringInfo str, char *data, int len)
+{
+       Assert(data[len] == '\0');
+
+       str->data = data;
+       str->maxlen = -1;
+       str->len = len;
+       str->cursor = 0;
+}
+
 /*------------------------
  * resetStringInfo
  * Clears the current content of the StringInfo, if any. The

Reply via email to