Hi,

replace_text() in varlena.c builds the result in a StringInfo buffer, and finishes by copying it into a freshly allocated varlena structure with cstring_to_text_with_len(), in the same memory context.

It looks like that copy step could be avoided by prepending the varlena header to the StringInfo to begin with, and returning the buffer as a text *, as in the attached patch.

On large strings, the time saved can be significant. For instance, I'm seeing a ~20% decrease in total execution time on a test with lengths in the 2-3 MB range, like this:

  select sum(length(
    replace(repeat('abcdefghijklmnopqrstuvwxyz', i*10), 'abc', 'ABC')
  )) from generate_series(10000,12000) as i;

Also, at a glance, there are a few other functions with similar StringInfo-to-varlena copies that seem avoidable: concat_internal(), text_format(), replace_text_regexp().

Are there reasons not to do this? Otherwise, should it be considered in a more principled way, such as adding functions like void InitStringInfoForVarlena() and text *StringInfoAsVarlena() to the StringInfo API?

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 693ccc5..3df54ed 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4136,6 +4136,10 @@ replace_text(PG_FUNCTION_ARGS)
 
 	initStringInfo(&str);
 
+	/* allocate a varlena header at the start of the stringinfo */
+	enlargeStringInfo(&str, VARHDRSZ);
+	str.len += VARHDRSZ;
+
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
@@ -4160,8 +4164,8 @@ replace_text(PG_FUNCTION_ARGS)
 
 	text_position_cleanup(&state);
 
-	ret_text = cstring_to_text_with_len(str.data, str.len);
-	pfree(str.data);
+	ret_text = (text*) str.data;
+	SET_VARSIZE(ret_text, str.len); /* VARHDRSZ is already included in str.len */
 
 	PG_RETURN_TEXT_P(ret_text);
 }
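
To make the API question more concrete, here is a rough standalone model of what the two suggested helpers might do. It is only a sketch and does not use PostgreSQL headers: Buf, HDRSZ, buf_init_for_varlena(), buf_append(), buf_as_varlena() and varlena_size() are hypothetical stand-ins for StringInfoData, VARHDRSZ, InitStringInfoForVarlena(), appendBinaryStringInfo(), StringInfoAsVarlena() and VARSIZE(). The header is assumed to be a plain 4-byte length, which is simpler than what SET_VARSIZE() really stores.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: a plain 4-byte length word stands in for VARHDRSZ. */
#define HDRSZ ((size_t) sizeof(uint32_t))

/* Simplified stand-in for StringInfoData (not the real struct). */
typedef struct
{
	char	   *data;
	size_t		len;		/* bytes used, header included */
	size_t		maxlen;		/* bytes allocated */
} Buf;

/* Make sure there is room for "needed" more bytes plus a trailing NUL. */
static void
buf_enlarge(Buf *b, size_t needed)
{
	size_t		want = b->len + needed + 1;
	size_t		newmax = b->maxlen;

	if (want <= b->maxlen)
		return;
	while (newmax < want)
		newmax *= 2;
	b->data = realloc(b->data, newmax);
	assert(b->data != NULL);
	b->maxlen = newmax;
}

/* Hypothetical InitStringInfoForVarlena(): reserve the header up front. */
static void
buf_init_for_varlena(Buf *b)
{
	b->maxlen = 64;
	b->data = malloc(b->maxlen);
	assert(b->data != NULL);
	memset(b->data, 0, HDRSZ);
	b->len = HDRSZ;			/* payload will start right after the header */
	b->data[b->len] = '\0';
}

/* Append n bytes of payload after whatever is already in the buffer. */
static void
buf_append(Buf *b, const char *src, size_t n)
{
	buf_enlarge(b, n);
	memcpy(b->data + b->len, src, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/*
 * Hypothetical StringInfoAsVarlena(): write the total size (header
 * included) into the reserved header and hand the buffer over as-is,
 * with no second allocation and no copy.
 */
static char *
buf_as_varlena(Buf *b)
{
	uint32_t	total = (uint32_t) b->len;

	memcpy(b->data, &total, sizeof(total));
	return b->data;
}

/* Read back the total size stored in the header. */
static uint32_t
varlena_size(const char *v)
{
	uint32_t	total;

	memcpy(&total, v, sizeof(total));
	return total;
}
```

A caller would run buf_init_for_varlena(), append its chunks, and return the pointer from buf_as_varlena() directly. One thing the model makes visible is that the returned chunk keeps whatever slack the doubling strategy left in maxlen, whereas the copy made by cstring_to_text_with_len() is sized exactly.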