On Sun, May 24, 2015 at 2:43 AM, Noah Misch <n...@leadboat.com> wrote: > On Sat, May 08, 2010 at 09:24:45PM -0400, Tom Lane wrote: >> hgonza...@gmail.com writes: >> > http://sources.redhat.com/bugzilla/show_bug.cgi?id=649 >> >> > The last explains why they do not consider it a bug: >> >> > ISO C99 requires for %.*s to only write complete characters that fit below >> > the >> > precision number of bytes. If you are using say UTF-8 locale, but ISO-8859-1 >> > characters as shown in the input file you provided, some of the strings are >> > not valid UTF-8 strings, therefore sprintf fails with -1 because of the >> > encoding error. That's not a bug in glibc. >> >> Yeah, that was about the position I thought they'd take. > > GNU libc eventually revisited that conclusion and fixed the bug through commit > 715a900c9085907fa749589bf738b192b1a2bda5. RHEL 7.1 is fixed, but RHEL 6.6 and > RHEL 5.11 are still affected; the bug will be relevant for another 8+ years. > >> So the bottom line here is that we're best off to avoid %.*s because >> it may fail if the string contains data that isn't validly encoded >> according to libc's idea of the prevailing encoding. > > Yep. Immediate precisions like %.10s trigger the bug as effectively as %.*s, > so tarCreateHeader() [_tarWriteHeader() in 9.2 and earlier] is also affected. > Switching to strlcpy(), as attached, fixes the bug while simplifying the code. > The bug symptom is error 'pg_basebackup: unrecognized link indicator "0"' when > the name of a file in the data directory is not a valid multibyte string. > > > Commit 6dd9584 introduced a new use of .*s, to pg_upgrade. It works reliably > for now, because it always runs in the C locale. pg_upgrade never calls > set_pglocale_pgservice() or otherwise sets its permanent locale. It would be > natural for us to fix that someday, at which point non-ASCII database names > would perturb this status output.
I caught up the following places that need attention on top of the 4 ones in tar.c: src/backend/nodes/read.c: elog(ERROR, "unrecognized integer: \"%.*s\"", src/backend/nodes/read.c: elog(ERROR, "unrecognized OID: \"%.*s\"", src/backend/nodes/read.c: elog(ERROR, "unrecognized token: \"%.*s\"", tok_len, token); src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized token: \"%.*s\"", length, token); src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized token: \"%.*s\"", length, token); src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized integer: \"%.*s\"", length, token); src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized boolop \"%.*s\"", length, token); src/backend/nodes/readfuncs.c: elog(ERROR, "badly formatted node string \"%.32s\"...", token); src/backend/tsearch/wparser_def.c: * Use of %.*s here is a bit risky since it can misbehave if the data is src/backend/tsearch/wparser_def.c: fprintf(stderr, "parsing \"%.*s\"\n", len, str); src/backend/tsearch/wparser_def.c: /* See note above about %.*s */ src/backend/tsearch/wparser_def.c: fprintf(stderr, "parsing copy of \"%.*s\"\n", prs->lenstr, prs->str); src/backend/utils/adt/datetime.c: * Note: the uses of %.*s in this function would be risky if the src/backend/utils/adt/datetime.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn); src/backend/utils/adt/datetime.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn); src/backend/utils/adt/datetime.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn); src/backend/utils/adt/datetime.c: /* %.*s is safe since all our tokens are ASCII */ src/backend/utils/adt/datetime.c: elog(LOG, "token too long in %s table: \"%.*s\"", src/interfaces/ecpg/ecpglib/error.c: /* %.*s is safe here as long as sqlstate is all-ASCII */ src/interfaces/ecpg/ecpglib/error.c: ecpg_log("raising sqlstate %.*s (sqlcode %ld): %s\n", src/interfaces/ecpg/pgtypeslib/dt_common.c: * Note: the uses of %.*s in this function would be risky if the src/interfaces/ecpg/pgtypeslib/dt_common.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn); src/interfaces/ecpg/pgtypeslib/dt_common.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn); src/interfaces/ecpg/pgtypeslib/dt_common.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn); src/bin/pg_basebackup/pg_basebackup.c: ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)", src/bin/pg_basebackup/pg_basebackup.c: "%*s/%s kB (%d%%), %d/%d tablespaces (%s%-*.*s)", src/bin/pg_upgrade/util.c: printf(" %s%-*.*s\r", > It would be good to purge the code of precisions on "s" conversion specifiers, > then Assert(!pointflag) in fmtstr() to catch new introductions. I won't plan > to do it myself, but it would be a nice little defensive change. This sounds like a good protection idea, but as it impacts existing backend code relying in sprintf's port version we should only do the assertion in HEAD in my opinion, and mention it in the release notes of the next major version at the time a patch in this area is applied. I guess that we had better backpatch the places using .*s though on back-branches. -- Michael