On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote: > Pursuant to today's discussion at PGCon about code coverage, I went > nosing into some of the particularly under-covered subdirectories > in our tree, and immediately tripped over an interesting factoid: > the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are > untested ... not because the regression tests don't try, but because > those conversions are unreachable. pg_do_encoding_conversion() and > its sister functions have hard-wired fast paths for any conversion > in which the source or target encoding is SQL_ASCII, so that an > encoding conversion function declared for such a case will never > be used.
> This situation seems kinda silly. My inclination is to delete > these functions as useless, but I suppose another approach is > to suppress the fast paths if there's a declared conversion function. > (Doing so would likely require added catalog lookups in places we > might not want them...) Removing the fast paths to make ascii_to_utf8() reachable would cause ERROR when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would otherwise send the client any character outside 7-bit ASCII. That's fairly defensible, but doing it for only UTF8 and MULE_INTERNAL is not. So if we like the ascii_to_utf8() behavior, I think the action would be to replace the fast path with an encoding-independent verification that all bytes are 7-bit ASCII. (The check would not apply when both server_encoding and client_encoding are SQL_ASCII, of course.) Alternately, one might prefer to replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8 case, we'd allow byte sequences that are valid UTF8, even though the validity may be a coincidence and mojibake may ensue. SQL_ASCII is for being casual about encoding, so it's not clear to me whether or not either prospective behavior change would be an improvement. However, I do find it clear to delete ascii_to_utf8() and ascii_to_mic(). > If we do delete them as useless, it might also be advisable to change > CreateConversionCommand() to refuse creation of conversions to/from > SQL_ASCII, to prevent future confusion. Sounds good.