On 08/08/2018 10:14 PM, Jeff Law wrote:
On 08/04/2018 12:46 PM, Martin Sebor wrote:
The sprintf handling of wide characters neglects to consider
that calling the function may fail due to a conversion error
(when the wide character is invalid or not representable in
the current locale). The handling also misinterprets
the POSIX %S wide string directive as a plain narrow %s and
doesn't include %C (the POSIX equivalent of %lc). The attached
patch corrects these oversights by extending the data structures
to indicate when a directive may fail, and extending the UNDER4K
member of the format_result structure to also encode calls with
such directives.
Tested on x86_64-linux.
Besides the trunk, since this bug can affect code correctness
I'd like to backport this patch to both release branches (7
and 8).
Martin
gcc-86853.diff
PR tree-optimization/86853 - sprintf optimization for wide strings doesn't
account for conversion failure
gcc/ChangeLog:
PR tree-optimization/86853
* gimple-ssa-sprintf.c (struct format_result): Rename member.
(struct fmtresult): Add member and initialize it in ctors.
(format_character): Handle %C. Extend range to NUL. Set MAYFAIL.
(format_string): Handle %S the same as %ls. Set MAYFAIL.
(format_directive): Set POSUNDER4K when MAYFAIL is set.
(parse_directive): Handle %C same as %c.
(sprintf_dom_walker::compute_format_length): Adjust.
(is_call_safe): Adjust.
gcc/testsuite/ChangeLog:
PR tree-optimization/86853
* gcc.dg/tree-ssa/builtin-sprintf-10.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-11.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: Adjust.
Mostly OK. One nit noted below and one question. If you're
sufficiently confident that the charmap test is OK, then you just need
to address the nit in the comment and you're good to go.
Index: gcc/gimple-ssa-sprintf.c
===================================================================
--- gcc/gimple-ssa-sprintf.c (revision 263295)
+++ gcc/gimple-ssa-sprintf.c (working copy)
@@ -2217,13 +2224,20 @@ format_character (const directive &dir, tree arg,
res.range.likely = 0;
res.range.unlikely = 0;
}
- else if (min > 0 && min < 128)
+ else if (min >= 0 && min < 128)
{
+ /* Be conservative if the target execution character set
+ is not a 1-to-1 mapping to the source character set or
+ if the source set is not ASCII. */
+ bool one_2_one_ascii
+ = (target_to_host_charmap[0] == 1 && target_to_host ('a') ==
97);
Hmm. Is this really sufficient? I have nowhere near enough knowledge
of the potential target character sets to know if this is sufficiently
tight.
I believe it is safe with all but EBCDIC character sets (in fact,
the target_to_host_charmap[0] == 1 condition is unnecessary: it's
implied by the target-to-host inequality). It's only used to
avoid excessive noise from warnings (max), otherwise the output
range for all multibyte characters is set to [1, MB_LEN_MAX] which
is as permissive as it can be.
I verified it with the available locales on AIX and on Linux.
I don't have access to Solaris but it implements (and documents)
the C99 identity requirement that char == wchar_t for all narrow
chars. Windows uses UTF-16 for wchar_t so it also holds there.
We can also set may_fail to true too. I didn't do it only
because of the above.
Martin
PS I also tested a few oddball Linux charmaps with apparent gaps
in the low 127 bytes. A number of them are invalid and cause
GCC to ICE, e.g., DIN_66003: I track it in bug 82700 (it would
be nice to handle this more gracefully). Some other invalid
ones fail with a source code conversion error (e.g. LATIN-GREEK
which doesn't contain all the letters of the English alphabet).
I haven't found where the 1-to-1 correspondence doesn't hold
@@ -2302,6 +2320,10 @@ format_string (const directive &dir, tree arg, vr_
/* Even a non-empty wide character string need not convert into
any bytes. */
res.range.min = 0;
+
+ /* A non-emtpy wide character conversion may fail. */
s/emtpy/empty/
Jeff