Hi all,

The Fortran front-end now handles wide character strings (UCS-4/UTF-32); for these, the string literals are emitted as strings whose type is an array of unsigned 32-bit integers. The issue is that pretty_print_string(), in tree-pretty-print.c, assumes that strings are composed of chars and are NUL-terminated. This fails, for example, if you look at the tree dump for the following Fortran source file:

  subroutine foo
    call test(4_"I'm here!")
  end subroutine foo

you currently get:

  foo ()
  {
    test (&"I"[1]{lb: 1 sz: 4}, 9);

On my little-endian compiler, "I'm here!" is in UTF-32: "I\0\0\0'\0\0\0m\0\0\0 \0\0\0h\0\0\0e\0\0\0r\0\0\0e\0\0\0!\0\0\0". So, tree-pretty-print.c stops at the first '\0', and we get "I".

To make this work better, as STRING_CSTs have an attached length (TREE_STRING_LENGTH), I suggest using that to output the full string, instead of stopping at the first NUL character. With that patch, the tree dump for the same Fortran source file looks like this:

  test (&"I\0\0\0\'\0\0\0m\0\0\0 \0\0\0h\0\0\0e\0\0\0r\0\0\0e\0\0\0!\0\0\0"[1]{lb: 1 sz: 4}, 9);

and the tree dump for the following C testcase:

  unsigned char *foo(void) { return "look\0here"; }

which was like this:

  return (unsigned char *) "look";

is now like this:

  return (unsigned char *) "look\0here\0";

Notice the added final '\0' in the C case; I don't know if it's bad to have it there, but I don't see a way to not output it and still have the correct output for Fortran (whose strings are not NUL-terminated).

Any comments? Is it OK to commit as is? It bootstraps and regtests fine on x86_64-linux, with C and Fortran enabled, except for gcc.dg/tree-ssa/builtin-{v,}{f,}printf-1.c, which need their scan-tree-dump patterns adjusted accordingly. If there is no objection, I'll do that and build and regtest C++, objc and objc++ as well before going ahead.

Thanks,
FX

--
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/
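
P.S. For readers who want to see the idea in isolation, here is a minimal standalone sketch of length-driven escaping. It is not the attached patch and does not use any GCC internals: escape_bytes() and escape_cstring() are hypothetical names, the explicit length argument plays the role of TREE_STRING_LENGTH, and the set of escapes (including the \xNN fallback) is only an assumption about what a dumper might want to print.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC code: write the escaped form of the LEN
   bytes at STR into OUT (capacity OUT_SIZE).  Embedded NULs are printed
   as \0 instead of ending the string.  Output is truncated if OUT is
   too small.  Returns OUT.  */
static char *
escape_bytes (const char *str, size_t len, char *out, size_t out_size)
{
  size_t used = 0;

  assert (out_size > 0);
  out[0] = '\0';

  for (size_t i = 0; i < len; i++)
    {
      char tmp[8];
      unsigned char c = (unsigned char) str[i];

      switch (c)
        {
        case '\0': strcpy (tmp, "\\0");  break;
        case '\n': strcpy (tmp, "\\n");  break;
        case '\t': strcpy (tmp, "\\t");  break;
        case '\\': strcpy (tmp, "\\\\"); break;
        case '"':  strcpy (tmp, "\\\""); break;
        case '\'': strcpy (tmp, "\\'");  break;
        default:
          if (c >= 0x20 && c < 0x7f)
            {
              /* Printable ASCII is copied through unchanged.  */
              tmp[0] = (char) c;
              tmp[1] = '\0';
            }
          else
            /* Assumed fallback for other bytes: hexadecimal escape.  */
            snprintf (tmp, sizeof tmp, "\\x%02x", c);
          break;
        }

      size_t n = strlen (tmp);
      if (used + n >= out_size)
        break;                  /* Truncate rather than overflow.  */
      memcpy (out + used, tmp, n + 1);
      used += n;
    }

  return out;
}

/* For comparison: the NUL-terminated assumption, which stops at the
   first '\0' byte.  */
static char *
escape_cstring (const char *str, char *out, size_t out_size)
{
  return escape_bytes (str, strlen (str), out, out_size);
}
```

Called on the C literal "look\0here" with its full size of 10 bytes, escape_bytes() produces look\0here\0, while escape_cstring() produces only look. That matches the difference between the two dumps shown above, including the extra trailing \0.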
wide_char_part6_gcc.diff