Re: Preparsing sprintf format strings
[EMAIL PROTECTED] (Ross Ridge) writes:

> The entire parsing of the format string is affected by the multi-byte
> character encoding. I don't know how GCC would be able to tell that a byte
> with the same value as '%' in the middle of string would actually be
> interpreted as '%' character rather than a part of an extended multibyte
> character. This can easily happen with the ISO 2022-JP encoding.

The compiler is supposed to know the encoding of the strings.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: Preparsing sprintf format strings
[EMAIL PROTECTED] (Ross Ridge) writes:
> The entire parsing of the format string is affected by the multi-byte
> character encoding. I don't know how GCC would be able to tell that a byte
> with the same value as '%' in the middle of string would actually be
> interpreted as '%' character rather than a part of an extended multibyte
> character. This can easily happen with the ISO 2022-JP encoding.

Andreas Schwab writes:
> The compiler is supposed to know the encoding of the strings.

The compiler can't in general know what encoding printf, fprintf, and
sprintf will use to parse the string. It's locale dependent.

Ross Ridge
Re: Preparsing sprintf format strings
Ross Ridge wrote:
> The compiler can't in general know what encoding that printf, fprintf,
> and sprintf will use to parse the string. It's locale dependent.

Does this mean it can vary from one run of the program to another? I'll
admit I don't understand locales very well, but doesn't this sound like a
recipe for security holes?

Bernd
-- 
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Re: Preparsing sprintf format strings
Andreas Schwab writes:
>> The compiler is supposed to know the encoding of the strings.

Ross Ridge replied:
> The compiler can't in general know what encoding that printf, fprintf,
> and sprintf will use to parse the string. It's locale dependent.

It is undefined what happens if you run a program in a different charset
than the one you specified for -fexec-charset. (locale != charset).

A google code search for printf.*\\x1[bB][($].*%s hints that this is not a
problem in practice.

Paolo
Re: Preparsing sprintf format strings
Andreas Schwab wrote:
> [EMAIL PROTECTED] (Ross Ridge) writes:
>> The entire parsing of the format string is affected by the multi-byte
>> character encoding. I don't know how GCC would be able to tell that a byte
>> with the same value as '%' in the middle of string would actually be
>> interpreted as '%' character rather than a part of an extended multibyte
>> character. This can easily happen with the ISO 2022-JP encoding.
>
> The compiler is supposed to know the encoding of the strings.

More precisely, you should tell the compiler about the input and execution
charset. For example, this should make sure that a statement like this

    printf ("\x1B$B%s\x1B(B");
                   ^^ this %s is not a printf escape!

does not yield a -Wformat warning even with -finput-charset=ISO-2022-JP
-fexec-charset=ISO-2022-JP. Currently, the above program *does* yield a
warning, though (PR33748).

Paolo
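To illustrate why the %s in the statement above is not a conversion when the string is read as ISO-2022-JP, here is a minimal sketch in C. It assumes only the escape sequences ESC $ B and ESC $ @ (switch to two-byte JIS X 0208) and ESC ( B and ESC ( J (switch back to a single-byte set). The function names are hypothetical, and this is not how GCC's -Wformat or any C library parses format strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count every byte equal to '%' (0x25), as a purely byte-oriented
   parser would see them. Hypothetical helper for illustration. */
size_t count_raw_percent(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == '%')
            n++;
    return n;
}

/* Count '%' bytes that occur while the string is in a single-byte
   shift state, assuming ISO-2022-JP escape sequences. Bytes inside a
   two-byte JIS X 0208 run are skipped in pairs and never counted. */
size_t count_ascii_percent(const char *s)
{
    int two_byte = 0;   /* nonzero while in the two-byte state */
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        if (strncmp(s, "\x1B$B", 3) == 0 || strncmp(s, "\x1B$@", 3) == 0) {
            two_byte = 1;                  /* enter JIS X 0208 */
            s += 3;
        } else if (strncmp(s, "\x1B(B", 3) == 0 || strncmp(s, "\x1B(J", 3) == 0) {
            two_byte = 0;                  /* back to a single-byte set */
            s += 3;
        } else if (two_byte) {
            s += (s[1] != '\0') ? 2 : 1;   /* skip one two-byte character */
        } else {
            if (*s == '%')
                n++;
            s++;
        }
    }
    return n;
}
```

For the literal "\x1B$B%s\x1B(B", count_raw_percent finds one 0x25 byte while count_ascii_percent finds none, because the bytes 0x25 0x73 form a single two-byte character in that shift state.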
Re: Preparsing sprintf format strings
Ross Ridge writes:
>The compiler can't in general know what encoding that printf, fprintf,
>and sprintf will use to parse the string. It's locale dependent.

Paolo Bonzini writes:
>It is undefined what happens if you run a program in a different charset
>than in the one you specified for -fexec-charset. (locale != charset).

I don't think that's true, but regardless, many systems have runtime
character sets that are dependent on locale. If GCC doesn't support this,
then GCC is broken.

>A google code search for printf.*\\x1[bB][($].*%s hints that this is
>not a problem in practice.

In practice, probably not. I doubt there are any ASCII-based systems that
actually support stateful encodings like ISO 2022-JP in their C runtimes.
There is at least one EBCDIC-based system that fully supports stateful
encodings, but I don't know if in these encodings '%' byte values can
appear outside of the initial shift state.

Ross Ridge
Re: Preparsing sprintf format strings
Ross Ridge wrote:
>The compiler can't in general know what encoding that printf, fprintf,
>and sprintf will use to parse the string. It's locale dependent.

Bernd Schmidt writes:
>Does this mean it can vary from one run of the program to another?

Yes, that's the whole point of having locales: a single program can work
with more than one language. In fact, locales can change during the
execution of a program.

> I'll admit I don't understand locales very well, but doesn't this
> sound like a recipe for security holes?

A program has to explicitly call setlocale() to change the locale to
anything other than the default "C" locale.

Ross Ridge
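The point about the default locale can be sketched in C as follows. The two helper names are hypothetical. The only library call used is setlocale() from <locale.h>, which returns the current locale name when passed NULL and selects the environment's locale when passed an empty string.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the current LC_CTYPE locale is the default "C" locale.
   Passing NULL only queries the locale; it does not change it. */
int ctype_locale_is_c(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);

    return name != NULL && strcmp(name, "C") == 0;
}

/* Explicitly switch LC_CTYPE to whatever the environment requests.
   Returns 1 on success, 0 if the request could not be honored.
   Which locale this selects depends on the environment at run time. */
int adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

At program startup ctype_locale_is_c() returns 1. Only after an explicit call such as adopt_environment_ctype() can the multibyte encoding seen by the library differ from run to run.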
Re: Preparsing sprintf format strings
[EMAIL PROTECTED] (Ross Ridge) writes:

> Ross Ridge writes:
> >The compiler can't in general know what encoding that printf, fprintf,
> >and sprintf will use to parse the string. It's locale dependent.
>
> Paolo Bonzini writes:
> >It is undefined what happens if you run a program in a different charset
> >than in the one you specified for -fexec-charset. (locale != charset).
>
> I don't think that's true, but regardless many systems have runtime
> character sets that are dependent on locale. If GCC doesn't support this,
> then GCC is broken.

I don't think it's unreasonable to insist that you tell the compiler a
character set that matches the one you are using at execution time for
string literals. GCC does of course fully support varying character sets
at runtime for string *variables*.
Re: Preparsing sprintf format strings
Ross Ridge writes:
> The entire parsing of the format string is affected by the multi-byte
> character encoding. I don't know how GCC would be able to tell that a byte
> with the same value as '%' in the middle of string would actually be
> interpreted as '%' character rather than a part of an extended multibyte
> character. This can easily happen with the ISO 2022-JP encoding.

Michael Meissner writes:
> Yes, and the ISO standard for C says that the compiler must be told what
> locale to use when parsing string constants anyway, since the compiler
> must behave as if it did a mbtowc on the source file.

The compiler needs to know the source character set both to parse the
string literal and to translate it into the execution character set. It
doesn't need to know, nor can it generally know, the locale-dependent
character set that the standard library will use when parsing printf
format strings.

Ross Ridge
Re: Preparsing sprintf format strings
[EMAIL PROTECTED] (Ross Ridge) writes:
> I don't think that's true, but regardless many systems have runtime
> character sets that are dependent on locale. If GCC doesn't support this,
> then GCC is broken.

Geoffrey Keating writes:
> I don't think it's unreasonable to insist that you tell the compiler a
> character set that matches the one you are using at execution time for
> string literals.

It's completely unreasonable. I should be able to put whatever byte values
I want into string literals, using octal and hexadecimal escapes if
necessary, regardless of what the locale might be at runtime or what GCC
thinks the execution character set is. It would be absurd for code like

    fprintf(f, "\xFF\xFF");

to be undefined only because GCC thinks the execution character set is
UTF-8 or ASCII.

Ross Ridge
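As a small check of the claim about escapes, the sketch below inspects the bytes of the same literal. The names ff_bytes and ff_bytes_are_verbatim are hypothetical. It only shows that hexadecimal escapes store the given byte values in the array; it says nothing about how fprintf later interprets them.

```c
/* The literal from the message: two bytes written as hexadecimal escapes. */
const char ff_bytes[] = "\xFF\xFF";

/* Returns 1 if the array holds exactly 0xFF 0xFF followed by the
   terminating NUL, i.e. the escapes were stored byte for byte. */
int ff_bytes_are_verbatim(void)
{
    return sizeof ff_bytes == 3
        && (unsigned char)ff_bytes[0] == 0xFF
        && (unsigned char)ff_bytes[1] == 0xFF
        && ff_bytes[2] == '\0';
}
```

The casts to unsigned char matter because plain char may be signed, in which case ff_bytes[0] would compare as a negative value rather than 0xFF.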
Re: Preparsing sprintf format strings
On Thu, Oct 11, 2007 at 07:57:57PM -0400, [EMAIL PROTECTED] wrote:
> Heikki Linnakangas writes:
> >The only features in the printf-family of functions that depends on the
> >locale are the conversion with thousand grouping ("%'d"), and glibc
> >extension of using locale's alternative output digits ("%Id").
>
> The entire parsing of the format string is affected by the multi-byte
> character encoding. I don't know how GCC would be able to tell that a byte
> with the same value as '%' in the middle of string would actually be
> interpreted as '%' character rather than a part of an extended multibyte
> character. This can easily happen with the ISO 2022-JP encoding.
>
> Ross Ridge

Yes, and the ISO standard for C says that the compiler must be told what
locale to use when parsing string constants anyway, since the compiler must
behave as if it did a mbtowc on the source file.

For example, when I was on the ISO X3J11 standards committee that
eventually produced the C-90 standard, one of the considerations was that
it might be possible to have a multibyte encoding that used " or ' as the
second byte. ISO 2022-JP was certainly one of the encodings that were
talked about in the meetings.

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
[EMAIL PROTECTED]
gcc-4.3-20071012 is now available
Snapshot gcc-4.3-20071012 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20071012/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 129277

You'll find:

gcc-4.3-20071012.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.3-20071012.tar.bz2       C front end and core compiler
gcc-ada-4.3-20071012.tar.bz2        Ada front end and runtime
gcc-fortran-4.3-20071012.tar.bz2    Fortran front end and runtime
gcc-g++-4.3-20071012.tar.bz2        C++ front end and runtime
gcc-java-4.3-20071012.tar.bz2       Java front end and runtime
gcc-objc-4.3-20071012.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.3-20071012.tar.bz2  The GCC testsuite

Diffs from 4.3-20071005 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
re: Should build-sysroot be considered for setting inhibit_libc?
Stephen M. Kenton asked:
> Should specifying newlib in the absence of the newlib source continue
> to be treated as meaning "force inhibit_libc" in some cases, or should
> inhibit_libc just be exposed if that is desirable?

FWIW, crosstool.sh has this little snippet in it:

    # Building the bootstrap gcc requires either setting inhibit_libc, or
    # having a copy of stdio_lim.h... see
    # http://sources.redhat.com/ml/libc-alpha/2003-11/msg00045.html
    cp bits/stdio_lim.h $HEADERDIR/bits/stdio_lim.h

If it'd be cleaner to let the caller directly force inhibit_libc, please do.

- Dan