Re: Preparsing sprintf format strings

2007-10-12 Thread Andreas Schwab
[EMAIL PROTECTED] (Ross Ridge) writes:

> The entire parsing of the format string is affected by the multi-byte
> character encoding.  I don't know how GCC would be able to tell that a
> byte with the same value as '%' in the middle of a string would actually
> be interpreted as a '%' character rather than as part of an extended
> multibyte character.  This can easily happen with the ISO 2022-JP encoding.

The compiler is supposed to know the encoding of the strings.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Preparsing sprintf format strings

2007-10-12 Thread Ross Ridge
[EMAIL PROTECTED] (Ross Ridge) writes:
> The entire parsing of the format string is affected by the multi-byte
> character encoding.  I don't know how GCC would be able to tell that a
> byte with the same value as '%' in the middle of a string would actually
> be interpreted as a '%' character rather than as part of an extended
> multibyte character.  This can easily happen with the ISO 2022-JP encoding.

Andreas Schwab writes:
> The compiler is supposed to know the encoding of the strings.

The compiler can't in general know what encoding printf, fprintf,
and sprintf will use to parse the string.  It's locale dependent.

Ross Ridge




Re: Preparsing sprintf format strings

2007-10-12 Thread Bernd Schmidt
Ross Ridge wrote:
> The compiler can't in general know what encoding printf, fprintf,
> and sprintf will use to parse the string.  It's locale dependent.

Does this mean it can vary from one run of the program to another?  I'll
admit I don't understand locales very well, but doesn't this sound like
a recipe for security holes?


Bernd
-- 
This footer brought to you by insane German lawmakers.
Analog Devices GmbH  Wilhelm-Wagenfeld-Str. 6  80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif


Re: Preparsing sprintf format strings

2007-10-12 Thread Paolo Bonzini



> Andreas Schwab writes:
>> The compiler is supposed to know the encoding of the strings.
>
> The compiler can't in general know what encoding printf, fprintf,
> and sprintf will use to parse the string.  It's locale dependent.

It is undefined what happens if you run a program in a different charset
than the one you specified for -fexec-charset.  (locale != charset).

A Google Code Search for printf.*\\x1[bB][($].*%s hints that this is not
a problem in practice.


Paolo


Re: Preparsing sprintf format strings

2007-10-12 Thread Paolo Bonzini

Andreas Schwab wrote:
> [EMAIL PROTECTED] (Ross Ridge) writes:
>
>> The entire parsing of the format string is affected by the multi-byte
>> character encoding.  I don't know how GCC would be able to tell that a
>> byte with the same value as '%' in the middle of a string would actually
>> be interpreted as a '%' character rather than as part of an extended
>> multibyte character.  This can easily happen with the ISO 2022-JP encoding.
>
> The compiler is supposed to know the encoding of the strings.

More precisely, you should tell the compiler about the input and
execution charset.  For example, this should make sure that a statement
like this

    printf ("\x1B$B%s\x1B(B");
                   ^^
                   this %s is not a printf escape!

does not yield a -Wformat warning when compiled with
-finput-charset=ISO-2022-JP -fexec-charset=ISO-2022-JP.

Currently, the above statement *does* yield a warning, though (PR33748).
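To make the ambiguity concrete, here is a minimal C sketch that scans a byte string two ways: naively, and while tracking a simplified ISO-2022-JP shift state. It assumes only the ESC ( B / ESC ( J designations (one byte per character) and ESC $ @ / ESC $ B (two bytes per character, JIS X 0208) matter. Both function names are hypothetical, and neither reflects how GCC's -Wformat checker or any C library actually parses formats.

```c
#include <stddef.h>

/* Hypothetical helper: treat every 0x25 byte as a conversion introducer. */
size_t count_percent_naive(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == '%')
            n++;
    return n;
}

/* Hypothetical helper: count '%' characters while tracking a simplified
   ISO-2022-JP shift state.  Only ESC ( B / ESC ( J (one byte per char)
   and ESC $ @ / ESC $ B (two bytes per char) are recognized. */
size_t count_percent_iso2022jp(const char *s, size_t len)
{
    size_t n = 0;
    int two_byte = 0;            /* 0 = initial (ASCII) shift state */
    size_t i = 0;

    while (i < len) {
        unsigned char c = (unsigned char)s[i];
        if (c == 0x1B && i + 2 < len) {
            unsigned char a = (unsigned char)s[i + 1];
            unsigned char b = (unsigned char)s[i + 2];
            if (a == '$' && (b == '@' || b == 'B')) {
                two_byte = 1;    /* switch to JIS X 0208 */
                i += 3;
                continue;
            }
            if (a == '(' && (b == 'B' || b == 'J')) {
                two_byte = 0;    /* back to a single-byte set */
                i += 3;
                continue;
            }
        }
        if (two_byte) {
            i += 2;              /* both bytes belong to one character */
        } else {
            if (c == '%')
                n++;
            i += 1;
        }
    }
    return n;
}
```

For "\x1B$B%s\x1B(B" the naive count is 1 and the shift-aware count is 0: the 0x25 byte is the first half of a two-byte JIS X 0208 character, so a checker has to know the charset to get this right.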

Paolo


Re: Preparsing sprintf format strings

2007-10-12 Thread Ross Ridge
Ross Ridge writes:
>The compiler can't in general know what encoding printf, fprintf,
>and sprintf will use to parse the string.  It's locale dependent.

Paolo Bonzini writes:
>It is undefined what happens if you run a program in a different charset
>than in the one you specified for -fexec-charset. (locale != charset).

I don't think that's true, but regardless many systems have runtime
character sets that are dependent on locale.  If GCC doesn't support this,
then GCC is broken.

>A google code search for printf.*\\x1[bB][($].*%s hints that this is
>not a problem in practice.

In practice, probably not.  I doubt there are any ASCII-based systems that
actually support stateful encodings like ISO 2022-JP in their C runtimes.
There is at least one EBCDIC-based system that fully supports stateful
encodings, but I don't know if in these encodings '%' byte values can
appear outside of the initial shift state.

Ross Ridge



Re: Preparsing sprintf format strings

2007-10-12 Thread Ross Ridge
Ross Ridge wrote:
>The compiler can't in general know what encoding printf, fprintf,
>and sprintf will use to parse the string.  It's locale dependent.

Bernd Schmidt writes:
>Does this mean it can vary from one run of the program to another? 

Yes, that's the whole point of having locales: a single program can
work with more than one language.  In fact, the locale can change during
the execution of a program.

> I'll admit I don't understand locales very well, but doesn't this
> sound like a recipe for security holes?

A program has to explicitly call setlocale() to change the locale to
anything other than the default "C" locale.
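A small C sketch of this point: every hosted C program starts as if setlocale(LC_ALL, "C") had been called, and only an explicit setlocale() call switches LC_CTYPE (the category that governs how multibyte strings, printf formats included, are interpreted) to whatever the environment names. The helper names are hypothetical; what "" resolves to depends on LANG and the LC_* variables at run time.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: name of the current LC_CTYPE locale.
   Passing NULL queries the locale without changing it. */
const char *current_ctype(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name ? name : "(unknown)";
}

/* Hypothetical helper: switch LC_CTYPE to the locale selected by the
   environment (LC_ALL, LC_CTYPE, LANG).  Returns 0 on success, -1 if
   that locale is not available. */
int adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") ? 0 : -1;
}

/* Print the active LC_CTYPE and the longest multibyte character it allows. */
void report_ctype(FILE *out)
{
    fprintf(out, "LC_CTYPE=%s, MB_CUR_MAX=%zu\n",
            current_ctype(), (size_t)MB_CUR_MAX);
}
```

Calling report_ctype() before and after adopt_environment_ctype() under different LANG settings (for example LANG=C versus a UTF-8 locale, if one is installed) shows the first report staying at "C" while the second follows the environment, which is how the same binary can interpret the same format bytes differently from one run to the next.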

Ross Ridge



Re: Preparsing sprintf format strings

2007-10-12 Thread Geoffrey Keating
[EMAIL PROTECTED] (Ross Ridge) writes:

> Ross Ridge writes:
> >The compiler can't in general know what encoding printf, fprintf,
> >and sprintf will use to parse the string.  It's locale dependent.
> 
> Paolo Bonzini writes:
> >It is undefined what happens if you run a program in a different charset
> >than in the one you specified for -fexec-charset. (locale != charset).
> 
> I don't think that's true, but regardless many systems have runtime
> character sets that are dependent on locale.  If GCC doesn't support this,
> then GCC is broken.

I don't think it's unreasonable to insist that you tell the compiler a
character set that matches the one you are using at execution time for
string literals.  GCC does of course fully support varying character
sets at runtime for string *variables*.


Re: Preparsing sprintf format strings

2007-10-12 Thread Ross Ridge
Ross Ridge writes:
> The entire parsing of the format string is affected by the multi-byte
> character encoding.  I don't know how GCC would be able to tell that a
> byte with the same value as '%' in the middle of a string would actually
> be interpreted as a '%' character rather than as part of an extended
> multibyte character.  This can easily happen with the ISO 2022-JP encoding.

Michael Meissner writes:
> Yes, and the ISO standard for C says that the compiler must be told what
> locale to use when parsing string constants anyway, since the compiler
> must behave as if it did a mbtowc on the source file.

The compiler needs to know the source character set both to parse the
string literal and to translate it into the execution character set.
It doesn't need to know, nor can it generally know, the locale dependent
character set that the standard library will use when parsing printf
format strings.
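The distinction drawn here can be sketched in C. The standard describes a printf format as a multibyte character sequence beginning and ending in its initial shift state, so locating '%' characters properly means decoding with the current LC_CTYPE rather than scanning bytes. The sketch below does that with the standard mbrtowc(); the function name is hypothetical, and real C libraries are free to implement their format parsers differently.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count L'%' characters in a multibyte string,
   decoding it with mbrtowc() under the current LC_CTYPE locale.
   Returns (size_t)-1 if the string is not valid in that locale. */
size_t count_percent_mb(const char *s)
{
    mbstate_t st;
    size_t n = 0;
    size_t left = strlen(s);

    memset(&st, 0, sizeof st);          /* begin in the initial shift state */
    while (left > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, s, left, &st);
        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (used == 0)
            break;                      /* decoded a null wide character */
        if (wc == L'%')
            n++;
        s += used;
        left -= used;
    }
    return n;
}
```

The result for the same bytes can differ between locales, which is the part the compiler cannot see at build time; in the default "C" locale every byte is one character, so it degenerates to a byte count.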

Ross Ridge



Re: Preparsing sprintf format strings

2007-10-12 Thread Ross Ridge
[EMAIL PROTECTED] (Ross Ridge) writes:
> I don't think that's true, but regardless many systems have runtime
> character sets that are dependent on locale.  If GCC doesn't support this,
> then GCC is broken.

Geoffrey Keating writes:
> I don't think it's unreasonable to insist that you tell the compiler a
> character set that matches the one you are using at execution time for
> string literals.

It's completely unreasonable.  I should be able to put whatever byte values
I want into string literals, using octal and hexadecimal escapes if
necessary, regardless of what the locale might be at runtime or what GCC
thinks the execution character set is.  It would be absurd for code
like fprintf(f, "\xFF\xFF"); to be undefined only because GCC thinks the
execution character set is UTF-8 or ASCII.
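The literal half of this argument can be checked in C: a hexadecimal escape's numeric value specifies the byte directly, so -fexec-charset does not translate it. The sketch assumes 8-bit char and a byte-oriented printf that copies non-'%' bytes through (current glibc and musl behave this way); whether the standard guarantees that when 0xFF is not a valid character in the current locale is exactly what is being argued. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Two bytes of value 0xFF plus the terminating null.  The escape's
   numeric value is the byte value; no charset conversion applies. */
const char ff_ff[] = "\xFF\xFF";

/* Hypothetical helper: format Ross's example string into buf and return
   snprintf's result (the output length, excluding the terminating null). */
int format_ff(char *buf, size_t size)
{
    /* The format contains no '%' byte, so a byte-oriented printf
       implementation copies both bytes through unchanged. */
    return snprintf(buf, size, "\xFF\xFF");
}
```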

Ross Ridge



Re: Preparsing sprintf format strings

2007-10-12 Thread Michael Meissner
On Thu, Oct 11, 2007 at 07:57:57PM -0400, [EMAIL PROTECTED] wrote:
> Heikki Linnakangas writes:
> >The only features in the printf family of functions that depend on the
> >locale are the conversion with thousands grouping ("%'d") and the glibc
> >extension of using the locale's alternative output digits ("%Id").
> 
> The entire parsing of the format string is affected by the multi-byte
> character encoding.  I don't know how GCC would be able to tell that a
> byte with the same value as '%' in the middle of a string would actually
> be interpreted as a '%' character rather than as part of an extended
> multibyte character.  This can easily happen with the ISO 2022-JP encoding.
> 
>   Ross Ridge

Yes, and the ISO standard for C says that the compiler must be told what locale
to use when parsing string constants anyway, since the compiler must behave as
if it did a mbtowc on the source file.  For example, when I was on the ISO
X3J11 standards committee that eventually produced the C-90 standard, one of
the considerations was that it might be possible to have a multibyte encoding
that used " or ' as the second byte.  ISO 2022-JP was certainly one of the
encodings that were talked about in the meetings.
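The quote-byte case is easy to reproduce with ISO-2022-JP: the hiragana あ is JIS X 0208 code 0x2422, so in two-byte mode it is the byte pair 0x24 0x22, and 0x22 is the code for '"'. In the sketch below the bytes are spelled out with escapes so the source stays ASCII; the helper name is hypothetical, and the scan is deliberately byte-oriented to show what a lexer that ignored the source encoding would see.

```c
#include <stddef.h>
#include <string.h>

/* ISO-2022-JP encoding of the single character あ (JIS X 0208 0x2422):
   ESC $ B enters two-byte mode, 0x24 0x22 is the character,
   ESC ( B returns to ASCII. */
const unsigned char hiragana_a_jis[] = {
    0x1B, '$', 'B', 0x24, 0x22, 0x1B, '(', 'B'
};

/* Hypothetical helper: offset of the first byte equal to '"', or -1.
   A byte-oriented lexer would (wrongly) end a string literal here. */
long first_quote_byte(const unsigned char *buf, size_t len)
{
    const unsigned char *p = memchr(buf, '"', len);
    return p ? (long)(p - buf) : -1L;
}
```

Here the search reports offset 4, the second byte of あ, which is why the compiler has to decode the source with the right multibyte encoding before it can even find the end of a string literal.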

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
[EMAIL PROTECTED]




gcc-4.3-20071012 is now available

2007-10-12 Thread gccadmin
Snapshot gcc-4.3-20071012 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20071012/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 129277

You'll find:

gcc-4.3-20071012.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.3-20071012.tar.bz2       C front end and core compiler

gcc-ada-4.3-20071012.tar.bz2        Ada front end and runtime

gcc-fortran-4.3-20071012.tar.bz2    Fortran front end and runtime

gcc-g++-4.3-20071012.tar.bz2        C++ front end and runtime

gcc-java-4.3-20071012.tar.bz2       Java front end and runtime

gcc-objc-4.3-20071012.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.3-20071012.tar.bz2  The GCC testsuite

Diffs from 4.3-20071005 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


re: Should build-sysroot be considered for setting inhibit_libc?

2007-10-12 Thread Dan Kegel
Stephen M. Kenton asked:
> Should specifying newlib in the absence of the newlib source continue
> to be treated as meaning "force inhibit_libc" in some cases, or should
> inhibit_libc just be exposed if that is desirable?

FWIW, crosstool.sh has this little snippet in it:

# Building the bootstrap gcc requires either setting inhibit_libc, or
# having a copy of stdio_lim.h... see
# http://sources.redhat.com/ml/libc-alpha/2003-11/msg00045.html
cp bits/stdio_lim.h $HEADERDIR/bits/stdio_lim.h

If it'd be cleaner to let the caller directly force inhibit_libc,
please do.
- Dan