I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator, as is the current
practice with gcc. For those not familiar with this issue, gcc takes
advantage of C99's 5.1.1.2p1b1 "implementation-defined manner" to
convert multibyte end-of-line indicators to newline characters. gcc
treats zero or more whitespace characters followed by a more
traditional CR and/or LF as the end-of-line indicator. This behavior
can make some code behave differently than it does under compilers
that do not strip trailing whitespace from lines. For example:
// comment \
int x;
int y;
Pretend there are one or more spaces or tabs after the '\'. gcc will
interpret this as:
A:
// comment int x;
int y;
while other compilers (Microsoft, EDG-based, and CodeWarrior, to name
a few) interpret it as:
B:
// comment
int x;
int y;
And depending on what you're trying to do, either A or B is the
"correct" answer. I've seen code broken either way (by assuming A
and having the compiler do B, and vice versa).
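To make the difference between A and B concrete, here is a minimal
sketch of the line-splicing decision in C. It is not gcc's actual
code: splice_lines and is_blank are hypothetical names, only spaces
and tabs count as whitespace, and only LF is treated as the newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the only whitespace this sketch considers. */
static bool is_blank(char c) { return c == ' ' || c == '\t'; }

/*
 * Copy src to dst, splicing backslash-continued lines.
 * dst must have room for at least strlen(src) + 1 bytes.
 *
 * strip_trailing == true  -> interpretation A: blanks between the '\'
 *                            and the newline are ignored, so the lines
 *                            are spliced anyway.
 * strip_trailing == false -> interpretation B: only a '\' immediately
 *                            followed by the newline splices.
 */
static void splice_lines(const char *src, char *dst, bool strip_trailing)
{
    assert(src != NULL && dst != NULL);
    size_t n = strlen(src);
    size_t out = 0;
    size_t i = 0;

    while (i < n) {
        if (src[i] == '\\') {
            size_t j = i + 1;
            if (strip_trailing) {
                while (is_blank(src[j]))
                    j++;
            }
            if (src[j] == '\n') {
                /* Drop the backslash, any skipped blanks, and the newline. */
                i = j + 1;
                continue;
            }
        }
        dst[out++] = src[i++];
    }
    dst[out] = '\0';
}
```

Feeding the three-line example above (with a space and a tab after the
'\') through splice_lines with strip_trailing set yields the text shown
in A; with it cleared, the input comes back unchanged, so "int x;"
stays on its own line as in B.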
This issue has recently been discussed on the C standards reflector,
and though I was not privy to that discussion, my understanding is
that the likely resolution from this standards body will be that a
compiler implementing either A or B is conforming.
That being said, gcc, to the best of my knowledge, is the only modern
compiler to implement end-of-line whitespace stripping (yes, I'm aware
of older compilers and dealing with punch cards). So on the basis of
conforming to a de facto standard alone, I propose that gcc abandon
end-of-line whitespace stripping, or at least strip 2 or more
whitespace characters down to 1 space instead of to 0 spaces during
translation phase 1.
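As a rough illustration of the fallback option, here is one possible
reading of "strip 2 or more whitespace characters down to 1 space",
sketched in C. The function name collapse_trailing_blanks is
hypothetical, the line is assumed to have its newline already removed,
and a single trailing blank is assumed to be left as it is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite one line (without its newline) in place
 * so that a run of two or more trailing spaces/tabs becomes a single
 * space. A run of zero or one blank is left alone.
 * Returns the new length of the line.
 */
static size_t collapse_trailing_blanks(char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);
    size_t end = len;

    /* Find where the trailing run of blanks begins. */
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
        end--;

    if (len - end >= 2) {
        line[end] = ' ';
        line[end + 1] = '\0';
        len = end + 1;
    }
    return len;
}
```

Under this reading, a '\' followed by trailing blanks still ends up
followed by a space, so it would not be treated as a line continuation.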
I realize that this change could break some existing code. But I am
also aware of existing code wishing to port to gcc which is broken by
gcc's current behavior. If we want gcc to "gain market share", does
it not make sense to "welcome" newcomers when possible by adopting
what is otherwise industry-wide practice?
Thanks,
Howard