Lijuan Hai wrote:
> I have a plan to convert UCN to alphabet instead of UTF8 in
> GCC-4.2.0, and already handled it in libcpp.
I would like to offer advice, but I don't understand what you are
trying to do. You say you want to "convert UCN[s] to [an] alphabet
instead of UTF8", but that doesn't make sense: an alphabet is an
abstract set of glyphs commonly used to write a language. It is not
an alternative to UTF-8 (a scheme for encoding integers as sequences
of bytes), nor even to Unicode (a mapping from integers to glyphs).
The only thing I can guess is that you want to convert UCNs to some
specific character set other than Unicode, such as EUC-JP or
ISO-8859-n. In that case, the first thing I must ask is that you read
up on the -fexec-charset option and explain why it doesn't do what
you need it to do.

> But I encountered a problem when compiling the code like following:
> -------------------cut-------------------
> 1: #define str(t) #t
> 2: int foo()
> 3: {
> 4: char* cc = str(\u1234);
> 5: if (!strcmp(cc, "\u1234"))
> 6: abort();
> 7: }
> -------------------cut-------------------
> With my changes, \u1234 is converted to alphabet in line 4 while
> kept in line 5. It's incorrect and also unexpected to convert it in
> line 4 for '#' makes it different from plain identifiers.

As I don't know what you mean by "converted to alphabet", I can't say
for sure, but if I had to guess, I'd say you inserted your code into
the routines for scanning identifiers. At that point there is no way
to know that a '#' is in effect. You need to postpone the conversion,
whatever it is, until much later: the point where cpplib hands
identifiers off to the compiler proper, or perhaps even the assembly
output macros, depending on your goal. (Have you read the long
comment at the top of libcpp/charset.c? Do you understand all of the
fine distinctions made there?)

zw