Miguel Ángel <rosen644...@gmail.com> writes:

>> Isn't it possible to skip Unicode escapes in 'phase7_getc', instead of
>> 'phase5_get'? Like the Python parser?
>>
>
> No problem, but a change in 'phase5_getc' has to be done to store the
> actual character, something like mixed_string_buffer to translate the
> unicode codepoint to the local encoding.

Right. My previous suggestions seem to contradict each other. If we
handle local encodings, extraction needs to be done in 'phase5_get'.

>> > I am not very sure if I have to change always
>> > 'xgettext_current_source_encoding'. I have looked into x-java.c code.
>>
>> The patch sets 'xgettext_current_source_encoding' to UTF-8 when it
>> detects Unicode escapes. I guess it only works if the source code
>> encoding (see "gcc -finput-charset") is UTF-8.
>>
>> I'm also not very sure how to handle this case though, maybe we should
>> adjust to 'xgettext_global_source_encoding', if it is not ASCII?
>
> I have seen that iconv is used in CONVERT_STRING (in xgettext.c) to
> translate each non-ASCII string to UTF-8. Is it the default encoding for
> PO(T) files?

Yes, UTF-8 is the default output encoding. However, the input encoding
can be specified with the --from-code option of xgettext, like this:

  $ xgettext -a --language=C --from-code=ISO-8859-1 -o latin1.po latin1.c

Suppose that latin1.c contains an ISO-8859-1 string with Unicode
escapes. If 'xgettext_current_source_encoding' is set to UTF-8, the
ISO-8859-1 part of the string will be treated as UTF-8 and thus cause
an erroneous conversion.

So I'd suggest first converting the Unicode characters given by Unicode
escapes into the source encoding (in x-c.c), and then letting
'remember_a_message' convert them into UTF-8.

Regards,
-- 
Daiki Ueno
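
A minimal sketch of the two-step conversion suggested above, using plain
iconv(3). This is not code from x-c.c or xgettext.c: the helper names
encode_utf8, convert_bytes and escape_to_source_encoding are hypothetical,
and the sketch assumes a stateless source encoding such as ISO-8859-1 (a
stateful encoding would need an extra flushing iconv call).

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Encode code point UC as UTF-8 into BUF (room for 4 bytes).
   Return the number of bytes written, or 0 if UC is a surrogate
   or lies beyond U+10FFFF.  */
static size_t
encode_utf8 (unsigned int uc, char *buf)
{
  if (uc < 0x80)
    {
      buf[0] = (char) uc;
      return 1;
    }
  if (uc < 0x800)
    {
      buf[0] = (char) (0xC0 | (uc >> 6));
      buf[1] = (char) (0x80 | (uc & 0x3F));
      return 2;
    }
  if (uc < 0x10000)
    {
      if (uc >= 0xD800 && uc <= 0xDFFF)
        return 0;
      buf[0] = (char) (0xE0 | (uc >> 12));
      buf[1] = (char) (0x80 | ((uc >> 6) & 0x3F));
      buf[2] = (char) (0x80 | (uc & 0x3F));
      return 3;
    }
  if (uc < 0x110000)
    {
      buf[0] = (char) (0xF0 | (uc >> 18));
      buf[1] = (char) (0x80 | ((uc >> 12) & 0x3F));
      buf[2] = (char) (0x80 | ((uc >> 6) & 0x3F));
      buf[3] = (char) (0x80 | (uc & 0x3F));
      return 4;
    }
  return 0;
}

/* Convert INLEN bytes at IN from encoding FROM to encoding TO.
   Return the number of bytes stored in OUT, or -1 on failure
   (unknown encoding, unconvertible input, or OUT too small).  */
static long
convert_bytes (const char *from, const char *to,
               const char *in, size_t inlen, char *out, size_t outsize)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  char *inp = (char *) in;
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outsize;
  size_t res = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);

  if (res == (size_t) -1 || inleft != 0)
    return -1;
  return (long) (outsize - outleft);
}

/* Step 1: turn the code point of a \uXXXX escape into bytes of the
   source encoding, so the whole string literal stays in one encoding.  */
static long
escape_to_source_encoding (unsigned int uc, const char *source_encoding,
                           char *out, size_t outsize)
{
  char utf8[4];
  size_t n = encode_utf8 (uc, utf8);
  if (n == 0)
    return -1;
  return convert_bytes ("UTF-8", source_encoding, utf8, n, out, outsize);
}
```

Step 2 is then a single convert_bytes (source_encoding, "UTF-8", ...) over
the complete string, which is roughly the role the message above assigns to
'remember_a_message'. An escape whose character does not exist in the source
encoding (for example \u20AC with ISO-8859-1) is expected to make step 1
fail; how xgettext should react in that case is left open here.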