Re: [bug-gettext] [RFC Patch] Implement \u support in xgettext for C family (C11/C++11)

Daiki Ueno Fri, 15 Feb 2013 02:06:31 -0800

Miguel Ángel <rosen644...@gmail.com> writes:

>> Isn't it possible to skip Unicode escapes in 'phase7_getc', instead of
>> 'phase5_get'?  Like the Python parser?
>> 
>
> No problem, but a change in 'phase5_getc' has to be done to store the
> actual character, something like mixed_string_buffer to translate the
> unicode codepoint to the local encoding.


Right.  My previous suggestions seem to contradict each other.  If we
handle local encodings, extraction needs to be done in 'phase5_get'.

>> > I am not very sure if I have to change always
>> > 'xgettext_current_source_encoding'. I have looked into x-java.c code.
>> 
>> The patch sets 'xgettext_current_source_encoding' to UTF-8 when it
>> detects Unicode escapes.  I guess it only works if the source code
>> encoding (see "gcc -finput-charset") is UTF-8.
>> 
>> I'm also not very sure how to handle this case though, maybe we should
>> adjust to 'xgettext_global_source_encoding', if it is not ASCII?
>
> I have seen that iconv is used in CONVERT_STRING (in xgettext.c) to
> translate each non-ASCII string to UTF-8. Is it the default encoding for
> PO(T) files?

Yes, UTF-8 is the default output encoding.  However, the input encoding
can be specified with the --from-code option of xgettext, like this:

$ xgettext -a --language=C --from-code=ISO-8859-1 -o latin1.po latin1.c

Suppose that latin1.c contains an ISO-8859-1 string with Unicode
escapes.  If 'xgettext_current_source_encoding' is set to UTF-8, the
ISO-8859-1 part of the string will be treated as UTF-8, and the
conversion will go wrong.
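
To make the failure concrete, here is a small sketch in Python (not
xgettext code; the variable and helper names are made up for
illustration).  It builds a buffer in which the literal part is
ISO-8859-1 and the character from a \u escape has been stored as UTF-8,
and shows that no single encoding decodes it correctly:

```python
# Literal text as it sits in an ISO-8859-1 source file.
latin1_part = "caf\u00e9 ".encode("iso-8859-1")   # b'caf\xe9 '
# Character produced from a \u20ac escape, stored as UTF-8 (assumed behaviour
# of a patch that switches the current source encoding to UTF-8).
escape_part = "\u20ac".encode("utf-8")            # b'\xe2\x82\xac'
mixed = latin1_part + escape_part


def decode_as(data, encoding):
    """Return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Treating the whole buffer as UTF-8 fails on the ISO-8859-1 byte 0xE9.
as_utf8 = decode_as(mixed, "utf-8")
# Treating it as ISO-8859-1 "works" but garbles the escaped character.
as_latin1 = decode_as(mixed, "iso-8859-1")

print(as_utf8)
print(repr(as_latin1))
```

Either way the resulting msgid would be wrong, which is why the buffer
should be in one encoding before it is converted.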

So I'd suggest first converting the characters given by Unicode escapes
into the source encoding (in x-c.c), and then letting
'remember_a_message' convert the whole string into UTF-8.
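
A rough sketch of that two-step flow, in Python rather than C, just to
illustrate the idea.  'expand_escapes' and 'to_utf8' are hypothetical
names, not functions in xgettext; the second one only stands in for the
conversion that happens later.  The sketch ignores escaped backslashes
and assumes the escaped character is representable in the source
encoding; what to do when it is not is left open here.

```python
import re

# Matches \uXXXX and \UXXXXXXXX in raw source bytes (simplified: does not
# account for an escaped backslash in front of the 'u').
_ESCAPE = re.compile(rb"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def expand_escapes(raw, source_encoding):
    """Step 1: replace each Unicode escape by its bytes in the source encoding."""
    def repl(match):
        code_point = int(match.group(1) or match.group(2), 16)
        # Raises UnicodeEncodeError if the character does not exist
        # in the source encoding.
        return chr(code_point).encode(source_encoding)
    return _ESCAPE.sub(repl, raw)


def to_utf8(raw, source_encoding):
    """Step 2: convert the now uniformly encoded string to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Bytes of a string literal in an ISO-8859-1 file: 'café ' plus \u00fc.
literal = b"caf\xe9 \\u00fc"
in_source_encoding = expand_escapes(literal, "iso-8859-1")
in_utf8 = to_utf8(in_source_encoding, "iso-8859-1")
print(in_source_encoding, in_utf8)
```

With this ordering the string is in a single encoding at every step, so
the --from-code setting keeps its meaning for the whole file.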

Regards,
-- 
Daiki Ueno
