Bugs item #2912803, was opened at 2009-12-11 17:35
Message generated for change (Comment added) made by adrianskilling
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=865514&aid=2912803&group_id=173455

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: fopen() fails with Japanese filenames - encoding mismatch

Initial Comment:
I've been trying to open files with Japanese characters in the filename using 
arm-wince-cegcc, v0.55.
I've recompiled with --enable-newlib-mb to enable multi-byte support. I 
eventually succeeded, but only by working around a 'bug' in the newlib 
library. I can make a simplistic patch, but I need help with a proper fix.

I'm using filenames in UTF-8, and I've called setlocale(LC_CTYPE, "C-UTF-8"), 
which succeeds.

The problem seemed to occur in libc/sys/wince/cefixpath.c in the function 
XCEFixPathA(), which is called by fixpath().
Here's an extract from XCEFixPathA():

  MultiByteToWideChar(CP_ACP, 0, pathin, -1, wpathin, MAX_PATH);

  XCEFixPathW(wpathin, wpathout);

  WideCharToMultiByte(CP_ACP, 0,
              wpathout, -1,
              pathout, MAX_PATH,
              NULL, NULL);

It seems that the code page CP_ACP (Windows ANSI default) can conflict with 
the code page set by setlocale(), because different multi-byte to wide-char 
functions are used in cefixpath.c (MultiByteToWideChar()) and io.c (mbstowcs(), 
in the function _open_r(), which is called by fopen()). This conflict causes my 
UTF-8 string to get mangled by the conversion to and from wide chars in 
XCEFixPathA().

My temporary fix has been to replace the code in XCEFixPathA() with a simple 
/ to \ replacement on the 8-bit string. Obviously this only works for ASCII or 
UTF-8 strings.
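
A minimal sketch of the byte-wise workaround described above, assuming the
path is ASCII or UTF-8. The name fix_path_bytes() and its signature are
hypothetical, not the actual patch, and the real XCEFixPathW() may do more
than separator conversion. The approach is safe for UTF-8 because every byte
of a multi-byte UTF-8 sequence has its high bit set, so the byte 0x2F can
only ever be a real '/'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, turning each '/' into '\\'.
   No wide-char conversion is involved, so UTF-8 bytes pass through
   untouched. Returns 0 on success, -1 if `out` is too small. */
static int fix_path_bytes(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);

    size_t len = strlen(in);
    if (len >= outsz)
        return -1;

    /* i <= len so that the terminating NUL is copied as well. */
    for (size_t i = 0; i <= len; i++)
        out[i] = (in[i] == '/') ? '\\' : in[i];

    return 0;
}
```

This would not be safe for encodings such as Shift-JIS, where a trail byte
can have the same value as an ASCII character.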

I include my sample source code along with trace and log output from this 
program compiled with a patched and unpatched version of newlib.
Can somebody please take a look and advise me on a better fix to this problem 
please?

----------------------------------------------------------------------

Comment By: Adrian Skilling (adrianskilling)
Date: 2009-12-15 11:10

Message:
Sorry. This can't work, since MultiByteToWideChar() cannot accept a locale
string; it only accepts a small, limited set of code pages such as CP_ACP,
CP_UTF7 and CP_UTF8. MultiByteToWideChar() has the advantage that it can be
given a code page, but this advantage is not used because the code page is
fixed to CP_ACP.

I suggest that MultiByteToWideChar() be replaced with mbstowcs(), which
would make it consistent with the conversion used in fopen() (in _open_r()
specifically). I shall try this on my version, but I can't be sure it would
work well for all languages. I'll get back.
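
A rough sketch of what that suggestion could look like, using only the
standard mbstowcs() and wcstombs() so that the conversion follows whatever
locale was set with setlocale(). The names fix_path_locale() and PATH_CAP are
hypothetical, and the slash replacement merely stands in for whatever
XCEFixPathW() really does. This is an illustration under those assumptions,
not a tested patch for newlib.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

#define PATH_CAP 260  /* stands in for MAX_PATH; value assumed here */

/* Hypothetical replacement for the body of XCEFixPathA(): convert with
   the locale-aware C functions instead of a fixed CP_ACP.
   Returns 0 on success, -1 on conversion failure or overflow. */
static int fix_path_locale(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);

    wchar_t wide[PATH_CAP];

    /* Decode using the current LC_CTYPE locale. A result below PATH_CAP
       means the terminating L'\0' was stored as well. */
    size_t n = mbstowcs(wide, in, PATH_CAP);
    if (n == (size_t)-1 || n >= PATH_CAP)
        return -1;

    /* Placeholder for the wide-char path fix-up. */
    for (size_t i = 0; i < n; i++)
        if (wide[i] == L'/')
            wide[i] = L'\\';

    /* Encode back with the same locale, so the bytes stay consistent
       with what the later mbstowcs() call in _open_r() expects. */
    size_t m = wcstombs(out, wide, outsz);
    if (m == (size_t)-1 || m >= outsz)
        return -1;

    return 0;
}
```

Because both directions use the same locale, the result no longer depends on
the device's ANSI code page, but it does depend on the application having
called setlocale() with a suitable locale first.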

----------------------------------------------------------------------

Comment By: Danny Backx (dannybackx)
Date: 2009-12-12 06:35

Message:
A trick I've seen used to figure out the locale is
 char *xx = setlocale(LC_ALL, NULL);

Passing NULL queries the current locale without changing it, so nothing
needs to be restored afterwards.
You can do this to figure out the locale in XCEFixPathA, and use xx
instead of CP_ACP.
Would that fix your problem ?
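
A small sketch of how the locale could be turned into something
MultiByteToWideChar() accepts. In standard C, setlocale() returns a locale
name string, while MultiByteToWideChar() takes a numeric code page, so some
mapping is needed in between. The helpers codepage_for_locale() and
current_codepage() are hypothetical, and the mapping below only recognises
UTF-8 locale names; anything else falls back to CP_ACP.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Numeric values of the Win32 constants, defined here only so the
   sketch builds without <windows.h>. */
#ifndef CP_ACP
#define CP_ACP 0
#endif
#ifndef CP_UTF8
#define CP_UTF8 65001
#endif

/* Hypothetical mapping from a locale name such as "C-UTF-8" to a code
   page number. Only UTF-8 is recognised in this sketch. */
static unsigned codepage_for_locale(const char *name)
{
    assert(name != NULL);
    if (strstr(name, "UTF-8") != NULL)
        return CP_UTF8;
    return CP_ACP;
}

/* Query the current LC_CTYPE locale (a NULL argument leaves the locale
   unchanged) and map it to a code page. */
static unsigned current_codepage(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL ? codepage_for_locale(name) : CP_ACP;
}
```

The result of current_codepage() could then be passed in place of CP_ACP,
assuming the platform's MultiByteToWideChar() supports that code page.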

----------------------------------------------------------------------

Comment By: Danny Backx (dannybackx)
Date: 2009-12-12 06:15

Message:
Please contact adrian.skill...@novauris.com for further info. The report
got posted before I could add all my attachments.
Here is the trace log for the unpatched compiler though. Look how the
filename gets messed up by fixpath().

----------------------------------------------------------------------

_______________________________________________
Cegcc-devel mailing list
Cegcc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cegcc-devel
