Bugs item #2912803, was opened at 2009-12-11 18:35
Message generated for change (Comment added) made by dannybackx
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=865514&aid=2912803&group_id=173455

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: fopen() fails with Japanese filenames - encoding mismatch

Initial Comment:
I've been trying to open files with Japanese characters in the filename using 
arm-wince-cegcc, v0.55.
I've recompiled with --enable-newlib-mb to enable multi-byte support. I 
eventually succeeded, but only by working around a 'bug' in the newlib library.
I can make a simplistic patch, but I need help with a proper fix.

My filenames are in UTF-8, and I've called setlocale(LC_CTYPE,"C-UTF-8"), which 
succeeds.

The problem seems to occur in libc/sys/wince/cefixpath.c in the function 
XCEFixPathA(), which is called by fixpath().
Here's an extract from XCEFixPathA().

  MultiByteToWideChar(CP_ACP, 0, pathin, -1, wpathin, MAX_PATH);

  XCEFixPathW(wpathin, wpathout);

  WideCharToMultiByte(CP_ACP, 0,
              wpathout, -1,
              pathout, MAX_PATH,
              NULL, NULL);

It seems that the code page CP_ACP (the Windows ANSI default) can conflict with 
the encoding I set with setlocale(), because cefixpath.c and io.c use different 
multi-byte to wide-char functions: cefixpath.c calls MultiByteToWideChar() with 
CP_ACP, while io.c calls mbstowcs() in _open_r(), which is called by fopen(). 
This conflict mangles my UTF-8 string during the conversion to wide chars and 
back in XCEFixPathA().

My temporary fix has been to replace the code in XCEFixPathA() with a simple / 
to \ replacement on an 8-bit string. Obviously this only works on ASCII or 
UTF-8 strings.
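
For reference, the temporary fix described above could be sketched roughly as follows. This is not the actual patch: fix_slashes_utf8() is a hypothetical name, and the sketch assumes the path is ASCII or UTF-8. Under that assumption a byte-wise replacement is safe, because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so the byte 0x2F can only be a real '/'.

```c
#include <stddef.h>

/* Hypothetical helper, not part of newlib or cegcc.
 * Replaces every '/' with '\\' in place, working on raw bytes.
 * Assumes the string is ASCII or UTF-8, where no multi-byte
 * sequence can contain the byte 0x2F. */
char *fix_slashes_utf8(char *path)
{
    char *p;

    if (path == NULL)
        return NULL;

    for (p = path; *p != '\0'; ++p) {
        if (*p == '/')
            *p = '\\';
    }
    return path;
}
```

Because no conversion to wide chars takes place, the non-ASCII bytes of the filename pass through untouched.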

I include my sample source code along with trace and log output from this 
program, compiled with both a patched and an unpatched version of newlib.
Can somebody please take a look and advise me on a better fix for this problem?

----------------------------------------------------------------------

>Comment By: Danny Backx (dannybackx)
Date: 2009-12-12 07:35

Message:
A trick I've seen used to figure out the locale is
 char *xx = setlocale(LC_ALL, NULL);

Passing NULL as the second argument returns the name of the current locale
without changing it, so nothing needs to be restored afterwards.
You can do this to figure out the locale in XCEFixPathA, and choose the
code page from xx (for example CP_UTF8 for a UTF-8 locale) instead of
always using CP_ACP.
Would that fix your problem?
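
If the locale name is used to choose the code page in XCEFixPathA, the mapping might look something like the sketch below. The function codepage_for_locale() and the SKETCH_* macros are hypothetical names introduced only for illustration. The numeric values match the Win32 constants CP_ACP (0) and CP_UTF8 (65001), redefined here so the sketch compiles without the Windows headers. It assumes locale names of the form used in the report ("C-UTF-8") and falls back to the ANSI code page for everything else.

```c
#include <stddef.h>
#include <string.h>

/* Same values as the Win32 CP_ACP and CP_UTF8 constants. */
#define SKETCH_CP_ACP   0u
#define SKETCH_CP_UTF8  65001u

/* Hypothetical helper, not part of newlib or cegcc.
 * Returns the UTF-8 code page if the locale name ends in "UTF-8",
 * otherwise the ANSI default. */
unsigned int codepage_for_locale(const char *locale_name)
{
    static const char suffix[] = "UTF-8";
    const size_t suffix_len = sizeof suffix - 1;
    size_t len;

    if (locale_name == NULL)
        return SKETCH_CP_ACP;

    len = strlen(locale_name);
    if (len >= suffix_len &&
        strcmp(locale_name + len - suffix_len, suffix) == 0)
        return SKETCH_CP_UTF8;

    return SKETCH_CP_ACP;
}
```

In XCEFixPathA the result of codepage_for_locale(setlocale(LC_CTYPE, NULL)) would then be passed to MultiByteToWideChar() and WideCharToMultiByte() in place of CP_ACP. Whether CP_UTF8 is available on a given Windows CE device is an assumption that would need checking.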

----------------------------------------------------------------------

Comment By: Danny Backx (dannybackx)
Date: 2009-12-12 07:15

Message:
Please contact adrian.skill...@novauris.com for further info. The report
got posted before I could add all my attachments.
Here is the trace log for the unpatched compiler though. Look how the
filename gets messed up by fixpath().

----------------------------------------------------------------------

_______________________________________________
Cegcc-devel mailing list
Cegcc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cegcc-devel
