I like your idea about packing and unpacking the data myself instead of using Win32::API::Struct to create a WIN32_FIND_DATAW structure. But after fiddling with this for a few days (or more), it strikes me that the fundamental requirement, whether I use pack/unpack or the struct method, is to get Perl to recognize the "cFileName" field returned from the Win32 API FindFirstFileW and FindNextFileW calls as a UTF-16LE formatted character string that may contain lots of null bytes.
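To make that requirement concrete, here is a minimal sketch that runs without any Win32 calls. It assumes the raw bytes of cFileName are already in a Perl scalar; the buffer is simulated with Encode, and the helper name wide_buffer_to_string is hypothetical, not part of Win32::API. It cuts the buffer at the first 16-bit NUL (not the first NUL byte) and then decodes what is left.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of Win32::API): convert the raw bytes of a
# fixed-size WCHAR buffer into a Perl character string.
sub wide_buffer_to_string {
    my ($bytes) = @_;
    my $len = length($bytes);
    $len-- if $len % 2;                  # ignore a stray odd trailing byte
    my $end = $len;
    # Walk the buffer in 16-bit units and stop at the first all-zero unit.
    for (my $i = 0; $i < $len; $i += 2) {
        if (substr($bytes, $i, 2) eq "\0\0") {
            $end = $i;
            last;
        }
    }
    return decode('UTF-16LE', substr($bytes, 0, $end));
}

# Simulated buffer contents: UTF-16LE text followed by NUL padding.
my $name   = "F\x{4E2D}ile.xls";
my $buffer = encode('UTF-16LE', $name) . ("\0" x 20);

# Treating the bytes as a C string stops at the first NUL byte,
# which is the high byte of the first character.
my ($c_style) = $buffer =~ /^([^\0]*)/;

my $decoded = wide_buffer_to_string($buffer);
```

Here $c_style ends up holding only "F", which matches the truncation seen with the struct, while $decoded holds the full name as Perl characters.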
I'm getting well beyond my normal "comfort level" with Perl here -- so I greatly appreciate any experienced insight ... I've tried using the "Devel::Peek" module to see how Perl is setting these variables. The following simple test script produces the associated output.

_______ BEGIN ______
use strict;
use Encode qw(encode decode);
use Devel::Peek;

my $teststr = "This is a test";
Dump $teststr;
my $test16le = encode("UTF-16LE", $teststr);
Dump $test16le;
______ END ______

SV = PV(0x22611c) at 0x1831fd4
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x1822b6c "This is a test"\0
  CUR = 14
  LEN = 15
SV = PV(0x18cc948) at 0x1831f38
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x18f0814 "T\0h\0i\0s\0 \0i\0s\0 \0a\0 \0t\0e\0s\0t\0"\0
  CUR = 28
  LEN = 29
_____END OUTPUT _____

Using "Devel::Peek::Dump" to peek / dump the flags and content of the cFileName component of the WIN32_FIND_DATAW struct ($FileInfo->{cFileName}) that is returned by a call to FindFirstFileW (or FindNextFileW) produces the following:

SV = PV(0x1aa2aec) at 0x1aac990
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1ab68ac "F"\0
  CUR = 1
  LEN = 2

And F is the correct first letter of the filename -- but the Perl CUR and LEN fields (and maybe the flags) are not set correctly -- because the output of Data::Dumper shows that the entire filename in UTF-16LE format is there (in the buffer). Again, it seems that Perl sees the first null (\0) byte as the string terminator and sets the CUR and LEN fields accordingly.

So making a great leap of reasoning from this simple test... the challenge is to get Perl to set the CUR and LEN fields correctly for the "cFileName" value returned by the Win32 API wide directory calls using either the pack/unpack or the Win32::API::Struct method.

In the perldoc "perlunicode", there are two short, relevant sections: "When Unicode Does Not Happen" and "Forcing Unicode in Perl (Or Unforcing Unicode in Perl)".
The latter shows a function "utf8::upgrade" that "force[s] Perl to believe that a byte string is UTF-8" -- but there is no corresponding utf16le::upgrade function to force Perl to believe that a byte string is UTF-16LE. And it seems that UTF-16LE is the standard format returned by the Win32 API "wide" calls.

All this leads me to believe that the solution is to add an option to Win32::API (or Win32API::File) analogous to the following available in Win32::OLE.

    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

So something like:

    Win32::API->Option(CP => Win32::API::CP_UTF8);

or maybe:

    Win32::API->Option(CP => "UTF-16LE");

This would tell the libwin32 C code layer to recognize and appropriately handle the UTF-16LE formatted character strings returned by the wide directory calls.

If this is way off base, ....

Regards,
... Dewey

"$Bill Luebkert" <[EMAIL PROTECTED]>
01/26/2006 01:52 AM
To D D Allen/Fairfax/[EMAIL PROTECTED]
cc libwin32@perl.org
Subject Re: Win32 API, Directories with Unicode / Wide Filenames, FindFirstFileW, FindNextFileW

Dewey Allen wrote:
>
> The following example script runs (and sometimes crashes) but for
> filenames with only ANSI characters, $FileInfo->{cFileName}, seems to
> contain only the first character. And for filenames that start with
> unicode, wide characters, $FileInfo->{cFileName}, seems to contain
> only the leading unicode/wide characters. In the data dump output, the
> buffer seems to show the full 16bit unicode file name (e.g.,
> "t^!ki<ŠeQ›R_ H o s t I D _ 2 0 0 6 - 0 1 - 1 9 _ 2 1 4 3 5 8 . x l
> s"). I suspect that the spaces between the ANSI characters are null
> (\0) characters. And I suspect that the Perl Win32 interface layer
> treats these as null terminated C strings -- as opposed to 16 bit
> unicode characters -- and therefore terminates the string at the first
> null byte it encounters.

That's right - every other character is a null.

> Various Perl unicode documents indicate that the Win32 API unicode
> format is UTF-16LE.
> But decoding "$FileInfo->{cFileName}" using UTF-16LE doesn't seem to
> work any way that I've tried it.
>
> I'm also not sure of the proper array dimension for cFileName (and
> cAlternateFileName) in the WIN32_FIND_DATAW struct. In the ANSI version
> of this structure, cFileName is a TCHAR of dimension 260 (MAX_PATH)
> where TCHAR is a single byte (according to Win32::API::Type->sizeof). In
> the WIDE version (WIN32_FIND_DATAW), it's a WCHAR of the same dimension
> -- but WCHAR is 2 bytes. When I make cFileName and cAlternateFileName
> TCHARs of dimension 260 and 14 respectively, Perl crashes. And it also
> crashes when I make them WCHARs of the same dimension. Only when I
> double the dimensions to 520 and 28 does the script run without crashing
> - using either TCHAR or WCHAR.
>
> Any ideas on how to make these functions work correctly would be greatly
> appreciated. And if I'm missing something obvious or doing something
> dumb, please don't hesitate to point that out :-).

I would drop the use of the 'struct' and pack a pointer to your own block of packed data. Win32::API::Struct is buggy and isn't handling the unpack of the array properly - do the packing and unpacking yourself.
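
A rough sketch of what "doing the unpacking yourself" could look like. It assumes the usual WIN32_FIND_DATAW layout of 11 DWORD-sized values followed by WCHAR cFileName[260] and WCHAR cAlternateFileName[14], which is 520 and 28 bytes and would explain why only the doubled dimensions worked. The names parse_find_data and wchars_to_string are made up for illustration, and the buffer is simulated with pack so the script runs anywhere; the actual FindFirstFileW call through Win32::API is left out.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed layout: 11 DWORDs (attributes, three FILETIMEs as two DWORDs
# each, size high/low, two reserved), then the two WCHAR arrays as bytes.
my $TEMPLATE    = 'V11 a520 a28';
my $STRUCT_SIZE = 4 * 11 + 520 + 28;    # 592 bytes

# Hypothetical helper: decode the whole WCHAR array, then cut at the
# first NUL character. Assumes the padding after the name decodes cleanly.
sub wchars_to_string {
    my ($bytes) = @_;
    my $s = decode('UTF-16LE', $bytes);
    $s =~ s/\0.*\z//s;
    return $s;
}

# Hypothetical helper: split the raw struct bytes into named fields.
sub parse_find_data {
    my ($raw) = @_;
    my @f = unpack $TEMPLATE, $raw;
    return {
        dwFileAttributes   => $f[0],
        nFileSizeHigh      => $f[7],
        nFileSizeLow       => $f[8],
        cFileName          => wchars_to_string($f[11]),
        cAlternateFileName => wchars_to_string($f[12]),
    };
}

# Simulated struct contents; pack pads the a520/a28 fields with NULs.
my $raw = pack $TEMPLATE,
    0x20, (0) x 6, 0, 1234, 0, 0,
    encode('UTF-16LE', "HostID_2006-01-19.xls"),
    encode('UTF-16LE', "HOSTID~1.XLS");

my $info = parse_find_data($raw);
print "$info->{cFileName} ($info->{nFileSizeLow} bytes)\n";
```

In a real script the block handed to the API would presumably be a preallocated scalar such as "\0" x $STRUCT_SIZE, and parse_find_data would be applied to it after each successful call.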