I like your idea of packing and unpacking the data myself instead of using 
Win32::API::Struct to create a WIN32_FIND_DATAW structure.  But after 
fiddling with this for several days, it strikes me that the fundamental 
requirement, with either pack/unpack or the struct method, is to get Perl 
to recognize the "cFileName" field returned from the Win32 API 
FindFirstFileW and FindNextFileW calls as a UTF-16LE character string that 
may contain many null bytes. 

I'm getting well beyond my normal "comfort level" with Perl here -- so I 
greatly appreciate any experienced insight ...

I've tried using the "Devel::Peek" module to see how Perl is setting these 
variables.  The following simple test script produces the associated 
output.


_______ BEGIN ______

use strict;

use Encode qw(encode decode);
use Devel::Peek;

my $teststr = "This is a test";
Dump $teststr;

my $test16le = encode("UTF-16LE", $teststr);
Dump $test16le;

______ END ______

SV = PV(0x22611c) at 0x1831fd4
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x1822b6c "This is a test"\0
  CUR = 14
  LEN = 15
SV = PV(0x18cc948) at 0x1831f38
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x18f0814 "T\0h\0i\0s\0 \0i\0s\0 \0a\0 \0t\0e\0s\0t\0"\0
  CUR = 28
  LEN = 29

_____END OUTPUT _____


Using "Devel::Peek::Dump" to peek / dump the flags and content of the 
cFileName component of the WIN32_FIND_DATAW struct 
($FileInfo->{cFileName}) that is returned by a call to FindFirstFileW (or 
FindNextFileW) produces the following:

SV = PV(0x1aa2aec) at 0x1aac990
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1ab68ac "F"\0
  CUR = 1
  LEN = 2

And F is the correct first letter of the filename -- but the Perl CUR and 
LEN fields (and maybe the flags) are not set correctly: they cover only 
that one character, while the output of Data::Dumper shows that the entire 
filename in UTF-16LE format is there (in the buffer).   Again, it seems 
that Perl sees the first null (\0) byte as the string terminator and sets 
the CUR and LEN fields accordingly.

So making a great leap of reasoning from this simple test... the challenge 
is to get Perl to set the CUR and LEN fields correctly for the "cFileName" 
value returned by the Win32 API wide directory calls using either the 
pack/unpack or the Win32::API::Struct method.
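
As a rough illustration of that symptom (a sketch only; it mimics the 
truncation with unpack and is not a look at what Win32::API::Struct really 
does internally), reading a UTF-16LE buffer as a null-terminated C string 
stops after the first byte, while reading it as raw bytes keeps everything. 
The file name "Foo.xls" is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A Perl scalar can hold embedded NUL bytes; its length is tracked separately.
my $wide = encode('UTF-16LE', 'Foo.xls');    # "F\0o\0o\0.\0x\0l\0s\0"

# Z* stops at the first NUL byte, the way a C char* string would be read.
my $as_c_string = unpack 'Z*', $wide;

# a* takes every byte, NULs included.
my $as_bytes = unpack 'a*', $wide;

printf "C-string view: %d byte(s)\n", length $as_c_string;
printf "raw view:      %d byte(s)\n", length $as_bytes;
```

So the scalar itself has no trouble holding the nulls; the question is 
which of the two views the interface layer uses when it fills in the field.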

In the perldoc "perlunicode", there are two short, relevant sections: 
"When Unicode Does Not Happen" and "Forcing Unicode in Perl (Or Unforcing 
Unicode in Perl)".  The latter shows a function "utf8::upgrade" that 
"force[s] Perl to believe that a byte string is UTF-8" -- but there is no 
corresponding utf16le::upgrade function to force Perl to believe that a 
byte string is UTF-16LE.   And it seems that UTF-16LE is the standard 
format returned by the Win32 API "wide" calls.
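
For what it's worth, the closest thing to a "utf16le::upgrade" seems to be 
Encode's decode(), which converts UTF-16LE bytes into a Perl character 
string rather than relabelling them.  A minimal sketch, assuming the bytes 
are already complete in a scalar (the sample name is made up):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Eight characters, so sixteen bytes once encoded as UTF-16LE.
my $octets = encode('UTF-16LE', "caf\x{E9}.txt");

# utf8::upgrade only changes Perl's internal storage; the value is still
# the same sequence of 16 byte-valued "characters".
my $copy = $octets;
utf8::upgrade($copy);

# decode() interprets the bytes as UTF-16LE and returns 8 real characters.
my $text = decode('UTF-16LE', $octets);

printf "bytes: %d, after upgrade: %d, after decode: %d\n",
    length $octets, length $copy, length $text;
```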

All this leads me to believe that the solution is to add an option to 
Win32::API (or Win32API::File) analogous to the following available in 
Win32::OLE.

Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

So something like:

Win32::API->Option(CP => Win32::API::CP_UTF8);

or maybe:

Win32::API->Option(CP => "UTF-16LE");

This would tell the libwin32 C code layer to recognize and appropriately 
handle the UTF-16LE formatted character strings returned by the wide 
directory calls. 

If this is way off base, ....

Regards,

... Dewey





"$Bill Luebkert" <[EMAIL PROTECTED]> 
01/26/2006 01:52 AM

To
D D Allen/Fairfax/[EMAIL PROTECTED]
cc
libwin32@perl.org
Subject
Re: Win32 API, Directories with Unicode / Wide Filenames, FindFirstFileW, 
FindNextFileW






Dewey Allen wrote:
> 
> The following example script runs (and sometimes crashes) but for 
> filenames with only ANSI characters,  $FileInfo->{cFileName}, seems to 
> contain only the first character.  And for filenames that start with 
> unicode, wide characters,  $FileInfo->{cFileName}, seems to contain 
> only the leading unicode/wide characters.   In the data dump output, the 
> buffer seems to show the full 16bit unicode file name (e.g., 
> "t^!ki<ŠeQ›R_ H o s t I D _ 2 0 0 6 - 0 1 - 1 9 _ 2 1 4 3 5 8 . x l 
> s").  I suspect that the spaces between the ANSI characters are null 
> (\0) characters.  And I suspect that the Perl Win32 interface layer 
> treats these as null terminated C strings -- as opposed to 16 bit 
> unicode characters -- and therefore terminates the string at the first 
> null byte it encounters.

That's right - every other character is a null.

> Various Perl unicode documents indicate that the Win32 API unicode 
> format is UTF-16LE.   But decoding "$FileInfo->{cFileName}" using 
> UTF-16LE doesn't seem to work any way that I've tried it.
> 
> I'm also not sure of the proper array dimension for cFileName (and 
> cAlternateFileName) in the WIN32_FIND_DATAW struct.  In the ANSI version 
> of this structure, cFileName is a TCHAR of dimension 260 (MAX_PATH) 
> where TCHAR is a single byte (according to Win32::API::Type->sizeof). In 
> the WIDE version (WIN32_FIND_DATAW), it's a WCHAR of the same dimension 
> -- but WCHAR is 2 bytes.   When I make cFileName and cAlternateFileName 
> TCHARs of dimension 260 and 14 respectively, Perl crashes.  And it also 
> crashes when I make them WCHARs of the same dimension.  Only when I 
> double the dimensions to 520 and 28 does the script run without crashing 
> - using either TCHAR or WCHAR.
> 
> Any ideas on how to make these functions work correctly would be greatly 
> appreciated.  And if I'm missing something obvious or doing something 
> dumb, please don't hesitate to point that out :-).

I would drop the use of the 'struct' and pack a pointer to your own block 
of packed data.  Win32::API::Struct is buggy and isn't handling the unpack 
of the array properly - do the packing and unpacking yourself.
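
The advice above could be sketched roughly as follows.  This is an 
illustration under stated assumptions, not a tested Win32 program: the 
helper names wide_to_text and parse_find_data are made up, and the buffer 
is built by hand with pack so the script runs anywhere, standing in for 
what FindFirstFileW would write into a 592-byte buffer passed by pointer.  
The layout follows the documented WIN32_FIND_DATAW structure, where 
cFileName is 260 WCHARs (520 bytes) and cAlternateFileName is 14 WCHARs 
(28 bytes), which would also explain why only the doubled dimensions 
avoided the crash.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# WIN32_FIND_DATAW layout (592 bytes in total):
#   DWORD dwFileAttributes; three FILETIMEs (8 bytes each, skipped here);
#   DWORD nFileSizeHigh, nFileSizeLow; DWORD dwReserved0, dwReserved1 (skipped);
#   WCHAR cFileName[260] (520 bytes); WCHAR cAlternateFileName[14] (28 bytes)
my $FIND_DATAW_TEMPLATE = 'V x24 V V x8 a520 a28';

# Hypothetical helper: cut a fixed-size WCHAR array at its first 16-bit
# null terminator, then decode what is left as UTF-16LE.
sub wide_to_text {
    my ($raw) = @_;
    my @units;
    for my $unit (unpack 'v*', $raw) {
        last if $unit == 0;
        push @units, $unit;
    }
    return decode('UTF-16LE', pack('v*', @units));
}

# Hypothetical helper: unpack the raw struct bytes into a hash.
sub parse_find_data {
    my ($buf) = @_;
    my ($attrs, $size_hi, $size_lo, $name, $alt) =
        unpack $FIND_DATAW_TEMPLATE, $buf;
    return {
        dwFileAttributes   => $attrs,
        size               => $size_hi * 4294967296 + $size_lo,
        cFileName          => wide_to_text($name),
        cAlternateFileName => wide_to_text($alt),
    };
}

# Stand-in for the buffer the API call would fill in (made-up values).
my $name = "\x{65E5}\x{672C}_HostID_2006-01-19_214358.xls";
my $fake = pack $FIND_DATAW_TEMPLATE, 0x20, 0, 1234,
    encode('UTF-16LE', $name), encode('UTF-16LE', 'HOSTID~1.XLS');

my $info = parse_find_data($fake);
printf "struct is %d bytes, cFileName has %d characters\n",
    length $fake, length $info->{cFileName};
```

With the real call, the search path passed in would presumably need the 
same treatment in reverse: encoded to UTF-16LE with a trailing 16-bit null.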


