I previously posted the following to the perl-win32-users list (early
yesterday). This has a few updates.
I have a Win32 Perl coding challenge: search a directory (WinXP, NTFS)
with a file specification pattern, where the directory may contain files
with Unicode (wide) filenames in addition to ANSI filenames. Through
trial, error, and searches of the Perl mailing list archives, it seems
apparent that Win32 Perl's builtin directory functions do not return
Win32 Unicode (wide) filenames. More specifically, the builtin
functions return filenames like "??????_HostID_2006-01-19_213218.xls"
when the filename contains Unicode (wide) characters (the same as the
DOS "dir" command). The problem is that you can't pass these filenames
to something like stat, or via OLE to ask Excel to open the file.
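
To illustrate the "?" substitution described above, here is a minimal
sketch using only the core Encode module. It assumes cp1252 as the ANSI
code page and uses a made-up filename; the real conversion done by
Windows may differ in detail, but the effect on unrepresentable
characters is similar.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical filename: two CJK characters followed by an ASCII tail.
my $wide_name = "\x{65E5}\x{672C}_HostID.xls";

# A single-byte code page (cp1252 is an assumption here) cannot represent
# the CJK characters, so Encode substitutes "?" for each of them.
my $ansi_name = encode('cp1252', $wide_name);

print "$ansi_name\n";
```

Once the original characters have been replaced this way, the name no
longer identifies the file on disk, which is why passing it to stat or
to Excel fails.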
I saw other postings to various Perl lists that referenced the Win32 API
directory search functions FindFirstFile, FindNextFile, and FindClose.
I also noticed that these functions are not packaged in
"Win32API::File" -- the title of which is "Low-level access to Win32
system API calls for files/dirs" (hmm, why are these low level Win32 API
directory functions not included in Win32API::File?).
I was able to code a test script that used the ANSI versions of these
Win32API calls but it produced the same results as Perl's builtin
directory functions: it returned filenames like
"??????_HostID_2006-01-19_213218.xls" when the filename contains unicode
/ wide characters. Which is to be expected.
I then found a posting from Jan Dubois (9 Dec 2005, perl5-porters) that
suggested the solution to this problem was to use Win32::OLE and the
Scripting.FileSystemObject. I was able to successfully implement this
method with one shortcoming: the Scripting.FileSystemObject does not
support directory searches -- only directory listings (as best I can
tell.) THANK YOU JAN DUBOIS!
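
One way around the missing search support might be to take the full
listing and filter it in Perl. The sketch below shows only the
filtering step, on names assumed to have been collected already (for
example from the Name property of each item in a folder's Files
collection). The helper wildcard_to_regex is hypothetical, and it only
approximates DOS wildcard rules ("*" and "?" and nothing else).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple DOS-style wildcard into an
# anchored, case-insensitive regex. Only "*" and "?" are special.
sub wildcard_to_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $ch (split //, $pattern) {
        $re .= $ch eq '*' ? '.*'
             : $ch eq '?' ? '.'
             :              quotemeta($ch);
    }
    return qr/\A$re\z/is;
}

# Made-up directory listing, including one name with wide characters.
my @names = (
    "\x{65E5}\x{672C}_HostID_2006-01-19_213218.xls",
    "notes.txt",
    "Report.XLS",
);

my $match = wildcard_to_regex('*.xls');
my @hits  = grep { $_ =~ $match } @names;

printf "%d of %d names match\n", scalar @hits, scalar @names;
```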
Being stubborn and curious, I went back to fiddling with the Win32 API
directory search functions to see if I could make the "wide" version of
these calls work (starting with example code posted to various Perl
lists by others -- whom I thank). I think I'm close to getting these
functions to work... but I'm a novice at implementing Win32 API
functions in Perl -- and using Perl's Unicode functions.
The following example script runs (and sometimes crashes), but for
filenames with only ANSI characters, $FileInfo->{cFileName} seems to
contain only the first character. And for filenames that start with
Unicode (wide) characters, $FileInfo->{cFileName} seems to contain
only the leading Unicode (wide) characters. In the data dump output, the
buffer seems to show the full 16-bit Unicode file name (e.g.,
"t^!ki<ŠeQ›R_ H o s t I D _ 2 0 0 6 - 0 1 - 1 9 _ 2 1 4 3 5 8 . x l
s"). I suspect that the spaces between the ANSI characters are null
(\0) bytes. And I suspect that the Perl Win32 interface layer
treats these as null-terminated C strings -- as opposed to 16-bit
Unicode characters -- and therefore terminates the string at the first
null byte it encounters.
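
The suspected truncation can be reproduced without any Win32 calls.
This is a sketch under the assumption that the interface layer really
does cut the buffer at the first zero byte; as_c_string is a
hypothetical helper that emulates that behaviour, and the names are
made up.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Emulate a layer that treats a buffer as a NUL-terminated C string:
# keep only the bytes before the first zero byte.
sub as_c_string {
    my ($bytes) = @_;
    my ($head) = $bytes =~ /\A([^\0]*)/;
    return $head;
}

# ASCII-only name: in UTF-16LE each character is followed by a zero
# high byte, so the "C string" ends after the first character.
my $ascii_buf  = encode('UTF-16LE', "Host.xls");
my $ascii_seen = as_c_string($ascii_buf);

# Name starting with two CJK characters: neither of their bytes is
# zero, so they survive, and the cut happens inside the first ASCII
# character that follows.
my $wide_buf  = encode('UTF-16LE', "\x{65E5}\x{672C}_Host.xls");
my $wide_seen = as_c_string($wide_buf);

printf "ASCII name: %d bytes in buffer, %d seen\n",
    length $ascii_buf, length $ascii_seen;
printf "wide name:  %d bytes in buffer, %d seen\n",
    length $wide_buf, length $wide_seen;
```

This matches the symptoms: one character for ANSI-only names, and only
the leading wide characters otherwise.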
Various Perl unicode documents indicate that the Win32 API unicode
format is UTF-16LE. But decoding "$FileInfo->{cFileName}" using
UTF-16LE doesn't seem to work any way that I've tried it.
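
For reference, decoding does work when the complete raw buffer is
available. The sketch below simulates such a buffer instead of calling
the API, so it says nothing about how to get the full bytes out of
$FileInfo; wide_buffer_to_string is a hypothetical helper and the
filename is made up.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: decode a fixed-size WCHAR buffer, then cut at the
# first U+0000 character (not at the first zero byte).
sub wide_buffer_to_string {
    my ($raw) = @_;
    my $text = decode('UTF-16LE', $raw);
    $text =~ s/\0.*\z//s;
    return $text;
}

# Simulate a 520-byte cFileName buffer: the name, its terminator, and
# zero padding ("a520" pads with NUL bytes).
my $name = "\x{65E5}\x{672C}_HostID.xls";
my $raw  = pack 'a520', encode('UTF-16LE', "$name\0");

my $decoded = wide_buffer_to_string($raw);

printf "decoded %d characters from %d bytes\n",
    length $decoded, length $raw;
```

If decode returns a one-character string for the real data, the bytes
were probably already truncated before they reached Perl.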
I'm also not sure of the proper array dimension for cFileName (and
cAlternateFileName) in the WIN32_FIND_DATAW struct. In the ANSI version
of this structure, cFileName is a TCHAR array of dimension 260
(MAX_PATH), where TCHAR is a single byte (according to
Win32::API::Type->sizeof). In the wide version (WIN32_FIND_DATAW), it's
a WCHAR array of the same dimension -- but WCHAR is 2 bytes. When I make
cFileName and cAlternateFileName TCHARs of dimension 260 and 14
respectively, Perl crashes. And it also crashes when I make them WCHARs
of the same dimensions. Only when I double the dimensions to 520 and 28
does the script run without crashing -- using either TCHAR or WCHAR.
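
The byte arithmetic for the wide structure can be checked with a plain
pack template. This is only a sketch of the documented layout (260 and
14 WCHARs of 2 bytes each); how Win32::API::Struct actually sizes array
members is not verified here. If it counts array dimensions in bytes,
that would explain why 520 and 28 are the first values that leave
enough room.

```perl
use strict;
use warnings;

use constant MAX_PATH => 260;

# pack/unpack template mirroring the documented WIN32_FIND_DATAW layout.
my $template = join ' ',
    'V',                      # dwFileAttributes
    'V2', 'V2', 'V2',         # three FILETIMEs (low and high DWORD each)
    'V', 'V',                 # nFileSizeHigh, nFileSizeLow
    'V', 'V',                 # dwReserved0, dwReserved1
    'a' . (MAX_PATH * 2),     # cFileName: 260 WCHARs = 520 bytes
    'a' . (14 * 2);           # cAlternateFileName: 14 WCHARs = 28 bytes

# Pack eleven zero DWORDs and two empty strings to measure the total.
my $struct_size = length(pack($template, (0) x 11, '', ''));

print "WIN32_FIND_DATAW size: $struct_size bytes\n";
```

A buffer smaller than this total would let FindFirstFileW write past
its end, which could account for the crashes.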
Any ideas on how to make these functions work correctly would be greatly
appreciated. And if I'm missing something obvious or doing something
dumb, please don't hesitate to point that out :-).
Regards,
... Dewey
use strict;
use Win32::API;
use Data::Dumper; $Data::Dumper::Indent=1; $Data::Dumper::Sortkeys=1;
use Encode qw(encode decode);
use Unicode::String;
use Devel::Peek;
use English;
$OUTPUT_AUTOFLUSH=1;
$Win32::API::DEBUG = 0;
binmode(STDOUT, ":utf8");
use constant ERROR_NO_MORE_FILES => 18;
use constant INVALID_HANDLE_VALUE => -1;
print "tchar is known: ", Win32::API::Type->is_known("TCHAR"), "\n";
print "wchar is known: ", Win32::API::Type->is_known("WCHAR"), "\n";
print "sizeof tchar is: ", Win32::API::Type->sizeof("TCHAR"), "\n";
print "sizeof wchar is: ", Win32::API::Type->sizeof("WCHAR"), "\n";
Win32::API::Struct-> typedef('FILETIME', qw(
DWORD dwLowDateTime;
DWORD dwHighDateTime;
)); # 8 bytes
use constant FILE_ATTRIBUTE_READONLY => 0x00000001;
use constant FILE_ATTRIBUTE_HIDDEN => 0x00000002;
use constant FILE_ATTRIBUTE_SYSTEM => 0x00000004;
use constant FILE_ATTRIBUTE_DIRECTORY => 0x00000010;
use constant FILE_ATTRIBUTE_ARCHIVE => 0x00000020;
use constant FILE_ATTRIBUTE_NORMAL => 0x00000080;
use constant FILE_ATTRIBUTE_TEMPORARY => 0x00000100;
use constant FILE_ATTRIBUTE_COMPRESSED => 0x00000800;
use constant MAX_PATH => 260;
Win32::API::Struct-> typedef('WIN32_FIND_DATAW', qw(
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
DWORD nFileSizeHigh;
DWORD nFileSizeLow;
DWORD dwReserved0;
DWORD dwReserved1;
WCHAR cFileName[520];
WCHAR cAlternateFileName[28];
));
my $FindFirstFile = Win32::API->new('kernel32.dll', 'FindFirstFileW',
'PS', 'N') or die "FindFirstFile: $^E";
my $FindNextFile = Win32::API->new('kernel32.dll', 'FindNextFileW',
'NS', 'I') or die "FindNextFile $^E";
my $FindClose = Win32::API->new('kernel32.dll', 'FindClose', 'N',
'I') or die "FindClose: $^E";
my $FileSpec = "//?/C:/My Documents/Tool/*.xls\0";
my $FileInfo = Win32::API::Struct-> new('WIN32_FIND_DATAW');
#print Data::Dumper-> Dump([$FileSpec, $FileInfo], [qw($FileSpec $FileInfo)]);
my $uFileSpec = Unicode::String->new;
$uFileSpec->utf8($FileSpec);
print "FileSpec = ", $uFileSpec->as_string, "\n";
my $handle = $FindFirstFile-> Call($uFileSpec->utf16le, $FileInfo);
#my $handle = $FindFirstFile-> Call(encode("UTF-16LE", $FileSpec), $FileInfo);
if ($handle == INVALID_HANDLE_VALUE) {
    printf "Error is %d - %s\n", Win32::GetLastError (),
        Win32::FormatMessage (Win32::GetLastError ());
    exit(1);
} else {
    print "FindFirstFile worked\n";
    Dump $FileInfo->{cFileName};
    #print Data::Dumper-> Dump([$FileInfo], [qw($FileInfo)]);
    my $ufn = Unicode::String->new;
    $ufn->utf16le($FileInfo->{cFileName});
    print "first filename = ", $ufn->as_string, "\n";
    print "first filename = '", $FileInfo->{cFileName}, "'\n";
    #print "first filename = ", decode("UTF-16LE", $FileInfo->{cFileName}), "\n";
    while (my $result = $FindNextFile->Call($handle,$FileInfo)) {
        Dump $FileInfo->{cFileName};
        $ufn->utf16le($FileInfo->{cFileName});
        print "next filename = ", $ufn->as_string, "\n";
        print "next filename = '", $FileInfo->{cFileName}, "'\n";
        #print "next filename = ", decode("UTF-16LE", $FileInfo->{cFileName}), "\n";
        #print Data::Dumper-> Dump([$FileInfo], [qw($FileInfo)]);
    }
}
$FindClose->Call($handle) or die "FindClose $^E";