I previously posted the following to the perl-win32-users list (early yesterday). This has a few updates.

I have a Win32 Perl coding challenge to search a directory (WinXP, NTFS) with a file specification pattern where the directory may contain files with unicode / wide filenames in addition to ANSI filenames. Through trial, error, and searches against Perl mailing list archives, it seems apparent that Win32 Perl's builtin directory functions do not support / return Win32 unicode / wide filenames. More specifically, the builtin functions return filenames like "??????_HostID_2006-01-19_213218.xls" when the filename contains unicode / wide characters (the same as the DOS "dir" command). The problem is that you can't pass these filenames to something like stat or via OLE to ask Excel to open it.

I saw other postings to various Perl lists that referenced the Win32 API directory search functions FindFirstFile, FindNextFile, and FindClose. I also noticed that these functions are not packaged in "Win32API::File" -- the title of which is "Low-level access to Win32 system API calls for files/dirs" (hmm, why are these low level Win32 API directory functions not included in Win32API::File?).

I was able to code a test script that used the ANSI versions of these Win32 API calls, but it produced the same results as Perl's builtin directory functions: it returned filenames like "??????_HostID_2006-01-19_213218.xls" when the filename contains unicode / wide characters. That is to be expected, since the ANSI calls cannot represent those characters.

I then found a posting from Jan Dubois (9 Dec 2005, perl5-porters) that suggested the solution to this problem was to use Win32::OLE and the Scripting.FileSystemObject. I was able to successfully implement this method with one shortcoming: the Scripting.FileSystemObject does not support directory searches -- only directory listings (as best I can tell.) THANK YOU JAN DUBOIS!

Being stubborn and curious, I went back to fiddling with the Win32 API directory search functions to see if I could make the "wide" version of these calls work (starting with example code posted to various Perl lists by others -- whom I thank). I think I'm close to getting these functions to work... but I'm a novice at implementing Win32 API functions in Perl -- and using Perl's Unicode functions.

The following example script runs (and sometimes crashes), but for filenames with only ANSI characters, $FileInfo->{cFileName} seems to contain only the first character. And for filenames that start with unicode / wide characters, $FileInfo->{cFileName} seems to contain only the leading unicode / wide characters. In the data dump output, the buffer seems to show the full 16-bit unicode filename (e.g., "t^!ki<ŠeQ›R_ H o s t I D _ 2 0 0 6 - 0 1 - 1 9 _ 2 1 4 3 5 8 . x l s"). I suspect that the spaces between the ANSI characters are null (\0) bytes. And I suspect that the Perl Win32 interface layer treats the field as a null-terminated C string -- as opposed to a string of 16-bit unicode characters -- and therefore terminates the string at the first null byte it encounters.
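The suspected truncation can be reproduced without calling the Win32 API at all. The following is only a sketch of the suspicion above, not a diagnosis of Win32::API itself: the two filenames are made up, and unpack "Z*" merely stands in for whatever C-string handling the interface layer may be doing.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical filenames: one pure ASCII, one starting with two CJK characters.
my $ascii_name = "HostID_2006-01-19.xls";
my $mixed_name = "\x{65E5}\x{672C}_HostID_2006-01-19.xls";

# Encode both as UTF-16LE, the layout the wide (W) API calls use in memory.
my $ascii_wide = encode("UTF-16LE", $ascii_name);
my $mixed_wide = encode("UTF-16LE", $mixed_name);

# Hex of the first three characters: each ASCII byte is followed by a 00 byte.
my $hex = unpack "H*", substr($ascii_wide, 0, 6);
print "first bytes: $hex\n";          # 48006f007300

# "Z*" reads up to the first NUL byte, like a C routine expecting a char*.
my $ascii_cut = unpack "Z*", $ascii_wide;
my $mixed_cut = unpack "Z*", $mixed_wide;

print "ASCII name cut to ", length($ascii_cut), " byte(s)\n";
print "mixed name cut to ", length($mixed_cut), " byte(s)\n";
```

The ASCII name is cut after its first character, and the mixed name is cut just after the leading wide characters, at the first character whose high byte is zero. That is close to the symptoms described above.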

Various Perl unicode documents indicate that the Win32 API unicode format is UTF-16LE. But decoding "$FileInfo->{cFileName}" using UTF-16LE doesn't seem to work any way that I've tried it.
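For reference, this is roughly how a decode would look if the complete raw bytes of the field were available. It is a minimal sketch under that assumption: the helper name wide_buffer_to_string and the simulated 520-byte buffer are made up for illustration, and it does not address whether Win32::API::Struct hands over the full buffer in the first place.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: turn a fixed-size, NUL-padded WCHAR buffer (raw bytes)
# into a Perl character string.
sub wide_buffer_to_string {
    my ($raw) = @_;
    my @kept;
    # Walk the buffer one 16-bit little-endian unit at a time and stop at the
    # first 0x0000 unit, which is the wide-string terminator.
    for my $unit (unpack "v*", $raw) {
        last if $unit == 0;
        push @kept, $unit;
    }
    return decode("UTF-16LE", pack("v*", @kept));
}

# Simulate what a 260-WCHAR (520-byte) cFileName field would hold.
my $name   = "\x{65E5}\x{672C}_HostID_2006-01-19.xls";
my $bytes  = encode("UTF-16LE", $name);
my $buffer = $bytes . ("\0" x (520 - length $bytes));

my $decoded = wide_buffer_to_string($buffer);

binmode(STDOUT, ":utf8");
print "decoded: $decoded\n";
print "round trip ok\n" if $decoded eq $name;
```

Cutting at the first 16-bit zero, rather than the first zero byte, is the point of the helper. Passing an already truncated value to decode cannot recover the rest of the name, which may be why decoding $FileInfo->{cFileName} directly has not worked.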

I'm also not sure of the proper array dimension for cFileName (and cAlternateFileName) in the WIN32_FIND_DATAW struct. In the ANSI version of this structure, cFileName is a TCHAR of dimension 260 (MAX_PATH) where TCHAR is a single byte (according to Win32::API::Type->sizeof). In the wide version (WIN32_FIND_DATAW), it's a WCHAR of the same dimension -- but WCHAR is 2 bytes. When I make cFileName and cAlternateFileName TCHARs of dimension 260 and 14 respectively, Perl crashes. And it also crashes when I make them WCHARs of the same dimensions. Only when I double the dimensions to 520 and 28 does the script run without crashing -- using either TCHAR or WCHAR.
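The byte arithmetic behind those dimensions can be laid out as follows. This is a sketch that only sums the field sizes from the struct definitions; it ignores any padding a C compiler may add, and the pack template is an illustration of the wide layout, not something Win32::API uses.

```perl
use strict;
use warnings;

use constant MAX_PATH => 260;

# Fixed-size fields that precede the two name arrays, in bytes.
my $header = 4          # dwFileAttributes
           + 3 * 8      # ftCreationTime, ftLastAccessTime, ftLastWriteTime
           + 4 * 4;     # nFileSizeHigh, nFileSizeLow, dwReserved0, dwReserved1

# ANSI version: one byte per character. Wide version: two bytes per character.
my $ansi_size = $header + MAX_PATH * 1 + 14 * 1;
my $wide_size = $header + MAX_PATH * 2 + 14 * 2;

print "sum of ANSI fields: $ansi_size bytes\n";
print "sum of wide fields: $wide_size bytes\n";

# The same wide layout as a pack template: 11 DWORDs, then 520 and 28 raw bytes.
my $template   = "V V2 V2 V2 V V V V a520 a28";
my $packed_len = length pack($template, (0) x 11, "", "");
print "packed wide struct: $packed_len bytes\n";
```

If the array dimension is being counted in bytes rather than in 2-byte characters, a dimension of 260 would reserve only half of what FindFirstFileW writes into cFileName, and the overrun could account for the crashes. That is an assumption on my part, but it would be consistent with the script only surviving at 520 and 28.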

Any ideas on how to make these functions work correctly would be greatly appreciated. And if I'm missing something obvious or doing something dumb, please don't hesitate to point that out :-).

Regards,

... Dewey



use strict;
use Win32::API;
use Data::Dumper; $Data::Dumper::Indent=1; $Data::Dumper::Sortkeys=1;
use Encode qw(encode decode);
use Unicode::String;
use Devel::Peek;
use English;

$OUTPUT_AUTOFLUSH=1;

$Win32::API::DEBUG = 0;

binmode(STDOUT, ":utf8");


use constant ERROR_NO_MORE_FILES  => 18;
use constant INVALID_HANDLE_VALUE => -1;

print "tchar is known: ", Win32::API::Type->is_known("TCHAR"), "\n";
print "wchar is known: ", Win32::API::Type->is_known("WCHAR"), "\n";
print "sizeof tchar is: ", Win32::API::Type->sizeof("TCHAR"), "\n";
print "sizeof wchar is: ", Win32::API::Type->sizeof("WCHAR"), "\n";



Win32::API::Struct-> typedef('FILETIME', qw(
  DWORD dwLowDateTime;
  DWORD dwHighDateTime;
));                             # 8 bytes

use constant FILE_ATTRIBUTE_READONLY =>  0x00000001;
use constant FILE_ATTRIBUTE_HIDDEN =>  0x00000002;
use constant FILE_ATTRIBUTE_SYSTEM =>  0x00000004;
use constant FILE_ATTRIBUTE_DIRECTORY =>  0x00000010;
use constant FILE_ATTRIBUTE_ARCHIVE =>  0x00000020;
use constant FILE_ATTRIBUTE_NORMAL =>  0x00000080;
use constant FILE_ATTRIBUTE_TEMPORARY =>  0x00000100;
use constant FILE_ATTRIBUTE_COMPRESSED =>  0x00000800;
use constant MAX_PATH =>  260;

Win32::API::Struct-> typedef('WIN32_FIND_DATAW', qw(
  DWORD dwFileAttributes;
  FILETIME ftCreationTime;
  FILETIME ftLastAccessTime;
  FILETIME ftLastWriteTime;
  DWORD nFileSizeHigh;
  DWORD nFileSizeLow;
  DWORD dwReserved0;
  DWORD dwReserved1;
  WCHAR cFileName[520];
  WCHAR cAlternateFileName[28];
));


my $FindFirstFile = Win32::API->new('kernel32.dll', 'FindFirstFileW', 'PS', 'N')
  or die "FindFirstFile: $^E";
my $FindNextFile = Win32::API->new('kernel32.dll', 'FindNextFileW', 'NS', 'I')
  or die "FindNextFile: $^E";
my $FindClose = Win32::API->new('kernel32.dll', 'FindClose', 'N', 'I')
  or die "FindClose: $^E";


my $FileSpec = "//?/C:/My Documents/Tool/*.xls\0";

my $FileInfo = Win32::API::Struct-> new('WIN32_FIND_DATAW');
#print Data::Dumper-> Dump([$FileSpec, $FileInfo], [qw($FileSpec $FileInfo)]);

my $uFileSpec = Unicode::String->new;
$uFileSpec->utf8($FileSpec);
print "FileSpec = ", $uFileSpec->as_string, "\n";

my $handle = $FindFirstFile-> Call($uFileSpec->utf16le, $FileInfo);
#my $handle = $FindFirstFile-> Call(encode("UTF-16LE", $FileSpec), $FileInfo);

if ($handle == INVALID_HANDLE_VALUE) {
        printf "Error is %d - %s\n", Win32::GetLastError (),
          Win32::FormatMessage (Win32::GetLastError ());
        exit(1);
} else {
        print "FindFirstFile worked\n";
        
        Dump $FileInfo->{cFileName};
        #print Data::Dumper-> Dump([$FileInfo], [qw($FileInfo)]);
        
        my $ufn = Unicode::String->new;
        $ufn->utf16le($FileInfo->{cFileName});
        
        print "first filename = ", $ufn->as_string, "\n";
        print "first filename = '", $FileInfo->{cFileName}, "'\n";
#print "first filename = ", decode("UTF-16LE", $FileInfo->{cFileName} ), "\n";
        while (my $result = $FindNextFile->Call($handle,$FileInfo)) {
                Dump $FileInfo->{cFileName};
                $ufn->utf16le($FileInfo->{cFileName});
                print "next filename = ", $ufn->as_string, "\n";
                print "next filename = '", $FileInfo->{cFileName}, "'\n";
#print "next filename = ", decode("UTF-16LE", $FileInfo->{cFileName} ), "\n";
                #print Data::Dumper-> Dump([$FileInfo], [qw($FileInfo)]);
                
        }
}

$FindClose->Call($handle) or die "FindClose $^E";
