> So why won't a multi-line match work with Unicode?
This from perldoc perlunicode probably applies:
WARNING: As of the 5.6.1 release, the implementation of Unicode
support in Perl is incomplete, and continues to be highly experimental.
I tried installing 5.8, but it did not go well at all on XP, so I gave up.
Now I have a different question:
How can I find out what encoding a string contains?
The same operation on two different files results in two types of output:
D:\Perl\scratch\a.pl c:\windows\system\c*.dll
c:\windows\system\crlds3d.dll
∟♦4 V S _ V E R S I O N _ I N F O ╜♦∩■ ☺ ♀ ♦ ╥☺ ♀ ♦
╥☺ ? ♦ ☺ |♥ ☺ S t r i n g F i l e I n f o X♥
☺ C o m p a n y N a m e S e n s a u r a L t d ` ∟ ☺ F i l e D e s c r
i p t i o n S e n s a u r a 3 D d r
☺ F i l e V e r s i o n 4 . 1 2 . 0 1 . 2 0 0 2 8 ♀ ☺ I n t e r n a l N
a m e c r l d s 3 d . d l l ` ▲ ☺ L e g
a l C o p y r i g h t ⌐ C o p y r i g h t 2 0 0 0 S e n s a u r a L t d
t & ☺ L e g a l T r a d e m a r k s S
e n s a u r a , M a c r o F X , Z o o m F X , M u l t i d r i v e @ ♀ ☺ O
r i g i n a l F i l e n a m e c r l d s
☺ P r o d u c t V e r s i o n 4 . 1 2 . 0 1 . 2 0 0 2 ☺ S p e c i a l B
u i l d D ☺ V a r F i l e I n f o
$ ♦ T r a n s l a t i o n ♦░♦FE2X
c:\windows\system\commdlg.dll
∞☺4 VS_VERSION_INFO ╜♦∩■ ☺
♥ g
♥ g ?
☺ ☺ ☻ Ç☺ StringFileInfo l☺ 040904E4 ' ↨
CompanyName Microsoft Corporation . → FileDescription Common
Dialogs libraries ▬ ♠ FileVersion 3.10 ↔ InternalName COMMDLG
; ' LegalCopyright Copyright ⌐ Microsof
t Corp. 1981-1996 $ ♀ OriginalFilename COMMDLG.DLL 9 ) ProductName Microsoft«
Windows(TM) Operating System → ♠ Produ
ctVersion 3.10 ¶ ♦ WOW Version 4.0 $ VarFileInfo ¶ ♦ Translation ♦Σ♦
The result is similar if you just open the files in Notepad. Were the files compiled differently?
Running unpack "h*" on each of them, the first one contains sets of '00's (on my screen).
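For anyone wanting to reproduce the comparison, here is a minimal sketch of that kind of hex dump. The helper name hex_preview is made up for illustration, and the two sample strings are only assumed to resemble the start of each file's version block. It uses "H*" (high nybble first) instead of "h*" so that each byte reads in the usual order:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $len bytes of $data as
# space-separated hex pairs, e.g. "56 00 53 00".
sub hex_preview {
    my ($data, $len) = @_;
    my $chunk = substr($data, 0, $len);
    my $hex   = unpack('H*', $chunk);   # one long hex string
    return join ' ', $hex =~ /(..)/g;   # split into byte-sized pairs
}

# Assumed sample data, not read from the real DLLs.
my $spaced = "V\0S\0_\0";   # like the first file: a NUL after each letter
my $plain  = "VS_";         # like the second file: one byte per letter

print hex_preview($spaced, 6), "\n";   # 56 00 53 00 5f 00
print hex_preview($plain, 3),  "\n";   # 56 53 5f
```

On a real file the data would have to be read after calling binmode on the filehandle, otherwise Windows newline translation can alter the bytes.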
The difference should be easy to deal with if I can find a way for the script to know
which way the data is stored. Web pages offer a ton of information about the topic,
but I didn't find anything on how to tell which encoding a given string uses.
Apparently, most of the time you already know.
After the unpack, both files can contain '00's; in one of them they just form a regular
pattern, and I would rather not go looking for that pattern.
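For reference, the pattern check described above could be sketched roughly as follows. This is only a heuristic under stated assumptions: it assumes the spaced-out output is UTF-16LE text made mostly of ASCII characters (each printable byte followed by a '00'), and the function name guess_encoding and the 50% threshold are arbitrary choices for illustration, not anything Perl provides:

```perl
use strict;
use warnings;

# Hypothetical heuristic: guess whether a byte string looks like UTF-16LE
# text (printable ASCII byte followed by a NUL) or single-byte text.
sub guess_encoding {
    my ($bytes) = @_;
    return 'unknown' if length($bytes) < 2;

    # A byte-order mark, when present, settles the question.
    my $bom = substr($bytes, 0, 2);
    return 'UTF-16LE' if $bom eq "\xFF\xFE";
    return 'UTF-16BE' if $bom eq "\xFE\xFF";

    # Otherwise count the byte pairs that look like "ASCII char, NUL".
    my $pairs = int(length($bytes) / 2);
    my $hits  = 0;
    for my $i (0 .. $pairs - 1) {
        my ($lo, $hi) = unpack 'CC', substr($bytes, 2 * $i, 2);
        $hits++ if $hi == 0 && $lo >= 0x20 && $lo < 0x7F;
    }

    # Assumed threshold: more than half the pairs must match.
    return $hits / $pairs > 0.5 ? 'UTF-16LE' : 'single-byte';
}

print guess_encoding("V\0S\0_\0V\0E\0R\0"), "\n";   # UTF-16LE
print guess_encoding("VS_VERSION_INFO"),    "\n";   # single-byte
```

It would work best on a slice of the data that is known to hold text, such as the bytes around VS_VERSION_INFO. Run over a whole binary file, the surrounding non-text bytes would likely drown out the pattern.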
Is there an elegant way to detect encoding type?
Gary