> So why won't a multi-line match work with Unicode?

This from perldoc perlunicode probably applies:

        WARNING:  As of the 5.6.1 release, the implementation of Unicode
        support in Perl is incomplete, and continues to be highly experimental.

I tried installing 5.8, but it did not go well at all on XP, so I gave up.

Now I have a different question:

        How can I find out what encoding a string contains?

The same operation on two different files results in two types of output:

D:\Perl\scratch\a.pl c:\windows\system\c*.dll

c:\windows\system\crlds3d.dll
∟♦4   V S _ V E R S I O N _ I N F O     ╜♦∩■  ☺ ♀ ♦ ╥☺ ♀ ♦ 
╥☺ ?    ♦   ☺               |♥  ☺ S t r i n g F i l e I n f o   X♥
 ☺ C o m p a n y N a m e     S e n s a u r a   L t d     ` ∟ ☺ F i l e D e s c r 
i p t i o n     S e n s a u r a   3 D   d r
 ☺ F i l e V e r s i o n     4 . 1 2 . 0 1 . 2 0 0 2     8 ♀ ☺ I n t e r n a l N 
a m e   c r l d s 3 d . d l l   ` ▲ ☺ L e g
a l C o p y r i g h t   ⌐   C o p y r i g h t   2 0 0 0   S e n s a u r a   L t d   
t & ☺ L e g a l T r a d e m a r k s     S
 e n s a u r a ,   M a c r o F X ,   Z o o m F X ,   M u l t i d r i v e   @ ♀ ☺ O 
r i g i n a l F i l e n a m e   c r l d s
 ☺ P r o d u c t V e r s i o n   4 . 1 2 . 0 1 . 2 0 0 2         ☺ S p e c i a l B 
u i l d   D   ☺ V a r F i l e I n f o
$ ♦   T r a n s l a t i o n             ♦░♦FE2X

c:\windows\system\commdlg.dll
∞☺4 VS_VERSION_INFO ╜♦∩■  ☺
 ♥ g
 ♥ g   ?
   ☺ ☺ ☻               Ç☺  StringFileInfo  l☺  040904E4    ' ↨ 
CompanyName Microsoft Corporation   . → FileDescription Common
 Dialogs libraries    ▬ ♠ FileVersion 3.10    ↔          InternalName    COMMDLG 
    ; ' LegalCopyright  Copyright ⌐ Microsof
t Corp. 1981-1996   $ ♀ OriginalFilename    COMMDLG.DLL 9 ) ProductName Microsoft« 
Windows(TM) Operating System     → ♠ Produ
ctVersion  3.10    ¶ ♦ WOW Version 4.0 $   VarFileInfo ¶ ♦ Translation  ♦Σ♦

The results are similar if you just open the files in Notepad.  Were the files compiled differently?

Running unpack with 'h*' on each of those, the first one contains sets of '00's (on my screen).
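
To illustrate what I mean, here is a minimal sketch using a short made-up sample rather than the real DLL data.  The two strings are my own stand-ins for the version-info text, and I use 'H*' instead of 'h*' here only because it prints each byte high nybble first, which is easier to read:

```perl
use strict;
use warnings;

# Hypothetical samples: the same text as single-byte characters and as
# two-byte characters (each letter followed by a NUL), built by hand.
my $narrow = "VS_VER";
my $wide   = join '', map { $_ . "\0" } split //, $narrow;

# 'H*' dumps every byte as two hex digits, high nybble first.
# With 'h*' the digits of each byte are swapped (56 becomes 65),
# but a NUL byte shows up as 00 either way.
print unpack('H*', $narrow), "\n";   # 56535f564552
print unpack('H*', $wide),   "\n";   # 560053005f00560045005200
```

The second line of output has the '00' after every character, which is the regular pattern I see in the first file.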

The difference should be easy to deal with if the script can tell whether the data
is stored one way or the other.  Web pages offer a ton of information on the topic,
but I didn't find anything on how to tell which encoding you actually have.
Apparently, most of the time, you already know.

After the unpack, both can contain '00's; in one of them they just form a regular
pattern, and I would rather not have to go looking for it.
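
The brute-force check I would like to avoid might look something like the sketch below.  The helper name looks_utf16le and the 0.9 threshold are my own assumptions, not anything standard, and it only counts "printable ASCII byte followed by a NUL" pairs:

```perl
use strict;
use warnings;

# Hypothetical helper: guess whether a chunk of raw bytes holds
# two-byte little-endian text (ASCII letters, each followed by a NUL).
# The 0.9 threshold is an arbitrary assumption.
sub looks_utf16le {
    my ($bytes) = @_;
    my $pairs = int(length($bytes) / 2);
    return 0 if $pairs == 0;

    # Count two-byte units of the form "printable ASCII, then NUL".
    my $hits = 0;
    for my $i (0 .. $pairs - 1) {
        my ($lo, $hi) = unpack 'C2', substr($bytes, 2 * $i, 2);
        $hits++ if $hi == 0 && $lo >= 0x20 && $lo < 0x7f;
    }
    return $hits / $pairs >= 0.9 ? 1 : 0;
}

print looks_utf16le("V\0S\0_\0V\0E\0R\0") ? "wide\n" : "narrow\n";   # wide
print looks_utf16le("VS_VERSION_INFO")    ? "wide\n" : "narrow\n";   # narrow
```

This would only work on a chunk that is mostly text and starts on a character boundary; the binary bytes around the strings in a real DLL, or a chunk that starts one byte off, would throw the count off.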

Is there an elegant way to detect encoding type?

Gary


