Still futzing around with email and character sets.
Under Encode and perluniintro there's mention of
  octet       \x{..}   one byte, 256 possible values, \x00 up to \xff
  string      some internal representation
  code point  \x{...}  one character, stored as 1, 2 or more bytes of data
But I'm not sure about the order of things.
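As far as I can tell from the Encode docs, the order is: octets come in, decode() turns them into a character string, and encode() turns characters back into octets on the way out. A minimal sketch of that round trip, assuming four made-up sample bytes that are the Big5 encoding of two Chinese characters (the variable names are mine, not from any module):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets: what comes off the wire. Assumed sample data, two Big5 characters.
my $octets = "\xA4\xA4\xA4\xE5";

# decode() goes from octets to a Perl character string (code points).
my $chars = decode('Big5', $octets);

# encode() goes from characters back to octets, here as UTF-8.
my $utf8_octets = encode('UTF-8', $chars);

printf "Big5 octets:  %d\n", length $octets;
printf "characters:   %d\n", length $chars;
printf "UTF-8 octets: %d\n", length $utf8_octets;
printf "code points:  %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
```

The same two characters take a different number of octets depending on the encoding, which is why length() only means "characters" after the decode step.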
So I'll try this:
I have a MIME message part like the following:
Content-Type: text/plain;
charset="BIG5"
Content-Transfer-Encoding: base64
1eLKx9K7t+JIVE1MuPHKvdDFvP6joQ0KCqFYoVihWKFYoVihWKFYoVihWKFYoVihWKFYoVihWKFY
oVihWKFYoVihWKFYoVihWKFYoVihWKFYoVihWAqhaapgt06haqRXrbGquoVvpfOBWK5lyU+lSKRV
pOWmcsbTi9ehQ6W7hLCl84Wyra2kX6ZYqmulzrN+IQqGR4VvpfOl0aFtVm9sbGV5bWFpbIVvpfO4
c4T6g/2uYaFuhLCl84T6sGWhRrNRykmkzYVUg2+zzIetrmAKqrqFb6XzuHOE+oSwpfOm06Zoprit
bqhEr3240aFJhGOnS4VkpFWGXqFBxtOtrYO6hX2oz6XOoUMKhkixoYhbhKGD9Kfag6iquqVEg6Sh
R2h0dHA6Ly93d3cuY255c29mdC5jb20v
MIME::Base64 has a function
my $decoded = decode_base64($DATA);
that returns really wonderful crud to my screen. But I can't regex it.
I think it returns octets. At least that's what MIME::Base64 says.
So I should be able to do
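use MIME::Base64 qw(decode_base64);
use Encode qw(decode);
my $base64 = join('', <DATA>);          # the base64 body lines
my $octets = decode_base64($base64);    # raw Big5 bytes
my $text   = decode('Big5', $octets);   # Perl character string, not UTF-8 octets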
and from there I can use something like /(\w+)/ on it.
(But IIRC /[\w]+/ will act weird).
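Here is a small self-contained sketch of the whole chain, ending in the regex. The sample body is an assumption: instead of the real message it base64-encodes two Big5 characters followed by a space and the ASCII word HTML, so the script can run on its own.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);
use Encode qw(decode);

# Assumed sample body: two Big5 characters, a space, then ASCII text.
my $base64 = encode_base64("\xA4\xA4\xA4\xE5 HTML");

my $octets = decode_base64($base64);     # base64 text -> raw Big5 octets
my $text   = decode('Big5', $octets);    # Big5 octets -> character string

# On the decoded string \w matches whole characters, not single bytes.
my @words = $text =~ /(\w+)/g;

printf "%d words, the first is %d characters long\n",
    scalar @words, length $words[0];
```

Running the same regex on $octets instead of $text would be matching against raw bytes, which is probably why the undecoded output could not be regexed sensibly.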
Printing it out requires 'binmode(...)' on the output handle, but I can do
stuff with it inside the program.
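A minimal sketch of the output side, assuming the terminal expects UTF-8 (that is my assumption; pick whatever encoding your screen actually uses). The sample bytes are made up, the same two Big5 characters as before:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample data: two Big5 characters.
my $text = decode('Big5', "\xA4\xA4\xA4\xE5");

# Put an encoding layer on STDOUT so characters are turned back into
# octets on the way out. Without it, Perl warns "Wide character in print".
binmode(STDOUT, ':encoding(UTF-8)');

print $text, "\n";
```

The layer only affects what is written to the handle; $text itself stays a character string inside the program.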
Which is all good. And I guess it's progress.
But can I expect to ALWAYS find a charset declaration on the Content-Type line
if the part isn't just ASCII? (There is sometimes a Content-Type in the message
header, which I assume applies to all parts.)
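For pulling the charset out when it is there, a rough sketch follows. charset_of() is a hypothetical helper of my own, not part of any module. It falls back to us-ascii, which is the default RFC 2045 gives for a text part with no charset parameter; whether real-world mail honours that is a separate matter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the charset parameter of a Content-Type
# value in lower case, or 'us-ascii' when none is given.
sub charset_of {
    my ($content_type) = @_;
    return 'us-ascii' unless defined $content_type;

    # Accept both charset="BIG5" and charset=BIG5. \s also matches the
    # newline of a folded header line.
    if ($content_type =~ /;\s*charset\s*=\s*(?:"([^"]*)"|([^\s;]+))/i) {
        return lc(defined $1 ? $1 : $2);
    }
    return 'us-ascii';
}

print charset_of(qq{text/plain;\n charset="BIG5"}), "\n";
print charset_of('text/plain'), "\n";
```

The returned name can then be handed to decode() in place of the hard-coded 'Big5', after checking that Encode knows it.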