Python for Vcard Parsing in UTF16

2007-04-21 Thread R Wood
Greetings -

A recent Perl experiment hasn't turned out so well, which has piqued my
interest in Python.  The project is this: take a Vcard file exported from
Apple's Address Book and use a language that is good at parsing text to convert
it into a mutt alias file.  There are better ways to use Mutt with the Mac's
Address Book, but I want to be able to periodically convert my working
address book file into an alias file I can then transfer across all my different
machines - two Macs, two Linux, and one FreeBSD.  It's basically a couple of
regexes that look for FN: followed by a name, join the words of the name with
underscores to form the alias key, and then append the full name and the email
addresses.  You would wind up with

alias Linus_Torvalds Linus Torvalds <[EMAIL PROTECTED]>

To me this was a natural task for Perl.  Turns out however, there's a catch.  
Apple exports the file in UTF-16 to ensure anyone with Chinese characters in 
their addressbook gets a legitimate Vcard file.  And of course Perl somewhat 
chokes on UTF.  I've found several ways to do it that involve complicated 
downloads and installations of Perl modules, but that defeats the purpose of 
making it simple. In an ideal world you should be able to say "try this cool 
script" and be done with it.  Once you have to say "go to CPAN, download and
compile this module, then ..." it gets less exciting.

I know nothing about Python except that it interests me and has interested me
since I first learned the Rekall database frontend (Linux) runs on it.  I just 
ordered Learning Python and if that works out satisfactorily I'm going to go 
back for Programming Python.  In the meantime, I thought I would pose the 
question to this newsgroup: would Python be useful for a parsing exercise like 
this one?
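
For illustration, the conversion described above might be sketched in Python
roughly as follows.  This is only a sketch under stated assumptions: the
names vcard_to_aliases, FN_RE and EMAIL_RE are made up for this example, the
input is assumed to be already-decoded text, each card is assumed to list FN
and EMAIL lines between BEGIN:VCARD and END:VCARD, and folded (continued)
lines are not handled.

```python
import re

# Assumed line shapes: "FN:Full Name" and "EMAIL;type=...:address".
# An optional "item1."-style group prefix before EMAIL is tolerated.
FN_RE = re.compile(r"^FN(?:;[^:]*)?:(.+)$", re.IGNORECASE)
EMAIL_RE = re.compile(r"^(?:[\w-]+\.)?EMAIL(?:;[^:]*)?:(.+)$", re.IGNORECASE)


def vcard_to_aliases(text):
    """Return one mutt alias line per card that has a name and an address."""
    aliases = []
    name, emails = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        upper = line.upper()
        if upper == "BEGIN:VCARD":
            # Start of a new card: forget whatever the last card held.
            name, emails = None, []
        elif upper == "END:VCARD":
            if name and emails:
                # "Linus Torvalds" -> "Linus_Torvalds"
                key = "_".join(name.split())
                addresses = ", ".join("%s <%s>" % (name, e) for e in emails)
                aliases.append("alias %s %s" % (key, addresses))
        else:
            fn_match = FN_RE.match(line)
            email_match = EMAIL_RE.match(line)
            if fn_match:
                name = fn_match.group(1).strip()
            elif email_match:
                emails.append(email_match.group(1).strip())
    return aliases


if __name__ == "__main__":
    # Placeholder card; the address is invented for the example.
    sample = (
        "BEGIN:VCARD\n"
        "VERSION:3.0\n"
        "N:Torvalds;Linus;;;\n"
        "FN:Linus Torvalds\n"
        "EMAIL;type=INTERNET;type=WORK;type=pref:linus@example.org\n"
        "END:VCARD\n"
    )
    for alias_line in vcard_to_aliases(sample):
        print(alias_line)
```

A card with several addresses comes out as a single alias line with the
addresses separated by commas, which is one of several reasonable choices.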
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python for Vcard Parsing in UTF16

2007-04-22 Thread R Wood
Alex Martelli wrote:

> R Wood <[EMAIL PROTECTED]> wrote:
>...
>> alias Linus_Torvalds Linus Torvalds <[EMAIL PROTECTED]>
>> 
>> To me this was a natural task for Perl.  Turns out however, there's a
>> catch. Apple exports the file in UTF-16 to ensure anyone with Chinese
>> characters in their addressbook gets a legitimate Vcard file.  And of
>> course Perl somewhat chokes on UTF.
> 
> Sure, Python and Perl (and Ruby) should be equally suitable for the
> task, so, if Python appears more suitable by having built-in unicode
> capabilities, go for it.  I'm a bit uncertain about the UTF-16 export
> though; I know some applications do use it (e.g., Microsoft Entourage),
> but I thought Apple's Address Book didn't, and, having just tried a
> VCard export from mine, it looks quite ASCII to me.  Maybe you've set
> some kind of preference, or...?
> 
> 
> Alex

I did the same thing.  Apple's clever.  If your addressbook contains nothing
but ASCII characters, it will export your addressbook in ASCII.  But if you
have anything else (in my case, Spanish, French, and Italian names) it goes
for UTF16.  I first thought it was UTF8, but figured that since Apple
supports all sorts of Asian languages really well they probably chose UTF16
for that, and importing the exported file into Jedit using UTF16 encoding
confirmed that's what it is.
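
Since the export can apparently come out either as ASCII or as UTF16, a
script would have to cope with both.  A rough Python sketch of one way to do
that is below.  It assumes the UTF16 export starts with a byte-order mark,
which I have not verified for Address Book, and falls back to UTF8 (of which
ASCII is a subset) otherwise.  The function name decode_vcard_bytes is made up
for the example.

```python
import codecs


def decode_vcard_bytes(data):
    """Decode exported bytes to text, guessing the encoding from a BOM."""
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The generic "utf-16" codec reads the BOM to pick the byte order
        # and drops it from the decoded text.
        return data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips a leading UTF-8 BOM.
        return data.decode("utf-8-sig")
    # Assumption: no BOM means plain ASCII or UTF-8.
    return data.decode("utf-8")
```

Reading the file in binary mode and passing the bytes through a function like
this gives ordinary text, which can then be searched for the FN: and EMAIL
lines regardless of how it was exported.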
