Python for Vcard Parsing in UTF16
Greetings -

A recent Perl experiment hasn't turned out so well, which has piqued my interest in Python. The project is this: take a Vcard file exported from Apple's Addressbook and use a language that is good at parsing text to convert it into a mutt alias file. There are better ways to use Mutt with Mac's addressbook, but I want to be able to periodically convert my working addressbook file into an alias file I can then transfer across all my different machines - two Macs, two Linux, and one FreeBSD.

It's basically a couple of regexes that look for FN: followed by a name and convert all the words of the name into a single structure separated by underscores, followed by the email addresses. You would wind up with

alias Linus_Torvalds Linus Torvalds <[EMAIL PROTECTED]>

To me this was a natural task for Perl. Turns out however, there's a catch. Apple exports the file in UTF-16 to ensure anyone with Chinese characters in their addressbook gets a legitimate Vcard file. And of course Perl somewhat chokes on UTF. I've found several ways to do it that involve complicated downloads and installations of Perl modules, but that defeats the purpose of making it simple. In an ideal world you should be able to say "try this cool script" and be done with it. Once you have to say "go to CPAN, download and compile this module, then ..." it gets less exciting.

I know nothing about Python except that it interests me and has interested me since I first learned the Rekall database frontend (Linux) runs on it. I just ordered Learning Python and if that works out satisfactorily I'm going to go back for Programming Python. In the meantime, I thought I would pose the question to this newsgroup: would Python be useful for a parsing exercise like this one?

--
http://mail.python.org/mailman/listinfo/python-list
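For what it's worth, the conversion described in this post can be sketched in a few lines of Python 3 using only the standard library. This is a rough illustration under stated assumptions, not a tested tool: the function names (vcard_to_aliases, convert_file) are made up for the example, it assumes each card has a plain "FN:" line and one or more "EMAIL...:" lines (optionally prefixed with "item1." and the like), it puts all of a contact's addresses on one alias line, and it does no quoting of unusual characters in names.

```python
import re

# Matches e.g. "EMAIL;type=INTERNET;type=pref:someone@example.org" and
# "item1.EMAIL;type=INTERNET:someone@example.org" (assumed line shapes).
EMAIL_LINE = re.compile(r"^(?:item\d+\.)?EMAIL[^:]*:(.+)$", re.IGNORECASE)


def vcard_to_aliases(text):
    """Return one mutt alias line per vCard that has a name and an address."""
    aliases = []
    name, addresses = None, []
    for raw in text.splitlines():
        line = raw.strip()
        upper = line.upper()
        if upper == "BEGIN:VCARD":
            # Start of a new card: forget the previous contact.
            name, addresses = None, []
        elif upper.startswith("FN:"):
            name = line[3:].strip()
        elif upper == "END:VCARD":
            if name and addresses:
                # "Linus Torvalds" -> "Linus_Torvalds"
                key = "_".join(name.split())
                targets = ", ".join("%s <%s>" % (name, a) for a in addresses)
                aliases.append("alias %s %s" % (key, targets))
        else:
            match = EMAIL_LINE.match(line)
            if match:
                addresses.append(match.group(1).strip())
    return aliases


def convert_file(path, encoding="utf-16"):
    """Read a vCard export (UTF-16 by default) and return alias lines."""
    with open(path, encoding=encoding) as handle:
        return vcard_to_aliases(handle.read())
```

Something like print("\n".join(convert_file("contacts.vcf"))) would then write the alias lines to standard output, where "contacts.vcf" is a placeholder for the exported file.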
Re: Python for Vcard Parsing in UTF16
Alex Martelli wrote:
> R Wood <[EMAIL PROTECTED]> wrote:
> ...
>> alias Linus_Torvalds Linus Torvalds <[EMAIL PROTECTED]>
>>
>> To me this was a natural task for Perl. Turns out however, there's a
>> catch. Apple exports the file in UTF-16 to ensure anyone with Chinese
>> characters in their addressbook gets a legitimate Vcard file. And of
>> course Perl somewhat chokes on UTF.
>
> Sure, Python and Perl (and Ruby) should be equally suitable for the
> task, so, if Python appears more suitable by having built-in unicode
> capabilities, go for it. I'm a bit uncertain about the UTF-16 export
> though; I know some applications do use it (e.g., Microsoft Entourage),
> but I thought Apple's Address Book didn't, and, having just tried a
> VCard export from mine, it looks quite ASCII to me. Maybe you've set
> some kind of preference, or...?
>
>
> Alex

I did the same thing. Apple's clever. If your addressbook doesn't have any higher characters, i.e. nothing but ASCII, it will export your addressbook in ASCII. But if you have anything else (in my case, Spanish, French, and Italian) it goes for UTF-16.

I first thought it was UTF-8, but realized that since Apple supports all sorts of Asian languages really well they need UTF-16 to deal with it, and importing the exported file into Jedit using UTF-16 encoding confirmed that's what it is.
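Since the export apparently comes out as ASCII in some cases and UTF-16 in others, a script would have to cope with both. One possible approach, sketched here in Python 3, is to look for a byte order mark at the start of the file. Note the assumption: nothing in this thread confirms that the UTF-16 export actually starts with a byte order mark, and the function name read_vcard_text is invented for the example.

```python
import codecs


def read_vcard_text(data):
    """Decode raw vCard bytes, choosing the codec from a leading byte order mark."""
    # Assumption: a UTF-16 export begins with a byte order mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the mark and picks the byte order.
        return data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the UTF-8 mark before decoding.
        return data.decode("utf-8-sig")
    # Plain ASCII is valid UTF-8, so this also covers the ASCII-only export.
    return data.decode("utf-8")
```

The raw bytes would come from opening the file in binary mode, for example open(path, "rb").read(). A UTF-16 file without a byte order mark would not be recognised by this check and would need the encoding passed explicitly.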