On Sunday 03 April 2005 15:59, Angus Leeming wrote:
> I've written a python script (attached) to convert lib/CREDITS to a
> web page. See
> http://www.devel.lyx.org/~leeming/www-user/about/credits.php
>
> The script is currently rather clunky because my python skills aren't
> so hot. Creation:
>
> $ recode ISO-8859-1..H4 < CREDITS > tmp
> $ python phpcredits.py tmp > credits.php
> $ scp credits.php \
>     [EMAIL PROTECTED]:public_html/www-user/about/.
Cool. :-)

> Questions for our resident python gurus:

I don't fit the bill but I will try my best. :-)

> * Is there a python iconv or recode module? It would be nice to do
>   this in one step only.

Python has had Unicode support ever since version 2.0.

My patch to your python script is attached; it works for me. I just
tried to make it work, no more, no less. ;-)

FWIW I used the old-fashioned method of searching with google:

http://www.google.com/search?hl=en&ie=ISO-8859-1&q=python+unicode&btnG=Google+Search
http://www.reportlab.com/i18n/python_unicode_tutorial.html
http://www.jorendorff.com/articles/unicode/python.html

> * If not, then how do I modify the script to take its input from STDIN
>   so I can use it as
>
>   $ recode ISO-8859-1..H4 < CREDITS | ./phpcredits.py > credits.php

Not necessary, as I have shown, but why not use the GNU convention (?)
and pass "-" as the argument when you want to read from the standard
input:

    if credits_file != "-":
        credits = open(credits_file)
    else:
        credits = sys.stdin

> The code to parse the contents of the CREDITS file is also rather
> crappy. Is there a python parser module?

Again using google ;-) I found this:

http://python.prokmu.com/topics/parsing.html

I don't think there is any generic parser in the standard library,
although there are specific parsers for xml, html and others...

> Angus (learning new tricks)
>
> ps, I did this originally as a sed script. If you want to see some
> crazy code, then have a look at phpcredits.sh. (Also attached.)

No thanks, I pass. ;-)

-- 
José Abílio
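For anyone reading this later with a current Python 3, here is a minimal sketch of the "-" convention mentioned above. It is not part of the attached patch: open_credits is a hypothetical helper name, and the Latin-1 default is only an assumption based on the encoding of CREDITS discussed in this thread.

```python
import io
import sys


def open_credits(credits_file, encoding="latin1"):
    """Return a text stream for credits_file; "-" means standard input.

    Hypothetical helper, not taken from phpcredits.py.
    """
    if credits_file == "-":
        # Re-wrap the raw stdin bytes so the input encoding is explicit
        # instead of depending on the locale.
        return io.TextIOWrapper(sys.stdin.buffer, encoding=encoding)
    return open(credits_file, "r", encoding=encoding)
```

With something like this the script could be called either as "phpcredits.py CREDITS" or as "phpcredits.py - < CREDITS".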
--- phpcredits.py.old	2005-04-04 12:31:00.000000000 +0100
+++ phpcredits.py	2005-04-04 12:58:34.000000000 +0100
@@ -19,7 +19,8 @@
 
 '''
 
-import re, string, sys
+import re, sys
+import codecs
 
 class Contributer:
     def __init__(self):
@@ -71,12 +72,12 @@
 
     def __repr__(self):
         result = as_desriptive_list(self.contributers) + '\n\n' + \
                  as_paragraph(self.post)
-        return result
+        return result.encode('utf-8')
 
 
     def read(self, credits_file):
-        credits = open(credits_file, 'r')
+        credits = codecs.open(credits_file, 'r', 'latin1')
 
         name_re = re.compile("^ [EMAIL PROTECTED](.*)")
         contact_re = re.compile("^ [EMAIL PROTECTED](.*)")
@@ -127,7 +128,7 @@
 
                 name = match.group(1)
                 body = match.group(2)
                 contact = name + ' () ' + re.sub(r'\.', r' ! ', body)
-                contact = str.lower(contact)
+                contact = contact.lower()
 
                 contributer.contact = '<' + contact + '>'
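On the "one step only" question: the patch above emits UTF-8, but if ASCII output with HTML character references is still wanted (roughly what the recode step was for), the conversion can be sketched in Python alone. This is a Python 3 sketch, not part of the patch; latin1_to_html is a hypothetical name. It writes numeric references such as &#233;, which may not be byte-for-byte what recode's H4 output looks like, and it does not escape &, < or >.

```python
def latin1_to_html(raw):
    """Decode Latin-1 bytes and return ASCII-only text.

    Hypothetical helper: every non-ASCII character is replaced by a
    numeric character reference via the "xmlcharrefreplace" error handler.
    """
    text = raw.decode("latin1")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(latin1_to_html(b"Jos\xe9 Ab\xedlio"))  # prints: Jos&#233; Ab&#237;lio
```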