On Sunday 03 April 2005 15:59, Angus Leeming wrote:
> I've written a python script (attached) to convert lib/CREDITS to a
> web page. See
> http://www.devel.lyx.org/~leeming/www-user/about/credits.php
>
> The script is currently rather clunky because my python skills aren't
> so hot. Creation:
>
> $ recode ISO-8859-1..H4 < CREDITS > tmp
> $ python phpcredits.py tmp > credits.php
> $ scp credits.php \
>   [EMAIL PROTECTED]:public_html/www-user/about/.

  Cool. :-)

> Questions for our resident python gurus:

  I don't fit the bill but I will try my best. :-)

> * Is there a python iconv or recode module? It would be nice to do
> this in one step only.

  Python has had Unicode support since version 2.0, so the conversion can
be done inside the script itself.

  Attached is my patch to your python script; it works for me. I only tried
to make it work, no more, no less. ;-)

  FWIW, I used the old-fashioned method of searching with Google:
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=python+unicode&btnG=Google+Search
http://www.reportlab.com/i18n/python_unicode_tutorial.html
http://www.jorendorff.com/articles/unicode/python.html
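
  A minimal sketch of the one-step conversion, assuming the goal of
`recode ISO-8859-1..H4` is plain ASCII output with non-ASCII characters
written as HTML character references. The name `latin1_to_html` is made up
for illustration, and the result uses numeric references (`&#233;`) rather
than the named entities (`&eacute;`) that recode may emit. It is written for
a current Python 3 interpreter and does not escape `<`, `>` or `&`.

```python
def latin1_to_html(raw_bytes):
    """Decode ISO-8859-1 bytes into ASCII text with numeric character references."""
    text = raw_bytes.decode('latin1')
    # 'xmlcharrefreplace' rewrites every non-ASCII character as &#NNN;
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


if __name__ == '__main__':
    sample = b'Jos\xe9 Ab\xedlio'   # "José Abílio" encoded as ISO-8859-1
    print(latin1_to_html(sample))   # prints: Jos&#233; Ab&#237;lio
```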

> * If not, then how do I modify the script to take its input from STDIN
> so I can use it as
>
> $ recode ISO-8859-1..H4 < CREDITS | ./phpcredits.py > credits.php

  Not necessary, as I have shown, but why not use the GNU convention (?)
and pass "-" as the argument when you want to read from standard
input:

if credits_file != "-":
    credits = open(credits_file)
else:
    credits = sys.stdin
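
  Combined with the latin1 decoding from the patch, the "-" convention could
look like the sketch below. `open_credits` and its `stdin_bytes` parameter
are hypothetical names, not something in phpcredits.py, and the code assumes
a current Python 3 interpreter, where the raw bytes of standard input are
reached through `sys.stdin.buffer`.

```python
import io
import sys


def open_credits(credits_file, encoding='latin1', stdin_bytes=None):
    """Return a text stream for credits_file, or for standard input if it is "-"."""
    if credits_file != '-':
        return open(credits_file, 'r', encoding=encoding)
    if stdin_bytes is None:
        stdin_bytes = sys.stdin.buffer
    # Wrap the byte stream so stdin is decoded the same way as a file would be.
    return io.TextIOWrapper(stdin_bytes, encoding=encoding)
```

  It would then be called as `open_credits(sys.argv[1])`, with the script run
either as `phpcredits.py CREDITS` or as `phpcredits.py - < CREDITS`.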

> The code to parse the contents of the CREDITS file is also rather
> crappy. Is there a python parser module?

  Again using Google ;-) I found this:
http://python.prokmu.com/topics/parsing.html

  I don't think there is a generic parser in the standard library,
although there are specific parsers for XML, HTML and others...
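
  For a line-oriented file like CREDITS, the `re` module may well be enough
on its own. The sketch below assumes a made-up layout in which a line
starting with `N:` opens a new contributor and a line starting with `C:`
gives the contact. These markers are placeholders for illustration, not the
real CREDITS markup, and `parse_credits` is a hypothetical name.

```python
import re

NAME_RE = re.compile(r'^N:\s*(.*)')      # hypothetical "name" marker
CONTACT_RE = re.compile(r'^C:\s*(.*)')   # hypothetical "contact" marker


def parse_credits(lines):
    """Group lines into dicts with 'name', 'contact' and 'work' keys."""
    entries = []
    for line in lines:
        line = line.rstrip('\n')
        name = NAME_RE.match(line)
        contact = CONTACT_RE.match(line)
        if name:
            # A name line starts a new record.
            entries.append({'name': name.group(1), 'contact': '', 'work': []})
        elif contact and entries:
            entries[-1]['contact'] = contact.group(1)
        elif line.strip() and entries:
            # Anything else is free text describing the contribution.
            entries[-1]['work'].append(line.strip())
    return entries
```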

> Angus (learning new tricks)
>
> ps, I did this originally as a sed script. If you want to see some
> crazy code, then have a look at phpcredits.sh. (Also attached.)

  No thanks, I pass. ;-)

-- 
José Abílio
--- phpcredits.py.old	2005-04-04 12:31:00.000000000 +0100
+++ phpcredits.py	2005-04-04 12:58:34.000000000 +0100
@@ -19,7 +19,8 @@
 
 '''
 
-import re, string, sys
+import re, sys
+import codecs
 
 class Contributer:
     def __init__(self):
@@ -71,12 +72,12 @@
     def __repr__(self):
         result = as_desriptive_list(self.contributers) + '\n\n' + \
                  as_paragraph(self.post)
-        return result
+        return result.encode('utf-8')
 
 
     def read(self, credits_file):
 
-        credits = open(credits_file, 'r')
+        credits = codecs.open(credits_file, 'r', 'latin1')
 
         name_re = re.compile("^ [EMAIL PROTECTED](.*)")
         contact_re = re.compile("^ [EMAIL PROTECTED](.*)")
@@ -127,7 +128,7 @@
                         name = match.group(1)
                         body = match.group(2)
                         contact = name + ' () ' + re.sub(r'\.', r' ! ', body)
-                        contact = str.lower(contact)
+                        contact = contact.lower()
 
                     contributer.contact = '&lt;' + contact + '&gt;'
 
