Hi,

I downloaded 2.2 beta, just to be sure I have the same version as you specified. (The file names are no longer funny.) Anyway, it does not seem to do as you said:

In [14]: import SE

In [15]: SE.version
-------> SE.version()
Out[15]: 'SE 2.2 beta - SEL 2.2 beta'

In [16]: HTM_Decoder = SE.SE ('HTM2ISO.se')

In [17]: test_string = '''
   ....: &oslash;=(xf8) # 248 f8
   ....: &ugrave;=(xf9) # 249 f9
   ....: &uacute;=(xfa) # 250 fa
   ....: &ucirc;=(xfb) # 251 fb
   ....: &uuml;=(xfc) # 252 fc
   ....: &yacute;=(xfd) # 253 fd
   ....: &thorn;=(xfe) # 254 fe
   ....: &eacute;=(xe9)
   ....: &ecirc;=(xea)
   ....: &euml;=(xeb)
   ....: &igrave;=(xec)
   ....: &iacute;=(xed)
   ....: &icirc;=(xee)
   ....: &iuml;=(xef)
   ....: '''

In [18]: print HTM_Decoder (test_string)

&oslash;=(xf8) # 248 f8
&ugrave;=(xf9) # 249 f9
&uacute;=(xfa) # 250 fa
&ucirc;=(xfb) # 251 fb
&uuml;=(xfc) # 252 fc
&yacute;=(xfd) # 253 fd
&thorn;=(xfe) # 254 fe
&eacute;=(xe9)
&ecirc;=(xea)
&euml;=(xeb)
&igrave;=(xec)
&iacute;=(xed)
&icirc;=(xee)
&iuml;=(xef)

In [19]:

Thanks,
Ray

Frederic Rentsch wrote:
> Rares Vernica wrote:
>> Hi,
>>
>> How can I unescape HTML entities like "&nbsp;"?
>>
>> I know about xml.sax.saxutils.unescape() but it only deals with "&amp;",
>> "&lt;", and "&gt;".
>>
>> Also, I know about htmlentitydefs.entitydefs, but not only this
>> dictionary is the opposite of what I need, it does not have "&nbsp;".
>>
>> It has to be in python 2.4.
>>
>> Thanks a lot,
>> Ray
>>
> One way is this:
>
> >>> import SE  # Download from http://cheeseshop.python.org/pypi/SE/2.2%20beta
> >>> SE.SE ('HTM2ISO.se')('input_file_name', 'output_file_name')  # HTM2ISO.se is included
> 'output_file_name'
>
> For repeated translations the SE object would be assigned to a variable:
>
> >>> HTM_Decoder = SE.SE ('HTM2ISO.se')
>
> SE objects take and return strings as well as file names which is useful
> for translating string variables, doing line-by-line translations and
> for interactive development or verification. A simple way to check a
> substitution set is to use its definitions as test data.
> The following is a section of the definition file HTM2ISO.se:
>
> test_string = '''
> &oslash;=(xf8) # 248 f8
> &ugrave;=(xf9) # 249 f9
> &uacute;=(xfa) # 250 fa
> &ucirc;=(xfb) # 251 fb
> &uuml;=(xfc) # 252 fc
> &yacute;=(xfd) # 253 fd
> &thorn;=(xfe) # 254 fe
> &eacute;=(xe9)
> &ecirc;=(xea)
> &euml;=(xeb)
> &igrave;=(xec)
> &iacute;=(xed)
> &icirc;=(xee)
> &iuml;=(xef)
> '''
>
> >>> print HTM_Decoder (test_string)
>
> ø=(xf8) # 248 f8
> ù=(xf9) # 249 f9
> ú=(xfa) # 250 fa
> û=(xfb) # 251 fb
> ü=(xfc) # 252 fc
> ý=(xfd) # 253 fd
> þ=(xfe) # 254 fe
> é=(xe9)
> ê=(xea)
> ë=(xeb)
> ì=(xec)
> í=(xed)
> î=(xee)
> ï=(xef)
>
> Another feature of SE is modularity.
>
> >>> strip_tags = '''
>    ~<(.|\x0a)*?>~=(9)               # one tag to one tab
>    ~<!--(.|\x0a)*?-->~=(9)          # one comment to one tab
> |                                   # run
>    "~\x0a[ \x09\x0d\x0a]*~=(x0a)"   # delete empty lines
>    ~\t+~=(32)                       # one or more tabs to one space
>    ~\x20\t+~=(32)                   # one space and one or more tabs to one space
>    ~\t+\x20~=(32)                   # one or more tab and one space to one space
> '''
>
> >>> HTM_Stripper_Decoder = SE.SE (strip_tags + ' HTM2ISO.se ')  # Order doesn't matter
>
> If you write 'strip_tags' to a file, say 'STRIP_TAGS.se' you'd name it
> together with HTM2ISO.se:
>
> >>> HTM_Stripper_Decoder = SE.SE ('STRIP_TAGS.se HTM2ISO.se')  # Order doesn't matter
>
> Or, if you have two SE objects, one for stripping tags and one for
> decoding the ampersands, you can nest them like this:
>
> >>> test_string = "<p class=MsoNormal style='line-height:110%'><i>Ren&eacute;</i> est un gar&ccedil;on qui para&icirc;t plus &acirc;g&eacute;. </p>"
>
> >>> print Tag_Stripper (HTM_Decoder (test_string))
> René est un garçon qui paraît plus âgé.
>
> Nesting works with file names too, because file names are returned:
>
> >>> Tag_Stripper (HTM_Decoder ('input_file_name'), 'output_file_name')
> 'output_file_name'
>
>
> Frederic
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
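
For comparison, the decoding asked about at the start of this thread can also be sketched with the standard library alone, without SE. This is only a rough sketch under a few assumptions: the helper names `unescape_entities` and `_decode` are made up for illustration, the entity table used is `name2codepoint` (found in `htmlentitydefs` on Python 2.3 and later, and in `html.entities` on Python 3), and on Python 2 the input is assumed to be a unicode string. Unknown entity names are left untouched rather than guessed at.

```python
import re

try:
    # Python 2 (2.3 and later) keeps the entity table here.
    from htmlentitydefs import name2codepoint
except ImportError:
    # Python 3 moved it to html.entities.
    from html.entities import name2codepoint

try:
    unichr  # exists on Python 2 only
except NameError:
    unichr = chr  # Python 3: chr already returns a unicode character

# Matches &name;, &#123; and &#x7b; style references.
_ENTITY = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')


def _decode(match):
    """Return the character for one entity, or the original text if unknown."""
    body = match.group(1)
    try:
        if body.startswith('#'):
            if body[1] in 'xX':
                return unichr(int(body[2:], 16))  # hexadecimal reference
            return unichr(int(body[1:]))          # decimal reference
        if body in name2codepoint:
            return unichr(name2codepoint[body])   # named entity such as nbsp
    except (ValueError, OverflowError):
        pass  # code point out of range: fall through and keep the text
    return match.group(0)


def unescape_entities(text):
    """Hypothetical helper: replace HTML entities in text with characters."""
    return _ENTITY.sub(_decode, text)


if __name__ == '__main__':
    print(unescape_entities('Ren&eacute; est un gar&ccedil;on'))
```

Unlike xml.sax.saxutils.unescape(), this covers "&nbsp;" and the accented letters in the test data above, because it looks names up in the full entity table instead of a fixed three-entry mapping.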