Rares Vernica wrote:
> Hi,
>
> I downloaded 2.2 beta, just to be sure I have the same version as you
> specify. (The file names are no longer funny.) Anyway, it does not seem
> to do as you said:
>
> In [14]: import SE
>
> In [15]: SE.version
> -------> SE.version()
> Out[15]: 'SE 2.2 beta - SEL 2.2 beta'
>
> In [16]: HTM_Decoder = SE.SE ('HTM2ISO.se')
>
> In [17]: test_string = '''
>    ....: &oslash;=(xf8) # 248 f8
>    ....: &ugrave;=(xf9) # 249 f9
>    ....: &uacute;=(xfa) # 250 fa
>    ....: &ucirc;=(xfb) # 251 fb
>    ....: &uuml;=(xfc) # 252 fc
>    ....: &yacute;=(xfd) # 253 fd
>    ....: &thorn;=(xfe) # 254 fe
>    ....: &eacute;=(xe9)
>    ....: &ecirc;=(xea)
>    ....: &euml;=(xeb)
>    ....: &igrave;=(xec)
>    ....: &iacute;=(xed)
>    ....: &icirc;=(xee)
>    ....: &iuml;=(xef)
>    ....: '''
>
> In [18]: print HTM_Decoder (test_string)
>
> &oslash;=(xf8) # 248 f8
> &ugrave;=(xf9) # 249 f9
> &uacute;=(xfa) # 250 fa
> &ucirc;=(xfb) # 251 fb
> &uuml;=(xfc) # 252 fc
> &yacute;=(xfd) # 253 fd
> &thorn;=(xfe) # 254 fe
> &eacute;=(xe9)
> &ecirc;=(xea)
> &euml;=(xeb)
> &igrave;=(xec)
> &iacute;=(xed)
> &icirc;=(xee)
> &iuml;=(xef)
>
> In [19]:
>
> Thanks,
> Ray
>
> Frederic Rentsch wrote:
>
>> Rares Vernica wrote:
>>
>>> Hi,
>>>
>>> How can I unescape HTML entities like "&nbsp;"?
>>>
>>> I know about xml.sax.saxutils.unescape() but it only deals with "&amp;",
>>> "&lt;", and "&gt;".
>>>
>>> Also, I know about htmlentitydefs.entitydefs, but not only this
>>> dictionary is the opposite of what I need, it does not have "&nbsp;".
>>>
>>> It has to be in python 2.4.
>>>
>>> Thanks a lot,
>>> Ray
>>>
>> One way is this:
>>
>> >>> import SE # Download from http://cheeseshop.python.org/pypi/SE/2.2%20beta
>> >>> SE.SE ('HTM2ISO.se')('input_file_name', 'output_file_name') # HTM2ISO.se is included
>> 'output_file_name'
>>
>> For repeated translations the SE object would be assigned to a variable:
>>
>> >>> HTM_Decoder = SE.SE ('HTM2ISO.se')
>>
>> SE objects take and return strings as well as file names which is useful
>> for translating string variables, doing line-by-line translations and
>> for interactive development or verification. A simple way to check a
>> substitution set is to use its definitions as test data. The following
>> is a section of the definition file HTM2ISO.se:
>>
>> test_string = '''
>> &oslash;=(xf8) # 248 f8
>> &ugrave;=(xf9) # 249 f9
>> &uacute;=(xfa) # 250 fa
>> &ucirc;=(xfb) # 251 fb
>> &uuml;=(xfc) # 252 fc
>> &yacute;=(xfd) # 253 fd
>> &thorn;=(xfe) # 254 fe
>> &eacute;=(xe9)
>> &ecirc;=(xea)
>> &euml;=(xeb)
>> &igrave;=(xec)
>> &iacute;=(xed)
>> &icirc;=(xee)
>> &iuml;=(xef)
>> '''
>>
>> >>> print HTM_Decoder (test_string)
>>
>> ø=(xf8) # 248 f8
>> ù=(xf9) # 249 f9
>> ú=(xfa) # 250 fa
>> û=(xfb) # 251 fb
>> ü=(xfc) # 252 fc
>> ý=(xfd) # 253 fd
>> þ=(xfe) # 254 fe
>> é=(xe9)
>> ê=(xea)
>> ë=(xeb)
>> ì=(xec)
>> í=(xed)
>> î=(xee)
>> ï=(xef)
>>
>> Another feature of SE is modularity.
>>
>> >>> strip_tags = '''
>> ~<(.|\x0a)*?>~=(9) # one tag to one tab
>> ~<!--(.|\x0a)*?-->~=(9) # one comment to one tab
>> | # run
>> "~\x0a[ \x09\x0d\x0a]*~=(x0a)" # delete empty lines
>> ~\t+~=(32) # one or more tabs to one space
>> ~\x20\t+~=(32) # one space and one or more tabs to one space
>> ~\t+\x20~=(32) # one or more tab and one space to one space
>> '''
>>
>> >>> HTM_Stripper_Decoder = SE.SE (strip_tags + ' HTM2ISO.se ') # Order doesn't matter
>>
>> If you write 'strip_tags' to a file, say 'STRIP_TAGS.se' you'd name it
>> together with HTM2ISO.se:
>>
>> >>> HTM_Stripper_Decoder = SE.SE ('STRIP_TAGS.se HTM2ISO.se') # Order doesn't matter
>>
>> Or, if you have two SE objects, one for stripping tags and one for
>> decoding the ampersands, you can nest them like this:
>>
>> >>> test_string = "<p class=MsoNormal style='line-height:110%'><i>Ren&eacute;</i> est un gar&ccedil;on qui para&icirc;t plus &acirc;g&eacute;. </p>"
>>
>> >>> print Tag_Stripper (HTM_Decoder (test_string))
>> René est un garçon qui paraît plus âgé.
>>
>> Nesting works with file names too, because file names are returned:
>>
>> >>> Tag_Stripper (HTM_Decoder ('input_file_name'), 'output_file_name')
>> 'output_file_name'
>>
>> Frederic
>>
Ray,

I am sorry you're having a problem. I cannot duplicate it; it works fine here. I suspect that SE.SE doesn't find your file HTM2ISO.SE. Do this:

>>> HTM_Decoder = SE.SE ('HTM2ISO.SE')
>>> HTM_Decoder.show_log ()

Thu Nov 02 15:15:39 2006 - Compiler - Ignoring single word 'HTM2ISO.SE'. Not an existing file 'HTM2ISO.SE'.

If you see this, then you might have forgotten to include the path with the file name.

Rather than getting an old version, you could just have renamed the two py-files. Version 2.3 has some minor bugs corrected. I fixed the names and tried to re-upload to the Cheese Shop, and the damn thing stubbornly refuses the upload after having required that I delete the file I was going to replace. So it isn't there anymore and the replacement isn't there yet. I'll be working on this. In the meantime I'll be happy to direct-mail V2.3 on request.

Frederic
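The missing-path diagnosis above can also be checked before the SE object is built. The following is only a sketch using the standard library: `resolve_definition_file` and the `search_dirs` argument are hypothetical names made up for this illustration and are not part of SE, and the directory in the usage comment is a placeholder.

```python
import os

def resolve_definition_file(name, search_dirs=()):
    """Return an absolute path for `name`, or raise IOError if no file is found.

    Hypothetical helper, not part of SE: it tries `name` as given first,
    then `name` joined onto each directory in `search_dirs`.
    """
    candidates = [name] + [os.path.join(d, name) for d in search_dirs]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    raise IOError("definition file not found: %r (tried %s)"
                  % (name, ", ".join(candidates)))

# Intended use (needs the SE package, so it is left as a comment here):
#   import SE
#   HTM_Decoder = SE.SE(resolve_definition_file('HTM2ISO.se', ['/path/to/SE']))
```

Failing early with an explicit error makes the problem visible right away, instead of leaving an SE object that silently returns its input unchanged until show_log () is consulted.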