Mike Meyer wrote:
> "Iain King" <[EMAIL PROTECTED]> writes:
>
> > I have some code that converts html into xhtml. For example, convert
> > all <i> tags into <em>. Right now I need to do two string.replace calls
> > for every tag:
> >
> > html = html.replace('<i>','<em>')
> > html = html.replace('</i>','</em>')
> >
> > I can change this to a single call to re.sub:
> >
> > html = re.sub('<([/]*)i>', r'<\1em>', html)
> >
> > Would this be a quicker/better way of doing it?
>
> Maybe. You could measure it and see. But neither will work in the face
> of attributes or whitespace in the tag.
>
> If you're going to parse [X]HTML, you really should use tools that are
> designed for the job. If you have well-formed HTML, you can use the
> htmllib parser in the standard library. If you have the usual crap one
> finds on the web, I recommend BeautifulSoup.

Thanks. My initial post overstates the program a bit - what I actually
have is a cgi script which outputs my LiveJournal, which I then
server-side include in my home page (so my home page also displays the
latest X entries in my LiveJournal).

The only html I need to convert is the stuff that LJ spews out, which,
while bad, isn't terrible, and is fairly consistent. Most of it is stuff
I write myself in journal entries, so the conversion doesn't have to be
so comprehensive that I'd need something like BeautifulSoup. I'm not
trying to parse it, just clean it up a little.

Iain
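
P.S. For anyone finding this thread later, here is a rough sketch of how the single re.sub call might be stretched to cope with attributes and whitespace in the tag, which is the weakness pointed out above. It is only an illustration, not a tested production solution: TAG_MAP and convert_tags are names made up for this example, the <b> to <strong> entry is an assumed extra mapping, and the pattern will still break on attribute values that contain a literal '>'.

```python
import re

# Hypothetical mapping of presentational tags to their replacements.
# Only <i> -> <em> comes from the thread; <b> -> <strong> is an assumed extra.
TAG_MAP = {"i": "em", "b": "strong"}

# Matches an opening or closing tag whose name is a key of TAG_MAP.
# Group 1: optional "/", group 2: tag name, group 3: anything up to ">"
# (attributes, trailing whitespace). \b stops "i" from matching "<img ...>".
_TAG_RE = re.compile(
    r"<(/?)(%s)\b([^>]*)>" % "|".join(map(re.escape, TAG_MAP)),
    re.IGNORECASE,
)


def convert_tags(html):
    """Rename the tags listed in TAG_MAP, leaving attributes untouched."""

    def _swap(match):
        slash, name, rest = match.groups()
        return "<%s%s%s>" % (slash, TAG_MAP[name.lower()], rest)

    return _TAG_RE.sub(_swap, html)


if __name__ == "__main__":
    print(convert_tags('<i class="note">hello</i > <img src="x.png">'))
```

This makes one pass over the text for all mapped tags instead of two replace calls per tag. Whether it is actually quicker would still need measuring, for example with the timeit module.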