Martin Evans wrote: > Sorry, yet another REGEX question. I've been struggling with trying to get > a regular expression to do the following example in Python: > > Search and replace all instances of "sleeping" with "dead". > > This parrot is sleeping. Really, it is sleeping. > to > This parrot is dead. Really, it is dead. > > > But not if part of a link or inside a link: > > This parrot <a href="sleeping.htm" target="new">is sleeping</a>. Really, it > is sleeping. > to > This parrot <a href="sleeping.htm" target="new">is sleeping</a>. Really, it > is dead. > > > This is the full extent of the "html" that would be seen in the text, the > rest of the page has already been processed. Luckily I can rely on the > formating always being consistent with the above example (the url will > normally by much longer in reality though). There may though be more than > one link present. > > I'm hoping to use this to implement the automatic addition of links to other > areas of a website based on keywords found in the text. > > I'm guessing this is a bit too much to ask for regex. If this is the case, > I'll add some more manual Python parsing to the string, but was hoping to > use it to learn more about regex. > > Any pointers would be appreciated. > > Martin
What you want is: re.sub(regex, replacement, instring) The replacement can be a function. So use a function. def sleeping_to_dead(inmatch): instr = inmatch.group(0) if needsfixing(instr): return instr.replace('sleeping','dead') else: return instr as for the regex, something like (<a)?[^<>]*(</a>)? could be a start. It is probaly better to use the regex to recognize the links as you might have something like sleeping.sleeping/sleeping/sleeping.html in your urls. Also you probably want to do many fixes, so you can put them all within the same replacer function. -- http://mail.python.org/mailman/listinfo/python-list