On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xah...@gmail.com> wrote:
> So, a solution by regex is out.
Actually, none of the complications you listed appear to exclude regexes. Here's a possible (untested) solution:

<div class="img">
((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" height="[0-9]+">)+)
\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
\s*</div>

and the corresponding replacement string:

<figure>
\1
<figcaption>\2</figcaption>
</figure>

I don't know what dialect Emacs uses for regexes; the above is the Python re dialect. I assume it is translatable. If not, then the above should at least work with other editors, such as Komodo's "Find/Replace in Files" command.

I kept the line breaks here for readability, but they need to be stripped out of the final regex, since otherwise they would be matched literally.

The possibility of nested HTML in the caption is allowed for by using a negative look-ahead assertion to accept any tag except a closing </p>. It would break if you had nested <p> tags, but then that would be invalid HTML anyway.

Cheers,
Ian
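
For anyone who wants to try the pattern above in Python, here is a minimal sketch. The names PATTERN, REPLACEMENT, convert and the sample markup are made up for illustration; only the regex and the replacement string come from the post. Adjacent string literals are used so the pattern keeps its readable layout without any line breaks ending up inside the regex. Note that, as written, the pattern assumes the src value contains no dot other than the one before the extension.

```python
import re

# The pattern from the post, split across adjacent string literals.
# Python concatenates them, so no newline ends up inside the regex.
PATTERN = re.compile(
    r'<div class="img">'
    r'((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+"'
    r' width="[0-9]+" height="[0-9]+">)+)'
    r'\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>'
    r'\s*</div>'
)

# \1 is the run of <img> tags, \2 is the caption text.
# re.sub turns the \n escapes in the replacement into real newlines.
REPLACEMENT = r'<figure>\1\n<figcaption>\2</figcaption>\n</figure>'


def convert(html):
    """Rewrite every matching <div class="img"> block as a <figure>."""
    return PATTERN.sub(REPLACEMENT, html)


# Hypothetical sample input, not taken from the original thread.
sample = (
    '<div class="img">\n'
    '<img src="cat.jpg" alt="a cat" width="200" height="100">\n'
    '<p class="cpt">A <b>sleepy</b> cat</p>\n'
    '</div>'
)

print(convert(sample))
```

The nested <b> tag in the caption survives because the look-ahead only rejects a "<" that starts a closing </p>; any other tag is consumed as part of the caption.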