On Jul 5, 12:17 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xah...@gmail.com> wrote:
> > So, a solution by regex is out.
>
> Actually, none of the complications you listed appear to exclude
> regexes. Here's a possible (untested) solution:
>
> <div class="img">
> ((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+"
> height="[0-9]+">)+)
> \s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
> \s*</div>
>
> and corresponding replacement string:
>
> <figure>
> \1
> <figcaption>\2</figcaption>
> </figure>
>
> I don't know what dialect Emacs uses for regexes; the above is the
> Python re dialect. I assume it is translatable. If not, then the
> above should at least work with other editors, such as Komodo's
> "Find/Replace in Files" command. I kept the line breaks here for
> readability, but for completeness they should be stripped out of the
> final regex.
>
> The possibility of nested HTML in the caption is allowed for by using
> a negative look-ahead assertion to accept any tag except a closing
> </p>. It would break if you had nested <p> tags, but then that would
> be invalid html anyway.
>
> Cheers,
> Ian

Emacs regex supports shy groups (the 「(?:…)」), but it doesn't support the negative look-ahead assertion 「(?!…)」.

In any case, I can't see how this part would work:

<p class="cpt">((?:[^<]|<(?!/p>))+)</p>

Xah
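
For anyone following along, here is a rough sketch of how the caption part behaves in the Python re dialect that Ian mentions. The sample HTML, the variable names, and the single space between the width and height attributes (where the quoted regex was wrapped) are my own assumptions, not something from the posts above. The idea is that the group consumes the caption one character at a time: either any character that is not 「<」, or a 「<」 that is not the start of 「</p>」. So the repetition stops exactly at the closing 「</p>」.

```python
import re

# Ian's regex with the line breaks stripped out.
# Assumption: a single space separates the width and height attributes,
# since the quoted regex was wrapped at that point.
figure_re = re.compile(
    r'<div class="img">'
    r'((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" '
    r'width="[0-9]+" height="[0-9]+">)+)'
    # Caption: each step takes either a non-"<" character, or a "<"
    # that is NOT followed by "/p>" (the negative look-ahead).
    r'\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>'
    r'\s*</div>'
)

# Replacement string, also with the line breaks stripped.
replacement = r'<figure>\1<figcaption>\2</figcaption></figure>'

# Hypothetical sample input with nested HTML inside the caption.
sample = (
    '<div class="img">\n'
    '<img src="cat.jpg" alt="a cat" width="200" height="100">\n'
    '<p class="cpt">A <b>sleepy</b> cat</p>\n'
    '</div>'
)

match = figure_re.search(sample)
caption = match.group(2)  # the nested <b>...</b> tags stay in the caption
converted = figure_re.sub(replacement, sample)

print(caption)
print(converted)
```

The 「<b>」 and 「</b>」 tags get through because their 「<」 is not followed by 「/p>」. The 「<」 of the closing 「</p>」 fails both alternatives, so the group ends there and the literal 「</p>」 in the pattern matches next.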