Rob Dixon wrote:
Dave Cardwell wrote:
Rob Dixon wrote:
Dave Cardwell wrote:
Hello there, I'm having trouble constructing a regular expression
that would do the following:
FOO...
...followed by anything but BAR (non-greedy)...
...followed by BAZ (captured)...
...followed by anything but BAR (greedy)...
...followed by BAR
I've been looking at zero-width negative look-ahead, but I haven't
used this area of regular expressions before so I'm struggling. A
solution or prod in the right direction would be lovely.
Please show us the real problem. I know you mean to clarify, but your
summary is so ambiguous that understanding it becomes the most difficult
part of providing a solution.
Thanks,
Rob
I was afraid of that, sorry. I'm using HTML::Parser to scan through a
document, but I need to do one quick manipulation first that depends on
seeing the document as a whole (unlike per-token as with HTML::Parser).
Rather than attempting to fit all of the real work in a regular
expression, I thought it best to simply mark the element with a custom
attribute that HTML::Parser could pick up later.
To that end, I need to find an <a> (BAZ) that contains just plain text,
somewhere between an opening <td> (FOO) and the closest closing </td>
(BAR), i.e. something along the lines of:
s%
<td([^>]*>
{not </td>}*?
<a[^>]*>[\w\s]+</a>
{not </td>}*?
</td>)
%<td foo="1"$1%gismx;
It's the {not </td>} bits I'm having difficulty with.
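One common way to spell the {not </td>} placeholder is a negative look-ahead tested before every character: (?:(?!</td>).)*. The following is only a sketch under that assumption. The sample markup is made up, the \b anchors are an addition to the pattern above, and it still shares the usual weakness of regex-based HTML handling (nested tables, for instance, will confuse it).

```perl
use strict;
use warnings;

# Hypothetical sample input; the foo="1" marker comes from the post above.
my $html = '<td class="x"><b>hi</b> <a href="/p">plain text</a></td>'
         . '<td><a href="/q"><img src="i.png"></a></td>';

(my $marked = $html) =~ s{
    <td\b                       # opening tag; \b rejects longer tag names
    (                           # capture the rest of the cell unchanged
      [^>]*>
      (?: (?! </td> ) . )*?     # any character, as long as </td> does not start here
      <a\b[^>]*> [\w\s]+ </a>   # an anchor holding only word characters and whitespace
      (?: (?! </td> ) . )*      # again anything that does not begin </td>
      </td>                     # the closest closing tag
    )
}{<td foo="1"$1}gisx;

print "$marked\n";
```

Only the first cell gains the attribute here, because the anchor in the second cell contains an <img> rather than plain text. The /m flag from the original is dropped since the pattern uses neither ^ nor $.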
OK I see. But I think you should be parsing the HTML instead of trying to
do this sort of stuff with a regex, which is notoriously awkward, mainly
because it doesn't take account of the structure of nested text like HTML
or XML.
I know the HTML::Parser interface isn't the easiest in the world to work
with, but one of its subclasses should do it for you. If the markup was
parsed with HTML::TreeBuilder, for example, I could write:
    foreach my $td ($tree->find('td')) {
        foreach my $anchor ($td->find('a')) {
            my @content = $anchor->content_list;
            next if grep ref, @content;
            $anchor->attr(myattr => 1);
        }
    }
which finds all anchor tags which appear anywhere (at any level) within a
table data tag and contain no further HTML markup, and adds an attribute
'myattr' to that anchor with a value of 1. This may or may not be exactly
what you want, but you see the principle. I think an attempt to write a
regular expression to do the job would be problematic, to say the least.
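As a rough end-to-end illustration of the same idea, here is a minimal sketch that builds the tree from a string and marks the enclosing <td> rather than the anchor, which is closer to the foo="1" attribute in the original question. It assumes HTML::TreeBuilder is installed from CPAN; the sample markup is invented for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup; foo="1" is the marker attribute from the question.
my $html = <<'HTML';
<table><tr>
<td class="x"><b>hi</b> <a href="/p">plain text</a></td>
<td><a href="/q"><img src="i.png"></a></td>
</tr></table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

for my $td ($tree->find('td')) {
    for my $anchor ($td->find('a')) {
        # Child elements are references, text nodes are plain strings.
        next if grep { ref $_ } $anchor->content_list;
        $td->attr(foo => 1);    # mark the cell, not the anchor
        last;                   # one plain-text anchor is enough
    }
}

my $out = $tree->as_HTML;
$tree->delete;                  # release the tree once the output is captured
print "$out\n";
```

The marked document can then be handed to the HTML::Parser pass, or that later work can be done directly on the tree before serialising it.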
HTH,
Rob
I'll certainly look into TreeBuilder - thank you.
--
Best wishes,
Dave Cardwell.
http://perlprogrammer.co.uk/