On 01/12/2007 11:29 AM, Beginner wrote:
On 12 Jan 2007 at 17:06, Rob Dixon wrote:
Hi Rob,
In the sample data below you'll see some addresses with "DO NOT..."
in. I can locate them easily enough but I am struggling to navigate
back up the DOM to access the code so I can record the code with
faulty addresses.
Here's my effort. Can anyone help me either to move back up to the
right element node or catch the code node before I begin to loop
through the address line(s)?
TIA,
Dp.
======= My Effort ==========
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $file = 'ADDRESS.XML';
open(FH,$file) or die "Can't open file $file: $!\n";
my $parser = XML::LibXML->new;
my $doc = $parser->parse_fh(\*FH);
my @codes = $doc->findnodes('//code');
my @lines = $doc->findnodes('//lines');
# Use <= so the last record is checked as well.
for (my $i = 0; $i <= $#codes; ++$i) {
    #print $codes[$i]->string_value, "\t";
    my @add = $lines[$i]->childNodes;
    # Check every child; whitespace-only text nodes never match the pattern.
    for ( my $a = 0; $a <= $#add; ++$a) {
        if ($add[$a]->string_value =~ /\s+NOT\s+/) {
            print $codes[$i]->string_value,": ",$add[$a]->string_value,"\n";
        }
    }
}
If I understand you correctly then all you need is
my @results = $doc->findnodes('/dataroot/address[contains(lines/line, "DO NOT USE")]');
foreach my $address (@results) {
    my $code = $address->findvalue('code');
    print $code, "\n";
}
which prints the code of all those addresses that have a line containing 'DO NOT
USE'. Is that what was required?
Yes ...and no. I guess I want to print out the 'code' for any address
so that I can get the data corrected but I guess I would also like to
remove those records at the /dataroot/address level so they don't
appear in the file.
This works for me:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_fh(\*DATA);
my %remove;
my @results = $doc->findnodes('//address');
foreach my $address (@results) {
    my @lines = $address->childNodes;
    @lines = grep $_->nodeName eq 'lines', @lines;
    @lines = map $_->childNodes, @lines;
    @lines = grep $_->nodeName eq 'line', @lines;
    foreach my $line (@lines) {
        if ($line->string_value =~ /\bNOT\b/) {
            my $number = $address->getAttribute('number');
            $remove{$number} = $address;
        }
    }
}
print Dumper(\%remove);
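For the removal step itself, here is a minimal sketch of one way it might be done: record the code of each faulty address, then detach the node with unbindNode. The sample XML and its code values (A1, B2, C3) are made up for illustration, since the real ADDRESS.XML is not shown in this thread; only the element names and the number attribute come from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; the real file layout is assumed from the thread.
my $xml = <<'XML';
<dataroot>
  <address number="1016"><code>A1</code>
    <lines><line>DO NOT USE</line><line>1 High Street</line></lines>
  </address>
  <address number="1333"><code>B2</code>
    <lines><line>22 Low Road</line><line>DO NOT USE</line></lines>
  </address>
  <address number="2000"><code>C3</code>
    <lines><line>3 Mill Lane</line></lines>
  </address>
</dataroot>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Select every address that has at least one matching <line>.
my @bad = $doc->findnodes(
    '/dataroot/address[lines/line[contains(., "DO NOT USE")]]');

my @codes;
foreach my $address (@bad) {
    push @codes, $address->findvalue('code');   # record the code first
    $address->unbindNode;                       # then detach the record
}

# What is left in the document after the faulty records are gone.
my @left = map { $_->getAttribute('number') }
    $doc->findnodes('/dataroot/address');

print "Faulty codes: @codes\n";
print "Remaining: @left\n";
print $doc->toString;
```

To save the cleaned document rather than print it, $doc->toFile('CLEANED.XML') should do (the filename is only a placeholder).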
I spent a lot of time on this today as this looks like an excellent
parser and DOM navigator but I struggled moving around.
Yes, the unintuitive behavior of findnodes and find threw me too. It
seems that those methods look at every node in the file--not just the
children of the current element.
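A likely explanation, offered as a sketch rather than a definitive answer: in XPath a path that starts with / or // is absolute, so it is evaluated from the document root no matter which node findnodes is called on. A relative path such as code or .//line is evaluated from the node itself. The sample XML below is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data with two address records.
my $xml = <<'XML';
<dataroot>
  <address number="1016"><code>A1</code>
    <lines><line>DO NOT USE</line><line>1 High Street</line></lines>
  </address>
  <address number="1333"><code>B2</code>
    <lines><line>22 Low Road</line></lines>
  </address>
</dataroot>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my ($first) = $doc->findnodes('/dataroot/address');

# Absolute path: starts at the document root, so every code is returned.
my @all_codes = $first->findnodes('//code');

# Relative paths: start at $first, so only its own descendants are returned.
my @own_codes = $first->findnodes('code');
my @own_lines = $first->findnodes('.//line');

printf "//code from an address node: %d\n", scalar @all_codes;
printf "code from an address node:   %d\n", scalar @own_codes;
printf ".//line from an address node: %d\n", scalar @own_lines;
```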
In your example @results looks like it would contain references to
all the /lines/line data with DO NOT USE in the string_value. What I
have been struggling with is that this is also a reference to the record
as a whole and my navigation techniques are not working out. For
example whenever I used findnodes I was getting every code in the
file. I think now that was because I was using /dataroot/address as
the starting point.
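On moving back up the tree from a matching line: a minimal sketch, assuming each line sits inside lines inside address as in the posts above. Either parentNode or the XPath ancestor axis should reach the enclosing record. The sample data and its code values are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data.
my $xml = <<'XML';
<dataroot>
  <address number="1016"><code>A1</code>
    <lines><line>DO NOT USE</line><line>1 High Street</line></lines>
  </address>
  <address number="1333"><code>B2</code>
    <lines><line>22 Low Road</line><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

my $doc = XML::LibXML->new->parse_string($xml);

my (@via_parent, @via_axis);
foreach my $line ($doc->findnodes('//line[contains(., "DO NOT USE")]')) {
    # DOM route: line -> lines -> address, then a relative lookup of code.
    my $address = $line->parentNode->parentNode;
    push @via_parent, $address->findvalue('code');

    # XPath route: climb with the ancestor axis in a single expression.
    push @via_axis, $line->findvalue('ancestor::address/code');
}

print "via parentNode: @via_parent\n";
print "via ancestor axis: @via_axis\n";
```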
I know ZIP about XPath (today is my first day dealing with it), but I'm
able to get the 1016 address element using this code:
my @results = $doc->findnodes('/dataroot/address[contains(lines/line, "NOT")]');
@results = map $_->getAttribute('number'), @results;
print Dumper(\@results);
-----------------
Unfortunately, it doesn't get the 1333 node. Using findnodes and
specifying an XPath statement only seems to pick up the first line in
the lines element. Other lines are ignored. If I put a dummy line before
the "DO NOT USE" line for 1016, 1016 is no longer recognized.
Perhaps it's possible to create an XPath statement that searches all of
the lines in a lines element.
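This behavior matches XPath 1.0: when contains() is handed a node-set, the set is converted to a string using only its first node, so only the first line is tested. Moving the test into a predicate on line itself runs it once per line. A small sketch comparing the two expressions, with made-up sample data in which 1333 has its "DO NOT USE" on the second line:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data.
my $xml = <<'XML';
<dataroot>
  <address number="1016"><code>A1</code>
    <lines><line>DO NOT USE</line><line>1 High Street</line></lines>
  </address>
  <address number="1333"><code>B2</code>
    <lines><line>22 Low Road</line><line>DO NOT USE</line></lines>
  </address>
  <address number="2000"><code>C3</code>
    <lines><line>3 Mill Lane</line></lines>
  </address>
</dataroot>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Node-set argument: only the first <line> of each address is examined.
my @first_only = map { $_->getAttribute('number') }
    $doc->findnodes('/dataroot/address[contains(lines/line, "NOT")]');

# Predicate on each <line>: "." is the line currently being tested.
my @any_line = map { $_->getAttribute('number') }
    $doc->findnodes('/dataroot/address[lines/line[contains(., "NOT")]]');

print "first line only: @first_only\n";
print "any line: @any_line\n";
```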
Aside from CPAN, I would appreciate any other sources of info about
using libXML with Perl and XPath expressions. It is
whoppingly fast.
Thanx again,
Dp.
XML::Simple isn't so bad :-)
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
my $root = XMLin(\*DATA );
my @remove;
foreach my $address (@{$root->{address}}) {
    my $descent = $address->{lines}{line};
    my $lines = ref($descent) ? join("\n",@$descent) : $descent;
    if ($lines =~ /\bNOT\b/) {
        push @remove, $address->{number};
    }
}
print "To remove: @remove\n";
------------
For me, this prints "To remove: 1016 1333."
However, XML::Simple places some constraints on your XML document. Read
the POD.
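One such constraint, as far as I can tell: with the default options, an element that occurs only once comes back as a hash or plain string instead of an array reference, so a file with a single address would break the loop above. The ForceArray option is one way around that. A minimal sketch with invented sample data holding just one address:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample data with only one <address>.
my $xml = <<'XML';
<dataroot>
  <address number="1333"><code>B2</code>
    <lines><line>22 Low Road</line><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

# ForceArray keeps address and line as array refs even when there is one.
# KeyAttr => [] switches off folding of arrays into hashes by key attribute.
my $root = XMLin($xml, ForceArray => ['address', 'line'], KeyAttr => []);

my @remove;
foreach my $address (@{ $root->{address} }) {
    my $lines = join "\n", @{ $address->{lines}{line} };
    push @remove, $address->{number} if $lines =~ /\bNOT\b/;
}
print "To remove: @remove\n";
```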