On 01/12/2007 11:29 AM, Beginner wrote:
On 12 Jan 2007 at 17:06, Rob Dixon wrote:

Hi Rob,

In the sample data below you'll see some addresses with "DO NOT..." in them. I can locate them easily enough, but I am struggling to navigate back up the DOM to reach the code element so that I can record the codes of the faulty addresses.

Here is my effort. Can anyone help me either to move back up to the right element node, or to catch the code node before I begin to loop through the address line(s)?

TIA,
Dp.


======= My Effort ==========
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $file = 'ADDRESS.XML';
open(FH,$file) or die "Can't open file $file: $!\n";

my $parser = XML::LibXML->new;
my $doc = $parser->parse_fh(\*FH);

my @codes = $doc->findnodes('//code');
my @lines = $doc->findnodes('//lines');

for (my $i = 0; $i < $#codes; ++$i) {
        #print $codes[$i]->string_value, "\t";
        my @add = $lines[$i]->childNodes;
        for ( my $a = 1; $a <$#add; ++$a) {
                if ($add[$a]->string_value =~ /\s+NOT\s+/) {
                        print $codes[$i]->string_value,": ",$add[$a]->string_value,"\n";
                }
        }
}

If I understand you correctly then all you need is

my @results = $doc->findnodes('/dataroot/address[contains(lines/line, "DO NOT USE")]');

foreach my $address (@results) {
   my $code = $address->findvalue('code');
   print $code, "\n";
}

which prints the code of all those addresses that have a line containing 'DO NOT USE'. Is that what was required?


Yes ...and no. I guess I want to print out the 'code' for any address so that I can get the data corrected but I guess I would also like to remove those records at the /dataroot/address level so they don't appear in the file.


This works for me:

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

my $parser = XML::LibXML->new();
my $doc = $parser->parse_fh(\*DATA);

my %remove;

my @results = $doc->findnodes('//address');
foreach my $address (@results) {
    my @lines = $address->childNodes;
    @lines = grep $_->nodeName eq 'lines', @lines;
    @lines = map $_->childNodes, @lines;
    @lines = grep $_->nodeName eq 'line', @lines;
    foreach my $line (@lines) {
        if ($line->string_value =~ /\bNOT\b/) {
            my $number = $address->getAttribute('number');
            $remove{$number} = $address;
        }
    }
}

print Dumper(\%remove);
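
The script above only collects the offending records. A minimal sketch of the removal step follows, assuming a small made-up sample in place of the real ADDRESS.XML (which is not shown in this thread) and assuming each address carries its code in a child element. The nested predicate checks every line rather than just the first one, and unbindNode detaches each matching address from the document.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; element and attribute names follow the thread,
# the values are invented for illustration.
my $doc = XML::LibXML->new->parse_string(<<'XML');
<dataroot>
  <address number="1016">
    <code>A1</code>
    <lines><line>DO NOT USE</line><line>1 High Street</line></lines>
  </address>
  <address number="1200">
    <code>B2</code>
    <lines><line>2 Low Road</line></lines>
  </address>
  <address number="1333">
    <code>C3</code>
    <lines><line>3 Mill Lane</line><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

# Select every address that has at least one line containing the marker.
my @bad = $doc->findnodes(
    '/dataroot/address[lines/line[contains(., "DO NOT USE")]]');

my @codes;
foreach my $address (@bad) {
    push @codes, $address->findvalue('code');   # record the code first
    $address->unbindNode;                       # then detach the record
}

print "Removed: @codes\n";
print $doc->toString;   # serialised document without the removed records
```

Writing the result back to disk with $doc->toFile would be the natural next step, but it seems safer to write to a new file name until the output has been checked.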


I spent a lot of time on this today, as this looks like an excellent parser and DOM navigator, but I struggled with moving around.


Yes, the unintuitive behavior of findnodes and find threw me too. It seems that those methods look at every node in the file--not just the children of the current element.

In your example @results looks like it would contain references to all the /lines/line data with DO NOT USE in the string_value. What I have been struggling with is that this is also a reference to the record as a whole, and my navigation techniques are not working out. For example, whenever I used findnodes I was getting every code in the file. I think now that was because I was using /dataroot/address as the starting point.
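
A likely explanation for getting every code in the file: in XPath, a path that begins with / or // is evaluated from the document root, whatever node findnodes is called on, while a path without a leading slash is relative to that node. The sketch below uses a made-up two-record sample (the real data is not shown here) to compare the two, and also shows one way to climb from a line back up to its code with a ../../ step.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical two-record sample, invented for illustration.
my $doc = XML::LibXML->new->parse_string(<<'XML');
<dataroot>
  <address number="1016">
    <code>A1</code>
    <lines><line>DO NOT USE</line></lines>
  </address>
  <address number="1200">
    <code>B2</code>
    <lines><line>2 Low Road</line></lines>
  </address>
</dataroot>
XML

my ($address) = $doc->findnodes('/dataroot/address[@number="1016"]');

# Absolute path: searched from the document root, so both codes come back.
my @all_codes = map { $_->string_value } $address->findnodes('//code');

# Relative path: only the code child of this particular address.
my @own_codes = map { $_->string_value } $address->findnodes('code');

# Going back up: line -> lines -> address, then down to code.
my ($line) = $address->findnodes('lines/line');
my $code_from_line = $line->findvalue('../../code');

print "absolute: @all_codes\n";       # absolute: A1 B2
print "relative: @own_codes\n";       # relative: A1
print "from line: $code_from_line\n"; # from line: A1
```

The parentNode method would do the same upward walk one step at a time; the XPath form is shown only because it keeps the navigation in one expression.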


I know ZIP about XPath (today is my first day dealing with it), but I'm able to get the 1016 address element using this code:

my @results = $doc->findnodes('/dataroot/address[contains(lines/line, "NOT")]');
@results = map $_->getAttribute('number'), @results;

print Dumper(\@results);

-----------------
Unfortunately, it doesn't get the 1333 node. Using findnodes and specifying an XPath statement only seems to pick up the first line in the lines element. Other lines are ignored. If I put a dummy line before the "DO NOT USE" line for 1016, 1016 is no longer recognized.

Perhaps it's possible to create an XPath statement that searches all of the lines in a lines element.
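
One way to do that, sketched against made-up sample data since the real file is not shown here: in XPath 1.0, contains() converts a node-set argument to the string value of its first node only, which matches the behaviour described above. Moving the test into a predicate on line makes it run once per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; 1333 has its marker on the second line.
my $doc = XML::LibXML->new->parse_string(<<'XML');
<dataroot>
  <address number="1016">
    <code>A1</code>
    <lines><line>DO NOT USE</line><line>1 High Street</line></lines>
  </address>
  <address number="1200">
    <code>B2</code>
    <lines><line>2 Low Road</line></lines>
  </address>
  <address number="1333">
    <code>C3</code>
    <lines><line>3 Mill Lane</line><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

# Original form: only the first line of each address is examined.
my @first_only = map { $_->getAttribute('number') }
    $doc->findnodes('/dataroot/address[contains(lines/line, "NOT")]');

# Nested predicate: "." is each line in turn, so every line is tested.
my @any_line = map { $_->getAttribute('number') }
    $doc->findnodes('/dataroot/address[lines/line[contains(., "NOT")]]');

print "first line only: @first_only\n";   # first line only: 1016
print "any line:        @any_line\n";     # any line:        1016 1333
```

Note that contains() is a plain substring test, so "NOT" would also match a line such as "NOTTINGHAM"; searching for the full "DO NOT USE" marker is probably safer.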

Aside from CPAN, I would appreciate any other sources of info about using libXML with Perl and XPath expressions. It is whoppingly fast.

Thanx again,
Dp.



XML::Simple isn't so bad :-)

#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

my $root = XMLin(\*DATA );
my @remove;

foreach my $address (@{$root->{address}}) {
    my $descent = $address->{lines}{line};
    my $lines = ref($descent) ? join("\n",@$descent) : $descent;
    if ($lines =~ /\bNOT\b/) {
        push @remove, $address->{number};
    }
}

print "To remove: @remove\n";

------------
For me, this prints "To remove: 1016 1333."

However, XML::Simple places some constraints on your XML document. Read the POD.
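
One such constraint, sketched with a made-up single-record document: by default XMLin returns a lone child element as a hash reference or plain string and only switches to an array reference when the element repeats, so a loop like the one above would fail on a file with just one address. The ForceArray option asks for array references regardless; KeyAttr => [] is set here to switch off attribute-based folding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical document with a single address, invented for illustration.
my $xml = <<'XML';
<dataroot>
  <address number="1016">
    <code>A1</code>
    <lines><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

# Default shape: one address becomes a hash ref, one line a plain string.
my $loose = XMLin($xml, KeyAttr => []);

# ForceArray: the named elements are always array refs, even when alone.
my $forced = XMLin($xml, KeyAttr => [], ForceArray => ['address', 'line']);

print "loose address:  ", ref($loose->{address}),  "\n";   # HASH
print "forced address: ", ref($forced->{address}), "\n";   # ARRAY

my @remove;
foreach my $address (@{ $forced->{address} }) {
    my @lines = @{ $address->{lines}{line} };
    push @remove, $address->{number} if grep { /\bNOT\b/ } @lines;
}
print "To remove: @remove\n";   # To remove: 1016
```

With ForceArray in place the ref() check on the line data is no longer needed, since every address and line arrives in the same shape.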



