Updated to fix the memory problem: you have to call purge in the
handler, otherwise the twig keeps every parsed element in memory.
It still takes over 30 minutes for 120K records.

I am sure the whole process can be done better with a good
understanding of the module.  I will benchmark XML::Rules as well.
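
As a starting point for that, here is a minimal sketch of the same
handler-plus-purge idea on a tiny inline document. The sample XML is
made up, the sub name member_row is mine, and it assumes the address
fields are direct children of <mem> (first_child_text returns '' for a
missing child, so absent add2/add3 do not die). The in-memory
filehandle only stands in for the .unsorted file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input: element names follow the script below,
# but the real file layout is an assumption.
my $xml = <<'XML';
<members>
  <mem><member>1001</member><add1>1 High|St</add1><suburb>Sydney</suburb><state>NSW</state><pcode>2000</pcode></mem>
  <mem><member>1002</member><add1>2 Low St</add1><add2>Unit 4</add2><suburb>Perth</suburb><state>WA</state><pcode>6000</pcode></mem>
</members>
XML

my @fields = qw(member add1 add2 add3 suburb state pcode);
my $count  = 0;

# In-memory filehandle standing in for the real output file.
open( my $out, '>', \my $buffer ) or die "Unable to open buffer: $!";

sub member_row {
    my ($twig, $mem) = @_;
    $count++;

    # One lookup per field; strip pipes so the delimiter stays unambiguous.
    my @values = map {
        my $v = $mem->first_child_text( $_ );
        $v =~ tr/|//d;
        $v;
    } @fields;

    print {$out} join( '|', $count, @values ), "\n";

    $twig->purge;    # discard everything parsed so far
    return 1;
}

XML::Twig->new( twig_handlers => { mem => \&member_row } )->parse( $xml );

close $out;
print $buffer;
```

Whether this is actually faster than the descendants() version on the
full 120K records is something I have not measured.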

On Tue, 2008-03-18 at 00:55 +1100, Ken Foskey wrote:
> I am extracting addresses from an XML file into a pipe-delimited file
> to process through other programs.  The following code works, but it
> is going to get 130,000 records through it, so it must be very
> efficient, and I cannot follow the documentation on the best way to
> do this.
> 
> After this simple one is programmed I have to change a much more complex
> version of this program.
> 
> #!/usr/bin/perl -w
> # vi:set sw=4 ts=4 et cin:
> # $Id:$
> 
> =head1 SYNOPSIS
> 
> Extract addresses from an XML file into a pipe-delimited file.
> 
>    usage: address_extract.pl  xml_file
> 
> =cut
> 
> use warnings;
> use strict;
> 
> use XML::Twig qw(:strict);
> 
> sub no_pipe
> {
>     my $value = shift;
> 
>     $value =~ s/\|//g;
>     return $value;
> }
> 
> if( ! -f $ARGV[0] ) {
>     # die rather than print, so the script stops on a bad argument
>     die "$ARGV[0] is not a filename, requires filename as first parameter!\n";
> }
> 
> my $sort;
> my $sort_file = $ARGV[0].'.unsorted';
> unlink $sort_file; # in case of rerun
> open( $sort, '>', $sort_file  ) 
>     or die "Unable to open $sort_file for output $!";
> 
> my $ref = XML::Twig->new( twig_handlers=>{mem=>\&member} ) 
>     or die "Unable to create XML::Twig parser";
> 
> my $member = 0;
> 
> $ref->parsefile( $ARGV[0] );
> 
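> sub get_value
> {
>     my ($mem_ref, $key) = @_;
>     # first_descendant stops at the first match instead of building a list
>     my $elt = $mem_ref->first_descendant( $key );
>     return defined $elt ? $elt->text() : '';   # '' when the element is absent
> }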
> 
> sub member {
    my ($twig, $mem_ref) = @_;
>     $member++;
> 
>     my $mem_no = get_value( $mem_ref, 'member' );
>     my $add1   = get_value( $mem_ref, 'add1' );
>     my $add2   = get_value( $mem_ref, 'add2' );
>     my $add3   = get_value( $mem_ref, 'add3' );
>     my $suburb = get_value( $mem_ref, 'suburb' );
>     my $state  = get_value( $mem_ref, 'state' );
>     my $pcode  = get_value( $mem_ref, 'pcode' );
> 
>     print $sort join( '|', $member,
>                      $mem_no,
>                      no_pipe( $add1 ),
>                      no_pipe( $add2 ),
>                      no_pipe( $add3 ),
>                      no_pipe( $suburb),
>                      $state,
>                      $pcode,
>                     ) ."\n";
    $twig->purge;    # free the memory held by records already written

>     return 1;
> }
> 
-- 
Ken Foskey
FOSS developer

