this is fairly abstracted from what i'm really doing, but i'll include
the background anyway...

i'm using CAM::PDF to parse data. this works well. however, i am
trying to extract information out of a table. so, i want to find
certain keys, then go up 3 levels and grab the position of that piece
of data so that i can then search for text with the same horizontal
or vertical position (defined in two other refs on the same level of a
different 'branch'). so, what i want is an array ref with one entry per
found key, each holding the refs it took to get there. i liked
Data::Walk but i couldn't figure out how to return the results - all i
could do was 'print'. and i couldn't figure out how to use
Data::Visitor either. so, i figured i'd try it myself.
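
to make the goal concrete, here is a minimal sketch of the kind of
result i'm after. it runs on a made-up toy structure, not on real
CAM::PDF data, and the names find_paths, value and path are just ones
i picked for illustration. it assumes Scalar::Util's reftype is an
acceptable way to look at the underlying type of each node.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Walk a nested structure and return an array ref of matches.
# Each match is { value => $leaf, path => [ containers from root to parent ] }.
sub find_paths {
    my ( $data, $regex, $path ) = @_;
    $path ||= [];
    my @found;

    # reftype gives the underlying type even for blessed objects;
    # it is undef for non-references, so default to ''.
    my $type = reftype($data) // '';

    if ( $type eq '' ) {
        # leaf value: record it together with the path that led here
        push @found, { value => $data, path => [@$path] }
            if defined $data && $data =~ $regex;
    }
    elsif ( $type eq 'HASH' ) {
        push @found, @{ find_paths( $_, $regex, [ @$path, $data ] ) }
            for values %$data;
    }
    elsif ( $type eq 'ARRAY' ) {
        push @found, @{ find_paths( $_, $regex, [ @$path, $data ] ) }
            for @$data;
    }
    elsif ( $type eq 'REF' or $type eq 'SCALAR' ) {
        # reference to a scalar or to another reference: dereference once
        push @found, @{ find_paths( $$data, $regex, [ @$path, $data ] ) };
    }
    return \@found;
}

# Toy structure (hypothetical, only loosely shaped like a content tree).
my $tree = {
    blocks => [
        { type => 'text', args => [ '10. Total', 72, 500 ] },
        { type => 'text', args => [ 'other',     72, 480 ] },
    ],
};

my $hits = find_paths( $tree, qr/10\. / );
for my $hit (@$hits) {
    my $depth = @{ $hit->{path} };
    print "found '$hit->{value}' at depth $depth\n";
}
```

with a result shaped like this, $hit->{path}[-3] would be the container
three levels up from the match, and nothing has to live in a global.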

a few things i don't understand here:
does the second print get cut off?
when i pass $tree instead of \$tree, it gives me this:
CAM::PDF::Content=HASH(0x2b298a8) is CAM::PDF::Content LEVEL: 0 FOUND: 0 CLEAR: 0
where CAM::PDF::Content isn't going to match any of my type checks. how do i handle this?
obviously i'm not thinking clearly about how i need to build
$datdepth. ideally, i'd like that value to be internal to the sub
and returned when it finishes, instead of being a global. any help with my
thinking would be great.
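
for what it's worth, here is a tiny sketch of the ref() behaviour i
think i'm running into. My::Content is a made-up class standing in for
CAM::PDF::Content, and i'm assuming Scalar::Util's reftype and blessed
are the right tools for looking underneath a blessed reference.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical stand-in for an object like CAM::PDF::Content:
# a hash ref blessed into a class (My::Content is a made-up name).
my $obj = bless { blocks => [] }, 'My::Content';

my $ref_name  = ref($obj);        # class name of a blessed reference
my $base_type = reftype($obj);    # underlying type, ignoring the blessing
my $class     = blessed($obj);    # class name, or undef for unblessed refs

# ref() of a reference to a reference is 'REF'; of a plain string it is ''.
my $ref_of_ref   = ref( \$obj );
my $ref_of_plain = ref('10. ');

# ( 'SCALAR' or 'HASH' or 'ARRAY' ) is just the first true value,
# so a membership test is sketched with a lookup hash instead.
my $first_true = ( 'SCALAR' or 'HASH' or 'ARRAY' );
my %walkable   = map { $_ => 1 } qw(SCALAR HASH ARRAY REF);
my $can_walk   = $walkable{$base_type} ? 'yes' : 'no';

print "ref: $ref_name, reftype: $base_type, blessed: $class\n";
print "ref of ref: $ref_of_ref, ref of plain: '$ref_of_plain'\n";
print "first true: $first_true, walkable: $can_walk\n";
```

so comparing ref( $data ) against 'HASH' fails for an object, while
comparing reftype( $data ) against 'HASH' would succeed.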

so, here's where i'm at:


use CAM::PDF;

my $pdf = CAM::PDF->new('file.pdf');
my $pages = $pdf->numPages;

print "PAGES: $pages\n";

my $datdepth;

sub datarec {
   my( $data, $regex, $lvl, $found, $clear ) = @_;
   $lvl ||= 0;
   $found ||= 0;
   $clear ||= 0;
   return -1 unless( $data );

print "$data is ", ref( $data ), " LEVEL: $lvl FOUND: $found CLEAR: $clear\n";
   if( ( ref( $data ) or $data ) eq 'SCALAR' ) {
      if( $data =~ /$regex/ ) {
         $datdepth->[ $found + 1 ] = $datdepth->[ $found ];
         $datdepth->[ $found++ ]->[ $lvl ] = $data;
         $clear = 0;
      } else {
         return $clear = 1;
      }
   } elsif( ( ref( $data ) or $data ) eq 'HASH' ) {
      foreach( values %$data ) {
         $datdepth->[ $found ]->[ $lvl ] = $data;
         $clear = datarec( $_, $regex, $lvl + 1, $found );
      }
   } elsif( ( ref( $data ) or $data ) eq 'ARRAY' ) {
      foreach( @$data ) {
         $datdepth->[ $found ]->[ $lvl ] = $data;
         ( $found, $clear ) = datarec( $_, $regex, $lvl + 1, $found );
      }
   } elsif( ( ref( $data ) or $data ) eq 'REF' ) {
      $datdepth->[ $found ]->[ $lvl ] = $data;
      ( $found, $clear ) = datarec( $_, $regex, $lvl + 1, $found );
   } elsif( ( ref( $data ) or $data ) ne ( 'SCALAR' or 'HASH' or 'ARRAY' ) ) {
      print "$data is ", ref( $data ), "\n";
   } else {
      print "SOMETHING HORRIBLE!\n";
   }

   if( $clear and $clear == 1 ) {
      foreach my $i ( $lvl .. $#{ $datdepth->[ $found ] } ) {
         undef( $datdepth->[ $found ]->[ $i ] );
      }
   }
}


for( 1 .. $pages ) {

   my $cur = $_;
   my $i = 0;

print "PAGE: $cur\n" if( $cur == 8 );

   if( $cur == 8 ) {
      my $tree = $pdf->getPageContentTree( $cur );
#$tree->render("CAM::PDF::Renderer::Dump");

   datarec( \$tree, qr/10\. / );

#print Dumper $datdepth->[ $#$datdepth ]->[ $#{ $datdepth->[ $#$datdepth ] } ], "\n";


print "DONE\n";
   }
}


__DATA__

PAGES: 32
PAGE: 8
REF(0x2ef4bf8) is REF LEVEL: 0 FOUND: 0 CLEAR: 0
8 is  LEVEL: 1 FOUND: 0 CLEAR: 0
8 is
DONE
