Re: script to parse OCR output different between servers

Gary Stainburn Wed, 22 Jul 2015 12:03:07 -0700

Below is my revised code based on your comments. It is tidier, but more 
importantly it works correctly. Ironically, it didn't actually work 
correctly on my dev machine before either: it didn't find all the matches.


It looks like my original code was only using the first element in each 
array. Using the map syntax you provided, it is now finding matches on the 
second regex for the vin field.
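For anyone following the thread, here is a small self-contained sketch of why the map form matters. The sample text and the `@vin_patterns` name are made up for illustration, and the "first only" line is not the original code (which isn't shown), just an example of using one element of the array. A `=~` match in list context returns the capture groups of the first match only; `map` runs every pattern in the array, and adding `/g` would also collect repeated occurrences on the same page.

```perl
use strict;
use warnings;

# Hypothetical OCR text (made-up data): one Ford-style VIN, one short VIN twice.
my $content = "WF0XXABCDEF12345 page 1 ABCDEF54321 page 2 ABCDEF54321";

# Two patterns, copied from the 'vin' entry of %searches.
my @vin_patterns = (
    qr/\b(WF[0O]XX[A-Z]{6}\d{5}\b)/i,
    qr/\b([A-Z]{6}\d{5}\b)/i,
);

# Using only the first pattern: the second regex is never tried.
my @first_only = $content =~ $vin_patterns[0];

# map tries every pattern; each list-context match yields its first capture.
my @all_patterns = map { $content =~ $_ } @vin_patterns;

# With /g, each pattern returns every occurrence, not just the first.
my @every_occurrence = map { $content =~ /$_/g } @vin_patterns;

print "first only:       @first_only\n";
print "all patterns:     @all_patterns\n";
print "every occurrence: @every_occurrence\n";
```

Whether the `/g` form is wanted depends on whether repeated hits on one page should count more than once in the tally.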

Thank you for your help

Gary


#!/usr/bin/perl 

# searches a series of OCR generated text files - one per page
# looks for sets of regex's for field contents and stores in arrays

use warnings;
use strict;

my %searches=('stock'=>[qr/\b([NU][LD] *\d{5})\b/],
              'regno'=>[qr/\b([A-Za-z]{2}\d{2}[A-Za-z]{3})\b/],
              'vin'=>[qr/\b(WF[0O]XX[A-Z]{6}\d{5}\b)/i,
                      qr/\b([A-Z]{6}\d{5}\b)/i]);
my %found;
my %values;
foreach my $fn (glob("*.txt")) {
  print "file.....$fn\n";
  my $FH;
  if (!open $FH,"<",$fn ) {
    print "file open failed: $!\n";
    next;
  }
  my $content = slurp($FH);
  close($FH); # close the lexical handle opened above, not a bareword FH

  foreach my $field (keys %searches) {
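    # apply every regex for this field; without /g each one returns
    # the captures of its first match only
    if (my @matches = map { $content =~ $_ } @{$searches{$field}}) {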
      foreach (@matches) {
        $_=~s/ //g; # remove spaces
        print "match found - '$field': '$_'\n";
        $found{$field}{$_}++;
      }
    }
  }
  
} # foreach page

foreach my $field (keys %found) { # foreach field
  my $value='';
  my $count=0;
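  # sort the keys so that ties are resolved the same way on every run
  foreach my $key (sort keys %{$found{$field}}) { # foreach field -> value
    # if current key's tally is > the best tally so far, store it
    if ($found{$field}{$key} > $count) {
      $value=$key;
      $count=$found{$field}{$key};
    }
  }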
  print "field='$field' value='$value'\n";
  $values{$field}=$value;
}


sub slurp {
  my ($fh)=@_;
  local $/; 
  return <$fh>;
}
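The "most frequent value wins" step has to update `$count` together with `$value`; otherwise `$value` simply ends up as whichever key happens to come last in hash order. Since Perl 5.18 hash order is randomized per process, so that kind of bug can give different answers on different servers or runs. Below is a standalone sketch of the selection step using made-up tallies in the same `%found` shape (field => { value => count }); sorting the keys to break ties alphabetically is an assumption of this sketch, not something the original specifies.

```perl
use strict;
use warnings;

# Made-up tallies in the same shape as %found: field => { value => count }.
my %found = (
    vin   => { 'WF0XXABCDEF12345' => 3, 'ABCDEF54321' => 1 },
    stock => { 'NL12345' => 2, 'UD54321' => 2 },   # a deliberate tie
);

my %values;
foreach my $field (sort keys %found) {
    my ($value, $count) = ('', 0);
    # Sorted keys make tie-breaking repeatable: the first key in sort
    # order keeps the win because later ties are not strictly greater.
    foreach my $key (sort keys %{ $found{$field} }) {
        if ($found{$field}{$key} > $count) {
            $value = $key;
            $count = $found{$field}{$key};   # remember the best tally so far
        }
    }
    $values{$field} = $value;
    print "field='$field' value='$value' count=$count\n";
}
```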
      

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
