Hi Gary, some comments about your code.
On Wed, 22 Jul 2015 15:32:33 +0100 Gary Stainburn <gary.stainb...@ringways.co.uk> wrote: > I've written the code below to parse a number of text page files generated by > Tesseract OCR software to look for a guess the most likely values for VIN, > Reg number and stock number for a vehicle. > > On my development site it works fine and gives the values it should. > However, on my live server it only returns '1' as the value. > > The only thing I can think of is that my live server is much older than my > development server. My perl versions are: > > Live v5.10.0 > Devel v5.18.4 version 5.10.0 of perl is really old and you should definitely upgrade. Even 5.18.4 is somewhat old and was end-of-lifed. That put aside. > > Am I right, and what can I do to get the code to work on my live server > (other than upgrage perl) > > > #!/usr/bin/perl -w Don't use the -w flag: http://perl-begin.org/tutorials/bad-elements/#the-dash-w-flag > > # searches a series of OCR generated text files - one per page > # looks for sets of regex's for field contents and stores in arrays > > use warnings; > use strict; That's good. > use Data::Dumper; > > my %searches=('stock'=>[qr/\b([NU][LD] *\d{5})\b/], > 'regno'=>[qr/\b([A-Za-z]{2}\d{2}[A-Za-z]{3})\b/], > 'vin'=>[qr/\b(WF[0O]XX[A-Z]{6}\d{5}\b)/i,qr/\b([A-Z] > {6}\d{5}\b)/i]); The regex here got broken into two, possibly by your mailer. I hope it's OK in the original code. > my %found; > my %values; > foreach my $file (<*.txt>) { 1. Don't call variables "file": http://perl-begin.org/tutorials/bad-elements/#calling-variables-file 2. It would be safer to do: my @fns = glob('*.txt'); foreach my $filename (@fns) { > print "file.....$file\n"; > if (!open FH,$file ) { > print "file open failed: $!\n"; > next; > } 1. Maybe it's just me but I feel an indentation step of 2 spaces is too hard to read. I recommend increasing it. 2. See http://perl-begin.org/tutorials/bad-elements/#open-function-style > my $content = do { local $/; <FH> }; > close(FH); You really should extract it into its own slurp function or use a slurp module such as https://metacpan.org/pod/File::Slurper . > # print "*****$content*********\n"; > You shouldn't put the comments right at the start of the line - indent them with the rest of the code. > foreach my $field (keys %searches) { # foreach field What does the comment add? > #####THe following line is the one with the problem > if (my @matches = $content =~ @{$searches{$field}}) { What do you want to achieve when using an array at the right hand side of =~ ? Please use any/all/map/grep/none/notall from either perlfunc or https://metacpan.org/pod/List::Util / https://metacpan.org/pod/List::MoreUtils > foreach (@matches) { Don't iterate using $_ - use a lexical iterator: http://perl-begin.org/tutorials/bad-elements/#overuse_dollar_underscore > $_=~s/ //g; > print STDERR "match found - '$field': '$_'\n"; > if ($found{$field}{$_}) { > $found{$field}{$_}++; > } else { > $found{$field}{$_}=1; > } This can be shortened to « $found{$field}{$_}=1; » - without the if/else. See autovivification. > } > } > } > > } # foreach page > > > foreach my $field (keys %found) { # foreach field > my $value=''; > my $count=0; > foreach my $key (keys %{$found{$field}}) { # foreach field -> value > $value=$key if ($found{$field}{$key} > $count); > } shouldn't you use "first" from List::Util here? Regards, Shlomi Fish -- ----------------------------------------------------------------- Shlomi Fish http://www.shlomifish.org/ Parody of "The Fountainhead" - http://shlom.in/towtf SMG: It was 1997-1998ish, Buffy started airing. So one day a group of yeshivah pupils arrived to the studios saying they have some numereological insights from the Jewish bible, about what will happen in Sunnydale next. — http://www.shlomifish.org/humour/Summerschool-at-the-NSA/ Please reply to list if it's a mailing list post - http://shlom.in/reply . -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/