On Wednesday 22 July 2015 16:10:18 Shlomi Fish wrote: > Hi Gary, > > some comments about your code. > > On Wed, 22 Jul 2015 15:32:33 +0100 > > Gary Stainburn <gary.stainb...@ringways.co.uk> wrote: > > I've written the code below to parse a number of text page files > > generated by Tesseract OCR software to look for a guess the most likely > > values for VIN, Reg number and stock number for a vehicle. > > > > On my development site it works fine and gives the values it should. > > However, on my live server it only returns '1' as the value. > > > > The only thing I can think of is that my live server is much older than > > my development server. My perl versions are: > > > > Live v5.10.0 > > Devel v5.18.4 > > version 5.10.0 of perl is really old and you should definitely upgrade. > Even 5.18.4 is somewhat old and was end-of-lifed. That put aside. >
That is the most recent version of perl installed on my Fedora boxes using yum update, which I try to avoid messing with. I will at some point be building a new server, but that will be long and painful as the version changes for Postgresql, Perl and PHP will all be significant and need extensive testing with my services. > > Am I right, and what can I do to get the code to work on my live server > > (other than upgrage perl) > > > > > > #!/usr/bin/perl -w > > Don't use the -w flag: > > http://perl-begin.org/tutorials/bad-elements/#the-dash-w-flag I've removed that. I'm surprised by this though, as when I used to write Perl regularly, I'm sure that was the norm. As you may tell, I haven't written perl for a while. > > > # searches a series of OCR generated text files - one per page > > # looks for sets of regex's for field contents and stores in arrays > > > > use warnings; > > use strict; > > That's good. > > > use Data::Dumper; > > > > my %searches=('stock'=>[qr/\b([NU][LD] *\d{5})\b/], > > 'regno'=>[qr/\b([A-Za-z]{2}\d{2}[A-Za-z]{3})\b/], > > 'vin'=>[qr/\b(WF[0O]XX[A-Z]{6}\d{5}\b)/i,qr/\b([A-Z] > > {6}\d{5}\b)/i]); > > The regex here got broken into two, possibly by your mailer. I hope it's OK > in the original code. Yes, this was a split by the email program. The original is all on one line > > > my %found; > > my %values; > > foreach my $file (<*.txt>) { > > 1. Don't call variables "file": > http://perl-begin.org/tutorials/bad-elements/#calling-variables-file Fair enough, but that isn't the problem > > 2. It would be safer to do: > > my @fns = glob('*.txt'); I can see how this is tidier code. > > foreach my $filename (@fns) { > > > print "file.....$file\n"; > > if (!open FH,$file ) { > > print "file open failed: $!\n"; > > next; > > } > > 1. Maybe it's just me but I feel an indentation step of 2 spaces is too > hard to read. I recommend increasing it. It's just personal preference. I've found that using 2 spaces makes my code easier to read when you have 3 or more indents, loops, if's, etc. Using Tab was a farse as the code ended up starting half way across the screen. > > 2. See http://perl-begin.org/tutorials/bad-elements/#open-function-style > > > my $content = do { local $/; <FH> }; > > close(FH); > > You really should extract it into its own slurp function or use a slurp > module such as https://metacpan.org/pod/File::Slurper . I made the choice to not rely on modules to make the code portable - or so I thought. > > > # print "*****$content*********\n"; placing a # in column 1 to remove debug code etc. makes it easier to search in gvim. Again, this isn't the issue > > You shouldn't put the comments right at the start of the line - indent them > with the rest of the code. > > > foreach my $field (keys %searches) { # foreach field > > What does the comment add? > The improvement in variable names made the comment redundant > > #####THe following line is the one with the problem > > if (my @matches = $content =~ @{$searches{$field}}) { > > What do you want to achieve when using an array at the right hand side of > =~ ? Please use any/all/map/grep/none/notall from either perlfunc or > https://metacpan.org/pod/List::Util / > https://metacpan.org/pod/List::MoreUtils > This statement was supposed to return an array of matches found by applying multiple regex's to the string $content. This is where the problems appear. On the devel box this works, whereas on the older box it returns '1' which I'm guessing is just 'true' > > foreach (@matches) { > > Don't iterate using $_ - use a lexical iterator: > http://perl-begin.org/tutorials/bad-elements/#overuse_dollar_underscore > > > $_=~s/ //g; > > print STDERR "match found - '$field': '$_'\n"; > > if ($found{$field}{$_}) { > > $found{$field}{$_}++; > > } else { > > $found{$field}{$_}=1; > > } > > This can be shortened to « $found{$field}{$_}=1; » - without the if/else. > See autovivification. That would not give me a tally which is what I'm after. I've changed it to just have: $found{$field}{$_}++; which works in perl. I wrote it long hand initialy, because in PHP which I use most, that throws an exception. > > > } > > } > > } > > > > } # foreach page > > > > > > foreach my $field (keys %found) { # foreach field > > my $value=''; > > my $count=0; > > foreach my $key (keys %{$found{$field}}) { # foreach field -> value > > $value=$key if ($found{$field}{$key} > $count); > > } > > shouldn't you use "first" from List::Util here? How would this select the key in %found{$field} with the highest tally? -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/