Hi Gary, see below for my comments.
On Wed, 22 Jul 2015 17:11:29 +0100 Gary Stainburn <gary.stainb...@ringways.co.uk> wrote: > On Wednesday 22 July 2015 16:10:18 Shlomi Fish wrote: > > Hi Gary, > > > > some comments about your code. > > > > On Wed, 22 Jul 2015 15:32:33 +0100 > > > > Gary Stainburn <gary.stainb...@ringways.co.uk> wrote: > > > I've written the code below to parse a number of text page files > > > generated by Tesseract OCR software to look for a guess the most likely > > > values for VIN, Reg number and stock number for a vehicle. > > > > > > On my development site it works fine and gives the values it should. > > > However, on my live server it only returns '1' as the value. > > > > > > The only thing I can think of is that my live server is much older than > > > my development server. My perl versions are: > > > > > > Live v5.10.0 > > > Devel v5.18.4 > > > > version 5.10.0 of perl is really old and you should definitely upgrade. > > Even 5.18.4 is somewhat old and was end-of-lifed. That put aside. > > > > That is the most recent version of perl installed on my Fedora boxes using > yum update, which I try to avoid messing with. > You can try using perlbrew - http://perlbrew.pl/ . > I will at some point be building a new server, but that will be long and > painful as the version changes for Postgresql, Perl and PHP will all be > significant and need extensive testing with my services. Good luck! > > > > Am I right, and what can I do to get the code to work on my live server > > > (other than upgrage perl) > > > > > > > > > #!/usr/bin/perl -w > > > > Don't use the -w flag: > > > > http://perl-begin.org/tutorials/bad-elements/#the-dash-w-flag > > I've removed that. I'm surprised by this though, as when I used to write Perl > regularly, I'm sure that was the norm. As you may tell, I haven't written > perl for a while. > Well, the -w flag has been largely superseded by "use warnings;" for many years now. The state-of-the-art can progress sometimes. > > > > > # searches a series of OCR generated text files - one per page > > > # looks for sets of regex's for field contents and stores in arrays > > > > > > use warnings; > > > use strict; > > > > That's good. > > > > > use Data::Dumper; > > > > > > my %searches=('stock'=>[qr/\b([NU][LD] *\d{5})\b/], > > > 'regno'=>[qr/\b([A-Za-z]{2}\d{2}[A-Za-z]{3})\b/], > > > 'vin'=>[qr/\b(WF[0O]XX[A-Z]{6}\d{5}\b)/i,qr/\b([A-Z] > > > {6}\d{5}\b)/i]); > > > > The regex here got broken into two, possibly by your mailer. I hope it's OK > > in the original code. > > Yes, this was a split by the email program. The original is all on one line > OK, good. > > > > > my %found; > > > my %values; > > > foreach my $file (<*.txt>) { > > > > 1. Don't call variables "file": > > http://perl-begin.org/tutorials/bad-elements/#calling-variables-file > > Fair enough, but that isn't the problem > > > > > 2. It would be safer to do: > > > > my @fns = glob('*.txt'); > > I can see how this is tidier code. > > > > > foreach my $filename (@fns) { > > > > > print "file.....$file\n"; > > > if (!open FH,$file ) { > > > print "file open failed: $!\n"; > > > next; > > > } > > > > 1. Maybe it's just me but I feel an indentation step of 2 spaces is too > > hard to read. I recommend increasing it. > > It's just personal preference. I've found that using 2 spaces makes my code > easier to read when you have 3 or more indents, loops, if's, etc. Using Tab > was a farse as the code ended up starting half way across the screen. > I tend to prefer a 4 spaces indent step which is a good compromise. If you have too many indent step, it is often an indication that you should extract a function or a method. > > > > 2. See http://perl-begin.org/tutorials/bad-elements/#open-function-style > > > > > my $content = do { local $/; <FH> }; > > > close(FH); > > > > You really should extract it into its own slurp function or use a slurp > > module such as https://metacpan.org/pod/File::Slurper . > > I made the choice to not rely on modules to make the code portable - or so I > thought. See http://perl-begin.org/topics/cpan/ - modules won't likely make your code less portable - just somewhat harder to deploy , and that is only if you're not using the right tool. And you can always at least follow my first suggestion and write: sub slurp { . . . } my $content = slurp( ... ); > > > > > # print "*****$content*********\n"; > > placing a # in column 1 to remove debug code etc. makes it easier to search > in gvim. Again, this isn't the issue > > > > > You shouldn't put the comments right at the start of the line - indent them > > with the rest of the code. > > > > > foreach my $field (keys %searches) { # foreach field > > > > What does the comment add? > > > > The improvement in variable names made the comment redundant > Ah. > > > #####THe following line is the one with the problem > > > if (my @matches = $content =~ @{$searches{$field}}) { > > > > What do you want to achieve when using an array at the right hand side of > > =~ ? Please use any/all/map/grep/none/notall from either perlfunc or > > https://metacpan.org/pod/List::Util / > > https://metacpan.org/pod/List::MoreUtils > > > > This statement was supposed to return an array of matches found by applying > multiple regex's to the string $content. This is where the problems appear. > On the devel box this works, whereas on the older box it returns '1' which > I'm guessing is just 'true' > In that case, you should try using map: (untested): if (my @matches = map { $content =~ $_ } @{$searches{$field}}) { Do you also want a /$_/g here? Not sure. > > > foreach (@matches) { > > > > Don't iterate using $_ - use a lexical iterator: > > http://perl-begin.org/tutorials/bad-elements/#overuse_dollar_underscore > > > > > $_=~s/ //g; > > > print STDERR "match found - '$field': '$_'\n"; > > > if ($found{$field}{$_}) { > > > $found{$field}{$_}++; > > > } else { > > > $found{$field}{$_}=1; > > > } > > > > This can be shortened to « $found{$field}{$_}=1; » - without the if/else. > > See autovivification. > > That would not give me a tally which is what I'm after. > I've changed it to just have: > > $found{$field}{$_}++; > Yes, that's what I meant - sorry. It was a thinko on my part. > which works in perl. I wrote it long hand initialy, because in PHP which I > use most, that throws an exception. > Ah. > > > > > } > > > } > > > } > > > > > > } # foreach page > > > > > > > > > foreach my $field (keys %found) { # foreach field > > > my $value=''; > > > my $count=0; > > > foreach my $key (keys %{$found{$field}}) { # foreach field -> value > > > $value=$key if ($found{$field}{$key} > $count); > > > } > > > > shouldn't you use "first" from List::Util here? > > How would this select the key in %found{$field} with the highest tally? > I don't see where you select the highest tally here and you can try using max/max_by . $value is always overrided with $key if it's bigger than zero. Finally note that you should do «my $ff = $found{$field};» and then use %$ff because you're using it several times. Regards, Shlomi Fish -- ----------------------------------------------------------------- Shlomi Fish http://www.shlomifish.org/ Original Riddles - http://www.shlomifish.org/puzzles/ In Soviet Russia, joke is tired of you. — apeiron, #perl Please reply to list if it's a mailing list post - http://shlom.in/reply . -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/