From: "John W. Krahn" <[EMAIL PROTECTED]>
> "Perl.Org" wrote:
> >
> > Can anyone share a script that recurses a filesystem for files
> > containing one or more patterns?  Seems like it would be easy to
> > write but if it's already out there...
>
> This will probably work:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use File::Find;
>
> my $dir = shift || '.';
>
> $/ = \2_048;  # set buffer size to 2,048 bytes, YMMV

Wow

> find( sub {
>     local @ARGV = $File::Find::name;

Wow

>     my $last = '';
>     while ( <> ) {
>         $_ = $last . $_;
>         if ( /pattern1/ or /pattern2/ or /pattern3/ ) {
>             print "$ARGV\n";
>             close ARGV;
>             return;
>             }
>         $last = $_;
>         }
>     }, $dir );

You are trying to scare everyone away, aren't you? ;-)

Besides, changing $/ globally is not the best thing to do. It will
work in a tiny script, but once you try to use this code in something
bigger (or do something more complex with the found files) you are
bound to run into problems.

I think this would be both safer and more readable:

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $dir = shift || '.';

find( sub {
    # $FH has to be declared outside of the error check,
    # otherwise it is not visible in the rest of the sub
    open my $FH, '<', $_ or do {
        print STDERR "Can't open $File::Find::name : $!\n";
        return;
    };
    my ($chunk, $last) = ('','');
    while ( read $FH, $chunk, 2048 ) {
        $chunk = $last . $chunk;
        if ( $chunk=~/pattern1/ or $chunk=~/pattern2/ or $chunk=~/pattern3/ ) {
            print "$File::Find::name\n";
            close $FH;
            return;
        }
        $last = $chunk;
    }
    close $FH;
}, $dir );
__END__

Actually there might be a problem with that code. After the first
iteration $last contains 2048 characters, after the second 4096, ...
it keeps growing! So if the file is huge and it doesn't contain any
of the patterns you'll end up with the whole file in memory. Twice.

If you do want to search through the whole file (but start with the
beginning first) you might do something like this:

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $dir = shift || '.';

find( sub {
    open my $FH, '<', $_ or do {
        print STDERR "Can't open $File::Find::name : $!\n";
        return;
    };
    my ($chunk, $pos) = ('',0);
    # the fourth parameter of read() is the offset in $chunk,
    # so each read appends to what we already have
    while ( read $FH, $chunk, 2048, $pos ) {
        $pos+=2048;
        if ( $chunk=~/pattern1/ or $chunk=~/pattern2/ or $chunk=~/pattern3/ ) {
            print "$File::Find::name\n";
            close $FH;
            return;
        }
    }
    close $FH;
}, $dir );
__END__

This way we only have the file in memory once and we do not copy it
between two variables.
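
If you'd rather not hold the whole file at all, one possible
variation (a sketch only, not what either script above does) is to
carry over just the tail of the previous chunk. The helper name
file_matches and the $max_len parameter are made up for this example;
$max_len is an assumed upper bound on how long a match can be, so this
only works for patterns where you can state such a bound:

```perl
use warnings;
use strict;

# Hypothetical helper (not from the scripts above).
# $patterns : array ref of qr// objects
# $max_len  : ASSUMED upper bound on the length of any match
# Returns 1 if a pattern matches, 0 if none does, undef if open fails.
sub file_matches {
    my ($path, $patterns, $max_len, $chunk_size) = @_;
    $chunk_size ||= 2048;
    my $keep = $max_len > 1 ? $max_len - 1 : 0;

    open my $fh, '<', $path or return;
    my ($chunk, $tail) = ('', '');
    while ( read $fh, $chunk, $chunk_size ) {
        # search the carried-over tail plus the fresh chunk
        my $window = $tail . $chunk;
        for my $re (@$patterns) {
            return 1 if $window =~ $re;
        }
        # keep only the last $keep characters for the next round,
        # so the carry-over never grows
        $tail = length($window) > $keep
              ? substr( $window, length($window) - $keep )
              : $window;
    }
    return 0;
}
```

Called as file_matches($_, [qr/pattern1/, qr/pattern2/], 100) from
inside the find() callback, this keeps roughly one chunk plus $max_len
characters in memory per file. Anchors such as $ or \b can still
misfire at the edge of the window.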

There is still a problem with the code: it is possible to get some
false positives. Assume one of the patterns ends with a $. That is,
it is supposed to match at the end of a line. But the chunks do not
have to end at the end of lines, they may end anywhere. And since $
normally matches at the end of the string (or at the end of a line
under /m), the pattern may match at the end of a chunk instead of at
the end of a line/file.

Another possible cause of problems is \b. If the pattern ends with \b
it may also match incorrectly at the end of a chunk even if the chunk
ends in midword.

To fix this we need something like this:

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $dir = shift || '.';

find( sub {
    open my $FH, '<', $_ or do {
        print STDERR "Can't open $File::Find::name : $!\n";
        return;
    };
    my ($chunk, $pos, $last_match) = ('', 0, 0);
    while ( read $FH, $chunk, 2048, $pos ) {
        $pos+=2048;
        if ( $chunk=~/pattern1/ or $chunk=~/pattern2/ or $chunk=~/pattern3/ ) {
            # $+[0] is the offset just past the end of the match
            if ($+[0] == length($chunk)) { # matched at the end of chunk
                $last_match = 1;
            } else {
                print "$File::Find::name\n";
                close $FH;
                return;
            }
        } else {
            $last_match = 0;
        }
    }
    close $FH;
    print "$File::Find::name\n" if $last_match;
        # in case the match at the end of chunk was also at the end of file
}, $dir );
__END__

I think you'd only get a false positive from this if you used
look-aheads. In that case the script would not notice that the match
was near the end of a chunk and that the look-ahead matched only
thanks to the end of the chunk.
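
To see the effect in isolation, here is a small self-contained
sketch. The helper boundary_check and the sample string are invented
for the illustration; it just compares a match against the first
$size characters of a string with a match against the whole string:

```perl
use warnings;
use strict;

# Hypothetical helper for illustration only.
# Returns (matched in chunk, matched in full data, match ended at chunk end).
sub boundary_check {
    my ($re, $data, $size) = @_;
    my $chunk    = substr $data, 0, $size;
    my $in_chunk = $chunk =~ $re ? 1 : 0;
    # $+[0] is the offset just past the end of the last successful match
    my $at_end   = ( $in_chunk and $+[0] == length($chunk) ) ? 1 : 0;
    my $in_data  = $data =~ $re ? 1 : 0;
    return ($in_chunk, $in_data, $at_end);
}

my $data = "foobar baz\n";    # made-up "file" content

for my $case ( [ qr/foo$/, 3 ], [ qr/foo\b/, 3 ], [ qr/foo(?!bar)/, 4 ] ) {
    my ($re, $size) = @$case;
    printf "%-16s chunk=%d data=%d at_chunk_end=%d\n",
        $re, boundary_check( $re, $data, $size );
}
```

All three patterns match the chunk but not the data. The first two
end exactly at the end of the chunk, so a check on $+[0] catches
them. The look-ahead one ends one character earlier and slips
through.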

If the intent was to search two chunks at once, to make sure we do
not miss a pattern that sits on the crossing of two chunks, we could
use something like this:

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $dir = shift || '.';

find( sub {
    open my $FH, '<', $_ or do {
        print STDERR "Can't open $File::Find::name : $!\n";
        return;
    };
    my ($chunk, $last_match) = ('', 0);
    read $FH, $chunk, 2048 or return;
    if ( $chunk=~/pattern1/ or $chunk=~/pattern2/ or $chunk=~/pattern3/ ) {
        if ($+[0] == length($chunk)) { # matched at the end of chunk
            $last_match = 1;
        } else {
            print "$File::Find::name\n";
            close $FH;
            return;
        }
    }
    # from now on $chunk holds the previous chunk followed by the new one
    while ( read $FH, $chunk, 2048, 2048 ) {
        if ( $chunk=~/pattern1/ or $chunk=~/pattern2/ or $chunk=~/pattern3/ ) {
            if ($+[0] == length($chunk)) { # matched at the end of chunk
                $last_match = 1;
            } elsif ($-[0] == 0) { # matched at the start of chunk
                # this is a false match. If it was real
                # it'd match in the previous iteration.
                $last_match = 0;
            } else {
                print "$File::Find::name\n";
                close $FH;
                return;
            }
        } else {
            $last_match = 0;
        }
        # keep only the newer half for the next iteration
        $chunk = substr( $chunk, 2048, 2048 );
    }
    close $FH;
    print "$File::Find::name\n" if $last_match;
        # in case the match at the end of chunk was also at the end of file
}, $dir );
__END__

Of course this would miss a match that would be longer than two
chunks and could miss one that's longer than one chunk.

There is also the easiest case: if the patterns do not match newlines
we can read the file line by line, instead of in chunks:

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $dir = shift || '.';

find( sub {
    open my $FH, '<', $_ or do {
        print STDERR "Can't open $File::Find::name : $!\n";
        return;
    };
    while ( <$FH> ) {
        chomp;
        if ( /pattern1/ or /pattern2/ or /pattern3/ ) {
            print "$File::Find::name\n";
            close $FH;
            return;
        }
    }
    close $FH;
}, $dir );
__END__

Humpf!

Jenda

P.S.: All code is untested!
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery