--- Hans Holtan <[EMAIL PROTECTED]> wrote:
> I am having a problem searching a large (600 mb) text file. What I
> need to do is find a match with a short bit of text and then look up
> to 200 characters forwards and backwards for other matches to
> different short bits of text. I tried reading the file to memory
> first and then doing the search, but it's a serious hog, and I need
> to leave a lot of memory open for other operations. Does anyone have
> suggestions on how I can do this while limiting memory usage, speed
> is a factor but not paramount.
> Thanks,
> Hans

Hans,

This was such a fun little problem that I went ahead and wrote the
program for you. You may have to modify this to fit your needs. I've
also "over commented" it to give you some pointers, in case you're not
too familiar with Perl.

Here's the basic idea:

1. Read from a file and search for the target text.
2. If we're at the target text, find out where we are in the file.
3. From the current location, set start and end positions in the file
   to mark the needed text. Test to ensure we haven't gone beyond the
   beginning or end of the file.
4. Grab the text from start to end and push it onto an array.

The only caveat I can think of is this: if you are grabbing too many
chunks of data, you may wish to process them individually rather than
pushing them onto an array (since you may have memory issues).

Enjoy!

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

# this is how far forward or back you need to read
# (set it to 200 for your case)
my $width = 20;

# this is your target string. You can make it a regex if you prefer
my $target = 'search';

# file to search
my $file  = 'test.txt';
my $fsize = -s $file;

# when you're done, this should contain the data you're looking for
my @chunks;

open my $fh, '<', $file or die "Cannot open $file for reading: $!";
while ( my $line = <$fh> ) {
    # tell gives the offset just past the line we have read
    my $file_position = tell $fh;

    # /g in a while loop finds every match on the line, not just the first
    while ( $line =~ /$target/g ) {
        # backwards from end of line to its start, then forwards to the
        # start of the match. $-[0] is the offset of the match within
        # $line, so this still works if $target is a regex.
        my $word_position = $file_position - length($line) + $-[0];
        push @chunks,
            get_chunk( $fh, $word_position, $file_position, $width, $fsize );
    }
}
close $fh;
print Dumper \@chunks;

sub get_chunk {
    my ( $fh, $word_position, $file_position, $width, $fsize ) = @_;

    # don't try to read before beginning of file
    my $start = $word_position >= $width ? $word_position - $width : 0;

    # don't try to read after end of file
    my $end = $word_position + $width <= $fsize
            ? $word_position + $width
            : $fsize;

    # position to start of where we want to read
    seek $fh, $start, 0 or die "Cannot seek in file: $!";

    my $chunk;
    # shouldn't fail unless I got my boundaries wrong
    # (read returns undef on error; 0 only means end of file)
    defined( read( $fh, $chunk, $end - $start ) )
        or die "Problem reading file: $!";

    # put us back to where we were
    seek $fh, $file_position, 0 or die "Cannot seek in file: $!";
    return $chunk;
}

Cheers,
Curtis "Ovid" Poe

=====
"Ovid" on http://www.perlmonks.org/
Someone asked me how to count to 10 in Perl:
push@A,$_ for reverse q.e...q.n.;for(@A){$_=unpack(q|c|,$_);@a=split//;
shift@a;shift@a if $a[$[]eq$[;$_=join q||,@a};print $_,$/for reverse @A
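
A note on the memory caveat above: one way to avoid building up a large
@chunks array is to hand each window to a callback as soon as it is read.
What follows is only a minimal sketch of that idea, not part of the
original program. The name scan_windows, its callback argument, the
temporary demo file and the @others list of secondary strings are all
assumptions made for illustration. The sketch measures the window from the
start of each match, as the program above does, and uses binmode so that
tell, seek and length all count bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (an assumption, not from the original post):
# find every match of $target in $file and pass a window of up to
# $width bytes either side of the match start to $callback, one
# window at a time, so nothing accumulates in memory.
sub scan_windows {
    my ( $file, $target, $width, $callback ) = @_;
    my $fsize = -s $file;

    open my $fh, '<', $file or die "Cannot open $file for reading: $!";
    binmode $fh;    # work in bytes so offsets and lengths agree

    while ( my $line = <$fh> ) {
        my $line_end   = tell $fh;
        my $line_start = $line_end - length $line;

        while ( $line =~ /$target/g ) {
            # $-[0] is the offset of the current match within $line
            my $match_start = $line_start + $-[0];
            my $start = $match_start >= $width ? $match_start - $width : 0;
            my $end   = $match_start + $width <= $fsize
                      ? $match_start + $width
                      : $fsize;

            seek $fh, $start, 0 or die "Cannot seek in $file: $!";
            my $chunk;
            my $got = read( $fh, $chunk, $end - $start );
            die "Problem reading $file: $!" unless defined $got;

            # return to the end of the current line before reading on
            seek $fh, $line_end, 0 or die "Cannot seek in $file: $!";

            $callback->( $match_start, $chunk );
        }
    }
    close $fh;
    return;
}

# Demo on a small temporary file (assumed data, for illustration only).
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "alpha beta\nneedle one gamma\ndelta needle two\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

# Secondary strings to look for near each match of the target.
my @others = ( 'beta', 'two' );

scan_windows( $tmp_name, 'needle', 8, sub {
    my ( $offset, $chunk ) = @_;
    my @nearby = grep { index( $chunk, $_ ) >= 0 } @others;
    print "match at byte $offset, nearby: @nearby\n";
} );
```

For the 600 MB case you would pass the real file name and a width of 200.
Memory use then stays at roughly one line plus one window, whatever the
number of matches. Like the line-by-line loop above, this will not find a
target that itself spans a line break.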