[EMAIL PROTECTED] wrote:
>is this the best/fastest way to search through 800,000 hl7 files?

Nope :)

>for each file i am grepping for 6 names... thus each file is
>scanned/grepped 6 times over. Basically i am searching for 1 name in
>4 1/2 million files. Even though the server is fast, it is still
>processing on average 2 files per second.
>
>here is my script... any thoughts would be appreciated as we have a
>tight schedule.
>
>cd /backup/Loaders/Ld21/HL7FILES
>#for file in `find * -print`
>for file in `ls`
>do
>  while read name
>  do
>    search=`cut -d "|" -f 20 < $file | grep $name`   {extracts name
>from field 20 of hl7 file}
>    if [ "$search" > /dev/null ]
>    then
>      dir=`pwd`
>      echo "$name -> $dir/$file" >> /home/zane/found.list   {adds
>found names and relevant filenames}
>    fi
>  done < /home/zane/scripts/filelist   {file containing the six names
>to search for}
>done

This is certainly going to be slow. I'll offer you two solutions, one
in shell script, one in Perl.

Shell script (uses $() rather than backquotes for command
substitution; replace $(...) with `...` if your shell is too old to
understand it):

  cd /backup/Loaders/Ld21/HL7FILES
  pattern=$(cat /home/zane/scripts/filelist)
  for file in `find . -type f -print`; do
    cut -d'|' -f20 "$file" | egrep "$pattern" | sort | uniq \
      | sed "s@\$@ -> $file@g"
  done > /home/zane/found.list

What I do here is build up a single egrep pattern beforehand which
matches any one of the six names; each file then only needs to be
scanned by egrep once, which will be much faster. Note that $pattern
must be quoted, since the names contain spaces. You'll need to make
filelist read something like:

  John Smith|Joe Brown|Colin Watson|...

Perl:

===== cut here =====
#! /usr/bin/perl -w

use diagnostics;
use strict;

my $names = shift;
open NAMES, $names or die "Couldn't open name list: $!";
my @names = map { chomp; $_ } <NAMES>;
close NAMES;

chdir '/backup/Loaders/Ld21/HL7FILES' or die "Couldn't chdir: $!";
opendir HL7FILES, '.'
    or die "Couldn't open directory: $!";
while (defined(my $file = readdir HL7FILES)) {
    next unless -f $file;
    open HL7FILE, $file or die "Couldn't open data file: $!";
    # Limit of 21 so that element [19] is exactly field 20, with any
    # remaining fields left in element [20] (a limit of 20 would leave
    # the whole rest of the line in [19]).
    my @hl7 = map { chomp; (split /\|/, $_, 21)[19] } <HL7FILE>;
    close HL7FILE;
    foreach my $name (@names) {
        print "$name -> $file\n" if scalar grep m/\Q$name/, @hl7;
    }
}
closedir HL7FILES;
===== cut here =====

Call this with ./search.pl namelist, or similar. The list of names
should be one per line this time.

Caveat: if this is mission-critical, *test these first*. I've only
done fairly minimal testing, and these scripts may very well still
contain bugs.

-- 
Colin Watson                                       [EMAIL PROTECTED]
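
One more convenience, if it's easier to keep a single one-name-per-line
list for both scripts: the pipe-separated egrep pattern can be generated
from that list rather than maintained by hand. A minimal sketch (the
file path and names here are invented for illustration):

```shell
# Build an egrep alternation pattern ("John Smith|Joe Brown|...") from
# a one-name-per-line list, using paste -s to join the lines with '|'.
# The file name and the names themselves are illustrative examples.
printf 'John Smith\nJoe Brown\n' > /tmp/namelist
pattern=$(paste -sd'|' /tmp/namelist)
echo "$pattern"   # John Smith|Joe Brown
rm -f /tmp/namelist
```

This keeps the name list in the format the Perl version already expects,
and derives the shell version's pattern from it on the fly.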
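
An alternative worth mentioning: grep's standard -F and -f flags read
fixed-string patterns from a file, one per line, so the one-per-line
name list can be fed to grep directly with no alternation pattern at
all. A sketch with made-up sample data (a record whose 20th
|-separated field is a name on the list):

```shell
# Sketch: extract field 20 with cut, then match it against fixed
# strings read from a one-name-per-line file via grep -F -f.
# All file names and data below are invented for illustration.
printf 'John Smith\n' > /tmp/names
printf '1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|John Smith|21\n' \
    > /tmp/sample.hl7
cut -d'|' -f20 /tmp/sample.hl7 | grep -F -f /tmp/names
# prints: John Smith
rm -f /tmp/names /tmp/sample.hl7
```

Fixed-string matching also sidesteps any regex metacharacters that
might appear in a name, which the egrep-pattern approach does not.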