Howdy, scripting with Perl is a hobby and not a vocation, so I apologize in advance for rough-looking code.
I have a very large list of 16-letter words called "hashsequence16.txt"; this file is 203MB in size. I also have a large data file called "newrawdata.txt", which is 95MB. For each 16-letter word, I loop through "newrawdata.txt" to 1) find a match and 2) take the full line of newrawdata.txt and associate it with the 16-letter word. Using a progress counter and timing how long it takes to process my data tells me that I have 9534 hours in which to find an alternative solution. It's pretty brute force, but I don't know if there is another way to do it. Any comments or guidance would be greatly appreciated.

Thanks,
Dan

==========================================
#!/usr/bin/perl
use strict;
use warnings;

print "**fisher**\n";

my $flatfile = "newrawdata.txt";     # 95MB in size
my $datafile = "hashsequence16.txt"; # 203MB in size
my $outfile  = "fishersearch.txt";

open(FILE,    "<", $flatfile) || die "Can't open '$flatfile': $!\n";
open(FILE2,   "<", $datafile) || die "Can't open '$datafile': $!\n";
open(SEQFILE, ">", $outfile)  || die "Can't open '$outfile': $!\n";
chomp(my @preparse = <FILE>);   # strip newlines once, not inside the inner loop
my @hashdata = <FILE2>;
close(FILE);
close(FILE2);

my $total  = scalar @hashdata;  # progress is counted in lines, not bytes
my $finish = 0;
my $marker = 0;

for my $list1 (@hashdata) {     # iterating through hash16 data
    $finish++;
    if ($finish == 10) {        # report progress every 10 words
        $marker += $finish;
        $finish = 0;
        my $left = $total - $marker;
        print "$left/$total\n"; # this prints every 17 seconds
    }
    chomp $list1;
    my ($line, $freq) = split(/\t/, $list1);
    for my $rawdata (@preparse) {               # iterating through rawdata
        my $first_pos = index($rawdata, $line); # -1 if the word is not in this line
        if ($first_pos >= 0) {                  # matching hash16 word with rawdata line
            print SEQFILE "$first_pos\t$rawdata\n";  # printing info to new file
        }
    }
    print SEQFILE "PROCESS\t$line\n";           # printing hash16 word and "process"
}
close(SEQFILE);
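One common way to avoid the nested loop, offered here as a sketch under stated assumptions rather than a tested drop-in replacement: since every key is exactly 16 characters, load the words into a Perl hash and slide a 16-character window across each raw line, checking each window with a single hash lookup. Each raw line is then scanned once instead of once per word, so the work grows roughly with the size of newrawdata.txt rather than with the product of the two files. The names `find_matches`, `main` and `$WORDLEN`, and the command-line argument order, are hypothetical choices for this sketch; it also assumes the word is the first tab-separated column of hashsequence16.txt, as in the `split` above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every key in hashsequence16.txt is assumed to be exactly this long.
my $WORDLEN = 16;

# Hypothetical helper: scan each raw line once and look up every
# $WORDLEN-character window in a hash of the wanted words.
# Returns a hashref: word => [ [first_pos, line_index], ... ],
# with hits listed in the order the raw lines appear.
sub find_matches {
    my ($words, $lines) = @_;
    my %wanted = map { $_ => 1 } @$words;
    my %hits;
    for my $i (0 .. $#$lines) {
        my $raw = $lines->[$i];
        my %seen;    # record only the first position per word per line, like index()
        for my $pos (0 .. length($raw) - $WORDLEN) {
            my $window = substr($raw, $pos, $WORDLEN);
            next unless $wanted{$window};
            next if $seen{$window}++;
            push @{ $hits{$window} }, [ $pos, $i ];
        }
    }
    return \%hits;
}

# Hypothetical driver: same input and output formats as the original script.
sub main {
    my ($datafile, $flatfile, $outfile) = @_;

    open(my $wfh, '<', $datafile) or die "Can't open '$datafile': $!\n";
    my @words;
    while (my $l = <$wfh>) {
        chomp $l;
        my ($word) = split /\t/, $l;    # frequency column is not needed here
        push @words, $word if defined $word && length $word;
    }
    close($wfh);

    open(my $rfh, '<', $flatfile) or die "Can't open '$flatfile': $!\n";
    chomp(my @raw = <$rfh>);
    close($rfh);

    my $hits = find_matches(\@words, \@raw);

    open(my $out, '>', $outfile) or die "Can't open '$outfile': $!\n";
    for my $word (@words) {
        for my $hit (@{ $hits->{$word} || [] }) {
            my ($pos, $i) = @$hit;
            print {$out} "$pos\t$raw[$i]\n";
        }
        print {$out} "PROCESS\t$word\n";
    }
    close($out);
}

# Run only when given at least the two input file names.
main($ARGV[0], $ARGV[1], $ARGV[2] || 'fishersearch.txt') if @ARGV >= 2;
```

It would be run as `perl fisher_hash.pl hashsequence16.txt newrawdata.txt` (the script name is hypothetical) and should write the same "position<TAB>line" and "PROCESS<TAB>word" records, in the same word order, to fishersearch.txt. The trade-off is memory: a Perl hash holding millions of 16-letter keys, plus the recorded hits, may need considerably more RAM than the 203MB file itself, so that is worth checking before running it on the full data.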