Re: A loop to parse a large text file--output is empty!

D. Bolliger Fri, 23 Jun 2006 13:31:16 -0700

Michael Oldham wrote on Friday, 23 June 2006 18:20:
> Hello again,

Hello Michael


> Thanks to everyone for their helpful suggestions.  I finally got it to
> work, using the following script.  However, it takes about 5 hours to
> run on a fast computer.  Using grep (in bash), on the other hand, takes
> about 5 minutes (see below if you are interested).  Thanks again!
>
> SLOW perl script:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID_all_X';
>
> unless (open(IDFILE, $IDs)) {
>       print "Could not open file $IDs!\n";
>       }
>
> my $probes = 'HG_U95Av2_probe_fasta';
>
> unless (open(PROBES, $probes)) {
>       print "Could not open file $probes!\n";
>       }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

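There are at least two reasons for the slowness of the following nested
loop:

- every ID in @ID is turned into a regex and tested against each line,
  so thousands of regexes are applied per line
- once a line has been selected, the foreach loop keeps testing the
  remaining IDs, although that is no longer necessary (there is no
  "last" to leave the loop)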

A faster strategy would be:

1. create a lookup hash with the IDs of IDFILE (IDs as keys)
2. in the while loop, first extract the string you want to test
   for selection from the line. 
   Use split or a single capturing regex for this.
   perldoc perlre 
   perldoc -f split
3. instead of the foreach loop below, simply use a single test
   if the extracted string is a key in the lookup hash.
   ( print OUT $line if exists $lookup_hash{$extracted_string} )
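
The three steps above could be sketched roughly as follows. This is an
untested outline, not a drop-in replacement: it assumes the header lines
look like ">probe:<word>:<ID>:..." (as in the regex of the original
script), that the IDs are literal strings without colons, and that each
header is followed by one sequence line. The sub names read_ids,
select_probes and main are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: build the lookup hash, one key per ID (the values do not matter).
sub read_ids {
    my ($fh) = @_;
    my %lookup;
    while (my $id = <$fh>) {
        chomp $id;
        $lookup{$id} = 1 if length $id;
    }
    return \%lookup;
}

# Steps 2 and 3: extract the ID from each header line with a single
# capturing regex, then do one hash lookup instead of looping over all IDs.
# Returns the number of selected headers.
sub select_probes {
    my ($lookup, $in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        # Assumed header layout: >probe:<word>:<ID>:...
        next unless $line =~ /^>probe:\w+:([^:]+):/;
        next unless exists $lookup->{$1};
        print {$out} $line;
        my $seq = <$in>;                 # the line following the header
        print {$out} $seq if defined $seq;
        $count++;
    }
    return $count;
}

# Hypothetical driver: ID file, probe file and output file as arguments.
sub main {
    my ($id_file, $probe_file, $out_file) = @_;
    open my $ids, '<', $id_file    or die "Can't open $id_file: $!";
    open my $in,  '<', $probe_file or die "Can't open $probe_file: $!";
    open my $out, '>', $out_file   or die "Can't write $out_file: $!";
    my $lookup = read_ids($ids);
    my $n = select_probes($lookup, $in, $out);
    close $out or die "Can't close $out_file: $!";
    print "Selected $n probes\n";
}

main(@ARGV) if @ARGV == 3;
```

Saved under some name such as subset.pl (hypothetical), it would be
called with the file names from the original script:

  perl subset.pl ID_all_X HG_U95Av2_probe_fasta probe_subset.txt

The point of the design is that each line costs one regex match and one
hash lookup, regardless of how many IDs are in the ID file.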

(sorry, not much time left...)

> my @ID = <IDFILE>;
> print @ID;
> chomp @ID;
>
> while (my $line = <PROBES>) {
>       foreach my $identifier (@ID) {
>               if($line=~/^>probe:\w+:$identifier:/) {
>                               print OUT $line;
>                               print OUT scalar(<PROBES>);
>               }
>       }
> }
> exit;
[...]

Hope this helps
Dani

