Thanks Timothy. I tried the code you supplied and unfortunately the output file is still empty. Do you think there might be a problem with the regular expression in:
if($find =~ /^>probe:\w+:(\w+):/) ? Mike -----Original Message----- From: Timothy Johnson [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 13, 2006 6:59 PM To: Michael Oldham; [EMAIL PROTECTED] Org Subject: RE: A loop to parse a large text file--output is empty! One problem is that you are using the $_ variable twice. "while(<FILE>)" assigns $_ to the current line being read, and "foreach(@array)" assigns $_ to the current element of the array in question. It's usually a good idea to be more explicit anyway, and keep the $_ usage to a minimum so you don't have to worry about this kind of thing. Also, I'm not sure what you're trying to accomplish by this line: print OUT scalar(<PROBES>); As far as I can see, you're grabbing the next line, assigning it to $_ (maybe), and printing it out in scalar context. I'm assuming that you actually wanted to print the line you read instead, so that's what I did. Try this and see if it is closer to what you want: ################### #!/usr/bin/perl -w use strict; my $IDs = 'ID.txt'; unless (open(IDFILE, $IDs)) { print "Could not open file $IDs!\n"; } my $probes = 'HG_U95Av2_probe_fasta.txt'; unless (open(PROBES, $probes)) { print "Could not open file $probes!\n"; } open (OUT,'>','probe_subset.txt') or die "Can't write output: $!"; my @ID = <IDFILE>; chomp @ID; # vvvvvvvvvvvvvvvvvvv while (my $line = <PROBES>) { # vvvvvvvv foreach my $find(@ID) { if($find =~ /^>probe:\w+:(\w+):/) { print OUT $find."\n"; # VVVVVV print OUT $line."\n"; } } } exit; ########################### -----Original Message----- From: Michael Oldham [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 13, 2006 6:42 PM To: [EMAIL PROTECTED] Org Subject: A loop to parse a large text file--output is empty! Dear all, I am a Perl newbie struggling to accomplish a conceptually simple bioinformatics task. I have a single large file containing about 200,000 DNA probe sequences (from an Affymetrix microarray), each of which is accompanied by a header, like so (this is in FASTA format): >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense; TGGCTCCTGCTGAGGTCCCCTTTCC >probe:HG_U95Av2:107_at:543:519; Interrogation_Position=258; Antisense; CTACTCTCGTGGTGCACAAGGAGTG >probe:HG_U95Av2:1156_at:528:483; Interrogation_Position=2054; Antisense; TGCAGGTGGCAGATCTGCAGTCCAT >probe:HG_U95Av2:1102_s_at:541:589; Interrogation_Position=4316; Antisense; GTGAAGGTTGCTGAGGCTCTGACCC .........etc. What I would like to do is extract from this file a subset of ~130,800 probes (both the header and the sequence) and output this subset into a new text file in the same (FASTA) format. These 130,800 probes correspond to 8,175 probe set IDs ("1138_at" is an example of a probe set ID in the header listed above). I have these 8,175 IDs listed in a separate file called "ID.txt" and the 200,000 probe sequences in a file called "HG_U95Av2_probe_fasta.txt". The script below is missing something because the output file ("probe_subset.txt") is blank. This is also the case if I replace the file "ID.txt" with a file consisting of a single probe set ID (e.g. 1138_at). Does anyone know what I am missing? I am running this script in Cygwin on Windows XP. I appreciate any suggestions! ~ Mike O. #!/usr/bin/perl -w use strict; my $IDs = 'ID.txt'; unless (open(IDFILE, $IDs)) { print "Could not open file $IDs!\n"; } my $probes = 'HG_U95Av2_probe_fasta.txt'; unless (open(PROBES, $probes)) { print "Could not open file $probes!\n"; } open (OUT,'>','probe_subset.txt') or die "Can't write output: $!"; my @ID = <IDFILE>; chomp @ID; while (<PROBES>) { foreach (@ID) { if(/^>probe:\w+:(\w+):/) { print OUT; print OUT scalar(<PROBES>); } } } exit; -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006 -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response> -- No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006 -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006 -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>