I show in order: the format of the input data, the output of the input data, and finally the working script. I have yet to add comments into the code but I am sure many of you veterans will figure it out.
-Thanks again all! As always comments, questions or suggestions are welcome!
-Mike
-------- INPUT -------- >dog atcg--acgat---act-ca---- >cat acgt-acgt----acgt-gt-agct- >mouse ---acgtacg-atcg---actgac-
------- OUTPUT --------- ***Discovered the following DNA sequences:*** dog found! cat found! mouse found! >>>>>> mouse <<<<<< Indel size: 1 Times found: 2 Positions: 11 25
Indel size: 3 Times found: 2 Positions: 1 16
>>>>>> cat <<<<<< Indel size: 1 Times found: 4 Positions: 5 18 21 26
Indel size: 4 Times found: 1 Positions: 10
>>>>>> dog <<<<<< Indel size: 1 Times found: 1 Positions: 18
Indel size: 2 Times found: 1 Positions: 5
Indel size: 3 Times found: 1 Positions: 12
Indel size: 4 Times found: 1 Positions: 21
------- Script -------
#!usr/bin/perl
# By Michael S. Robeson II, with the help of friends at lernperl.org and bioperl.org! :-)
# 10/16/2004
use warnings; use strict;
############################### # Open Sequence Data & OUTFILE ###############################
print "Enter in the name of the DNA sequence file:\n"; chomp (my $dna_seq = <STDIN>);
open(DNA_SEQ, $dna_seq) or die "Can't open file: $!\n";
open(OUTFILE, ">indel_list_"."$dna_seq") or die "Can't open outfile: $!\n";
############################ # Read sequence data into a hash ############################
my %sequences;
$/ = '>';
print "\n***Discovered the following DNA sequences:***\n";
while ( <DNA_SEQ> ) { chomp; next unless s/^\s*(.+)//; my $name = $1; s/\s//g; $sequences{$name} = $_; print "$name found!\n"; } close DNA_SEQ;
###################################### # iterate over gaps and write to file ######################################
foreach (keys %sequences) { print "\t\t\>\>\>\>\>\> $_ \<\<\<\<\<\<\n"; print OUTFILE "\>\>\>\>\>\> $_ \<\<\<\<\<\<\n"; my $dna = $sequences{$_}; my %gap_data; my %position; while ($dna =~ /(\-+)/g) { my $gap_pos = pos ($dna) - length($&) + 1; my $gap_length = length $1; #$1 =~ tr/\-+// $gap_data{$gap_length}++; $position{$gap_length} .= "$gap_pos"." \n"; } my @indels = keys (%gap_data); my @keys = sort { $a <=> $b} @indels; foreach my $key (@keys) { print "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n"; print OUTFILE "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n"; print "Positions:\n"; print OUTFILE "Positions:\n"; print "$position{$key}"; print OUTFILE "$position{$key}"; print "\n"; print OUTFILE "\n"; } # Can replace the last "foreach loop" above with the while loop # below to do the same thing. Only Gap sizes will not be sorted. # nor is it set up to print to a file # # while (my ($key, $vlaue) = each (%gap_data)) { # print "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n"; # print "Positions:\n"; # print "$position{$key}"; # print "\n\n"; # }
}
-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>