Here is the script. Give it a name, say seqComp.pl. Usage:
perl seqComp.pl <input_FASTA_file>
HTH,
Anjan

#! /usr/bin/perl -w
use strict;

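@ARGV == 1 or die "usage: perl seqComp.pl <input_FASTA_file>\n";
# three-argument open, so the file name is never parsed as an open mode
open (S, '<', $ARGV[0]) || die "cannot open FASTA file to read: $!";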

my %s;# a hash of arrays, to hold each line of sequence
my %seq; #a hash to hold the AA sequences.
my $key;

while (<S>){ #Read the FASTA file.
    chomp;

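    if (/^>/){ # a header line has '>' as its first character
        s/^>//;
        $key= $_;
    }elsif (defined $key){ # ignore any lines before the first header
        push (@{$s{$key}}, $_);
    }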

}

foreach my $id (keys %s){ # $id rather than $a, which sort() treats specially
    my $s= join("", @{$s{$id}});
    $seq{$id}=$s;
    #print("$id\t$s\n");
}

my @aa= qw(A R N D C Q E G H I L K M F P S T W Y V);

foreach my $k (keys %seq){
    my %count; # a hash to hold the count for each amino acid in the protein.
    my @seq= split(//, uc $seq{$k}); # upper-case so lower-case residues are counted too
    foreach my $r(@seq){
        $count{$r}++;
    }
    print("$k\n");
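    foreach my $res (@aa){
        my $n= $count{$res} || 0; # residues absent from the sequence count as 0
        my $len= length($seq{$k});
        my $frac= $len ? $n/$len : 0; # guard against an empty sequence
        printf("%s\t%0.2f\n", $res, $frac);
    }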

}
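
The script above prints one amino acid per line under each protein name. The original request was for a tab-delimited table with one row per protein, so here is a rough sketch of that variant. The file name seqTable.pl and the sub names read_fasta, composition and table_lines are just names I made up for illustration. I am also assuming that the whole header line is the protein name, that lower-case residues should be counted as upper-case, and that the columns follow the @aa order rather than the order in the example table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The 20 standard amino acids; this order sets the column order.
my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);

# Read FASTA records from an open filehandle.
# Returns (\@names, \%seq): names in file order, and name => sequence.
sub read_fasta {
    my ($fh) = @_;
    my (@names, %seq, $name);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF line endings
        if ($line =~ /^>(.*)/) {           # header line starts a record
            $name = $1;
            if (!exists $seq{$name}) {
                push @names, $name;
                $seq{$name} = '';
            }
        }
        elsif (defined $name) {            # sequence line: append it
            $seq{$name} .= $line;
        }
    }
    return (\@names, \%seq);
}

# Fraction of each amino acid in @aa, formatted to two decimals.
# Assumption: case is ignored; an empty sequence gives 0.00 everywhere.
sub composition {
    my ($sequence) = @_;
    my %count;
    $count{$_}++ for split //, uc $sequence;
    my $len = length $sequence;
    return map { sprintf '%0.2f', $len ? ($count{$_} || 0) / $len : 0 } @aa;
}

# Build the table as a list of tab-delimited lines, header line first.
sub table_lines {
    my ($fh) = @_;
    my ($names, $seq) = read_fasta($fh);
    my @lines = (join("\t", 'Name', @aa));
    push @lines, join("\t", $_, composition($seq->{$_})) for @$names;
    return @lines;
}

if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "cannot open FASTA file to read: $!";
    print "$_\n" for table_lines($fh);
    close $fh;
}
else {
    warn "usage: perl seqTable.pl <input_FASTA_file>\n";
}
```

Run it the same way as the first script: perl seqTable.pl <input_FASTA_file>. Rows come out in the order the proteins appear in the file, which the hash-based loop in the first script does not guarantee.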

On Wed, Dec 1, 2010 at 2:31 PM, Rob Dixon <rob.di...@gmx.com> wrote:

> On 01/12/2010 08:44, Changrong Ge wrote:
>
>>
>> I am quite new to Perl; I come from the biochemistry field. I am
>> trying to write a script for my current work but cannot get it to
>> work. The idea is to calculate the composition (percentage) of amino
>> acids in a protein sequence.
>>
>> The input is a series of protein sequences in FASTA format; the output
>> is a tab-delimited table like the one below:
>>
>>     Name            A      T       C        D      N     Q      E .......
>>     protein1     0.23   0.40  0.20 ...
>>     protein2     0.52   0.01   ....
>>     protein3
>>     ......
>> Could somebody help me with this? I tried reading some books, like
>> perl for bioinformatics, but I still have not got the hang of it.
>>
>
> Hi Changrong
>
> The problem doesn't seem difficult, but I'm afraid we don't have much
> knowledge of bioinformatics between us. If you post a sample of input
> data and the corresponding output you desire then I am sure we can help.
>
> Regards,
>
> Rob


-- 
===================================
anjan purkayastha, phd.
research associate
fas center for systems biology,
harvard university
52 oxford street
cambridge ma 02138
phone-703.740.6939
===================================
