here is the script. give it a name, say, seqComp.pl. usage: perl seqComp.pl <input_FASTA_file>. HTH, Anjan
#! /usr/bin/perl -w use strict; open (S, "$ARGV[0]") || die "cannot open FASTA file to read: $!"; my %s;# a hash of arrays, to hold each line of sequence my %seq; #a hash to hold the AA sequences. my $key; while (<S>){ #Read the FASTA file. chomp; if (/>/){ s/>//; $key= $_; }else{ push (@{$s{$key}}, $_); } } foreach my $a (keys %s){ my $s= join("", @{$s{$a}}); $seq{$a}=$s; #print("$a\t$s\n"); } my @aa= qw(A R N D C Q E G H I L K M F P S T W Y V); foreach my $k (keys %seq){ my %count; # a hash to hold the count for each amino acid in the protein. my @seq= split(//, $seq{$k}); foreach my $r(@seq){ $count{$r}++; } print("$k\n"); foreach my $a (@aa){ $count{$a}||=0; $count{$a}= sprintf("%0.2f", $count{$a}/length($seq{$k})); print("$a\t$count{$a}\n"); } } On Wed, Dec 1, 2010 at 2:31 PM, Rob Dixon <rob.di...@gmx.com> wrote: > On 01/12/2010 08:44, Changrong Ge wrote: > >> >> I am quite new to this perl language-I am from biochemistry field. >> Now trying to write a script for my current work but could not make >> it. The idea is to calculate the composition (percentage) of amino >> acids in a protein sequence. >> >> Input is a series of fasta format (protein sequence) output is a tab >> delimited format like below: > >> >> Name A T C D N Q E ....... >> protein1 0.23 0.40 0.20 ... >> protein2 0.52 0.01 .... >> protein3 >> ...... >> Could somebody help me with this? I tried reading some books like >> perl for bioinformatics, but still not into it. >> > > Hi Changrong > > The problem doesn't seem difficult, but I'm afraid we don't have much > knowledge of bioinformatics between us. If you post a sample of input > data and the corresponding output you desire then I am sure we can help. > > Regards, > > Rob > > > -- > To unsubscribe, e-mail: beginners-unsubscr...@perl.org > For additional commands, e-mail: beginners-h...@perl.org > http://learn.perl.org/ > > > -- =================================== anjan purkayastha, phd. research associate fas center for systems biology, harvard university 52 oxford street cambridge ma 02138 phone-703.740.6939 ===================================