Re: formatting and syntax

R. Joseph Newton Thu, 05 Feb 2004 13:52:06 -0800

"Michael S. Robeson II" wrote:

> Hi, I am still too new to Perl and I am having trouble
> formatting my data into a new format. So here is my problem:
>
> I have data (DNA sequence) in a file that looks like this:
>
> ####
> # Infile
> ####
>  >bob
> AGTGATGCCGACG
>  >fred
> ACGCATATCGCAT
>  >jon
> CAGTACGATTTATC


Good, we can see the input structure here.  What jumps out at me is that the
input file comes in pairs of lines: a tag line followed by a sequence line.
You will want to structure your input routine to read and handle the lines a
pair at a time.
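
Reading by the pair might be sketched like this.  It is only an illustration,
not the code from the session further down: read_pairs is a made-up helper
name, and the sample data is fed through an in-memory filehandle so the
snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: collect [tag, sequence] pairs from a filehandle.
sub read_pairs {
    my ($fh) = @_;
    my @pairs;
    while (defined(my $tag = <$fh>)) {
        chomp $tag;
        next unless $tag =~ /\S/;     # skip blank lines between records
        my $seq = <$fh>;
        last unless defined $seq;     # a tag with no sequence line: stop
        chomp $seq;
        $tag =~ s/^\s*>//;            # drop the leading ">" marker
        push @pairs, [$tag, $seq];
    }
    return @pairs;
}

# Sample data in the same shape as the infile above.
my $sample = " >bob\nAGTGATGCCGACG\n >fred\nACGCATATCGCAT\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_->[0]: $_->[1]\n" for read_pairs($fh);
close $fh;
```

Testing each read with defined() means the loop ends cleanly at end of file
instead of calling chomp on an undefined value.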

>
>
> and I need it converted to:
>
> ####
> # Outfile
> ####
> R 1 20
>
>   A G U G A T G C C G A C G - - - - - - -       bob
>   A C G C A U A U C G C A U - - - - - - -       fred
>   C A G U A C G A U U U A U C - - - - - -       jon

>

[snip: a picture is worth a thousand words, and you showed us the picture
above.]

Well, we have a fairly simple problem here, I'd say:

Greetings! E:\d_drive\perlStuff\giffy>perl -w
my $sequence_length = 20;
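my $line = <DATA>;
# Stop at end of input or at a blank line; a tag of "0" is still processed.
while (defined $line and $line =~ /\S/) {
   chomp $line;
   my $sequence_tag = trim_line($line);
   $line = <DATA>;
   last unless defined $line;   # tag with no sequence line after it
   chomp $line;
   my @nucleotides = split //, $line;
   push @nucleotides, '_' for (1..($sequence_length - @nucleotides));
   print join(' ', @nucleotides), "   $sequence_tag\n";
   $line = <DATA>;
}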

sub trim_line {
  my $in_line = shift;
  $in_line =~ s/^ >//;
  chomp $in_line;
  return $in_line;
}

__DATA__
 >bob
AGTGATGCCGACG
A G T G A T G C C G A C G _ _ _ _ _ _ _   bob
 >fred
ACGCATATCGCAT
A C G C A T A T C G C A T _ _ _ _ _ _ _   fred
 >jon
CAGTACGATTTATC
C A G T A C G A T T T A T C _ _ _ _ _ _   jon

Or, better yet, with the padding and printing moved into a subroutine of
their own:

Greetings! E:\d_drive\perlStuff\giffy>perl -w
my $sequence_length = 20;
my $line = <DATA>;
# Stop at end of input or at a blank line; a tag of "0" is still processed.
while (defined $line and $line =~ /\S/) {
   chomp $line;
   my $sequence_tag = trim_line($line);
   $line = <DATA>;
   last unless defined $line;   # tag with no sequence line after it
   chomp $line;
   $line = print_underscore_padded($line, $sequence_length, $sequence_tag);
}

sub trim_line {
  my $in_line = shift;
  $in_line =~ s/^ >//;
  chomp $in_line;
  return $in_line;
}

# Prints one padded sequence, then returns the next tag line (undef at EOF).
sub print_underscore_padded {
   my ($line, $sequence_length, $sequence_tag) = @_;
   my @nucleotides = split //, $line;
   push @nucleotides, '_' for (1..($sequence_length - @nucleotides));
   print join(' ', @nucleotides), "   $sequence_tag\n";
   return scalar <DATA>;
}

__DATA__
 >bob
AGTGATGCCGACG
A G T G A T G C C G A C G _ _ _ _ _ _ _   bob
 >fred
ACGCATATCGCAT
A C G C A T A T C G C A T _ _ _ _ _ _ _   fred
 >jon
CAGTACGATTTATC
C A G T A C G A T T T A T C _ _ _ _ _ _   jon
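
The sample outfile pads with '-' rather than '_', and most of its T's appear
as U's.  A variation along those lines might look like the sketch below.
Two assumptions: format_sequence is a made-up helper name, and every T is
taken to become U (the sample output is not consistent on that point).  The
"R 1 20" header line is left out, since the post does not say what its
fields mean.

```perl
use strict;
use warnings;

# Hypothetical helper: space out one sequence, pad it with '-' to $width
# columns, and append the tag.
sub format_sequence {
    my ($tag, $seq, $width) = @_;
    (my $rna = $seq) =~ tr/T/U/;    # assumption: every T becomes U
    my @bases = split //, $rna;
    # Pad only when the sequence is shorter than the requested width.
    push @bases, ('-') x ($width - @bases) if @bases < $width;
    return join(' ', @bases) . "   $tag";
}

print format_sequence('bob', 'AGTGATGCCGACG', 20), "\n";
```

Returning the formatted string instead of printing it keeps the reading loop
and the formatting separate, so either can change without touching the other.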


Does that help?

Joseph


