"Michael S. Robeson II" wrote: > Hi I am all still to new to PERL and I am having trouble playing with > formatting my data into a new format. So here is my problem: > > I have data (DNA sequence) in a file that looks like this: > > #### > # Infile > #### > >bob > AGTGATGCCGACG > >fred > ACGCATATCGCAT > >jon > CAGTACGATTTATC

Good, we can see the input structure here. What jumps out at me is that
the input file comes in pairs of lines: a ">name" line followed by a
sequence line. You will want to structure your input routine to read and
handle the lines by the pair, then.

> and I need it converted to:
>
> ####
> # Outfile
> ####
> R 1 20
>
> A G U G A T G C C G A C G - - - - - - - bob
> A C G C A U A U C G C A U - - - - - - - fred
> C A G U A C G A U U U A U C - - - - - - jon
>

[snip - a picture is worth a thousand words, and you showed us the
picture above.]

Well, we have a fairly simple problem here, I'd say:

Greetings!
E:\d_drive\perlStuff\giffy>perl -w
my $sequence_length = 20;

my $line = <DATA>;
chomp $line;
while ($line) {
  my $sequence_tag = trim_line($line);
  $line = <DATA>;
  chomp $line;
  my @nucleotides = split //, $line;
  push @nucleotides, '_' for (1..($sequence_length - @nucleotides));
  print join(' ', @nucleotides), " $sequence_tag\n";
  $line = <DATA>;
  chomp $line;
}

sub trim_line {
  my $in_line = shift;
  $in_line =~ s/^>//;   # strip the leading ">" from the name line
  chomp $in_line;
  return $in_line;
}

__DATA__
>bob
AGTGATGCCGACG
A G T G A T G C C G A C G _ _ _ _ _ _ _ bob
>fred
ACGCATATCGCAT
A C G C A T A T C G C A T _ _ _ _ _ _ _ fred
>jon
CAGTACGATTTATC
C A G T A C G A T T T A T C _ _ _ _ _ _ jon

or, better yet...

Greetings!
E:\d_drive\perlStuff\giffy>perl -w
my $sequence_length = 20;

my $line = <DATA>;
chomp $line;
while ($line) {
  my $sequence_tag = trim_line($line);
  $line = <DATA>;
  chomp $line;
  $line = print_underscore_padded($line, $sequence_length, $sequence_tag);
}

sub trim_line {
  my $in_line = shift;
  $in_line =~ s/^>//;   # strip the leading ">" from the name line
  chomp $in_line;
  return $in_line;
}

# Prints one padded sequence, then reads and returns the next name line.
sub print_underscore_padded {
  my ($line, $sequence_length, $sequence_tag) = @_;

  my @nucleotides = split //, $line;
  push @nucleotides, '_' for (1..($sequence_length - @nucleotides));
  print join(' ', @nucleotides), " $sequence_tag\n";
  $line = <DATA>;
  chomp $line;
  return $line;
}

__DATA__
>bob
AGTGATGCCGACG
A G T G A T G C C G A C G _ _ _ _ _ _ _ bob
>fred
ACGCATATCGCAT
A C G C A T A T C G C A T _ _ _ _ _ _ _ fred
>jon
CAGTACGATTTATC
C A G T A C G A T T T A T C _ _ _ _ _ _ jon

Does that help?

Joseph
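
P.S. One thing to watch in the scripts above: when the input runs out, <DATA> returns undef, and chomp on undef draws a warning under -w. Here is a rough sketch of the same read-by-the-pair idea with that case guarded. It is only one way to do it, under a few assumptions of mine: the helper names format_record and format_pairs are made up for illustration, the pad character is a parameter (I pass '-' as in the requested Outfile), and the width of 20 comes from $sequence_length. It does not print the "R 1 20" header or turn T into U, since the original post does not spell out the rules for either.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sequence into characters, pad it to $width
# columns with $pad_char, and append the name. Returns the line, no newline.
sub format_record {
    my ($name, $sequence, $width, $pad_char) = @_;
    my @nucleotides = split //, $sequence;
    my $missing = $width - @nucleotides;
    push @nucleotides, ($pad_char) x $missing if $missing > 0;
    return join(' ', @nucleotides) . " $name";
}

# Hypothetical helper: read ">name" / sequence pairs from a filehandle
# and return one formatted line per complete pair.
sub format_pairs {
    my ($fh, $width, $pad_char) = @_;
    my @formatted;
    while (defined(my $header = <$fh>)) {
        chomp $header;
        next unless $header =~ s/^>//;   # skip lines that are not ">name"
        my $sequence = <$fh>;
        last unless defined $sequence;   # name with no sequence line: stop
        chomp $sequence;
        push @formatted, format_record($header, $sequence, $width, $pad_char);
    }
    return @formatted;
}

# Sample input taken from the Infile in the question, read through an
# in-memory filehandle so the sketch needs no external file.
my $input = <<'END_INPUT';
>bob
AGTGATGCCGACG
>fred
ACGCATATCGCAT
>jon
CAGTACGATTTATC
END_INPUT

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
my @lines = format_pairs($fh, 20, '-');
close $fh;

print "$_\n" for @lines;
```

With the sample data this should print:

A G T G A T G C C G A C G - - - - - - - bob
A C G C A T A T C G C A T - - - - - - - fred
C A G T A C G A T T T A T C - - - - - - jon

Returning the lines instead of printing inside the loop keeps the formatting separate from the output, so the same routine can feed a file or a header line later.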