nac wrote:
HI,

Hello,

I need your wisdom on this parsing script. I have a fastq file,this
contains info for reads ( from nextGen), 1 line starts with a @, second
contain the sequence info from which I want to count pattern, third line
with a sign, fourth with info about the sequence quality ( see attached
working example).

  I have created an array containing my patterns,
@sequences=qw{^TGGCAGTGGAGG ^TGTCTGGCAGTG ^TG....GCAGTG TCTGTCTG TCTGGCAG
GCAGTGGA TGTCTGGC ^TGTCTGGC ^..TCTGGCAGTG ^TGTCTGGCAGTG ^TGCATGGC}. Some
patterns have to be at the beginning of the sequence, some not.
I try to use the grep function to loop through a list in order to test if
the sequence match the elements from the list. then i use a hashe to count.
In the end I create an output file which contain the first line of the
fastq as keys and not at all the elements from my @sequences (class counted
attached).
I would appreciate any pointers on this,
many thanks
Nat


#!/usr/bin/perl
use strict;
use warnings;


my @sequences;
@sequences=qw{^TGGCAGTGGAGG ^TGTCTGGCAGTG ^TG....GCAGTG TCTGTCTG TCTGGCAG
GCAGTGGA TGTCTGGC ^TGTCTGGC ^..TCTGGCAGTG ^TGTCTGGCAGTG ^TGCATGGC};

You don't have to declare and define a variable in two separate steps:

my @sequences = qw{
    ^TGGCAGTGGAGG ^TGTCTGGCAGTG ^TG....GCAGTG TCTGTCTG
    TCTGGCAG      GCAGTGGA      TGTCTGGC      ^TGTCTGGC
    ^..TCTGGCAGTG ^TGTCTGGCAGTG ^TGCATGGC
    };

And since the contents of @sequences are to be used as regular expressions it is probably better to compile them here with the qr// operator:

my @sequences = map qr/$_/, qw{
    ^TGGCAGTGGAGG ^TGTCTGGCAGTG ^TG....GCAGTG TCTGTCTG
    TCTGGCAG      GCAGTGGA      TGTCTGGC      ^TGTCTGGC
    ^..TCTGGCAGTG ^TGTCTGGCAGTG ^TGCATGGC
    };


my %final_hash;

while ( <IN> ) {

You don't open the IN filehandle anywhere so this won't work.


    if ( /^\@/ ) {
        my $seq = <IN>; get the sequences
        chomp;
        if ( grep { $seq } @sequences ) {# I want to test if $seq contain 
anything

$seq will always be TRUE so that is the same as saying:

          if ( @sequences ) {

And @sequences will always be TRUE so that is the same as saying:

          if ( 1 ) {

Which makes the whole if block superfluous.


that will match with any of the element from @sequences, this is where it
goes wrong I think.
            if ( !exists $final_hash{ $_ } ) {
                $final_hash{ $_ } = 1;
            } else {
                $final_hash{ $_ }++;
            }

Perl has a thing called autovivification which means that you don't need to care if the key doesn't exist because perl will automatically create it for you. So you can remove that if test and just do:

              $final_hash{ $_ }++;


        }
    }
}
for my $key ( sort { $final_hash{ $b } <=> $final_hash{ $a } } keys %final_hash 
) {
    my $value = $final_hash{ $key };
    print OUT $key, "\t", $value, "\n";

You don't open the OUT filehandle anywhere so this won't work.


}



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/


Reply via email to