minky arora wrote:
>
> I have a file of the following form:
>
>> FFM50HR02GMY4E length=75 xy=2604_3772 region=2 run=R_2008_08_19_08_32_31_
>
> TCAATGGGTCCGACGGAGAAAGCGCGACAGAAAAGCCCTTTTGT
>
> TCGACTAGCGTCGTG
>
>> FFM50HR02F5QTS length=59 xy=2408_2686 region=2 run=R_2008_08_19_08
Hello,
You can use BioPerl (http://www.bioperl.org/wiki/Main_Page) for this task.
This should do what you want:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
my $fastaFile = 'myfile';
my $pattern = 'CTTGGCGAGAAGGGCCGCTACCTGCTGGCCGCCTCCTTCGGCAACGT';
my $blockThreshold = 500;
my $numFasta