Praedor Atrebates wrote:
I am dinking around with perl for bioinformatics purposes. I have written a small perl script that reads FASTA formatted sequence files and searches the sequence therein for user-entered sequences. This is primarily targeted at protein sequence analysis and for my purposes, I've included - or am trying to include - one hard-coded sequence string.
I have this (pertinent) code in my script:

$dnakmotif = '[KRH][L{3,}V{3,}I{3,}F{3,}Y{3,}A{3,}][KRH]';

print "Enter the motif/amino acid sequence pattern to search for:\n";
$motif = <STDIN>;
chomp $motif;

if ($motif =~ 'dnak') {
   $motif = $dnakmotif;
}

Thus, if I enter the string "dnak" I want my query to be set to the value of $dnakmotif.

It sounds like you could use a hash:

my %sequences = (
    dnak => qr/pattern1/,
    dank => qr/pattern2/,
    nakd => qr/pattern3/,
    );

if ( exists $sequences{ $motif } ) {
    # do something with pattern in $sequences{ $motif }
    }
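
As a rough sketch of how that lookup might fit into the script: the helper name resolve_motif is made up for illustration, and the pattern stored under dnak is the one suggested at the end of this message. Anything that is not a known name is assumed to be a regex the user typed in directly.

```perl
use strict;
use warnings;

# Hypothetical table of named motifs. Only 'dnak' is filled in here,
# using the pattern suggested later in this message.
my %sequences = (
    dnak => qr/[KRH][LVIFYA]{3,5}[KRH]/,
);

# Return the stored pattern for a known name; otherwise compile the
# user's text as a pattern of its own (an assumption about what the
# script should do with unknown input).
sub resolve_motif {
    my ($input) = @_;
    return exists $sequences{$input} ? $sequences{$input} : qr/$input/;
}

my $pattern = resolve_motif('dnak');
print "match\n" if 'MAKLVIAHGG' =~ $pattern;
```

Using exists with an exact key also avoids the partial match you get from $motif =~ 'dnak', which would fire on any input that merely contains "dnak".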

I am just learning as I go here but there are two problems with this, one of which I understand but the other I do not. First, the one I do not understand. My intent with the value set to $dnakmotif was to search for K or R or H in a sequence followed by a string of 3 or more of any of the contents of the second bracket pair (L V I F Y A). When I run the program and run a search for "dnak" I get a string of hits in my test sequence, but they don't match what I am after. I get a series of hits, for instance, of a K followed by ONE A or ONE V followed by an H instead of at LEAST 3 of any of L or V or I, etc. Why doesn't this work?

Anything inside the [] brackets is a character class, and quantifiers such as {3,} have no special meaning there; they are just literal characters. So [L{3,}V{3,}I{3,}F{3,}Y{3,}A{3,}] says match exactly ONE character that is any of 'L', 'V', 'I', 'F', 'Y', 'A', '{', '}', '3' or ','. Since duplicates are ignored, it could be written as [AFILVY{}3,].
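
A small sketch to see this for yourself. The three test strings are made up for illustration; the two patterns are the original one and the shortened equivalent from the paragraph above.

```perl
use strict;
use warnings;

# The pattern from the original script and its shortened equivalent.
my $original   = qr/[KRH][L{3,}V{3,}I{3,}F{3,}Y{3,}A{3,}][KRH]/;
my $equivalent = qr/[KRH][AFILVY{}3,][KRH]/;

# Made-up test strings: one residue in the middle, a literal '3' in
# the middle, and three residues in the middle.
for my $seq ('KAH', 'K3H', 'KLLLH') {
    my $o = $seq =~ $original   ? 'match' : 'no match';
    my $e = $seq =~ $equivalent ? 'match' : 'no match';
    print "$seq: original=$o equivalent=$e\n";
}

# Prints:
# KAH: original=match equivalent=match
# K3H: original=match equivalent=match
# KLLLH: original=no match equivalent=no match
```

Both patterns behave the same way: they accept a single character between the two flanking residues (even a literal '3') and reject the run of three that you were actually looking for.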

The next problem is one I understand but have no idea how to correct. The value I set to $dnakmotif is too restrictive for the actual searches I need to do. What I want is to search for a sequence/character string with any of K or R or H on either end, but _between them_ any combination of L, V, I, F, A, or Y is OK, in repeats or all individually so long as the minimum number is 3 and the max number (any combination of the characters) is _no more_ than 5. How do I make this character search be much less restrictive than I've started out with?

It SOUNDS like you want: /[KRH][LVIFYA]{3,5}[KRH]/
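
Here is a minimal sketch of that pattern in use. The helper name find_hits and the test sequence are made up for illustration; your script would pass in the sequence read from the FASTA file instead.

```perl
use strict;
use warnings;

my $dnakmotif = qr/[KRH][LVIFYA]{3,5}[KRH]/;

# Collect every hit together with its 1-based start position.
sub find_hits {
    my ($sequence, $pattern) = @_;
    my @hits;
    while ($sequence =~ /($pattern)/g) {
        # $-[1] is the 0-based offset where capture group 1 started.
        push @hits, [ $1, $-[1] + 1 ];
    }
    return @hits;
}

# Made-up sequence: KLVIAH has 4 residues between the ends (a hit),
# RAAK has only 2 (too few) and HFYLLLLLK has 7 (too many).
my $sequence = 'MGKLVIAHTTRAAKPPHFYLLLLLK';

for my $hit (find_hits($sequence, $dnakmotif)) {
    print "$hit->[0] at position $hit->[1]\n";
}

# Prints:
# KLVIAH at position 3
```

Note that a /g loop resumes after the end of the previous hit, so two hits that share a flanking K, R or H (as in KLLLKAAAK) would be reported as one. Whether that matters depends on what you want to count.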



John
--
use Perl;
program
fulfillment
