Praedor Atrebates wrote:
I am dinking around with perl for bioinformatics purposes. I have written a
small perl script that reads FASTA formatted sequence files and searches the
sequence therein for user-entered sequences.
This is primarily targeted at protein sequence analysis, and for my purposes
I've included - or am trying to include - one hard-coded sequence string.
I have this (pertinent) code in my script:
$dnakmotif = '[KRH][L{3,}V{3,}I{3,}F{3,}Y{3,}A{3,}][KRH]';
print "Enter the motif/amino acid sequence pattern to search for:\n";
$motif = <STDIN>;
chomp $motif;
if ($motif =~ 'dnak') {
$motif = $dnakmotif;
}
Thus, if I enter the string "dnak" I want my query to be set to the value of
$dnakmotif.
It sounds like you could use a hash:
my %sequences = (
dnak => qr/pattern1/,
dank => qr/pattern2/,
nakd => qr/pattern3/,
);
if ( exists $sequences{ $motif } ) {
# do something with pattern in $sequences{ $motif }
}
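As a minimal sketch of how that lookup might fit into the script: the helper
name pattern_for is hypothetical, and the 'dnak' entry assumes the
[KRH][LVIFYA]{3,5}[KRH] pattern suggested further down in this message. Any
input that is not a known name is treated as a user-supplied pattern.

```perl
use strict;
use warnings;

# Named motifs. Only 'dnak' is filled in here; the pattern is the one
# suggested later in this message and is an assumption about what is wanted.
my %sequences = (
    dnak => qr/[KRH][LVIFYA]{3,5}[KRH]/,
);

# Hypothetical helper: return the stored pattern for a known name,
# otherwise compile the user's input as a pattern of its own.
sub pattern_for {
    my ($motif) = @_;
    return exists $sequences{$motif}
        ? $sequences{$motif}
        : qr/$motif/;
}

# Example inputs standing in for what would be read from STDIN.
for my $input ( 'dnak', 'GG' ) {
    my $re = pattern_for($input);
    print "$input -> $re\n";
}
```

Note that the hash lookup is an exact match on the name, whereas
$motif =~ 'dnak' is true for any input that merely contains "dnak".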
I am just learning as I go here but there are two problems with this, one of
which I understand but the other I do not. First, the one I do not
understand. My intent with the value set to $dnakmotif was to search for K
or R or H in a sequence followed by a string of 3 or more of any of the
contents of the second bracket pair (L V I F Y A). When I run the program
and search for "dnak" I get a string of hits in my test sequence, but they
don't match what I am after. I get a series of hits, for instance, of a K
followed by ONE A or ONE V followed by an H instead of at LEAST 3 of any of
L or V or I, etc. Why doesn't this work?
Anything inside the [] brackets is a character class, so
[L{3,}V{3,}I{3,}F{3,}Y{3,}A{3,}] says match ONE character that is any of 'L',
'V', 'I', 'F', 'Y', 'A', '{', '}', '3' or ','. Since duplicates are ignored, it
could be written as [AFILVY{}3,].
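A quick way to check this is to run both forms against a few short strings.
This is only a sketch; the test strings are made up for illustration and are
not real protein sequences.

```perl
use strict;
use warnings;

# The pattern from the question: the middle bracket pair is ONE character class.
my $original   = qr/[KRH][L{3,}V{3,}I{3,}F{3,}Y{3,}A{3,}][KRH]/;
# The same class with the duplicate characters removed.
my $simplified = qr/[KRH][AFILVY{}3,][KRH]/;

# Made-up test strings.
for my $seq ( 'KAH', 'RVK', 'K3H', 'K{R', 'KLLLH' ) {
    my $o = $seq =~ $original   ? 'match' : 'no match';
    my $s = $seq =~ $simplified ? 'match' : 'no match';
    print "$seq: original=$o, simplified=$s\n";
}
```

Both patterns accept a single 'A', 'V', '3' or '{' between the flanking
residues, and both reject KLLLH, because only one character is allowed in
the middle.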
The next problem is one I understand but have no idea how to correct. The
value I set to $dnakmotif is too restrictive for the actual searches I need
to do. What I want is to search for a sequence/character string with any of
K or R or H on either end, but _between them_ any combination of L, V, I, F,
A, or Y is OK, in repeats or all individually so long as the minimum number
is 3 and the max number (any combination of the characters) is _no more_ than
5. How do I make this character search be much less restrictive than I've
started out with?
It SOUNDS like you want: /[KRH][LVIFYA]{3,5}[KRH]/
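Here is a small sketch of that pattern in use. The test strings are invented
for illustration; real FASTA data may also need case handling (for example
the /i modifier), which is not shown here.

```perl
use strict;
use warnings;

# K, R or H, then 3 to 5 of L/V/I/F/Y/A in any combination, then K, R or H.
my $dnak = qr/[KRH][LVIFYA]{3,5}[KRH]/;

# Made-up test strings, not real protein sequences.
my @tests = ( 'KLLLH', 'RAVIFYK', 'KAH', 'KLLLLLLH', 'MKLVIRG' );

for my $seq (@tests) {
    if ( $seq =~ /($dnak)/ ) {
        # $-[0] is the offset where the whole match starts.
        my ( $hit, $offset ) = ( $1, $-[0] );
        print "$seq: found $hit at offset $offset\n";
    }
    else {
        print "$seq: no match\n";
    }
}
```

KAH fails because there is only one residue in the middle, and KLLLLLLH fails
because six L's exceed the maximum of five.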
John
--
use Perl;
program
fulfillment