Hi Paul,

Take a look at the syntax for PROSITE patterns; it will
do what you want. There are probably better descriptions
or tutorials of the syntax rules, but here is a quick link:
http://ca.expasy.org/tools/scnpsit3.html#patsyntax

E.g. you can search with a pattern like
'D-x(2,4)-D-[NQ]'
(D, then any 2 to 4 residues, then D, then N or Q).
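
If you would rather try the idea locally before using the web tool, here is a
rough Python sketch of how such a pattern can be turned into an ordinary
regular expression and run over a few sequences. It is not how ScanProsite
itself works; the function names (prosite_to_regex, scan) and the test
sequences are made up for illustration, and only a subset of the syntax is
handled (single residues, x, [..] sets, {..} exclusions and repeat counts).
Note that PROSITE bracket sets list the residues without separators, e.g.
[NQ], and that the first position has to be written as [DE] if E should be
accepted in place of D.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a (subset of a) PROSITE-style pattern into a Python regex.

    Handled: single residues, 'x' wildcards, [ABC] sets, {ABC} exclusions,
    and repeat counts such as x(2) or x(2,4). Anchors are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element into its core and an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)

        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        # single letters and [..] sets are already valid regex

        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)


def scan(sequences: dict, pattern: str) -> list:
    """Return (name, start, matched_text) for every hit in every sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [
        (name, m.start(), m.group())
        for name, seq in sequences.items()
        for m in regex.finditer(seq)
    ]


# Hypothetical sequences, only to exercise the functions above.
example = {
    "seq1": "MKEGGGGDQLL",
    "seq2": "MKDAADNLL",
    "seq3": "MKAAAAAALL",
}
for hit in scan(example, "[DE]-x(2,4)-D-[NQ]"):
    print(hit)
```

With the leading [DE], both the DAADN-like and the EGGGGDQ-like sequences are
reported; with a plain leading D, only the former is.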

Regards,
Mitch

 

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of P Hubbard
Sent: Friday, August 17, 2007 9:14 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Data mining using consensus sequences

Hi all,

I'm certain this can be done, but I figure it would be much easier asking 
the news group than spending hours trawling through random web pages:

Does anyone know how to search protein sequence databases using a consensus 
sequence that allows varying gap sizes and homologous amino acids? For 
example, say I have a consensus sequence DXXDN: can I scan databases in a 
way that will pull out sequences containing, for example, EGGGGDQ?

A quick comment on the structure validation thread... Are bioscientists as 
critical of mistakes in protein and genome databases as the crystallography 
community seems to be of the PDB? Are mistakes in these databases more or 
less common than in the PDB? Also, do they require you to submit your raw data 
(i.e., sequencing chromatograms, etc.)? I feel that a simple improvement in 
the current validation tools which would flag unusual data submissions based 
on various parameters, and require another expert to go over just these 
structures to confirm them, would be sufficient.

Thanks,

AGS
