Hi Paul,

Look at the syntax for PROSITE patterns. It will do what you want. There are probably better syntax rule descriptions and tutorials, but here is a quick link: http://ca.expasy.org/tools/scnpsit3.html#patsyntax
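
To give a feel for how these patterns behave, here is a minimal Python sketch that translates a small subset of the PROSITE pattern syntax into a regular expression and scans a sequence with it. The function names (prosite_to_regex, find_matches), the example pattern and the test sequence are made up for illustration. This is not how the ScanProsite server itself works, and it only handles x, single residues, [..], {..}, repeat counts, and the < and > anchors at the ends of a pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a (subset of a) PROSITE-style pattern into a Python regex."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns may end with '.'
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the match to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # '>' anchors the match to the C-terminus
            suffix, element = "$", element[:-1]

        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty or malformed element: {element!r}")
        core, low, high = m.groups()

        if core == "x":                          # any residue
            regex = "."
        elif re.fullmatch(r"[A-Z]", core):       # one specific residue
            regex = core
        elif re.fullmatch(r"\[[A-Z]+\]", core):  # any of the listed residues
            regex = core
        elif re.fullmatch(r"\{[A-Z]+\}", core):  # any residue except those listed
            regex = "[^" + core[1:-1] + "]"
        else:
            # Alternatives are listed without commas, so [N,Q] ends up here.
            raise ValueError(f"unsupported element: {element!r}")

        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (start, matched_text) for each non-overlapping hit in sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Assumed pattern: D or E, then 2 to 4 residues of any kind, then D, then N or Q.
    print(prosite_to_regex("[DE]-x(2,4)-D-[NQ]"))             # [DE].{2,4}D[NQ]
    print(find_matches("[DE]-x(2,4)-D-[NQ]", "MKEGGGGDQLL"))  # [(2, 'EGGGGDQ')]
```

For a real database search you would hand the pattern to a tool built for that purpose rather than loop over sequences yourself. The sketch is only meant to show what the syntax expresses.
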
E.g. you can search with a pattern like 'D-x(2,4)-D-[NQ]'; writing the first position as [DE] would also match your EGGGGDQ example.

Regards,
Mitch

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of P Hubbard
Sent: Friday, August 17, 2007 9:14 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Data mining using consensus sequences

Hi all,

I'm certain this can be done, but I figure it would be much easier to ask the newsgroup than to spend hours trawling through random web pages: does anyone know how to search protein sequence databases using a consensus sequence that allows varying gap sizes and homologous amino acids? For example, say I have a consensus sequence DXXDN. Can I scan databases in a way that will pull out sequences containing, for example, EGGGGDQ?

A quick comment on the structure validation thread... Are bioscientists as critical of mistakes in protein and genome databases as the crystallography community seems to be of the PDB? Are mistakes in these databases more or less common than in the PDB? Also, do they require you to submit your raw data (i.e., sequencing chromatograms, etc.)? I feel that a simple improvement to the current validation tools, which would flag unusual data submissions based on various parameters and require another expert to go over just these structures to confirm them, would be sufficient.

Thanks,
AGS