Hi,

The up-to-date list of mappings between the PDB and the UniProt sequence database is available at:

ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/csv/pdb_chain_uniprot.csv

This gives the mapping between PDB chains and UniProt accession numbers, which lets you find all PDB entries for a particular UniProt accession number.

To answer the original question about sequence search: the PDBe service at pdbe.org/fasta lets you set a % identity value and search against PDB sequences.

cheers,
Sameer Velankar
PDBe

> Hi Ed,
>
> If you are looking for a specific protein, why not get all PDB
> files with a DBREF record pointing at the UniProt record of the protein
> you want? You can do a simple text search in the PDB, e.g. 'MYG_PHYCA'.
>
> Cheers,
> Robbie
>
> Date: Fri, 22 Jun 2012 22:39:12 -0400
>> From: epozh...@umaryland.edu
>> Subject: Re: [ccp4bb] pdb sequence search
>> To: CCP4BB@JISCMAIL.AC.UK
>>
>> Tim,
>>
>> > I did not understand your objection against solution 1 - is it because
>> > it is not automated? You can sort the results by max. Ident so that
>> > you can scroll down to the limit you set yourself.
>>
>> More that it does not generate a list of PDB IDs. What I want to do is
>> to find every structure of a particular protein and line them all up. I
>> am not saying it's not doable with option 1, it's just not too
>> convenient.
>>
>> > Why do you think an identity cut-off is a good criterion? I usually
>> > cut by E-value because I assume the developers of BLAST know what they
>> > are doing and I have the impression they consider the E-value a better
>> > criterion than the max. Ident.
>>
>> Because I want all the structures of a particular protein itself, not
>> its homologues. I just went through several cycles of reducing E-value
>> down to 1e-100, and I still get one hit included at 88% identity.
>> Setting E-value cutoff to 0 doesn't work, it just returns them all.
>> Well, thanks to you I now see how to figure out the cutoff - the results
>> are sorted by E-value and list the values, so I can just go to the first
>> non-identical hit and use a slightly smaller number. It's just that
>> sequence identity is easier for me to interpret and it's (emotionally)
>> easier to select a cutoff at, say, no more than 5 mutations rather than
>> E-value of 10e-150.
>>
>> Cheers,
>>
>> Ed.
>>
>> Cheers
>>
>> --
>> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
>> Julian, King of Lemurs
>
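
P.S. For anyone who wants the list of PDB IDs directly from the SIFTS mapping file mentioned at the top of this message, here is a rough Python sketch. It assumes the CSV starts with a '#' comment line followed by a header row containing the columns PDB, CHAIN and SP_PRIMARY (the UniProt accession); please check that against the file you actually download. The function name and the sample rows are made up for illustration and are not copied from the real file.

```python
import csv
import io


def pdb_chains_for_accession(csv_text, accession):
    """Return sorted (pdb_id, chain) pairs mapped to one UniProt accession."""
    # Assumption: comment lines start with '#' and precede the header row.
    rows = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    reader = csv.DictReader(rows)
    # Assumption: columns are named PDB, CHAIN and SP_PRIMARY.
    hits = {
        (row["PDB"].lower(), row["CHAIN"])
        for row in reader
        if row["SP_PRIMARY"] == accession
    }
    return sorted(hits)


# Illustrative rows in the assumed layout; values are NOT from the real file.
SAMPLE = """\
# hypothetical comment line
PDB,CHAIN,SP_PRIMARY,RES_BEG,RES_END,PDB_BEG,PDB_END,SP_BEG,SP_END
101m,A,P02185,1,154,0,153,1,154
102m,A,P02185,1,154,0,153,1,154
1abc,B,Q00000,1,100,1,100,1,100
"""

if __name__ == "__main__":
    for pdb_id, chain in pdb_chains_for_accession(SAMPLE, "P02185"):
        print(pdb_id, chain)
```

To run it on the real data, read the downloaded pdb_chain_uniprot.csv into a string and pass it in place of SAMPLE. Note that this selects by accession only, so it returns every chain mapped to that UniProt entry, mutants included, but not homologues filed under other accessions.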