Hi Vivek,
There are two distinct steps involved: (1) fitting your trajectory to a
reference structure, which corresponds to choosing a conformation space; (2)
applying the PCA method, which corresponds to finding in that space a new
basis set whose ordered axes sequentially maximize dispersion (hopefully
capturing the main features of the distribution with only a few of the new
coordinates). The two steps just happen to be done by the same program.
The structure chosen for fitting is related to step 1, while the average
structure used to compute the covariance matrix is related to step 2 -- as
already pointed out by Tsjerk, the two structures are generally not the same.
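To make the distinction between the two steps concrete, here is a minimal
NumPy sketch. It is not what g_covar does internally; the function names
(kabsch_fit, pca_after_fit) and the array layout (frames x atoms x 3) are my
own assumptions. Step 1 superimposes every frame on the fitting reference;
step 2 builds the covariance matrix from deviations around the average of the
fitted frames and diagonalizes it.

```python
import numpy as np

def kabsch_fit(frame, ref):
    """Step 1: least-squares superposition of one frame (N x 3) on ref (N x 3)."""
    frame_c = frame - frame.mean(axis=0)   # remove global translation
    ref_c = ref - ref.mean(axis=0)
    # Kabsch algorithm: the optimal rotation comes from the SVD of frame^T ref
    u, _, vt = np.linalg.svd(frame_c.T @ ref_c)
    d = np.sign(np.linalg.det(u @ vt))     # avoid an improper rotation (mirror)
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return frame_c @ rot                   # fitted frame, centered at the origin

def pca_after_fit(traj, ref):
    """Step 2: PCA of the fitted frames around THEIR OWN average structure."""
    fitted = np.array([kabsch_fit(f, ref) for f in traj])
    x = fitted.reshape(len(traj), -1)      # one row of 3N coordinates per frame
    average = x.mean(axis=0)               # average structure, not the fit reference
    dev = x - average
    cov = dev.T @ dev / len(traj)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]        # largest dispersion first
    return evals[order], evecs[:, order], average.reshape(-1, 3)
```

Note that ref only enters through the fit, while the deviations are taken from
the average of the fitted frames, so the two structures play different roles.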
The aim of the fit is to get rid of the global translation and rotation of
your protein in the simulation box, trying to place all the sampled
structures in a single 3D space that reflects "only" the conformational
differences. But this is necessarily approximate, because the
superimposition of any pair of structures after the global fit will
always be worse than what you would get by fitting the two to each other
directly. Thus, you want the final dispersion around the reference to be as
small as possible. So, of the two average structures that you tried, you should
choose the one computed from the last 30 ns (it's not surprising that it
gives a smaller dispersion, because it refers to the segment you are
analyzing). Still, using an average structure as a reference is a somewhat
illusory solution, because that average must itself be obtained after
fitting the trajectory to some reference... In a study of a small flexible
peptide (where the choice of reference may have drastic effects), we found
that a good reference seems to be the "central structure" of your sample,
defined as the one that, when taken as a reference, leads to the lowest
overall dispersion (http://dx.doi.org/10.1021/jp902991u). The article
discusses the issues pointed out above, so you may want to take a look at it.
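As a rough illustration of that definition, the sketch below tries each frame
in turn as the reference, fits all frames to it, and keeps the frame that
gives the lowest mean squared deviation. This is only my reading of the
definition given above, not the procedure from the article; the helper names
are hypothetical, and the cost grows with the square of the number of frames,
so in practice you would apply it to a subsample.

```python
import numpy as np

def fit_to(frame, ref):
    """Least-squares (Kabsch) fit of frame on ref; returns both centered."""
    frame_c = frame - frame.mean(axis=0)
    ref_c = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(frame_c.T @ ref_c)
    d = np.sign(np.linalg.det(u @ vt))     # keep a proper rotation
    return frame_c @ (u @ np.diag([1.0, 1.0, d]) @ vt), ref_c

def dispersion_around(traj, ref):
    """Mean squared deviation of all frames from ref, after fitting to ref."""
    total = 0.0
    for frame in traj:
        fitted, ref_c = fit_to(frame, ref)
        total += np.mean(np.sum((fitted - ref_c) ** 2, axis=1))
    return total / len(traj)

def central_structure(traj):
    """Index of the frame that gives the lowest dispersion when used as reference."""
    disp = np.array([dispersion_around(traj, cand) for cand in traj])
    return int(np.argmin(disp)), disp
```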
You can also avoid the need for a reference by choosing a different
conformation space for PCA, a popular alternative being the phi and psi
dihedrals (look in the manual). Note that this dihedral space is a bit
different from the more usual one discussed above, each reflecting a
different kind of conformational proximity (this is also discussed in the
article). It's up to you to decide which one better suits your problem.
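For illustration, one common way to do PCA in dihedral space is to replace
each angle by its cosine and sine, so that the periodicity at +/-180 degrees
does not create artificial variance. The sketch below assumes you already
have the phi/psi angles in radians as a NumPy array (frames x dihedrals);
dihedral_pca is a hypothetical name, and this is not necessarily the exact
recipe described in the manual.

```python
import numpy as np

def dihedral_pca(angles_rad):
    """PCA on (cos, sin) of dihedral angles; angles_rad is (n_frames, n_dihedrals)."""
    # Columns: cos of every dihedral first, then sin of every dihedral
    x = np.concatenate([np.cos(angles_rad), np.sin(angles_rad)], axis=1)
    dev = x - x.mean(axis=0)               # no structural fit is needed here
    cov = dev.T @ dev / len(x)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]        # largest dispersion first
    evals, evecs = evals[order], evecs[:, order]
    projections = dev @ evecs              # each frame projected on the PCs
    return evals, evecs, projections
```

An angle that merely jitters around 180 degrees has a large variance as a raw
number (it jumps between about +180 and -180) but a small one in the
(cos, sin) representation, which is the point of the transformation.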
Hope this helps.
Cheers,
Antonio
On Sat, 9 Feb 2013, Tsjerk Wassenaar wrote:
Hi,
The commands would certainly help, including the commands for getting the
reference structure. Do note that the reference is the reference for
fitting, which is 'external', i.e. provided by the user. This is not the
same as the structure used to calculate the deviations, which is the
average structure of the frames selected.
Cheers,
Tsjerk
On Sat, Feb 9, 2013 at 7:06 PM, bipin singh <bipinel...@gmail.com> wrote:
Hi vivek,
I have a few questions related to your query:
During the covariance matrix calculation, g_covar by default takes the average
structure of the trajectory as the reference structure, so why are you
giving it the average structure of your trajectory (0-100ns) manually?
Moreover, without seeing the commands you used, it would be
difficult for anyone to tell why you are getting these surprising results.
On Thu, Feb 7, 2013 at 1:26 PM, vivek modi <modi.vivek2...@gmail.com>
wrote:
Hello,
I have troubled you with a similar question before, but I guess I need
some more clarification. My question is about the reference structure in
PCA analysis.
I have a 100ns long protein simulation which I want to analyze using PCA. The
RMSD shows fluctuations up to the initial 25-30ns and then becomes very stable.
I have performed PCA on the last 30ns window of the simulation, where I
assume the simulation has converged (I also did it on other time windows).
The question is this:
I did the analysis on the last 30ns window in two ways by taking two
different reference structures.
a. I take the average structure of the trajectory (0-100ns) as
the reference and then do the fitting and calculate the covariance matrix for
the last 30ns. This is done because I suspect that the average structure over
the full trajectory will reflect all the changes occurring in the protein. It
also gives me low cosines (<0.1). The PCs show movement occurring in
certain regions of the protein.
b. I take the average structure from the same window (last 30ns), then do
the fitting and calculate the covariance matrix for the same window. This is
done on the assumption that the reference structure must reflect the
equilibrated/stable part of the trajectory, unlike the previous case.
Surprisingly, it gives me high cosines (>0.5). Unlike the previous case,
this method shows very small movement in the protein (very low RMSF).
The two methods give me different RMSF for the PCs: although they are
done on the same part of the trajectory, the reference structure is
influencing the output.
Which of the two protocols is appropriate? And how can we explain the high
cosines in the second case, where the reference structure is the average of
the same time window (so there should not be large deviations), while I get
low cosines in the first case, where deviations are calculated from the full
trajectory average (large deviations)?
Any help is appreciated.
Thanks,
-Vivek Modi
Graduate Student
IITK.
--
gmx-users mailing list gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
--
*-----------------------
Thanks and Regards,
Bipin Singh*
--
Tsjerk A. Wassenaar, Ph.D.
post-doctoral researcher
Biocomputing Group
Department of Biological Sciences
2500 University Drive NW
Calgary, AB T2N 1N4
Canada
--
Antonio M. Baptista
Instituto de Tecnologia Quimica e Biologica, Universidade Nova de Lisboa
Av. da Republica - EAN, 2780-157 Oeiras, Portugal
phone: +351-214469619 email: bapti...@itqb.unl.pt
fax: +351-214411277 WWW: http://www.itqb.unl.pt/~baptista
--------------------------------------------------------------------------