In the statistics literature these are more often called permutation tests. Searching for that term should turn up some results. If not, I have some references, but they are at work and I am not; I could probably get them for you on Monday if you have not found anything before then.
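To make the connection concrete, here is a minimal sketch of the shuffling procedure described in the quoted message, written as a permutation test. It uses plain Python only so that it has no dependencies. It assumes a single predictor fitted by ordinary least squares, and the function names (r_squared, y_randomization) and the toy data are invented for illustration. It does not compute q2, and it is not meant as anyone's reference implementation.

```python
import random


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate case: no variation to explain
    return (sxy * sxy) / (sxx * syy)


def y_randomization(x, y, n_perm=100, seed=0):
    """Shuffle y n_perm times, refit, and compare the shuffled R^2 values
    with the R^2 of the original (unshuffled) data."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)  # copy so the caller's y is left untouched
    null_r2 = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        null_r2.append(r_squared(x, shuffled))
    # Permutation p-value with the usual +1 correction, so it is never zero.
    exceed = sum(1 for r2 in null_r2 if r2 >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, null_r2, p_value


# Toy data (hypothetical): a linear signal plus Gaussian noise.
data_rng = random.Random(42)
x = [float(i) for i in range(30)]
y = [2.0 * xi + data_rng.gauss(0.0, 5.0) for xi in x]

observed, null_r2, p_value = y_randomization(x, y, n_perm=100, seed=1)
print("observed R2:", round(observed, 3))
print("largest shuffled R2:", round(max(null_r2), 3))
print("permutation p-value:", round(p_value, 4))
```

If the observed R2 sits far out in the tail of the shuffled R2 values, the fit is unlikely to be a chance correlation; if many shuffled fits do about as well, it may well be.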
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Damjan Krstajic
> Sent: Friday, March 05, 2010 5:39 PM
> To: r-help@r-project.org
> Subject: [R] scientific (statistical) foundation for Y-RANDOMIZATION in regression analysis
>
> Dear all,
>
> I am a statistician doing research in QSAR, building regression models
> where the dependent variable is a numerical expression of some chemical
> activity and the input variables are chemical descriptors, e.g. molecular
> weight, number of carbon atoms, etc.
>
> I am building regression models and I am confronted with a widely used
> technique called Y-RANDOMIZATION, for which I have difficulty finding
> references in the general statistical literature on regression analysis.
> I would be grateful if someone could point me to papers/literature in
> statistical regression analysis which give a scientific (statistical)
> foundation for using Y-RANDOMIZATION.
>
> Y-RANDOMIZATION is a widely used technique in the QSAR community to ensure
> the robustness of a QSPR (regression) model. It is used after the
> "best" regression model is selected, to make sure that there are no
> chance correlations. Here is a short description. The dependent
> variable vector (Y-vector) is randomly shuffled and a new QSPR
> (regression) model is fitted using the original independent variable
> matrix. By repeating this a number of times, say 100 times, one will
> get one hundred R2 and q2 (leave-one-out cross-validation R2) values based
> on one hundred shuffled Y vectors. It is expected that the resulting
> regression models should generally have low R2 and low q2 values.
> However, if the majority of the hundred regression models obtained in the
> Y-randomization have relatively high R2 and high q2, then it implies that
> an acceptable regression model cannot be obtained for the given data set
> by the current modelling method.
>
> I cannot find any references to Y-randomization or Y-scrambling
> anywhere in the literature outside chemometrics/QSAR. Any links or
> references would be much appreciated.
>
> Thanks in advance.
>
> DK
> ----------------------------------------------
> Damjan Krstajic
> Director
> Research Centre for Cheminformatics
> Belgrade, Serbia
> ----------------------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.