Re: [R] fda modeling

Spencer Graves Mon, 28 May 2012 10:16:55 -0700

Hi, Troels:

I'm still trying to understand the structure of your data. Please check the discussion below. If what I suggest is correct, it should make the analysis much more routine and therefore easier, requiring less time.



On 5/21/2012 1:33 PM, Troels Ring wrote:

Dear friends - We have 25 rats, 14 of these subjected to partial removal of kidney tissue, 11 to sham operation, and then followed for 6 weeks. So far we have data on 26 urine metabolites measured by NMR 7 times during the observation.

So you collected urine samples at 7 different times on each rat throughout the experiment, separated out 26 different metabolites and measured each of them in each of the 7 samples using Nuclear Magnetic Resonance (NMR)? What were the ages of the rats at the time of the operation and at the times that each of the 7 urine samples was collected? In particular, were the 7 urine samples equally spaced? If yes, that could simplify the analysis. The greater the time differences between samples and between rats, the more difficult the analysis potentially.

What were the ages of the 25 rats? Were they all from the same litter? If no, how were they related? The worst possible case is that you have 14 from one litter and 11 from another. If that's the case, then any difference you see between the two groups could be a litter effect. If they are one rat from each of 25 litters, that would simplify the statistical analyses. Scientifically, the best might be to have 4 or 5 rats from each of 6 litters, assigned with at least 2 rats to experiment and 2 to control from each litter. You probably don't have that, but the litter effect is likely to be important and that needs to be part of the analysis, I think.

I have smoothed the measurements by b.splines in fda including a roughness penalty, and inspecting the mean curves for nephrectomized and sham animals indicate differences for several of the metabolites. Now the real idea is to use the NMR measurements to understand what goes on in the kidneys since we know the partial removal of kidney tissue will result in progressive damage in the kidneys - the nature of that is what we want to understand. We have a blood sample from the rats just prior to sacrifice, and the creatinine concentration there is a good proxy for "renal function".

So you have one measure of creatinine for each rat measured just prior to sacrifice?

So the course of concentrations of the metabolites are thought to be valuable in understanding the physiology. Some of these are thought to be correlated. We have two groups where sham animals have better renal function than partially nephrectomized, but there is variation in both groups which is also interesting - some animals progress more rapidly after "the same operation" than others - we would like to know why. The data are available (eventually - the resulting blood tests still are missing) if anyone would like to have a look but the main issue is if it is at all feasible to make fda work on such a problem.

I suggest you forget about fda at least initially and start with simpler, more traditional tools. Later, you may or may not want to return to fda. I suggest you proceed as follows:

I. DATA CLEANING: Make normal probability plots of everything: I'd start with making one normal probability plot for each of the 26 metabolites. Normally distributed data with approximately the same mean and standard deviation will look approximately like a straight line. The scientist's dream with this is the image of two lines with a gap in the middle, with the two lines corresponding exactly to the two groups (nephrectomized vs. controls). It's more likely that you will see mostly one distribution with a few observations away from a moderately straight line in the middle. If you see this, you should check the records and samples for the deviant observations to see if you can find, e.g., a data entry error or a problem with mishandling a sample. If you can't fix an observation that way, you should replace the number with NA (not available = missing). Another possibility is you see several little clusters corresponding to the litters. Or you might see curvature to the line; with curvature, if all the numbers are positive, you should try normal plots of the logarithms. If that helps straighten out the lines, you should analyze the logarithms, not the raw numbers. I usually do this with something like qqnorm(x, datax=TRUE). The use of "datax" means that with one or more outliers, the slope of the center portion will be closer to 45 degrees and therefore more easily processed with the naked eye.
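
As a rough illustration of the data-cleaning step above, here is a minimal sketch on simulated numbers (not the real measurements). The vector x, the planted outlier and the 4 * MAD flagging rule are all assumptions made for this example; the rule only shortlists points to check against the records and does not replace looking at the plots.

```r
# Simulated stand-in for one metabolite: 25 rats x 7 samples (NOT the real data).
set.seed(1)
x <- rlnorm(25 * 7, meanlog = 2, sdlog = 0.5)  # positive and right-skewed
x[10] <- 500                                   # planted outlier, e.g. a data entry error

# Normal probability plots of the raw numbers and of their logarithms.
# datax = TRUE puts the data on the horizontal axis.
op <- par(mfrow = c(1, 2))
qqnorm(x, datax = TRUE, main = "raw")
qqnorm(log(x), datax = TRUE, main = "log")
par(op)

# Hypothetical screening rule (an assumption, not part of the advice above):
# shortlist points far from the median on the log scale.
lx <- log(x)
suspect <- which(abs(lx - median(lx)) > 4 * mad(lx))

# Only after checking the records and failing to fix a value: set it to NA.
x_clean <- x
x_clean[suspect] <- NA
```

In practice one would loop this over the 26 metabolites and compare the raw and log panels side by side before deciding which scale to analyze.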

II. UNIVARIATE ANALYSES: After data cleaning, I'd then use something like lme{nlme} to analyze each response variable (metabolite or creatinine) separately. I recommend lme, because it is exactly what is needed for this kind of thing AND there is a great book available to describe how to do it: Pinheiro and Bates (2000) Mixed-Effects Models in S and S-PLUS (Springer). This book has companion script files in the nlme package (similar to those with fda), which are quite valuable for understanding the book, because there were some changes in nlme, so in a very few cases, the code in the book doesn't work in R, but the code in the companion script file does. There is better software available today and there may be better books, but for what you have, I would probably not mess with anything else. The techniques described early in this book should help you analyze between treatment and between litter effects for all the different variables AND the impact of one or more variables on others.
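
A minimal sketch of what one such univariate lme fit might look like, again on simulated data. The column names (rat, week, group, log_met), the weekly sampling times 0 to 6 and the effect sizes are assumptions for illustration only; the real sampling schedule and litter structure are not known here.

```r
library(nlme)

set.seed(2)
# Simulated long-format data (NOT the real data): one row per rat per sample.
d <- expand.grid(rat = factor(1:25), week = 0:6)    # 25 x 7 = 175 rows
grp <- rep(c("nephrectomy", "sham"), c(14, 11))     # 14 operated, 11 sham
d$group <- factor(grp[as.integer(d$rat)], levels = c("sham", "nephrectomy"))

rat_effect <- rnorm(25, sd = 0.3)                   # rat-to-rat variation
d$log_met <- 2 + 0.05 * d$week +
  0.10 * d$week * (d$group == "nephrectomy") +
  rat_effect[as.integer(d$rat)] + rnorm(nrow(d), sd = 0.2)

# Random intercept per rat; the week:group term asks whether the time trend
# differs between operated and sham animals.
fit <- lme(log_met ~ week * group, random = ~ 1 | rat, data = d)
summary(fit)$tTable

# If a litter column existed, rats nested in litters would be written as
#   random = ~ 1 | litter/rat
```

The same call would be repeated for each metabolite; creatinine, with a single value per rat, would need a simpler model without the repeated-measures structure.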



            III.  MULTIVARIATE ANALYSES:

(a) Analyzing the variations over time of each metabolite for each rat in each litter and treatment group can be challenging. In essence, you have 26 time series of length 7 for each rat. This is very much the problem that pushed Doug Bates into studying mixed-effects models: Earlier, he had studied nonlinear modeling, as documented in Bates and Watts (1988) Nonlinear Regression Analysis and Its Applications (Wiley). Many of his datasets were metabolites collected over time like the data you have. The second half of Pinheiro and Bates describes how to model mixed effects within nonlinear models like you have. If you can NOT get simple linear or nonlinear models for each metabolite over time and the fda models provide something useful, you might look at the fdaMixed package available from CRAN. I haven't used it, but the name makes it sound like it might help you.

(b) You probably will also want to do either principal components or factor analysis of the 26 different metabolites: The first few principal components or factors will likely represent the major modes of behavior among the metabolites. This could reduce the analysis from 26 different metabolites to 2 or 5 different primary modes of variation in the biochemistry -- possibly clustering the metabolites to simplify the analysis and strengthen the interpretation. Then take the principal components or factors back to nlme to complete the analysis.
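
A sketch of the principal components step using prcomp from base R, on a simulated matrix (not the real data). Treating each row as one rat (for example a per-rat summary of each log metabolite) and building the matrix from two hidden factors are assumptions made purely so the example has some structure to find.

```r
set.seed(3)
# Simulated matrix (NOT the real data): 25 rats x 26 metabolites,
# generated from 2 hidden factors plus noise.
n_rat <- 25
n_met <- 26
hidden   <- matrix(rnorm(n_rat * 2), n_rat, 2)
loadings <- matrix(rnorm(2 * n_met), 2, n_met)
X <- hidden %*% loadings + matrix(rnorm(n_rat * n_met, sd = 0.3), n_rat, n_met)
colnames(X) <- paste0("met", seq_len(n_met))   # hypothetical metabolite names

# Center and scale so that no single metabolite dominates through its units.
pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of variance carried by the first few components.
summary(pc)$importance[, 1:5]

# Scores on the first two components: one row per rat. These could serve as
# new response variables in the mixed-effects analysis.
pc_scores <- pc$x[, 1:2]
```

With the real data, the number of components worth keeping would be judged from the variance table rather than fixed at two.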

I apologize for encouraging you too much to study fda techniques. The above describes a standard analysis protocol that has been used with great success by many people. Many of my data analysis failures involved jumping straight to a multivariate analysis before doing the simple things first ;-)



       Hope this helps.
      Spencer Graves

Best wishes
Troels Ring,
Nephrology
Aalborg,  Denmark


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
