Hi, Troels:

I'm still trying to understand the structure of your data. Please check the discussion below. If what I suggest is correct, it should make the analysis much more routine and therefore easier requiring less time to analyze.


On 5/21/2012 1:33 PM, Troels Ring wrote:
Dear friends - We have 25 rats, 14 of these subjected to partial removal of kidney tissue, 11 to sham operation, and then followed for 6 weeks. So far we have data on 26 urine metabolites measured by NMR 7 times during the observation.


So you collected urine samples at 7 different times on each rat throughout the experiment, separated out 26 different metabolites and measured each of those 7 using Nuclear Magnetic Resonance (NMR)? What were the ages of the rats at the time of the operation and at the times that each of the 7 urine samples were collected? In particular, were the 7 urine samples equally spaced? If yes, that could simplify the analysis. The greater the time differences between samples and between rats, the more difficult the analysis potentially.


What were the ages of the 25 rats? Were they all from the same litter? If no, how were they related? The worst possible case is that you have 14 from one litter and 11 from another. If that's the case, then any difference you see between the two groups could be a litter effect. If they are one rat from each of 25 litters, that would simplify the statistical analyses. Scientifically, the best might be to have at 4 or 5 rats from each of 6 litters, assigned with at least 2 rats to experiment and 2 to control from each litter. You probably don't have that, but the litter effect is likely to be important and that needs to be part of the analysis, I think.


I have smoothed the measurements by b.splines in fda including a roughness penalty, and inspecting the mean curves for nephrectomized and sham animals indicate differences for several of the metabolites. Now the real idea is to use the NMR measurements to understand what goes on in the kidneys since we know the partial removal of kidney tissue will result in progressive damage in the kidneys - the nature of that is what we want to understand. We have a blood sample from the rats just prior to sacrifice, and the creatinine concentration there is a good proxy for "renal function".


So you have one measure of creatinine for each rat measured just prior to sacrifice?



So the course of concentrations of the metabolites are thought to be valuable in understanding the physiology. Some of these are thought to be correlated. We have two groups where sham animals have better renal function than partially nephrectomized, but there is variation in both groups which is also interesting - some animals progress more rapidly after "the same operation" than others - we would like to know why. The data are available (eventually - the resulting blood tests still are missing) if anyone would like to have a look but the main issue is if it is at all feasible to make fda work on such a problem.


I suggest you forget about fda at least initially and start with simpler, more traditional tools. Later, you may or may not want to return to fda. I suggest you proceed as follows:


I. DATA CLEANING: Make normal probability plots of everything: I'd start with making one normal probability plot for each of the 26 metabolites. Normally distributed data with approximately the same mean and standard deviation will look approximately like a straight line. The scientist's dream with this is the image of two lines with a gap in the middle, with the two lines corresponding exactly to the two groups (nephrectomized vs. controls). It's more likely that you will see mostly one distribution with a few observations away from a moderately straight line in the middle. If you see this, you should check the records and samples for the deviant observations to see if you can find, e.g., a data entry error or a problem with mishandling a sample. If you can't fix any observation that way, you should replace the numbers with NA (not available = missing). Another possibility is you see several little clusters corresponding to the litters. Or you might see curvature to the line; with curvature, if all the numbers are positive, you should try normal plots of the logarithms. If that helps straighten out the lines, you should analyze the logarithms not the raw numbers. I usually do this with something like qqnorm(x, datax=TRUE). The use of "datax" means that with one or more outliers, the slope of the center portion will be closer to 45 degrees and therefore more easily processed with the naked eye.


II. UNIVARIATE ANALYSES: After data cleaning, I'd then use something like lme{nlme} to analyze each response variable (metabolite or creatinine) separately. I recommend lme, because it is exactly what is needed for this kind of thing AND there is a great book available to describe how to do it: Pinheiro and Bates (2000) Mixed-Effects Models with S and S-PLUS (Springer). This book has companion script files in the nmle package (similar to those with fda), which are quite valuable for understanding the book, because there were some changes in nlme, so in a very few cases, the code in the book doesn't work in R, but the code in the companion script file does. There is better software available today and there may be better books, but for what you have, I would probably not mess with anything else. The techniques described early in this book should help you analyze between treatment and between litter effects for all the different variables AND the impact of one or more variables on others.


            III.  MULTIVARIATE ANALYSES:


(a) Analyzing the variations over time of each metabolite for each rat in each litter and treatment group can be challenging. In essence, you have 26 time series of length 7 for each rat. This is very much the problem that pushed Doug Bates into studying mixed-effects models: Earlier, he had studied nonlinear modeling, as documented in Bates and Watts (1988) Nonlinear Regression & Its Applications (Wiley). Many of his datasets were metabolites collected over time like the data you have. The second half of Pinheiro and Bates describes how to model mixed effects within nonlinear models like you have. If you can NOT get simple linear or nonlinear models for each metabolite over time and the fda models provide something useful, you might look at the fdaMixed package available from CRAN. I haven't used it, but the name makes it sound like it might help you.


(b) You probably will also want to do either principle components or factor analysis of the 26 different metabolites: The first few principle components or factors will likely represent the major modes of behavior among the metabolites. This could reduce the analysis from 26 different matabolites to 2 or 5 different primary modes of variation in the biochemistry -- possibly clustering the metabolites to simplify the analysis and strengthen the interpretation. Then take the principle components or factors back to nlme to complete the analysis.


I apologize for encouraging you too much to study fda techniques. The above describes a standard analysis protocol that has been used with great success by many people. Many of my data analysis failures involved jumping straight to a multivariate analysis before doing the simple things first ;-)


       Hope this helps.
      Spencer Graves


Best wishes
Troels Ring,
Nephrology
Aalborg,  Denmark


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to