David Winsemius wrote:
On Apr 25, 2009, at 9:25 AM, Frank E Harrell Jr wrote:
Emmanuel Charpentier wrote:
On Friday, April 24, 2009 at 14:11 -0700, ToddPW wrote:
I'm trying to use either mice or norm to perform multiple imputation to fill in some missing values in my data. The data have some missing values because of a chemical detection limit (so they are left-censored). I'd like to use MI because I have several variables that are highly correlated. SAS's PROC MI has an option that restricts the returned imputed values to a specified range. Is there a way to limit the values in mice?
You can do that by writing your own imputation function and assigning it to the imputation of that particular variable (see the argument "imputationMethod" and the details in the man page for "mice").
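A minimal sketch of such a function, assuming a log-scale variable whose missing entries are all non-detects below a known limit. The method name "normbelow", the function mice.impute.normbelow and its lod argument are hypothetical; the sketch only follows mice's naming convention, where a method "xyz" is looked up as mice.impute.xyz(y, ry, x, ...). It fits the linear model on the detected values alone (itself biased by the truncation) and does not redraw the model parameters, so read it as an illustration of bounding the draws, not as a proper imputation method.

```r
# Hypothetical imputation method "normbelow". mice looks up a method named
# "xyz" as mice.impute.xyz(y, ry, x, ...): y is the incomplete column, ry flags
# its observed entries, x holds the predictors, and the return value fills the
# missing cells (newer mice versions also pass wy, the cells to impute).
mice.impute.normbelow <- function(y, ry, x, wy = NULL, lod, ...) {
  if (is.null(wy)) wy <- !ry
  X   <- cbind(1, as.matrix(x))                     # add an intercept column
  fit <- lm.fit(X[ry, , drop = FALSE], y[ry])       # fitted on detected values only
  s   <- sqrt(sum(fit$residuals^2) / fit$df.residual)
  mu  <- drop(X[wy, , drop = FALSE] %*% fit$coefficients)
  # Inverse-CDF draw from N(mu, s^2) truncated to (-Inf, lod]
  u <- runif(sum(wy), 0, pnorm(lod, mu, s))
  qnorm(u, mu, s)
}

# Direct call on simulated log-scale data with a hypothetical limit of 0.7
set.seed(1)
n   <- 200
z   <- rnorm(n)
y   <- 1 + 0.8 * z + rnorm(n, sd = 0.5)
lod <- 0.7
ry  <- y > lod
y[!ry] <- NA
draws <- mice.impute.normbelow(y, ry, data.frame(z = z), lod = lod)
summary(draws)

# Registration with mice (not run; assumes mice >= 3 and that extra named
# arguments given to mice() are forwarded to the method function):
# meth <- make.method(dat); meth["logconc"] <- "normbelow"
# imp  <- mice(dat, method = meth, lod = 0.7)
```

In current mice releases the argument is called method rather than imputationMethod; check your version's help page for how per-variable arguments such as lod are passed.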
If not, is there another MI tool in R that will allow me to specify a range of acceptable values for my imputed data?
In the function amelia (package "Amelia"), you can specify a "bounds" argument, which allows for such a limitation. However, be aware that this might violate the basic assumption of Amelia, which is that your data are multivariate normal. Maybe a change of variable is in order (e.g. log(concentration) usually has much better statistical properties than concentration).
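A sketch combining both suggestions on simulated data (conc, z1, z2 and lod are hypothetical): the concentration is log-transformed by hand, and each bounds row has the form c(column number, lower, upper). Since every missing value here is assumed to be a non-detect, the upper bound is log(lod); the lower bound is an arbitrary loose value.

```r
library(Amelia)

set.seed(5)
n    <- 200
z1   <- rnorm(n)
z2   <- 0.6 * z1 + rnorm(n, sd = 0.8)
conc <- exp(0.5 + 0.8 * z1 + 0.4 * z2 + rnorm(n, sd = 0.5))  # simulated concentrations
lod  <- 1                                    # hypothetical detection limit

# Work on the log scale; non-detects become NA
d <- data.frame(logconc = log(conc), z1 = z1, z2 = z2)
d$logconc[conc < lod] <- NA

# One row per bounded column: c(column number, lower bound, upper bound)
bds <- matrix(c(which(names(d) == "logconc"), log(lod) - 10, log(lod)), nrow = 1)
a.out <- amelia(d, m = 5, bounds = bds, p2s = 0)

# Imputed non-detects in the first completed data set
head(a.out$imputations[[1]]$logconc[is.na(d$logconc)])
```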
Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument (TRUE by default), which limits imputations to the range of observed values.
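A sketch on simulated data (conc, z1, z2 and lod are hypothetical). The Hmisc help page describes curtail as applying when type = "regression", so that type is used here. Note what the constraint implies for non-detects: no imputation can fall below the smallest detected value, so every imputed non-detect lands above the limit, which is the wrong side for left-censored values.

```r
library(Hmisc)

set.seed(6)
n    <- 200
z1   <- rnorm(n)
z2   <- 0.6 * z1 + rnorm(n, sd = 0.8)
conc <- exp(0.5 + 0.8 * z1 + 0.4 * z2 + rnorm(n, sd = 0.5))
lod  <- 1                                    # hypothetical detection limit
d <- data.frame(conc = ifelse(conc < lod, NA, conc), z1 = z1, z2 = z2)

a <- aregImpute(~ conc + z1 + z2, data = d, n.impute = 5,
                type = "regression", curtail = TRUE, pr = FALSE)
imp <- a$imputed$conc                        # one row per NA, one column per imputation
range(imp)                                   # bounded below by the smallest detected value
min(d$conc, na.rm = TRUE)
```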
But if your left-censored variables are your dependent variables (not covariates), may I suggest analyzing these data as censored data, as allowed by Terry Therneau's "survival" package? Code your "missing" data as such a variable; to do this on the fly, use survreg(Surv(pmax(x, <yourlimit>, na.rm = TRUE), !is.na(x), type = "left") ~ <yourmodel>). (coxph accepts only right-censored and counting-process data, so a left-censored response goes through survreg.)
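A runnable sketch on simulated data (x, z and lod are hypothetical). pmax(), not min(), is the vectorized operation that keeps detected values and replaces each NA by the limit; the status !is.na(x) is 1 for detected values and 0 for left-censored ones. With dist = "gaussian" the survreg fit is a Tobit-type model.

```r
library(survival)

set.seed(2)
n   <- 200
z   <- rnorm(n)
logconc <- 0.5 + 1.0 * z + rnorm(n)          # simulated log concentrations
lod <- 0                                     # hypothetical detection limit (log scale)
x   <- ifelse(logconc > lod, logconc, NA)    # non-detects arrive as NA

# Non-detects are recorded at the limit and flagged as left-censored (status 0)
y   <- pmax(x, lod, na.rm = TRUE)
fit <- survreg(Surv(y, !is.na(x), type = "left") ~ z, dist = "gaussian")
summary(fit)
```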
Another possible idea is to split your (supposedly x) variable in two: observed (logical) and value (the observed value if observed, <detection limit> if not), and include both in your model. You will probably run into numerical difficulties due to the built-in total separation.
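A minimal sketch of the split with a linear outcome model, on simulated data (x_observed, x_value, yout, z and lod are hypothetical names). In this linear case the fit is well defined because x_value still varies among the detected values: its slope is estimated from the detected values only, while the x_observed coefficient absorbs the offset of the non-detect group.

```r
set.seed(7)
n    <- 500
z    <- rnorm(n)
logx <- rnorm(n)                              # covariate on the log scale
yout <- 2 + 0.5 * logx + 0.3 * z + rnorm(n)
lod  <- -0.5                                  # hypothetical detection limit
x    <- ifelse(logx > lod, logx, NA)          # NA marks a non-detect

x_observed <- !is.na(x)                       # logical part
x_value    <- ifelse(x_observed, x, lod)      # value part: the limit stands in for non-detects
fit <- lm(yout ~ x_observed + x_value + z)
coef(fit)
```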
HTH,
Emmanuel Charpentier
Thanks for the help,
Todd
Also see
@Article{zha09non,
  author  = {Zhang, Donghui and Fan, Chunpeng and Zhang, Juan and Zhang, {Cun-Hui}},
  title   = {Nonparametric methods for measurements below detection limit},
  journal = {Statistics in Medicine},
  year    = 2009,
  volume  = 28,
  pages   = {700--715},
  annote  = {lower limit of detection; left censoring; Tobit model; Gehan test;
             Peto-Peto test; log-rank test; Wilcoxon test; location shift model;
             superiority of nonparametric methods}
}
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
It appears they were dealing with outcomes possibly censored at a limit of detection; at least, that was the example they used for illustration. Is there a message that can be inferred about what to do with covariates with values below the limit of detection? And can someone explain to a non-statistician what the operational process was for the values below the limit of detection in the Wilcoxon approach they endorsed? They transformed the left-censored situation into a right-censored one and then they do ... what?
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
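One way to see what a generalized Wilcoxon test does with non-detects, assuming a single common detection limit (the data and names are hypothetical, and this is not a reproduction of Zhang et al.'s procedure). Gehan's version scores every cross-group pair +1 or -1 when the order is certain and 0 when both values are below the limit. If S counts "greater" minus "smaller" pairs and the Mann-Whitney W counts "greater" plus half the ties, then greater + smaller + ties = n1*n2 gives S = 2W - n1*n2. So with one limit, the Gehan statistic is the ordinary rank-sum statistic computed after setting all non-detects to one tied value below the limit. With several different limits the two no longer coincide.

```r
set.seed(3)
lod <- 1                                     # hypothetical common detection limit
a <- rlnorm(30, meanlog = 0.3)               # group A measurements
b <- rlnorm(25, meanlog = 0)                 # group B measurements
da <- a > lod                                # TRUE = detected
db <- b > lod

# Gehan pairwise score: +1 if an A value is certainly larger than a B value,
# -1 if certainly smaller, 0 if both are non-detects (order unknown).
gehan_S <- function(v1, d1, v2, d2) {
  both <- outer(d1, d2, "&")
  gt <- (both & outer(v1, v2, ">")) | outer(d1, !d2, "&")
  lt <- (both & outer(v1, v2, "<")) | outer(!d1, d2, "&")
  sum(gt) - sum(lt)
}
S <- gehan_S(a, da, b, db)

# Ordinary rank-sum test with every non-detect set to one common value below the limit
W <- unname(wilcox.test(ifelse(da, a, 0), ifelse(db, b, 0), exact = FALSE)$statistic)
c(S = S, two_W_minus_n1n2 = 2 * W - length(a) * length(b))
```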
Yes, it's easier to handle in the dependent variable. For independent variables below the limit of detection, we are left with model-based extrapolation for multiple imputation, with no way to check the imputation model's regression assumption. Predictive mean matching can't be used, since it can only donate observed values, and those all lie above the limit.
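To make "model-based extrapolation" concrete, a minimal sketch for a covariate below the limit, assuming a normal linear imputation model on the log scale (all names are hypothetical). The imputation model is a left-censored (Tobit) regression of the covariate on the outcome and the other covariate, fitted with survreg; non-detects are then drawn from the fitted normal truncated above at the limit. Nothing in the data can confirm that normal shape below the limit, which is the unverifiable assumption. The sketch also keeps the fitted parameters fixed across imputations, so it is an improper imputation; a proper one would also draw the coefficients and scale for each data set.

```r
library(survival)

set.seed(4)
n    <- 300
z    <- rnorm(n)
logx <- 0.5 * z + rnorm(n)                   # true covariate (log scale)
yout <- 1 + 0.7 * logx + 0.3 * z + rnorm(n)  # outcome
lod  <- -0.5                                 # hypothetical detection limit (log scale)
det  <- logx > lod
xrec <- ifelse(det, logx, lod)               # non-detects recorded at the limit

# Imputation model: Tobit regression of the covariate on the outcome and z
imp_fit <- survreg(Surv(xrec, det, type = "left") ~ yout + z, dist = "gaussian")
mu <- predict(imp_fit, type = "lp")[!det]
s  <- imp_fit$scale

m <- 5
slopes <- numeric(m)
for (k in seq_len(m)) {
  xi <- xrec
  # Draw from N(mu, s^2) truncated to (-Inf, lod] by inverting the CDF
  xi[!det] <- qnorm(runif(sum(!det), 0, pnorm(lod, mu, s)), mu, s)
  slopes[k] <- coef(lm(yout ~ xi + z))[["xi"]]
}
mean(slopes)  # pooled point estimate; Rubin's rules would also combine the variances
```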
Frank
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.