On 14-Oct-07 08:33:41, Thomas Fröjd wrote:
> Hi R users. I am new to the community and have got myself into
> a little problem.

It does not look as though it was yourself who got you into this
problem! You have been given the bathwater along with the baby.

> I have a dataset of birth weights recorded by nurses at a delivery
> clinic in an developing country.
>
> The weights are entered in KiloGrams with one decimal. However
> there is substantial heaping at each 500g when looking at the
> sample in a histogram. Do anyone of you know a easy way to adjust
> for this and if it exists an R package to implement the method?
>
> Best regards
> Thomas Fröjd

It is quite a common problem for data to be badly recorded in
this kind of way (as well as in other bad kinds of ways).

You can't "adjust" for it (in the sense of "compensate") directly,
since such a rounding does not tell you where the value was rounded
from. There may, however, be information in the covariates which
could be relevant to that question.

I'll comment on two extreme approaches and a possible intermediate
approach.

1) If you want to treat all data on the same footing, then you
can round every weight to the nearest 500gm. This has the
disadvantage of losing the information in the weights which have
been recorded more precisely. The potential difference of up to
250gm, in a typical birth weight of say 2-2.5kgm, could result
in a serious distortion. However, you could assess the effect of
this by performing your intended analysis using the data as you
have them, then repeating it with the fully rounded data, and
seeing how much difference it makes.

2) You could attempt to evaluate the extra uncertainty which
results from the rounding which has been done by the nurses.
One approach could be to fit a Normal distribution (say) to the
data as you have them. Say this estimates mu0 for the mean and
s0 for the standard deviation. You can then "un-round" the rounded
data at random, on the basis that, given that a weight is say
2.5 kgm, it might be anywhere from 2.25 to 2.75 according to
that distribution conditional on being in that range.

This is quite easily done in R: if wt=2.5, say,

  p0 <- pnorm((wt - 0.25 - mu0)/s0)
  p1 <- pnorm((wt + 0.25 - mu0)/s0)
  X <- runif(1,p0,p1)
  rwt <- mu0 + s0*qnorm(X)
  rwt <- round(rwt,1) ## see below

If you do this for every truly rounded 'wt', and perform your
intended analysis on the resulting "un-rounded" dataset (of course
after rounding the results to 100gm, to be compatible with the
0.1kgm general rounding), and then repeat this unrounding+analysis
a few times, you will have an estimate of the uncertainty, in your
final results, which has been introduced by the gross rounding.

However, you will have to make a decision about what proportion
of the data at each whole 500gm have really been rounded! Some
of these are likely to be measurements which would have been
quite appropriately rounded to the nearest 500gm -- e.g.
2.05kgm -> 2.0kgm. You may be able to estimate this proportion
from the heights of the "factory chimneys" in the histogram.
Then apply the above procedure to that fraction.

3) If you have covariates with your weight data, you may be able
to fit an appropriate model to your original data which would
enable you to estimate, for any given "rounded" weight, the
mu0 and s0 for that weight in terms of the values of the
covariates. Then proceed as in (2).

However, having done that, it may transpire that you should
re-estimate the model, which would imply re-estimating the
mu0 and s0 used for the "random unrounding", and then going
round the loop again. You're moving into Multiple Imputation
territory now, and again there are resources in R for doing it;
but it's deeper and more complex territory!

In both (2) and (3), the same check as in (1) should be carried
out: Has it made any difference that matters to the results,
compared with what you get from the original data?

Hoping this helps (at least a bit).
Ted.

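For illustration only, the un-rounding in (2) might be applied to a
whole vector of weights along the following lines. This is a rough
sketch under stated assumptions, not a tested recipe: the data are
simulated, the names 'unround', 'one_pass', 'on500' and 'frac' are
made up for the example, and the value frac = 0.8 (the fraction of
weights at a whole 500gm taken to have really been rounded) is simply
assumed rather than estimated from a histogram.

```r
## Illustrative sketch: simulated data and hypothetical names throughout.
set.seed(1)

## Simulated weights (kg) recorded to 0.1 kg, with about 40% of them
## then heaped to the nearest 0.5 kg (stand-in for the real data).
wt <- round(rnorm(500, mean = 3.0, sd = 0.5), 1)
heap <- runif(500) < 0.4
wt[heap] <- round(wt[heap] / 0.5) * 0.5

## Un-round weights at random within +/- 0.25 kg, drawing from the
## fitted Normal(mu0, s0) conditional on lying in that interval.
unround <- function(w, mu0, s0, half = 0.25) {
  p0 <- pnorm((w - half - mu0) / s0)
  p1 <- pnorm((w + half - mu0) / s0)
  x  <- runif(length(w), p0, p1)
  round(mu0 + s0 * qnorm(x), 1)   # back to the general 0.1 kg rounding
}

## Fit the Normal to the data as recorded.
mu0 <- mean(wt)
s0  <- sd(wt)

## Weights sitting on a whole multiple of 0.5 kg.
on500 <- abs(wt / 0.5 - round(wt / 0.5)) < 1e-8

frac <- 0.8   # ASSUMED fraction of those that were really rounded

## One random un-rounding of the dataset.
one_pass <- function() {
  idx  <- which(on500)
  pick <- idx[runif(length(idx)) < frac]
  new  <- wt
  new[pick] <- unround(wt[pick], mu0, s0)
  new
}

## Repeat a few times; here the "analysis" is just the mean and SD.
res <- replicate(20, {
  w <- one_pass()
  c(mean = mean(w), sd = sd(w))
})
apply(res, 1, range)   # spread of the results across repetitions
```

The spread of the results across repetitions gives a rough idea of
the uncertainty attributable to the gross rounding; in practice the
mean and SD would be replaced by whatever analysis is intended.
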
--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Oct-07                                       Time: 11:39:47
------------------------------ XFMail ------------------------------