In the spirit of Martin's comments, it is perhaps worthwhile to note a
pertinent quote from John Tukey (whom I actually knew):

"The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data."
<https://www.azquotes.com/quote/603406>

"Sunset Salvo" by John Tukey, The American Statistician, Volume 40, No. 1
(pp. 72-76), February 1986, www.jstor.org.
<https://www.azquotes.com/author/14847-John_Tukey>

Cheers,
Bert

On Mon, Jan 22, 2024 at 12:23 PM Bert Gunter <bgunter.4...@gmail.com> wrote:

> Ah.... LOD's, typically LLOD's ("lower limits of detection").
>
> Disclaimer: I am *NOT* in any sense an expert on such matters. What
> follows are just some comments based on my personal experience. Please
> filter accordingly. Also, while I kept it on list as Martin suggested it
> might be useful to do so, most folks probably can safely ignore the rant
> that follows as off topic and not of interest. So you've been warned!!
>
> The rant:
> My experience is: data that contain a "bunch" of values that are, e.g.,
> below a LLOD are frequently reported and/or analyzed by various ad hoc,
> and imho, uniformly bad methods, e.g.:
>
> 1) The censored values are recorded and analyzed as at the LLOD;
> 2) The censored values are recorded and analyzed at some arbitrary value
>    below the LLOD, like LLOD/2;
> 3) The censored values are "imputed" by ad hoc methods, e.g. uniform
>    random values between 0 and the LLOD for left censoring.
>
> To repeat, *IMO*, all of this is junk and will produce misleading
> statistical results. Whether they mislead enough to substantively affect
> the science or regulatory decisions depends on the specifics of the
> circumstances. I accept no general claim as to their innocuousness.
>
> Further:
>
> a) When you have a "lot" of censored values -- 50%? 75%?, 25%? -- face
> facts: you have (practically) no useful information from the values that
> you do have to infer what the distribution of values that you don't have
> looks like. All one can sensibly do is say that x% of the values are
> below a LOD and here's the distribution of what lies above.
> Presumably, if you have such data conditional on covariates with the
> obvious intent to determine the relationship to those covariates, you
> could analyze the percentages of LLOD's and known values separately.
> There are undoubtedly more sophisticated methods out there, so this is
> where you need to go to the literature to see what might suit; though I
> think it will still have to come down to looking at these separately
> (e.g. with extra parameters to account for unmeasurable values). Another
> way of saying this is: any analysis which treats all the data as arising
> from a single distribution will depend more on the assumptions you make
> than on the data. So good luck with that!
>
> b) If you have a "modest" amount of (known) censoring -- 5%?, 20%? 10%? --
> methods for the analysis of censored data should be useful. My
> understanding is that MI (multiple imputation) is regarded as a generally
> useful approach, and there are many R packages that can do various flavors
> of this. Again, you should consult the literature: there are very likely
> nontechnical reviews of this topic, too, as well as online discussions and
> tutorials.
>
> So if you are serious about dealing with this and have a lot of data with
> these issues, my advice would be to stop looking for ad hoc advice and dig
> into the literature: it's one of the many areas of "data science" where
> seemingly simple but pervasive questions require complex answers.
>
> And, again, heed my personal caveats.
>
> Thus endeth my rant.
>
> Cheers to all,
> Bert
>
> On Mon, Jan 22, 2024 at 9:29 AM Rich Shepard <rshep...@appl-ecosys.com>
> wrote:
>
>> On Mon, 22 Jan 2024, Martin Maechler wrote:
>>
>> > I think it is a good question, not really only about geo-chemistry, but
>> > about statistics in applied sciences (and engineering for that matter).
>> >
>> > John W Tukey (and several other of the grands of the time) had the log
>> > transform among the "First aid transformations":
>> >
>> >   If the data for a continuous variable must all be positive it is also
>> >   typically the case that the distribution is considerably skewed to the
>> >   right. In such a case, behave as a good human who sees another human in
>> >   health distress: apply First Aid -- do the things you learned to do
>> >   quickly without too much thought, because things must happen fast --
>> >   to hopefully save the other's life.
>>
>> Martin,
>>
>> Thanks very much. I will look further into this because toxic metals and
>> organic compounds in geochemical collections almost always have censored
>> lab results (below method detection limits) that range from about 15% to
>> 80% or more, and there almost always are very high extreme values.
>>
>> I'll learn to understand what benefits log transforms have over
>> compositional data analyses.
>>
>> Best regards,
>>
>> Rich
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
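
To make the contrast between the substitution approaches 1) to 3) in the
quoted message and a censored-data method concrete, here is a minimal
simulation sketch in Python (standard library only). Everything in it is an
assumption for illustration and does not come from the thread: the lognormal
model, the 20% censoring fraction, the sample size, and the function names
`censored_normal_mle` and `simulate` are hypothetical. The censored fit is a
plain EM maximum-likelihood estimate for a left-censored normal on the log
scale, swapped in as one simple example of a method for censored data. It is
not multiple imputation, and it leans entirely on the lognormal assumption,
which is the kind of dependence the message warns about when censoring is
heavy.

```python
import math
import random
import statistics

STD_NORMAL = statistics.NormalDist()  # standard normal: pdf, cdf, quantiles


def censored_normal_mle(observed, n_censored, limit, iterations=200):
    """EM estimate of (mu, sigma) for normal data left-censored at `limit`.

    `observed` holds the values at or above the limit; `n_censored` is the
    number of values known only to lie below it.
    """
    n_obs = len(observed)
    n = n_obs + n_censored
    sum_x = sum(observed)
    sum_x2 = sum(x * x for x in observed)
    # Start from the detected values only (a deliberately naive start).
    mu = sum_x / n_obs
    sigma = statistics.pstdev(observed)
    for _ in range(iterations):
        # E-step: first two moments of a normal truncated to (-inf, limit).
        alpha = (limit - mu) / sigma
        ratio = STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)
        m1 = mu - sigma * ratio
        m2 = sigma ** 2 * (1.0 - alpha * ratio - ratio ** 2) + m1 ** 2
        # M-step: update mu and sigma using the expected sufficient statistics.
        mu_new = (sum_x + n_censored * m1) / n
        ss_obs = sum_x2 - 2.0 * mu_new * sum_x + n_obs * mu_new ** 2
        ss_cens = n_censored * (m2 - 2.0 * mu_new * m1 + mu_new ** 2)
        mu, sigma = mu_new, math.sqrt((ss_obs + ss_cens) / n)
    return mu, sigma


def simulate(n=5000, mu=0.0, sigma=1.0, censored_fraction=0.2, seed=1):
    """Return (mean, sd) of log concentrations under each approach."""
    rng = random.Random(seed)
    # Assumed setup: the LLOD sits at the chosen quantile of the true lognormal.
    log_lod = mu + sigma * STD_NORMAL.inv_cdf(censored_fraction)
    lod = math.exp(log_lod)
    conc = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
    all_logs = [math.log(x) for x in conc]
    detected = [math.log(x) for x in conc if x >= lod]
    n_cens = n - len(detected)

    def summary(fill):
        # Replace every censored value with fill(), then summarise the logs.
        logs = detected + [math.log(fill()) for _ in range(n_cens)]
        return statistics.fmean(logs), statistics.pstdev(logs)

    return {
        "uncensored": (statistics.fmean(all_logs), statistics.pstdev(all_logs)),
        "at_lod": summary(lambda: lod),                          # approach 1
        "half_lod": summary(lambda: lod / 2.0),                  # approach 2
        # approach 3: uniform draw in (0, lod], never exactly 0
        "uniform": summary(lambda: lod * (1.0 - rng.random())),
        "censored_mle": censored_normal_mle(detected, n_cens, log_lod),
    }


if __name__ == "__main__":
    for name, (m, s) in simulate().items():
        print(f"{name:>13}: mean(log) = {m:+.3f}, sd(log) = {s:.3f}")
```

Running `simulate()` returns the mean and standard deviation of the log
concentrations for each approach next to the full (normally unobservable)
data. Substituting the LLOD itself can only raise the mean and, as long as
the LLOD lies below that mean, shrinks the spread. How LLOD/2 and uniform
imputation fare depends on the true distribution and the censoring fraction,
which is why no general claim about their harmlessness holds.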