Still OT... but here is my own (I think previously mentioned here) rant on people thrashing about with log transformation and an all-too-common kludge to deal with zeros mixed among small numbers... https://gist.github.com/jdnewmil/99301a88de702ad2fcbaef33326b08b4
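A minimal sketch of why that kludge is fragile (an illustration only, not taken from the linked gist; the data values, the shift sizes, and the helper name `mean_log_shifted` are all made up for this example): when zeros are mixed among small numbers, the result of log(x + constant) is driven largely by the arbitrary constant.

```python
import math

# Hypothetical measurements: small positive values with a few exact zeros.
values = [0.0, 0.0, 0.02, 0.05, 0.1, 0.4, 1.5]


def mean_log_shifted(xs, shift):
    """Mean of log(x + shift), i.e. the 'add a small constant' kludge."""
    return sum(math.log(x + shift) for x in xs) / len(xs)


for shift in (1e-6, 1e-3, 1e-1):
    m = mean_log_shifted(values, shift)
    # Back-transform so the effect is visible on the original scale.
    print(f"shift={shift:g}  mean log={m:.3f}  back-transformed={math.exp(m) - shift:.4f}")
```

The zeros contribute log(shift) each, so the summary moves with the constant rather than with the data.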
OP, posting a link here to your question, wherever you end up asking it, will perhaps help shorten this thread.

On January 22, 2024 12:23:20 PM PST, Bert Gunter <bgunter.4...@gmail.com> wrote:
>Ah.... LOD's, typically LLOD's ("lower limits of detection").
>
>Disclaimer: I am *NOT* in any sense an expert on such matters. What follows
>are just some comments based on my personal experience. Please filter
>accordingly. Also, while I kept it on list as Martin suggested it might be
>useful to do so, most folks probably can safely ignore the rant that
>follows as off topic and not of interest. So you've been warned!!
>
>The rant:
>My experience is: data that contain a "bunch" of values that are, e.g.
>below a LLOD, are frequently reported and/or analyzed by various ad hoc,
>and imho, uniformly bad methods. e.g.:
>
>1) The censored values are recorded and analyzed as at the LLOD;
>2) The censored values are recorded and analyzed at some arbitrary value
>below the LLOD, like LLOD/2;
>3) The censored values are "imputed" by ad hoc methods, e.g. uniform
>random values between 0 and the LLOD for left censoring.
>
>To repeat, *IMO*, all of this is junk and will produce misleading
>statistical results. Whether they mislead enough to substantively affect
>the science or regulatory decisions depends on the specifics of the
>circumstances. I accept no general claim as to their innocuousness.
>
>Further:
>
>a) When you have a "lot" of censored values -- 50%? 75%?, 25%? -- face
>facts: you have (practically) no useful information from the values that
>you do have to infer what the distribution of values that you don't have
>looks like. All one can sensibly do is say that x% of the values are below
>a LOD and here's the distribution of what lies above. Presumably, if you
>have such data conditional on covariates with the obvious intent to
>determine the relationship to those covariates, you could analyze the
>percentages of LLOD's and known values separately. There are undoubtedly
>more sophisticated methods out there, so this is where you need to go to
>the literature to see what might suit; though I think it will still have to
>come down to looking at these separately (e.g. with extra parameters to
>account for unmeasurable values). Another way of saying this is: any
>analysis which treats all the data as arising from a single distribution
>will depend more on the assumptions you make than on the data. So good luck
>with that!
>
>b) If you have a "modest" amount of (known) censoring -- 5%?, 20%? 10%? --
>methods for the analysis of censored data should be useful. My
>understanding is that MI (multiple imputation) is regarded as a generally
>useful approach, and there are many R packages that can do various flavors
>of this. Again, you should consult the literature: there are very likely
>nontechnical reviews of this topic, too, as well as online discussions and
>tutorials.
>
>So if you are serious about dealing with this and have a lot of data with
>these issues, my advice would be to stop looking for ad hoc advice and dig
>into the literature: it's one of the many areas of "data science" where
>seemingly simple but pervasive questions require complex answers.
>
>And, again, heed my personal caveats.
>
>Thus endeth my rant.
>
>Cheers to all,
>Bert
>
>On Mon, Jan 22, 2024 at 9:29 AM Rich Shepard <rshep...@appl-ecosys.com>
>wrote:
>
>> On Mon, 22 Jan 2024, Martin Maechler wrote:
>>
>> > I think it is a good question, not really only about geo-chemistry, but
>> > about statistics in applied sciences (and engineering for that matter).
>>
>> > John W Tukey (and several other of the grands of the time) had the log
>> > transform among the "First aid transformations":
>> >
>> > If the data for a continuous variable must all be positive it is also
>> > typically the case that the distribution is considerably skewed to the
>> > right. In such a case behave as a good human who sees another human in
>> > health distress: apply First Aid -- do the things you learned to do
>> > quickly without too much thought, because things must happen fast -- to
>> > hopefully save the other's life.
>>
>> Martin,
>>
>> Thanks very much. I will look further into this because toxic metals and
>> organic compounds in geochemical collections almost always have censored
>> lab results (below method detection limits) that range from about 15% to
>> 80% or more, and there almost always are very high extreme values.
>>
>> I'll learn to understand what benefits log transforms have over
>> compositional data analyses.
>>
>> Best regards,
>>
>> Rich
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
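For readers who want to see the effect of the substitution methods 1) and 2) listed in Bert's message, here is a rough sketch in Python under loudly stated assumptions: the data are simulated, normal on the log scale with made-up parameters, and the comparison method is a plain censored-likelihood fit computed by EM. That fit is a swapped-in textbook technique, not the multiple imputation Bert mentions and not anything proposed in this thread; the function name `censored_normal_em` and all constants are hypothetical.

```python
import math
import random
from statistics import NormalDist


def censored_normal_em(observed, n_censored, limit, iterations=200):
    """EM estimate of (mu, sigma) for normal data left-censored at `limit`.

    `observed` holds the values at or above the limit; `n_censored` counts
    the values known only to lie below it.
    """
    std_normal = NormalDist()
    n = len(observed) + n_censored
    sum_obs = sum(observed)
    sum_sq_obs = sum(x * x for x in observed)
    # Start from the observed values alone (biased, but a usable start).
    mu = sum_obs / len(observed)
    sigma = math.sqrt(sum_sq_obs / len(observed) - mu * mu)
    for _ in range(iterations):
        # E-step: moments of a normal truncated to lie below `limit`.
        alpha = (limit - mu) / sigma
        lam = std_normal.pdf(alpha) / std_normal.cdf(alpha)
        cens_mean = mu - sigma * lam
        cens_var = sigma * sigma * (1.0 - alpha * lam - lam * lam)
        cens_second = cens_var + cens_mean * cens_mean
        # M-step: update from the completed sufficient statistics.
        mu = (sum_obs + n_censored * cens_mean) / n
        sigma = math.sqrt((sum_sq_obs + n_censored * cens_second) / n - mu * mu)
    return mu, sigma


random.seed(2024)
TRUE_MU, TRUE_SIGMA = 0.0, 2.0   # assumed log-scale parameters
LOG_LLOD = -1.0                  # assumed log of the detection limit

logs = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(5000)]
observed = [x for x in logs if x >= LOG_LLOD]
n_censored = len(logs) - len(observed)

# Method 1: censored values set to the LLOD.
sub_llod = (sum(observed) + n_censored * LOG_LLOD) / len(logs)
# Method 2: censored values set to LLOD/2 (log scale: log(LLOD) + log(0.5)).
sub_half = (sum(observed) + n_censored * (LOG_LLOD + math.log(0.5))) / len(logs)
# Censored-likelihood fit on the same data.
est_mu, est_sigma = censored_normal_em(observed, n_censored, LOG_LLOD)

print(f"fraction censored:          {n_censored / len(logs):.2f}")
print(f"mean log, LLOD substituted: {sub_llod:.3f}")
print(f"mean log, LLOD/2:           {sub_half:.3f}")
print(f"censored-likelihood mu:     {est_mu:.3f} (sigma {est_sigma:.3f})")
```

The censored-likelihood fit only looks good here because the simulated data really do follow the assumed distribution; with heavy censoring the answer rests on that assumption, which is the point of item a) in the quoted message.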