Re: [R] Creating NA equivalent

Avi Gross via R-help Tue, 21 Dec 2021 17:37:04 -0800

Jim,

There are indeed many mathematical areas where data are not quite fixed.
Consider inequalities, such as a value known to be higher than one number but
lower than another. A grade of A often means a score between 90 and 100 (no
extra credit). An event deemed "significant at the 95% level" can still fall
in the remaining 5% or, because of various errors, may not even be in that
range. In some areas you can have infinitesimals, or quantities approaching
infinity, that still cancel out without ever having an exact value.

The list of such things is vast and, as was already pointed out here, many such
cases carry some information, even USEFUL information, that is lost if you
declare them to be NA or Inf, or if you choose to treat an A as exactly 95. If
a student has straight A's, there is an excellent chance many of those A's came
from scores above 95. A student with an overall C average is more likely to
have earned a single A in the low 90s.

R was not necessarily designed to work this way. For some purposes, you may
want a variable that represents a range. When I make plots in ggplot, I often
use Inf or -Inf to specify one end of a range so that, whatever upper and lower
bounds the data make ggplot choose, something I draw in the background will
extend to that border.

But there is a difference between how we store information and how we use it.
Many R functions have an argument such as na.rm=TRUE, which may not make sense
if you store a value as an NA whose meaning is "between 95 and 100". You might
instead write code that makes two copies of any vector containing such
range-valued NAs, places the minimum value(s) in one and the maximum in the
other, and then runs the (possibly complex) calculation on each copy.
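A minimal base-R sketch of the two-copies idea, using made-up grades where each
A is only known to lie between 90 and 100. The vector names score_lo and
score_hi are hypothetical. The trick only gives valid bounds for statistics
that never decrease when any single input increases, such as the mean or the
median; it does not work for, say, the standard deviation.

```r
# Made-up data: two of the four scores are an "A", known only to lie in [90, 100].
# Exact scores get identical lower and upper bounds.
score_lo <- c(78, 90, 85, 90)
score_hi <- c(78, 100, 85, 100)

# The mean never decreases when any input increases, so evaluating it on the
# all-lower and all-upper copies bounds the true (unknown) mean.
mean_range <- c(lower = mean(score_lo), upper = mean(score_hi))
print(mean_range)   # lower = 85.75, upper = 90.75
```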

Or consider measuring a room with a ruler accurate only to 1/4 inch. If a side
measures 100 inches, the real value can be anywhere between 99.75 and 100.25
inches. Each measurement can be stored as a number plus or minus a tolerance.
To calculate the volume of the room, you might multiply all the low values to
get one number and all the high values to get another, and store that as a
range, or do whatever else makes sense, such as averaging the two.
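The room example can be sketched the same way. The three dimensions below are
invented for illustration; the +/- 0.25 inch tolerance is the one from the
text. Because every side is positive, multiplying the low ends gives the
smallest possible volume and multiplying the high ends gives the largest.

```r
# Invented room dimensions in inches; tolerance taken from the 1/4-inch ruler.
measured  <- c(length = 100, width = 120, height = 96)
tolerance <- 0.25

low  <- measured - tolerance
high <- measured + tolerance

# With all sides positive, the product is monotone in each side, so the
# extremes of the volume come from the extremes of the sides.
volume <- c(low = prod(low), nominal = prod(measured), high = prod(high))
print(volume)

# One possible single summary: the midpoint of the interval.
mean(volume[c("low", "high")])
```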

Still, much of that is normally ignored or handled some other way, without
inventing new meanings for NA. I noted earlier that programs outside R often
store out-of-band information that, when imported into R, is always treated as
NA. Some things may be unavailable because the person did not show up, others
because they had horrible handwriting and whoever typed the data in guessed
what it said, and others because the person refused to answer. Much of your
program may treat all of those as NA, while other parts might want to record
that some percentage of the respondents did this or that. As noted, Adrian
Dușa and others had such needs and have a package that, in some way, annotates
NA values when asked. I have played with it but currently have no need for it.
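One base-R way to keep such reasons without redefining NA is a parallel
column: the analysis column holds ordinary NA values and a separate factor
records why each one is missing. This is only a sketch with invented data and
reason labels; it is not the interface of Adrian's declared package, which is
linked further down in this thread.

```r
# Invented survey data: 'score' uses plain NA, 'reason' says why it is missing.
responses <- data.frame(
  score  = c(4, NA, 5, NA, NA, 3),
  reason = factor(c(NA, "absent", NA, "illegible", "refused", NA),
                  levels = c("absent", "illegible", "refused"))
)

# Most of the program just sees ordinary NA values.
mean(responses$score, na.rm = TRUE)   # 4

# Other parts can still report the share of each kind of missingness
# (table() drops the NA reasons of observed values by default).
prop.table(table(responses$reason))
```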
And, just FYI, Adrian tried other things first, since there already are
multiple bit patterns that mean specific variations on NA, such as NA_integer_
(note the two underscores), with other variants for character, real and
complex values, plus the plain logical NA. In a bizarre way, you can play games
and test them, as in:

  > a=NA_integer_
  > b=NA_character_
  > identical(a, NA_integer_)
  [1] TRUE
  > identical(a, NA_character_)
  [1] FALSE
  > identical(a, a)
  [1] TRUE
  > identical(a, b)
  [1] FALSE
  > identical(a, NA)
  [1] FALSE

So, in THEORY, you might get away with using these oddball bit-pattern
variations, or adding to them, but they do not survive well in vectors, which
must in some sense contain only one type. I have had some minor success making
a list and testing its contents; the elements all print as NA but clearly
retain subtle differences:

  > temp=list(1, NA_integer_, 2, NA_character_, 3, NA)
  > temp
  [[1]]
  [1] 1

  [[2]]
  [1] NA

  [[3]]
  [1] 2

  [[4]]
  [1] NA

  [[5]]
  [1] 3

  [[6]]
  [1] NA

  > temp[[2]]
  [1] NA
  > identical(temp[[2]], NA_integer_)
  [1] TRUE
  > identical(temp[[2]], NA_character_)
  [1] FALSE
  > identical(temp[[4]], NA_character_)
  [1] TRUE

So, yes, I can imagine a narrow window of opportunity for reusing some of
these NA variants so that they act like an NA but can also carefully signal
some other meaning. But as noted, vectors break the scheme, so your data.frame
might need to use list columns, which is doable. I bet many tools you use,
especially ones that make copies or conversions, will break the scheme.
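A short illustration of why atomic vectors break the scheme and how a list
column preserves it. The data frame and column names are made up, and unlist()
stands in for the many copying or converting tools that can silently undo the
distinction.

```r
# Combining the typed NAs in an atomic vector coerces them all to one type.
v <- c(NA_integer_, NA_character_, NA)
typeof(v)                        # "character"
identical(v[1], NA_character_)   # TRUE: the integer NA is gone

# A list column lets each element keep its own type.
df <- data.frame(id = 1:3)
df$value <- list(NA_integer_, NA_character_, NA)
vapply(df$value, typeof, character(1))   # "integer" "character" "logical"

# Flattening the column, as many tools do, coerces everything again.
typeof(unlist(df$value))         # "character"
```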

Please note that, for ME, the above discussion is academic and a reaction to
the ideas raised by others. I am not in any way suggesting that R is deficient
for not being designed for things like this, nor that wanting such a feature is
a bad thing. What Adrian provided is somewhere in between: real NA values are
stored, but attributes also record what each NA is supposed to represent.
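Applied to the original LDL/UDL question, the attribute idea might look
something like the sketch below. Everything here is hypothetical: censored(),
is.LDL() and is.UDL() are illustrative helpers written in base R, not functions
from declared or any other package. The values behave as ordinary NA, while an
attribute remembers which detection limit applies.

```r
# Hypothetical constructor: values outside the detection limits are stored as
# plain NA, and a "status" attribute records which limit was crossed.
censored <- function(x, status) {
  stopifnot(length(x) == length(status),
            all(status %in% c("ok", "LDL", "UDL")))
  x <- as.numeric(x)
  x[status != "ok"] <- NA_real_
  structure(x, status = status, class = "censored")
}

# Hypothetical tests in the spirit of the is.LDL()/is.UDL() request.
is.LDL <- function(x) attr(x, "status") == "LDL"
is.UDL <- function(x) attr(x, "status") == "UDL"

conc <- censored(c(0.8, NA, 2.4, NA), c("ok", "LDL", "ok", "UDL"))
is.na(conc)    # FALSE  TRUE FALSE  TRUE
is.LDL(conc)   # FALSE  TRUE FALSE FALSE
is.UDL(conc)   # FALSE FALSE FALSE  TRUE
mean(unclass(conc), na.rm = TRUE)   # 1.6
```

One design caveat: ordinary subsetting such as conc[2:3] drops the status
attribute, so a usable version would need at least a `[` method (and probably
c() and print methods). That is the kind of bookkeeping a dedicated package
has to handle.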

-----Original Message-----
From: Jim Lemon <drjimle...@gmail.com> 
Sent: Tuesday, December 21, 2021 5:00 PM
To: Avi Gross <avigr...@verizon.net>
Cc: r-help mailing list <r-help@r-project.org>; Adrian Dușa <dusa.adr...@unibuc.ro>
Subject: Re: [R] Creating NA equivalent

Please pardon a comment that may be off-target as well as off-topic.
This appears similar to a number of things like fuzzy logic, where an instance 
can take incompatible truth values.

It is known that an instance may have an attribute with a numeric value, but 
that value cannot be determined.

It seems to me that an appropriate designation for the value is Unk, perhaps
with an associated probability of determination, to distinguish it from NA
(which is definitely not known).

Jim

On Wed, Dec 22, 2021 at 6:55 AM Avi Gross via R-help <r-help@r-project.org> wrote:
>
> I wonder if the package Adrian Dușa created might be helpful or point you 
> along the way.
>
> It was eventually named "declared"
>
> https://cran.r-project.org/web/packages/declared/index.html
>
> With a vignette here:
>
> https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf
>
> I do not know if it would easily satisfy your needs, but it may be a step
> along the way. A package called haven was part of the motivation: Adrian
> wanted a way to import data from external sources that had more than one
> category of NA, which sounds a bit like what you want. His functions should
> allow the creation of such data within R as well. I am including him in this
> email in case you want to contact him or he has something to say.
>
>
> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Duncan Murdoch
> Sent: Tuesday, December 21, 2021 5:26 AM
> To: Marc Girondot <marc_...@yahoo.fr>; r-help@r-project.org
> Subject: Re: [R] Creating NA equivalent
>
> On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:
> > Dear members,
> >
> > I work on dosage measurements, and some values are below the detection
> > limit. I would like to create new "numbers" like LDL (to represent below
> > the detection limit) and UDL (above the detection limit) that behave
> > like NA, with the possibility of testing them using, for example,
> > is.LDL() or is.UDL().
> >
> > Note that NA is not the same as LDL or UDL: NA represents missing data.
> > Here the data are available as LDL or UDL.
> >
> > NA is built very deeply into the R language... is there any way to create
> > a new NA-equivalent?
> >
>
> There was a discussion of this back in May.  Here's a link to one approach 
> that I suggested:
>
>    https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html
>
> Read the follow-up messages; I suggested at least one improvement.
> I don't know if anyone has packaged this, but there's a later version of the 
> code here:
>
>    https://stackoverflow.com/a/69179441/2554330
>
> Duncan Murdoch
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

