Dear Robert,

This is not really a request for help with R, but rather a wish
for a new feature in a (very long) existing R function.
Hence it is a topic for the 'R-devel' mailing list
(https://stat.ethz.ch/mailman/listinfo/R-devel )
rather than for 'R-help'; see also  https://www.r-project.org/mail.html
on what the different lists are aimed at.

==> I will write a long reply to this post but divert it to R-devel
 (and will CC you, at least on the first reply).

--> Please send any further follow-up to 'R-devel'.

>>>>> Robert Almgren 
>>>>>     on Fri, 3 May 2019 15:45:44 -0400 writes:

    > There is something I do not think is right in the approx() function in
    > base R, with method="constant" and in the presence of NA values. I have
    > 3.6.0, but the behavior seems to be the same in earlier versions.
    > My suggested fix is to add an "na.rm" argument to approx(), as in mean().
    > If this argument is FALSE, then NA values should be propagated into the
    > output rather than being removed.

    > Details:

    > The documentation says 

    > "f: for method = "constant" a number between 0 and 1 inclusive,
    > indicating a compromise between left- and right-continuous step
    > functions. If y0 and y1 are the values to the left and right of the
    > point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f
    > for intermediate values. In this way the result is right-continuous for
    > f == 0 and left-continuous for f == 1, even for non-finite y values."

    > This suggests to me that if the left value y0 is NA, and if f=0 (the
    > default), then the interpolated value should be NA. (Regardless of the
    > right value y1, see bug 15655 fixed in 2014.)

    > The documentation further says, below under "Details", that

    > "The inputs can contain missing values which are deleted."

    > The question is what is the appropriate behavior if one of the input
    > values y is NA. Currently, approx() seems to interpret NA values as
    > faulty data points, which should be deleted and the previous values
    > carried forward (example below).

    > But in many applications, especially with "constant" interpolation, an
    > NA value is intended to mean that we really do not know the value in the
    > next interval, or explicitly that there is no value. Therefore the NA
    > should not be removed, but should be propagated forward into the output
    > within the corresponding interval.

    > The situation is similar with functions like mean(). The presence of an
    > NA value may mean either (a) we want to compute the mean without that
    > value (na.rm=TRUE), or (b) we really are missing important information,
    > we cannot determine the mean, and we should return NA (na.rm=FALSE).

    > Therefore, I propose that approx() also be given an na.rm argument,
    > indicating whether we wish to delete NA values, or treat them as actual
    > values on the corresponding interval. That option makes even more sense
    > for approx() than for mean(), since the NA values apply only on small
    > regions of the data range.

    > --Robert Almgren

    > Example:

    > : R --vanilla

    > R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
    > Copyright (C) 2019 The R Foundation for Statistical Computing
    > Platform: x86_64-apple-darwin15.6.0 (64-bit)
    > ...

    >> t1 <- 1:5
    >> x1 <- c( 1, as.numeric(NA), 3, as.numeric(NA), 5 )
    >> print(data.frame(t1,x1))
    >   t1 x1
    > 1  1  1
    > 2  2 NA   <-- we do not know the value between t=2 and t=3
    > 3  3  3
    > 4  4 NA   <-- we do not know the value between t=4 and t=5
    > 5  5  5
    >> X <- approx( t1, x1, (1:4) + 0.5, method='constant', rule=c(1,2) )
    >> print(data.frame(X))
    >     x y
    > 1 1.5 1
    > 2 2.5 1   <---- I believe that these two values should be NA
    > 3 3.5 3
    > 4 4.5 3   <---- I believe that these two values should be NA

    > --
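
For illustration only, here is a minimal sketch of how the NA-propagating
behaviour asked for above can be emulated without any change to approx().
It assumes f = 0 (left value carried forward), rule = c(1, 2) as in the
example, and x values that are sorted with no NA and no ties. The helper
name approx_const_na() is hypothetical and not part of base R; the sketch
relies only on findInterval() from base R.

```r
# Hypothetical helper (NOT part of base R): piecewise-constant interpolation
# with f = 0 that keeps NA values in y instead of deleting them.
# Assumption: x is strictly increasing and contains no NA.
approx_const_na <- function(x, y, xout) {
  stopifnot(length(x) == length(y), !anyNA(x),
            !is.unsorted(x, strictly = TRUE))
  idx <- findInterval(xout, x)          # index of the last x[i] <= xout; 0 if xout < x[1]
  idx[which(idx == 0L)] <- NA_integer_  # left of the range: NA, as with rule = 1
  list(x = xout, y = y[idx])            # right of the range: y[n] is carried, as with rule = 2
}

t1 <- 1:5
x1 <- c(1, NA, 3, NA, 5)
xo <- (1:4) + 0.5

# Behaviour reported in the post: NA points are deleted first
approx(t1, x1, xo, method = "constant", rule = c(1, 2))$y   # 1 1 3 3

# Sketch: each NA is propagated over the interval that follows it
approx_const_na(t1, x1, xo)$y                                # 1 NA 3 NA
```

Indexing y directly by interval number, rather than interpolating, is what
lets an NA stand for "no value on this interval"; intermediate f values are
deliberately not covered by this sketch.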
