[ 
https://issues.apache.org/jira/browse/ARROW-12960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461669#comment-17461669
 ] 

Ian Cook commented on ARROW-12960:
----------------------------------

As seen in the first link Dewey shared above, the R bindings currently 
translate the R expression
{code:r}
is.nan(x){code}
to the Arrow expression 
{code}
and_kleene(is_nan(x), is_valid(x)){code}
when {{x}} has a floating point data type. This makes {{is.nan(NA)}} return 
{{FALSE}}, consistent with the behavior of base R.
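
To see why this translation yields {{FALSE}} rather than {{null}} at missing positions, here is a minimal base-R model of the kernels involved (the {{arrow}} package is not needed). It assumes an Arrow null maps to R {{NA_real_}} and an Arrow NaN maps to R {{NaN}}. The helpers {{is_arrow_null}}, {{arrow_is_nan}} and {{arrow_is_valid}} are hypothetical stand-ins for the Arrow kernels, and R's own {{&}} operator, which already uses three-valued (Kleene) logic, plays the role of {{and_kleene}}.

{code:r}
# Hypothetical model: an Arrow null is represented by NA_real_, an Arrow NaN by NaN.
# In R, is.na() is TRUE for both, so an Arrow-style null is "NA but not NaN".
is_arrow_null <- function(x) is.na(x) & !is.nan(x)

# Stand-in for Arrow's is_nan kernel: null in, null out.
arrow_is_nan <- function(x) {
  out <- is.nan(x)
  out[is_arrow_null(x)] <- NA
  out
}

# Stand-in for Arrow's is_valid kernel: FALSE only at null positions (NaN is a valid value).
arrow_is_valid <- function(x) !is_arrow_null(x)

x <- c(3.14, NA, NaN)

# R's `&` follows Kleene logic: NA & FALSE is FALSE, NA & TRUE is NA.
translated <- arrow_is_nan(x) & arrow_is_valid(x)
print(translated)
# [1] FALSE FALSE  TRUE
{code}

At the null position, {{arrow_is_nan}} gives {{NA}} and {{arrow_is_valid}} gives {{FALSE}}, and Kleene AND resolves {{NA & FALSE}} to {{FALSE}}.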

The gist of this Jira is that we would like to simplify that translation (and 
the equivalent translations in other bindings) so that they can call {{is_nan}} 
directly with an option.

> [C++][R] Option for is_nan(null) to evaluate to false or true
> -------------------------------------------------------------
>
>                 Key: ARROW-12960
>                 URL: https://issues.apache.org/jira/browse/ARROW-12960
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Ian Cook
>            Assignee: Christian Cordova
>            Priority: Major
>              Labels: good-first-issue, kernel
>             Fix For: 7.0.0
>
>
> (This is the flip side of ARROW-12959.)
> Currently the Arrow compute kernel {{is_nan}} always treats {{null}} as a 
> missing value, returning {{null}} at positions of the input datum with 
> {{null}} (missing) values.
> It would be helpful to be able to control this behavior with an option. The 
> option could be named {{value_for_null}} or something similar, and it would 
> take a nullable boolean scalar. It would default to {{null}}, consistent 
> with current behavior. When set to {{false}} or {{true}}, {{is_nan}} would 
> return {{false}} or {{true}}, respectively, at positions of the input datum 
> with {{null}} values.
> Among other things, this would enable the {{arrow}} R package to evaluate 
> {{is.nan()}} consistently with the way base R does. In base R, {{is.nan()}} 
> returns {{FALSE}} on {{NA}}. But in the {{arrow}} R package, it returns 
> {{NA}}:
> {code:r}
> library(arrow)
> is.nan(c(3.14, NA, NaN))
> ## [1] FALSE FALSE  TRUE
> as.vector(is.nan(Array$create(c(3.14, NA, NaN))))
> ## [1] FALSE    NA  TRUE{code}
> I think solving this with an option in the C++ kernel is the best solution, 
> because I suspect there are other cases in which users would want 
> {{is_nan}} to return output with no missing values, without needing to 
> call another kernel to fill in the missing values. However, it 
> would also be possible to solve this just in the R package, by changing the 
> mapping of {{is.nan}} in the R package. If we choose to go that route, we 
> should change this Jira issue summary to "[R] Make is.nan(NA) consistent with 
> base R".
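
The {{value_for_null}} option proposed in the issue description can likewise be modeled in base R. This is a sketch only: {{is_nan_with_option()}} is a hypothetical function, not an existing {{arrow}} R or C++ API, and it again represents an Arrow null as {{NA_real_}} and an Arrow NaN as {{NaN}}. It computes {{is_nan}} first and then overwrites the null positions, which is the "fill the missing values in" step that the option would fold into a single kernel call.

{code:r}
# Hypothetical sketch of the proposed option; not an existing arrow API.
# An Arrow null is modeled as NA_real_, an Arrow NaN as NaN.
is_nan_with_option <- function(x, value_for_null = NA) {
  # value_for_null is a single nullable boolean: NA (default), FALSE, or TRUE
  stopifnot(length(value_for_null) == 1, is.logical(value_for_null))
  out <- is.nan(x)                  # FALSE for NA_real_, TRUE for NaN
  is_null <- is.na(x) & !is.nan(x)  # positions that are Arrow nulls
  out[is_null] <- value_for_null    # emit the requested value at null positions
  out
}

x <- c(3.14, NA, NaN)
is_nan_with_option(x)                          # default: current arrow behavior
# [1] FALSE    NA  TRUE
is_nan_with_option(x, value_for_null = FALSE)  # matches base R is.nan()
# [1] FALSE FALSE  TRUE
is_nan_with_option(x, value_for_null = TRUE)
# [1] FALSE  TRUE  TRUE
{code}

With {{value_for_null = FALSE}}, the R binding for {{is.nan}} would no longer need the {{and_kleene(is_nan(x), is_valid(x))}} workaround.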



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
