[ 
https://issues.apache.org/jira/browse/ARROW-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated ARROW-14649:
-----------------------------
    Summary: [R] Include unused factor levels in coalesce() and if_else() 
output  (was: [R] Include unused factor levels in coalesce() output)

> [R] Include unused factor levels in coalesce() and if_else() output
> -------------------------------------------------------------------
>
>                 Key: ARROW-14649
>                 URL: https://issues.apache.org/jira/browse/ARROW-14649
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Priority: Minor
>
> ARROW-14167 added support for factors in {{coalesce()}}, but the factors 
> it returns do not necessarily retain unused factor levels the way 
> {{coalesce()}} does when used on an R data frame.
> For example, compare the following and note the difference in the levels:
> {code:r}
> # R data frame
> tibble(x = factor(c("a", NA_character_)), y = factor(c("b", "c"))) %>%
>   mutate(y = coalesce(x, y)) %>%
>   pull(y)
> #> [1] a c
> #> Levels: a b c{code}
> {code:r}
> # Arrow Table
> tibble(x = factor(c("a", NA_character_)), y = factor(c("b", "c"))) %>%
>   Table$create() %>%
>   mutate(y = coalesce(x, y)) %>%
>   pull(y)
> #> [1] a c
> #> Levels: a c{code}
> I'm not sure whether it is practical to make Arrow return factors with the 
> unused levels included, as R does. If it is, we should do it.
> See the test in {{test-dplyr-funcs-conditional.R}} that refers to this Jira.
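
Until that is resolved, the levels can be restored on the R side after collecting the result. The following is only a minimal sketch in base R, not what Arrow or dplyr does internally: {{restore_levels()}} is a hypothetical helper name, and {{arrow_like}} is a stand-in for the factor returned by the Arrow query above. It assumes the desired levels are the union of the input factors' levels in order of first appearance, which matches the data frame output shown in the report.

```r
# Hypothetical helper (not part of arrow or dplyr): rebuild a factor so that it
# carries the union of the levels of the input factors, in order of first
# appearance across the inputs.
restore_levels <- function(result, ...) {
  all_levels <- Reduce(union, lapply(list(...), levels))
  factor(as.character(result), levels = all_levels)
}

# The input columns from the example in the report.
x <- factor(c("a", NA_character_))
y <- factor(c("b", "c"))

# Stand-in for the Arrow result in the report: the unused level "b" is missing.
arrow_like <- factor(c("a", "c"))

restored <- restore_levels(arrow_like, x, y)
levels(restored)
#> [1] "a" "b" "c"
```

The values are unchanged; only the level set is widened, so this could be applied to the pulled column without touching the Arrow query itself.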



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
