[ https://issues.apache.org/jira/browse/ARROW-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson resolved ARROW-14575.
-------------------------------------
    Resolution: Fixed

Issue resolved by pull request 13160
[https://github.com/apache/arrow/pull/13160]

> [R] Allow functions with {{pkg::}} prefixes
> -------------------------------------------
>
>                 Key: ARROW-14575
>                 URL: https://issues.apache.org/jira/browse/ARROW-14575
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Jonathan Keane
>            Assignee: Dragoș Moldovan-Grünfeld
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>          Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> {*}Proposed approach{*}:
> * add functionality to allow binding registration with the {{pkg::fun()}}
>   name;
> ** Modify register_binding() to register 2 identical copies for each
>    pkg::fun binding, fun and pkg::fun.
> ** Add a binding for the :: operator, which helps with retrieving bindings
>    from the function registry.
> ** Add generic unit tests for the pkg::fun functionality.
> * register {{nse_funcs}} requiring indirect mapping
> ** register each binding with and without the pkg:: prefix
> ** add / update unit tests for the nse_funcs bindings to include at least one
>    pkg::fun() call for each binding
> * register {{nse_funcs}} requiring direct mapping (unary and binary bindings)
> ** register each binding with and without the pkg:: prefix
> ** add / update unit tests for the nse_funcs bindings to include at least one
>    pkg::fun() call for each binding
> * register {{agg_funcs}} for use with {{summarise()}}
> * document changes in the _Writing bindings_ documentation
> ** going forward we should be using pkg::fun when defining a binding, which
>    will register 2 copies of the same binding.
>
> Different implementation options are outlined and discussed in the
> [design document|https://docs.google.com/document/d/1Om-vYb31b6p_u4tyl86SGW1DrtWBfksq8NYG1Seqaxg/edit?usp=sharing].
>
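The registration scheme in the proposed approach (two copies per binding plus a binding for the :: operator) can be sketched in plain base R. This is only an illustration under stated assumptions, not the code from the pull request: {{registry}}, {{register_binding_sketch()}}, {{lookup_qualified()}} and {{eval_with_bindings()}} are hypothetical names, and the {{year}} binding is an ordinary R function standing in for a binding that would build an Arrow expression.

```r
# Hypothetical sketch in base R; none of these names are arrow internals.

# Registry of bindings. Its parent is baseenv() so that expressions evaluated
# against it can still reach base functions.
registry <- new.env(parent = baseenv())

# Register `fun` under the qualified name ("pkg::fun") and under the bare
# name ("fun"), i.e. two identical copies of the same binding.
register_binding_sketch <- function(fun_name, fun) {
  bare_name <- sub("^.*::", "", fun_name)
  assign(fun_name, fun, envir = registry)
  assign(bare_name, fun, envir = registry)
  invisible(fun)
}

# Resolve "pkg::name": prefer a registered binding, otherwise fall back to the
# real export of the package.
lookup_qualified <- function(pkg, name) {
  key <- paste0(pkg, "::", name)
  if (exists(key, envir = registry, inherits = FALSE)) {
    get(key, envir = registry, inherits = FALSE)
  } else {
    getExportedValue(pkg, name)
  }
}

# Evaluate `expr` against `data` with a local `::` that consults the registry.
# `pkg::fun(x)` parses as a call to `::`, so masking `::` is enough to
# intercept qualified calls.
eval_with_bindings <- function(expr, data) {
  mask <- new.env(parent = registry)
  assign(
    "::",
    function(pkg, name) {
      lookup_qualified(
        as.character(substitute(pkg)),
        as.character(substitute(name))
      )
    },
    envir = mask
  )
  eval(expr, envir = data, enclos = mask)
}

# Stand-in binding (assumption: plain R on Date values, not an Arrow kernel).
register_binding_sketch(
  "lubridate::year",
  function(x) as.integer(format(x, "%Y"))
)

dates <- list(time_hour = as.Date(c("2013-01-01", "2014-06-15")))
bare <- eval_with_bindings(quote(year(time_hour)), dates)
qualified <- eval_with_bindings(quote(lubridate::year(time_hour)), dates)
```

In this sketch both calls resolve to the same registered function, so {{bare}} and {{qualified}} are identical, while a qualified call with no registered binding (for example {{stats::median(x)}}) falls through to the package's own export.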
> {*}Description{*}:
> Currently we implement a number of functions from packages like {{lubridate}}
> which work well when called without namespacing (e.g. {{year()}}). However,
> if someone calls {{lubridate::year()}}, the expression is reported as not
> supported and the data is pulled into R (e.g. {{Warning: Expression
> lubridate::year(time_hour) not supported in Arrow}}). Would it be possible
> for us to check whether we have an arrow function that matches the function
> itself?
> {code:r}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> ds <- InMemoryDataset$create(nycflights13::flights)
> ds %>%
>   mutate(year = lubridate::year(time_hour)) %>%
>   collect()
> #> Warning: Expression lubridate::year(time_hour) not supported in Arrow; pulling
> #> data into R
> #> # A tibble: 336,776 × 19
> #>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
> #>    <dbl> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
> #>  1  2013     1     1      517            515         2      830            819
> #>  2  2013     1     1      533            529         4      850            830
> #>  3  2013     1     1      542            540         2      923            850
> #>  4  2013     1     1      544            545        -1     1004           1022
> #>  5  2013     1     1      554            600        -6      812            837
> #>  6  2013     1     1      554            558        -4      740            728
> #>  7  2013     1     1      555            600        -5      913            854
> #>  8  2013     1     1      557            600        -3      709            723
> #>  9  2013     1     1      557            600        -3      838            846
> #> 10  2013     1     1      558            600        -2      753            745
> #> # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
> #> #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
> #> #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
> ds %>%
>   mutate(year = year(time_hour)) %>%
>   collect()
> #> # A tibble: 336,776 × 19
> #>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
> #>    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
> #>  1  2013     1     1      517            515         2      830            819
> #>  2  2013     1     1      533            529         4      850            830
> #>  3  2013     1     1      542            540         2      923            850
> #>  4  2013     1     1      544            545        -1     1004           1022
> #>  5  2013     1     1      554            600        -6      812            837
> #>  6  2013     1     1      554            558        -4      740            728
> #>  7  2013     1     1      555            600        -5      913            854
> #>  8  2013     1     1      557            600        -3      709            723
> #>  9  2013     1     1      557            600        -3      838            846
> #> 10  2013     1     1      558            600        -2      753            745
> #> # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
> #> #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
> #> #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)