[ https://issues.apache.org/jira/browse/ARROW-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-5190:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/16714

> [R] Discussion: tibble dependency in R package
> ----------------------------------------------
>
> Key: ARROW-5190
> URL: https://issues.apache.org/jira/browse/ARROW-5190
> Project: Apache Arrow
> Issue Type: Wish
> Components: R
> Reporter: James Lamb
> Assignee: Romain Francois
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Hello,
>
> I would like to have a discussion on the use of *tibble* in the Apache Arrow
> R package. I looked at [the project contributor
> guidelines|https://github.com/apache/arrow/blob/master/docs/source/developers/contributing.rst]
> and could not tell where the best place might be to start a public
> discussion on this topic, so I decided on JIRA. I apologize if this is not
> the right place.
>
> *TL;DR*
> I would like to propose moving the *tibble* dependency in the *arrow* R
> package to "Suggests", removing the _as_tibble()_ call in _read_arrow()_, and
> having the core R code implementing the Arrow API only return data.frames or
> other base-R data structures wherever possible.
>
> *Reasoning*
> [As far as I can
> tell|https://github.com/apache/arrow/search?p=1&q=tibble&unscoped_q=tibble],
> outside of tests and examples *tibble* is only used in three places in the
> package:
> * S3 methods to convert Arrow objects to tibbles
> (_as_tibble.arrow::RecordBatch()_, _as.tibble.arrow::Table()_)
> * optional "convert to tibble on the way out" behavior controlled by a flag
> in interfaces to file types (parquet and feather)
> * [_read_arrow()_|https://github.com/apache/arrow/blob/0536ef8174982a7a13a251174cc38701e8663b68/r/R/read_table.R#L88]
>
> In my opinion, all three of these uses of *tibble* are valuable for
> developers who use that package (or other packages in its ecosystem), but I
> am not convinced that the Arrow R package should be tightly coupled to them.
> In the Python community, *pandas* is a broadly agreed-upon standard for
> representing data frames. Even with that ubiquity, *pyarrow* does not depend
> on *pandas* (*pandas* is not required in order to use *pyarrow*) and all
> "compatibility with *pandas*" code is isolated in a place explicitly intended
> for that purpose:
> [https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py]
> I think that is the ideal way to handle integration between Arrow and the
> other software it might be used with. It allows users who care about only
> one of the integrations (e.g. feather, parquet, HDFS, Apache Spark, tibble,
> data.table, etc.) to build only the things they're already using.
>
> *Other background information*
> I took the time to write this tonight after talking a colleague through the
> issues *feather* (R package) users experienced after the *tibble 2.0*
> release. See for example
> [wesm/feather#374|https://github.com/wesm/feather/issues/374] and
> [wesm/feather#372|https://github.com/wesm/feather/issues/372].
> When *tibble 2.0* came out, it broke *feather 0.3.1*, and the maintainers
> there promptly released *feather 0.3.2* to CRAN, which was compatible with
> *tibble 2.0+*. Unfortunately, this still caused disruptions for many people
> using *feather* (who inadvertently had *tibble* upgraded as part of
> installing other packages which depended on it). Nothing about *tibble* was
> necessary to the implementation of _read_feather()_, as far as I can tell,
> but this design choice made installing and upgrading *tibble* non-optional
> for developers who just wanted to use the feather file format and all its
> awesome features.
>
> If the proposal here is accepted, I hope it will mean we can avoid
> repeating the same experience with the R *arrow* package and set a strong
> precedent for developers who want to add compatibility in this package for
> other members of the ecosystem like parquet or Apache Spark.
>
> Thank you for hearing me out!
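
For illustration only, here is a minimal sketch of the pattern the proposal describes: return a base-R data.frame by default and touch *tibble* only when the caller asks for it and it is installed. The function name `read_records()` and its `as_tibble` flag are hypothetical and are not part of the arrow package's API; the sketch also assumes *tibble* would be listed under "Suggests" in the package DESCRIPTION.

```r
# Hypothetical reader: `read_records()` and its `as_tibble` flag are
# illustrative names, not functions from the arrow package.
read_records <- function(x, as_tibble = FALSE) {
  # Always build a base-R data.frame first; nothing outside base R is needed.
  df <- as.data.frame(x, stringsAsFactors = FALSE)

  if (!as_tibble) {
    return(df)
  }

  # tibble is only loaded on explicit request, and only if it is installed.
  if (!requireNamespace("tibble", quietly = TRUE)) {
    stop("Package 'tibble' is required when as_tibble = TRUE.", call. = FALSE)
  }
  tibble::as_tibble(df)
}

# Default path: works without tibble being installed.
df <- read_records(list(id = 1:3, name = c("a", "b", "c")))
```

With this shape, a breaking *tibble* release could only affect callers who opt in with `as_tibble = TRUE`; the default path has no dependency on it.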