Hi Jeffrey,

I don't have expertise in this area (hopefully someone else can chime in), but we recently released 0.17.0. Could you check whether this is still an issue with the newer version?
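
One note on the symbol in the quoted error: it is a mangled C++ name rather than a hash, and it refers to arrow::fs::internal::SplitAbstractPath taking a const std::string&. The sketch below is only a rough diagnostic, not a fix. It assumes a Linux host, and that binutils' c++filt is on the PATH for the demangling step. The helper names demangle and loaded_shared_objects are made up for illustration. The second helper lists which libarrow files are mapped into the current process, which might help show whether pyarrow and r-arrow are resolving to different copies of the library.

```python
import shutil
import subprocess
from typing import List, Optional

# The undefined symbol copied from the quoted dyn.load error.
MANGLED = (
    "_ZN5arrow2fs8internal17SplitAbstractPathERKNSt7__cxx1112"
    "basic_stringIcSt11char_traitsIcESaIcEEE"
)


def demangle(symbol: str) -> Optional[str]:
    """Return the demangled C++ name, or None if c++filt is not installed.

    Hypothetical helper: it simply shells out to binutils' c++filt.
    """
    cxxfilt = shutil.which("c++filt")
    if cxxfilt is None:
        return None
    result = subprocess.run(
        [cxxfilt, symbol], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def loaded_shared_objects(
    substring: str, maps_path: str = "/proc/self/maps"
) -> List[str]:
    """List distinct mapped files whose path contains `substring`.

    Hypothetical helper, Linux only: it parses the memory-map listing
    that the kernel exposes for the current process.
    """
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # Columns: address, perms, offset, dev, inode, [pathname].
            fields = line.split(None, 5)
            if len(fields) == 6 and substring in fields[5]:
                paths.add(fields[5].strip())
    return sorted(paths)


if __name__ == "__main__":
    print(demangle(MANGLED))
    # Run this part after `import pyarrow` (and again after the failing
    # library(arrow) call) to compare which libarrow files are mapped.
    for path in loaded_shared_objects("libarrow"):
        print(path)
```

If the listing shows libarrow coming from more than one directory, that would be worth including in a follow-up; if it shows a single copy, the problem probably lies elsewhere.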

Thanks,
Micah

On Sat, Apr 25, 2020 at 10:43 PM Jeffrey Wong <jeffr...@netflix.com.invalid> wrote:

> I was able to simplify this very much. There is a problem with
> pyarrow==0.16.0, r-arrow==0.16.0, and rpy2. Just by loading pyarrow, rpy2
> will not be able to load r-arrow. This set of imports fails now, but was
> fine in 0.14.1. Is it possible there is a conflict with shared objects that
> pyarrow loads, and shared objects that r-arrow tries to load after?
>
> # Fails
> import rpy2.robjects as ro
> import pyarrow
> ro.r("library(arrow)")
>
> # Succeeds
> import rpy2.robjects as ro
> ro.r("library(arrow)")
>
> # Also fails
> import rpy2.robjects as ro
> import pyarrow
> import pyarrow.parquet
> import pyarrow.dataset
> ro.r("library(arrow)")
>
> On Sat, Apr 25, 2020 at 12:19 PM Jeffrey Wong <jeffr...@netflix.com> wrote:
>
> > Hello, I am using Arrow Table's to facilitate fast data transfer between
> > python and R. The below strategy worked with arrow==0.14.1, but is no
> > longer working in arrow == 0.16.0.
> >
> > Using pyarrow, I convert a pandas dataframe to a pyarrow Table, then get
> > the memory address to the underlying Arrow Table. Something like this:
> >
> > unsigned long get_arrow_table_memory_address(py::object pyarrow_table) {
> >     arrow::py::import_pyarrow();
> >     std::shared_ptr<arrow::Table> table;
> >     arrow::py::unwrap_table(pyarrow_table.ptr(), &table);
> >     return (unsigned long) table.get();
> > }
> >
> > Using rpy2 I can create an R process inside the python process. The arrow
> > table is still in memory. In the R process, I receive the memory address
> > (as a string, which is then converted to unsigned int in Rcpp), and return
> > a shared_ptr for R
> >
> > SEXP arrow_table_from_memory_address(std::string memory_address) {
> >     std::shared_ptr<arrow::Table> table((arrow::Table *) std::stoul(memory_address));
> >     Rcpp::XPtr<std::shared_ptr<arrow::Table>> output(new std::shared_ptr<arrow::Table>(table), false);
> >     return output;
> > }
> >
> > Finally, I can create a r-arrow Table object, using arrow::Table$new(xp).
> > My ultimate goal is to then do as.data.frame, materializing the exact same
> > dataframe in R as the original one in pandas.
> >
> > In arrow == 0.16.0, I get an error concerning the r-arrow.so not being
> > able to see a symbol in libarrow_dataset.so.
> >
> > 10: dyn.load(file, DLLpath = DLLpath, ...)
> > 9: library.dynam(lib, package, package.lib)
> > 8: loadNamespace(name)
> > 7: getNamespace(ns)
> > 6: asNamespace(pkg)
> > 5: get(name, envir = asNamespace(pkg), inherits = FALSE)
> > 4: arrow:::shared_ptr at core_ArrowTablePointer.R#35
> > 3: ArrowTablePointer$new("94637300534352")$to_table(as_tibble = FALSE)
> > 2: (function (expr, envir = parent.frame(), enclos = if (is.list(envir) || is.pairlist(envir)) parent.frame() else baseenv())
> >    .Internal(eval(expr, envir, enclos)))(expression(mydata = ArrowTablePointer$new("94637300534352")$to_table(as_tibble = FALSE)))
> > 1: (function (expr, envir = parent.frame(), enclos = if (is.list(envir) || is.pairlist(envir)) parent.frame() else baseenv())
> >    .Internal(eval(expr, envir, enclos)))(expression(mydata = ArrowTablePointer$new("94637300534352")$to_table(as_tibble = FALSE)))
> > Traceback (most recent call last):
> >   File "/root/nflx_causal_models/causal_models/r/rpy2_patches.py", line 30, in wrapped
> >     return f(self, *args, **kwargs)
> >   File "/opt/conda/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 28, in _
> >     cdata = function(*args, **kwargs)
> >   File "/opt/conda/lib/python3.7/site-packages/rpy2/rinterface.py", line 785, in __call__
> >     raise embedded.RRuntimeError(_rinterface._geterrmessage())
> > rpy2.rinterface_lib.embedded.RRuntimeError: Error in dyn.load(file, DLLpath = DLLpath, ...) :
> >   unable to load shared object '/opt/conda/lib/R/library/arrow/libs/arrow.so':
> >   /opt/conda/lib/R/library/arrow/libs/../../../../libarrow_dataset.so.16: undefined symbol:
> >   _ZN5arrow2fs8internal17SplitAbstractPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
> >
> > Running ldd on the r-arrow.so, I do see that it is properly linked against
> > the arrow_dataset.so
> >
> > ldd /opt/conda/lib/R/library/arrow/libs/arrow.so
> >   linux-vdso.so.1 => (0x00007ffc046d2000)
> >   libarrow_dataset.so.16 => /opt/conda/lib/R/library/arrow/libs/../../../../libarrow_dataset.so.16 (0x00007ffb76a5f000)
> >   libparquet.so.16 => /opt/conda/lib/R/library/arrow/libs/../../../../libparquet.so.16 (0x00007ffb76757000)
> >   libarrow.so.16 => /opt/conda/lib/R/library/arrow/libs/../../../../libarrow.so.16 (0x00007ffb757c7000)
> >   libR.so => /opt/conda/lib/R/library/arrow/libs/../../../lib/libR.so (0x00007ffb7532a000)
> >
> > I think the symbol is hashed, so I can't tell what function in
> > libarrow_dataset.so it is looking for
> >
> > _ZN5arrow2fs8internal17SplitAbstractPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
> >
> > Did I need to compile a version of Arrow with some kind of flag in order
> > to see this symbol? I currently get arrow-cpp, pyarrow, and r-arrow all
> > from conda-forge.
> >
> > Thank you so much for all the amazing development in arrow. This exchange
> > of pandas dataframe to R dataframe via arrow table is amazingly fast.
> > --
> > Jeffrey Wong
> > Computational Causal Inference
> >
>
> --
> Jeffrey Wong
> Computational Causal Inference
>