See https://github.com/apache/arrow/blob/master/r/R/python.R for the
r_to_py/py_to_r functions, and
https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp for how they
call the C++ library's implementation of the C data interface, in case you
need to go down to that level.

In case it's helpful, here's a getting-started guide for the R/Python
bridge (from the R side, so maybe it's not useful for you):
https://arrow.apache.org/docs/r/articles/python.html

Since the issue you're experiencing is about loading shared libraries in a
conda environment, this might not be relevant for you, but note that with
this C data interface, you won't necessarily need pyarrow and the arrow R
package to be linked against the same C++ library version.

Neal


On Sun, Apr 26, 2020 at 5:24 PM Wes McKinney <wesmck...@gmail.com> wrote:

> Agreed -- please check 0.17.0. Also, note that the C Data Interface
> makes moving shared_ptr<T> between R and Python radically simpler. See
> the "py_to_r" functions in
>
> https://github.com/apache/arrow/blob/master/r/tests/testthat/test-python.R
>
> Similar "r_to_py" functions could be written to use rpy2.
>
> If you can provide instructions for reproducing this error in an
> isolated environment, that would be helpful.
>
> On Sun, Apr 26, 2020 at 5:44 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > Hi Jeffrey,
> > I don't have expertise in this area (hopefully someone else can chime
> > in), but we recently released 0.17.0. Could you check whether this is
> > still an issue with the newer version?
> >
> > Thanks,
> > Micah
> >
> > On Sat, Apr 25, 2020 at 10:43 PM Jeffrey Wong <jeffr...@netflix.com.invalid>
> > wrote:
> >
> > > I was able to simplify this considerably. There is a problem with
> > > pyarrow==0.16.0, r-arrow==0.16.0, and rpy2: just by loading pyarrow,
> > > rpy2 becomes unable to load r-arrow. This set of imports fails now, but
> > > was fine in 0.14.1. Is it possible there is a conflict between the
> > > shared objects that pyarrow loads and the shared objects that r-arrow
> > > tries to load afterwards?
> > >
> > > # Fails
> > > import rpy2.robjects as ro
> > > import pyarrow
> > > ro.r("library(arrow)")
> > >
> > > # Succeeds
> > > import rpy2.robjects as ro
> > > ro.r("library(arrow)")
> > >
> > > # Also fails
> > > import rpy2.robjects as ro
> > > import pyarrow
> > > import pyarrow.parquet
> > > import pyarrow.dataset
> > > ro.r("library(arrow)")
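
A shared object stays loaded for the lifetime of a process, so the three cases above only mean something if each one runs in a fresh interpreter. Below is a minimal sketch of such a harness, using only the standard library's subprocess module. The CASES dictionary and the run_isolated helper are illustrative names, not part of rpy2 or pyarrow.

```python
import subprocess
import sys

# The three import orders from the report, each run in its own interpreter.
CASES = {
    "pyarrow, then R arrow": (
        "import rpy2.robjects as ro\n"
        "import pyarrow\n"
        "ro.r('library(arrow)')\n"
    ),
    "R arrow only": (
        "import rpy2.robjects as ro\n"
        "ro.r('library(arrow)')\n"
    ),
    "pyarrow.parquet/dataset, then R arrow": (
        "import rpy2.robjects as ro\n"
        "import pyarrow\n"
        "import pyarrow.parquet\n"
        "import pyarrow.dataset\n"
        "ro.r('library(arrow)')\n"
    ),
}


def run_isolated(cases, python=sys.executable):
    """Run each snippet in a fresh interpreter; return {name: (ok, last stderr line)}."""
    results = {}
    for name, code in cases.items():
        proc = subprocess.run([python, "-c", code], capture_output=True, text=True)
        lines = proc.stderr.strip().splitlines()
        results[name] = (proc.returncode == 0, lines[-1] if lines else "")
    return results


if __name__ == "__main__":
    for name, (ok, err) in run_isolated(CASES).items():
        print(f"{name}: {'ok' if ok else 'FAILED'} {err}")
```

The script prints one line per case, including the last line of stderr for failures. Together with the package versions, that output is close to the kind of isolated reproduction requested in the thread.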
> > >
> > > On Sat, Apr 25, 2020 at 12:19 PM Jeffrey Wong <jeffr...@netflix.com>
> > > wrote:
> > >
> > > > Hello, I am using Arrow Tables to facilitate fast data transfer between
> > > > Python and R. The strategy below worked with arrow==0.14.1 but no
> > > > longer works with arrow==0.16.0.
> > > >
> > > > Using pyarrow, I convert a pandas DataFrame to a pyarrow Table, then get
> > > > the memory address of the underlying C++ arrow::Table. Something like
> > > > this:
> > > >
> > > > unsigned long get_arrow_table_memory_address(py::object pyarrow_table) {
> > > >     arrow::py::import_pyarrow();
> > > >     std::shared_ptr<arrow::Table> table;
> > > >     arrow::py::unwrap_table(pyarrow_table.ptr(), &table);
> > > >     return (unsigned long) table.get();
> > > > }
> > > >
> > > > Using rpy2, I can embed an R session inside the Python process, so the
> > > > Arrow table is still in memory. In the R session, I receive the memory
> > > > address (as a string, which is then converted to an unsigned long in
> > > > Rcpp) and return a shared_ptr for R:
> > > >
> > > > SEXP arrow_table_from_memory_address(std::string memory_address) {
> > > >   std::shared_ptr<arrow::Table> table((arrow::Table *) std::stoul(memory_address));
> > > >   Rcpp::XPtr<std::shared_ptr<arrow::Table>> output(new std::shared_ptr<arrow::Table>(table), false);
> > > >   return output;
> > > > }
> > > >
> > > > Finally, I can create an r-arrow Table object using arrow::Table$new(xp).
> > > > My ultimate goal is to then call as.data.frame, materializing exactly the
> > > > same data frame in R as the original one in pandas.
> > > >
> > > > With arrow==0.16.0, loading the R package's arrow.so fails because
> > > > libarrow_dataset.so.16 has an undefined symbol:
> > > >
> > > > 10: dyn.load(file, DLLpath = DLLpath, ...)
> > > > 9: library.dynam(lib, package, package.lib)
> > > > 8: loadNamespace(name)
> > > > 7: getNamespace(ns)
> > > > 6: asNamespace(pkg)
> > > > 5: get(name, envir = asNamespace(pkg), inherits = FALSE)
> > > > 4: arrow:::shared_ptr at core_ArrowTablePointer.R#35
> > > > 3: ArrowTablePointer$new("94637300534352")$to_table(as_tibble = FALSE)
> > > > 2: (function (expr, envir = parent.frame(), enclos = if (is.list(envir) ||
> > > >        is.pairlist(envir)) parent.frame() else baseenv())
> > > >    .Internal(eval(expr, envir, enclos)))(expression(mydata = ArrowTablePointer$new("94637300534352")$to_table(as_tibble = FALSE)))
> > > > 1: (function (expr, envir = parent.frame(), enclos = if (is.list(envir) ||
> > > >        is.pairlist(envir)) parent.frame() else baseenv())
> > > >    .Internal(eval(expr, envir, enclos)))(expression(mydata = ArrowTablePointer$new("94637300534352")$to_table(as_tibble = FALSE)))
> > > > Traceback (most recent call last):
> > > >   File "/root/nflx_causal_models/causal_models/r/rpy2_patches.py", line 30, in wrapped
> > > >     return f(self, *args, **kwargs)
> > > >   File "/opt/conda/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 28, in _
> > > >     cdata = function(*args, **kwargs)
> > > >   File "/opt/conda/lib/python3.7/site-packages/rpy2/rinterface.py", line 785, in __call__
> > > >     raise embedded.RRuntimeError(_rinterface._geterrmessage())
> > > > rpy2.rinterface_lib.embedded.RRuntimeError: Error in dyn.load(file, DLLpath = DLLpath, ...) :
> > > >   unable to load shared object '/opt/conda/lib/R/library/arrow/libs/arrow.so':
> > > >   /opt/conda/lib/R/library/arrow/libs/../../../../libarrow_dataset.so.16: undefined symbol: _ZN5arrow2fs8internal17SplitAbstractPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
> > > >
> > > > Running ldd on the R package's arrow.so, I do see that it is properly
> > > > linked against libarrow_dataset.so.16:
> > > >
> > > > ldd /opt/conda/lib/R/library/arrow/libs/arrow.so
> > > > linux-vdso.so.1 =>  (0x00007ffc046d2000)
> > > > libarrow_dataset.so.16 => /opt/conda/lib/R/library/arrow/libs/../../../../libarrow_dataset.so.16 (0x00007ffb76a5f000)
> > > > libparquet.so.16 => /opt/conda/lib/R/library/arrow/libs/../../../../libparquet.so.16 (0x00007ffb76757000)
> > > > libarrow.so.16 => /opt/conda/lib/R/library/arrow/libs/../../../../libarrow.so.16 (0x00007ffb757c7000)
> > > > libR.so => /opt/conda/lib/R/library/arrow/libs/../../../lib/libR.so (0x00007ffb7532a000)
> > > >
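
Note that ldd resolves dependencies as if arrow.so were being loaded into a fresh process. Inside the Python process, glibc's dynamic loader reuses an already-loaded object with the same soname. So if `import pyarrow` has already mapped a libarrow.so.16 from a different location or build, libarrow_dataset.so.16 may be resolved against that copy rather than the one ldd shows. One way to check, sketched here for Linux only, is to list which Arrow libraries are actually mapped into the running process via /proc/self/maps. mapped_arrow_libraries and print_mapped_arrow_libraries are hypothetical helper names.

```python
import os
import re

# Matches libarrow*.so* and libparquet*.so* (e.g. libarrow.so.16, libarrow_dataset.so.16).
_ARROW_LIB = re.compile(r"lib(arrow|parquet)[\w-]*\.so")


def mapped_arrow_libraries(maps_text):
    """Return the distinct Arrow/Parquet shared objects named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) == 6:
            path = fields[5].strip()
            if _ARROW_LIB.match(os.path.basename(path)):
                paths.add(path)
    return sorted(paths)


def print_mapped_arrow_libraries():
    """Linux only: print the Arrow libraries mapped into the current process."""
    with open("/proc/self/maps") as maps:
        for path in mapped_arrow_libraries(maps.read()):
            print(path)
```

For example, call print_mapped_arrow_libraries() right after `import pyarrow` and again just before `ro.r("library(arrow)")`. A libarrow.so.16 mapped from somewhere other than /opt/conda/lib (for instance from inside site-packages/pyarrow) would support the shared-object-conflict hypothesis.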
> > > >
> > > > I think the symbol is hashed, so I can't tell which function
> > > > libarrow_dataset.so.16 is looking for:
> > > >
> > > > _ZN5arrow2fs8internal17SplitAbstractPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
> > > >
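
The symbol above is not hashed. It is mangled according to the Itanium C++ ABI used by GCC and Clang on Linux. `c++filt` (from binutils) decodes it to `arrow::fs::internal::SplitAbstractPath(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)`, and `nm -D --defined-only` on a candidate libarrow.so.16 shows whether that exact symbol is exported. The `__cxx11` component means the referencing library was compiled with libstdc++'s C++11 std::string ABI. A libarrow built with the older ABI would export the same function under a different mangled name (ending in `RKSs`), so it would not satisfy this reference. As a rough sketch, the simple nested-name prefix can be decoded by hand. nested_name and uses_cxx11_string_abi are hypothetical helpers that cover only plain length-prefixed names, not the full mangling grammar.

```python
def nested_name(symbol):
    """Decode the qualified name of a simple Itanium-mangled `_ZN...E` symbol.

    Sketch only: supports plain length-prefixed identifiers; substitutions,
    templates, and cv-qualified member functions raise ValueError.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested-name symbol: %r" % symbol)
    i, parts = 3, []
    while i < len(symbol) and symbol[i] != "E":
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("unsupported encoding at offset %d" % i)
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])
        i = j + length
    if i >= len(symbol):
        raise ValueError("unterminated nested name")
    return "::".join(parts)


def uses_cxx11_string_abi(symbol):
    """True if the symbol mentions libstdc++'s std::__cxx11 inline namespace."""
    return "7__cxx11" in symbol


SYMBOL = "_ZN5arrow2fs8internal17SplitAbstractPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
print(nested_name(SYMBOL))            # arrow::fs::internal::SplitAbstractPath
print(uses_cxx11_string_abi(SYMBOL))  # True
```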
> > > > Did I need to build Arrow with some kind of flag in order to see this
> > > > symbol? I currently get arrow-cpp, pyarrow, and r-arrow all from
> > > > conda-forge.
> > > >
> > > > Thank you so much for all the amazing development in Arrow. This exchange
> > > > of a pandas DataFrame to an R data frame via an Arrow Table is amazingly
> > > > fast.
> > > > --
> > > > Jeffrey Wong
> > > > Computational Causal Inference
> > > >
> > >
> > >
> > > --
> > > Jeffrey Wong
> > > Computational Causal Inference
> > >
>
