Re: Contributing to Arrow

2020-04-25 Thread karuppayya
Hi Ji, Thanks for letting me know. I will pick it up. Thanks & regards, Karuppayya

On Sat, 25 Apr 2020, 23:30 Ji Liu wrote:
> Hi Karuppayya,
> Welcome! If you are interested in this issue, feel free to take it.
>
> Thanks,
> Ji Liu

Re: Contributing to Arrow

2020-04-25 Thread Ji Liu
Hi Karuppayya, Welcome! If you are interested in this issue, feel free to take it. Thanks, Ji Liu

--
From: karuppayya
Send Time: Saturday, 2020-04-25 14:21
To: emkornfield
Cc: dev
Subject: Re: Contributing to Arrow
Hi Micah, Thanks for

[jira] [Created] (ARROW-8595) [C++] Rearrange code in bit-util.h/.cc for AppendWord

2020-04-25 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8595:

Summary: [C++] Rearrange code in bit-util.h/.cc for AppendWord
Key: ARROW-8595
URL: https://issues.apache.org/jira/browse/ARROW-8595
Project: Apache Arrow

Re: Sending shared pointers from python to R

2020-04-25 Thread Jeffrey Wong
I was able to simplify this considerably. There is a problem with pyarrow==0.16.0, r-arrow==0.16.0, and rpy2: just by loading pyarrow, rpy2 is no longer able to load r-arrow. This set of imports fails now but was fine in 0.14.1. Is it possible there is a conflict with the shared objects that pyarrow loads
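The preview cuts off before the actual import list; a plausible minimal reconstruction of the described failure, assuming rpy2's standard robjects interface (the exact imports in the original post are not shown):

    import pyarrow                      # loading pyarrow first is the trigger
    import rpy2.robjects as robjects

    # With pyarrow already loaded, attaching the R arrow package via rpy2
    # reportedly fails under 0.16.0 (it worked under 0.14.1):
    robjects.r('library(arrow)')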

[jira] [Created] (ARROW-8594) [GLib][Plasma] Check failed: object.data_size == data_size

2020-04-25 Thread Tanveer (Jira)
Tanveer created ARROW-8594:

Summary: [GLib][Plasma] Check failed: object.data_size == data_size
Key: ARROW-8594
URL: https://issues.apache.org/jira/browse/ARROW-8594
Project: Apache Arrow
Issue T

Sending shared pointers from python to R

2020-04-25 Thread Jeffrey Wong
Hello, I am using Arrow Tables to facilitate fast data transfer between Python and R. The strategy below worked with arrow==0.14.1 but is no longer working in arrow==0.16.0. Using pyarrow, I convert a pandas DataFrame to a pyarrow Table, then get the memory address of the underlying Arrow Table
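The message is truncated before the pointer hand-off details; a minimal sketch of the conversion step it does describe, using the public pyarrow API (the mechanism for passing the table's address to R is not visible in the preview):

    import pandas as pd
    import pyarrow as pa

    # Convert a pandas DataFrame to an Arrow Table; the Arrow columnar
    # buffers can then, in principle, be shared with another runtime
    # without copying the data.
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": ["a", "b", "c"]})
    table = pa.Table.from_pandas(df)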

Re: Strategy for Writing a Large Table?

2020-04-25 Thread Hei Chan
I managed to iterate through small chunks of data; for each chunk, I convert it into a pandas.DataFrame, convert that into a Table, and then write it to a Parquet file. Is there any advantage to writing RecordBatches (through pyarrow.RecordBatchFileWriter.write_batch()?) instead of ParquetWriter.write_table
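A minimal sketch of the chunked-write pattern described above, using pyarrow.parquet.ParquetWriter (iter_chunks() is a hypothetical stand-in for the poster's chunk source):

    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    writer = None
    for chunk in iter_chunks():  # hypothetical: yields chunk-sized records
        table = pa.Table.from_pandas(pd.DataFrame(chunk))
        if writer is None:
            # Open the writer lazily so it can reuse the first chunk's schema.
            writer = pq.ParquetWriter("out.parquet", table.schema)
        writer.write_table(table)
    if writer is not None:
        writer.close()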

[NIGHTLY] Arrow Build Report for Job nightly-2020-04-25-0

2020-04-25 Thread Crossbow
Arrow Build Report for Job nightly-2020-04-25-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-25-0

Failed Tasks:
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-25-0-github-debian-stretch-amd64
- tes

[jira] [Created] (ARROW-8593) Parquet file_serialize_test.cc fails to build with musl libc

2020-04-25 Thread Tobias Mayer (Jira)
Tobias Mayer created ARROW-8593:

Summary: Parquet file_serialize_test.cc fails to build with musl libc
Key: ARROW-8593
URL: https://issues.apache.org/jira/browse/ARROW-8593
Project: Apache Arrow