Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Micah Kornfield
Hi Rares, This last issue sounds like you are trying to write data with version 0.16.0 of the library and read it with a pre-0.15.0 version of the Python library. If you want to do this, you need to set "bool write_legacy_ipc_format" to true on the IpcWriterOptions/IpcOptions object and construct the
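The format difference behind this option can be checked without Arrow installed. Since Arrow 0.15.0, each encapsulated IPC message starts with a 4-byte continuation marker (0xFFFFFFFF) followed by a little-endian int32 metadata length, whereas the legacy format starts directly with the length. The following is a minimal sketch under that assumption; the helper name `ipc_framing` is hypothetical and is not part of pyarrow.

```python
import struct

CONTINUATION = 0xFFFFFFFF  # marker that prefixes each message since Arrow 0.15.0


def ipc_framing(prefix: bytes) -> str:
    """Guess the framing of the first IPC message from its leading bytes.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    if len(prefix) < 4:
        return "unknown"  # not enough bytes to decide
    # Read the first four bytes as an unsigned little-endian integer.
    (first,) = struct.unpack("<I", prefix[:4])
    if first == CONTINUATION:
        return "0.15+"
    # Otherwise the bytes are taken to be the legacy metadata length.
    return "legacy"
```

For instance, passing the first 8 bytes of a stream file to `ipc_framing` and getting "0.15+" back would suggest the file was written in the newer format, which a pre-0.15.0 reader is not expected to parse.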

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
With open_stream I get a different error: > python -c "import pyarrow; pyarrow.ipc.open_stream('/tmp/foo')" Traceback (most recent call last): File "<string>", line 1, in <module> File "/usr/local/lib/python2.7/dist-packages/pyarrow/ipc.py", line 137, in open_stream return RecordBatchStreamReader(source)

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Wes McKinney
On Mon, Jun 15, 2020 at 11:24 PM Rares Vernica wrote: > > I was able to reproduce my issue in a small, fully-contained, program. Here > is the source code: > > #include > #include > #include > #include > > arrow::Status foo() { > std::shared_ptr arrowStream; > std::shared_ptr arrowWriter;

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
I was able to reproduce my issue in a small, fully-contained, program. Here is the source code: #include #include #include #include arrow::Status foo() { std::shared_ptr arrowStream; std::shared_ptr arrowWriter; std::shared_ptr arrowBatch; std::shared_ptr arrowReader; std::vector>

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
This is the compiler: > g++ --version g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 And this is how I compile the code: g++ -W -Wextra -Wall -Wno-unused-parameter -Wno-variadic-macros -Wno-strict-aliasing -Wno-long-long -Wno-unused -fPIC -D_STDC_FORMAT_MACROS -Wno-system-headers -O3 -g -DN

Timeline for next major Arrow release (1.0.0)

2020-06-15 Thread Wes McKinney
hi folks, Based on the previous discussions about release timelines, the window for the next major release would be around the week of July 6. Does this sound reasonable? I see that Neal has created a wiki page to help track the burndown https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Wes McKinney
What compiler are you using? In 0.16.0 (what you said you were targeting, though it would be better for you to upgrade to 0.17.1) schema is written in the CheckStarted function here https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/writer.cc#L972 Status CheckStarted() {

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
Sure, here is briefly what I'm doing: bool append = false; std::shared_ptr arrowStream; auto arrowResult = arrow::io::FileOutputStream::Open(fileName, append); arrowStream = arrowResult.ValueOrDie(); std::shared_ptr arrowWriter; std::shared_ptr arrowBatch; std::shared_

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-06-14-0

2020-06-15 Thread Neal Richardson
Thanks for looking into these, Kou! Neal On Sun, Jun 14, 2020 at 2:35 PM Sutou Kouhei wrote: > Hi, > > I took a look at the failed tasks: > > * Dask and kartothek related tasks: > > https://github.com/apache/arrow/pull/7421 breaks these > tasks. > > Details: > https://github.com/apache/a

Re: Using gdb on a test

2020-06-15 Thread Wes McKinney
I also use gdb on the command line for all my debugging. I've always heard good things about CLion for visual debugging + breakpoints on Linux but I haven't invested the time to set it up. On Mon, Jun 15, 2020 at 10:10 AM Antoine Pitrou wrote: > > > I mostly only use gdb on crashes, otherwise I

Re: Using gdb on a test

2020-06-15 Thread Antoine Pitrou
I mostly only use gdb on crashes; otherwise I rely on unit tests and logical analysis. As for a text editor, I use Kate. Regards Antoine. On 15/06/2020 at 17:01, Maarten Breddels wrote: > Thanks, that was it (ran the wrong history command). That running command > should have given a hint :)

Re: Using gdb on a test

2020-06-15 Thread Maarten Breddels
Thanks, that was it (ran the wrong history command). That running command should have given a hint :) Can I ask what people here use for debugging/editor? I'm settling on vscode, and using bare gdb for debugging. cheers, Maarten On Mon, 15 Jun 2020 at 16:47, Francois Saint-Jacques < fs

Re: Using gdb on a test

2020-06-15 Thread Francois Saint-Jacques
As Antoine said, debug mode is probably the most important configuration. You can also try the `relwithdebinfo` if you're trying to debug the optimized code. I'd also add the following: 1. Building out of conda provides a much better integration with gdb and the system's libstdc++ due to the prett

Re: Using gdb on a test

2020-06-15 Thread Antoine Pitrou
Hi Maarten, You should build in debug mode, i.e. pass -DCMAKE_BUILD_TYPE=Debug Regards Antoine. On 15/06/2020 at 16:35, Maarten Breddels wrote: > Hi all, > > I have trouble getting gdb working with a test suite. > Running e.g.: > $ gdb ./release/arrow-compute-scalar-test > I can't set a b

Using gdb on a test

2020-06-15 Thread Maarten Breddels
Hi all, I have trouble getting gdb working with a test suite. Running e.g.: $ gdb ./release/arrow-compute-scalar-test I can't set a breakpoint on e.g. arrow::compute::internal::TransformAsciiUpper in arrow/compute/kernels/scalar_string.cc. Tab completion on arrow::compute::int gives no tab completi

Re: Flight benchmark question

2020-06-15 Thread Wes McKinney
On Mon, Jun 15, 2020 at 8:43 AM Antoine Pitrou wrote: > > > On 15/06/2020 at 15:36, Wes McKinney wrote: > > > > When you have only a single server, all the gRPC traffic goes through > > a common port and is handled by a common server, so if both client and > > server are roughly IO bound you are

Re: Flight benchmark question

2020-06-15 Thread Antoine Pitrou
On 15/06/2020 at 15:36, Wes McKinney wrote: > > When you have only a single server, all the gRPC traffic goes through > a common port and is handled by a common server, so if both client and > server are roughly IO bound you aren't going to get better performance > by hitting the server with m

Re: Flight benchmark question

2020-06-15 Thread Wes McKinney
We had a _very_ similar discussion in April https://lists.apache.org/thread.html/rd2aa01f460dd1092c60d1ba75087c2ce87c81ac543a246549b4713fb%40%3Cdev.arrow.apache.org%3E When you have only a single server, all the gRPC traffic goes through a common port and is handled by a common server, so if both

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Wes McKinney
Can you show the code you are writing? The first thing the stream writer does before writing any record batch is write the schema. It sounds like you are using arrow::ipc::WriteRecordBatch somewhere. On Sun, Jun 14, 2020, 11:44 PM Rares Vernica wrote: > Hello, > > I have a RecordBatch that I wou
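The schema-first behavior described in this message can be pictured with a toy model. This is a sketch in plain Python, not Arrow's code: the class `ToyStreamWriter`, its method names, and the text "messages" it emits are all hypothetical stand-ins chosen for illustration. It only shows the general pattern of emitting the schema once, lazily, before the first record batch.

```python
import io


class ToyStreamWriter:
    """Toy model of schema-first stream writing; not Arrow's implementation."""

    def __init__(self, sink, schema):
        self._sink = sink
        self._schema = schema
        self._started = False

    def _check_started(self):
        # Emit the schema exactly once, before the first batch is written.
        if not self._started:
            self._sink.write(("SCHEMA %s\n" % self._schema).encode())
            self._started = True

    def write_batch(self, batch):
        # Every batch write first makes sure the schema has gone out.
        self._check_started()
        self._sink.write(("BATCH %s\n" % batch).encode())


sink = io.BytesIO()
writer = ToyStreamWriter(sink, "x: int64")
writer.write_batch("[1, 2]")
writer.write_batch("[3]")
lines = sink.getvalue().decode().splitlines()
```

In this model a reader always sees the schema line before any batch line. A batch written to the sink by some path that bypasses the writer would arrive without a schema, which is the kind of situation the message above suspects.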

Flight benchmark question

2020-06-15 Thread Yibo Cai
I'm evaluating the Flight benchmark [1] on a single host and ran into a problem that I would like help with. The Flight benchmark has a "num_threads" parameter [1] to set the "number of concurrent gets". Counter-intuitively, setting it to larger values drops performance, "arrow-flight-benchmark --num_threads=1"

[NIGHTLY] Arrow Build Report for Job nightly-2020-06-15-0

2020-06-15 Thread Crossbow
Arrow Build Report for Job nightly-2020-06-15-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-15-0 Failed Tasks: - test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-15-0-github-test-conda-py