Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-20 Thread Yibo Cai
Great analysis, Weston! It looks like SimpleRecordBatch::column() is not thread-safe for gcc < 5.0, as we are simulating shared_ptr atomic load/store with normal load/store. https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.cc#L80-L87 On 5/21/21 8:15 AM, Weston Pace wrote: I like Yib
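For readers following the thread-safety point above: one conceivable way to keep a lazily filled shared_ptr cache safe when the atomic shared_ptr free functions cannot be relied on is to guard the slots with a mutex. This is only a sketch under that assumption, not what Arrow does; MutexGuardedColumns and CachedColumn are hypothetical stand-ins and are not Arrow types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a boxed column object; not an Arrow type.
struct CachedColumn {
  std::vector<int64_t> values;
};

// Hypothetical cache that boxes each column on first access.
// Every read and write of a slot happens under the same mutex, so no
// thread can observe a half-written shared_ptr.
class MutexGuardedColumns {
 public:
  explicit MutexGuardedColumns(std::vector<std::vector<int64_t>> raw)
      : raw_(std::move(raw)), boxed_(raw_.size()) {}

  std::shared_ptr<CachedColumn> column(std::size_t i) const {
    assert(i < boxed_.size());
    std::lock_guard<std::mutex> lock(mutex_);
    if (!boxed_[i]) {
      // First access: build the boxed column while holding the lock.
      boxed_[i] = std::make_shared<CachedColumn>(CachedColumn{raw_[i]});
    }
    // Return a copy of the shared_ptr so the caller keeps the column alive.
    return boxed_[i];
  }

 private:
  std::vector<std::vector<int64_t>> raw_;
  mutable std::mutex mutex_;
  mutable std::vector<std::shared_ptr<CachedColumn>> boxed_;
};
```

The trade-off is that every access takes the lock, including accesses to slots that are already filled, which is the cost an atomic load/store scheme tries to avoid.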

Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-20 Thread Rares Vernica
The field is always Int64Array. Regarding the arrowBatch *error reading variable* message, we believe this is an artifact of gdb/gcc optimizations. I examined the variable in lower contexts with gdb and it looks fine. We replaced: std::static_pointer_cast<arrow::Int64Array>(_arrowBatch->column(_nAtts + dim))->raw_v

Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-20 Thread Weston Pace
I like Yibo's stack overflow theory given the "error reading variable" but I did confirm that I can cause a segmentation fault if std::atomic_store / std::atomic_load are unavailable. I simulated this by simply commenting out the specializations rather than actually running against GCC 4.9.2 so it ma

Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-20 Thread Wes McKinney
Also, is it possible that the field is not an Int64Array? On Wed, May 19, 2021 at 10:19 PM Yibo Cai wrote: > > On 5/20/21 4:15 AM, Rares Vernica wrote: > > Hello, > > > > I'm using Arrow for accessing data outside the SciDB database engine. It > > generally works fine but we are running into Segm

Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-19 Thread Yibo Cai
On 5/20/21 4:15 AM, Rares Vernica wrote: Hello, I'm using Arrow for accessing data outside the SciDB database engine. It generally works fine but we are running into Segmentation Faults in a corner multi-threaded case. I identified two threads that work on the same Record Batch. I wonder if ther

Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-19 Thread Rares Vernica
Is there a better (safer) way of accessing a specific Int64 cell in a RecordBatch? Currently I'm doing something like this: std::static_pointer_cast<arrow::Int64Array>(batch->column(i))->raw_values()[j] On Wed, May 19, 2021 at 3:09 PM Rares Vernica wrote: > > /opt/rh/devtoolset-3/root/usr/bin/g++ -v > Using built

Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-19 Thread Rares Vernica
> /opt/rh/devtoolset-3/root/usr/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/opt/rh/devtoolset-3/root/usr/bin/g++
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-3/root/usr/libexec/gcc/x86_64-redhat-linux/4.9.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/opt/rh/devtoolse

Re: C++ RecordBatch Debugging Segmentation Fault

2021-05-19 Thread Weston Pace
What compiler / glibc version are you using? arrow::SimpleRecordBatch::column does some non-trivial caching which uses std::atomic_load[1] which is not implemented properly on gcc < 5 so our behavior is different depending on the compiler version. [1] https://en.cppreference.com/w/cpp/atomic/atomi
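To illustrate the kind of caching described above, here is a minimal sketch of a lazily filled column cache that reads and publishes each slot with the std::atomic_load / std::atomic_store overloads for shared_ptr. It is not the Arrow implementation; LazyColumns and BoxedColumn are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a boxed column object; not an Arrow type.
struct BoxedColumn {
  std::vector<int64_t> values;
};

// Hypothetical cache: each slot starts empty and is filled on first access.
class LazyColumns {
 public:
  explicit LazyColumns(std::vector<std::vector<int64_t>> raw)
      : raw_(std::move(raw)), boxed_(raw_.size()) {}

  std::shared_ptr<BoxedColumn> column(std::size_t i) const {
    assert(i < boxed_.size());
    // Atomic read of the slot. A plain read here could race with the
    // store below when two threads ask for the same column.
    std::shared_ptr<BoxedColumn> result = std::atomic_load(&boxed_[i]);
    if (!result) {
      result = std::make_shared<BoxedColumn>(BoxedColumn{raw_[i]});
      // Two threads may each build a box; the last store stays in the
      // cache, and every returned pointer still owns a valid object.
      std::atomic_store(&boxed_[i], result);
    }
    return result;
  }

 private:
  std::vector<std::vector<int64_t>> raw_;
  mutable std::vector<std::shared_ptr<BoxedColumn>> boxed_;
};
```

If the atomic calls are swapped for ordinary loads and stores, concurrent calls to column(i) on the same slot become a data race on the shared_ptr, which is the failure mode this thread is concerned with.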

C++ RecordBatch Debugging Segmentation Fault

2021-05-19 Thread Rares Vernica
Hello, I'm using Arrow for accessing data outside the SciDB database engine. It generally works fine but we are running into Segmentation Faults in a corner multi-threaded case. I identified two threads that work on the same Record Batch. I wonder if there is something internal about RecordBatch t