Great analysis Weston!
Looks like SimpleRecordBatch::column() is not thread-safe for gcc < 5.0, as we are
simulating shared_ptr atomic load/store with plain load/store.
https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.cc#L80-L87
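On 5/21/21 8:15 AM, Weston Pace wrote:
> I like Yibo's stack overflow theory given the "error reading variable"
> but I did confirm that I can cause a segmentation fault if
> std::atomic_store / std::atomic_load are unavailable.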
The field is always Int64Array. Regarding the arrowBatch *error reading
variable* message, we believe this is an artifact of gdb/gcc optimizations.
I examined the variable in lower contexts with gdb and it looks fine.
We replaced:
std::static_pointer_cast(_arrowBatch->column(_nAtts +
dim))->raw_v
I like Yibo's stack overflow theory given the "error reading variable",
but I did confirm that I can cause a segmentation fault if
std::atomic_store / std::atomic_load are unavailable. I simulated
this by simply commenting out the specializations rather than actually
running against GCC 4.9.2 so it ma
Also, is it possible that the field is not an Int64Array?
On Wed, May 19, 2021 at 10:19 PM Yibo Cai wrote:
On 5/20/21 4:15 AM, Rares Vernica wrote:
Hello,
I'm using Arrow for accessing data outside the SciDB database engine. It
generally works fine but we are running into Segmentation Faults in a
corner multi-threaded case. I identified two threads that work on the same
Record Batch. I wonder if ther
Is there a better (safer) way of accessing a specific Int64 cell in a
RecordBatch? Currently I'm doing something like this:
std::static_pointer_cast<arrow::Int64Array>(batch->column(i))->raw_values()[j]
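
One possible way to keep worker threads away from whatever column() does internally is to call it once, on a single thread, and hand the workers only the raw pointer. The sketch below illustrates that idea with stand-in types: FakeInt64Array, FakeBatch and parallel_sum are hypothetical names made up for this example, not Arrow classes, and whether this avoids the crash reported in this thread is an assumption, not something confirmed here.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-ins for arrow::Int64Array / arrow::RecordBatch.
struct FakeInt64Array {
  std::vector<int64_t> data;
  const int64_t* raw_values() const { return data.data(); }
  int64_t length() const { return static_cast<int64_t>(data.size()); }
};

struct FakeBatch {
  std::vector<std::shared_ptr<FakeInt64Array>> columns;
  std::shared_ptr<FakeInt64Array> column(int i) const { return columns[i]; }
};

// Sums column `i` with `num_threads` workers. column() is called exactly
// once, before any worker starts; workers only read through the raw pointer.
int64_t parallel_sum(const FakeBatch& batch, int i, int num_threads) {
  // Holding the shared_ptr keeps the values alive until all workers join.
  std::shared_ptr<FakeInt64Array> col = batch.column(i);
  const int64_t* values = col->raw_values();
  const int64_t n = col->length();

  std::vector<int64_t> partial(num_threads, 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Each worker writes only its own slot of `partial`.
      for (int64_t j = t; j < n; j += num_threads) partial[t] += values[j];
    });
  }
  for (auto& w : workers) w.join();
  return std::accumulate(partial.begin(), partial.end(), int64_t{0});
}
```

The point of the design is that the shared_ptr is resolved once and outlives every worker, so no thread touches the batch's column accessor concurrently.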
On Wed, May 19, 2021 at 3:09 PM Rares Vernica wrote:
> /opt/rh/devtoolset-3/root/usr/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/opt/rh/devtoolset-3/root/usr/bin/g++
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-3/root/usr/libexec/gcc/x86_64-redhat-linux/4.9.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/opt/rh/devtoolse
What compiler / glibc version are you using?
arrow::SimpleRecordBatch::column does some non-trivial caching which
uses std::atomic_load[1] which is not implemented properly on gcc < 5
so our behavior is different depending on the compiler version.
[1] https://en.cppreference.com/w/cpp/atomic/atomi
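
To make the caching pattern described above concrete, here is a minimal standard-library sketch of a lazily filled shared_ptr cache that is read and written through std::atomic_load / std::atomic_store. Column, LazyColumns and run_demo are hypothetical names for illustration and this is not the Arrow implementation. It assumes a standard library that actually provides the shared_ptr overloads of these functions, which per this thread is not the case for gcc < 5.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a lazily created column; not an Arrow class.
struct Column {
  explicit Column(int i) : index(i) {}
  int index;
};

class LazyColumns {
 public:
  explicit LazyColumns(int n) : cache_(n) {}

  // Returns the cached column, creating it on first use. All accesses to
  // the cache slot go through the atomic free functions, so concurrent
  // callers do not race on the shared_ptr itself.
  std::shared_ptr<Column> column(int i) const {
    std::shared_ptr<Column> result = std::atomic_load(&cache_[i]);
    if (!result) {
      result = std::make_shared<Column>(i);
      // Several threads may reach this point; the last store wins, and
      // every caller still returns a valid object.
      std::atomic_store(&cache_[i], result);
    }
    return result;
  }

 private:
  mutable std::vector<std::shared_ptr<Column>> cache_;
};

// Calls column() from several threads at once and counts wrong results.
int run_demo(int num_threads, int num_columns) {
  LazyColumns batch(num_columns);
  std::atomic<int> failures{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < num_columns; ++i) {
        std::shared_ptr<Column> col = batch.column(i);
        if (!col || col->index != i) ++failures;
      }
    });
  }
  for (auto& w : workers) w.join();
  return failures.load();
}
```

If the two atomic calls were replaced by a plain read and a plain assignment of the shared_ptr, concurrent callers would have a data race on the cache slot, which is undefined behavior.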
Hello,
I'm using Arrow for accessing data outside the SciDB database engine. It
generally works fine but we are running into Segmentation Faults in a
corner multi-threaded case. I identified two threads that work on the same
Record Batch. I wonder if there is something internal about RecordBatch
t