Re: C++ RecordBatch Debugging Segmentation Fault

Rares Vernica Wed, 19 May 2021 15:23:36 -0700

Is there a better (safer) way of accessing a specific Int64 cell in a
RecordBatch? Currently I'm doing something like this:


std::static_pointer_cast<arrow::Int64Array>(batch->column(i))->raw_values()[j]

On Wed, May 19, 2021 at 3:09 PM Rares Vernica <rvern...@gmail.com> wrote:

> > /opt/rh/devtoolset-3/root/usr/bin/g++ -v
> Using built-in specs.
> COLLECT_GCC=/opt/rh/devtoolset-3/root/usr/bin/g++
> COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-3/root/usr/libexec/gcc/x86_64-redhat-linux/4.9.2/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/opt/rh/devtoolset-3/root/usr
> --mandir=/opt/rh/devtoolset-3/root/usr/share/man
> --infodir=/opt/rh/devtoolset-3/root/usr/share/info
> --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared
> --enable-threads=posix --enable-checking=release --enable-multilib
> --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
> --enable-gnu-unique-object --enable-linker-build-id
> --enable-languages=c,c++,fortran,lto --enable-plugin
> --with-linker-hash-style=gnu --enable-initfini-array --disable-libgcj
> --with-isl=/builddir/build/BUILD/gcc-4.9.2-20150212/obj-x86_64-redhat-linux/isl-install
> --with-cloog=/builddir/build/BUILD/gcc-4.9.2-20150212/obj-x86_64-redhat-linux/cloog-install
> --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686
> --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)
>
> Installed Packages
> Name        : glibc
> Arch        : x86_64
> Version     : 2.17
> Release     : 307.el7.1
> Size        : 13 M
> Repo        : installed
> From repo   : base
>
>
> On Wed, May 19, 2021 at 2:22 PM Weston Pace <weston.p...@gmail.com> wrote:
>
>> What compiler / glibc version are you using?
>> arrow::SimpleRecordBatch::column does some non-trivial caching which
>> uses std::atomic_load[1] which is not implemented properly on gcc < 5
>> so our behavior is different depending on the compiler version.
>>
>> [1] https://en.cppreference.com/w/cpp/atomic/atomic_load
>>
>> On Wed, May 19, 2021 at 10:15 AM Rares Vernica <rvern...@gmail.com>
>> wrote:
>> >
>> > Hello,
>> >
>> > I'm using Arrow for accessing data outside the SciDB database engine. It
>> > generally works fine, but we are running into segmentation faults in a
>> > multi-threaded corner case. I identified two threads that work on the
>> > same Record Batch. I wonder if there is something internal to RecordBatch
>> > that might help solve the mystery.
>> >
>> > We are using Arrow 0.16.0. The backtrace of the triggering thread looks
>> > like this:
>> >
>> > Program received signal SIGSEGV, Segmentation fault.
>> > [Switching to Thread 0x7fdad5fb4700 (LWP 3748)]
>> > 0x00007fdaa805abe0 in ?? ()
>> > (gdb) thread
>> > [Current thread is 2 (Thread 0x7fdad5fb4700 (LWP 3748))]
>> > (gdb) bt
>> > #0  0x00007fdaa805abe0 in ?? ()
>> > #1  0x0000000000850212 in
>> > std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
>> > #2  0x00007fdae4b1fbf1 in
>> > std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count
>> > (this=0x7fdad5fb1ae8, __in_chrg=<optimized out>) at
>> > /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:666
>> > #3  0x00007fdae4b39d74 in std::__shared_ptr<arrow::Array,
>> > (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fdad5fb1ae0,
>> > __in_chrg=<optimized out>) at
>> > /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:914
>> > #4  0x00007fdae4b39da8 in std::shared_ptr<arrow::Array>::~shared_ptr
>> > (this=0x7fdad5fb1ae0, __in_chrg=<optimized out>) at
>> > /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr.h:93
>> > #5  0x00007fdae4b6a8e1 in scidb::XChunkIterator::getCoord
>> > (this=0x7fdaa807f9f0, dim=1, index=1137) at XArray.cpp:358
>> > #6  0x00007fdae4b68ecb in scidb::XChunkIterator::XChunkIterator
>> > (this=0x7fdaa807f9f0, chunk=..., iterationMode=0, arrowBatch=<error
>> > reading variable: Cannot access memory at address 0xd5fb1b90>) at
>> > XArray.cpp:157
>> > ...
>> >
>> > The backtrace of the other thread working on exactly the same Record
>> > Batch looks like this:
>> >
>> > (gdb) thread
>> > [Current thread is 3 (Thread 0x7fdad61b5700 (LWP 3746))]
>> > (gdb) bt
>> > #0  0x00007fdae3bc1ec7 in arrow::SimpleRecordBatch::column(int) const ()
>> > from /lib64/libarrow.so.16
>> > #1  0x00007fdae4b6a888 in scidb::XChunkIterator::getCoord
>> > (this=0x7fdab00c0bb0, dim=0, index=71) at XArray.cpp:357
>> > #2  0x00007fdae4b6a5a2 in scidb::XChunkIterator::operator++
>> > (this=0x7fdab00c0bb0) at XArray.cpp:305
>> > ...
>> >
>> > In both cases, the last non-Arrow code is the getCoord function
>> > https://github.com/Paradigm4/bridge/blob/master/src/XArray.cpp#L355
>> >
>> >     int64_t XChunkIterator::getCoord(size_t dim, int64_t index)
>> >     {
>> >         return std::static_pointer_cast<arrow::Int64Array>(
>> >             _arrowBatch->column(_nAtts + dim))->raw_values()[index];
>> >     }
>> > ...
>> > std::shared_ptr<const arrow::RecordBatch> _arrowBatch;
>> >
>> > Do you see anything suspicious about this code? What would trigger the
>> > shared_ptr destruction which takes place in thread 2?
>> >
>> > Thank you!
>> > Rares
>>
>
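
The lazy caching that Weston describes in the quoted reply can be illustrated with a minimal, standard-library-only sketch. This is not Arrow's source: FakeArray, LazyBatch and sum_column are hypothetical names invented for illustration, and the sketch only assumes the general shape "load the cached shared_ptr atomically, build it if it is empty, store it atomically". It also shows one way a caller could reduce its exposure to that path: resolve the column once, keep the returned shared_ptr alive, and index through it, rather than calling column() for every cell. On a toolchain where the shared_ptr overloads of std::atomic_load are not implemented properly (the thread mentions gcc < 5), this pattern may not be safe.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Arrow array; NOT the real arrow::Array.
struct FakeArray {
    std::vector<int64_t> values;
};

// Hypothetical stand-in for a record batch that materializes ("boxes")
// each column on first access and caches the result.
class LazyBatch {
public:
    explicit LazyBatch(std::vector<std::vector<int64_t>> data)
        : data_(std::move(data)), boxed_(data_.size()) {}

    std::shared_ptr<FakeArray> column(std::size_t i) const {
        assert(i < boxed_.size());
        // Atomically read the cache slot; other threads may be writing it.
        std::shared_ptr<FakeArray> result = std::atomic_load(&boxed_[i]);
        if (!result) {
            // First access: build the array, then publish it atomically.
            // Two threads may both build one; either result is valid.
            result = std::make_shared<FakeArray>();
            result->values = data_[i];
            std::atomic_store(&boxed_[i], result);
        }
        return result;
    }

private:
    std::vector<std::vector<int64_t>> data_;
    mutable std::vector<std::shared_ptr<FakeArray>> boxed_;
};

// Caller-side pattern: resolve the column once, hold the shared_ptr for
// the duration of the loop, and read cells through the held pointer.
int64_t sum_column(const LazyBatch& batch, std::size_t col) {
    std::shared_ptr<FakeArray> held = batch.column(col);
    const int64_t* raw = held->values.data();
    int64_t sum = 0;
    for (std::size_t j = 0; j < held->values.size(); ++j) {
        sum += raw[j];
    }
    return sum;
}
```

In the getCoord function quoted above, the analogous change would be to resolve each coordinate column once (for example when the iterator is constructed) and store the resulting pointers, so that per-cell access no longer creates and destroys a temporary shared_ptr. Whether that avoids the crash on gcc 4.9.2 is an assumption that would need to be verified.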
