So this particular toolchain mix seems to be broken. Does everything
work if you compile Arrow, the plugin, and the core database with
devtoolset-3? I think the weak link is Arrow C++ compiled with a
non-devtoolset compiler toolchain. If the package maintainers had the
bandwidth, providing both devtoolset-gcc and system-gcc pre-built RPMs
could be interesting (but there are so many devtoolsets; which one
should you use?).

On Thu, Jun 10, 2021 at 6:04 PM Rares Vernica <rvern...@gmail.com> wrote:
>
> Yes, the pre-built binaries are the official RPM packages.
>
> I recompiled 4.0.1 with the default gcc-g++ from CentOS 7 and the Debug flag.
> The segmentation fault occurred. See below for the backtrace.
>
> Please note that the SciDB database as well as the Plug-in where the Arrow
> library is used are compiled with g++ from devtoolset-3. Maybe this problem
> is due to the different versions of the g++ compiler being used...
>
> Also note that the code path that writes Arrow files works fine; it is just
> the path that reads the files that breaks.
>
> > g++ --version
> g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fae877fe700 (LWP 16783)]
> 0x00007fae8eb1e000 in ?? ()
> (gdb) bt
> #0  0x00007fae8eb1e000 in ?? ()
> #1  0x00007fae906bd4d0 in arrow::ipc::ArrayLoader::ReadBuffer
> (this=0x7fae877fa090, offset=0, length=24, out=0x7fae5c004010) at
> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:163
> #2  0x00007fae906bd7b8 in arrow::ipc::ArrayLoader::GetBuffer
> (this=0x7fae877fa090, buffer_index=1, out=0x7fae5c004010) at
> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:199
> #3  0x00007fae906cbfa7 in
> arrow::ipc::ArrayLoader::LoadPrimitive<arrow::Int64Type>
> (this=0x7fae877fa090, type_id=arrow::Type::INT64) at
> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:241
> #4  0x00007fae906c72c7 in arrow::ipc::ArrayLoader::Visit<arrow::Int64Type>
> (this=0x7fae877fa090, type=...) at
> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:300
> #5  0x00007fae906c2bbc in arrow::VisitTypeInline<arrow::ipc::ArrayLoader>
> (type=..., visitor=0x7fae877fa090) at
> /apache-arrow-4.0.1/cpp/src/arrow/visitor_inline.h:89
> #6  0x00007fae906bd545 in arrow::ipc::ArrayLoader::LoadType
> (this=0x7fae877fa090, type=...) at
> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:166
> #7  0x00007fae906bd5f0 in arrow::ipc::ArrayLoader::Load
> (this=0x7fae877fa090, field=0x7fae5c004e38, out=0x7fae5c003f88) at
> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:176
> #8  0x00007fae906b1a92 in arrow::ipc::LoadRecordBatchSubset
> (metadata=0x7fae8ea140f4, schema=std::shared_ptr (count 2, weak 0)
> 0x7fae5c004ea8, inclusion_mask=0x0, context=..., file=0x7fae5c003e50)
>     at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:481
> #9  0x00007fae906b24e7 in arrow::ipc::LoadRecordBatch
> (metadata=0x7fae8ea140f4, schema=std::shared_ptr (count 2, weak 0)
> 0x7fae5c004ea8, inclusion_mask=std::vector<bool> of length 0, capacity 0,
> context=..., file=0x7fae5c003e50)
>     at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:532
> #10 0x00007fae906b35f3 in arrow::ipc::ReadRecordBatchInternal
> (metadata=..., schema=std::shared_ptr (count 2, weak 0) 0x7fae5c004ea8,
> inclusion_mask=std::vector<bool> of length 0, capacity 0, context=...,
> file=0x7fae5c003e50)
>     at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:630
> #11 0x00007fae906bee31 in arrow::ipc::RecordBatchStreamReaderImpl::ReadNext
> (this=0x7fae5c007508, batch=0x7fae877face0) at
> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:837
> #12 0x00007fae912b7349 in scidb::ArrowReader::readObject
> (this=this@entry=0x7fae877fad80,
> name="index/0", reuse=reuse@entry=true, arrowBatch=std::shared_ptr (empty)
> 0x0) at XIndex.cpp:104
> #13 0x00007fae912b89da in scidb::XIndex::load (this=this@entry=0x7fae5c000c00,
> driver=std::shared_ptr (count 3, weak 0) 0x7fae5c003d50, query=warning:
> RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<scidb::Query,
> std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>'
> warning: RTTI symbol not found for class
> 'std::_Sp_counted_ptr_inplace<scidb::Query, std::allocator<scidb::Query>,
> (__gnu_cxx::_Lock_policy)2>'
> std::shared_ptr (count 7, weak 7) 0x7fae680022d0) at XIndex.cpp:284
>
> The plug-in code (i.e., XIndex.cpp) is from here
> https://github.com/Paradigm4/bridge/tree/arrow3
>
> Thanks!
> Rares
>
> On Wed, Jun 9, 2021 at 9:53 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
> > Hi,
> >
> > > Then I went back to the pre-built binaries for 3.0.0 and 4.0.0 from JFrog
> > > and the issue reappeared. I can only infer that it has to do with the way
> > > the pre-built binaries are generated...
> >
> > The pre-built binaries are the official RPM packages, right?
> >
> > They are built with the default gcc-g++ package not g++ from
> > devtoolset-3. This may be related. Could you try building
> > your program with the default gcc-g++ package?
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In <calq9kxaxnyayqohuj3n0cknrbp6wbtxvj2pog7hcb0icy2r...@mail.gmail.com>
> >   "Re: C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only"
> > on Wed, 9 Jun 2021 21:39:04 -0700,
> >   Rares Vernica <rvern...@gmail.com> wrote:
> >
> > > I got the apache-arrow-4.0.1 source and compiled it with the Debug flag. No
> > > segmentation fault occurred. I then removed the Debug flag and still no
> > > segmentation fault. I then tried the 4.0.0 source. Still no issues.
> > > Finally, I tried the 3.0.0 source and still no issues.
> > >
> > > Then I went back to the pre-built binaries for 3.0.0 and 4.0.0 from JFrog
> > > and the issue reappeared. I can only infer that it has to do with the way
> > > the pre-built binaries are generated...
> > >
> > > Here is how I compiled the Arrow sources on my CentOS 7.
> > >
> > > release$ cmake3 -DARROW_WITH_ZLIB=ON
> > > -DCMAKE_C_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/gcc
> > > -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/g++ ..
> > >
> > > Thanks,
> > > Rares
> > >
> > > On Tue, Jun 8, 2021 at 5:37 PM Sutou Kouhei <k...@clear-code.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Could you try building Apache Arrow C++ with
> > >> -DCMAKE_BUILD_TYPE=Debug and get backtrace again? It will
> > >> show the source location on segmentation fault.
> > >>
> > >> Thanks,
> > >> --
> > >> kou
> > >>
> > >> In <calq9kxa8sh07shuckhka9fuzu2n87tbydlp--aahgcwkfwo...@mail.gmail.com>
> > >>   "C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only" on
> > >> Tue, 8 Jun 2021 12:01:27 -0700,
> > >>   Rares Vernica <rvern...@gmail.com> wrote:
> > >>
> > >> > Hello,
> > >> >
> > >> > We recently migrated our C++ Arrow code from 0.16 to 3.0.0. The code works
> > >> > fine on Ubuntu, but we get a segmentation fault in CentOS while reading
> > >> > Arrow Record Batch files. We can successfully read the files from Python or
> > >> > Ubuntu so the files and the writer are fine.
> > >> >
> > >> > We use Record Batch Stream Reader/Writer to read/write data to files.
> > >> > Sometimes we use GZIP to compress the streams. The migration to 3.0.0 was
> > >> > pretty straightforward with minimal changes to the code:
> > >> > https://github.com/Paradigm4/bridge/commit/03e896e84230ddb41bfef68cde5ed9b21192a0e9
> > >> > We have an extensive test suite and all is good on Ubuntu. On CentOS the
> > >> > write works OK but we get a segmentation fault during reading from C++. We
> > >> > can successfully read the files using PyArrow. Moreover, the files written
> > >> > by CentOS can be successfully read from C++ in Ubuntu.
> > >> >
> > >> > Here is the backtrace I got from gdb when the segmentation fault occurred:
> > >> >
> > >> > Program received signal SIGSEGV, Segmentation fault.
> > >> > [Switching to Thread 0x7f548c7fb700 (LWP 2649)]
> > >> > 0x00007f545c003340 in ?? ()
> > >> > (gdb) bt
> > >> > #0  0x00007f545c003340 in ?? ()
> > >> > #1  0x00007f54903377ce in arrow::ipc::ArrayLoader::GetBuffer(int,
> > >> > std::shared_ptr<arrow::Buffer>*) () from /lib64/libarrow.so.300
> > >> > #2  0x00007f549034006c in arrow::Status
> > >> > arrow::VisitTypeInline<arrow::ipc::ArrayLoader>(arrow::DataType const&,
> > >> > arrow::ipc::ArrayLoader*) () from /lib64/libarrow.so.300
> > >> > #3  0x00007f5490340db4 in arrow::ipc::ArrayLoader::Load(arrow::Field
> > >> > const*, arrow::ArrayData*) () from /lib64/libarrow.so.300
> > >> > #4  0x00007f5490318b5b in
> > >> > arrow::ipc::LoadRecordBatchSubset(org::apache::arrow::flatbuf::RecordBatch
> > >> > const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool,
> > >> > std::allocator<bool> > const*, arrow::ipc::DictionaryMemo const*,
> > >> > arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion,
> > >> > arrow::Compression::type, arrow::io::RandomAccessFile*) () from
> > >> > /lib64/libarrow.so.300
> > >> > #5  0x00007f549031952a in
> > >> > arrow::ipc::LoadRecordBatch(org::apache::arrow::flatbuf::RecordBatch
> > >> > const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool,
> > >> > std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*,
> > >> > arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion,
> > >> > arrow::Compression::type, arrow::io::RandomAccessFile*) () from
> > >> > /lib64/libarrow.so.300
> > >> > #6  0x00007f54903197ce in arrow::ipc::ReadRecordBatchInternal(arrow::Buffer
> > >> > const&, std::shared_ptr<arrow::Schema> const&, std::vector<bool,
> > >> > std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*,
> > >> > arrow::ipc::IpcReadOptions const&, arrow::io::RandomAccessFile*) () from
> > >> > /lib64/libarrow.so.300
> > >> > #7  0x00007f5490345d9c in
> > >> > arrow::ipc::RecordBatchStreamReaderImpl::ReadNext(std::shared_ptr<arrow::RecordBatch>*)
> > >> > () from /lib64/libarrow.so.300
> > >> > #8  0x00007f549109b479 in scidb::ArrowReader::readObject
> > >> > (this=this@entry=0x7f548c7f7d80,
> > >> > name="index/0", reuse=reuse@entry=true, arrowBatch=std::shared_ptr (empty)
> > >> > 0x0) at XIndex.cpp:104
> > >> > #9  0x00007f549109cb0a in scidb::XIndex::load (this=this@entry=0x7f545c003ab0,
> > >> > driver=std::shared_ptr (count 3, weak 0) 0x7f545c003e70, query=warning:
> > >> > RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<scidb::Query,
> > >> > std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>'
> > >> > warning: RTTI symbol not found for class
> > >> > 'std::_Sp_counted_ptr_inplace<scidb::Query, std::allocator<scidb::Query>,
> > >> > (__gnu_cxx::_Lock_policy)2>'
> > >> > std::shared_ptr (count 7, weak 7) 0x7f546c005330) at XIndex.cpp:286
> > >> >
> > >> > I also tried Arrow 4.0.0. The code compiled just fine and the behavior was
> > >> > the same, with the same backtrace.
> > >> >
> > >> > The code where the segmentation fault occurs is trying to read a GZIP
> > >> > compressed Record Batch Stream. The file is 144 bytes and has only one
> > >> > column with three int64 values.
> > >> >
> > >> >> file 0
> > >> > 0: gzip compressed data, from Unix
> > >> >
> > >> >> stat 0
> > >> >   File: ‘0’
> > >> >   Size: 144       Blocks: 8          IO Block: 4096   regular file
> > >> > Device: 10302h/66306d Inode: 33715444    Links: 1
> > >> > Access: (0644/-rw-r--r--)  Uid: ( 1001/   scidb)   Gid: ( 1001/   scidb)
> > >> > Context: unconfined_u:object_r:user_tmp_t:s0
> > >> > Access: 2021-06-08 04:42:28.653548604 +0000
> > >> > Modify: 2021-06-08 04:14:14.638927052 +0000
> > >> > Change: 2021-06-08 04:40:50.221279208 +0000
> > >> >  Birth: -
> > >> >
> > >> > In [29]: s = pyarrow.input_stream('/tmp/bridge/foo/index/0',
> > >> > compression='gzip')
> > >> > In [30]: b = pyarrow.RecordBatchStreamReader(s)
> > >> > In [31]: t = b.read_all()
> > >> > In [32]: t.columns
> > >> > Out[32]:
> > >> > [<pyarrow.lib.ChunkedArray object at 0x7fefb5a552b0>
> > >> >  [
> > >> >    [
> > >> >      0,
> > >> >      5,
> > >> >      10
> > >> >    ]
> > >> >  ]]
> > >> >
> > >> > I removed the GZIP compression in both the writer and the reader but the
> > >> > issue persists. So I don't think it is because of the compression.
> > >> >
> > >> > Here is the ldd on the library file which contains the reader and writers
> > >> > that use the Arrow library. It is built on a CentOS 7 with the g++ 4.9.2
> > >> > compiler.
> > >> >
> > >> >> ldd libbridge.so
> > >> > linux-vdso.so.1 =>  (0x00007fffe4f10000)
> > >> > libarrow.so.300 => /lib64/libarrow.so.300 (0x00007f8a38908000)
> > >> > libaws-cpp-sdk-s3.so => /opt/aws/lib64/libaws-cpp-sdk-s3.so
> > >> > (0x00007f8a384b3000)
> > >> > libm.so.6 => /lib64/libm.so.6 (0x00007f8a381b1000)
> > >> > librt.so.1 => /lib64/librt.so.1 (0x00007f8a37fa9000)
> > >> > libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a37da5000)
> > >> > libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8a37a9e000)
> > >> > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a37888000)
> > >> > libc.so.6 => /lib64/libc.so.6 (0x00007f8a374ba000)
> > >> > libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f8a37057000)
> > >> > libssl.so.10 => /lib64/libssl.so.10 (0x00007f8a36de5000)
> > >> > libbrotlienc.so.1 => /lib64/libbrotlienc.so.1 (0x00007f8a36b58000)
> > >> > libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007f8a3694b000)
> > >> > libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 (0x00007f8a3672b000)
> > >> > libutf8proc.so.1 => /lib64/libutf8proc.so.1 (0x00007f8a3647b000)
> > >> > libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f8a3626b000)
> > >> > liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f8a3605c000)
> > >> > libsnappy.so.1 => /lib64/libsnappy.so.1 (0x00007f8a35e56000)
> > >> > libz.so.1 => /lib64/libz.so.1 (0x00007f8a35c40000)
> > >> > libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f8a3593a000)
> > >> > libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a3571e000)
> > >> > /lib64/ld-linux-x86-64.so.2 (0x00007f8a39b67000)
> > >> > libaws-cpp-sdk-core.so => /opt/aws/lib64/libaws-cpp-sdk-core.so
> > >> > (0x00007f8a35413000)
> > >> > libaws-c-event-stream.so.0unstable =>
> > >> > /opt/aws/lib64/libaws-c-event-stream.so.0unstable (0x00007f8a3520b000)
> > >> > libaws-c-common.so.0unstable =>
> > >> > /opt/aws/lib64/libaws-c-common.so.0unstable (0x00007f8a34fd9000)
> > >> > libaws-checksums.so => /opt/aws/lib64/libaws-checksums.so
> > >> > (0x00007f8a34dce000)
> > >> > libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f8a34b81000)
> > >> > libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f8a34898000)
> > >> > libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f8a34694000)
> > >> > libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f8a34461000)
> > >> > libcurl.so.4 => /opt/curl/lib/libcurl.so.4 (0x00007f8a341ea000)
> > >> > libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f8a33fda000)
> > >> > libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f8a33dd6000)
> > >> > libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f8a33bbc000)
> > >> > libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f8a33995000)
> > >> > libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f8a33733000)
> > >> >
> > >> >> /opt/rh/devtoolset-3/root/usr/bin/g++ --version
> > >> > g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
> > >> >
> > >> > Does any of this ring a bell?
> > >> >
> > >> > Thank you!
> > >> > Rares
> > >>
> >
