FWIW, this CI C++ build script contains what I think is a comprehensive
list of the CMake options supported by Arrow, along with their common
default values:
https://github.com/apache/arrow/blob/master/ci/scripts/cpp_build.sh#L47-L132

I am not sure whether this exact set of defaults is used to build the
official packages (for example, the script enables debug mode), but I
thought it might help in some way.

~Eduardo


On Fri, Jun 11, 2021 at 7:12 PM Rares Vernica <rvern...@gmail.com> wrote:

> Understood. Yes, if I compile it with devtoolset-3 it all works as
> expected. We ran our test suite on it (database + plugin + Arrow) and it
> passed.
>
> Which CMake options are set when the official package is built? We
> built with all the options left at their defaults and realized that zlib
> compression was not enabled, so we went back and enabled it. Which other
> options are enabled in the official release package?
>
> Thank you!
> Rares
>
> On Fri, Jun 11, 2021 at 11:18 AM Antoine Pitrou <anto...@python.org>
> wrote:
>
> >
> > > On 11/06/2021 at 20:10, Wes McKinney wrote:
> > > So this particular toolchain mix seems to be broken. Does everything
> > > work if you compile Arrow, the plugin, and the core database with
> > > devtoolset-3? I think the weak link is Arrow C++ compiled with a
> > > non-devtoolset compiler toolchain.
> >
> > This "toolchain mix" concern seems potentially similar to the issue we
> > had with Tensorflow wheels that were built with a different toolchain
> > than other manylinux1 wheels, producing crashes when both PyArrow and
> > Tensorflow were loaded in memory.
> >
> > It is probably expected that the Arrow C++ package for CentOS is
> > compiled with the default compiler for that CentOS version.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > > If there were the package
> > > maintainer bandwidth, having both devtoolset-gcc and system-gcc
> > > pre-built RPMs would be potentially interesting (but there are so many
> > > devtoolsets, which one should you use?).
> > >
> > > On Thu, Jun 10, 2021 at 6:04 PM Rares Vernica <rvern...@gmail.com>
> > wrote:
> > >>
> > >> Yes, the pre-built binaries are the official RPM packages.
> > >>
> > >> I recompiled 4.0.1 with the default gcc-g++ from CentOS 7 and the
> > >> Debug flag. The segmentation fault occurred. See below for the
> > >> backtrace.
> > >>
> > >> Please note that the SciDB database, as well as the plug-in that
> > >> uses the Arrow library, is compiled with g++ from devtoolset-3. Maybe
> > >> this problem is due to the different g++ versions being used...
> > >>
> > >> Also note that the code path that writes Arrow files works fine; it
> > >> is just the path that reads the files that breaks.
> > >>
> > >>> g++ --version
> > >> g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
> > >>
> > >> Program received signal SIGSEGV, Segmentation fault.
> > >> [Switching to Thread 0x7fae877fe700 (LWP 16783)]
> > >> 0x00007fae8eb1e000 in ?? ()
> > >> (gdb) bt
> > >> #0  0x00007fae8eb1e000 in ?? ()
> > >> #1  0x00007fae906bd4d0 in arrow::ipc::ArrayLoader::ReadBuffer
> > >> (this=0x7fae877fa090, offset=0, length=24, out=0x7fae5c004010) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:163
> > >> #2  0x00007fae906bd7b8 in arrow::ipc::ArrayLoader::GetBuffer
> > >> (this=0x7fae877fa090, buffer_index=1, out=0x7fae5c004010) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:199
> > >> #3  0x00007fae906cbfa7 in
> > >> arrow::ipc::ArrayLoader::LoadPrimitive<arrow::Int64Type>
> > >> (this=0x7fae877fa090, type_id=arrow::Type::INT64) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:241
> > >> #4  0x00007fae906c72c7 in
> > arrow::ipc::ArrayLoader::Visit<arrow::Int64Type>
> > >> (this=0x7fae877fa090, type=...) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:300
> > >> #5  0x00007fae906c2bbc in
> > arrow::VisitTypeInline<arrow::ipc::ArrayLoader>
> > >> (type=..., visitor=0x7fae877fa090) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/visitor_inline.h:89
> > >> #6  0x00007fae906bd545 in arrow::ipc::ArrayLoader::LoadType
> > >> (this=0x7fae877fa090, type=...) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:166
> > >> #7  0x00007fae906bd5f0 in arrow::ipc::ArrayLoader::Load
> > >> (this=0x7fae877fa090, field=0x7fae5c004e38, out=0x7fae5c003f88) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:176
> > >> #8  0x00007fae906b1a92 in arrow::ipc::LoadRecordBatchSubset
> > >> (metadata=0x7fae8ea140f4, schema=std::shared_ptr (count 2, weak 0)
> > >> 0x7fae5c004ea8, inclusion_mask=0x0, context=..., file=0x7fae5c003e50)
> > >>      at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:481
> > >> #9  0x00007fae906b24e7 in arrow::ipc::LoadRecordBatch
> > >> (metadata=0x7fae8ea140f4, schema=std::shared_ptr (count 2, weak 0)
> > >> 0x7fae5c004ea8, inclusion_mask=std::vector<bool> of length 0, capacity
> > 0,
> > >> context=..., file=0x7fae5c003e50)
> > >>      at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:532
> > >> #10 0x00007fae906b35f3 in arrow::ipc::ReadRecordBatchInternal
> > >> (metadata=..., schema=std::shared_ptr (count 2, weak 0)
> 0x7fae5c004ea8,
> > >> inclusion_mask=std::vector<bool> of length 0, capacity 0, context=...,
> > >> file=0x7fae5c003e50)
> > >>      at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:630
> > >> #11 0x00007fae906bee31 in
> > arrow::ipc::RecordBatchStreamReaderImpl::ReadNext
> > >> (this=0x7fae5c007508, batch=0x7fae877face0) at
> > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:837
> > >> #12 0x00007fae912b7349 in scidb::ArrowReader::readObject
> > >> (this=this@entry=0x7fae877fad80,
> > >> name="index/0", reuse=reuse@entry=true, arrowBatch=std::shared_ptr
> > (empty)
> > >> 0x0) at XIndex.cpp:104
> > >> #13 0x00007fae912b89da in scidb::XIndex::load (this=this@entry
> > =0x7fae5c000c00,
> > >> driver=std::shared_ptr (count 3, weak 0) 0x7fae5c003d50,
> query=warning:
> > >> RTTI symbol not found for class
> > 'std::_Sp_counted_ptr_inplace<scidb::Query,
> > >> std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>'
> > >> warning: RTTI symbol not found for class
> > >> 'std::_Sp_counted_ptr_inplace<scidb::Query,
> > std::allocator<scidb::Query>,
> > >> (__gnu_cxx::_Lock_policy)2>'
> > >> std::shared_ptr (count 7, weak 7) 0x7fae680022d0) at XIndex.cpp:284
> > >>
> > >> The plug-in code (i.e., XIndex.cpp) is here:
> > >> https://github.com/Paradigm4/bridge/tree/arrow3
> > >>
> > >> Thanks!
> > >> Rares
> > >>
> > >> On Wed, Jun 9, 2021 at 9:53 PM Sutou Kouhei <k...@clear-code.com>
> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>>> Then I went back to the pre-built binaries for 3.0.0 and 4.0.0
> > >>>> from JFrog and the issue reappeared. I can only infer that it has
> > >>>> to do with the way the pre-built binaries are generated...
> > >>>
> > >>> The pre-built binaries are the official RPM packages, right?
> > >>>
> > >>> They are built with the default gcc-g++ package, not g++ from
> > >>> devtoolset-3. This may be related. Could you try building
> > >>> your program with the default gcc-g++ package?
> > >>>
> > >>>
> > >>> Thanks,
> > >>> --
> > >>> kou
> > >>>
> > >>> In <
> calq9kxaxnyayqohuj3n0cknrbp6wbtxvj2pog7hcb0icy2r...@mail.gmail.com
> > >
> > >>>    "Re: C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS
> > only"
> > >>> on Wed, 9 Jun 2021 21:39:04 -0700,
> > >>>    Rares Vernica <rvern...@gmail.com> wrote:
> > >>>
> > >>>> I got the apache-arrow-4.0.1 source and compiled it with the
> > >>>> Debug flag. No segmentation fault occurred. I then removed the
> > >>>> Debug flag and still no segmentation fault. I then tried the 4.0.0
> > >>>> source. Still no issues. Finally, I tried the 3.0.0 source and
> > >>>> still no issues.
> > >>>>
> > >>>> Then I went back to the pre-built binaries for 3.0.0 and 4.0.0
> > >>>> from JFrog and the issue reappeared. I can only infer that it has
> > >>>> to do with the way the pre-built binaries are generated...
> > >>>>
> > >>>> Here is how I compiled the Arrow sources on my CentOS 7.
> > >>>>
> > >>>> release$ cmake3 -DARROW_WITH_ZLIB=ON
> > >>>> -DCMAKE_C_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/gcc
> > >>>> -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/g++ ..
> > >>>>
> > >>>> Thanks,
> > >>>> Rares
> > >>>>
> > >>>> On Tue, Jun 8, 2021 at 5:37 PM Sutou Kouhei <k...@clear-code.com>
> > wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> Could you try building Apache Arrow C++ with
> > >>>>> -DCMAKE_BUILD_TYPE=Debug and getting the backtrace again? It will
> > >>>>> show the source location of the segmentation fault.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> --
> > >>>>> kou
> > >>>>>
> > >>>>> In <
> > calq9kxa8sh07shuckhka9fuzu2n87tbydlp--aahgcwkfwo...@mail.gmail.com>
> > >>>>>    "C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS
> > only" on
> > >>>>> Tue, 8 Jun 2021 12:01:27 -0700,
> > >>>>>    Rares Vernica <rvern...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> We recently migrated our C++ Arrow code from 0.16 to 3.0.0. The
> > >>>>>> code works fine on Ubuntu, but we get a segmentation fault on
> > >>>>>> CentOS while reading Arrow record batch files. We can successfully
> > >>>>>> read the files from Python or on Ubuntu, so the files and the
> > >>>>>> writer are fine.
> > >>>>>>
> > >>>>>> We use Record Batch Stream Reader/Writer to read/write data to
> > >>>>>> files. Sometimes we use GZIP to compress the streams. The
> > >>>>>> migration to 3.0.0 was pretty straightforward, with minimal
> > >>>>>> changes to the code:
> > >>>>>> https://github.com/Paradigm4/bridge/commit/03e896e84230ddb41bfef68cde5ed9b21192a0e9
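> > >>>>>>
> > >>>>>> For context, the read path boils down to roughly the sketch below
> > >>>>>> (not the actual XIndex.cpp code; the function name, path handling,
> > >>>>>> and error handling are illustrative, using the Arrow 3.0/4.0 C++
> > >>>>>> API):
> > >>>>>>
> > >>>>>> #include <arrow/api.h>
> > >>>>>> #include <arrow/io/api.h>
> > >>>>>> #include <arrow/ipc/api.h>
> > >>>>>> #include <arrow/util/compression.h>
> > >>>>>>
> > >>>>>> // Read all record batches from a GZIP-compressed IPC stream file.
> > >>>>>> arrow::Status ReadBatches(const std::string& path) {
> > >>>>>>   // Open the file and wrap it in a GZIP-decompressing stream.
> > >>>>>>   ARROW_ASSIGN_OR_RAISE(auto file, arrow::io::ReadableFile::Open(path));
> > >>>>>>   ARROW_ASSIGN_OR_RAISE(
> > >>>>>>       auto codec, arrow::util::Codec::Create(arrow::Compression::GZIP));
> > >>>>>>   ARROW_ASSIGN_OR_RAISE(
> > >>>>>>       auto input, arrow::io::CompressedInputStream::Make(codec.get(), file));
> > >>>>>>   // Open a record batch stream reader on the decompressed bytes.
> > >>>>>>   ARROW_ASSIGN_OR_RAISE(
> > >>>>>>       auto reader, arrow::ipc::RecordBatchStreamReader::Open(input));
> > >>>>>>   std::shared_ptr<arrow::RecordBatch> batch;
> > >>>>>>   while (true) {
> > >>>>>>     // The segmentation fault reported below occurs inside ReadNext().
> > >>>>>>     ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
> > >>>>>>     if (batch == nullptr) break;  // end of stream
> > >>>>>>     // ... use batch ...
> > >>>>>>   }
> > >>>>>>   return arrow::Status::OK();
> > >>>>>> }
> > >>>>>>
> > >>>>>> A writer-side sketch would use arrow::io::CompressedOutputStream
> > >>>>>> and arrow::ipc::MakeStreamWriter analogously.
> > >>>>>>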
> > >>>>>> We have an extensive test suite and all is good on Ubuntu. On
> > >>>>>> CentOS the write works OK, but we get a segmentation fault during
> > >>>>>> reading from C++. We can successfully read the files using
> > >>>>>> PyArrow. Moreover, the files written on CentOS can be successfully
> > >>>>>> read from C++ on Ubuntu.
> > >>>>>>
> > >>>>>> Here is the backtrace I got from gdb when the segmentation fault
> > >>>>>> occurred:
> > >>>>>>
> > >>>>>> Program received signal SIGSEGV, Segmentation fault.
> > >>>>>> [Switching to Thread 0x7f548c7fb700 (LWP 2649)]
> > >>>>>> 0x00007f545c003340 in ?? ()
> > >>>>>> (gdb) bt
> > >>>>>> #0  0x00007f545c003340 in ?? ()
> > >>>>>> #1  0x00007f54903377ce in arrow::ipc::ArrayLoader::GetBuffer(int,
> > >>>>>> std::shared_ptr<arrow::Buffer>*) () from /lib64/libarrow.so.300
> > >>>>>> #2  0x00007f549034006c in arrow::Status
> > >>>>>> arrow::VisitTypeInline<arrow::ipc::ArrayLoader>(arrow::DataType
> > >>> const&,
> > >>>>>> arrow::ipc::ArrayLoader*) () from /lib64/libarrow.so.300
> > >>>>>> #3  0x00007f5490340db4 in
> arrow::ipc::ArrayLoader::Load(arrow::Field
> > >>>>>> const*, arrow::ArrayData*) () from /lib64/libarrow.so.300
> > >>>>>> #4  0x00007f5490318b5b in
> > >>>>>>
> > >>>>>
> > >>>
> >
> arrow::ipc::LoadRecordBatchSubset(org::apache::arrow::flatbuf::RecordBatch
> > >>>>>> const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool,
> > >>>>>> std::allocator<bool> > const*, arrow::ipc::DictionaryMemo const*,
> > >>>>>> arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion,
> > >>>>>> arrow::Compression::type, arrow::io::RandomAccessFile*) () from
> > >>>>>> /lib64/libarrow.so.300
> > >>>>>> #5  0x00007f549031952a in
> > >>>>>>
> arrow::ipc::LoadRecordBatch(org::apache::arrow::flatbuf::RecordBatch
> > >>>>>> const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool,
> > >>>>>> std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*,
> > >>>>>> arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion,
> > >>>>>> arrow::Compression::type, arrow::io::RandomAccessFile*) () from
> > >>>>>> /lib64/libarrow.so.300
> > >>>>>> #6  0x00007f54903197ce in
> > >>>>> arrow::ipc::ReadRecordBatchInternal(arrow::Buffer
> > >>>>>> const&, std::shared_ptr<arrow::Schema> const&, std::vector<bool,
> > >>>>>> std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*,
> > >>>>>> arrow::ipc::IpcReadOptions const&, arrow::io::RandomAccessFile*)
> ()
> > >>> from
> > >>>>>> /lib64/libarrow.so.300
> > >>>>>> #7  0x00007f5490345d9c in
> > >>>>>>
> > >>>>>
> > >>>
> >
> arrow::ipc::RecordBatchStreamReaderImpl::ReadNext(std::shared_ptr<arrow::RecordBatch>*)
> > >>>>>> () from /lib64/libarrow.so.300
> > >>>>>> #8  0x00007f549109b479 in scidb::ArrowReader::readObject
> > >>>>>> (this=this@entry=0x7f548c7f7d80,
> > >>>>>> name="index/0", reuse=reuse@entry=true,
> arrowBatch=std::shared_ptr
> > >>>>> (empty)
> > >>>>>> 0x0) at XIndex.cpp:104
> > >>>>>> #9  0x00007f549109cb0a in scidb::XIndex::load (this=this@entry
> > >>>>> =0x7f545c003ab0,
> > >>>>>> driver=std::shared_ptr (count 3, weak 0) 0x7f545c003e70,
> > >>> query=warning:
> > >>>>>> RTTI symbol not found for class
> > >>>>> 'std::_Sp_counted_ptr_inplace<scidb::Query,
> > >>>>>> std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>'
> > >>>>>> warning: RTTI symbol not found for class
> > >>>>>> 'std::_Sp_counted_ptr_inplace<scidb::Query,
> > >>> std::allocator<scidb::Query>,
> > >>>>>> (__gnu_cxx::_Lock_policy)2>'
> > >>>>>> std::shared_ptr (count 7, weak 7) 0x7f546c005330) at
> XIndex.cpp:286
> > >>>>>>
> > >>>>>> I also tried Arrow 4.0.0. The code compiled just fine and the
> > >>>>>> behavior was the same, with the same backtrace.
> > >>>>>>
> > >>>>>> The code where the segmentation fault occurs is trying to read a
> > >>>>>> GZIP-compressed record batch stream. The file is 144 bytes and has
> > >>>>>> only one column with three int64 values.
> > >>>>>>
> > >>>>>>> file 0
> > >>>>>> 0: gzip compressed data, from Unix
> > >>>>>>
> > >>>>>>> stat 0
> > >>>>>>    File: ‘0’
> > >>>>>>    Size: 144       Blocks: 8          IO Block: 4096   regular
> file
> > >>>>>> Device: 10302h/66306d Inode: 33715444    Links: 1
> > >>>>>> Access: (0644/-rw-r--r--)  Uid: ( 1001/   scidb)   Gid: ( 1001/
> > >>>   scidb)
> > >>>>>> Context: unconfined_u:object_r:user_tmp_t:s0
> > >>>>>> Access: 2021-06-08 04:42:28.653548604 +0000
> > >>>>>> Modify: 2021-06-08 04:14:14.638927052 +0000
> > >>>>>> Change: 2021-06-08 04:40:50.221279208 +0000
> > >>>>>>   Birth: -
> > >>>>>>
> > >>>>>> In [29]: s = pyarrow.input_stream('/tmp/bridge/foo/index/0',
> > >>>>>> compression='gzip')
> > >>>>>> In [30]: b = pyarrow.RecordBatchStreamReader(s)
> > >>>>>> In [31]: t = b.read_all()
> > >>>>>> In [32]: t.columns
> > >>>>>> Out[32]:
> > >>>>>> [<pyarrow.lib.ChunkedArray object at 0x7fefb5a552b0>
> > >>>>>>   [
> > >>>>>>     [
> > >>>>>>       0,
> > >>>>>>       5,
> > >>>>>>       10
> > >>>>>>     ]
> > >>>>>>   ]]
> > >>>>>>
> > >>>>>> I removed the GZIP compression in both the writer and the reader,
> > >>>>>> but the issue persists. So I don't think it is because of the
> > >>>>>> compression.
> > >>>>>>
> > >>>>>> Here is the ldd output for the library file that contains the
> > >>>>>> reader and writers that use the Arrow library. It is built on
> > >>>>>> CentOS 7 with the g++ 4.9.2 compiler.
> > >>>>>>
> > >>>>>>> ldd libbridge.so
> > >>>>>> linux-vdso.so.1 =>  (0x00007fffe4f10000)
> > >>>>>> libarrow.so.300 => /lib64/libarrow.so.300 (0x00007f8a38908000)
> > >>>>>> libaws-cpp-sdk-s3.so => /opt/aws/lib64/libaws-cpp-sdk-s3.so
> > >>>>>> (0x00007f8a384b3000)
> > >>>>>> libm.so.6 => /lib64/libm.so.6 (0x00007f8a381b1000)
> > >>>>>> librt.so.1 => /lib64/librt.so.1 (0x00007f8a37fa9000)
> > >>>>>> libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a37da5000)
> > >>>>>> libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8a37a9e000)
> > >>>>>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a37888000)
> > >>>>>> libc.so.6 => /lib64/libc.so.6 (0x00007f8a374ba000)
> > >>>>>> libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f8a37057000)
> > >>>>>> libssl.so.10 => /lib64/libssl.so.10 (0x00007f8a36de5000)
> > >>>>>> libbrotlienc.so.1 => /lib64/libbrotlienc.so.1 (0x00007f8a36b58000)
> > >>>>>> libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007f8a3694b000)
> > >>>>>> libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1
> > >>> (0x00007f8a3672b000)
> > >>>>>> libutf8proc.so.1 => /lib64/libutf8proc.so.1 (0x00007f8a3647b000)
> > >>>>>> libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f8a3626b000)
> > >>>>>> liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f8a3605c000)
> > >>>>>> libsnappy.so.1 => /lib64/libsnappy.so.1 (0x00007f8a35e56000)
> > >>>>>> libz.so.1 => /lib64/libz.so.1 (0x00007f8a35c40000)
> > >>>>>> libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f8a3593a000)
> > >>>>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a3571e000)
> > >>>>>> /lib64/ld-linux-x86-64.so.2 (0x00007f8a39b67000)
> > >>>>>> libaws-cpp-sdk-core.so => /opt/aws/lib64/libaws-cpp-sdk-core.so
> > >>>>>> (0x00007f8a35413000)
> > >>>>>> libaws-c-event-stream.so.0unstable =>
> > >>>>>> /opt/aws/lib64/libaws-c-event-stream.so.0unstable
> > (0x00007f8a3520b000)
> > >>>>>> libaws-c-common.so.0unstable =>
> > >>>>> /opt/aws/lib64/libaws-c-common.so.0unstable
> > >>>>>> (0x00007f8a34fd9000)
> > >>>>>> libaws-checksums.so => /opt/aws/lib64/libaws-checksums.so
> > >>>>>> (0x00007f8a34dce000)
> > >>>>>> libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2
> > (0x00007f8a34b81000)
> > >>>>>> libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f8a34898000)
> > >>>>>> libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f8a34694000)
> > >>>>>> libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f8a34461000)
> > >>>>>> libcurl.so.4 => /opt/curl/lib/libcurl.so.4 (0x00007f8a341ea000)
> > >>>>>> libkrb5support.so.0 => /lib64/libkrb5support.so.0
> > (0x00007f8a33fda000)
> > >>>>>> libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f8a33dd6000)
> > >>>>>> libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f8a33bbc000)
> > >>>>>> libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f8a33995000)
> > >>>>>> libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f8a33733000)
> > >>>>>>
> > >>>>>>> /opt/rh/devtoolset-3/root/usr/bin/g++ --version
> > >>>>>> g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
> > >>>>>>
> > >>>>>> Does any of this ring a bell?
> > >>>>>>
> > >>>>>> Thank you!
> > >>>>>> Rares
> > >>>>>
> > >>>
> >
>