FWIW, this CI C++ build script contains what I think is a comprehensive list of the CMake options supported by Arrow, along with their common default values: https://github.com/apache/arrow/blob/master/ci/scripts/cpp_build.sh#L47-L132
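As a quick runtime cross-check of what a given libarrow binary was actually built with, a minimal sketch along the following lines (assuming only the public arrow::GetBuildInfo() and arrow::util::Codec::IsAvailable() APIs; this is not part of the CI script) prints the linked version and which compression codecs were compiled in:

// Hedged sketch: report the Arrow version linked at runtime and which
// compression codecs it was built with (e.g. whether ARROW_WITH_ZLIB,
// which controls gzip/zlib support, was enabled).
#include <iostream>
#include <string>
#include <utility>

#include <arrow/config.h>            // arrow::GetBuildInfo()
#include <arrow/util/compression.h>  // arrow::util::Codec, arrow::Compression

int main() {
  std::cout << "Arrow version: " << arrow::GetBuildInfo().version_string
            << std::endl;
  const std::pair<arrow::Compression::type, const char*> codecs[] = {
      {arrow::Compression::GZIP, "gzip (zlib)"},
      {arrow::Compression::SNAPPY, "snappy"},
      {arrow::Compression::LZ4_FRAME, "lz4"},
      {arrow::Compression::ZSTD, "zstd"},
  };
  for (const auto& c : codecs) {
    std::cout << c.second << ": "
              << (arrow::util::Codec::IsAvailable(c.first) ? "available"
                                                           : "not built in")
              << std::endl;
  }
  return 0;
}

Linking this against the package in question and running it shows immediately whether, for example, zlib/gzip support was enabled in that build.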
Although I am not sure if this set of default values is used to build the official package (e.g., debug mode is enabled), I thought this script might help in some way. ~Eduardo On Fri, Jun 11, 2021 at 7:12 PM Rares Vernica <rvern...@gmail.com> wrote: > Understood. Yes, if I compile it with devtoolset-3 it all works as > expected. We ran our testsuite on it (database + plugin + Arrow) and it > passed. > > Which are the options set in cmake when the official package is built? We > built with all the options set as default and realized that Zlib > compression was not enabled. We went back and enabled it. Which others are > enabled in the official release package? > > Thank you! > Rares > > On Fri, Jun 11, 2021 at 11:18 AM Antoine Pitrou <anto...@python.org> > wrote: > > > > > Le 11/06/2021 à 20:10, Wes McKinney a écrit : > > > So this particular toolchain mix seems to be broken, does everything > > > work if you compile Arrow, the plugin, and the core database with > > > devtoolset-3? I think the weak link is Arrow C++ compiled with a > > > non-devtoolset compiler toolchain. > > > > This "toolchain mix" concern seems potentially similar to the issue we > > had with Tensorflow wheels that were built with a different toolchain > > than other manylinux1 wheels, producing crashes when both PyArrow and > > Tensorflow were loaded in memory. > > > > It is probably expected that the Arrow C++ package for CentOS is > > compiled with the default compiler for that CentOS version. > > > > Regards > > > > Antoine. > > > > > > > > If there were the package > > > maintainer bandwidth, having both devtoolset-gcc and system-gcc > > > pre-built RPMs would be potentially interesting (but there are so many > > > devtoolsets, which one should you use?). > > > > > > On Thu, Jun 10, 2021 at 6:04 PM Rares Vernica <rvern...@gmail.com> > > wrote: > > >> > > >> Yes, the pre-built binaries are the official RPM packages. > > >> > > >> I recompilled 4.0.1 with the default gcc-g++ from CentOS 7 and Debug > > flag. > > >> The segmentation fault occurred. See below for the backtrace. > > >> > > >> Please note that the SciDB database as well as the Plug-in where the > > Arrow > > >> library is used are compiled with g++ from devtoolset-3. Maybe this > > problem > > >> is due to the different versions of the g++ compiler being used... > > >> > > >> Also note that the code path that writes Arrow files work fine, it is > > just > > >> the path that reads the files that breaks. > > >> > > >>> g++ --version > > >> g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) > > >> > > >> Program received signal SIGSEGV, Segmentation fault. > > >> [Switching to Thread 0x7fae877fe700 (LWP 16783)] > > >> 0x00007fae8eb1e000 in ?? () > > >> (gdb) bt > > >> #0 0x00007fae8eb1e000 in ?? () > > >> #1 0x00007fae906bd4d0 in arrow::ipc::ArrayLoader::ReadBuffer > > >> (this=0x7fae877fa090, offset=0, length=24, out=0x7fae5c004010) at > > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:163 > > >> #2 0x00007fae906bd7b8 in arrow::ipc::ArrayLoader::GetBuffer > > >> (this=0x7fae877fa090, buffer_index=1, out=0x7fae5c004010) at > > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:199 > > >> #3 0x00007fae906cbfa7 in > > >> arrow::ipc::ArrayLoader::LoadPrimitive<arrow::Int64Type> > > >> (this=0x7fae877fa090, type_id=arrow::Type::INT64) at > > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:241 > > >> #4 0x00007fae906c72c7 in > > arrow::ipc::ArrayLoader::Visit<arrow::Int64Type> > > >> (this=0x7fae877fa090, type=...) 
at > > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:300 > > >> #5 0x00007fae906c2bbc in > > arrow::VisitTypeInline<arrow::ipc::ArrayLoader> > > >> (type=..., visitor=0x7fae877fa090) at > > >> /apache-arrow-4.0.1/cpp/src/arrow/visitor_inline.h:89 > > >> #6 0x00007fae906bd545 in arrow::ipc::ArrayLoader::LoadType > > >> (this=0x7fae877fa090, type=...) at > > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:166 > > >> #7 0x00007fae906bd5f0 in arrow::ipc::ArrayLoader::Load > > >> (this=0x7fae877fa090, field=0x7fae5c004e38, out=0x7fae5c003f88) at > > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:176 > > >> #8 0x00007fae906b1a92 in arrow::ipc::LoadRecordBatchSubset > > >> (metadata=0x7fae8ea140f4, schema=std::shared_ptr (count 2, weak 0) > > >> 0x7fae5c004ea8, inclusion_mask=0x0, context=..., file=0x7fae5c003e50) > > >> at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:481 > > >> #9 0x00007fae906b24e7 in arrow::ipc::LoadRecordBatch > > >> (metadata=0x7fae8ea140f4, schema=std::shared_ptr (count 2, weak 0) > > >> 0x7fae5c004ea8, inclusion_mask=std::vector<bool> of length 0, capacity > > 0, > > >> context=..., file=0x7fae5c003e50) > > >> at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:532 > > >> #10 0x00007fae906b35f3 in arrow::ipc::ReadRecordBatchInternal > > >> (metadata=..., schema=std::shared_ptr (count 2, weak 0) > 0x7fae5c004ea8, > > >> inclusion_mask=std::vector<bool> of length 0, capacity 0, context=..., > > >> file=0x7fae5c003e50) > > >> at /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:630 > > >> #11 0x00007fae906bee31 in > > arrow::ipc::RecordBatchStreamReaderImpl::ReadNext > > >> (this=0x7fae5c007508, batch=0x7fae877face0) at > > >> /apache-arrow-4.0.1/cpp/src/arrow/ipc/reader.cc:837 > > >> #12 0x00007fae912b7349 in scidb::ArrowReader::readObject > > >> (this=this@entry=0x7fae877fad80, > > >> name="index/0", reuse=reuse@entry=true, arrowBatch=std::shared_ptr > > (empty) > > >> 0x0) at XIndex.cpp:104 > > >> #13 0x00007fae912b89da in scidb::XIndex::load (this=this@entry > > =0x7fae5c000c00, > > >> driver=std::shared_ptr (count 3, weak 0) 0x7fae5c003d50, > query=warning: > > >> RTTI symbol not found for class > > 'std::_Sp_counted_ptr_inplace<scidb::Query, > > >> std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>' > > >> warning: RTTI symbol not found for class > > >> 'std::_Sp_counted_ptr_inplace<scidb::Query, > > std::allocator<scidb::Query>, > > >> (__gnu_cxx::_Lock_policy)2>' > > >> std::shared_ptr (count 7, weak 7) 0x7fae680022d0) at XIndex.cpp:284 > > >> > > >> The plug-in code (i.e., XIndex.cpp) is from here > > >> https://github.com/Paradigm4/bridge/tree/arrow3 > > >> > > >> Thanks! > > >> Rares > > >> > > >> On Wed, Jun 9, 2021 at 9:53 PM Sutou Kouhei <k...@clear-code.com> > wrote: > > >> > > >>> Hi, > > >>> > > >>>> Then I went back to the pre-built binaries for 3.0.0 and 4.0.0 from > > JFrog > > >>>> and the issue reappeared. I can only infer that it has to do with > the > > way > > >>>> the pre-built binaries are generated... > > >>> > > >>> The pre-built binaries are the official RPM packages, right? > > >>> > > >>> They are built with the default gcc-g++ package not g++ from > > >>> devtoolset-3. This may be related. Could you try building > > >>> your program with the default gcc-g++ package? 
> > >>> > > >>> > > >>> Thanks, > > >>> -- > > >>> kou > > >>> > > >>> In < > calq9kxaxnyayqohuj3n0cknrbp6wbtxvj2pog7hcb0icy2r...@mail.gmail.com > > > > > >>> "Re: C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS > > only" > > >>> on Wed, 9 Jun 2021 21:39:04 -0700, > > >>> Rares Vernica <rvern...@gmail.com> wrote: > > >>> > > >>>> I got the apache-arrow-4.0.1 source and compiled it with the Debug > > flag. > > >>> No > > >>>> segmentation fault occurred. I then removed the Debug flag and still > > no > > >>>> segmentation fault. I then tried the 4.0.0 source. Still no issues. > > >>>> Finally, I tried the 3.0.0 source and still no issues. > > >>>> > > >>>> Then I went back to the pre-built binaries for 3.0.0 and 4.0.0 from > > JFrog > > >>>> and the issue reappeared. I can only infer that it has to do with > the > > way > > >>>> the pre-built binaries are generated... > > >>>> > > >>>> Here is how I compiled the Arrow sources on my CentOS 7. > > >>>> > > >>>> release$ cmake3 -DARROW_WITH_ZLIB=ON > > >>>> -DCMAKE_C_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/gcc > > >>>> -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/g++ .. > > >>>> > > >>>> Thanks, > > >>>> Rares > > >>>> > > >>>> On Tue, Jun 8, 2021 at 5:37 PM Sutou Kouhei <k...@clear-code.com> > > wrote: > > >>>> > > >>>>> Hi, > > >>>>> > > >>>>> Could you try building Apache Arrow C++ with > > >>>>> -DCMAKE_BUILD_TYPE=Debug and get backtrace again? It will > > >>>>> show the source location on segmentation fault. > > >>>>> > > >>>>> Thanks, > > >>>>> -- > > >>>>> kou > > >>>>> > > >>>>> In < > > calq9kxa8sh07shuckhka9fuzu2n87tbydlp--aahgcwkfwo...@mail.gmail.com> > > >>>>> "C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS > > only" on > > >>>>> Tue, 8 Jun 2021 12:01:27 -0700, > > >>>>> Rares Vernica <rvern...@gmail.com> wrote: > > >>>>> > > >>>>>> Hello, > > >>>>>> > > >>>>>> We recently migrated our C++ Arrow code from 0.16 to 3.0.0. The > code > > >>>>> works > > >>>>>> fine on Ubuntu, but we get a segmentation fault in CentOS while > > >>> reading > > >>>>>> Arrow Record Batch files. We can successfully read the files from > > >>> Python > > >>>>> or > > >>>>>> Ubuntu so the files and the writer are fine. > > >>>>>> > > >>>>>> We use Record Batch Stream Reader/Writer to read/write data to > > files. > > >>>>>> Sometimes we use GZIP to compress the streams. The migration to > > 3.0.0 > > >>> was > > >>>>>> pretty straight forward with minimal changes to the code > > >>>>>> > > >>>>> > > >>> > > > https://github.com/Paradigm4/bridge/commit/03e896e84230ddb41bfef68cde5ed9b21192a0e9 > > >>>>>> We have an extensive test suite and all is good on Ubuntu. On > CentOS > > >>> the > > >>>>>> write works OK but we get a segmentation fault during reading from > > >>> C++. > > >>>>> We > > >>>>>> can successfully read the files using PyArrow. Moreover, the files > > >>>>> written > > >>>>>> by CentOS can be successfully read from C++ in Ubuntu. > > >>>>>> > > >>>>>> Here is the backtrace I got form gdb when the segmentation fault > > >>>>> occurred: > > >>>>>> > > >>>>>> Program received signal SIGSEGV, Segmentation fault. > > >>>>>> [Switching to Thread 0x7f548c7fb700 (LWP 2649)] > > >>>>>> 0x00007f545c003340 in ?? () > > >>>>>> (gdb) bt > > >>>>>> #0 0x00007f545c003340 in ?? 
() > > >>>>>> #1 0x00007f54903377ce in arrow::ipc::ArrayLoader::GetBuffer(int, > > >>>>>> std::shared_ptr<arrow::Buffer>*) () from /lib64/libarrow.so.300 > > >>>>>> #2 0x00007f549034006c in arrow::Status > > >>>>>> arrow::VisitTypeInline<arrow::ipc::ArrayLoader>(arrow::DataType > > >>> const&, > > >>>>>> arrow::ipc::ArrayLoader*) () from /lib64/libarrow.so.300 > > >>>>>> #3 0x00007f5490340db4 in > arrow::ipc::ArrayLoader::Load(arrow::Field > > >>>>>> const*, arrow::ArrayData*) () from /lib64/libarrow.so.300 > > >>>>>> #4 0x00007f5490318b5b in > > >>>>>> > > >>>>> > > >>> > > > arrow::ipc::LoadRecordBatchSubset(org::apache::arrow::flatbuf::RecordBatch > > >>>>>> const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool, > > >>>>>> std::allocator<bool> > const*, arrow::ipc::DictionaryMemo const*, > > >>>>>> arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion, > > >>>>>> arrow::Compression::type, arrow::io::RandomAccessFile*) () from > > >>>>>> /lib64/libarrow.so.300 > > >>>>>> #5 0x00007f549031952a in > > >>>>>> > arrow::ipc::LoadRecordBatch(org::apache::arrow::flatbuf::RecordBatch > > >>>>>> const*, std::shared_ptr<arrow::Schema> const&, std::vector<bool, > > >>>>>> std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*, > > >>>>>> arrow::ipc::IpcReadOptions const&, arrow::ipc::MetadataVersion, > > >>>>>> arrow::Compression::type, arrow::io::RandomAccessFile*) () from > > >>>>>> /lib64/libarrow.so.300 > > >>>>>> #6 0x00007f54903197ce in > > >>>>> arrow::ipc::ReadRecordBatchInternal(arrow::Buffer > > >>>>>> const&, std::shared_ptr<arrow::Schema> const&, std::vector<bool, > > >>>>>> std::allocator<bool> > const&, arrow::ipc::DictionaryMemo const*, > > >>>>>> arrow::ipc::IpcReadOptions const&, arrow::io::RandomAccessFile*) > () > > >>> from > > >>>>>> /lib64/libarrow.so.300 > > >>>>>> #7 0x00007f5490345d9c in > > >>>>>> > > >>>>> > > >>> > > > arrow::ipc::RecordBatchStreamReaderImpl::ReadNext(std::shared_ptr<arrow::RecordBatch>*) > > >>>>>> () from /lib64/libarrow.so.300 > > >>>>>> #8 0x00007f549109b479 in scidb::ArrowReader::readObject > > >>>>>> (this=this@entry=0x7f548c7f7d80, > > >>>>>> name="index/0", reuse=reuse@entry=true, > arrowBatch=std::shared_ptr > > >>>>> (empty) > > >>>>>> 0x0) at XIndex.cpp:104 > > >>>>>> #9 0x00007f549109cb0a in scidb::XIndex::load (this=this@entry > > >>>>> =0x7f545c003ab0, > > >>>>>> driver=std::shared_ptr (count 3, weak 0) 0x7f545c003e70, > > >>> query=warning: > > >>>>>> RTTI symbol not found for class > > >>>>> 'std::_Sp_counted_ptr_inplace<scidb::Query, > > >>>>>> std::allocator<scidb::Query>, (__gnu_cxx::_Lock_policy)2>' > > >>>>>> warning: RTTI symbol not found for class > > >>>>>> 'std::_Sp_counted_ptr_inplace<scidb::Query, > > >>> std::allocator<scidb::Query>, > > >>>>>> (__gnu_cxx::_Lock_policy)2>' > > >>>>>> std::shared_ptr (count 7, weak 7) 0x7f546c005330) at > XIndex.cpp:286 > > >>>>>> > > >>>>>> I also tried Arrow 4.0.0. The code compiled just fine and the > > behavior > > >>>>> was > > >>>>>> the same, with the same backtrace. > > >>>>>> > > >>>>>> The code where the segmentation fault occurs is trying to read a > > GZIP > > >>>>>> compressed Record Batch Stream. The file is 144 bytes and has only > > one > > >>>>>> column with three int64 values. 
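A minimal C++ sketch of this read path, equivalent to the pyarrow snippet quoted below and exercising the same RecordBatchStreamReader::ReadNext() call that appears in the backtraces, might look as follows; the path is taken from that snippet, and the error handling is illustrative rather than the plugin's actual code:

// Hedged sketch of the failing read path: open a whole-stream gzip-compressed
// Arrow IPC stream and read batches via RecordBatchStreamReader::ReadNext().
#include <iostream>
#include <memory>
#include <string>

#include <arrow/io/compressed.h>
#include <arrow/io/file.h>
#include <arrow/ipc/reader.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/util/compression.h>

arrow::Status ReadGzipStream(const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto file, arrow::io::ReadableFile::Open(path));
  ARROW_ASSIGN_OR_RAISE(auto codec,
                        arrow::util::Codec::Create(arrow::Compression::GZIP));
  ARROW_ASSIGN_OR_RAISE(
      auto input, arrow::io::CompressedInputStream::Make(codec.get(), file));
  ARROW_ASSIGN_OR_RAISE(auto reader,
                        arrow::ipc::RecordBatchStreamReader::Open(input));

  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    // This is the call at the top of the backtraces in this thread.
    ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
    std::cout << batch->ToString() << std::endl;
  }
  return arrow::Status::OK();
}

int main() {
  auto status = ReadGzipStream("/tmp/bridge/foo/index/0");
  if (!status.ok()) {
    std::cerr << status.ToString() << std::endl;
    return 1;
  }
  return 0;
}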
> > >>>>>> > > >>>>>>> file 0 > > >>>>>> 0: gzip compressed data, from Unix > > >>>>>> > > >>>>>>> stat 0 > > >>>>>> File: ‘0’ > > >>>>>> Size: 144 Blocks: 8 IO Block: 4096 regular > file > > >>>>>> Device: 10302h/66306d Inode: 33715444 Links: 1 > > >>>>>> Access: (0644/-rw-r--r--) Uid: ( 1001/ scidb) Gid: ( 1001/ > > >>> scidb) > > >>>>>> Context: unconfined_u:object_r:user_tmp_t:s0 > > >>>>>> Access: 2021-06-08 04:42:28.653548604 +0000 > > >>>>>> Modify: 2021-06-08 04:14:14.638927052 +0000 > > >>>>>> Change: 2021-06-08 04:40:50.221279208 +0000 > > >>>>>> Birth: - > > >>>>>> > > >>>>>> In [29]: s = pyarrow.input_stream('/tmp/bridge/foo/index/0', > > >>>>>> compression='gzip') > > >>>>>> In [30]: b = pyarrow.RecordBatchStreamReader(s) > > >>>>>> In [31]: t = b.read_all() > > >>>>>> In [32]: t.columns > > >>>>>> Out[32]: > > >>>>>> [<pyarrow.lib.ChunkedArray object at 0x7fefb5a552b0> > > >>>>>> [ > > >>>>>> [ > > >>>>>> 0, > > >>>>>> 5, > > >>>>>> 10 > > >>>>>> ] > > >>>>>> ]] > > >>>>>> > > >>>>>> I removed the GZIP compression in both the writer and the reader > but > > >>> the > > >>>>>> issue persists. So I don't think it is because of the compression. > > >>>>>> > > >>>>>> Here is the ldd on the library file which contains the reader and > > >>> writers > > >>>>>> that use the Arrow library. It is built on a CentOS 7 with the g++ > > >>> 4.9.2 > > >>>>>> compiler. > > >>>>>> > > >>>>>>> ldd libbridge.so > > >>>>>> linux-vdso.so.1 => (0x00007fffe4f10000) > > >>>>>> libarrow.so.300 => /lib64/libarrow.so.300 (0x00007f8a38908000) > > >>>>>> libaws-cpp-sdk-s3.so => /opt/aws/lib64/libaws-cpp-sdk-s3.so > > >>>>>> (0x00007f8a384b3000) > > >>>>>> libm.so.6 => /lib64/libm.so.6 (0x00007f8a381b1000) > > >>>>>> librt.so.1 => /lib64/librt.so.1 (0x00007f8a37fa9000) > > >>>>>> libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a37da5000) > > >>>>>> libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8a37a9e000) > > >>>>>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a37888000) > > >>>>>> libc.so.6 => /lib64/libc.so.6 (0x00007f8a374ba000) > > >>>>>> libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f8a37057000) > > >>>>>> libssl.so.10 => /lib64/libssl.so.10 (0x00007f8a36de5000) > > >>>>>> libbrotlienc.so.1 => /lib64/libbrotlienc.so.1 (0x00007f8a36b58000) > > >>>>>> libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007f8a3694b000) > > >>>>>> libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 > > >>> (0x00007f8a3672b000) > > >>>>>> libutf8proc.so.1 => /lib64/libutf8proc.so.1 (0x00007f8a3647b000) > > >>>>>> libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f8a3626b000) > > >>>>>> liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f8a3605c000) > > >>>>>> libsnappy.so.1 => /lib64/libsnappy.so.1 (0x00007f8a35e56000) > > >>>>>> libz.so.1 => /lib64/libz.so.1 (0x00007f8a35c40000) > > >>>>>> libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f8a3593a000) > > >>>>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a3571e000) > > >>>>>> /lib64/ld-linux-x86-64.so.2 (0x00007f8a39b67000) > > >>>>>> libaws-cpp-sdk-core.so => /opt/aws/lib64/libaws-cpp-sdk-core.so > > >>>>>> (0x00007f8a35413000) > > >>>>>> libaws-c-event-stream.so.0unstable => > > >>>>>> /opt/aws/lib64/libaws-c-event-stream.so.0unstable > > (0x00007f8a3520b000) > > >>>>>> libaws-c-common.so.0unstable => > > >>>>> /opt/aws/lib64/libaws-c-common.so.0unstable > > >>>>>> (0x00007f8a34fd9000) > > >>>>>> libaws-checksums.so => /opt/aws/lib64/libaws-checksums.so > > >>>>>> (0x00007f8a34dce000) > > >>>>>> libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 > > (0x00007f8a34b81000) > > 
>>>>>> libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f8a34898000) > > >>>>>> libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f8a34694000) > > >>>>>> libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f8a34461000) > > >>>>>> libcurl.so.4 => /opt/curl/lib/libcurl.so.4 (0x00007f8a341ea000) > > >>>>>> libkrb5support.so.0 => /lib64/libkrb5support.so.0 > > (0x00007f8a33fda000) > > >>>>>> libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f8a33dd6000) > > >>>>>> libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f8a33bbc000) > > >>>>>> libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f8a33995000) > > >>>>>> libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f8a33733000) > > >>>>>> > > >>>>>>> /opt/rh/devtoolset-3/root/usr/bin/g++ --version > > >>>>>> g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6) > > >>>>>> > > >>>>>> Do all of these ring any bells? > > >>>>>> > > >>>>>> Thank you! > > >>>>>> Rares > > >>>>> > > >>> > > >
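For completeness, the matching write path (which the thread reports as working) could look roughly like the sketch below; the schema, the three int64 values, and the output path are illustrative assumptions, not taken from the plugin:

// Hedged sketch of the (working) write path: one int64 column written as an
// Arrow IPC stream wrapped in whole-stream gzip compression. Schema, values,
// and the output path are illustrative assumptions, not the plugin's code.
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <arrow/api.h>
#include <arrow/io/compressed.h>
#include <arrow/io/file.h>
#include <arrow/ipc/writer.h>
#include <arrow/util/compression.h>

arrow::Status WriteGzipStream(const std::string& path) {
  auto schema = arrow::schema({arrow::field("x", arrow::int64())});

  arrow::Int64Builder builder;
  std::vector<int64_t> values = {0, 5, 10};  // same values as the thread's example file
  ARROW_RETURN_NOT_OK(builder.AppendValues(values));
  ARROW_ASSIGN_OR_RAISE(auto array, builder.Finish());
  auto batch = arrow::RecordBatch::Make(schema, array->length(), {array});

  ARROW_ASSIGN_OR_RAISE(auto file, arrow::io::FileOutputStream::Open(path));
  ARROW_ASSIGN_OR_RAISE(auto codec,
                        arrow::util::Codec::Create(arrow::Compression::GZIP));
  ARROW_ASSIGN_OR_RAISE(
      auto output, arrow::io::CompressedOutputStream::Make(codec.get(), file));
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(output, schema));

  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(writer->Close());
  return output->Close();  // flush and finalize the gzip stream
}

int main() {
  auto status = WriteGzipStream("/tmp/example_gzip_stream");  // hypothetical path
  if (!status.ok()) {
    std::cerr << status.ToString() << std::endl;
    return 1;
  }
  return 0;
}

Pairing this with the read sketch above gives a self-contained round trip that can be compiled against each libarrow build (official RPM vs. source build, devtoolset vs. system g++) to narrow down where the fault first appears.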