[ https://issues.apache.org/jira/browse/KUDU-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855968#comment-17855968 ]
ASF subversion and git services commented on KUDU-3580:
-------------------------------------------------------

Commit 1474380f5ccfd2f7e78756488d12eb52d2664132 in kudu's branch refs/heads/master from Yingchun Lai
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=1474380f5 ]

KUDU-3580 Fix the crash caused when binaries run on older CPU machines

After Kudu started linking rocksdb, the Kudu binaries may crash with the
error "Illegal instruction" when they run on machines that don't support
newer CPU instructions (e.g. AVX512) but were built on a machine that does.

This patch enables the PORTABLE [1] option when building librocksdb to
fix the issue.

Note that portable libraries may cause a slight performance degradation.
It's recommended to disable the portable option (by setting the PORTABLE
environment variable to OFF when building Kudu thirdparties) if there is
no portability requirement.

The PORTABLE option currently takes effect only on librocksdb. The
following compares benchmark results of RocksDB's 'db_bench' tool built
with the '-DPORTABLE' option enabled and disabled:
- The test is similar to the Kudu use case: random writes and sequential
  reads, with key and value sizes of about 40 bytes.
- The tests ran 3 times.
- The binaries were built and run on the same machine, which supports
  newer CPU instructions (e.g. AVX512).

PORTABLE:
$ ./db_bench -benchmarks=fillrandom,readseq -num=10000000 -key_size=40 -value_size=40
1.
fillrandom   :       5.237 micros/op 190954 ops/sec 52.369 seconds 10000000 operations;   14.6 MB/s
readseq      :       0.448 micros/op 2231382 ops/sec 2.833 seconds 6322271 operations;  170.2 MB/s
2.
fillrandom   :       5.236 micros/op 190981 ops/sec 52.361 seconds 10000000 operations;   14.6 MB/s
readseq      :       0.444 micros/op 2252646 ops/sec 2.806 seconds 6321658 operations;  171.9 MB/s
3.
fillrandom   :       5.182 micros/op 192960 ops/sec 51.824 seconds 10000000 operations;   14.7 MB/s
readseq      :       0.444 micros/op 2252317 ops/sec 2.807 seconds 6323209 operations;  171.8 MB/s

NON-PORTABLE:
$ ./db_bench -benchmarks=fillrandom,readseq -num=10000000 -key_size=40 -value_size=40
1.
fillrandom   :       5.190 micros/op 192676 ops/sec 51.900 seconds 10000000 operations;   14.7 MB/s
readseq      :       0.391 micros/op 2560051 ops/sec 2.470 seconds 6322786 operations;  195.3 MB/s
2.
fillrandom   :       5.156 micros/op 193945 ops/sec 51.561 seconds 10000000 operations;   14.8 MB/s
readseq      :       0.404 micros/op 2477956 ops/sec 2.551 seconds 6320644 operations;  189.1 MB/s
3.
fillrandom   :       5.527 micros/op 180940 ops/sec 55.267 seconds 10000000 operations;   13.8 MB/s
readseq      :       0.407 micros/op 2458297 ops/sec 2.571 seconds 6320885 operations;  187.6 MB/s

1. https://github.com/facebook/rocksdb/blob/v7.7.3/CMakeLists.txt#L248

Change-Id: Id30ae995c41a592fccbdb822bc1f457c5e6878ac
Reviewed-on: http://gerrit.cloudera.org:8080/21287
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Alexey Serbin <ale...@apache.org>


> Kudu servers and tests crash after linking RocksDB library
> ----------------------------------------------------------
>
>                 Key: KUDU-3580
>                 URL: https://issues.apache.org/jira/browse/KUDU-3580
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, test, tserver
>            Reporter: Yingchun Lai
>            Priority: Critical
>
> After this commit [1] was merged, it was reported that the binaries (the
> test binaries as well as {{kudu}}, {{kudu-tserver}}, and {{kudu-master}})
> crash with SIGILL and produce coredumps.
>
> GDB shows the following stack:
> {code:java}
> (gdb) run
> Starting program: /home/aserbin/tmp/kudu
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Program received signal SIGILL, Illegal instruction.
> std::function<rocksdb::Status (rocksdb::ConfigOptions const&, std::string const&, std::string const&, void*)>::swap(std::function<rocksdb::Status (rocksdb::ConfigOptions const&, std::string const&, std::string const&, void*)>&) (__x=...,
>     this=0x7fffffffe0e0)
>     at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548
> 548 /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h: No such file or directory.
> (gdb) bt
> #0  std::function<rocksdb::Status (rocksdb::ConfigOptions const&, std::string const&, std::string const&, void*)>::swap(std::function<rocksdb::Status (rocksdb::ConfigOptions const&, std::string const&, std::string const&, void*)>&) (
>     __x=..., this=0x7fffffffe0e0)
>     at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548
> #1  std::function<rocksdb::Status (rocksdb::ConfigOptions const&, std::string const&, std::string const&, void*)>::operator=(std::function<rocksdb::Status (rocksdb::ConfigOptions const&, std::string const&, std::string const&, void*)> const&) (__x=..., this=0x7fffffffe108)
>     at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:463
> #2  rocksdb::OptionTypeInfo::SetParseFunc(std::function<rocksdb::Status (rocksdb::ConfigOptions const&, std::string const&, std::string const&, void*)> const&)
>     (f=..., this=0x7fffffffe100)
>     at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:591
> #3  rocksdb::OptionTypeInfo::AsCustomSharedPtr<rocksdb::SystemClock> (
>     offset=offset@entry=0,
>     ovt=ovt@entry=rocksdb::OptionVerificationType::kByName,
>     flags=flags@entry=rocksdb::OptionTypeFlags::kDontSerialize)
>     at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:497
> #4  0x0000000000ee8c5e in __static_initialization_and_destruction_0(int, int) [clone .constprop.449] ()
>     at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/env/env.cc:1267
> #5  0x0000000003ca23cd in __libc_csu_init ()
> #6  0x00007ffff5a69c18 in __libc_start_main (main=0xed8de0 <main>, argc=1,
>     argv=0x7fffffffe4f8, init=0x3ca2380 <__libc_csu_init>,
>     fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe4e8)
>     at ../csu/libc-start.c:266
> #7  0x0000000000f8f4c4 in _start ()
>     at /root/Projects/kudu/src/kudu/tools/tool_main.cc:306
> (gdb) {code}
> And an example of results where SIGILL is observed (just built the binaries
> with the top of the master branch at
> 634d967a0c620db2b3932c09b1fe13be1dc70f44):
> [http://dist-test.cloudera.org/job?job_id=root.1712768932.261750]
>
> 1. [https://github.com/apache/kudu/commit/4da8b20070a7c0070a1829dfd50fdc78cad88b6a]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
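
To put a number on the "slight performance degradation" mentioned in the commit message, the six db_bench runs quoted there can be averaged per build. The following is only a rough sketch: the ops/sec figures are copied from the commit message, while the helper names (`mean_ops`, `portable_change_pct`) are made up for illustration and are not part of Kudu or RocksDB.

```python
from statistics import mean

# ops/sec figures copied from the db_bench output quoted in the commit message.
OPS_PER_SEC = {
    "PORTABLE": {
        "fillrandom": [190954, 190981, 192960],
        "readseq": [2231382, 2252646, 2252317],
    },
    "NON-PORTABLE": {
        "fillrandom": [192676, 193945, 180940],
        "readseq": [2560051, 2477956, 2458297],
    },
}


def mean_ops(build, benchmark):
    """Mean throughput (ops/sec) over the three runs of one build."""
    return mean(OPS_PER_SEC[build][benchmark])


def portable_change_pct(benchmark):
    """Percent change in mean throughput of PORTABLE relative to NON-PORTABLE.

    A negative value means the portable build was slower on average.
    """
    base = mean_ops("NON-PORTABLE", benchmark)
    return 100.0 * (mean_ops("PORTABLE", benchmark) - base) / base


if __name__ == "__main__":
    for benchmark in ("fillrandom", "readseq"):
        print(
            f"{benchmark:<10} "
            f"portable={mean_ops('PORTABLE', benchmark):.0f} ops/sec  "
            f"non-portable={mean_ops('NON-PORTABLE', benchmark):.0f} ops/sec  "
            f"change={portable_change_pct(benchmark):+.1f}%"
        )
```

On these three runs, the portable build's mean readseq throughput comes out roughly 10% lower, while fillrandom differs by about 1%, which is smaller than the spread between the individual non-portable fillrandom runs. With only three samples per build, this should be read as an indication rather than a measurement.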