[ https://issues.apache.org/jira/browse/KUDU-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835086#comment-17835086 ]
Alexey Serbin edited comment on KUDU-3545 at 4/9/24 1:36 AM:
-------------------------------------------------------------

I haven't tried to track down the exact root cause behind the crash, but I suspect it is what's described in KUDU-2068, i.e. ABI incompatibilities between GCC toolchains of different versions.

In essence, Kudu's third-party Clang (used to generate {{precompiled.ll}}) picks up the toolchain of the latest version available on the build machine, while the rest of Kudu is built with a toolchain of a different version (e.g., think of GCC7-based and GCC13-based toolchains on SLES15). If there is an ABI incompatibility in the size of an STL-based type, or in anything else that's passed between the auto-generated code derived from {{precompiled.ll}} and the rest of the {{kudu-tserver}} runtime, there is a risk of either memory corruption or, if you are lucky, an immediate crash of the {{kudu-tserver}} process or even of {{codegen-test}}.

Clang's behavior of picking up the latest version of the GCC toolchain that it can find is described in [its documentation|https://clang.llvm.org/docs/ClangCommandLineReference.html#dumping-preprocessor-state], see the paragraph on the {{\-\-gcc-toolchain}} option. Newer versions of Clang (starting with 16.0.0) offer a better alternative to the {{\-\-gcc-toolchain}} flag: {{\-\-gcc-install-dir}} (see [this e-mail thread|https://discourse.llvm.org/t/add-gcc-install-dir-deprecate-gcc-toolchain-and-remove-gcc-install-prefix/65091] for more details). I guess we should employ this option once we upgrade Kudu's thirdparty LLVM to version 16.0.0 or newer (it's 11.0.0 as of April 2024).

> codegen test fails on SLES with higher libgcc version
> -----------------------------------------------------
>
>                 Key: KUDU-3545
>                 URL: https://issues.apache.org/jira/browse/KUDU-3545
>             Project: Kudu
>          Issue Type: Bug
>          Components: codegen
>            Reporter: Ashwani Raina
>            Priority: Minor
>
> On a SLES 15 with libgcc_s1-13.2.1+git7813-150000.1.6.1.x86_64 version, codegen-test fails with the following crash:
> {noformat}
> *** SIGABRT (@0x3162e) received by PID 202286 (TID 0x7f71d1bfe700) from PID 202286; stack trace: ***
>     @     0x7f71d41f5910 (unknown)
>     @     0x7f71d2725d2b __GI_raise
>     @     0x7f71d27273e5 __GI_abort
>     @     0x7f71d28d78d7 (unknown)
>     @     0x7f71d28f1009 __deregister_frame
>     @     0x7f71d4d6c9e0 llvm::RTDyldMemoryManager::deregisterEHFrames()
>     @     0x7f71d4976b02 llvm::MCJIT::~MCJIT()
>     @     0x7f71d4977241 llvm::MCJIT::~MCJIT()
>     @     0x7f71d481c222 std::default_delete<>::operator()()
>     @     0x7f71d481c12d std::unique_ptr<>::~unique_ptr()
>     @     0x7f71d481bfaf kudu::codegen::JITWrapper::~JITWrapper()
>     @     0x7f71d4835f34 kudu::codegen::RowProjectorFunctions::~RowProjectorFunctions()
>     @     0x7f71d4835f50 kudu::codegen::RowProjectorFunctions::~RowProjectorFunctions()
>     @           0x46297c kudu::RefCountedThreadSafe<>::DeleteInternal()
>     @           0x45f3d1 kudu::DefaultRefCountedThreadSafeTraits<>::Destruct()
>     @           0x45acb0 kudu::RefCountedThreadSafe<>::Release()
>     @     0x7f71d480c191 kudu::codegen::CodeCache::EvictionCallback::EvictedEntry()
>     @     0x7f71d3c5e4bb kudu::(anonymous namespace)::CacheShard<>::FreeEntry()
>     @     0x7f71d3c60b31 kudu::(anonymous namespace)::CacheShard<>::Insert()
>     @     0x7f71d3c5fb73 kudu::(anonymous namespace)::ShardedCache<>::Insert()
>     @     0x7f71d480bab6 kudu::codegen::CodeCache::AddEntry()
>     @     0x7f71d4811fea kudu::codegen::(anonymous namespace)::CompilationTask::RunWithStatus()
>     @     0x7f71d4811a64 kudu::codegen::(anonymous namespace)::CompilationTask::Run()
>     @     0x7f71d481288a _ZZN4kudu7codegen18CompilationManager19RequestRowProjectorEPKNS_6SchemaES4_PSt10unique_ptrINS0_12RowProjectorESt14default_deleteIS6_EEENKUlvE_clEv
>     @     0x7f71d4813e72 _ZNSt17_Function_handlerIFvvEZN4kudu7codegen18CompilationManager19RequestRowProjectorEPKNS1_6SchemaES6_PSt10unique_ptrINS2_12RowProjectorESt14default_deleteIS8_EEEUlvE_E9_M_invokeERKSt9_Any_data
>     @           0x452430 std::function<>::operator()()
>     @     0x7f71d3d98648 kudu::ThreadPool::DispatchThread()
>     @     0x7f71d3d98ee9 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv
>     @     0x7f71d3d9a6a0 _ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
>     @           0x452430 std::function<>::operator()()
>     @     0x7f71d3d89482 kudu::Thread::SuperviseThread()
>     @     0x7f71d41e96ea start_thread
> {noformat}
> From the stack trace, it seems that __deregister_frame is probably being fed invalid input that was already de-initialised before the call to __deregister_frame.
> We seem to be hitting this assert:
> [https://github.com/gcc-mirror/gcc/blob/65e2c932019b4e36d7c1d49952dc006fa7419a3d/libgcc/unwind-dw2-fde.c#L291C11-L291C11]
> gcc_assert (in_shutdown || ob);

--
This message was sent by Atlassian Jira
(v8.20.10#820010)