[ 
https://issues.apache.org/jira/browse/IMPALA-14465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-14465.
------------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

> Kudu cannot start up on Redhat8 ARM64 with HEAPCHECK set in environment
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-14465
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14465
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>             Fix For: Impala 5.0.0
>
>
> Nightly jobs running on Redhat8 ARM64 have been seeing failures in Kudu 
> custom cluster tests like TestKuduHMSIntegration. These tests restart the 
> Kudu service to apply different startup options, but the Kudu service is 
> unusable and all operations fail, e.g.:
> {noformat}
> E   impala.error.HiveServer2Error: Query aa48dd645b659d95:060aac6c00000000 
> failed:
> E   AnalysisException: Cannot analyze Kudu table 't': Error determining if 
> Kudu's integration with the Hive Metastore is enabled: cannot complete before 
> timeout: KuduRpc(method=getHiveMetastoreConfig, tablet=null, attempt=95, 
> TimeoutTracker(timeout=180000, elapsed=179251), Trace Summary(177060 ms): 
> Sent(0), Received(0), Delayed(94), MasterRefresh(0), AuthRefresh(0), 
> Truncated: false
> E    Delayed: (UNKNOWN, [ getHiveMetastoreConfig, 94 ])){noformat}
> When the tests restart the Kudu cluster, the restart command inherits 
> environment variables:
> {noformat}
>   def _restart_kudu_service(kudu_args=None):
>     kudu_env = dict(os.environ)
>     if kudu_args is not None:
>       kudu_env["IMPALA_KUDU_STARTUP_FLAGS"] = kudu_args
>     call = subprocess.Popen(
>         ['/bin/bash', '-c', os.path.join(IMPALA_HOME,
>                                          'testdata/cluster/admin restart kudu')],
>         env=kudu_env)
>     call.wait()
>     if call.returncode != 0:
>       raise RuntimeError("Unable to restart Kudu"){noformat}
> Comparing the environment between regular Kudu minicluster startup vs the 
> restart triggered by the custom cluster test showed several differences. 
> After trial and error, the significant difference is that the test runs with 
> HEAPCHECK set (but empty). Somehow that causes problems, and the Kudu 
> processes get stuck in this stack:
> {noformat}
> #0  0x0000ffffa55937e4 in syscall () from /lib64/libc.so.6
> #1  0x00000000036a6878 in munmap ()
> #2  0x0000000000f3a658 in locate_debug_info ()
> #3  0x0000000000f3a7dc in _ULaarch64_dwarf_find_debug_frame ()
> #4  0x0000000000f3aaec in _ULaarch64_dwarf_callback ()
> #5  0x0000ffffa567cf88 in dl_iterate_phdr () from /lib64/libc.so.6
> #6  0x0000000003426cc0 in dl_iterate_phdr (callback=0xf3a924 
> <_ULaarch64_dwarf_callback>, data=0xffffea532d48) at 
> /mnt/source/kudu/kudu-54f3bd31c/src/kudu/util/debug/unwind_safeness.cc:160
> #7  0x0000000000f3aff0 in _ULaarch64_dwarf_find_proc_info ()
> #8  0x0000000000f3754c in _ULaarch64_dwarf_step ()
> #9  0x0000000000f359bc in _ULaarch64_step ()
> #10 0x0000000000f5f56c in GetStackTrace_libunwind(void**, int, int) ()
> #11 0x0000000000f60304 in GetStackTrace(void**, int, int) ()
> #12 0x0000000000f597fc in MallocHook_GetCallerStackTrace ()
> #13 0x0000000000f62258 in NewHook(void const*, unsigned long) ()
> #14 0x0000000000f59568 in MallocHook::InvokeNewHookSlow(void const*, unsigned 
> long) ()
> #15 0x00000000036a5648 in tcmalloc::allocate_full_cpp_throw_oom(unsigned 
> long) ()
> #16 0x00000000035caddc in google::protobuf::DescriptorProto* 
> google::protobuf::Arena::CreateMaybeMessage<google::protobuf::DescriptorProto>(google::protobuf::Arena*)
>  ()
> #17 0x00000000035cf7f8 in 
> google::protobuf::FileDescriptorProto::_InternalParse(char const*, 
> google::protobuf::internal::ParseContext*) ()
> #18 0x000000000355912c in bool 
> google::protobuf::internal::MergeFromImpl<false>(google::protobuf::stringpiece_internal::StringPiece,
>  google::protobuf::MessageLite*, google::protobuf::MessageLite::ParseFlags) ()
> #19 0x00000000035e841c in 
> google::protobuf::EncodedDescriptorDatabase::Add(void const*, int) ()
> #20 0x0000000003588f90 in 
> google::protobuf::DescriptorPool::InternalAddGeneratedFile(void const*, int) 
> ()
> #21 0x00000000035f482c in google::protobuf::(anonymous 
> namespace)::AddDescriptorsImpl(google::protobuf::internal::DescriptorTable 
> const*) ()
> #22 0x00000000035f4f3c in 
> google::protobuf::internal::AddDescriptorsRunner::AddDescriptorsRunner(google::protobuf::internal::DescriptorTable
>  const*) ()
> #23 0x00000000036a3630 in __libc_csu_init ()
> #24 0x0000ffffa559432c in __libc_start_main () from /lib64/libc.so.6
> #25 0x0000000000e33e60 in _start (){noformat}
> Unsetting HEAPCHECK causes the Kudu startup to work normally. For some 
> reason, this is only a problem on Redhat8 ARM64.
> We should unset HEAPCHECK for this restart case (and look into removing the 
> "export HEAPCHECK=" statements).
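
One way the proposed mitigation could look is sketched below. This is not the committed fix: `env_without` is a hypothetical helper introduced only for illustration, and the sample environment is invented. It shows that removing the key (rather than setting it to an empty string) is what keeps a set-but-empty HEAPCHECK from being inherited by the restart command.

```python
import os


def env_without(names, base=None):
    """Return a copy of the environment with the given variables removed.

    Hypothetical helper: `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    for name in names:
        # pop() with a default tolerates variables that are not set at all.
        env.pop(name, None)
    return env


# A set-but-empty HEAPCHECK is still dropped, because the key itself is removed.
sample_env = {"HEAPCHECK": "", "PATH": "/usr/bin"}  # invented sample values
kudu_env = env_without(["HEAPCHECK"], base=sample_env)
```

In `_restart_kudu_service`, a dict built this way could replace `dict(os.environ)` before being passed as `env=` to `subprocess.Popen`, assuming nothing else in the restart script re-exports HEAPCHECK.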



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

