Alexey Serbin created KUDU-3517:
-----------------------------------

             Summary: Kudu servers crash on Graviton3 (aarch64) instances in EC2
                 Key: KUDU-3517
                 URL: https://issues.apache.org/jira/browse/KUDU-3517
             Project: Kudu
          Issue Type: Bug
          Components: CLI, client, master, tserver
    Affects Versions: 1.17.0
         Environment: Graviton3 instances in EC2
            Reporter: Alexey Serbin


Kudu masters and tablet servers built from the source code released with Kudu 
1.17.0 crash with SIGSEGV when running on Graviton3 (aarch64) instances in EC2.

Closer examination showed that the crash happens when StackCollector tries to 
symbolize a thread's stack. The example stack trace below was collected under 
GDB while running a smoke test with the kudu CLI tool 
({{kudu perf loadgen <master_rpc_addr> 
\-\-table_num_replicas=3 \-\-num_rows_per_thread=1000000}}):

{noformat}
#0  access_mem (as=0x3304418 <local_addr_space>, addr=7745970402396146688, 
    val=0xfffff325ca18, write=0, arg=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Ginit.c:337
#1  0x0000000000a97ac0 in is_plt_entry (c=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:43
#2  0x0000000000a97fdc in _ULaarch64_step (cursor=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:171
#3  0x00000000025050c8 in kudu::StackTrace::Collect (
    this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
    at /root/Projects/kudu/src/kudu/util/debug-util.cc:612
#4  0x0000000002507f64 in kudu::StackTrace::Collect (
    this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
    at /root/Projects/kudu/src/kudu/util/debug-util.cc:579
#5  0x000000000259c390 in kudu::(anonymous namespace)::SubmitSpinLockProfileData (
    contendedlock=0x4ed8a220, wait_cycles=2966400)
    at /root/Projects/kudu/src/kudu/util/spinlock_profiling.cc:229
{noformat}

The SIGSEGV is raised inside the libunwind code: {{access_mem()}} faults when 
called from {{is_plt_entry()}} during {{_ULaarch64_step()}}. This looks very 
similar to what is reported in [this GitHub 
issue|https://github.com/libunwind/libunwind/issues/260].
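
One way to read frame #0 of the trace is to compare the {{addr}} and {{val}} arguments of {{access_mem()}} against the user-space address range. The sketch below is only a plausibility check, not a diagnosis. It assumes 48-bit user virtual addresses, which is a common Linux aarch64 configuration but not the only one, and {{fits_user_va}} is a hypothetical helper that exists in neither Kudu nor libunwind.

```python
# Hypothetical helper for reading the trace; not part of Kudu or libunwind.
VA_BITS = 48  # assumption: 48-bit user virtual address space


def fits_user_va(addr: int, va_bits: int = VA_BITS) -> bool:
    """Return True if addr lies in [0, 2**va_bits)."""
    return 0 <= addr < (1 << va_bits)


ADDR = 7745970402396146688  # 'addr' argument of access_mem() in frame #0
VAL = 0xFFFFF325CA18        # 'val' argument of access_mem() in frame #0

# Print both values in hex together with the range check.
for name, value in (("addr", ADDR), ("val", VAL)):
    print(f"{name}: {value:#018x} fits_user_va={fits_user_va(value)}")
```

Under that assumption, {{val}} falls inside the range while {{addr}} (0x6b7f370a83830c00 in hex) does not, which would be consistent with a fault when libunwind tries to read from {{addr}}.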



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
