[ https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125850#comment-16125850 ]

Ted Yu edited comment on HBASE-18541 at 8/14/17 3:49 PM:
---------------------------------------------------------

Installed openjdk-8-dbg.
When loading the core dump in gdb, I got:
{code}
#0  0x00007ff8b11fa421 in os::write_memory_serialize_page (thread=0x1d91000) at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/os.hpp:419
419     
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/os.hpp:
 No such file or directory.
[Current thread is 1 (Thread 0x7ff8b1aa2840 (LWP 10680))]
Installing openjdk unwinder
(gdb) bt
#0  0x00007ff8b11fa421 in ObjectMonitor::wait(long, bool, Thread*) 
(thread=0x1d91000)
    at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/os.hpp:419
#1  0x00007ff8b11fa421 in ObjectMonitor::wait(long, bool, Thread*) 
(thread=0x1d91000)
    at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/os/linux/vm/interfaceSupport_linux.hpp:31
#2  0x00007ff8b11fa421 in ObjectMonitor::wait(long, bool, Thread*) 
(from=_thread_blocked, to=_thread_in_vm, thread=0x1d91000)
    at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:179
#3  0x00007ff8b11fa421 in ObjectMonitor::wait(long, bool, Thread*) 
(to=_thread_in_vm, from=_thread_blocked, this=<synthetic pointer>)
    at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:232
#4  0x00007ff8b11fa421 in ObjectMonitor::wait(long, bool, Thread*) 
(this=<synthetic pointer>, __in_chrg=<optimized out>)
    at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:314
#5  0x00007ff8b11fa421 in ObjectMonitor::wait(long, bool, Thread*) 
(this=this@entry=0x7ff8140b4e18, millis=millis@entry=0, 
interruptible=interruptible@entry=true, 
__the_thread__=__the_thread__@entry=0x1d91000) at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/objectMonitor.cpp:1546
#6  0x00007ff8b132a1bf in ObjectSynchronizer::wait(Handle, long, Thread*) 
(obj=obj@entry=..., millis=millis@entry=0, 
__the_thread__=__the_thread__@entry=0x1d91000)
    at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/synchronizer.cpp:389
#7  0x00007ff8b0ffb4a3 in JVM_MonitorWait(JNIEnv*, jobject, jlong) 
(env=<optimized out>, handle=<optimized out>, ms=0)
    at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/prims/jvm.cpp:562
#8  0x00007ff89dc09c28 in [native offset=0xa8] java.lang.Object.wait(long) () 
at java/lang/Object.java
#9  0x00007ff89df5e304 in [compiled offset=0x104] java.lang.Object.wait() () at 
java/lang/Object.java:502
#10 0x00007ff89d9d1ffd in [interpreted: bc = 42] 
org.apache.zookeeper.ClientCnxn.submitRequest(org.apache.zookeeper.proto.RequestHeader,org.apache.jute.Record,org.apache.jute.Record,org.apache.zookeeper.ZooKeeper$WatchRegistration)
 () at org/apache/zookeeper/ClientCnxn.java:1408
#11 0x00007ff89d9d1d80 in [interpreted: bc = 75] 
org.apache.zookeeper.ZooKeeper.delete(java.lang.String,int) () at 
org/apache/zookeeper/ZooKeeper.java:872
#12 0x00007ff89d9d1ffd in [interpreted: bc = 26] 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(java.lang.String,int)
 ()
    at org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java:205
#13 0x00007ff89d9d1ffd in [interpreted: bc = 6] 
org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher,java.lang.String,int)
 ()
    at org/apache/hadoop/hbase/zookeeper/ZKUtil.java:1236
#14 0x00007ff89d9d17d0 in [interpreted: bc = 3] 
org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher,java.lang.String)
 ()
    at org/apache/hadoop/hbase/zookeeper/ZKUtil.java:1225
#15 0x00007ff89d9d1ffd in [interpreted: bc = 11] 
org.apache.hadoop.hbase.zookeeper.ClusterStatusTracker.setClusterDown() ()
    at org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java:86
#16 0x00007ff89d9d1ffd in [interpreted: bc = 55] 
org.apache.hadoop.hbase.master.HMaster.shutdown() () at 
org/apache/hadoop/hbase/master/HMaster.java:2315
#17 0x00007ff89d9d1ffd in [interpreted: bc = 79] 
org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(java.util.List,java.util.List)
 ()
    at org/apache/hadoop/hbase/util/JVMClusterUtil.java:257
#18 0x00007ff89d9d1ffd in [interpreted: bc = 8] 
org.apache.hadoop.hbase.LocalHBaseCluster.shutdown() () at 
org/apache/hadoop/hbase/LocalHBaseCluster.java:418
{code}
The crash didn't involve table creation.


was (Author: [email protected]):
Installed openjdk-8-dbg.
When loading the core dump in gdb, I got:
{code}
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by 
`/usr/src/hbase/hbase-native-client/buck-out/gen/core/retry-test 
--gtest_color=n'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1a735338d5 in ?? ()
[Current thread is 1 (Thread 0x7f1a8701d840 (LWP 12922))]
Installing openjdk unwinder
(gdb) bt
#0  0x00007f1a735338d5 in  ()
#1  0x00007ffe78f88ba8 in  ()
#2  0x00007f19e4d572c8 in  ()
#3  0x0000000000000000 in  ()
{code}
There was no detail for the segfault.

> [C++] Segfaults from JNI
> ------------------------
>
>                 Key: HBASE-18541
>                 URL: https://issues.apache.org/jira/browse/HBASE-18541
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Ted Yu
>
> retry-test and multi-retry-test fail flakily when run with 
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the 
> create table method call. I was not able to inspect much, but the comments in 
> our mini-cluster indicate that we may need to use global references instead 
> of local ones. I suspect the problem happens when there is a GC run during 
> the test, since the failure usually appears only after some time (but almost 
> always in the create table method). 
> [~ted_yu], do you mind taking a look at this?
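The global-vs-local reference suspicion in the description can be illustrated with a toy model of JNI reference lifetimes. This is a hand-rolled {{ToyEnv}}, not the real {{JNIEnv}} and not the actual hbase-native-client code: in real JNI the fix would be {{env->NewGlobalRef(obj)}} (paired with {{DeleteGlobalRef}}) for any jobject cached past the current native frame, since local references are reclaimed when the native method returns and may also be invalidated across GC runs.
{code}
#include <cstdio>
#include <unordered_set>

// Toy model of JNI reference lifetimes (illustration only).
// Local references live in a per-native-frame table that the JVM clears when
// the native method returns; global references survive until explicitly freed.
struct ToyEnv {
    std::unordered_set<int> local_refs;   // reclaimed when the native frame pops
    std::unordered_set<int> global_refs;  // survive across frames and GC runs

    int NewLocalRef(int obj)  { local_refs.insert(obj);  return obj; }
    int NewGlobalRef(int obj) { global_refs.insert(obj); return obj; }
    void PopNativeFrame()     { local_refs.clear(); }
};

int main() {
    ToyEnv env;
    int cluster = 42;  // stands in for a jobject handle to the mini-cluster

    int local_handle  = env.NewLocalRef(cluster);
    int global_handle = env.NewGlobalRef(cluster);

    env.PopNativeFrame();  // native method returns; local refs are reclaimed

    // Using only the cached local ref past this point is a dangling handle,
    // which matches a crash that shows up "after some time" (e.g. after GC).
    std::printf("local valid: %zu\n",  env.local_refs.count(local_handle));
    std::printf("global valid: %zu\n", env.global_refs.count(global_handle));
    return 0;
}
{code}
Running the toy prints {{local valid: 0}} and {{global valid: 1}}, which is the lifetime rule the mini-cluster comments hint at: any reference cached across native calls must be promoted to a global reference.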



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
