On Fri, 16 Jun 2023 11:52:51 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:
> Generational ZGC changed the location where the heap object iterator called > the visit function. It used to be called when the objects were popped and > then followed, but it was changed so that the function where called > immediately when the objects are marked. This gave the benefit that the > verification code code show exactly what root had a broken object. > Unfortunately, this change started to apply non-GC closure during the root > iteration. This code lead to lock order issues, just like the one described > in the JBS entry. > > This fix moves the visit calls for non-GC closures back to the follow calls. It was quite hard to provoke this problem, and this bug was found in one of Oracle's internal stress tests. The stack trace looked like this: # assert(false) failed: Attempting to acquire lock ObjectSynchronizer Weak alloc lock/oopstorage out of order with lock Service_lock/service -- possible deadlock ... V [libjvm.so+0x13fcc3c] Mutex::check_rank(Thread*)+0x36c (mutex.cpp:447) V [libjvm.so+0x13fdee0] Mutex::lock_without_safepoint_check()+0x50 (mutex.cpp:145) V [libjvm.so+0x14819b9] OopStorage::allocate()+0x29 (mutexLocker.hpp:199) V [libjvm.so+0x18a8bd0] WeakHandle::WeakHandle(OopStorage*, oop)+0x20 (weakHandle.cpp:37) V [libjvm.so+0x144e4f2] ObjectMonitor::ObjectMonitor(oop)+0x62 (objectMonitor.cpp:261) V [libjvm.so+0x17337f5] ObjectSynchronizer::inflate(Thread*, oop, ObjectSynchronizer::InflateCause)+0x675 (synchronizer.cpp:1425) V [libjvm.so+0x1736036] ObjectSynchronizer::FastHashCode(Thread*, oop)+0x146 (synchronizer.cpp:1001) V [libjvm.so+0x146491f] oopDesc::slow_identity_hash()+0x6f (oop.cpp:112) V [libjvm.so+0x11bf2c5] JvmtiTagMapTable::remove(oop)+0x265 (oop.inline.hpp:366) V [libjvm.so+0x11b6e0f] CallbackWrapper::~CallbackWrapper()+0xef (jvmtiTagMap.cpp:248) V [libjvm.so+0x11b290c] IterateThroughHeapObjectClosure::do_object(oop)+0x59c (jvmtiTagMap.cpp:1089) V [libjvm.so+0x1999518] ZHeapIteratorContext::mark_and_push(oop) const [clone .part.0]+0x58 (zHeapIterator.cpp:81) V [libjvm.so+0x199c6d4] ZHeapIteratorUncoloredRootOopClosure::do_oop(oop*)+0x154 (zHeapIterator.cpp:161) V [libjvm.so+0x19bda57] ZNMethod::nmethod_oops_do_inner(nmethod*, OopClosure*)+0x47 (zNMethod.cpp:264) V [libjvm.so+0x19bdc1e] ZNMethod::nmethod_oops_do(nmethod*, OopClosure*)+0x8e (zNMethod.cpp:254) V [libjvm.so+0x1196195] JvmtiDeferredEventQueue::oops_do(OopClosure*, CodeBlobClosure*)+0x45 (jvmtiImpl.cpp:1027) V [libjvm.so+0x15e5dd9] ServiceThread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x59 (serviceThread.cpp:220) V [libjvm.so+0x1796ac6] Thread::oops_do(OopClosure*, CodeBlobClosure*)+0x76 (thread.cpp:431) V [libjvm.so+0x19fad72] ZJavaThreadsIterator::apply(ThreadClosure*)+0x102 (zRootsIterator.cpp:136) V [libjvm.so+0x19fb135] ZRootsIteratorStrongUncolored::apply(ThreadClosure*, NMethodClosure*)+0x35 (zRootsIterator.cpp:96) V [libjvm.so+0x1997bed] ZHeapIterator::push_strong_roots(ZHeapIteratorContext const&)+0x10d (zHeapIterator.cpp:376) V [libjvm.so+0x199a37c] ZHeapIterator::object_iterate(ObjectClosure*, unsigned int)+0x7c (zHeapIterator.cpp:388) V [libjvm.so+0x199476e] ZHeap::object_iterate(ObjectClosure*, bool)+0x5e (zHeap.cpp:297) V [libjvm.so+0x11b787c] VM_HeapIterateOperation::doit()+0x5c (jvmtiTagMap.cpp:877) You had to provoke the ObjectMonitor inflate code path via the identify_hash call from the JVMTI tagmap code, while the ServiceThread had a registered deferred event. Note that the Service_lock is taken inside ServiceThread::oops_do_no_frames. To more easily provoke the bug I temporarily added a oopstorage-ranked lock to IterateThroughHeapObjectClosure: diff --git a/src/hotspot/share/prims/jvmtiTagMap.cpp b/src/hotspot/share/prims/jvmtiTagMap.cpp index 6908e1994ba..a0e031eec31 100644 --- a/src/hotspot/share/prims/jvmtiTagMap.cpp +++ b/src/hotspot/share/prims/jvmtiTagMap.cpp @@ -967,6 +967,8 @@ class IterateThroughHeapObjectClosure: public ObjectClosure { const jvmtiHeapCallbacks* _callbacks; const void* _user_data; + Mutex _lock; + // accessor functions JvmtiTagMap* tag_map() const { return _tag_map; } int heap_filter() const { return _heap_filter; } @@ -1000,6 +1002,7 @@ class IterateThroughHeapObjectClosure: public ObjectClosure { _heap_filter(heap_filter), _callbacks(heap_callbacks), _user_data(user_data), + _lock(Mutex::oopstorage, "Test lock"), _iteration_aborted(false) { } @@ -1009,6 +1012,10 @@ class IterateThroughHeapObjectClosure: public ObjectClosure { // invoked for each object in the heap void IterateThroughHeapObjectClosure::do_object(oop obj) { + { + MutexLocker locker(&_lock, Mutex::_no_safepoint_check_flag); + } + // check if iteration has been halted if (is_iteration_aborted()) return; With this I could consistently get the problem to reproduce within a few minutes. After moving the call to the visitor (IterateThroughHeapObjectClosure) to outside the root iterator I've been able to run the reproducer for a few hours without any asserts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14518#issuecomment-1594566234