clayborg created this revision.
clayborg added reviewers: labath, JDevlieghere, GeorgeHuyubo, yinghuitan, 
kusmour.
Herald added a project: All.
clayborg requested review of this revision.
Herald added a project: LLDB.
Herald added a subscriber: lldb-commits.

We ran into a case where shared libraries would fail to load in some processes 
on linux. The issue turned out to be if the main executable or a shared library 
defined a symbol named "_r_debug", then it would cause problems once the 
executable that contained it was loaded into the process. The "_r_debug" 
structure is currently found by looking through the .dynamic section in the 
main executable and finding the DT_DEBUG entry which points to this structure. 
The dynamic loader will update this structure as shared libraries are loaded 
and LLDB watches the contents of this structure as the dyld breakpoint is hit. 
Currently we expect the "state" in this structure to change as things happen. 
An issue comes up if someone defines another "_r_debug" struct in their program:

  r_debug _r_debug;

If this code is included, a new "_r_debug" structure is created and it causes 
problems once the executable is loaded. This is because of the way symbol 
lookups happen in linux: they use the shared library list in the order it 
created and the dynamic loader is always last. So at some point the dynamic 
loader will start updating this other copy of "_r_debug", yet LLDB is only 
watching the copy inside of the dynamic loader.

Steps that show the problem are:

- lldb finds the "_r_debug" structure via the DT_DEBUG entry in the .dynamic 
section and this points to the "_r_debug" in ld.so
- ld.so modifies its copy of "_r_debug" with "state = eAdd" before it loads the 
shared libraries and calls the dyld function that LLDB has set a breakpoint on 
and we find this state and do nothing (we are waiting for a state of 
eConsistent to tell us the shared libraries have been fully loaded)
- ld.so loads the main executable and any dependent shared libraries and wants 
to update the "_r_debug" structure, but it now finds "_r_debug" in the a.out 
program and updates the state in this other copy
- lldb hits the notification breakpoint and checks the ld.so copy of "_r_debug" 
which still has a state of "eAdd". LLDB wants the new "eConsistent" state which 
will trigger the shared libraries to load, but it gets stale data and doesn't 
do anyhing and library load is missed. The "_r_debug" in a.out has the state 
set correctly, but we don't know which "_r_debug" is the right one.

The new fix detects the two "eAdd" states and loads shared libraries and will 
emit a log message in the "log enable lldb dyld" log channel which states there 
might be multiple "_r_debug" structs.

The correct solution is that no one should be adding a duplicate "_r_debug" 
symbol to their binaries, but we have programs that are doing this already and 
since it can be done, we should be able to work with this and keep debug 
sessions working as expected. If a user #includes the <link.h> file, they can 
just use the existing "_r_debug" structure as it is defined in this header file 
as "extern struct r_debug _r_debug;" and no local copies need to be made.

If your ld.so has debug info, you can easily see the duplicate "_r_debug" 
structs by doing:

  (lldb) target variable _r_debug --raw
  (r_debug) _r_debug = {
    r_version = 1
    r_map = 0x00007ffff7e30210
    r_brk = 140737349972416
    r_state = RT_CONSISTENT
    r_ldbase = 0
  }
  (r_debug) _r_debug = {
    r_version = 1
    r_map = 0x00007ffff7e30210
    r_brk = 140737349972416
    r_state = RT_ADD
    r_ldbase = 140737349943296
  }
  (lldb) target variable &_r_debug
  (r_debug *) &_r_debug = 0x0000555555601040
  (r_debug *) &_r_debug = 0x00007ffff7e301e0

And if you do a "image lookup --address <addr>" in the addresses, you can see 
one is in the a.out and one in the ld.so.

Adding more logging to print out the m_previous and m_current Rendezvous 
structures to make things more clear. Also added a log when we detect multiple 
eAdd states in a row to detect this problem in logs.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158583

Files:
  lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
  lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
  lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
  
lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
  lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
  lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
  lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp

Index: lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/dyld-multiple-rdebug/main.cpp
@@ -0,0 +1,32 @@
+#include "library_file.h"
+#include <link.h>
+#include <stdio.h>
+// Make a duplicate "_r_debug" symbol that is visible. This is the global
+// variable name that the dynamic loader uses to communicate changes in shared
+// libraries that get loaded and unloaded. LLDB finds the address of this
+// variable by reading the DT_DEBUG entry from the .dynamic section of the main
+// executable.
+// What will happen is the dynamic loader will use the "_r_debug" symbol from
+// itself until the a.out executable gets loaded. At this point the new
+// "_r_debug" symbol will take precedence over the orignal "_r_debug" symbol
+// from the dynamic loader and the copy below will get updated with shared
+// library state changes while the version that LLDB checks in the dynamic
+// loader stays the same for ever after this.
+//
+// When our DYLDRendezvous.cpp tries to check the state in the _r_debug
+// structure, it will continue to get the last eAdd as the state before the
+// switch in symbol resolution.
+//
+// Before a fix in LLDB, this would mean that we wouldn't ever load any shared
+// libraries since DYLDRendezvous was waiting to see a eAdd state followed by a
+// eConsistent state which would trigger the adding of shared libraries, but we
+// would never see this change because the local copy below is actually what
+// would get updated. Now if DYLDRendezvous detects two eAdd states in a row,
+// it will load the shared libraries instead of doing nothing and a log message
+// will be printed out if "log enable lldb dyld" is active.
+r_debug _r_debug;
+
+int main() {
+  library_function(); // Break here
+  return 0;
+}
Index: lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.h
@@ -0,0 +1 @@
+int library_function();
Index: lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/dyld-multiple-rdebug/library_file.cpp
@@ -0,0 +1,7 @@
+#include "library_file.h"
+#include <stdio.h>
+
+int library_function(void) {
+  puts(__FUNCTION__); // Library break here
+  return 0;
+}
Index: lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/dyld-multiple-rdebug/TestDyldWithMultupleRDebug.py
@@ -0,0 +1,64 @@
+"""
+Test that LLDB can launch a linux executable through the dynamic loader where
+the main executable has an extra exported "_r_debug" symbol that used to mess
+up shared library loading with DYLDRendezvous and the POSIX dynamic loader
+plug-in. What used to happen is that any shared libraries other than the main
+executable and the dynamic loader and VSDO would not get loaded. This test
+checks to make sure that we still load libraries correctly when we have
+multiple "_r_debug" symbols. See comments in the main.cpp source file for full
+details on what the problem is.
+"""
+
+import lldb
+import os
+
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+
+
+class TestDyldWithMultipleRDebug(TestBase):
+    @skipIf(oslist=no_match(["linux"]))
+    @no_debug_info_test
+    @skipIf(oslist=["linux"], archs=["arm"])
+    def test(self):
+        self.build()
+
+        # Extracts path of the interpreter.
+        exe = self.getBuildArtifact("a.out")
+        print(exe)
+        target = self.dbg.CreateTarget(exe)
+        self.assertTrue(target, VALID_TARGET)
+
+        # Set breakpoints both on shared library function as well as on
+        # main. Both of them will be pending breakpoints.
+        breakpoint_main = target.BreakpointCreateBySourceRegex(
+            "// Break here", lldb.SBFileSpec("main.cpp")
+        )
+        breakpoint_shared_library = target.BreakpointCreateBySourceRegex(
+            "// Library break here", lldb.SBFileSpec("library_file.cpp")
+        )
+        args = []
+        launch_info = lldb.SBLaunchInfo(args)
+        cwd = self.get_process_working_directory()
+        launch_info.SetWorkingDirectory(cwd)
+        launch_info.SetEnvironmentEntries(['LD_LIBRARY_PATH=' + cwd], True)
+        error = lldb.SBError()
+        process = target.Launch(launch_info, error)
+        self.assertSuccess(error)
+
+        # Stopped on main here. This ensures that we were able to load the
+        # main executable and resolve breakpoints within it.
+        self.assertState(process.GetState(), lldb.eStateStopped)
+        thread = process.GetSelectedThread()
+        self.assertIn("main",
+                      thread.GetFrameAtIndex(0).GetDisplayFunctionName())
+        process.Continue()
+
+        # Make sure we stop next a the library breakpoint. If the dynamic
+        # loader doesn't load the library correctly, this breakpoint won't get
+        # hit
+        self.assertState(process.GetState(), lldb.eStateStopped)
+        self.assertIn(
+            "library_function",
+            thread.GetFrameAtIndex(0).GetDisplayFunctionName()
+        )
Index: lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/dyld-multiple-rdebug/Makefile
@@ -0,0 +1,4 @@
+CXX_SOURCES := main.cpp
+DYLIB_NAME := testlib
+DYLIB_CXX_SOURCES := library_file.cpp
+include Makefile.rules
Index: lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
===================================================================
--- lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
+++ lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h
@@ -21,6 +21,7 @@
 using lldb_private::LoadedModuleInfoList;
 
 namespace lldb_private {
+class Log;
 class Process;
 }
 
@@ -45,6 +46,8 @@
     lldb::addr_t ldbase = 0;
 
     Rendezvous() = default;
+
+    void DumpToLog(lldb_private::Log *log, const char *label);
   };
 
   /// Locates the address of the rendezvous structure.  It updates
@@ -276,6 +279,9 @@
     eRemoveModules
   };
 
+  static const char *StateToCStr(RendezvousState state);
+  static const char *ActionToCStr(RendezvousAction action);
+
   /// Returns the current action to be taken given the current and previous
   /// state
   RendezvousAction GetAction() const;
Index: lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
===================================================================
--- lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
+++ lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp
@@ -25,6 +25,32 @@
 using namespace lldb;
 using namespace lldb_private;
 
+const char *DYLDRendezvous::StateToCStr(RendezvousState state) {
+  switch (state) {
+    case DYLDRendezvous::eConsistent:
+      return "eConsistent";
+    case DYLDRendezvous::eAdd:
+      return "eAdd";
+    case DYLDRendezvous::eDelete:
+      return "eDelete";
+  }
+  return "<invalid RendezvousState>";
+}
+
+const char *DYLDRendezvous::ActionToCStr(RendezvousAction action) {
+  switch (action) {
+  case DYLDRendezvous::RendezvousAction::eTakeSnapshot:
+    return "eTakeSnapshot";
+  case DYLDRendezvous::RendezvousAction::eAddModules:
+    return "eAddModules";
+  case DYLDRendezvous::RendezvousAction::eRemoveModules:
+    return "eRemoveModules";
+  case DYLDRendezvous::RendezvousAction::eNoAction:
+    return "eNoAction";
+  }
+  return "<invalid RendezvousAction>";
+}
+
 DYLDRendezvous::DYLDRendezvous(Process *process)
     : m_process(process), m_rendezvous_addr(LLDB_INVALID_ADDRESS),
       m_executable_interpreter(false), m_current(), m_previous(),
@@ -129,6 +155,13 @@
   }
 }
 
+void DYLDRendezvous::Rendezvous::DumpToLog(Log *log, const char *label) {
+  LLDB_LOGF(log, "%s Rendezvous: version = %" PRIu64 ", map_addr = 0x%16.16"
+            PRIx64 ", brk = 0x%16.16" PRIx64 ", state = %" PRIu64
+            " (%s), ldbase = 0x%16.16" PRIx64, label ? label : "", version,
+            map_addr, brk, state, StateToCStr((RendezvousState)state), ldbase);
+}
+
 bool DYLDRendezvous::Resolve() {
   Log *log = GetLog(LLDBLog::DynamicLoader);
 
@@ -176,6 +209,9 @@
   m_previous = m_current;
   m_current = info;
 
+  m_previous.DumpToLog(log, "m_previous");
+  m_current.DumpToLog(log, "m_current ");
+
   if (m_current.map_addr == 0)
     return false;
 
@@ -217,6 +253,26 @@
     break;
 
   case eAdd:
+    // Watch out for two eAdd states in a row and if we see this, then add
+    // modules. If any executable or shared library defines a symbol named
+    // "_r_debug", then the dynamic loader will start using its own copy for
+    // the first notification and the DT_DEBUG will point to the version from
+    // the dynamic loader. The problem happens as soon as the executable or
+    // shared library that exports another "_r_debug" is loaded by the dynamic
+    // loader due to the way symbols are found using the shared library search
+    // paths will mean that the new copy takes precedence over the one in the
+    // dynamic loader and the dynamic loader will be updating the other copy
+    // somewhere else that we won't find and that copy will have the
+    // eConsistent state.
+    if (m_previous.state == eAdd) {
+      Log *log = GetLog(LLDBLog::DynamicLoader);
+      LLDB_LOG(log, "DYLDRendezvous::GetAction() found two eAdd states in a "
+               "row, check process for multiple \"_r_debug\" symbols. "
+               "Returning eAddModules to ensure shared libraries get loaded "
+               "correctly");
+      return eAddModules;
+    }
+    return eNoAction;
   case eDelete:
     return eNoAction;
   }
@@ -225,7 +281,9 @@
 }
 
 bool DYLDRendezvous::UpdateSOEntriesFromRemote() {
-  auto action = GetAction();
+  const auto action = GetAction();
+  Log *log = GetLog(LLDBLog::DynamicLoader);
+  LLDB_LOG(log, "{0} action = {1}", __PRETTY_FUNCTION__, ActionToCStr(action));
 
   if (action == eNoAction)
     return false;
@@ -263,7 +321,10 @@
 bool DYLDRendezvous::UpdateSOEntries() {
   m_added_soentries.clear();
   m_removed_soentries.clear();
-  switch (GetAction()) {
+  const auto action = GetAction();
+  Log *log = GetLog(LLDBLog::DynamicLoader);
+  LLDB_LOG(log, "{0} action = {1}", __PRETTY_FUNCTION__, ActionToCStr(action));
+  switch (action) {
   case eTakeSnapshot:
     m_soentries.clear();
     return TakeSnapshot(m_soentries);
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to