Author: Tom Yang
Date: 2025-03-31T13:29:31-07:00
New Revision: a8d2d169c7add4b0106ae76e186cf815c0b84825

URL: 
https://github.com/llvm/llvm-project/commit/a8d2d169c7add4b0106ae76e186cf815c0b84825
DIFF: 
https://github.com/llvm/llvm-project/commit/a8d2d169c7add4b0106ae76e186cf815c0b84825.diff

LOG: Parallelize module loading in POSIX dyld code (#130912)

This patch improves LLDB launch time on Linux machines for **preload
scenarios**, particularly for executables with a lot of shared library
dependencies (or modules). Specifically:
* Launching a binary with `target.preload-symbols = true` 
* Attaching to a process with `target.preload-symbols = true`.
It's completely controlled by a new flag added in the first commit
`plugin.dynamic-loader.posix-dyld.parallel-module-load`, which *defaults
to false*. This was inspired by similar work on Darwin #110646.

Some rough numbers to showcase perf improvement, run on a very beefy
machine:
* Executable with ~5600 modules: baseline 45s, improvement 15s
* Executable with ~3800 modules: baseline 25s,  improvement 10s
* Executable with ~6650 modules: baseline 67s, improvement 20s
* Executable with ~12500 modules: baseline 185s, improvement 85s
* Executable with ~14700 modules: baseline 235s, improvement 120s
A lot of targets we deal with have a *ton* of modules, and unfortunately
we're unable to convince other folks to reduce the number of modules, so
performance improvements like this can be very impactful for user
experience.

This patch achieves the performance improvement by parallelizing
`DynamicLoaderPOSIXDYLD::RefreshModules` for the launch scenario, and
`DynamicLoaderPOSIXDYLD::LoadAllCurrentModules` for the attach scenario.
The commits have some context on their specific changes as well --
hopefully this helps the review.

# More context on implementation

We discovered the bottlenecks by via `perf record -g -p <lldb's pid>` on
a Linux machine. With an executable known to have 1000s of shared
library dependencies, I ran
```
(lldb) b main
(lldb) r
# taking a while
```
and showed the resulting perf trace (snippet shown)
```
Samples: 85K of event 'cycles:P', Event count (approx.): 54615855812
  Children      Self  Command          Shared Object              Symbol
-   93.54%     0.00%  intern-state     libc.so.6                  [.] clone3
     clone3
     start_thread
     lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*)          
                                                                 r
     std::_Function_handler<void* (), 
lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data
 const&)
     lldb_private::Process::RunPrivateStateThread(bool)                         
                                                                 n
   - 
lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&)
      - 93.54% lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*)
         - 93.54% lldb_private::ThreadList::ShouldStop(lldb_private::Event*)
            - lldb_private::Thread::ShouldStop(lldb_private::Event*)            
                                                                 *
               - 93.53% 
lldb_private::StopInfoBreakpoint::ShouldStopSynchronous(lldb_private::Event*)   
                                         t
                  - 93.52% 
lldb_private::BreakpointSite::ShouldStop(lldb_private::StoppointCallbackContext*)
                                     i
                       
lldb_private::BreakpointLocationCollection::ShouldStop(lldb_private::StoppointCallbackContext*)
                           k
                       
lldb_private::BreakpointLocation::ShouldStop(lldb_private::StoppointCallbackContext*)
                                     b
                       
lldb_private::BreakpointOptions::InvokeCallback(lldb_private::StoppointCallbackContext*,
 unsigned long, unsigned long)    i
                       DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit(void*, 
lldb_private::StoppointCallbackContext*, unsigned long, unsigned lo
                     - DynamicLoaderPOSIXDYLD::RefreshModules()                 
                                                                 O
                        - 93.42% 
DynamicLoaderPOSIXDYLD::RefreshModules()::$_0::operator()(DYLDRendezvous::SOEntry
 const&) const                 u
                           - 93.40% 
DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(lldb_private::FileSpec const&, 
unsigned long, unsigned long, bools
                              - 
lldb_private::DynamicLoader::LoadModuleAtAddress(lldb_private::FileSpec const&, 
unsigned long, unsigned long, boos
                                 - 83.90% 
lldb_private::DynamicLoader::FindModuleViaTarget(lldb_private::FileSpec const&) 
                       o
                                    - 83.01% 
lldb_private::Target::GetOrCreateModule(lldb_private::ModuleSpec const&, bool, 
lldb_private::Status*
                                       - 77.89% 
lldb_private::Module::PreloadSymbols()
                                          - 44.06% 
lldb_private::Symtab::PreloadSymbols()
                                             - 43.66% 
lldb_private::Symtab::InitNameIndexes()
...
```
We saw that majority of time was spent in `RefreshModules`, with the
main culprit within it `LoadModuleAtAddress` which eventually calls
`PreloadSymbols`.

At first, `DynamicLoaderPOSIXDYLD::LoadModuleAtAddress` appears fairly
independent -- most of it deals with different files and then getting or
creating Modules from these files. The portions that aren't independent
seem to deal with ModuleLists, which appear concurrency safe. There were
members of `DynamicLoaderPOSIXDYLD` I had to synchronize though: namely
`m_loaded_modules` which `DynamicLoaderPOSIXDYLD` maintains to map its
loaded modules to their link addresses. Without synchronizing this, I
ran into SEGFAULTS and other issues when running `check-lldb`. I also
locked the assignment and comparison of `m_interpreter_module`, which
may be unnecessary.

# Alternate implementations

When creating this patch, another implementation I considered was
directly background-ing the call to `Module::PreloadSymbol` in
`Target::GetOrCreateModule`. It would have the added benefit of working
across platforms generically, and appeared to be concurrency safe. It
was done via `Debugger::GetThreadPool().async` directly. However, there
were a ton of concurrency issues, so I abandoned that approach for now.

# Testing

With the feature active, I tested via `ninja check-lldb` on both Debug
and Release builds several times (~5 or 6 altogether?), and didn't spot
additional failing or flaky tests.

I also tested manually on several different binaries, some with around
14000 modules, but just basic operations: launching, reaching main,
setting breakpoint, stepping, showing some backtraces.

I've also tested with the flag off just to make sure things behave
properly synchronously.

Added: 
    

Modified: 
    lldb/include/lldb/Target/Target.h
    lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
    lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
    lldb/source/Target/Target.cpp
    lldb/source/Target/TargetProperties.td

Removed: 
    


################################################################################
diff  --git a/lldb/include/lldb/Target/Target.h 
b/lldb/include/lldb/Target/Target.h
index 3cdbe9221a0bc..29183cc267721 100644
--- a/lldb/include/lldb/Target/Target.h
+++ b/lldb/include/lldb/Target/Target.h
@@ -118,6 +118,8 @@ class TargetProperties : public Properties {
 
   llvm::StringRef GetLaunchWorkingDirectory() const;
 
+  bool GetParallelModuleLoad() const;
+
   const char *GetDisassemblyFlavor() const;
 
   const char *GetDisassemblyCPU() const;

diff  --git 
a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp 
b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
index 53ba11ac21bd3..326b6910b5267 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
@@ -10,6 +10,7 @@
 #include "DynamicLoaderPOSIXDYLD.h"
 
 #include "lldb/Breakpoint/BreakpointLocation.h"
+#include "lldb/Core/Debugger.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/ModuleSpec.h"
 #include "lldb/Core/PluginManager.h"
@@ -25,6 +26,7 @@
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/ProcessInfo.h"
+#include "llvm/Support/ThreadPool.h"
 
 #include <memory>
 #include <optional>
@@ -184,16 +186,37 @@ void DynamicLoaderPOSIXDYLD::DidLaunch() {
 
 Status DynamicLoaderPOSIXDYLD::CanLoadImage() { return Status(); }
 
+void DynamicLoaderPOSIXDYLD::SetLoadedModule(const ModuleSP &module_sp,
+                                             addr_t link_map_addr) {
+  llvm::sys::ScopedWriter lock(m_loaded_modules_rw_mutex);
+  m_loaded_modules[module_sp] = link_map_addr;
+}
+
+void DynamicLoaderPOSIXDYLD::UnloadModule(const ModuleSP &module_sp) {
+  llvm::sys::ScopedWriter lock(m_loaded_modules_rw_mutex);
+  m_loaded_modules.erase(module_sp);
+}
+
+std::optional<lldb::addr_t>
+DynamicLoaderPOSIXDYLD::GetLoadedModuleLinkAddr(const ModuleSP &module_sp) {
+  llvm::sys::ScopedReader lock(m_loaded_modules_rw_mutex);
+  auto it = m_loaded_modules.find(module_sp);
+  if (it != m_loaded_modules.end())
+    return it->second;
+  return std::nullopt;
+}
+
 void DynamicLoaderPOSIXDYLD::UpdateLoadedSections(ModuleSP module,
                                                   addr_t link_map_addr,
                                                   addr_t base_addr,
                                                   bool base_addr_is_offset) {
-  m_loaded_modules[module] = link_map_addr;
+  SetLoadedModule(module, link_map_addr);
+
   UpdateLoadedSectionsCommon(module, base_addr, base_addr_is_offset);
 }
 
 void DynamicLoaderPOSIXDYLD::UnloadSections(const ModuleSP module) {
-  m_loaded_modules.erase(module);
+  UnloadModule(module);
 
   UnloadSectionsCommon(module);
 }
@@ -401,7 +424,7 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
   // The rendezvous class doesn't enumerate the main module, so track that
   // ourselves here.
   ModuleSP executable = GetTargetExecutable();
-  m_loaded_modules[executable] = m_rendezvous.GetLinkMapAddress();
+  SetLoadedModule(executable, m_rendezvous.GetLinkMapAddress());
 
   DYLDRendezvous::iterator I;
   DYLDRendezvous::iterator E;
@@ -423,34 +446,70 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
       E = m_rendezvous.end();
       m_initial_modules_added = true;
     }
-    for (; I != E; ++I) {
-      // Don't load a duplicate copy of ld.so if we have already loaded it
-      // earlier in LoadInterpreterModule. If we instead loaded then unloaded 
it
-      // later, the section information for ld.so would be removed. That
-      // information is required for placing breakpoints on Arm/Thumb systems.
-      if ((m_interpreter_module.lock() != nullptr) &&
-          (I->base_addr == m_interpreter_base))
-        continue;
-
-      ModuleSP module_sp =
-          LoadModuleAtAddress(I->file_spec, I->link_addr, I->base_addr, true);
-      if (!module_sp.get())
-        continue;
-
-      if (module_sp->GetObjectFile()->GetBaseAddress().GetLoadAddress(
-              &m_process->GetTarget()) == m_interpreter_base) {
-        ModuleSP interpreter_sp = m_interpreter_module.lock();
-        if (m_interpreter_module.lock() == nullptr) {
-          m_interpreter_module = module_sp;
-        } else if (module_sp == interpreter_sp) {
-          // Module already loaded.
-          continue;
-        }
-      }
 
-      loaded_modules.AppendIfNeeded(module_sp);
-      new_modules.Append(module_sp);
+    // Synchronize reading and writing of `m_interpreter_module`.
+    std::mutex interpreter_module_mutex;
+    // We should be able to take SOEntry as reference since the data
+    // exists for the duration of this call in `m_rendezvous`.
+    auto load_module_fn =
+        [this, &loaded_modules, &new_modules,
+         &interpreter_module_mutex](const DYLDRendezvous::SOEntry &so_entry) {
+          // Don't load a duplicate copy of ld.so if we have already loaded it
+          // earlier in LoadInterpreterModule. If we instead loaded then
+          // unloaded it later, the section information for ld.so would be
+          // removed. That information is required for placing breakpoints on
+          // Arm/Thumb systems.
+          {
+            // `m_interpreter_module` may be modified by another thread at the
+            // same time, so we guard the access here.
+            std::lock_guard<std::mutex> lock(interpreter_module_mutex);
+            if ((m_interpreter_module.lock() != nullptr) &&
+                (so_entry.base_addr == m_interpreter_base))
+              return;
+          }
+
+          ModuleSP module_sp = LoadModuleAtAddress(
+              so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, 
true);
+          if (!module_sp.get())
+            return;
+
+          {
+            // `m_interpreter_module` may be modified by another thread at the
+            // same time, so we guard the access here.
+            std::lock_guard<std::mutex> lock(interpreter_module_mutex);
+            // Set the interpreter module, if this is the interpreter.
+            if (module_sp->GetObjectFile()->GetBaseAddress().GetLoadAddress(
+                    &m_process->GetTarget()) == m_interpreter_base) {
+              ModuleSP interpreter_sp = m_interpreter_module.lock();
+              if (m_interpreter_module.lock() == nullptr) {
+                m_interpreter_module = module_sp;
+              } else if (module_sp == interpreter_sp) {
+                // Module already loaded.
+                return;
+              }
+            }
+          }
+
+          // Note: in a multi-threaded environment, these module lists may be
+          // appended to out-of-order. This is fine, since there's no
+          // expectation for `loaded_modules` or `new_modules` to be in any
+          // particular order, and appending to each module list is 
thread-safe.
+          // Also, `new_modules` is only used for the `ModulesDidLoad` call at
+          // the end of this function.
+          loaded_modules.AppendIfNeeded(module_sp);
+          new_modules.Append(module_sp);
+        };
+
+    if (m_process->GetTarget().GetParallelModuleLoad()) {
+      llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+      for (; I != E; ++I)
+        task_group.async(load_module_fn, *I);
+      task_group.wait();
+    } else {
+      for (; I != E; ++I)
+        load_module_fn(*I);
     }
+
     m_process->GetTarget().ModulesDidLoad(new_modules);
   }
 
@@ -636,7 +695,7 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
   // The rendezvous class doesn't enumerate the main module, so track that
   // ourselves here.
   ModuleSP executable = GetTargetExecutable();
-  m_loaded_modules[executable] = m_rendezvous.GetLinkMapAddress();
+  SetLoadedModule(executable, m_rendezvous.GetLinkMapAddress());
 
   std::vector<FileSpec> module_names;
   for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
@@ -644,19 +703,31 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
   m_process->PrefetchModuleSpecs(
       module_names, m_process->GetTarget().GetArchitecture().GetTriple());
 
-  for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
-    ModuleSP module_sp =
-        LoadModuleAtAddress(I->file_spec, I->link_addr, I->base_addr, true);
+  auto load_module_fn = [this, &module_list,
+                         &log](const DYLDRendezvous::SOEntry &so_entry) {
+    ModuleSP module_sp = LoadModuleAtAddress(
+        so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
     if (module_sp.get()) {
       LLDB_LOG(log, "LoadAllCurrentModules loading module: {0}",
-               I->file_spec.GetFilename());
+               so_entry.file_spec.GetFilename());
       module_list.Append(module_sp);
     } else {
       Log *log = GetLog(LLDBLog::DynamicLoader);
       LLDB_LOGF(
           log,
           "DynamicLoaderPOSIXDYLD::%s failed loading module %s at 0x%" PRIx64,
-          __FUNCTION__, I->file_spec.GetPath().c_str(), I->base_addr);
+          __FUNCTION__, so_entry.file_spec.GetPath().c_str(),
+          so_entry.base_addr);
+    }
+  };
+  if (m_process->GetTarget().GetParallelModuleLoad()) {
+    llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+    for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
+      task_group.async(load_module_fn, *I);
+    task_group.wait();
+  } else {
+    for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
+      load_module_fn(*I);
     }
   }
 
@@ -728,15 +799,15 @@ DynamicLoaderPOSIXDYLD::GetThreadLocalData(const 
lldb::ModuleSP module_sp,
                                            const lldb::ThreadSP thread,
                                            lldb::addr_t tls_file_addr) {
   Log *log = GetLog(LLDBLog::DynamicLoader);
-  auto it = m_loaded_modules.find(module_sp);
-  if (it == m_loaded_modules.end()) {
+  std::optional<addr_t> link_map_addr_opt = GetLoadedModuleLinkAddr(module_sp);
+  if (!link_map_addr_opt.has_value()) {
     LLDB_LOGF(
         log, "GetThreadLocalData error: module(%s) not found in loaded 
modules",
         module_sp->GetObjectName().AsCString());
     return LLDB_INVALID_ADDRESS;
   }
 
-  addr_t link_map = it->second;
+  addr_t link_map = link_map_addr_opt.value();
   if (link_map == LLDB_INVALID_ADDRESS || link_map == 0) {
     LLDB_LOGF(log,
               "GetThreadLocalData error: invalid link map address=0x%" PRIx64,

diff  --git 
a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h 
b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
index bde334aaca40b..6efb92673a13c 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
@@ -93,10 +93,6 @@ class DynamicLoaderPOSIXDYLD : public 
lldb_private::DynamicLoader {
   /// Contains the pointer to the interpret module, if loaded.
   std::weak_ptr<lldb_private::Module> m_interpreter_module;
 
-  /// Loaded module list. (link map for each module)
-  std::map<lldb::ModuleWP, lldb::addr_t, std::owner_less<lldb::ModuleWP>>
-      m_loaded_modules;
-
   /// Returns true if the process is for a core file.
   bool IsCoreFile() const;
 
@@ -180,6 +176,19 @@ class DynamicLoaderPOSIXDYLD : public 
lldb_private::DynamicLoader {
   DynamicLoaderPOSIXDYLD(const DynamicLoaderPOSIXDYLD &) = delete;
   const DynamicLoaderPOSIXDYLD &
   operator=(const DynamicLoaderPOSIXDYLD &) = delete;
+
+  /// Loaded module list. (link map for each module)
+  /// This may be accessed in a multi-threaded context. Use the accessor 
methods
+  /// to access `m_loaded_modules` safely.
+  std::map<lldb::ModuleWP, lldb::addr_t, std::owner_less<lldb::ModuleWP>>
+      m_loaded_modules;
+  llvm::sys::RWMutex m_loaded_modules_rw_mutex;
+
+  void SetLoadedModule(const lldb::ModuleSP &module_sp,
+                       lldb::addr_t link_map_addr);
+  void UnloadModule(const lldb::ModuleSP &module_sp);
+  std::optional<lldb::addr_t>
+  GetLoadedModuleLinkAddr(const lldb::ModuleSP &module_sp);
 };
 
 #endif // LLDB_SOURCE_PLUGINS_DYNAMICLOADER_POSIX_DYLD_DYNAMICLOADERPOSIXDYLD_H

diff  --git a/lldb/source/Target/Target.cpp b/lldb/source/Target/Target.cpp
index c26bca546891e..09c0c0b8a5db0 100644
--- a/lldb/source/Target/Target.cpp
+++ b/lldb/source/Target/Target.cpp
@@ -4488,6 +4488,12 @@ llvm::StringRef 
TargetProperties::GetLaunchWorkingDirectory() const {
       idx, g_target_properties[idx].default_cstr_value);
 }
 
+bool TargetProperties::GetParallelModuleLoad() const {
+  const uint32_t idx = ePropertyParallelModuleLoad;
+  return GetPropertyAtIndexAs<bool>(
+      idx, g_target_properties[idx].default_uint_value != 0);
+}
+
 const char *TargetProperties::GetDisassemblyFlavor() const {
   const uint32_t idx = ePropertyDisassemblyFlavor;
   const char *return_value;

diff  --git a/lldb/source/Target/TargetProperties.td 
b/lldb/source/Target/TargetProperties.td
index 38a345dfd8849..3940ac00a2bd9 100644
--- a/lldb/source/Target/TargetProperties.td
+++ b/lldb/source/Target/TargetProperties.td
@@ -217,6 +217,9 @@ let Definition = "target" in {
          "launched. If you change this setting, the new value will only apply 
to "
          "subsequent launches. Commands that take an explicit working 
directory "
          "will override this setting.">;
+  def ParallelModuleLoad: Property<"parallel-module-load", "Boolean">,
+    DefaultTrue,
+    Desc<"Enable loading of modules in parallel for the dynamic loader.">;
 }
 
 let Definition = "process_experimental" in {


        
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to