On iOS, every time a device that is provisioned for debugging is plugged in, 
the device management stack checks whether it knows the OS on the device, and 
if not it copies the libraries from the device to the host and puts them in a 
location that lldb can find.  That shouldn’t be a big job if the throughput to 
the device is decent.  Originally this took a couple of minutes on iOS.  That 
was annoying, but except for folks working at Apple who had to update their 
devices every day it was never a burning issue, because you always knew when 
it was going to happen (Xcode gave you a nice progress bar, etc.)   Note, 
internal folks did complain enough that we eventually got around to looking at 
why it was so slow, and found that almost all of that time was spent taking the 
iOS “shared cache” - which is how the libraries exist on the device - and 
expanding it into individual shared libraries.  This was being done 
single-threaded, and just doing it concurrently got the time down to 10 or 20 
seconds.  Given you only do this once per OS update on your device, this 
doesn’t seem to bother people anymore.

Once the shared libraries from the device are available on the lldb host, 
startup times for running an app to first breakpoint are nowhere near 23 
seconds.  Since you were quoting times for a simulator, I tried debugging an 
iOS game app that loads 330 shared libraries at startup.  Launching the app 
from a fresh lldb (from hitting Run in Xcode to hitting a breakpoint in 
applicationDidFinishLaunching, fetching the stacks of all the threads, and 
displaying the locals for the current frame, as well as calling a bunch of 
functions in the expression parser to get queue information for all the 
threads) took 4-5 seconds.  And the warm launch was just a second or two.

So I’m surprised that it takes this long to load on Android.  Before we go 
complicating how lldb handles symbols, it might be worth first figuring out 
what lldb is doing differently on Android that makes it an order of magnitude 
slower.
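
If it would help pin down where the time is going, lldb has a built-in timer 
log that breaks down where it spends time during an operation.  Something 
along these lines (the attach target here is just a placeholder) should show 
whether module loading, symbol parsing, or the remote protocol is the 
bottleneck; the exact timer names in the dump depend on the lldb build:

    (lldb) log timers enable
    (lldb) process attach --name MyApp    # run the slow operation you want to measure
    (lldb) log timers dump                # prints cumulative time per internal timer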

Note, if you are reading the binaries out of memory from the device, and don’t 
have local copies, things go much more slowly.  gdb-remote is NOT a 
high-bandwidth protocol, and fetching all the symbols through a series of 
memory reads is pretty slow.  lldb does have a setting 
(target.memory-module-load-level) that controls what it does with binaries 
that don’t exist on the host.  But that only governs what we do and don’t 
read, and makes no attempt to ameliorate the fallout from having a reduced 
view of the symbols in the program.
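
For reference, that setting is adjusted like any other lldb setting; the value 
names below come from lldb’s settings enumeration ("minimal" reads the least 
out of device memory):

    (lldb) settings show target.memory-module-load-level
    (lldb) settings set target.memory-module-load-level minimal   # or: partial, complete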

We did add a “debug just my code” mode to gdb back in the day, when we were 
supporting gdb here at Apple.  Basically just a load-level for symbols for 
libraries whose path matches some pattern.  gdb was quite slow to process 
libraries at that point, and this did speed loading up substantially.  It 
wasn’t that hard to implement, but it had a bunch of fallout.  Mainly because 
even though people think they would like to only debug their own code, they 
actually venture into system code pretty regularly...

For instance, if you don’t have symbols for libraries, backtracing becomes 
unreliable.  We had to add code to force-load libraries when they showed up in 
backtraces to get reliable unwinding, which generally meant you had to restart 
the unwind when you found an unloaded library.

People also commonly want to set breakpoints on system functions, which you 
can’t do if you haven’t read the symbols.  I don’t know about Android, but on 
iOS and macOS there are common symbolic breakpoints that people set to catch 
error conditions and the like.  To work around this we added code so that if 
you specified a shared library when you set a breakpoint, we would read in 
that shared library’s symbols, but it was hard to get people to use this.
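
As an illustration, using a common macOS symbolic breakpoint (I’m assuming 
here that malloc_error_break still lives in libsystem_malloc.dylib), the 
workaround looks like telling lldb up front which library to pull symbols 
from:

    (lldb) breakpoint set --name malloc_error_break --shlib libsystem_malloc.dylib

Without the --shlib option, lldb has no way to know which of the as-yet-unread 
libraries might contain the symbol.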

People also very commonly call system libraries in expressions for a whole 
variety of reasons.  There’s no way to express to the expression parser that it 
should try to load symbols from libraries (and which ones) when it encounters 
an identifier it can’t find.  You’d probably need to do that.
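
As a concrete example, even something as simple as:

    (lldb) expression -- (size_t)strlen("still here")

has to resolve strlen in the C library.  If that library’s symbols had been 
skipped, the expression would presumably fail with an undeclared-identifier 
error rather than triggering a load of the needed symbols.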

There were other tweaks we had to add to gdb to make this work nicely, but that 
was a long time ago and I can’t remember them right now…

The other problem with this approach is that it often just takes a bunch of 
work that happens predictably when the user starts the debugger, and instead 
makes it happen at some time later, and if it isn’t clear to the user what is 
triggering this slowdown, that is a much worse experience.

Anyway, I don’t see why startup should be taking so long for Android.  It 
would be better to make sure we can’t fix whatever is causing these delays 
before we start complicating lldb with this sort of progressive loading of 
library symbols.

Jim


> On May 8, 2020, at 9:07 AM, Emre Kultursay via lldb-dev 
> <lldb-dev@lists.llvm.org> wrote:
> 
> Hi lldb-dev,
> 
> TL;DR: Has there been any efforts to introduce something like "Just My Code" 
> debugging on LLDB? Debugging on Android would really benefit from this.
> 
> Details:
> 
> Native Android apps typically have a single .so file from the user, but load 
> ~150 system libraries.
> 
> When attaching LLDB remotely to an Android app, a significant amount of time 
> is spent on loading modules for those system libraries, even with a warm LLDB 
> cache that contains a copy of all these libraries. 
> 
> With a cold LLDB cache, things are much worse, because LLDB copies all those 
> libraries from the device back to the host to populate its cache. While one 
> might think this happens only once for a user, things are a bit worse for 
> Android. There are just too many libraries to copy, making it very slow, 
> there are new Android releases every year, and users typically use multiple 
> devices (e.g., x86, x86_64 emulators, arm32, arm64 devices), and multiple 
> hosts (work, home, laptop/desktop); thereby suffering from this issue more 
> than necessary.
> 
> If we can eliminate the latency of loading these modules, we can deliver a 
> much faster debugging startup time. In essence, this can be considered as a 
> form of Just My Code debugging. 
> 
> Prototype and Experiments
> 
> I built a simple prototype that only loads a single user module, and totally 
> avoids loading the ~150 system modules. I ran it on my Windows host against 
> an Android emulator to measure the end-to-end latency of "Connect + Attach + 
> Resume + Hit 1st breakpoint immediately".
> For a warm LLDB cache:
>   - Without just-my-code: 23 seconds
>   - With just-my-code: 14 seconds
> For a cold LLDB cache:
>   - Without just-my-code: 120 seconds
>   - With just-my-code: 16 seconds
> 
> I want to solicit some feedback and gather thoughts around this idea. It 
> would be great if there are any existing alternatives in LLDB to achieve my 
> goal, but otherwise, I can implement this on LLDB and I'd appreciate it if 
> anyone has any advice on how to implement such a feature.
> 
> Thanks.
> -Emre
> 
> 
> _______________________________________________
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
