[lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-12 Thread Scott Smith via lldb-dev
I worked on some performance improvements for lldb 3.9, and was about to forward port them so I can submit them for inclusion, but I realized there has been a major performance drop from 3.9 to 4.0. I am using the official builds on an Ubuntu 16.04 machine with 16 cores / 32 hyperthreads. Running

[lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
The algorithm included in ObjectFileELF.cpp performs a byte at a time computation, which causes long pipeline stalls in modern processors. Unfortunately, the polynomial used is not the same one used by the SSE 4.2 instruction set, but there are two ways to make it faster: 1. Work on multiple bytes
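
To make the "multiple bytes" idea concrete, here is a rough sketch (not the patch itself) comparing a byte-at-a-time table lookup with a slicing-by-4 variant. It assumes the standard reflected CRC-32 polynomial 0xEDB88320 with zlib-style pre/post inversion; the names `crc_sketch`, `Crc32Bytewise` and `Crc32Slice4` are made up for illustration and do not exist in lldb.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

namespace crc_sketch {  // hypothetical namespace, not part of lldb

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// Build the classic lookup table (t[0]) plus three derived tables that let
// four input bytes be folded into the CRC per loop iteration.
inline Tables MakeTables() {
  Tables t{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int bit = 0; bit < 8; ++bit)
      c = (c >> 1) ^ ((c & 1u) ? 0xEDB88320u : 0u);  // assumed polynomial
    t[0][i] = c;
  }
  for (uint32_t i = 0; i < 256; ++i)
    for (int k = 1; k < 4; ++k)
      t[k][i] = (t[k - 1][i] >> 8) ^ t[0][t[k - 1][i] & 0xFFu];
  return t;
}

inline const Tables &GetTables() {
  static const Tables tables = MakeTables();
  return tables;
}

// Byte at a time: every step depends on the previous step's result.
inline uint32_t Crc32Bytewise(const uint8_t *data, size_t len,
                              uint32_t crc = 0) {
  const Tables &t = GetTables();
  crc = ~crc;
  for (size_t i = 0; i < len; ++i)
    crc = (crc >> 8) ^ t[0][(crc ^ data[i]) & 0xFFu];
  return ~crc;
}

// Slicing-by-4: four independent table lookups per iteration, so the
// dependency chain is a quarter as long for the same input.
inline uint32_t Crc32Slice4(const uint8_t *data, size_t len,
                            uint32_t crc = 0) {
  const Tables &t = GetTables();
  crc = ~crc;
  while (len >= 4) {
    // Assemble the word byte by byte so the result is endian-independent.
    crc ^= static_cast<uint32_t>(data[0]) |
           (static_cast<uint32_t>(data[1]) << 8) |
           (static_cast<uint32_t>(data[2]) << 16) |
           (static_cast<uint32_t>(data[3]) << 24);
    crc = t[3][crc & 0xFFu] ^ t[2][(crc >> 8) & 0xFFu] ^
          t[1][(crc >> 16) & 0xFFu] ^ t[0][crc >> 24];
    data += 4;
    len -= 4;
  }
  while (len--)  // leftover tail bytes
    crc = (crc >> 8) ^ t[0][(crc ^ *data++) & 0xFFu];
  return ~crc;
}

}  // namespace crc_sketch
```

Both functions should return the same value for any input; only the length of the dependency chain per byte differs.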

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >> The algorithm included in ObjectFileELF.cpp performs a byte at a time >> computation, which causes long pipeline stalls in modern processors. >> Unfortunately, the polynom

Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-12 Thread Scott Smith via lldb-dev
for regressions to sneak in without anyone noticing. > So the original idea was hey, we can have something that counts packets for > distinct operations. Like, this "next" command should take no more than 40 > packets, that kind of thing. And it could be expanded -- "b m

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
it's available? >>> >>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner >>> wrote: >>> >>>> Zlib is definitely optional and we cannot make it required. >>>> >>>> Did you check to see if llvm has a crc32 function somewhere in S

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-12 Thread Scott Smith via lldb-dev
Ok I stripped out the zlib crc algorithm and just left the parallelism + calls to zlib's crc32_combine, but only if we are actually linking with zlib. I left those calls here (rather than folding them into JamCRC) because I'm taking advantage of TaskRunner to parallelize the work. I moved the sys

[lldb-dev] Parallelize loading of shared libraries

2017-04-12 Thread Scott Smith via lldb-dev
The POSIX dynamic loader processes one module at a time. If you have a lot of shared libraries, each with a lot of symbols, this creates unneeded serialization (despite the use of TaskRunners during symbol loading, there is still quite a bit of serialization when loading a library). In order to p

Re: [lldb-dev] Parallelize loading of shared libraries

2017-04-13 Thread Scott Smith via lldb-dev
e are trying to do a lot of things very lazily (which > unfortunately makes efficient parallelization more complicated). > > > > On 13 April 2017 at 06:34, Scott Smith via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >> The POSIX dynamic loader processes one module at a

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-13 Thread Scott Smith via lldb-dev
to llvm as well if it helps. >> >> Not trying to throw extra work on you, but it seems like a really good >> general purpose improvement and it would be a shame if only lldb can >> benefit from it. >> On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev < >>

Re: [lldb-dev] Improve performance of crc32 calculation

2017-04-18 Thread Scott Smith via lldb-dev
>>> lldb-dev@lists.llvm.org> wrote: >>> >>>> I know this is outside of your initial goal, but it would be really >>>> great if JamCRC be updated in llvm to be parallel. I see that you're making >>>> use of TaskRunner for the parallel

[lldb-dev] Running check-lldb

2017-04-18 Thread Scott Smith via lldb-dev
I'm trying to make sure some of my changes don't break lldb tests, but I'm having trouble getting a clean run even with a plain checkout. I've tried the latest head of master, as well as release_40. I'm running Ubuntu 16.04/amd64. I built with: cmake ../llvm -G Ninja -DCMAKE_BUILD_TYPE=Debug ni

Re: [lldb-dev] Running check-lldb

2017-04-19 Thread Scott Smith via lldb-dev
s down ASAP. > > > On 18 April 2017 at 21:24, Scott Smith via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >> I'm trying to make sure some of my changes don't break lldb tests, but >> I'm having trouble getting a clean run even with a plain che

Re: [lldb-dev] Running check-lldb

2017-04-19 Thread Scott Smith via lldb-dev
Labath wrote: >> >>> It looks like we are triggering an assert in llvm on a debug build. I'll >>> try to track this down ASAP. >>> >>> >>> On 18 April 2017 at 21:24, Scott Smith via lldb-dev < >>> lldb-dev@lists.llvm.org> wro

Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-19 Thread Scott Smith via lldb-dev
tee that. I assume the change was made to allow proper memory cleanup when the symbols are discarded? On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath wrote: > Bisecting the performance regression would be extremely valuable. If you > want to do that, it would be very appreciated. > >

Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-19 Thread Scott Smith via lldb-dev
e the change was made to allow proper memory cleanup when the >> symbols are discarded? >> >> On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath wrote: >> >>> Bisecting the performance regression would be extremely valuable. If you >>> want to do that, it w

Re: [lldb-dev] LLDB performance drop from 3.9 to 4.0

2017-04-20 Thread Scott Smith via lldb-dev
g the >>>>> pointer. Now it needs to use an actual string comparison routine. This >>>>> code: >>>>> >>>>> bool operator<(const Entry &rhs) const { return cstring < >>>>> rhs.cstring; } >>>>> >
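
A small sketch of the difference being described in the quoted text: ordering entries by pointer value versus ordering them by string contents. `PointerEntry` and `ContentEntry` are hypothetical stand-ins written for this illustration, not the actual lldb types.

```cpp
#include <cstring>
#include <functional>

// Hypothetical entry whose key is compared by address only. This is a
// single integer-style comparison, but it is only meaningful if equal
// strings always share one address (for example, via a uniquing pool).
struct PointerEntry {
  const char *cstring;
  bool operator<(const PointerEntry &rhs) const {
    // std::less gives a well-defined total order for unrelated pointers.
    return std::less<const char *>()(cstring, rhs.cstring);
  }
};

// Hypothetical entry whose key is compared by contents. Each comparison
// walks the characters, so sorting a large table costs noticeably more.
struct ContentEntry {
  const char *cstring;
  bool operator<(const ContentEntry &rhs) const {
    return std::strcmp(cstring, rhs.cstring) < 0;
  }
};
```

With two separate buffers that both hold "main", `ContentEntry` treats them as equivalent, while `PointerEntry` orders them by wherever they happen to live in memory.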

Re: [lldb-dev] Running check-lldb

2017-04-20 Thread Scott Smith via lldb-dev
On Thu, Apr 20, 2017 at 6:47 AM, Pavel Labath wrote: > 5. specifying gcc-4.8 instead of the locally compiled clang > > has most of the tests passing, with a handful of unexpected successes: >> >> UNEXPECTED SUCCESS: TestRegisterVariables.Register >> VariableTestCase.test_and_run_command_dwarf >>

Re: [lldb-dev] Running check-lldb

2017-04-20 Thread Scott Smith via lldb-dev
Sorry, I take that back. I forgot to save the buffer that ran the test script. Oops :-( I get a number of errors that make me think it's missing libc++, which makes sense because I never installed it. However, I thought clang automatically falls back to using gcc's libstdc++. Failures include:

[lldb-dev] Parallelizing loading of shared libraries

2017-04-26 Thread Scott Smith via lldb-dev
After dealing with a bunch of microoptimizations, I'm back to parallelizing loading of shared modules. My naive approach was to just create a new thread per shared library. I have a feeling some users may not like that; I think I read an email from someone who has thousands of shared libraries.
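
For illustration only, here is one way the "thread per library" concern could be avoided: a fixed number of workers pull item indices from a shared counter, so thousands of libraries never mean thousands of threads. This is a generic sketch using only the C++ standard library; `ForEachIndexBounded` is a made-up name and is not lldb's TaskPool or TaskRunner.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: call fn(i) once for every i in [0, count), using at
// most max_threads threads in total (the calling thread counts as one).
// Assumes fn does not throw; an exception would skip the joins below.
inline void ForEachIndexBounded(size_t count, size_t max_threads,
                                const std::function<void(size_t)> &fn) {
  if (count == 0)
    return;
  const size_t n = std::max<size_t>(1, std::min(max_threads, count));
  std::atomic<size_t> next{0};

  // Each worker repeatedly claims the next unprocessed index.
  auto worker = [&] {
    for (size_t i = next.fetch_add(1); i < count; i = next.fetch_add(1))
      fn(i);
  };

  std::vector<std::thread> threads;
  threads.reserve(n - 1);
  for (size_t t = 0; t + 1 < n; ++t)
    threads.emplace_back(worker);
  worker();  // the calling thread does its share instead of just waiting
  for (auto &th : threads)
    th.join();
}
```

A caller would pass the number of libraries as `count` and something like `std::thread::hardware_concurrency()` as `max_threads`. Note that this sketch does nothing about nested parallelism, where `fn` itself wants to fan out more work.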

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-26 Thread Scott Smith via lldb-dev
rially? > Is it feasible to just require tasks to be non blocking? > On Wed, Apr 26, 2017 at 4:12 PM Scott Smith via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >> After dealing with a bunch of microoptimizations, I'm back to >> parallelizing loading of share

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-27 Thread Scott Smith via lldb-dev
eading in shared libraries simultaneously, and adding them to the global > cache. In some of the uses that lldb has under Xcode this is actually very > common. So the task pool will have to be built up as things are added to > the global shared module cache, not at the level of individual

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-27 Thread Scott Smith via lldb-dev
her concern is that lldb keeps the modules it reads in a global > cache, shared by all debuggers & targets. It is very possible that you > could have two targets or two debuggers each with one target that are > reading in shared libraries simultaneously, and adding them to the global >

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-28 Thread Scott Smith via lldb-dev
dy state would be 2 * cores, rather than height * cores. I think that is probably overkill though. On Fri, Apr 28, 2017 at 4:37 AM, Pavel Labath wrote: > On 27 April 2017 at 00:12, Scott Smith via lldb-dev > wrote: > > After dealing with a bunch of microoptimizations, I'm back

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-04-30 Thread Scott Smith via lldb-dev
Pool to > make it suitable? > > On Fri, Apr 28, 2017 at 8:04 AM Scott Smith via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >> Hmmm ok, I don't like hard coding pools. Your idea about limiting the >> number of high level threads gave me an idea: >> >

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-05-01 Thread Scott Smith via lldb-dev
On Mon, May 1, 2017 at 2:42 PM, Pavel Labath wrote: > Besides, hardcoding the nesting logic into "add" is kinda wrong. > Adding a task is not the problematic operation, waiting for the result > of one is. Granted, generally these happen on the same thread, but > they don't have to be -- you can w

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-05-01 Thread Scott Smith via lldb-dev
another one. If > there are improvements to be made, let's make them there instead of in LLDB > so that other LLVM users can benefit. > > On Mon, May 1, 2017 at 2:58 PM Scott Smith via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > >> On Mon, May 1, 2017 at 2:42 PM,

[lldb-dev] Lack of parallelism

2017-05-02 Thread Scott Smith via lldb-dev
I've been trying to improve the parallelism of lldb but have run into an odd roadblock. I have the code at the point where it creates 40 worker threads, and it stays that way because it has enough work to do. However, running 'top -d 1' shows that for the time in question, cpu load never gets abo

Re: [lldb-dev] Parallelizing loading of shared libraries

2017-05-02 Thread Scott Smith via lldb-dev
LLDB has TaskRunner and TaskPool. TaskPool is nearly the same as llvm::ThreadPool. TaskRunner itself is a layer on top, though, and doesn't seem to have an analog in llvm. Not that I'm defending TaskRunner; I have written a new one called TaskMap. The idea is that if all you want is to cal

Re: [lldb-dev] Lack of parallelism

2017-05-02 Thread Scott Smith via lldb-dev
should just be once per loaded > module. > > Jim > > > On May 2, 2017, at 8:09 AM, Scott Smith via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > > > > I've been trying to improve the parallelism of lldb but have run into an > odd roadblock. I have

Re: [lldb-dev] Lack of parallelism

2017-05-02 Thread Scott Smith via lldb-dev
On Tue, May 2, 2017 at 12:43 PM, Greg Clayton wrote: > The other thing would be to try and move the demangler to use a custom > allocator everywhere. Not sure what demangler you are using when you are > doing these tests, but we can either use the native system one from > the #include , or the fa

[lldb-dev] OperatingSystem plugins

2017-05-04 Thread Scott Smith via lldb-dev
I would like to change the list of threads that lldb presents to the user for an internal application (not to be submitted upstream). It seems the right way to do this is to write an OperatingSystem plugin. 1. Can I still make it so the user can see real threads as well as whatever other "threads

[lldb-dev] Setting shared library search paths and core files

2017-05-04 Thread Scott Smith via lldb-dev
Before I dive into the code to see if there's a bug, I wanted to see if I was just doing it wrong. I have an application with a different libc, etc than the machine I'm running the debugger on. The application also has a bunch of libraries that simply don't exist in the normal location on my dev

Re: [lldb-dev] [llvm-dev] RFC: Cleaning up the Itanium demangler

2017-06-22 Thread Scott Smith via lldb-dev
When I looked at demangler performance, I was able to make significant improvements to the llvm demangler. At that point removing lldb's fast demangler didn't hurt performance very much, but the fast demangler was still faster. I forget (and apparently didn't write down) how much it mattered, but