I worked on some performance improvements for lldb 3.9, and was about to
forward port them so I could submit them for inclusion, but I realized there
had been a major performance drop from 3.9 to 4.0. I am using the official
builds on an Ubuntu 16.04 machine with 16 cores / 32 hyperthreads.
Running
The algorithm included in ObjectFileELF.cpp performs a byte at a time
computation, which causes long pipeline stalls in modern processors.
Unfortunately, the polynomial used is not the same one used by the SSE 4.2
instruction set, but there are two ways to make it faster:
1. Work on multiple bytes
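The multi-byte idea can be sketched concretely. Below is a minimal, self-contained illustration of "slicing-by-4" tables using the reflected polynomial 0xEDB88320 with the JamCRC convention (initial value 0xFFFFFFFF, no final xor). This is an illustrative sketch, not the patch that went into ObjectFileELF.cpp:

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>

// Tables for slicing-by-4. table[0] is the classic byte-at-a-time table
// for the reflected CRC-32 polynomial 0xEDB88320 (JamCRC convention:
// init 0xFFFFFFFF, no final xor); table[1..3] extend it so four bytes
// can be folded in per iteration.
static uint32_t table[4][256];

void init_tables() {
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int k = 0; k < 8; ++k)
      c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
    table[0][i] = c;
  }
  for (uint32_t i = 0; i < 256; ++i)
    for (int s = 1; s < 4; ++s)
      table[s][i] = (table[s - 1][i] >> 8) ^ table[0][table[s - 1][i] & 0xFF];
}

// Baseline: one byte per iteration; each step depends on the previous
// CRC value, which is what stalls the pipeline.
uint32_t crc32_bytewise(const uint8_t *p, size_t n,
                        uint32_t crc = 0xFFFFFFFFu) {
  while (n--)
    crc = (crc >> 8) ^ table[0][(crc ^ *p++) & 0xFF];
  return crc;
}

// Slicing-by-4: fold four bytes at once; the four table lookups are
// independent of each other, so the CPU can overlap them.
uint32_t crc32_slice4(const uint8_t *p, size_t n,
                      uint32_t crc = 0xFFFFFFFFu) {
  while (n >= 4) {
    crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    crc = table[3][crc & 0xFF] ^ table[2][(crc >> 8) & 0xFF] ^
          table[1][(crc >> 16) & 0xFF] ^ table[0][crc >> 24];
    p += 4;
    n -= 4;
  }
  return crc32_bytewise(p, n, crc);  // handle the tail bytes
}
```

Slicing-by-8 or -16 extends the same trick; the point is breaking the one-byte-per-step dependency chain, not the specific table width.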
> On Wed, Apr 12, 2017 at 12:15 PM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> The algorithm included in ObjectFileELF.cpp performs a byte at a time
>> computation, which causes long pipeline stalls in modern processors.
>> Unfortunately, the polynomial
for regressions to sneak in without anyone noticing.
> So the original idea was hey, we can have something that counts packets for
> distinct operations. Like, this "next" command should take no more than 40
> packets, that kind of thing. And it could be expanded -- "b m
it's available?
>>>
>>> On Wed, Apr 12, 2017 at 12:23 PM, Zachary Turner
>>> wrote:
>>>
>>>> Zlib is definitely optional and we cannot make it required.
>>>>
>>>> Did you check to see if llvm has a crc32 function somewhere in S
Ok I stripped out the zlib crc algorithm and just left the parallelism +
calls to zlib's crc32_combine, but only if we are actually linking with
zlib. I left those calls here (rather than folding them into JamCRC)
because I'm taking advantage of TaskRunner to parallelize the work.
I moved the sys
The POSIX dynamic loader processes one module at a time. If you have a lot
of shared libraries, each with a lot of symbols, this creates unneeded
serialization (despite the use of TaskRunners during symbol loading, there
is still quite a bit of serialization when loading a library).
In order to p
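The shape of the fix is to parse each module in parallel and serialize only the cheap publish step. A hypothetical sketch follows; the Module struct and parse callback are invented for illustration, whereas lldb's real code goes through its TaskPool and the global module cache:

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch: parse each module's symbols in parallel, but
// serialize insertion into the shared list, mirroring the
// "parallel parse, serial publish" split the loader needs.
struct Module {
  std::string name;
  size_t nsymbols;
};

std::vector<Module> load_modules_parallel(
    const std::vector<std::string> &paths,
    const std::function<Module(const std::string &)> &parse) {
  std::vector<Module> cache;
  std::mutex cache_mutex;
  unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
  std::vector<std::thread> workers;
  std::atomic<size_t> next{0};
  for (unsigned t = 0; t < nthreads; ++t)
    workers.emplace_back([&] {
      for (size_t i; (i = next++) < paths.size();) {
        Module m = parse(paths[i]);        // expensive work, done in parallel
        std::lock_guard<std::mutex> lk(cache_mutex);
        cache.push_back(std::move(m));     // cheap publish, serialized
      }
    });
  for (auto &w : workers)
    w.join();
  return cache;
}
```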
We are trying to do a lot of things very lazily (which
> unfortunately makes efficient parallelization more complicated).
>
>
>
> On 13 April 2017 at 06:34, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> The POSIX dynamic loader processes one module at a
to llvm as well if it helps.
>>
>> Not trying to throw extra work on you, but it seems like a really good
>> general purpose improvement and it would be a shame if only lldb could
>> benefit from it.
>> On Wed, Apr 12, 2017 at 8:35 PM Scott Smith via lldb-dev <
>>
>>> lldb-dev@lists.llvm.org> wrote:
>>>
>>>> I know this is outside of your initial goal, but it would be really
>>>> great if JamCRC were updated in llvm to be parallel. I see that you're making
>>>> use of TaskRunner for the parallel
I'm trying to make sure some of my changes don't break lldb tests, but I'm
having trouble getting a clean run even with a plain checkout. I've tried
the latest head of master, as well as release_40. I'm running Ubuntu
16.04/amd64. I built with:
cmake ../llvm -G Ninja -DCMAKE_BUILD_TYPE=Debug
ni
s down ASAP.
>
>
> On 18 April 2017 at 21:24, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> I'm trying to make sure some of my changes don't break lldb tests, but
>> I'm having trouble getting a clean run even with a plain che
Labath wrote:
>>
>>> It looks like we are triggering an assert in llvm on a debug build. I'll
>>> try to track this down ASAP.
>>>
>>>
>>> On 18 April 2017 at 21:24, Scott Smith via lldb-dev <
>>> lldb-dev@lists.llvm.org> wro
tee that.
I assume the change was made to allow proper memory cleanup when the
symbols are discarded?
On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath wrote:
> Bisecting the performance regression would be extremely valuable. If you
> want to do that, it would be very appreciated.
>
>
>> I assume the change was made to allow proper memory cleanup when the
>> symbols are discarded?
>>
>> On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath wrote:
>>
>>> Bisecting the performance regression would be extremely valuable. If you
>>> want to do that, it would be very appreciated.
g the
>>>>> pointer. Now it needs to use an actual string comparison routine. This
>>>>> code:
>>>>>
>>>>> bool operator<(const Entry &rhs) const { return cstring <
>>>>> rhs.cstring; }
>>>>>
>
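To make the quoted comparator concrete: comparing cstring pointers is only a consistent ordering while every string is uniqued to a single address (as with lldb's ConstString pool); once that guarantee is gone, an actual string comparison is needed. A minimal sketch:

```cpp
#include <cassert>
#include <cstring>

// The quoted comparator ordered entries by raw pointer value, which
// only agrees across runs when each distinct string has exactly one
// uniqued address. With arbitrary string storage, use a real
// lexicographic comparison instead:
struct Entry {
  const char *cstring;
  bool operator<(const Entry &rhs) const {
    // strcmp compares contents, not addresses.
    return std::strcmp(cstring, rhs.cstring) < 0;
  }
};
```

The cost the thread mentions is real: a pointer compare is one instruction, while strcmp walks both strings, which matters in hot symbol-table sorts.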
On Thu, Apr 20, 2017 at 6:47 AM, Pavel Labath wrote:
> 5. specifying gcc-4.8 instead of the locally compiled clang
>
> has most of the tests passing, with a handful of unexpected successes:
>>
>> UNEXPECTED SUCCESS: TestRegisterVariables.RegisterVariableTestCase.test_and_run_command_dwarf
>>
Sorry, I take that back. I forgot to save the buffer that ran the test
script. Oops :-(
I get a number of errors that make me think it's missing libc++, which
makes sense because I never installed it. However, I thought clang
automatically falls back to using gcc's libstdc++.
Failures include:
After dealing with a bunch of microoptimizations, I'm back to
parallelizing loading of shared modules. My naive approach was to just
create a new thread per shared library. I have a feeling some users may
not like that; I think I read an email from someone who has thousands of
shared libraries.
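One way to avoid a thread per library is to gate thread creation on a counting semaphore, so even a process with thousands of shared libraries keeps at most a fixed number of loads in flight. A sketch (C++11 has no std::counting_semaphore, so this rolls one from a mutex and condition variable; the class name is invented):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: cap concurrent library loads. A loader thread
// calls acquire() before doing work and release() when done; at most
// `limit` loads run at once, regardless of how many libraries exist.
class LoadLimiter {
  std::mutex m;
  std::condition_variable cv;
  unsigned available;

public:
  explicit LoadLimiter(unsigned limit) : available(limit) {}

  void acquire() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return available > 0; });
    --available;
  }

  void release() {
    {
      std::lock_guard<std::mutex> lk(m);
      ++available;
    }
    cv.notify_one();
  }

  unsigned slots() {  // free slots remaining (for inspection)
    std::lock_guard<std::mutex> lk(m);
    return available;
  }
};
```

A fixed-size pool with a work queue achieves the same cap without the thread-creation cost, which is roughly what TaskPool provides.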
rially?
> Is it feasible to just require tasks to be non blocking?
> On Wed, Apr 26, 2017 at 4:12 PM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> After dealing with a bunch of microoptimizations, I'm back to
>> parallelizing loading of share
reading in shared libraries simultaneously, and adding them to the global
> cache. In some of the uses that lldb has under Xcode this is actually very
> common. So the task pool will have to be built up as things are added to
> the global shared module cache, not at the level of individual
her concern is that lldb keeps the modules it reads in a global
> cache, shared by all debuggers & targets. It is very possible that you
> could have two targets or two debuggers each with one target that are
> reading in shared libraries simultaneously, and adding them to the global
>
steady state would be 2 * cores, rather than height * cores. I
think that is probably overkill though.
On Fri, Apr 28, 2017 at 4:37 AM, Pavel Labath wrote:
> On 27 April 2017 at 00:12, Scott Smith via lldb-dev
> wrote:
> > After dealing with a bunch of microoptimizations, I'm back
Pool to
> make it suitable?
>
> On Fri, Apr 28, 2017 at 8:04 AM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> Hmmm ok, I don't like hard coding pools. Your idea about limiting the
>> number of high level threads gave me an idea:
>>
>
On Mon, May 1, 2017 at 2:42 PM, Pavel Labath wrote:
> Besides, hardcoding the nesting logic into "add" is kinda wrong.
> Adding a task is not the problematic operation, waiting for the result
> of one is. Granted, generally these happen on the same thread, but
> they don't have to be -- you can w
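A small example of why waiting, not adding, is the dangerous half: a task that blocks on a subtask queued into the same fixed-size pool can deadlock. The sketch below uses std::async as a stand-in; with std::launch::async it spawns a fresh thread and therefore does NOT deadlock, but the comments mark where a size-1 pool running the same shape of code would:

```cpp
#include <cassert>
#include <future>

// Pretend subtask() and task() are both jobs submitted to a pool.
int subtask() { return 21; }

int task() {
  // In a pool of size 1, subtask would be queued behind task itself...
  auto sub = std::async(std::launch::async, subtask);
  // ...so this wait would never be satisfied: task occupies the only
  // worker while waiting for a job that can't start. std::async avoids
  // this only because it uses a new thread per call.
  return sub.get() * 2;
}
```

Pools that stay safe under this pattern either let the waiting thread execute queued work itself, or spawn an extra worker while one is blocked.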
another one. If
> there are improvements to be made, let's make them there instead of in LLDB
> so that other LLVM users can benefit.
>
> On Mon, May 1, 2017 at 2:58 PM Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
>
>> On Mon, May 1, 2017 at 2:42 PM,
I've been trying to improve the parallelism of lldb but have run into an
odd roadblock. I have the code at the point where it creates 40 worker
threads, and it stays that way because it has enough work to do. However,
running 'top -d 1' shows that for the time in question, cpu load never gets
abo
LLDB has TaskRunner and TaskPool. TaskPool is nearly the same as
llvm::ThreadPool. TaskRunner itself is a layer on top, though, and doesn't
seem to have an analogy in llvm. Not that I'm defending TaskRunner
I have written a new one called TaskMap. The idea is that if all you want
is to cal
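Reading between the lines of the truncated description, a TaskMap presumably runs one function over an index range instead of queueing one heap-allocated closure per work item. A guess at the shape of it, not the actual implementation:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical TaskMap-style helper: call fn(i) for every i in
// [0, count), using one worker per core and a shared atomic counter to
// hand out indices. No per-item task objects are allocated.
void task_map(size_t count, const std::function<void(size_t)> &fn) {
  unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
  std::atomic<size_t> next{0};
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; ++t)
    workers.emplace_back([&] {
      for (size_t i; (i = next.fetch_add(1)) < count;)
        fn(i);
    });
  for (auto &w : workers)
    w.join();
}
```

Compared with pushing count closures into a TaskPool, this does one atomic increment per item and creates threads once, which matters when count is large and each item is small.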
should just be once per loaded
> module.
>
> Jim
>
> > On May 2, 2017, at 8:09 AM, Scott Smith via lldb-dev <
> lldb-dev@lists.llvm.org> wrote:
> >
> > I've been trying to improve the parallelism of lldb but have run into an
> odd roadblock. I have
On Tue, May 2, 2017 at 12:43 PM, Greg Clayton wrote:
> The other thing would be to try and move the demangler to use a custom
> allocator everywhere. Not sure what demangler you are using when you are
> doing these tests, but we can either use the native system one from
> the #include , or the fa
I would like to change the list of threads that lldb presents to the user
for an internal application (not to be submitted upstream). It seems the
right way to do this is to write an OperatingSystem plugin.
1. Can I still make it so the user can see real threads as well as whatever
other "threads
Before I dive into the code to see if there's a bug, I wanted to see if I
was just doing it wrong.
I have an application with a different libc, etc than the machine I'm
running the debugger on. The application also has a bunch of libraries
that simply don't exist in the normal location on my dev
When I looked at demangler performance, I was able to make significant
improvements to the llvm demangler. At that point removing lldb's fast
demangler didn't hurt performance very much, but the fast demangler was
still faster. I forget (and apparently didn't write down) how much it
mattered, but