Re: [lldb-dev] LLDB Evolution - Final Form

Greg Clayton via lldb-dev Tue, 20 Sep 2016 10:27:08 -0700

But StringRef is a smart string wrapper and it is there for exactly this 
reason: to make string handling correct. So let us let it be smart and not 
crash if you make it with null and call .size() on it...
> On Sep 19, 2016, at 2:37 PM, Zachary Turner <ztur...@google.com> wrote:
> 
> FWIW, strlen(nullptr) will also crash just as easily as StringRef(nullptr) 
> will crash, so that one isn't a particularly convincing example of poorly 
> used asserts, since the onus on the developer is exactly the same as with 
> strlen.  That said, I still know where you're coming from :)
> 
> On Mon, Sep 19, 2016 at 2:34 PM Greg Clayton <gclay...@apple.com> wrote:
> 
> > On Sep 19, 2016, at 2:20 PM, Mehdi Amini <mehdi.am...@apple.com> wrote:
> >
> >>
> >> On Sep 19, 2016, at 2:02 PM, Greg Clayton via lldb-dev 
> >> <lldb-dev@lists.llvm.org> wrote:
> >>
> >>
> >>> On Sep 19, 2016, at 1:18 PM, Zachary Turner via lldb-dev 
> >>> <lldb-dev@lists.llvm.org> wrote:
> >>>
> >>> Following up with Kate's post from a few weeks ago, I think the dust has 
> >>> settled on the code reformat and it went over pretty smoothly for the 
> >>> most part.  So I thought it might be worth throwing out some ideas for 
> >>> where we go from here.  I have a large list of ideas (more ideas than 
> >>> time, sadly) that I've been collecting over the past few weeks, so I 
> >>> figured I would throw them out in the open for discussion.
> >>>
> >>> I’ve grouped the areas for improvement into 3 high level categories.
> >>>
> >>>     • De-inventing the wheel - We should use more code from LLVM, and 
> >>> delete code in LLDB where LLVM provides a solution.  In cases where there 
> >>> is an LLVM thing that is *similar* to what we need, we should extend the 
> >>> LLVM thing to support what we need, and then use it.  Following are some 
> >>> areas I've identified.  This list is by no means complete.  For each one, 
> >>> I've given a personal assessment of how likely it is to cause some 
> >>> (temporary) hiccups, how much it would help us in the long run, and how 
> >>> difficult it would be to do.  Without further ado:
> >>>             • Use llvm::Regex instead of lldb::Regex
> >>>                     • llvm::Regex doesn’t support enhanced mode.  Could 
> >>> we add support for this to llvm::Regex?
> >>>                     • Risk: 6
> >>>                     • Impact: 3
> >>>                     • Difficulty / Effort: 3  (5 if we have to add 
> >>> enhanced mode support)
> >>
> >> As long as it supports all of the features, then this is fine. Otherwise 
> >> we will need to modify the regular expressions to work with the LLVM 
> >> version or back port any advances regex features we need into the LLVM 
> >> regex parser.
> >>
> >>>             • Use llvm streams instead of lldb::StreamString
> >>>                     • Supports output re-targeting (stderr, stdout, 
> >>> std::string, etc), printf style formatting, and type-safe streaming 
> >>> operators.
> >>>                     • Interoperates nicely with many existing llvm 
> >>> utility classes
> >>>                     • Risk: 4
> >>>                     • Impact: 5
> >>>                     • Difficulty / Effort: 7
> >>
> >> I have worked extensively with both C++ streams and with printf. I don't 
> >> find the type safe streaming operators to be readable and a great way to 
> >> place things into streams. Part of my objection to this will be quelled if 
> >> the LLVM streams are not stateful like C++ streams are (add an extra 
> >> "std::hex" into the stream and suddenly other things down stream start 
> >> coming out in hex). As long as printf functionality is in the LLVM streams 
> >> I am ok with it.
> >>
> >>>             • Use llvm::Error instead of lldb::Error
> >>>                     • llvm::Error is an error class that *requires* you 
> >>> to check whether it succeeded or it will assert.  In a way, it's similar 
> >>> to a C++ exception, except that it doesn't come with the performance hit 
> >>> associated with exceptions.  It's extensible, and can be easily extended 
> >>> to support the various ways LLDB needs to construct errors and error 
> >>> messages.
> >>>                     • Would need to first rename lldb::Error to LLDBError 
> >>> so that te conversion from LLDBError to llvm::Error could be done 
> >>> incrementally.
> >>>                     • Risk: 7
> >>>                     • Impact: 7
> >>>                     • Difficulty / Effort: 8
> >>
> >> We must be very careful here. Nothing can crash LLDB and adding something 
> >> that will be known to crash in some cases can only be changed if there is 
> >> a test that can guarantee that it won't crash. assert() calls in a shared 
> >> library like LLDB are not OK and shouldn't fire. Even if they do, when 
> >> asserts are off, then it shouldn't crash LLDB. We have made efforts to 
> >> only use asserts in LLDB for things that really can't happen, but for 
> >> things like constructing a StringRef with NULL is one example of why I 
> >> don't want to use certain classes in LLVM. If we can remove the crash 
> >> threat, then lets use them. But many people firmly believe that asserts 
> >> belong in llvm and clang code and I don't agree.
> >
> > I’m surprise by your aversion to assertions, what is your suggested 
> > alternative? Are you expecting to check and handle every possible 
> > invariants everywhere and recover (or signal an error) properly? That does 
> > not seem practical, and it seems to me that it'd lead to just involving UB 
> > (or breaking design invariant) without actually noticing it.
> 
> We have to as a debugger. We get input from a variety of different compilers 
> and we parse and use things there were produced by our toolchains and others. 
> Crashing Xcode because someone accidentally created a llvm::StringRef(NULL) 
> should not happen. It is not OK for library code to crash. This is quite OK 
> for compilers, linkers and other tools, but it isn't for any other code. 
> Since there are so many asserts, LLDB code is currently responsible for 
> figuring out what will make clang unhappy and we must be sure to not feed it 
> anything that makes it unhappy or we crash. So I don't see that as better.
> 
> >
> >> Also many asserts are in header files so even if you build llvm and clang 
> >> with asserts off, but then build our project that uses LLVM with asserts 
> >> on, you get LLVM and clang asserts when you don't want them.
> >
> > It is not really supported to use the LLVM C++ headers without assertions 
> > and link to a LLVM build with assertions (and vice-versa).
> 
> As we have found out. We claim llvm and clang can be linked into tools a 
> libraries, but it really means, you should really use it out of process 
> because it might crash at any time so don't try and use it in a program you 
> actually want to be able run for a long time.
> 
> >
> >
> >>
> >>>             • StringRef instead of const char *, len everywhere
> >>>                     • Can do most common string operations in a way that 
> >>> is guaranteed to be safe.
> >>>                     • Reduces string manipulation algorithm complexity by 
> >>> an order of magnitude.
> >>>                     • Can potentially eliminate tens of thousands of 
> >>> string copies across the codebase.
> >>>                     • Simplifies code.
> >>>                     • Risk: 3
> >>>                     • Impact: 8
> >>>                     • Difficulty / Effort: 7
> >>
> >> I don't think StringRef needs to be used everywhere, but it many places it 
> >> does make sense. For example our public API should not contain any LLVM 
> >> classes in the API. "const char *" is a fine parameter for public 
> >> functions. I really hate the fact that constructing llvm::StringRef with 
> >> NULL causes an assertion. Many people don't know that and don't take that 
> >> into account when making their changes. I would love to see the assertion 
> >> taken out to tell you the truth. That would make me feel better about 
> >> using StringRef in many more places within LLDB as we shouldn't ever crash 
> >> due to this. I would expect most internal APIs are safe to move to 
> >> llvm::StringRef, but we will need to make sure none of these require a 
> >> null terminated string which would cause us to have to call 
> >> "str.str().c_str()". So internally, yes we can cut over to more use of 
> >> StringRef. But the public API can't be changed.
> >>
> >>
> >>>             • ArrayRef instead of const void *, len everywhere
> >>>                     • Same analysis as StringRef
> >>
> >> Internally yes, external APIs no.
> >>>             • MutableArrayRef instead of void *, len everywhere
> >>>                     • Same analysis as StringRef
> >> Internally yes, external APIs no.
> >>>             • Delete ConstString, use a modified StringPool that is 
> >>> thread-safe.
> >>>                     • StringPool is a non thread-safe version of 
> >>> ConstString.
> >>>                     • Strings are internally refcounted so they can be 
> >>> cleaned up when they are no longer used.  ConstStrings are a large source 
> >>> of memory in LLDB, so ref-counting and removing stale strings has the 
> >>> potential to be a huge savings.
> >>>                     • Risk: 2
> >>>                     • Impact: 9
> >>>                     • Difficulty / Effort: 6
> >>
> >> We can't currently rely on ref counting. We currently hand out "const char 
> >> *" as return values from many public API calls and these rely on the 
> >> strings living forever. We many many existing clients and we can't change 
> >> the public API. So I vote to avoid this. We are already using LLVM string 
> >> pools under the covers and we also associate the mangled/demangled names 
> >> in the map as is saves us a ton of cycles when parsing stuff as we never 
> >> demangle (very expensive) the same name twice. So I don't see the need to 
> >> change this as it is already backed by LLVM technology under the covers.
> >>
> >>>             • thread_local instead of lldb::ThreadLocal
> >>>                     • This fixes a number of bugs on Windows that cannot 
> >>> be fixed otherwise, as they require compiler support.
> >>>                     • Some other compilers may not support this yet?
> >>>                     • Risk: 2
> >>>                     • Impact: 3
> >>>                     • Difficulty: 3
> >>
> >> As long as all compilers support this then this is fine.
> >>
> >>>             • Use llvm::cl for the command line arguments to the primary 
> >>> lldb executable.
> >>>                     • Risk: 2
> >>>                     • Impact: 3
> >>>                     • Difficulty / Effort: 4
> >>
> >> Easy and fine to switch over to. We might need to port some getopt_long 
> >> functionality into it if it doesn't handle all of the things that 
> >> getopt_long does. For example arguments and options can be interspersed in 
> >> getopt_long(). You can also terminate your arguments with "--". It accepts 
> >> single dashes for long option names if they don't conflict with short 
> >> options. Many things like this.
> >>
> >>>     • Testing - Our testing infrastructure is unstable, and our test 
> >>> coverage is lacking.  We should take steps to improve this.
> >>>             • Port as much as possible to lit
> >>>                     • Simple tests should be trivial to port to lit 
> >>> today.  If nothing else this serves as a proof of concept while 
> >>> increasing the speed and stability of the test suite, since lit is a more 
> >>> stable harness.
> >>
> >> As long a tests use the public API to run test. We are not doing text 
> >> scraping.
> >>
> >>>             • Separate testing tools
> >>>                     • One question that remains open is how to represent 
> >>> the complicated needs of a debugger in lit tests.  Part a) above covers 
> >>> the trivial cases, but what about the difficult cases?  In 
> >>> https://reviews.llvm.org/D24591 a number of ideas were discussed.  We 
> >>> started getting to this idea towards the end, about a separate tool which 
> >>> has an interface independent of the command line interface and which can 
> >>> be used to test.  lldb-mi was mentioned.  While I have serious concerns 
> >>> about lldb-mi due to its poorly written and tested codebase, I do agree 
> >>> in principle with the methodology.  In fact, this is the entire 
> >>> philosophy behind lit as used with LLVM, clang, lld, etc.
> >>
> >> We have a public API... Why are we going to go and spend _any_ time on 
> >> doing anything else? I just don't get it. What a waste of time. We have a 
> >> public API. Lets use it. Not simple enough for people? Tough. Testing a 
> >> debugger should be done using the public API except when we are actually 
> >> trying to test the LLDB command line in pexpect like tests. Those and only 
> >> those are fine to covert over to using LIT and use text scraping. But 95% 
> >> of our testing should be done using the API that our IDEs use. Using MI? 
> >> Please no.
> >>>
> >>> I don’t take full credit for this idea.  I had been toying with a similar 
> >>> idea for some time, but it was further cemented in an offline discussion 
> >>> with a co-worker.
> >>>
> >>> There many small, targeted tools in LLVM (e.g. llc, lli, llvm-objdump, 
> >>> etc) whose purpose are to be chained together to do interesting things.  
> >>> Instead of a command line api as we think of in LLDB where you type 
> >>> commands from an interactive prompt, they have a command line api as you 
> >>> would expect from any tool which is launched from a shell.
> >>>
> >>> I can imagine many potential candidates for lldb tools of this nature.  
> >>> Off the top of my head:
> >>>     • lldb-unwind - A tool for testing the unwinder.  Accepts byte code 
> >>> as input and passes it through to the unwinder, outputting a compressed 
> >>> summary of the steps taken while unwinding, which could be pattern 
> >>> matched in lit.  The output format is entirely controlled by the tool, 
> >>> and not by the unwinder itself, so it would be stable in the face of 
> >>> changes to the underlying unwinder.  Could have various options to enable 
> >>> or disable features of the unwinder in order to force the unwinder into 
> >>> modes that can be tricky to encounter in the wild.
> >>>     • lldb-symbol - A tool for testing symbol resolution.  Could have 
> >>> options for testing things like:
> >>>             • Determining if a symbol matches an executable
> >>>             • looking up a symbol by name in the debug info, and mapping 
> >>> it to an address in the process.
> >>>             • Displaying candidate symbols when doing name lookup in a 
> >>> particular scope (e.g. while stopped at a breakpoint).
> >>>     • lldb-breakpoint - A tool for testing breakpoints and stepping.  
> >>> Various options could include:
> >>>             • Set breakpoints and out addresses and/or symbol names where 
> >>> they were resolved to.
> >>>             • Trigger commands, so that when a breakpoint is hit the tool 
> >>> could automatically continue and try to run to another breakpoint, etc.
> >>>             • options to inspect certain useful pieces of state about an 
> >>> inferior, to be matched in lit.
> >>>     • lldb-interpreter - tests the jitter etc.  I don’t know much about 
> >>> this, but I don’t see why this couldn’t be tested in a manner similar to 
> >>> how lli is tested.
> >>>     • lldb-platform - tests lldb local and remote platform interfaces.
> >>>     • lldb-cli -- lldb interactive command line.
> >>>     • lldb-format - lldb data formatters etc.
> >>
> >>
> >> I agree that testing more functionality it a good idea, but if it is worth 
> >> testing we should see if we can get it into our public API. If so, then we 
> >> test it there. If not, then we come up with another solution. Even in the 
> >> alternate solution, it will be something that will probably create 
> >> structured data (JSON, Yaml, etc) and then parsed, so I would rather spend 
> >> the time getting these things into out public APIs, or test them vai our 
> >> public APIs.
> >>
> >>>     • Tests NOW, not later.
> >>>             • I know we’ve been over this a million times and it’s not 
> >>> worth going over the arguments again.  And I know it’s hard to write 
> >>> tests, often requiring the invention of new SB APIs.  Hopefully those 
> >>> issues will be addressed by above a) and b) above and writing tests will 
> >>> be easier.  Vedant Kumar ran some analytics on the various codebases and 
> >>> found that LLDB has the lowest test / commit ratio of any LLVM project 
> >>> (He didn’t post numbers for lld, so I’m not sure what it is there).
> >>>                     • lldb: 287 of the past 1000 commits
> >>>                     • llvm: 511 of the past 1000 commits
> >>>                     • clang: 622 of the past 1000 commits
> >>>                     • compiler-rt: 543 of the past 1000 commits
> >>> This is an alarming statistic, and I would love to see this number closer 
> >>> to 50%.
> >>>     • Code style / development conventions - Aside from just the column 
> >>> limitations and bracing styles, there are other areas where LLDB differs 
> >>> from LLVM on code style.  We should continue to adopt more of LLVM's 
> >>> style where it makes sense.  I've identified a couple of areas 
> >>> (incomplete list) which I outline below.
> >>>             • Clean up the mess of cyclical dependencies and properly 
> >>> layer the libraries.  This is especially important for things like 
> >>> lldb-server that need to link in as little as possible, but regardless it 
> >>> leads to a more robust architecture, faster build and link times, better 
> >>> testability, and is required if we ever want to do a modules build of LLDB
> >>>             • Use CMake instead of Xcode project (CMake supports 
> >>> Frameworks).  CMake supports Apple Frameworks, so the main roadblock to 
> >>> getting this working is just someone doing it.  Segmenting the build 
> >>> process by platform doesn't make sense for the upstream, especially when 
> >>> there is a perfectly workable solution.  I have no doubt that the 
> >>> resulting Xcode workspace generated automatically by CMake will not be as 
> >>> "nice" as one that is maintained by hand.  We face this problem with 
> >>> Visual Studio on Windows as well.  The solution that most people have 
> >>> adopted is to continue using the IDE for code editing and debugging, but 
> >>> for actually running the build, use CMake with Ninja.  A similar workflow 
> >>> should still be possible with an OSX CMake build, but as I do not work 
> >>> every day on a Mac, all I can say is that it's possible, I have no idea 
> >>> how impactful it would be on peoples' workflows.
> >>>             • Variable naming conventions
> >>>                     • I don’t expect anyone is too fond of LLDB’s naming 
> >>> conventions, but if we’re committed to joining the LLVM ecosystem, then 
> >>> let’s go all the way.
> >>
> >> Hopefully we can get LLVM and Clang to adopt naming member variables with 
> >> a prefix... If so, that is my main remaining concern.
> >>
> >>>             • Use more modern C++ and less C
> >>>                     • Old habits die hard, but this isn’t just a matter 
> >>> of style.  It leads to safer, more robust, and less fragile code as well.
> >>>             • Shorter functions and classes with more narrowly targeted 
> >>> responsibilities
> >>>                     • It’s not uncommon to find functions that are 
> >>> hundreds (and in a few cases even 1,000+) of lines long.  We really need 
> >>> to be better about breaking functions and classes down into smaller 
> >>> responsibilities.  This helps not just for someone coming in to read the 
> >>> function, but also for testing.  Smaller functions are easier to unit 
> >>> test.
> >>>             • Convert T foo(X, Y, Error &error) functions to Expected<T> 
> >>> foo(X, Y) style (Depends on 1.c)
> >>>                     • llvm::Expected is based on the llvm::Error class 
> >>> described earlier.  It’s used when a function is supposed to return a 
> >>> value, but it could fail.  By packaging the error with the return value, 
> >>> it’s impossible to have a situation where you use the return value even 
> >>> in case of an error, and because llvm::Error has mandatory checking, it’s 
> >>> also impossible to have a sitaution where you don’t check the error.  So 
> >>> it’s very safe.
> >>
> >> As crashes if you don't check the errors. One of the big differences 
> >> between LLDB and LLVM/Clang is that asserts are used liberally all over 
> >> clang which make the code very crashy when used in production with 
> >> uncontrolled input like we get in the debugger. We will need to be very 
> >> careful with any changes to make sure we don't increase crash frequency.
> >>
> >>>
> >>> Whew.  That was a lot.  If you made it this far, thanks for reading!
> >>>
> >>> Obviously if we were to embark on all of the above, it would take many 
> >>> months to complete everything.  So I'm not proposing anyone stop what 
> >>> they're doing to work on this.  This is just my own personal wishlist
> >>
> >> There are many great things in here. As long as we are careful to not 
> >> increase the crash frequency in LLDB I am all for it. My mains areas of 
> >> concern are:
> >> - public API can't change in its current form
> >> - no LLVM or clang classes in the public API as arguments or return values.
> >> - don't crash more by introducing assert heavy code into code paths that 
> >> use input from outside sources
> >
> > It seems to me that you are mixing two things: asserting on user inputs vs 
> > asserting on internal invariants of the system. LLVM is very intensive 
> > about asserting on the second category and this seems fine to me.
> > Asserting on external inputs is not great (in LLVM as much as in LLDB).
> >
> > The asserting error class above falls into the second category and is a 
> > great tool to enforce programmer error
> >
> > —
> > Mehdi
>


_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Re: [lldb-dev] LLDB Evolution - Final Form

Reply via email to