> On Sep 20, 2016, at 10:33 AM, Greg Clayton <gclay...@apple.com> wrote: > >> >> On Sep 19, 2016, at 2:46 PM, Mehdi Amini <mehdi.am...@apple.com> wrote: >> >>> >>> On Sep 19, 2016, at 2:34 PM, Greg Clayton <gclay...@apple.com> wrote: >>> >>>> >>>> On Sep 19, 2016, at 2:20 PM, Mehdi Amini <mehdi.am...@apple.com> wrote: >>>> >>>>> >>>>> On Sep 19, 2016, at 2:02 PM, Greg Clayton via lldb-dev >>>>> <lldb-dev@lists.llvm.org> wrote: >>>>> >>>>> >>>>>> On Sep 19, 2016, at 1:18 PM, Zachary Turner via lldb-dev >>>>>> <lldb-dev@lists.llvm.org> wrote: >>>>>> >>>>>> Following up with Kate's post from a few weeks ago, I think the dust has >>>>>> settled on the code reformat and it went over pretty smoothly for the >>>>>> most part. So I thought it might be worth throwing out some ideas for >>>>>> where we go from here. I have a large list of ideas (more ideas than >>>>>> time, sadly) that I've been collecting over the past few weeks, so I >>>>>> figured I would throw them out in the open for discussion. >>>>>> >>>>>> I’ve grouped the areas for improvement into 3 high level categories. >>>>>> >>>>>> • De-inventing the wheel - We should use more code from LLVM, and >>>>>> delete code in LLDB where LLVM provides a solution. In cases where >>>>>> there is an LLVM thing that is *similar* to what we need, we should >>>>>> extend the LLVM thing to support what we need, and then use it. >>>>>> Following are some areas I've identified. This list is by no means >>>>>> complete. For each one, I've given a personal assessment of how likely >>>>>> it is to cause some (temporary) hiccups, how much it would help us in >>>>>> the long run, and how difficult it would be to do. Without further ado: >>>>>> • Use llvm::Regex instead of lldb::Regex >>>>>> • llvm::Regex doesn’t support enhanced mode. Could we >>>>>> add support for this to llvm::Regex? >>>>>> • Risk: 6 >>>>>> • Impact: 3 >>>>>> • Difficulty / Effort: 3 (5 if we have to add enhanced >>>>>> mode support) >>>>> >>>>> As long as it supports all of the features, then this is fine. Otherwise >>>>> we will need to modify the regular expressions to work with the LLVM >>>>> version or back port any advances regex features we need into the LLVM >>>>> regex parser. >>>>> >>>>>> • Use llvm streams instead of lldb::StreamString >>>>>> • Supports output re-targeting (stderr, stdout, >>>>>> std::string, etc), printf style formatting, and type-safe streaming >>>>>> operators. >>>>>> • Interoperates nicely with many existing llvm utility >>>>>> classes >>>>>> • Risk: 4 >>>>>> • Impact: 5 >>>>>> • Difficulty / Effort: 7 >>>>> >>>>> I have worked extensively with both C++ streams and with printf. I don't >>>>> find the type safe streaming operators to be readable and a great way to >>>>> place things into streams. Part of my objection to this will be quelled >>>>> if the LLVM streams are not stateful like C++ streams are (add an extra >>>>> "std::hex" into the stream and suddenly other things down stream start >>>>> coming out in hex). As long as printf functionality is in the LLVM >>>>> streams I am ok with it. >>>>> >>>>>> • Use llvm::Error instead of lldb::Error >>>>>> • llvm::Error is an error class that *requires* you to >>>>>> check whether it succeeded or it will assert. In a way, it's similar to >>>>>> a C++ exception, except that it doesn't come with the performance hit >>>>>> associated with exceptions. It's extensible, and can be easily extended >>>>>> to support the various ways LLDB needs to construct errors and error >>>>>> messages. >>>>>> • Would need to first rename lldb::Error to LLDBError >>>>>> so that te conversion from LLDBError to llvm::Error could be done >>>>>> incrementally. >>>>>> • Risk: 7 >>>>>> • Impact: 7 >>>>>> • Difficulty / Effort: 8 >>>>> >>>>> We must be very careful here. Nothing can crash LLDB and adding something >>>>> that will be known to crash in some cases can only be changed if there is >>>>> a test that can guarantee that it won't crash. assert() calls in a shared >>>>> library like LLDB are not OK and shouldn't fire. Even if they do, when >>>>> asserts are off, then it shouldn't crash LLDB. We have made efforts to >>>>> only use asserts in LLDB for things that really can't happen, but for >>>>> things like constructing a StringRef with NULL is one example of why I >>>>> don't want to use certain classes in LLVM. If we can remove the crash >>>>> threat, then lets use them. But many people firmly believe that asserts >>>>> belong in llvm and clang code and I don't agree. >>>> >>>> I’m surprise by your aversion to assertions, what is your suggested >>>> alternative? Are you expecting to check and handle every possible >>>> invariants everywhere and recover (or signal an error) properly? That does >>>> not seem practical, and it seems to me that it'd lead to just involving UB >>>> (or breaking design invariant) without actually noticing it. >>> >>> We have to as a debugger. We get input from a variety of different >>> compilers and we parse and use things there were produced by our toolchains >>> and others. Crashing Xcode because someone accidentally created a >>> llvm::StringRef(NULL) should not happen. It is not OK for library code to >>> crash. This is quite OK for compilers, linkers and other tools, but it >>> isn't for any other code. >> >> Well, except that no, it is not OK for the compiler either, we want to use >> it as a library (JIT, etc.). >> But it comes back to my point: you are against asserting for enforcing >> “external input” where there should be sanitization done with proper error >> handling, which does not say anything about the majority of assertions which >> are not dealing with user input. > > I am all for asserting in debug builds. Just not in release builds.
Right, we’re on the same page, and that’s what we do in LLVM. > And just because you assert, this doesn't mean it is ok to not handle what is > being asserted. There is code in clang like: > > void do_something(foo *p) > { > assert(p); > p->crash(); > } > > This should be: > > void do_something(foo *p) > { > assert(p); > if (p) > p->crash(); > } Well to be honest I wouldn’t use a pointer in the first place but a reference in such case. Otherwise, I’d be fine with such code inside a private API (inside a LLVM library for example), where you sanitize the input at the API boundary and assert on invariant inside the library. >> For instance, it is still not clear to me what’s wrong with the asserting >> Error class mentioned before. > > If someone adds code that succeeds almost all of the time, including in the > test suite, and then we release LLDB and someone uses bad debug info and the > error condition suddenly fires and they didn't check the error, it is not > acceptable to crash. This is not what it is about, the assertion fires when the error is *unchecked*, independently of success or failure, for example: ErrorOr<Foo> getFoo(); Foo bar() { auto F = getFoo(); return F.getValue(); } Here the developer does not check the returned error from the call to getFoo(). This is a case where bad-things would happen when there is an actual error, but it wouldn’t be caught during testing until the error actually happens. The LLVM Error class will *always* assert, even when getFoo() succeeds and there is no actual error. — Mehdi >> >> >>> Since there are so many asserts, LLDB code is currently responsible for >>> figuring out what will make clang unhappy and we must be sure to not feed >>> it anything that makes it unhappy or we crash. So I don't see that as >>> better. >>> >>>> >>>>> Also many asserts are in header files so even if you build llvm and clang >>>>> with asserts off, but then build our project that uses LLVM with asserts >>>>> on, you get LLVM and clang asserts when you don't want them. >>>> >>>> It is not really supported to use the LLVM C++ headers without assertions >>>> and link to a LLVM build with assertions (and vice-versa). >>> >>> As we have found out. We claim llvm and clang can be linked into tools a >>> libraries, but it really means, you should really use it out of process >>> because it might crash at any time so don't try and use it in a program you >>> actually want to be able run for a long time. >> >> I’m not sure how this sentence relates to the fact that the headers are not >> ABI insensitive to -DDEBUG. >> It seems totally orthogonal. > > It causes us to crash. There is nothing that currently enforces the fact that > you can build llvm/clang without asserts yet use the headers with asserts on. > It happens, and it is happening to us. If you are designing a library, then > asserts in header files are just a bad idea. Very easy to avoid and fix. > >> >> — >> Mehdi >> >> >>> >>>> >>>> >>>>> >>>>>> • StringRef instead of const char *, len everywhere >>>>>> • Can do most common string operations in a way that is >>>>>> guaranteed to be safe. >>>>>> • Reduces string manipulation algorithm complexity by >>>>>> an order of magnitude. >>>>>> • Can potentially eliminate tens of thousands of string >>>>>> copies across the codebase. >>>>>> • Simplifies code. >>>>>> • Risk: 3 >>>>>> • Impact: 8 >>>>>> • Difficulty / Effort: 7 >>>>> >>>>> I don't think StringRef needs to be used everywhere, but it many places >>>>> it does make sense. For example our public API should not contain any >>>>> LLVM classes in the API. "const char *" is a fine parameter for public >>>>> functions. I really hate the fact that constructing llvm::StringRef with >>>>> NULL causes an assertion. Many people don't know that and don't take that >>>>> into account when making their changes. I would love to see the assertion >>>>> taken out to tell you the truth. That would make me feel better about >>>>> using StringRef in many more places within LLDB as we shouldn't ever >>>>> crash due to this. I would expect most internal APIs are safe to move to >>>>> llvm::StringRef, but we will need to make sure none of these require a >>>>> null terminated string which would cause us to have to call >>>>> "str.str().c_str()". So internally, yes we can cut over to more use of >>>>> StringRef. But the public API can't be changed. >>>>> >>>>> >>>>>> • ArrayRef instead of const void *, len everywhere >>>>>> • Same analysis as StringRef >>>>> >>>>> Internally yes, external APIs no. >>>>>> • MutableArrayRef instead of void *, len everywhere >>>>>> • Same analysis as StringRef >>>>> Internally yes, external APIs no. >>>>>> • Delete ConstString, use a modified StringPool that is >>>>>> thread-safe. >>>>>> • StringPool is a non thread-safe version of >>>>>> ConstString. >>>>>> • Strings are internally refcounted so they can be >>>>>> cleaned up when they are no longer used. ConstStrings are a large >>>>>> source of memory in LLDB, so ref-counting and removing stale strings has >>>>>> the potential to be a huge savings. >>>>>> • Risk: 2 >>>>>> • Impact: 9 >>>>>> • Difficulty / Effort: 6 >>>>> >>>>> We can't currently rely on ref counting. We currently hand out "const >>>>> char *" as return values from many public API calls and these rely on the >>>>> strings living forever. We many many existing clients and we can't change >>>>> the public API. So I vote to avoid this. We are already using LLVM string >>>>> pools under the covers and we also associate the mangled/demangled names >>>>> in the map as is saves us a ton of cycles when parsing stuff as we never >>>>> demangle (very expensive) the same name twice. So I don't see the need to >>>>> change this as it is already backed by LLVM technology under the covers. >>>>> >>>>>> • thread_local instead of lldb::ThreadLocal >>>>>> • This fixes a number of bugs on Windows that cannot be >>>>>> fixed otherwise, as they require compiler support. >>>>>> • Some other compilers may not support this yet? >>>>>> • Risk: 2 >>>>>> • Impact: 3 >>>>>> • Difficulty: 3 >>>>> >>>>> As long as all compilers support this then this is fine. >>>>> >>>>>> • Use llvm::cl for the command line arguments to the primary >>>>>> lldb executable. >>>>>> • Risk: 2 >>>>>> • Impact: 3 >>>>>> • Difficulty / Effort: 4 >>>>> >>>>> Easy and fine to switch over to. We might need to port some getopt_long >>>>> functionality into it if it doesn't handle all of the things that >>>>> getopt_long does. For example arguments and options can be interspersed >>>>> in getopt_long(). You can also terminate your arguments with "--". It >>>>> accepts single dashes for long option names if they don't conflict with >>>>> short options. Many things like this. >>>>> >>>>>> • Testing - Our testing infrastructure is unstable, and our test >>>>>> coverage is lacking. We should take steps to improve this. >>>>>> • Port as much as possible to lit >>>>>> • Simple tests should be trivial to port to lit today. >>>>>> If nothing else this serves as a proof of concept while increasing the >>>>>> speed and stability of the test suite, since lit is a more stable >>>>>> harness. >>>>> >>>>> As long a tests use the public API to run test. We are not doing text >>>>> scraping. >>>>> >>>>>> • Separate testing tools >>>>>> • One question that remains open is how to represent >>>>>> the complicated needs of a debugger in lit tests. Part a) above covers >>>>>> the trivial cases, but what about the difficult cases? In >>>>>> https://reviews.llvm.org/D24591 a number of ideas were discussed. We >>>>>> started getting to this idea towards the end, about a separate tool >>>>>> which has an interface independent of the command line interface and >>>>>> which can be used to test. lldb-mi was mentioned. While I have serious >>>>>> concerns about lldb-mi due to its poorly written and tested codebase, I >>>>>> do agree in principle with the methodology. In fact, this is the entire >>>>>> philosophy behind lit as used with LLVM, clang, lld, etc. >>>>> >>>>> We have a public API... Why are we going to go and spend _any_ time on >>>>> doing anything else? I just don't get it. What a waste of time. We have a >>>>> public API. Lets use it. Not simple enough for people? Tough. Testing a >>>>> debugger should be done using the public API except when we are actually >>>>> trying to test the LLDB command line in pexpect like tests. Those and >>>>> only those are fine to covert over to using LIT and use text scraping. >>>>> But 95% of our testing should be done using the API that our IDEs use. >>>>> Using MI? Please no. >>>>>> >>>>>> I don’t take full credit for this idea. I had been toying with a >>>>>> similar idea for some time, but it was further cemented in an offline >>>>>> discussion with a co-worker. >>>>>> >>>>>> There many small, targeted tools in LLVM (e.g. llc, lli, llvm-objdump, >>>>>> etc) whose purpose are to be chained together to do interesting things. >>>>>> Instead of a command line api as we think of in LLDB where you type >>>>>> commands from an interactive prompt, they have a command line api as you >>>>>> would expect from any tool which is launched from a shell. >>>>>> >>>>>> I can imagine many potential candidates for lldb tools of this nature. >>>>>> Off the top of my head: >>>>>> • lldb-unwind - A tool for testing the unwinder. Accepts byte code as >>>>>> input and passes it through to the unwinder, outputting a compressed >>>>>> summary of the steps taken while unwinding, which could be pattern >>>>>> matched in lit. The output format is entirely controlled by the tool, >>>>>> and not by the unwinder itself, so it would be stable in the face of >>>>>> changes to the underlying unwinder. Could have various options to >>>>>> enable or disable features of the unwinder in order to force the >>>>>> unwinder into modes that can be tricky to encounter in the wild. >>>>>> • lldb-symbol - A tool for testing symbol resolution. Could have >>>>>> options for testing things like: >>>>>> • Determining if a symbol matches an executable >>>>>> • looking up a symbol by name in the debug info, and mapping it >>>>>> to an address in the process. >>>>>> • Displaying candidate symbols when doing name lookup in a >>>>>> particular scope (e.g. while stopped at a breakpoint). >>>>>> • lldb-breakpoint - A tool for testing breakpoints and stepping. >>>>>> Various options could include: >>>>>> • Set breakpoints and out addresses and/or symbol names where >>>>>> they were resolved to. >>>>>> • Trigger commands, so that when a breakpoint is hit the tool >>>>>> could automatically continue and try to run to another breakpoint, etc. >>>>>> • options to inspect certain useful pieces of state about an >>>>>> inferior, to be matched in lit. >>>>>> • lldb-interpreter - tests the jitter etc. I don’t know much about >>>>>> this, but I don’t see why this couldn’t be tested in a manner similar to >>>>>> how lli is tested. >>>>>> • lldb-platform - tests lldb local and remote platform interfaces. >>>>>> • lldb-cli -- lldb interactive command line. >>>>>> • lldb-format - lldb data formatters etc. >>>>> >>>>> >>>>> I agree that testing more functionality it a good idea, but if it is >>>>> worth testing we should see if we can get it into our public API. If so, >>>>> then we test it there. If not, then we come up with another solution. >>>>> Even in the alternate solution, it will be something that will probably >>>>> create structured data (JSON, Yaml, etc) and then parsed, so I would >>>>> rather spend the time getting these things into out public APIs, or test >>>>> them vai our public APIs. >>>>> >>>>>> • Tests NOW, not later. >>>>>> • I know we’ve been over this a million times and it’s not >>>>>> worth going over the arguments again. And I know it’s hard to write >>>>>> tests, often requiring the invention of new SB APIs. Hopefully those >>>>>> issues will be addressed by above a) and b) above and writing tests will >>>>>> be easier. Vedant Kumar ran some analytics on the various codebases and >>>>>> found that LLDB has the lowest test / commit ratio of any LLVM project >>>>>> (He didn’t post numbers for lld, so I’m not sure what it is there). >>>>>> • lldb: 287 of the past 1000 commits >>>>>> • llvm: 511 of the past 1000 commits >>>>>> • clang: 622 of the past 1000 commits >>>>>> • compiler-rt: 543 of the past 1000 commits >>>>>> This is an alarming statistic, and I would love to see this number >>>>>> closer to 50%. >>>>>> • Code style / development conventions - Aside from just the column >>>>>> limitations and bracing styles, there are other areas where LLDB differs >>>>>> from LLVM on code style. We should continue to adopt more of LLVM's >>>>>> style where it makes sense. I've identified a couple of areas >>>>>> (incomplete list) which I outline below. >>>>>> • Clean up the mess of cyclical dependencies and properly layer >>>>>> the libraries. This is especially important for things like lldb-server >>>>>> that need to link in as little as possible, but regardless it leads to a >>>>>> more robust architecture, faster build and link times, better >>>>>> testability, and is required if we ever want to do a modules build of >>>>>> LLDB >>>>>> • Use CMake instead of Xcode project (CMake supports >>>>>> Frameworks). CMake supports Apple Frameworks, so the main roadblock to >>>>>> getting this working is just someone doing it. Segmenting the build >>>>>> process by platform doesn't make sense for the upstream, especially when >>>>>> there is a perfectly workable solution. I have no doubt that the >>>>>> resulting Xcode workspace generated automatically by CMake will not be >>>>>> as "nice" as one that is maintained by hand. We face this problem with >>>>>> Visual Studio on Windows as well. The solution that most people have >>>>>> adopted is to continue using the IDE for code editing and debugging, but >>>>>> for actually running the build, use CMake with Ninja. A similar >>>>>> workflow should still be possible with an OSX CMake build, but as I do >>>>>> not work every day on a Mac, all I can say is that it's possible, I have >>>>>> no idea how impactful it would be on peoples' workflows. >>>>>> • Variable naming conventions >>>>>> • I don’t expect anyone is too fond of LLDB’s naming >>>>>> conventions, but if we’re committed to joining the LLVM ecosystem, then >>>>>> let’s go all the way. >>>>> >>>>> Hopefully we can get LLVM and Clang to adopt naming member variables with >>>>> a prefix... If so, that is my main remaining concern. >>>>> >>>>>> • Use more modern C++ and less C >>>>>> • Old habits die hard, but this isn’t just a matter of >>>>>> style. It leads to safer, more robust, and less fragile code as well. >>>>>> • Shorter functions and classes with more narrowly targeted >>>>>> responsibilities >>>>>> • It’s not uncommon to find functions that are hundreds >>>>>> (and in a few cases even 1,000+) of lines long. We really need to be >>>>>> better about breaking functions and classes down into smaller >>>>>> responsibilities. This helps not just for someone coming in to read the >>>>>> function, but also for testing. Smaller functions are easier to unit >>>>>> test. >>>>>> • Convert T foo(X, Y, Error &error) functions to Expected<T> >>>>>> foo(X, Y) style (Depends on 1.c) >>>>>> • llvm::Expected is based on the llvm::Error class >>>>>> described earlier. It’s used when a function is supposed to return a >>>>>> value, but it could fail. By packaging the error with the return value, >>>>>> it’s impossible to have a situation where you use the return value even >>>>>> in case of an error, and because llvm::Error has mandatory checking, >>>>>> it’s also impossible to have a sitaution where you don’t check the >>>>>> error. So it’s very safe. >>>>> >>>>> As crashes if you don't check the errors. One of the big differences >>>>> between LLDB and LLVM/Clang is that asserts are used liberally all over >>>>> clang which make the code very crashy when used in production with >>>>> uncontrolled input like we get in the debugger. We will need to be very >>>>> careful with any changes to make sure we don't increase crash frequency. >>>>> >>>>>> >>>>>> Whew. That was a lot. If you made it this far, thanks for reading! >>>>>> >>>>>> Obviously if we were to embark on all of the above, it would take many >>>>>> months to complete everything. So I'm not proposing anyone stop what >>>>>> they're doing to work on this. This is just my own personal wishlist >>>>> >>>>> There are many great things in here. As long as we are careful to not >>>>> increase the crash frequency in LLDB I am all for it. My mains areas of >>>>> concern are: >>>>> - public API can't change in its current form >>>>> - no LLVM or clang classes in the public API as arguments or return >>>>> values. >>>>> - don't crash more by introducing assert heavy code into code paths that >>>>> use input from outside sources >>>> >>>> It seems to me that you are mixing two things: asserting on user inputs vs >>>> asserting on internal invariants of the system. LLVM is very intensive >>>> about asserting on the second category and this seems fine to me. >>>> Asserting on external inputs is not great (in LLVM as much as in LLDB). >>>> >>>> The asserting error class above falls into the second category and is a >>>> great tool to enforce programmer error >>>> >>>> — >>>> Mehdi
_______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev