FWIW I added a function to StringRef the other day that looks like this: static StringRef withNullAsEmpty(const char *Str) { return StringRef(Str ? Str : ""); }
I've been using this code in converting existing uses of const char * over to StringRef. It's not 100% what you want, but at least it's better than nothing. On Tue, Sep 20, 2016 at 10:26 AM Greg Clayton <gclay...@apple.com> wrote: > But StringRef is a smart string wrapper and it is there for exactly this > reason: to make string handling correct. So let us let it be smart and not > crash if you make it with null and call .size() on it... > > On Sep 19, 2016, at 2:37 PM, Zachary Turner <ztur...@google.com> wrote: > > > > FWIW, strlen(nullptr) will also crash just as easily as > StringRef(nullptr) will crash, so that one isn't a particularly convincing > example of poorly used asserts, since the onus on the developer is exactly > the same as with strlen. That said, I still know where you're coming from > :) > > > > On Mon, Sep 19, 2016 at 2:34 PM Greg Clayton <gclay...@apple.com> wrote: > > > > > On Sep 19, 2016, at 2:20 PM, Mehdi Amini <mehdi.am...@apple.com> > wrote: > > > > > >> > > >> On Sep 19, 2016, at 2:02 PM, Greg Clayton via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > > >> > > >> > > >>> On Sep 19, 2016, at 1:18 PM, Zachary Turner via lldb-dev < > lldb-dev@lists.llvm.org> wrote: > > >>> > > >>> Following up with Kate's post from a few weeks ago, I think the dust > has settled on the code reformat and it went over pretty smoothly for the > most part. So I thought it might be worth throwing out some ideas for > where we go from here. I have a large list of ideas (more ideas than time, > sadly) that I've been collecting over the past few weeks, so I figured I > would throw them out in the open for discussion. > > >>> > > >>> I’ve grouped the areas for improvement into 3 high level categories. > > >>> > > >>> • De-inventing the wheel - We should use more code from LLVM, > and delete code in LLDB where LLVM provides a solution. In cases where > there is an LLVM thing that is *similar* to what we need, we should extend > the LLVM thing to support what we need, and then use it. Following are > some areas I've identified. This list is by no means complete. For each > one, I've given a personal assessment of how likely it is to cause some > (temporary) hiccups, how much it would help us in the long run, and how > difficult it would be to do. Without further ado: > > >>> • Use llvm::Regex instead of lldb::Regex > > >>> • llvm::Regex doesn’t support enhanced mode. > Could we add support for this to llvm::Regex? > > >>> • Risk: 6 > > >>> • Impact: 3 > > >>> • Difficulty / Effort: 3 (5 if we have to add > enhanced mode support) > > >> > > >> As long as it supports all of the features, then this is fine. > Otherwise we will need to modify the regular expressions to work with the > LLVM version or back port any advances regex features we need into the LLVM > regex parser. > > >> > > >>> • Use llvm streams instead of lldb::StreamString > > >>> • Supports output re-targeting (stderr, stdout, > std::string, etc), printf style formatting, and type-safe streaming > operators. > > >>> • Interoperates nicely with many existing llvm > utility classes > > >>> • Risk: 4 > > >>> • Impact: 5 > > >>> • Difficulty / Effort: 7 > > >> > > >> I have worked extensively with both C++ streams and with printf. I > don't find the type safe streaming operators to be readable and a great way > to place things into streams. Part of my objection to this will be quelled > if the LLVM streams are not stateful like C++ streams are (add an extra > "std::hex" into the stream and suddenly other things down stream start > coming out in hex). As long as printf functionality is in the LLVM streams > I am ok with it. > > >> > > >>> • Use llvm::Error instead of lldb::Error > > >>> • llvm::Error is an error class that *requires* > you to check whether it succeeded or it will assert. In a way, it's > similar to a C++ exception, except that it doesn't come with the > performance hit associated with exceptions. It's extensible, and can be > easily extended to support the various ways LLDB needs to construct errors > and error messages. > > >>> • Would need to first rename lldb::Error to > LLDBError so that te conversion from LLDBError to llvm::Error could be done > incrementally. > > >>> • Risk: 7 > > >>> • Impact: 7 > > >>> • Difficulty / Effort: 8 > > >> > > >> We must be very careful here. Nothing can crash LLDB and adding > something that will be known to crash in some cases can only be changed if > there is a test that can guarantee that it won't crash. assert() calls in a > shared library like LLDB are not OK and shouldn't fire. Even if they do, > when asserts are off, then it shouldn't crash LLDB. We have made efforts to > only use asserts in LLDB for things that really can't happen, but for > things like constructing a StringRef with NULL is one example of why I > don't want to use certain classes in LLVM. If we can remove the crash > threat, then lets use them. But many people firmly believe that asserts > belong in llvm and clang code and I don't agree. > > > > > > I’m surprise by your aversion to assertions, what is your suggested > alternative? Are you expecting to check and handle every possible > invariants everywhere and recover (or signal an error) properly? That does > not seem practical, and it seems to me that it'd lead to just involving UB > (or breaking design invariant) without actually noticing it. > > > > We have to as a debugger. We get input from a variety of different > compilers and we parse and use things there were produced by our toolchains > and others. Crashing Xcode because someone accidentally created a > llvm::StringRef(NULL) should not happen. It is not OK for library code to > crash. This is quite OK for compilers, linkers and other tools, but it > isn't for any other code. Since there are so many asserts, LLDB code is > currently responsible for figuring out what will make clang unhappy and we > must be sure to not feed it anything that makes it unhappy or we crash. So > I don't see that as better. > > > > > > > >> Also many asserts are in header files so even if you build llvm and > clang with asserts off, but then build our project that uses LLVM with > asserts on, you get LLVM and clang asserts when you don't want them. > > > > > > It is not really supported to use the LLVM C++ headers without > assertions and link to a LLVM build with assertions (and vice-versa). > > > > As we have found out. We claim llvm and clang can be linked into tools a > libraries, but it really means, you should really use it out of process > because it might crash at any time so don't try and use it in a program you > actually want to be able run for a long time. > > > > > > > > > > >> > > >>> • StringRef instead of const char *, len everywhere > > >>> • Can do most common string operations in a way > that is guaranteed to be safe. > > >>> • Reduces string manipulation algorithm > complexity by an order of magnitude. > > >>> • Can potentially eliminate tens of thousands of > string copies across the codebase. > > >>> • Simplifies code. > > >>> • Risk: 3 > > >>> • Impact: 8 > > >>> • Difficulty / Effort: 7 > > >> > > >> I don't think StringRef needs to be used everywhere, but it many > places it does make sense. For example our public API should not contain > any LLVM classes in the API. "const char *" is a fine parameter for public > functions. I really hate the fact that constructing llvm::StringRef with > NULL causes an assertion. Many people don't know that and don't take that > into account when making their changes. I would love to see the assertion > taken out to tell you the truth. That would make me feel better about using > StringRef in many more places within LLDB as we shouldn't ever crash due to > this. I would expect most internal APIs are safe to move to > llvm::StringRef, but we will need to make sure none of these require a null > terminated string which would cause us to have to call "str.str().c_str()". > So internally, yes we can cut over to more use of StringRef. But the public > API can't be changed. > > >> > > >> > > >>> • ArrayRef instead of const void *, len everywhere > > >>> • Same analysis as StringRef > > >> > > >> Internally yes, external APIs no. > > >>> • MutableArrayRef instead of void *, len everywhere > > >>> • Same analysis as StringRef > > >> Internally yes, external APIs no. > > >>> • Delete ConstString, use a modified StringPool that is > thread-safe. > > >>> • StringPool is a non thread-safe version of > ConstString. > > >>> • Strings are internally refcounted so they can > be cleaned up when they are no longer used. ConstStrings are a large > source of memory in LLDB, so ref-counting and removing stale strings has > the potential to be a huge savings. > > >>> • Risk: 2 > > >>> • Impact: 9 > > >>> • Difficulty / Effort: 6 > > >> > > >> We can't currently rely on ref counting. We currently hand out "const > char *" as return values from many public API calls and these rely on the > strings living forever. We many many existing clients and we can't change > the public API. So I vote to avoid this. We are already using LLVM string > pools under the covers and we also associate the mangled/demangled names in > the map as is saves us a ton of cycles when parsing stuff as we never > demangle (very expensive) the same name twice. So I don't see the need to > change this as it is already backed by LLVM technology under the covers. > > >> > > >>> • thread_local instead of lldb::ThreadLocal > > >>> • This fixes a number of bugs on Windows that > cannot be fixed otherwise, as they require compiler support. > > >>> • Some other compilers may not support this yet? > > >>> • Risk: 2 > > >>> • Impact: 3 > > >>> • Difficulty: 3 > > >> > > >> As long as all compilers support this then this is fine. > > >> > > >>> • Use llvm::cl for the command line arguments to the > primary lldb executable. > > >>> • Risk: 2 > > >>> • Impact: 3 > > >>> • Difficulty / Effort: 4 > > >> > > >> Easy and fine to switch over to. We might need to port some > getopt_long functionality into it if it doesn't handle all of the things > that getopt_long does. For example arguments and options can be > interspersed in getopt_long(). You can also terminate your arguments with > "--". It accepts single dashes for long option names if they don't conflict > with short options. Many things like this. > > >> > > >>> • Testing - Our testing infrastructure is unstable, and our test > coverage is lacking. We should take steps to improve this. > > >>> • Port as much as possible to lit > > >>> • Simple tests should be trivial to port to lit > today. If nothing else this serves as a proof of concept while increasing > the speed and stability of the test suite, since lit is a more stable > harness. > > >> > > >> As long a tests use the public API to run test. We are not doing text > scraping. > > >> > > >>> • Separate testing tools > > >>> • One question that remains open is how to > represent the complicated needs of a debugger in lit tests. Part a) above > covers the trivial cases, but what about the difficult cases? In > https://reviews.llvm.org/D24591 a number of ideas were discussed. We > started getting to this idea towards the end, about a separate tool which > has an interface independent of the command line interface and which can be > used to test. lldb-mi was mentioned. While I have serious concerns about > lldb-mi due to its poorly written and tested codebase, I do agree in > principle with the methodology. In fact, this is the entire philosophy > behind lit as used with LLVM, clang, lld, etc. > > >> > > >> We have a public API... Why are we going to go and spend _any_ time > on doing anything else? I just don't get it. What a waste of time. We have > a public API. Lets use it. Not simple enough for people? Tough. Testing a > debugger should be done using the public API except when we are actually > trying to test the LLDB command line in pexpect like tests. Those and only > those are fine to covert over to using LIT and use text scraping. But 95% > of our testing should be done using the API that our IDEs use. Using MI? > Please no. > > >>> > > >>> I don’t take full credit for this idea. I had been toying with a > similar idea for some time, but it was further cemented in an offline > discussion with a co-worker. > > >>> > > >>> There many small, targeted tools in LLVM (e.g. llc, lli, > llvm-objdump, etc) whose purpose are to be chained together to do > interesting things. Instead of a command line api as we think of in LLDB > where you type commands from an interactive prompt, they have a command > line api as you would expect from any tool which is launched from a shell. > > >>> > > >>> I can imagine many potential candidates for lldb tools of this > nature. Off the top of my head: > > >>> • lldb-unwind - A tool for testing the unwinder. Accepts byte > code as input and passes it through to the unwinder, outputting a > compressed summary of the steps taken while unwinding, which could be > pattern matched in lit. The output format is entirely controlled by the > tool, and not by the unwinder itself, so it would be stable in the face of > changes to the underlying unwinder. Could have various options to enable > or disable features of the unwinder in order to force the unwinder into > modes that can be tricky to encounter in the wild. > > >>> • lldb-symbol - A tool for testing symbol resolution. Could > have options for testing things like: > > >>> • Determining if a symbol matches an executable > > >>> • looking up a symbol by name in the debug info, and > mapping it to an address in the process. > > >>> • Displaying candidate symbols when doing name lookup in > a particular scope (e.g. while stopped at a breakpoint). > > >>> • lldb-breakpoint - A tool for testing breakpoints and > stepping. Various options could include: > > >>> • Set breakpoints and out addresses and/or symbol names > where they were resolved to. > > >>> • Trigger commands, so that when a breakpoint is hit the > tool could automatically continue and try to run to another breakpoint, etc. > > >>> • options to inspect certain useful pieces of state > about an inferior, to be matched in lit. > > >>> • lldb-interpreter - tests the jitter etc. I don’t know much > about this, but I don’t see why this couldn’t be tested in a manner similar > to how lli is tested. > > >>> • lldb-platform - tests lldb local and remote platform > interfaces. > > >>> • lldb-cli -- lldb interactive command line. > > >>> • lldb-format - lldb data formatters etc. > > >> > > >> > > >> I agree that testing more functionality it a good idea, but if it is > worth testing we should see if we can get it into our public API. If so, > then we test it there. If not, then we come up with another solution. Even > in the alternate solution, it will be something that will probably create > structured data (JSON, Yaml, etc) and then parsed, so I would rather spend > the time getting these things into out public APIs, or test them vai our > public APIs. > > >> > > >>> • Tests NOW, not later. > > >>> • I know we’ve been over this a million times and it’s > not worth going over the arguments again. And I know it’s hard to write > tests, often requiring the invention of new SB APIs. Hopefully those > issues will be addressed by above a) and b) above and writing tests will be > easier. Vedant Kumar ran some analytics on the various codebases and found > that LLDB has the lowest test / commit ratio of any LLVM project (He didn’t > post numbers for lld, so I’m not sure what it is there). > > >>> • lldb: 287 of the past 1000 commits > > >>> • llvm: 511 of the past 1000 commits > > >>> • clang: 622 of the past 1000 commits > > >>> • compiler-rt: 543 of the past 1000 commits > > >>> This is an alarming statistic, and I would love to see this number > closer to 50%. > > >>> • Code style / development conventions - Aside from just the > column limitations and bracing styles, there are other areas where LLDB > differs from LLVM on code style. We should continue to adopt more of > LLVM's style where it makes sense. I've identified a couple of areas > (incomplete list) which I outline below. > > >>> • Clean up the mess of cyclical dependencies and > properly layer the libraries. This is especially important for things like > lldb-server that need to link in as little as possible, but regardless it > leads to a more robust architecture, faster build and link times, better > testability, and is required if we ever want to do a modules build of LLDB > > >>> • Use CMake instead of Xcode project (CMake supports > Frameworks). CMake supports Apple Frameworks, so the main roadblock to > getting this working is just someone doing it. Segmenting the build > process by platform doesn't make sense for the upstream, especially when > there is a perfectly workable solution. I have no doubt that the resulting > Xcode workspace generated automatically by CMake will not be as "nice" as > one that is maintained by hand. We face this problem with Visual Studio on > Windows as well. The solution that most people have adopted is to continue > using the IDE for code editing and debugging, but for actually running the > build, use CMake with Ninja. A similar workflow should still be possible > with an OSX CMake build, but as I do not work every day on a Mac, all I can > say is that it's possible, I have no idea how impactful it would be on > peoples' workflows. > > >>> • Variable naming conventions > > >>> • I don’t expect anyone is too fond of LLDB’s > naming conventions, but if we’re committed to joining the LLVM ecosystem, > then let’s go all the way. > > >> > > >> Hopefully we can get LLVM and Clang to adopt naming member variables > with a prefix... If so, that is my main remaining concern. > > >> > > >>> • Use more modern C++ and less C > > >>> • Old habits die hard, but this isn’t just a > matter of style. It leads to safer, more robust, and less fragile code as > well. > > >>> • Shorter functions and classes with more narrowly > targeted responsibilities > > >>> • It’s not uncommon to find functions that are > hundreds (and in a few cases even 1,000+) of lines long. We really need to > be better about breaking functions and classes down into smaller > responsibilities. This helps not just for someone coming in to read the > function, but also for testing. Smaller functions are easier to unit test. > > >>> • Convert T foo(X, Y, Error &error) functions to > Expected<T> foo(X, Y) style (Depends on 1.c) > > >>> • llvm::Expected is based on the llvm::Error > class described earlier. It’s used when a function is supposed to return a > value, but it could fail. By packaging the error with the return value, > it’s impossible to have a situation where you use the return value even in > case of an error, and because llvm::Error has mandatory checking, it’s also > impossible to have a sitaution where you don’t check the error. So it’s > very safe. > > >> > > >> As crashes if you don't check the errors. One of the big differences > between LLDB and LLVM/Clang is that asserts are used liberally all over > clang which make the code very crashy when used in production with > uncontrolled input like we get in the debugger. We will need to be very > careful with any changes to make sure we don't increase crash frequency. > > >> > > >>> > > >>> Whew. That was a lot. If you made it this far, thanks for reading! > > >>> > > >>> Obviously if we were to embark on all of the above, it would take > many months to complete everything. So I'm not proposing anyone stop what > they're doing to work on this. This is just my own personal wishlist > > >> > > >> There are many great things in here. As long as we are careful to not > increase the crash frequency in LLDB I am all for it. My mains areas of > concern are: > > >> - public API can't change in its current form > > >> - no LLVM or clang classes in the public API as arguments or return > values. > > >> - don't crash more by introducing assert heavy code into code paths > that use input from outside sources > > > > > > It seems to me that you are mixing two things: asserting on user > inputs vs asserting on internal invariants of the system. LLVM is very > intensive about asserting on the second category and this seems fine to me. > > > Asserting on external inputs is not great (in LLVM as much as in LLDB). > > > > > > The asserting error class above falls into the second category and is > a great tool to enforce programmer error > > > > > > — > > > Mehdi > > > >
_______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev