Re: [lldb-dev] [Reproducers] SBReproducer RFC

Pavel Labath via lldb-dev Mon, 07 Jan 2019 01:41:26 -0800

On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote:

Hi Everyone,
In September I sent out an RFC [1] about adding reproducers to LLDB. Over the
past few months, I landed the reproducer framework, support for the GDB remote
protocol and a bunch of preparatory changes. There's still an open code review
[2] for dealing with files, but that one is currently blocked by a change to
the VFS in LLVM [3].

The next big piece of work is supporting user commands (e.g. in the driver) and
SB API calls. Originally I expected these two things to be separate, but Pavel
made a good case [4] that they're actually very similar.

I created a prototype of how I envision this to work. As usual, we can
differentiate between capture and replay.

## SB API Capture
When capturing a reproducer, every SB function/method is instrumented using a
macro at function entry. The added code tracks the function identifier
(currently we use its name with __PRETTY_FUNCTION__) and its arguments.

It also tracks when a function crosses the boundary between internal and
external use. For example, when someone (be it the driver, the Python binding
or the RPC server) calls SBFoo, and in its implementation SBFoo calls SBBar, we
don't need to record SBBar. When invoking SBFoo during replay, it will itself
call SBBar.

When a boundary is crossed, the function name and arguments are serialized to a
file. This is trivial for basic types. For objects, we maintain a table that
maps pointer values to indices and serialize the index.

To keep our table consistent, we also need to track the return value of
functions that return an object by value. We have a separate macro that wraps
the returned object.

The index is sufficient because every object that is passed to a function has
crossed the boundary and hence was recorded. During replay (see below) we map
the index to an address again which ensures consistency.
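
The boundary tracking described above can be sketched with a depth counter
held by an RAII guard. This is only an illustration, not the prototype's code:
Recorder, Scope and the free functions SBFoo/SBBar are hypothetical names, the
log stores plain strings instead of serializing to a file, and the counter is
not thread-safe.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical recorder: logs only calls that enter the API from outside.
class Recorder {
public:
  // RAII guard placed at the top of every instrumented function.
  class Scope {
  public:
    Scope(Recorder &recorder, const char *name) : m_recorder(recorder) {
      // Depth 0 means the caller is external, so the call is recorded.
      if (m_recorder.m_depth++ == 0)
        m_recorder.m_log.push_back(name);
    }
    ~Scope() {
      assert(m_recorder.m_depth > 0 && "unbalanced scope");
      --m_recorder.m_depth;
    }
    Scope(const Scope &) = delete;
    Scope &operator=(const Scope &) = delete;

  private:
    Recorder &m_recorder;
  };

  const std::vector<std::string> &Log() const { return m_log; }

private:
  unsigned m_depth = 0;
  std::vector<std::string> m_log;
};

// Hypothetical API functions: SBFoo calls SBBar internally.
void SBBar(Recorder &recorder) { Recorder::Scope scope(recorder, "SBBar"); }

void SBFoo(Recorder &recorder) {
  Recorder::Scope scope(recorder, "SBFoo");
  SBBar(recorder); // Internal call: depth is 1, so it is not logged.
}
```

Calling SBFoo from outside logs only "SBFoo"; a later direct call to SBBar is
logged because the depth is back at zero by then.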

## SB API Replay

To replay the SB function calls we need a way to invoke the corresponding
function from its serialized identifier. For every SB function, there's a
counterpart that deserializes its arguments and invokes the function. These
functions are added to the map and are called by the replay logic.

Replaying is just a matter of looping over the function identifiers in the
serialized file, dispatching the right deserialization function, until no more
data is available.

The deserialization function for constructors or functions that return by value
contains additional logic for dealing with the aforementioned indices. The
resulting objects are added to a table (similar to the one described earlier)
that maps indices to pointers. Whenever an object is passed as an argument, the
index is used to get the actual object from the table.
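
As a rough sketch of the replay loop, assuming the identifiers are small
integers and the serialized data is a flat sequence of unsigned values (both
simplifications; the prototype keys on the function name). Reader, Replayer
and ReplayAll are hypothetical names, and the object index table is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical serialized stream: a function ID followed by its arguments.
class Reader {
public:
  explicit Reader(std::vector<unsigned> data) : m_data(std::move(data)) {}

  bool HasData() const { return m_pos < m_data.size(); }

  unsigned Next() {
    assert(HasData() && "read past the end of the stream");
    return m_data[m_pos++];
  }

private:
  std::vector<unsigned> m_data;
  std::size_t m_pos = 0;
};

// A replayer deserializes its own arguments and invokes the real function.
using Replayer = std::function<void(Reader &)>;

// Dispatches one replayer per recorded ID until the stream is exhausted.
// Returns the number of calls replayed; stops early on an unknown ID.
std::size_t ReplayAll(Reader &reader,
                      const std::map<unsigned, Replayer> &replayers) {
  std::size_t calls = 0;
  while (reader.HasData()) {
    unsigned id = reader.Next();
    auto it = replayers.find(id);
    if (it == replayers.end())
      break;
    it->second(reader);
    ++calls;
  }
  return calls;
}
```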

## Tool

Even when using macros, adding the necessary capturing and replay code is
tedious and scales poorly. For the prototype, we did this by hand, but we
propose a new clang-based tool to streamline the process.

For the capture code, the tool would validate that the macro matches the
function signature, suggesting a fixit if the macros are incorrect or missing.
Compared to generating the macros altogether, it has the advantage that we
don't have "configured" files that are harder to debug (without faking line
numbers etc).

The deserialization code would be fully generated. As shown in the prototype
there are a few different cases, depending on whether we have to account for
objects or not.

## Prototype Code
I created a differential [5] on Phabricator with the prototype. It contains the
necessary methods to re-run the gdb remote (reproducer) lit test.

## Feedback
Before moving forward I'd like to get the community's input. What do you think
about this approach? Do you have concerns or can we be smarter somewhere? Any
feedback would be greatly appreciated!

Thanks,
Jonas

[1] http://lists.llvm.org/pipermail/lldb-dev/2018-September/014184.html
[2] https://reviews.llvm.org/D54617
[3] https://reviews.llvm.org/D54277
[4] https://reviews.llvm.org/D55582
[5] https://reviews.llvm.org/D56322

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[Adding Tamas for his experience with recording and replaying APIs.]

Thank you for sharing the prototype Jonas. It looks very interesting, but there are a couple of things that worry me about it.

The first one is the usage of __PRETTY_FUNCTION__. That sounds like a non-starter even for an initial implementation, as the string it expands to is going to differ between compilers (gcc and clang will probably agree on it, but I know for a fact it will be different on msvc). If that was just an internal property of the serialization format, then it might be fine, but it looks like you are hardcoding the values in code to connect the methods with their replayers, which is going to be a problem.

I've been thinking about how this could be done better, and the best (though not ideal) way I came up with is using the function's address as the key. That's guaranteed to be unique everywhere. Of course, you cannot serialize that to a file, but since you already have a central place where you list all intercepted functions (to register their replayers), that place can also be used to assign unique integer IDs to these functions. So then the idea would be that the SB_RECORD macro takes the address of the current function, that gets converted to an ID in the lookup table, and the ID gets serialized.
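
A minimal sketch of such a lookup table, to make the idea concrete. Everything here is hypothetical (FunctionRegistry, the SBFoo stand-in and its Bar overloads are not from the patch), the search is linear for brevity, and the IDs are only stable as long as registration order is the same in the capturing and the replaying binary.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SB class with an overloaded method.
struct SBFoo {
  int Bar(int x) { return x + 1; }
  int Bar(double x) { return static_cast<int>(x) + 2; }
};

// Maps function addresses to small integer IDs that can be serialized.
class FunctionRegistry {
public:
  // Assigns the next free ID to fn, or returns the one it already has.
  template <typename Sig> static unsigned Register(Sig fn) {
    assert(fn != nullptr && "cannot register a null function pointer");
    if (unsigned id = Lookup<Sig>(fn))
      return id;
    unsigned id = NextID()++;
    Table<Sig>().emplace_back(fn, id);
    return id;
  }

  // Returns the ID of fn, or 0 if it was never registered.
  template <typename Sig> static unsigned Lookup(Sig fn) {
    for (const auto &entry : Table<Sig>())
      if (entry.first == fn)
        return entry.second;
    return 0;
  }

private:
  // One table per signature, since pointers of different types
  // cannot be compared with each other.
  template <typename Sig>
  static std::vector<std::pair<Sig, unsigned>> &Table() {
    static std::vector<std::pair<Sig, unsigned>> table;
    return table;
  }

  // IDs are shared across all signatures and start at 1.
  static unsigned &NextID() {
    static unsigned next = 1;
    return next;
  }
};
```

Spelling out the prototype as the template argument, e.g. FunctionRegistry::Register<int (SBFoo::*)(int)>(&SBFoo::Bar), is what selects the overload here.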

The part that bugs me about this is that taking the address of an overloaded function is extremely tedious (you have to write something like static_cast<function_prototype>(&SBFoo::Bar)). That would mean all of these things would have to be passed to the RECORD macro. OTOH, the upshot of this would be that the macro would now have sufficient information to perform pretty decent error checking on its invocation. Another nice thing about this could be that once you already have a prototype and an address of the function, it should be possible (with sufficient template-fu) to synthesize replay code for the function automatically, at least in the simple cases, which would avoid the repetitiveness of the current replay code. Together, this might obviate the need for any clang plugins or other funny build steps.
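
To illustrate the "template-fu" for the simple cases, here is a sketch that derives the replay code from the member function pointer alone. It assumes C++14, trivially copyable by-value arguments, and a hypothetical in-memory Stream in place of the real serialization format; SBFoo and its methods are stand-ins as well. Object arguments, references and return value tracking are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical byte buffer standing in for the serialized reproducer file.
class Stream {
public:
  template <typename T> void Write(const T &value) {
    const char *bytes = reinterpret_cast<const char *>(&value);
    m_bytes.insert(m_bytes.end(), bytes, bytes + sizeof(T));
  }

  template <typename T> T Read() {
    assert(m_offset + sizeof(T) <= m_bytes.size() && "read past the end");
    T value;
    std::memcpy(&value, m_bytes.data() + m_offset, sizeof(T));
    m_offset += sizeof(T);
    return value;
  }

private:
  std::vector<char> m_bytes;
  std::size_t m_offset = 0;
};

// Hypothetical stand-in for an SB class.
struct SBFoo {
  int total = 0;
  void Add(int a, int b) { total += a + b; }
  int Sub(int a, int b) { return a - b; }
};

// Unpacks the tuple of deserialized arguments into the call.
template <typename Class, typename Ret, typename... Args, std::size_t... I>
Ret Invoke(Class &obj, Ret (Class::*fn)(Args...), std::tuple<Args...> &args,
           std::index_sequence<I...>) {
  return (obj.*fn)(std::get<I>(args)...);
}

// Synthesized replayer: the argument types come from the prototype of fn.
template <typename Class, typename Ret, typename... Args>
Ret Replay(Class &obj, Ret (Class::*fn)(Args...), Stream &stream) {
  // Braced initialization evaluates the reads left to right, which
  // matches the order in which the arguments were written.
  std::tuple<Args...> args{stream.Read<Args>()...};
  return Invoke(obj, fn, args, std::index_sequence_for<Args...>{});
}
```

With this, Replay(foo, &SBFoo::Add, stream) needs no hand-written deserialization code for Add; overloaded methods would still need the static_cast mentioned above.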

The second thing I noticed is the usage of pointers for identifying objects. A pointer is good for that, but only while the object it points to is alive. Once the object is gone, its address can (and most likely will) be reused. So, it sounds to me like you also need to track the lifetime of these objects. That may be as simple as intercepting constructor/destructor calls, but I haven't seen anything like that yet (though I haven't looked at all details of the patch).
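
A sketch of what lifetime-aware tracking could look like, assuming the constructor and destructor hooks call Add and Remove. ObjectTable and its methods are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical table mapping live object addresses to serializable indices.
class ObjectTable {
public:
  // Called from an intercepted constructor. Always hands out a fresh
  // index, even if the address was used by an earlier object.
  unsigned Add(const void *addr) {
    assert(addr != nullptr && "cannot track a null object");
    unsigned index = m_next++;
    m_live[addr] = index;
    return index;
  }

  // Called from an intercepted destructor.
  void Remove(const void *addr) { m_live.erase(addr); }

  // Returns the index of a live object, or 0 if the address is not tracked.
  unsigned Lookup(const void *addr) const {
    auto it = m_live.find(addr);
    return it == m_live.end() ? 0 : it->second;
  }

private:
  std::unordered_map<const void *, unsigned> m_live;
  unsigned m_next = 1;
};
```

Two objects that happen to occupy the same address at different times then get different indices, so the replayer never confuses them.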

Tying into that is the recording of return values. It looks like the current RECORD_RETURN macro will record the address of the temporary object in the frame of the current function. However, that address will become invalid as soon as the function returns, as the result object will be copied into a location specified by the caller as a part of the return processing. Are you handling this in any way?

The final thing I noticed is the lack of any sign of threading support. I'm not too worried about that, as that sounds like something that could be fitted into the existing framework incrementally, but it is something worth keeping in mind, as you're going to run into that pretty soon.

