[Lldb-commits] [PATCH] D83023: [lldb/ObjectFileMachO] Fetch shared cache images from our own shared cache

Pavel Labath via Phabricator via lldb-commits Fri, 03 Jul 2020 01:05:14 -0700

labath added a comment.

In D83023#2128475 <https://reviews.llvm.org/D83023#2128475>, @friss wrote:

> In D83023#2128298 <https://reviews.llvm.org/D83023#2128298>, @labath wrote:
>
> > I think this is a very interesting feature (lldb being able to load modules 
> > from memory; the mac shared cache thingy is interesting too, but in a 
> > different way). We have a feature request from people who are downloading 
> > modules from a network (from a proprietary symbol server, etc.) and would 
> > like to pass them to lldb without having to serialize them to disk. This 
> > would be a step towards making that happen. It could also be useful for our 
> > own unit tests which now have to do a similar thing.
> >
> > However, I think this could use some clean up. There's a lot of juggling of 
> > data/file/object offsets going on, and it all seems inconsistent and 
> > avoidable to me. Please see inline comments for details.
>
>
> I'll see what can be done. My main goal while working on this was to avoid 
> changing the semantics outside of the shared cache usecase. I understand 
> fairly well the codepath that I added and then just moved some other bits 
> around to keep the existing semantics for the rest. Happy to rework this.

So, if an object file needs to access some data which is outside of "its" image 
then my idea about using sliced data buffers will probably not work. It that 
case, using the "object offset" field to communicate the location of the 
"object" might not be a bad idea (it's still different than the use in .a 
files, but maybe we can stretch that definition). The part that bugs me then is 
having this functionality key off of the "data" field being set. Ideally, these 
would be two orthogonal features:

- the "data" would control whether you read the file system to obtain the 
object contents
- the "object offset" would tell you where to locate the desired object inside 
these "jumbo" objects

I think that might be doable by adding one more argument to the 
ObjectFile::GetModuleSpecifications interface, which says "this buffer contains 
a lot of stuff, but the object I am really interested in is at <offset>". Then 
ObjectFileMachO could do what it needs to read the object from that offset, but 
it would still have access to the entire buffer to read the __LINKEDIT thingy.

Or.... we could try to do something completely different, and avoid touching 
the common interfaces altogether. The way I see it, this shared cache thing is 
unlikely to be reusable for anything else, and the reason we're adding these 
interfaces is so that we can communicate information from DynamicLoaderDarwin 
to ObjectFileMachO through `Target::GetOrCreateModule`. But that is silly, 
because DynamicLoaderDarwin already knows that the request is going to go to 
ObjectFileMachO (and indeed things would break if it doesn't). If there was a 
way for these two to communicate directly, then all of this would be unneeded, 
because DynamicLoaderDarwin could use a ObjectFileMachO interface, which would 
be specially crafted for this purpose.

It also seems to me we are only really interested in the "Get" part of 
`Target::GetOrCreateModule` -- i.e., ensuring we don't create multiple module 
instances for the same object. The rest of the function deals with various ways 
to try to find a module, but that's the thing we actually want to avoid, as we 
already know the data buffer that contains it. And all of the checks about 
matching uuids/architecures/etc. don't seem useful either as 
DynamicLoaderDarwin is repeating most of these anyway.

We actually already have an interface that almost does this: 
`Module::CreateModuleFromObjectFile`. It allows one to create a Module while 
directly specifying the ObjectFile class to use, and passing arbitrary 
constructor arguments. The part it does *not* do is register this module into 
the global module cache. That's because so far, this function is used to create 
"weird" modules that we wouldn't want to cache anyway. But, what if we created 
a version of this function which does that? Could `DynamicLoaderDarwin` then 
instead of `target.GetOrCreateModule(shared_cache_spec)` do something like:

  module_sp = target.GetImages().FindFirstModule(shared_cache_spec);
  // If target already contains the module, we're done.
  if (!module_sp) {
    // This checks the global module cache for a matching module. If it finds 
one, it returns it. Otherwise, it creates a module through 
CreateModuleFromObjectFile, and registers it into the cache.
    module_sp = 
Module::GetOrCreateModuleFromObjectFile<ObjectFileMachO>(shared_cache_spec, 
whatever_args_we_need_to_get_macho_to_load_properly);
    target.GetImages().Append(module_sp, /*notify=*/false);
  }

I think an interface like this could be useful in the future as an escape 
hatch, because there are a lot of situations where one already knows he is 
dealing with a specific kind of object files, but is not able to take advantage 
of that. For example, DynamicLoaderPOSIXDYLD "knows" it's going to load an ELF 
file, that knowledge does not help it in any way. So far, we haven't needed to 
do anything super weird there, but that class already does contain a bunch of 
fallbacks/workarounds for bugs in various dynamic loaders and other platform 
features (e.g. the VDSO pseudo-module). I can certainly see how being able to 
create an module more directly could be helpful at times.

> 
> 
>> The patch is also quite light on testing. If done right, I believe this 
>> should make it possible to yaml2obj a file into memory in a unit test and 
>> then create a Module from that. That would enable us to test the 
>> Module(Spec) changes in isolation, and move them to a separate patch.
> 
> I agree about the light testing. FWIW, this got significant living-on testing 
> on Apple platforms, but some more targeted testing would be great.

Yeah, I'm sure it works now. I'm more worried about it continuing to work in 
face of other random changes. :)

================
Comment at: lldb/include/lldb/Utility/DataBuffer.h:82

+class DataBufferHostMemory : public DataBuffer {
+public:
----------------
friss wrote:
> labath wrote:
> > All of our other DataBuffers also point to host memory (how could they not 
> > do that?). I guess what really makes this one special is that it does not 
> > own the storage that it points to...
> Good point. What about `DataBufferUnowned` ? (and adding a comment to explain 
> what it's used for)
Sounds good. I don't think that the comment really needs to mention the shared 
cache. I can see this being useful in other circumstances too...

================
Comment at: lldb/source/Core/Module.cpp:154-159
+  // If the ModuleSpec provided a DataBuffer, let's respect the ModuleSpec's
+  // file offset when reading in this buffer.
+  if (data_sp) {
+    file_offset = module_spec.GetObjectOffset();
+    file_size = module_spec.GetData()->GetByteSize();
+  }
----------------
friss wrote:
> labath wrote:
> > I think this overloads the meaning of `module_spec.GetObjectOffset()` in a 
> > fairly confusing way. Normally, I believe 
> > `ModuleSpec::GetObject{Name,Offset,Size}` is used to refer to the 
> > name/offse/size of a object file located in an archive (.a). However, here 
> > it seems you are reusing it for something else. It seems unfortunate that 
> > the meaning of this field should change depending on the "data" field being 
> > present.
> > 
> > What exactly is the purpose of this field? Could we avoid this by just 
> > creating an appropriately-sized DataBuffer ?
> So the shared cache is some kind of a container in some ways similar to a 
> `.a`, file, in some ways very different. The main difference is that the 
> contained dylibs are not each in their own subpart of the file. For example, 
> the __LINKEDIT segment is shared by all of them. The load commands that 
> describe the images are relative to the shared cache itself, not to one 
> specific image.
> 
> So I reused the object_offset field in its "offset from the beginning of a 
> container sense". 
> 
> I think I tried having the SharedCacheInfo return only the subpart of the 
> cache that contains the image, and ran into issues with this. But I don't 
> remember exactly what, and I have a much better understanding now, so I 
> should try it again.
Hmm... interesting. I'm going to touch on this more in the main comment.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83023/new/

https://reviews.llvm.org/D83023

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D83023: [lldb/ObjectFileMachO] Fetch shared cache images from our own shared cache

Reply via email to