Hi Aaron,

On Sun, 2023-11-12 at 15:16 -0500, Aaron Merey wrote:
> It is possible for segments of different shared libaries to be interleaved
> in memory such that the segments of one library are located in between
> non-contiguous segments of another library.
> 
> For example, this can be seen with firefox on RHEL 7.9 where multiple
> shared libraries could be mapped in between ld-2.17.so segments:
> 
>       [...]
>       7f0972082000-7f09720a4000 00000000 139264      /usr/lib64/ld-2.17.so
>       7f09720a4000-7f09720a5000 00000000 4096        /memfd:mozilla-ipc 
> (deleted)
>       7f09720a5000-7f09720a7000 00000000 8192        /memfd:mozilla-ipc 
> (deleted)
>       7f09720a7000-7f09720a9000 00000000 8192        /memfd:mozilla-ipc 
> (deleted)
>       7f0972134000-7f0972136000 00000000 8192        
> /usr/lib64/firefox/libmozwayland.so
>       7f0972136000-7f0972137000 00002000 4096        
> /usr/lib64/firefox/libmozwayland.so
>       7f0972137000-7f0972138000 00003000 4096        
> /usr/lib64/firefox/libmozwayland.so
>       7f0972138000-7f0972139000 00003000 4096        
> /usr/lib64/firefox/libmozwayland.so
>       7f097213a000-7f0972147000 00000000 53248       
> /usr/lib64/firefox/libmozsqlite3.so
>       7f0972147000-7f097221e000 0000d000 880640      
> /usr/lib64/firefox/libmozsqlite3.so
>       7f097221e000-7f0972248000 000e4000 172032      
> /usr/lib64/firefox/libmozsqlite3.so
>       7f0972248000-7f0972249000 0010e000 4096        
> /usr/lib64/firefox/libmozsqlite3.so
>       7f0972249000-7f097224c000 0010e000 12288       
> /usr/lib64/firefox/libmozsqlite3.so
>       7f097224c000-7f0972250000 00111000 16384       
> /usr/lib64/firefox/libmozsqlite3.so
>       7f0972250000-7f0972253000 00000000 12288       
> /usr/lib64/firefox/liblgpllibs.so
>       [...]
>       7f09722a3000-7f09722a4000 00021000 4096        /usr/lib64/ld-2.17.so
>       7f09722a4000-7f09722a5000 00022000 4096        /usr/lib64/ld-2.17.so
> 
> dwfl_segment_report_module did not account for the possibility of
> interleaving non-contiguous segments, resulting in premature closure
> of modules as well as failing to report modules.

Nice description of the issue.

> Fix this by removing segment skipping in dwfl_segment_report_module.
> When dwfl_segment_report_module reported a module, it would return
> the index of the segment immediately following the end address of the
> current module.  Since there's a chance that other modules might fall
> within this address range, dwfl_segment_report_module instead returns
> the index of the next segment.

This makes sense.

> This patch also fixes premature module closure that can occur in
> dwfl_segment_report_module when interleaving non-contiguous segments
> are found.  Previously modules with start and end addresses that overlap
> with the current segment would have their build-id compared the build-id
> associated with the current segment.  If there was a mismatch, that module
> would be closed.  Avoid closing modules in this case when mismatching
> build-ids correspond to distinct modules.

Nice find.

> A couple caveats should be mentioned.  First, start and end addresses
> of reported modules cannot be assumed to contain segments from only
> that module.  This has always been the case however.

There is dwfl_addrmodule/dwfl_addrsegment to find the module that
covers a specific address. Defined in libdwfl/segment.c. I think this
should handle this by checking the closes load address. But I have not
tested it.

Normally only kernel modules (.ko ET_REL files) have multiple segments.
So it might make sense to double check none of this impacts systemtap.

> Second, the testcases in this patch use a firefox corefile that is
> fairly large.  The .bz2 corefile is about 47M.  A clean elfutils repo
> is currently about 42M, so this corefile more than doubles the size of
> the elfutils repo.  I looked for a much smaller process with
> interleaving non-contiguous shared library sections but was not able
> to find one.  I've included the corefile and tests in this patch but
> they can be removed if we'd prefer to not approx. double the size of
> the repo.

I really appreciate the testcase, but it really is too big. It is 736M
bunzip2ed (which would happen on any make check). I think this makes
the repo and the build/check a little too heavy. Also we try to include
instructions to recreate any binary test files and that isn't really
possible in this case.

> https://sourceware.org/bugzilla/show_bug.cgi?id=30975
> 
> Signed-off-by: Aaron Merey <ame...@redhat.com>
> ---
>  libdwfl/dwfl_segment_report_module.c |  37 +++++--
>  tests/Makefile.am                    |   5 +-
>  tests/run-unstrip-noncontig.sh       | 155 +++++++++++++++++++++++++++
>  tests/testcore-noncontig.bz2         | Bin 0 -> 49146091 bytes
>  4 files changed, 184 insertions(+), 13 deletions(-)
>  create mode 100755 tests/run-unstrip-noncontig.sh
>  create mode 100644 tests/testcore-noncontig.bz2
> 
> diff --git a/libdwfl/dwfl_segment_report_module.c 
> b/libdwfl/dwfl_segment_report_module.c
> index 3ef62a7d..09ee37b3 100644
> --- a/libdwfl/dwfl_segment_report_module.c
> +++ b/libdwfl/dwfl_segment_report_module.c
> @@ -737,17 +737,34 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const 
> char *name,
>               && invalid_elf (module->elf, module->disk_file_has_build_id,
>                               &build_id))
>             {
> -             elf_end (module->elf);
> -             close (module->fd);
> -             module->elf = NULL;
> -             module->fd = -1;
> +             /* If MODULE's build-id doesn't match the disk file's
> +                build-id, close ELF only if MODULE and ELF refer to
> +                different builds of files with the same name.  This
> +                prevents premature closure of the correct ELF in cases
> +                where segments of a module are non-contiguous in memory.  */
> +             if (name != NULL && module->name[0] != '\0'
> +                 && strcmp (basename (module->name), basename (name)) == 0)
> +               {
> +                 elf_end (module->elf);
> +                 close (module->fd);
> +                 module->elf = NULL;
> +                 module->fd = -1;
> +               }
>             }

Nice.

> -         if (module->elf != NULL)
> +         else if (module->elf != NULL)
>             {
> -             /* Ignore this found module if it would conflict in address
> -                space with any already existing module of DWFL.  */
> +             /* This module has already been reported.  */
>               skip_this_module = true;
>             }
> +         else
> +           {
> +             /* Only report this module if we haven't already done so.  */
> +             for (Dwfl_Module *mod = dwfl->modulelist; mod != NULL;
> +                  mod = mod->next)
> +               if (mod->low_addr == module_start
> +                   && mod->high_addr == module_end)
> +                 skip_this_module = true;
> +           }
>         }
>        if (skip_this_module)
>       goto out;

OK. So now we only skip modules if we have already seen this exact
module.

> @@ -781,10 +798,6 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const 
> char *name,
>       }
>      }
>  
> -  /* Our return value now says to skip the segments contained
> -     within the module.  */
> -  ndx = addr_segndx (dwfl, segment, module_end, true);
> -
>    /* Examine its .dynamic section to get more interesting details.
>       If it has DT_SONAME, we'll use that as the module name.
>       If it has a DT_DEBUG, then it's actually a PIE rather than a DSO.
> @@ -929,6 +942,8 @@ dwfl_segment_report_module (Dwfl *dwfl, int ndx, const 
> char *name,
>        ndx = -1;
>        goto out;
>      }
> +  else
> +    ndx++;
>  
>    /* We have reported the module.  Now let the caller decide whether we
>       should read the whole thing in right now.  */

Right, so addr_segndx would normally return this ndx because next ==
true. But now we just always just report the next ndx.

> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 7fb8efb1..a12df1d3 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -212,7 +212,7 @@ TESTS = run-arextract.sh run-arsymtest.sh run-ar.sh 
> newfile test-nlist \
>       $(asm_TESTS) run-disasm-bpf.sh run-low_high_pc-dw-form-indirect.sh \
>       run-nvidia-extended-linemap-libdw.sh 
> run-nvidia-extended-linemap-readelf.sh \
>       run-readelf-dw-form-indirect.sh run-strip-largealign.sh \
> -     run-readelf-Dd.sh
> +     run-readelf-Dd.sh run-unstrip-noncontig.sh
>  
>  if !BIARCH
>  export ELFUTILS_DISABLE_BIARCH = 1
> @@ -632,7 +632,8 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
>            run-nvidia-extended-linemap-libdw.sh 
> run-nvidia-extended-linemap-readelf.sh \
>            testfile_nvidia_linemap.bz2 \
>            testfile-largealign.o.bz2 run-strip-largealign.sh \
> -          run-funcretval++11.sh
> +          run-funcretval++11.sh \
> +          run-unstrip-noncontig.sh testcore-noncontig.bz2
>  
>  
>  if USE_VALGRIND

I really like to have a testcase, but only if we can find/construct
something much smaller.

Thanks,

Mark

Reply via email to