https://sourceware.org/bugzilla/show_bug.cgi?id=23254
Bug ID: 23254 Summary: ld.bfd mishandles file pointers while scanning archive with LTO members, causing "malformed archive" errors for a well-formed archive Product: binutils Version: 2.31 (HEAD) Status: UNCONFIRMED Severity: critical Priority: P2 Component: ld Assignee: unassigned at sourceware dot org Reporter: zenith432 at users dot sourceforge.net Target Milestone: --- When processing an archive with LTO object members, ld.bfd mismanages the file pointer, occasionally causing "malformed archive" errors even though the archive is well-formed. The cause of this problem is as follows: An archive is processed by the function elf_link_add_archive_symbols (bfd/elflink.c), which calls for each member the functions _bfd_get_elt_at_filepos bfd_check_format add_archive_element The first two functions (_bfd_get_elt_at_filepos, bfd_check_format) process the archive element using the I/O functions found in bfd/bfdio.c and bfd/cache.c. These I/O functions use stdio FILE*. They store the FILE* in abfd->iostream for the parent archive, and use fseek/fread to scan the archive file. After this is done, if an archive element is loaded, add_archive_element (in ld/ldmain.c) is called, which eventually calls plugin_maybe_claim (in ld/plugin.c) to handle LTO elements. This calls plugin_object_p (in ld/plugin.c) which eventually calls the plugin's claim_file_handler method (see plugin_call_claim_file). The plugin's claim_file_handler function processes the archive file using system calls lseek()/read() using a file descriptor set up in plugin_object_p. Description of The Problem: Prior to commit 7d0b9ebc1e0079271a7c7737b53bc026525eab64 (by Alan Modra), plugin_object_p would obtain a file descriptor to the archive file based on its name using system call open(), then pass the file descriptor downward to the plugin's claim_file_handler for it to process the file. In the commit mentioned above, plugin_object_p was changed so that instead of opening the archive by name, it obtains the file descriptor from (FILE*) abfd->iostream using fileno(), then duplicates it with systemcall dup(), then passes it down to the plugin's claim_file_handler. *The documentation for system call dup() very clearly explains that the dup'd file descriptor shares a file pointer with the original descriptor*. As a result of this change, the plugin's claim_file_handler clobbers the file pointer using calls to lseek()/read(). HOWEVER, the archive is later continued to be accessed using fseek()/fread() on abfd->iostream when processing the next archive element !! This is WRONG. Because fseek/fread *do not know* that the file pointer was changed outside the stdio FILE* framework. The FILE structure caches a buffer for the file, and if fseek() gets a position that is within the loaded buffer, it may not call lseek(). Moreover, if FILE struct caches a position for the file, this potentially causes fseek() and fread() to believe the file pointer is in a different place than it really is and they read the wrong part of the file and eventually cause a "malformed archive" error. A later commit ace667e59aede65c400381f1cff704b61e8ccb0b by Andrew Burgess added a small fix that wraps the call to plugin's claim_file_handler in code saving and restoring the file pointer. However, restoration of the file pointer is only done if the plugin does not "claim" the archive element. It seems this was done to prevent bugs if the archive consists of non-LTO elements. However, if the archive has LTO elements and the element is "claimed", the file pointer remains clobbered and the problem persists. This problem started with 2.28 official release of binutils (in line with dates on above commits), and continues to this very day. I am able to consistently reproduce the "malformed archive" errors on large EDK2-based build on macOS. This is with self-built gcc 8.1 and binutils 2.30 or binutils git master. I've tried tweaking configure options, but none of them made any difference. For some reason I cannot reproduce this problem with EDK2 build on Fedora 28 with GCC 8.1.1. I've examined the patches Redhat makes to gcc and binutils from their source RPMs, but there doesn't seem to be anything there that fixes this. Another possibility is that fseek() on Fedora works differently than on macOS, so it always calls lseek() and fixes the file pointer that's been clobbered, so the issue never surfaces. I can't be sure because the development tools are all pre-built binaries and difficult to debug and see what's going on. Due to the complexity of this problem I cannot attach a simple example to reproduce it, and like I said - I'm not sure it reproduces on Fedora at all. However, I spent time debugging it and the cause is what is explained above and can be checked in source code. At any rate, mixing use of fseek()/fread() with out-of-line use of lseek()/read() on the same (or dup) file descriptor is obviously a programming error - as stdio was not meant to be used this way. -- You are receiving this mail because: You are on the CC list for the bug. _______________________________________________ bug-binutils mailing list bug-binutils@gnu.org https://lists.gnu.org/mailman/listinfo/bug-binutils