URL: <https://savannah.gnu.org/bugs/?66981>

Summary: [troff] use-after-free bug in file name reporting
Group: GNU roff
Submitter: gbranden
Submitted: Thu 03 Apr 2025 09:48:01 AM UTC
Category: Core
Severity: 4 - Important
Item Group: Incorrect behaviour
Status: In Progress
Privacy: Public
Assigned to: gbranden
Open/Closed: Open
Discussion Lock: Any
Planned Release: None

_______________________________________________________

Follow-up Comments:

-------------------------------------------------------
Date: Thu 03 Apr 2025 09:48:01 AM UTC
By: G. Branden Robinson <gbranden>

I haven't bisected it yet but I think this bug is of recent vintage.

Here's the regression test script in its current form, which does most of the explaining I can do.

#!/bin/sh
#
# Copyright (C) 2025 Free Software Foundation, Inc.
#
# This file is part of groff.
#
# groff is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free
# Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# groff is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#

groff="${abs_top_builddir:-.}/test-groff"

# Regression-test Savannah #XXXXX.
#
# File name strings in GNU troff tend to be dynamically allocated and to
# have highly variable lifetimes. Aggressively freeing them can lead to
# undefined behavior (referencing deallocated memory).
#
# Because we're talking about dynamic memory UB, the following input may
# not reproduce bad behavior in all environments. On GBR's system, the
# following is evident prior to the bug fix.
#
# {"name": "CE", "file name": "a", "starting line number": 2, ... }
# {"name": "CE", "file name": "\u0090\u0092\u009B\u00CE6V", ... }
#
# ...where the garbage in the file name varies with every run.

input='.
.lf 2 a
.ds CE \" empty
.TS H
l l.
\&
.TH
\&
.pm CE
.TE
.pm CE
.'

output=$(printf '%s\n' "$input" | "$groff" -Zt -ms)
echo "$output" | sed -n 2p | grep -Fqx '"file name": "a",'

# vim:set autoindent expandtab shiftwidth=2 tabstop=2 textwidth=72:

For _me_, I can reproduce it on the command line, but the file name needs to be longer. Something to do with _glibc_'s memory allocator and its buckets of storage may be involved. Again, this is UB so it's squirrely stuff.

$ cat string-becomes.roff
.ds CE \" empty
.TS H
l l.
\&
.TH
\&
.\"tm GBR 1
.pm CE
.TE
.\"tm GBR 2
.pm CE
$ ./build/test-groff -tz -ms string-becomes.roff
{"name": "CE", "file name": "string-becomes.roff", "starting line number": 1, "length": 0}
{"name": "CE", "file name": "p\u00C2\u00E5g\u0015V", "starting line number": 1, "length": 0}

The weird file name is an artifact of the stochastic process. If I make it longer...

$ cp string-becomes.roff string-becomes-empty.roff
$ ./build/test-groff -tz -ms string-becomes-empty.roff
{"name": "CE", "file name": "string-becomes-empty.roff", "starting line number": 1, "length": 0}
{"name": "CE", "file name": "string-becomes-empty.roff", "starting line number": 1, "length": 0}

...the bug hides.

(Yes, an enhanced macro dumper that reports the location of its definition is in evidence here. It's not in Savannah yet but will be in my next push.)

How did anything so careless get committed? Well, I haven't bisected yet so I don't know if this is _all_ my fault, but that's the safe way to bet.

* I was trying to be *careful* by `free()`ing or deleting duplicated strings after using them.

* The list of file names a _roff_ input document employs does not necessarily form a simple data structure. Cycles could be involved.
  For example, a document could `so` itself, many times even, producing output that varies with register or string values or even the page number; as long as it eventually `ex`its, it's valid.

* Sloppy memory handling with respect to file names in particular is much more _visible_ now that we have node and macro dumpers. That said, I spotted the problem in a backtrace that spewed incomplete UTF-8 sequences to the terminal when reporting the file name. Back on the gripping hand, useful backtraces are pretty much new to _groff_ 1.23.0 anyway.

The reproducer is actually a radically simplified version of our _doc/pic.ms_ document, ruthlessly pared down to what would reproduce the issue for me. Just about every element you see in the regression test is essential; take any bit out, like use of the _ms_ package or the `TH` call, and the bug hides again...for me.

I have a fix in preparation that, unfortunately, leaks the string storage. But I have an STL-based remedy for that in mind. And as noted above, we can't be confident we can free memory allocated by `read_rest_of_line_as_argument()` (when that argument is a file name) until the formatter is about to exit anyway.