URL:
  <https://savannah.gnu.org/bugs/?66981>

                 Summary: [troff] use-after-free bug in file name reporting
                   Group: GNU roff
               Submitter: gbranden
               Submitted: Thu 03 Apr 2025 09:48:01 AM UTC
                Category: Core
                Severity: 4 - Important
              Item Group: Incorrect behaviour
                  Status: In Progress
                 Privacy: Public
             Assigned to: gbranden
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Thu 03 Apr 2025 09:48:01 AM UTC By: G. Branden Robinson <gbranden>
I haven't bisected it yet but I think this bug is of recent vintage.

Here's the regression test script in its current form, which does most of the
explaining I can do.


#!/bin/sh
#
# Copyright (C) 2025 Free Software Foundation, Inc.
#
# This file is part of groff.
#
# groff is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free
# Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# groff is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#

groff="${abs_top_builddir:-.}/test-groff"

# Regression-test Savannah #XXXXX.
#
# File name strings in GNU troff tend to be dynamically allocated and to
# have highly variable lifetimes.  Aggressively freeing them can lead to
# undefined behavior (referencing deallocated memory).
#
# Because we're talking about dynamic memory UB, the following input may
# not reproduce bad behavior in all environments.  On GBR's system, the
# following is evident prior to the bug fix.
#
# {"name": "CE", "file name": "a", "starting line number": 2, ... }
# {"name": "CE", "file name": "\u0090\u0092\u009B\u00CE6V", ... }
#
# ...where the garbage in the file name varies with every run.

input='.
.lf 2 a
.ds CE \" empty
.TS H
l
l.
\&
.TH
\&
.pm CE
.TE
.pm CE
.'

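# `pm` reports to the standard error stream, so capture it; `-z`
# suppresses formatted output.  The match is a substring of the
# report line, so `-x` (whole-line match) is not used.
output=$(printf '%s\n' "$input" | "$groff" -tz -ms 2>&1)
echo "$output" | sed -n 2p | grep -Fq '"file name": "a",'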

# vim:set autoindent expandtab shiftwidth=2 tabstop=2 textwidth=72:


For _me_, I can reproduce it on the command line, but the file name needs to
be longer.  Something to do with _glibc_'s memory allocator and its buckets of
storage may be involved.  Again, this is UB so it's squirrelly stuff.


$ cat string-becomes.roff
.ds CE \" empty
.TS H
l
l.
\&
.TH
\&
.\"tm GBR 1
.pm CE
.TE
.\"tm GBR 2
.pm CE
$ ./build/test-groff -tz -ms string-becomes.roff
{"name": "CE", "file name": "string-becomes.roff", "starting line number": 1,
"length": 0}
{"name": "CE", "file name": "p\u00C2\u00E5g\u0015V", "starting line number":
1, "length": 0}


The weird file name is an artifact of the stochastic process.  If I make it
longer...


$ cp string-becomes.roff string-becomes-empty.roff
$ ./build/test-groff -tz -ms string-becomes-empty.roff
{"name": "CE", "file name": "string-becomes-empty.roff", "starting line
number": 1, "length": 0}
{"name": "CE", "file name": "string-becomes-empty.roff", "starting line
number": 1, "length": 0}


...the bug hides.

(Yes, an enhanced macro dumper that reports the location of its definition is
in evidence here.  It's not in Savannah yet but will be in my next push.)

How did anything so careless get committed?  Well, I haven't bisected yet so I
don't know if this is _all_ my fault, but that's the safe way to bet.

* I was trying to be *careful* by `free()`ing or deleting duplicated strings
after using them.

* The list of file names a _roff_ input document employs does not necessarily
form a simple data structure.  Cycles could be involved.  For example, a
document could `so` itself, many times even, produce output varying based on
register or string values or even the page number, and as long as it
eventually `ex`its, it's valid.

* Sloppy memory handling with respect to file names in particular is much more
_visible_ now that we have node and macro dumpers.  That said, I spotted the
problem in a backtrace that spewed incomplete UTF-8 sequences to the terminal
when reporting the file name.  Back on the gripping hand, useful backtraces
are pretty much new to _groff_ 1.23.0 anyway.

The reproducer is actually a radically simplified version of our _doc/pic.ms_
document, ruthlessly pared down to what would reproduce the issue for me.
Just about every element you see in the regression test is essential; take
any bit out, like use of the _ms_ package or the `TH` call, and the bug hides
again...for me.
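
To make the lifetime hazard in the first bullet concrete, here is a minimal
sketch, not GNU troff's actual code.  The names `macro_record`,
`record_definition()`, and `define_then_free()` are hypothetical.  It shows
the safe variant of the pattern: the record takes its own copy of the file
name, so a caller that dutifully `free()`s its duplicate afterward cannot
leave the record pointing at deallocated memory.  Storing the raw pointer
instead would be the use-after-free described in this report.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the bookkeeping a macro dumper needs; this
// is not GNU troff's actual data structure.
struct macro_record {
  std::string name;
  std::string file_name;  // owned copy, independent of the caller's buffer
  int line_number;
};

// Record where a macro was defined.  The file name is copied, so the
// caller remains free to release its own duplicate afterward.
macro_record record_definition(const char *name, const char *file_name,
                               int line_number)
{
  assert(name != nullptr && file_name != nullptr);
  return macro_record{name, file_name, line_number};
}

// Mimic the "careful" caller: duplicate a string, use it, then free()
// it.  With an owning record, this is harmless.
macro_record define_then_free(const char *name, const char *file_name,
                              int line_number)
{
  char *dup = static_cast<char *>(std::malloc(std::strlen(file_name) + 1));
  assert(dup != nullptr);
  std::strcpy(dup, file_name);
  macro_record rec = record_definition(name, dup, line_number);
  std::free(dup);  // safe: rec.file_name does not point into dup
  return rec;
}
```

Copying costs one allocation per record, which is the price of not having to
reason about who else still holds the pointer.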

I have a fix in preparation that, unfortunately, leaks the string storage.
But I have an STL-based remedy for that in mind.  And as noted above, we can't
be confident we can free memory allocated by
`read_rest_of_line_as_argument()`--when that argument is a file name--until
the formatter is about to exit anyway.
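
One possible shape for an STL-based remedy, offered only as a sketch under
assumptions and not as the fix actually in preparation: a table that interns
file names and hands out pointers that stay valid for as long as the table
lives.  The class name `file_name_table` and its member `intern()` are
hypothetical.  The sketch relies on `std::set` being node-based, so inserting
further names does not move or invalidate strings already stored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical file name table; not the fix actually prepared for groff.
class file_name_table {
  std::set<std::string> names;
public:
  // Return a pointer that stays valid for the lifetime of the table.
  // Interning the same name twice yields the same pointer, so a
  // document that sources itself repeatedly does not grow the table.
  const char *intern(const char *file_name)
  {
    assert(file_name != nullptr);
    return names.insert(file_name).first->c_str();
  }

  // Number of distinct file names seen so far.
  std::size_t size() const { return names.size(); }
};
```

A single such table that lives until the formatter exits would match the
observation above that file name storage cannot be released with confidence
any earlier, while keeping the storage owned rather than leaked.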






