Bug#1030344: filename on command line gets mangled

Zefram Fri, 03 Feb 2023 03:00:16 -0800

Package: nodejs
Version: 12.22.12~dfsg-1~deb11u3
Severity: normal

The node(1) command-line interface is documented to take the filename
of a script to execute.  Here's how well it understands filenames:


$ echo 'process.stdout.write("sadness\n");' > $'L\xef\xbf\xbdon.js'
$ echo 'process.stdout.write("joy\n");' > $'L\xe9on.js'
$ node $'L\xe9on.js'
sadness
$

In this example I've created two different script files, with different
but related names, I've told node(1) to run one of them, and it's run
the wrong one.  One can see via strace(1) that node(1) doesn't use the
correct filename for any file operations at all; it completely substitutes
the erroneous filename.  The use of the wrong filename doesn't depend on
the file of that wrong name existing: if there is no file of that name
then node(1) will fail to find any script to execute, and will generate
an error message that shows the erroneous filename.

This bug occurs whenever the supplied filename is not a well-formed
UTF-8 encoding of text containing only Unicode codepoints of which node
approves.  The nature of the manglement is that each octet that doesn't
look like valid UTF-8 gets replaced with the three octets $'\xef\xbf\xbd',
which is the UTF-8 encoding of U+FFFD "replacement character".  It appears
that the supplied filename is being UTF-8 decoded, with decoding errors
muffled and the replacement character silently substituted in; the
lossily decoded filename is then re-encoded as well-formed UTF-8, and
the result of that process is the filename that actually gets used.
As far as I can see the manglement comes only from UTF-8 decoding:
there isn't also any Unicode normalisation.
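The decode/re-encode round trip described above can be modelled in a few
lines of JavaScript using node's own Buffer.  This is a sketch of the
hypothesised behaviour, not node's actual command-line handling; the
function name mangle is made up for illustration.

```javascript
// Hypothetical model of the observed mangling; not node's own argv code.
function mangle(argOctets) {
  // Lossy decode: ill-formed UTF-8 sequences become U+FFFD.
  const text = argOctets.toString("utf8");
  // Re-encode the decoded string as well-formed UTF-8.
  return Buffer.from(text, "utf8");
}

// $'L\xe9on.js': a Latin-1 e-acute, which is not valid UTF-8 here.
const arg = Buffer.from([0x4c, 0xe9, 0x6f, 0x6e, 0x2e, 0x6a, 0x73]);
const used = mangle(arg);
console.log(used.toString("hex")); // 4cefbfbd6f6e2e6a73, i.e. $'L\xef\xbf\xbdon.js'

// A name that is already well-formed UTF-8 survives the round trip.
const wellFormed = Buffer.from("L\u00e9on.js", "utf8");
console.log(mangle(wellFormed).equals(wellFormed)); // true
```

The two distinct octet strings from the transcript both map to the same
mangled name, which is exactly why the wrong script gets run.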

This could cause a security problem in some circumstances that are only
slightly strange.  Suppose a privileged program is using node(1) to run
scripts that partly derive from untrusted user input.  Suppose the program
has created an innocuous script to run, has permitted an untrusted user
to determine part of the filename for that script, and has ensured that
the supplied filename is innocuous from a Unix point of view but isn't
preventing the use of filenames with high-half octets.  Suppose further
that an untrusted user can cause the same program to create another file,
of content that the program doesn't intend to execute, under the mangled
name, which is equally innocuous from a Unix point of view.  Then the
program could execute code determined by a malicious user, entirely
because node(1) misinterprets a filename.  I'm not aware of any specific program that
can be exploited in this way, and I haven't based the declared severity
of this bug report on this security issue.

Preferably, node(1) should use the script file whose name was supplied
on the command line.  It must pass to the file syscalls the same octet
string that was supplied as a command-line argument, without assuming
anything about its syntax.
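For comparison, node's fs API already accepts Buffer paths and hands them
to the underlying syscalls octet for octet.  The sketch below recreates
the two files from the transcript and opens the one actually named.  It
assumes a Linux filesystem that permits arbitrary non-NUL octets in names;
the scratch directory and variable names are illustrative only.  It does
not fix node(1) itself: process.argv exposes arguments only as JavaScript
strings, so a script cannot recover the original octets from it.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Scratch directory, so the demo leaves the current directory alone.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "octet-names-"));
process.chdir(dir);

// The two names from the report, built as raw octets rather than strings.
const supplied = Buffer.from([0x4c, 0xe9, 0x6f, 0x6e, 0x2e, 0x6a, 0x73]);
const mangled = Buffer.from([0x4c, 0xef, 0xbf, 0xbd, 0x6f, 0x6e, 0x2e, 0x6a, 0x73]);
fs.writeFileSync(supplied, 'process.stdout.write("joy\\n");\n');
fs.writeFileSync(mangled, 'process.stdout.write("sadness\\n");\n');

// Buffer paths reach the syscalls unchanged: two distinct directory
// entries exist, and reading `supplied` yields the "joy" script.
const entries = fs.readdirSync(".", { encoding: "buffer" });
const script = fs.readFileSync(supplied, "utf8");
console.log(entries.length, script.includes("joy"));

// Clean up (Buffer paths work for unlink too).
fs.unlinkSync(supplied);
fs.unlinkSync(mangled);
process.chdir(os.tmpdir());
fs.rmdirSync(dir);
```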

If it cannot be made to handle arbitrary filenames correctly, then
node(1) must at least detect that it can't handle the specified filename.
It must signal an error on any filename it can't handle, and not use
any mangled form of the filename for any purpose.  Furthermore, this
limitation must be documented.
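The detect-and-refuse fallback could look something like the following
sketch, which uses the WHATWG TextDecoder with fatal: true so that
ill-formed input throws instead of being replaced with U+FFFD.
checkScriptName is a hypothetical helper, and the sketch assumes the raw
argument octets are available as a Buffer.  ignoreBOM: true is set so that
a leading U+FEFF in a filename is kept rather than silently stripped.

```javascript
// Strict decoder: throws a TypeError on ill-formed UTF-8 instead of
// substituting U+FFFD. ignoreBOM keeps a leading U+FEFF in the name.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });

// Hypothetical helper: return the name as a string only if the octets
// decode losslessly; otherwise refuse outright.
function checkScriptName(octets) {
  try {
    return strictUtf8.decode(octets);
  } catch (err) {
    throw new Error(
      "cannot handle script filename with octets " + octets.toString("hex")
    );
  }
}

console.log(checkScriptName(Buffer.from("L\u00e9on.js", "utf8"))); // Léon.js
try {
  checkScriptName(Buffer.from([0x4c, 0xe9, 0x6f, 0x6e, 0x2e, 0x6a, 0x73]));
} catch (err) {
  // Refused: no mangled form of the name is ever used.
  console.error(err.message);
}
```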

-zefram
