Package: nodejs
Version: 12.22.12~dfsg-1~deb11u3
Severity: normal

The node(1) command-line interface is documented to take the filename of a script to execute. Here's how well it understands filenames:

$ echo 'process.stdout.write("sadness\n");' > $'L\xef\xbf\xbdon.js'
$ echo 'process.stdout.write("joy\n");' > $'L\xe9on.js'
$ node $'L\xe9on.js'
sadness
$

In this example I've created two different script files, with different but related names, I've told node(1) to run one of them, and it's run the wrong one. One can see via strace(1) that node(1) doesn't use the correct filename for any file operations at all; it completely substitutes the erroneous filename. The use of the wrong filename doesn't depend on the file of that wrong name existing: if there is no file of that name then node(1) will fail to find any script to execute, and will generate an error message that shows the erroneous filename.

This bug occurs whenever the supplied filename doesn't have the syntax of UTF-8 encoding of text, containing only Unicode codepoints of which node approves. The nature of the manglement is that each octet that doesn't look like valid UTF-8 gets replaced with the three octets $'\xef\xbf\xbd', which is the UTF-8 encoding of U+FFFD "replacement character". It appears that the supplied filename is being UTF-8 decoded, with decoding errors muffled and the replacement character silently substituted in, and then the lossily-decoded filename is re-encoded in correct UTF-8, and the result of that process is the filename that gets actually used. As far as I can see the manglement comes only from UTF-8 decoding: there isn't also any Unicode normalisation.

This could cause a security problem in some circumstances that are only slightly strange. Suppose a privileged program is using node(1) to run scripts that partly derive from untrusted user input. Suppose the program has created an innocuous script to run, has permitted an untrusted user to determine part of the filename for that script, and has ensured that the supplied filename is innocuous from a Unix point of view but isn't preventing the use of filenames with high-half octets.
Suppose further that an untrusted user can cause the same program to create another file, of content that the program doesn't intend to execute, under the mangled name, which is equally innocuous from a Unix point of view. Then the program could execute code determined by a malicious user, due entirely to node(1) misinterpreting a filename. I'm not aware of any specific program that can be exploited in this way, and I haven't based the declared severity of this bug report on this security issue.

Preferably, node(1) should use the script file of the name that was supplied on the command line. It must pass to the file syscalls the same octet string that was supplied as a command line argument, without assuming anything about its syntax. If it cannot be made to handle arbitrary filenames correctly, then node(1) must at least detect that it can't handle the specified filename. It must signal an error on any filename it can't handle, and not use any mangled form of the filename for any purpose. Furthermore, this limitation must be documented.

-zefram
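
For illustration, the lossy round trip described in this report can be modelled in a few lines of JavaScript. This is only a sketch of the observed behaviour, not node's actual argument-handling code; the function names mangle and survivesRoundTrip are hypothetical, and it assumes that Buffer's UTF-8 decoding substitutes U+FFFD in the same way as whatever node applies to its command-line arguments. The second function shows one way the "detect that it can't handle the filename" fallback might be checked.

```javascript
// Sketch only: models the mangling described in the report. It is not
// node's own argument-handling code, and the function names are hypothetical.

// Lossy round trip: decode the octets as UTF-8 (invalid sequences become
// U+FFFD), then re-encode the resulting string as UTF-8.
function mangle(octets) {
  return Buffer.from(octets.toString("utf8"), "utf8");
}

// A filename is safe to pass through a string only if the round trip
// returns exactly the octets that were supplied.
function survivesRoundTrip(octets) {
  return mangle(octets).equals(octets);
}

// The octets of $'L\xe9on.js' from the transcript.
const supplied = Buffer.from([0x4c, 0xe9, 0x6f, 0x6e, 0x2e, 0x6a, 0x73]);
const mangled = mangle(supplied);

console.log(mangled.toString("hex"));      // 4cefbfbd6f6e2e6a73
console.log(survivesRoundTrip(supplied));  // false
console.log(survivesRoundTrip(Buffer.from("L\u00e9on.js", "utf8"))); // true
```

The first line of output is the octet string of the "sadness" script's filename, which is the substitution seen in the transcript. A check along the lines of survivesRoundTrip would let a program refuse such a filename with an error instead of silently using the mangled one.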