On Sunday 10 November 2024 00:14:42 Pali Rohár wrote:
> On Saturday 09 November 2024 21:30:53 Lasse Collin wrote:
> > On 2024-11-09 LIU Hao wrote:
> > > Really, I don't think this should be fixed in the CRT. It should be
> > > fixed by sanitizing the result in `GetCommandLineA()`. Reverting the
> > > commit makes sense if Microsoft will fix it sooner or later.
> >
> > My comment approving a revert was in the context that a better
> > fix will then be done. For example, including a GUI dialog to display
> > the error in GUI apps.
> >
> > > I have a crazy idea now. Does it make sense to overwrite `_acmdln`
> > > (for MSVCRT) or `*__p__acmdln()` (for UCRT) with a sanitized string,
> > > so existing argument parsing may be reused?
>
> It looks like both _acmdln and _wcmdln are initialized in the CRT DLL
> entry point. These variables are used by all other calls;
> GetCommandLineA() and GetCommandLineW() are not used later.
>
> So from this quick look, it should be enough to change _acmdln in
> mingw-w64 startup code as early as possible, and then __getmainargs()
> should work fine (it also uses _acmdln and not GetCommandLineA(), at
> least in msvcrt.dll).

I also looked at the UCRT source code, and it seems that this should work.
UCRT's _configure_narrow_argv() also takes the command line string from
_acmdln, and _acmdln is initialized in the UCRT DLL entry point via
GetCommandLineA(). So in my opinion, overwriting the content of _acmdln in
the EXE entry point should be enough, and __getmainargs() would then work
correctly.

Note that both _acmdln and _wcmdln are always initialized, for both ANSI
and UNICODE builds.

So as a simple sanitization, would something like this be enough? Iterate
over the (multibyte) string _acmdln, and for every double quote found,
check that a double quote also exists at _wcmdln[iter] (where iter is the
iteration count over the multibyte _acmdln string). If there is a double
quote at *_acmdln_iter but not at _wcmdln[iter], change *_acmdln_iter to
some other character. This could ensure that any code which parses _acmdln
sees only the original double quotes and not best-fit conversions to
double quotes.

> > If wildcard expansion is enabled, things can still go wrong if a
> > wildcard matches a filename that cannot be converted losslessly to the
> > active code page.
>
> Maybe a stupid question, but what happens when you try to list a folder
> containing files whose names are all the same in the active code page?
> Imagine that you have an application which does not use argv[] at all;
> it lists files in the current directory and, for every file, prints for
> example the first byte. What would happen in this case? Is this not the
> same problem as with wildcard expansion?
>
> > If wildcard expansion is disabled, then it should be enough to verify
> > that GetCommandLineW() can be losslessly converted to the active code
> > page. Then the existing narrow parser should be safe.
> >
> > --
> > Lasse Collin
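
To make the sanitization idea above a bit more concrete, here is a rough, untested-on-Windows sketch of the quote comparison between the narrow and wide command lines. The function name sanitize_quotes, the lead_byte_fn callback and the replacement parameter are made up for illustration and are not part of mingw-w64 or any CRT. The sketch assumes that every wide code unit corresponds to exactly one narrow character (one or two bytes); surrogate pairs and other conversions that change the character count would need extra handling.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical callback: returns nonzero if byte b starts a two-byte
 * character in the active code page. On Windows, IsDBCSLeadByte() could
 * play this role. Pass NULL for a single-byte code page. */
typedef int (*lead_byte_fn)(unsigned char b);

/* Walk the narrow and wide command lines in parallel, one character at a
 * time. Every '"' in the narrow string that has no '"' at the same
 * character position in the wide string is treated as a best-fit
 * artifact and overwritten with `replacement`.
 * Returns the number of quotes that were replaced.
 * Assumption: one wide code unit maps to one narrow character. */
size_t sanitize_quotes(char *narrow, const wchar_t *wide,
                       lead_byte_fn is_lead, char replacement)
{
    size_t replaced = 0;
    size_t wi = 0;      /* character index into the wide string */
    char *p = narrow;   /* byte cursor into the narrow string */

    assert(narrow != NULL && wide != NULL);

    while (*p != '\0' && wide[wi] != L'\0') {
        if (is_lead != NULL && is_lead((unsigned char)*p) && p[1] != '\0') {
            /* Double-byte character: cannot be an ASCII quote, skip it. */
            p += 2;
        } else {
            if (*p == '"' && wide[wi] != L'"') {
                *p = replacement;
                replaced++;
            }
            p += 1;
        }
        wi++;
    }
    return replaced;
}
```

In the startup code this would presumably be called as early as possible with _acmdln and _wcmdln, before __getmainargs() runs, so that the existing narrow parser only sees quotes that were real quotes in the original command line. Which replacement character is safest is an open question; '?' is just one candidate.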