On Thu, Nov 14, 2024 at 12:26:45AM +0100, Patrice Dumas wrote:
> On Wed, Nov 13, 2024 at 10:08:10PM +0000, Gavin Smith wrote:
> > I feel like this should be quite easy to achieve for the input
> > files, at least, as we already do something similar for an input
> > file with a Latin-1 name, which we place in a directory in the
> > build directory ("built_input").  This is the rule to build
> > "input_file_names_recoded_stamp.txt" in tp/tests/Makefile.am.
> >
> > I've started work for something similar for files such as
> > tp/tests/encoded/osé_utf8.texi - this will take me some time to finish
> > properly.
>
> You could probably reuse tp/maintain/copy_change_file_name_encoding.pl,
> though the substitution is hardcoded in that file, maybe substitutions
> within a hash could be selected with a command-line option.  As run in
> Makefile.am, a side effect of copy_change_file_name_encoding.pl is to
> set input_file_names_recoded_stamp.txt which is later on used to
> determine if tests with 'Need recoded file names' in their specification
> in list-of-tests are run.  Doing the same for encoding of file names to
> UTF-8 could be right if it fails when it should on MS-Windows.

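To make the "substitutions within a hash selected with a command-line
option" idea concrete, here is a rough sketch.  It is written in Python
rather than Perl and is not based on the contents of
copy_change_file_name_encoding.pl: the option names, the table keys and
the file names are all made up for illustration, and the actual file
copy is left out.

```python
import argparse

# Hypothetical table: key -> (name as kept in the repository, encoding the
# copied file's name should use on disk).  None of these names come from
# the real script.
SUBSTITUTIONS = {
    "latin1": ("exämple.texi", "iso-8859-1"),
    "utf8": ("exämple_utf8.texi", "utf-8"),
}


def parse_args(argv):
    """Parse the (hypothetical) options selecting a substitution and a stamp file."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--substitution", choices=sorted(SUBSTITUTIONS),
                        required=True)
    parser.add_argument("--stamp", required=True)
    return parser.parse_args(argv)


def recoded_name(key):
    """Return the target file name as bytes in the selected encoding."""
    name, encoding = SUBSTITUTIONS[key]
    # Raises UnicodeEncodeError if the name cannot be represented.
    return name.encode(encoding)


def write_stamp(path, key):
    """Record which substitution succeeded, so later steps can check for it."""
    with open(path, "w", encoding="ascii") as stamp:
        stamp.write(key + "\n")


def main(argv):
    args = parse_args(argv)
    target = recoded_name(args.substitution)
    # The copy of the input file to `target` would go here.  The stamp is
    # written only after that step, so a failure leaves no stamp behind.
    write_stamp(args.stamp, args.substitution)
    return target
```

The point of writing the stamp last is the one made above: if creating
the recoded name fails on a given platform, no stamp is written and the
tests that need recoded file names are skipped.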
Here are the non-ASCII file names left in the tar file, now that the
results files are all escaped:

texinfo-7.1.91/tp/t/results/formats_encodings/accented_character_in_file_name/res_info/osé_utf8.info
texinfo-7.1.91/tp/tests/encoded/çss.css
texinfo-7.1.91/tp/tests/encoded/osé_utf8_no_setfilename.texi
texinfo-7.1.91/tp/tests/encoded/an_ïmage.png
texinfo-7.1.91/tp/tests/encoded/osé_utf8.texi
texinfo-7.1.91/tp/tests/encoded/txt_çimage.txt
texinfo-7.1.91/tp/tests/encoded/cêss.css
texinfo-7.1.91/tp/tests/included_akçentêd.texi
texinfo-7.1.91/tp/tests/tex_html/tex_encodé_utf8.texi
texinfo-7.1.91/tp/tests/many_input_files/input_files/dir_înclùde/
texinfo-7.1.91/tp/tests/many_input_files/input_files/dir_înclùde/file_image.png
texinfo-7.1.91/tp/tests/many_input_files/input_files/dir_înclùde/included_file.texi

Would it be okay to remove the t/formats_encodings.t
accented_character_in_file_name test, as it may be redundant with
the tp/tests tests?  The comment above it states:

# This tests is also (and maybe more) a test for the test code.
# In particular it shows that the file names in error messages
# are doubly encoded to utf-8. It does not prevent tests to succeed as
# both the reference and the checked result are doubly encoded.
# A similar test is also used in tests/encoded, but here we have the
# tree in addition.

Otherwise, extra complexity is needed for just this one test, so it
would not in fact serve as a good "test for the test code".
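
For reference, a listing of non-ASCII names like the one above can be
produced with a few lines of Python.  This is only a sketch using the
standard tarfile module; the helper names and the member names in the
demo archive are invented, and the demo archive is built in memory so
that nothing is written to disk.

```python
import io
import tarfile


def non_ascii_members(tar):
    """Return the names of members of an open TarFile that are not pure ASCII."""
    return [m.name for m in tar.getmembers() if not m.name.isascii()]


def demo_archive():
    """Build a small in-memory archive with made-up member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", encoding="utf-8") as tar:
        for name in ("pkg/tests/plain.texi", "pkg/tests/encoded/osé_utf8.texi"):
            # Empty members are enough here; only the names matter.
            tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    return buf
```

Opening demo_archive() with tarfile.open(fileobj=..., mode="r") and
passing the result to non_ascii_members() returns only the second name.
The same function can be pointed at a distribution tarball opened with
tarfile.open(), provided the names in it are stored as UTF-8.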