On Thu, Nov 14, 2024 at 11:44:36AM +0200, Eli Zaretskii wrote:
> > Date: Thu, 14 Nov 2024 10:07:11 +0100
> > From: Patrice Dumas <pertu...@free.fr>
> >
> > On Thu, Nov 14, 2024 at 07:42:02AM +0000, Gavin Smith wrote:
> > > On Thu, Nov 14, 2024 at 08:47:25AM +0200, Eli Zaretskii wrote:
> > > > I'm not sure I follow: do you intend to use "int\xc3\xa9rnal.txt" as
> > > > an actual file name on disk? In that case, please note that a
> > > > backslash cannot be part of a file name on Windows: it's a directory
> > > > separator. If you want an escape character, it should be something
> > > > else, like # for example.
> > >
> > > I had meant that - but I had forgotten that a backslash shouldn't be
> > > used in a file name on Windows.
> >
> > # is a comment in shell, maybe a % would be better?
>
> Yes, % is another possibility.

I've written a Perl program to rename a list of files provided on
standard input, using maintain/copy_change_file_name_encoding.pl as a
starting point (this was not as simple for me as I thought it might be,
as both directories and ordinary files may need to be renamed).

In tests/run_parser_all.sh, the output files are listed using the "find"
command, and that list is piped to the Perl program.

With this change, the tests can be updated with "make -k check" followed
by "make copy-tests", followed by committing the changes to the test
results ("for f in */res_parser; do git add $f ; done").

This leads to 27 files being renamed, so the original non-ASCII file
names would no longer be in the tar distribution or tracked in git.
(This is not counting the "tex-html" tests, some of which would also be
affected.)

It does not deal with the issue of skipping such tests, though.

diff --git a/tp/tests/escape_file_names.pl b/tp/tests/escape_file_names.pl
new file mode 100755
index 0000000000..5d66871e2e
--- /dev/null
+++ b/tp/tests/escape_file_names.pl
@@ -0,0 +1,80 @@
+#! /usr/bin/env perl
+
+# escape_file_names.pl: read list of file names from stdin and rename
+# any with non-ASCII characters
+#
+# Copyright 2024 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License,
+# or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+use strict;
+use utf8;
+
+use File::Copy;
+use File::Basename;
+use File::Spec;
+use File::Path;
+
+my @files;
+
+# Read all of input first
+while (<>) {
+  chomp;
+  push @files, $_;
+}
+
+# Sort files in forward order. This should mean we create directories
+# before any files they contain.
+@files = sort @files;
+
+my @moved_files;
+
+for my $file (@files) {
+  if ($file =~ /[^[:ascii:]]/) {
+    unshift @moved_files, $file;
+
+    my $ascii_name = '';
+    for my $char (split('', $file)) {
+      if (ord($char) < 0x80) {
+        $ascii_name .= $char;
+      } else {
+        $ascii_name .= sprintf("%%%x", ord($char));
+      }
+    }
+
+    my $dest_path = $ascii_name;
+
+    if (-d $file) {
+      mkdir $dest_path;
+    } else {
+      my $copy_succeeded = copy($file, $dest_path);
+      if (not $copy_succeeded) {
+        warn "could not move $file: $!\n";
+        exit(1);
+      }
+    }
+  }
+}
+
+# After copying the files, remove the files from the original locations
+# in reverse order.
+for my $delete (@moved_files) {
+  if (-d $delete) {
+    File::Path::rmtree($delete);
+  } else {
+    unlink $delete;
+  }
+}
+
+exit(0);
diff --git a/tp/tests/run_parser_all.sh b/tp/tests/run_parser_all.sh
index a25562002b..f5c4f0be7a 100755
--- a/tp/tests/run_parser_all.sh
+++ b/tp/tests/run_parser_all.sh
@@ -178,6 +178,12 @@ post_process_output ()
   fi
 }
 
+# ensure only ASCII filenames are used in output
+escape_file_names ()
+{
+  find "${outdir}${dir}" | ${srcdir}/escape_file_names.pl
+}
+
 LC_ALL=C; export LC_ALL
 LANGUAGE=en; export LANGUAGE
 
@@ -443,6 +449,7 @@ while read line; do
     rm -rf "${raw_outdir}$dir"
 
     post_process_output
+    escape_file_names
 
     if test "z$res_dir_used" != 'z' ; then
       diff $DIFF_OPTIONS -r "$res_dir_used" "${outdir}$dir" 2>>$logfile > "$testdir/$diffs_dir/$diff_base.diff"
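
For illustration only (this is not part of the patch): the renaming rule
in escape_file_names.pl appears to leave ASCII bytes alone and replace
every other byte with % followed by its lowercase hexadecimal value,
since the script does not decode its input. Below is a minimal shell
sketch of that rule, assuming names arrive as raw bytes and contain no
newlines. The function name escape_name is hypothetical.

```shell
# Hypothetical helper, not part of the patch: print the escaped form of
# the file name given as $1.
escape_name ()
{
  out=
  # od prints each byte of the name as two lowercase hex digits
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    if [ $((0x$hex)) -lt 128 ]; then
      # ASCII byte: turn the hex value back into the character itself
      out="$out$(printf '%b' "\\0$(printf '%o' "0x$hex")")"
    else
      # non-ASCII byte: % followed by the hex value
      out="$out%$hex"
    fi
  done
  printf '%s\n' "$out"
}

# The UTF-8 encoding of e-acute is the two bytes 0xc3 0xa9
escape_name "$(printf 'int\303\251rnal.txt')"   # prints int%c3%a9rnal.txt
```

A name made up only of ASCII characters comes back unchanged, which
matches the Perl script skipping such files entirely.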