Marko Rauhamaa <[email protected]> writes:

> David Kastrup <[email protected]>:
>
>> Marko Rauhamaa <[email protected]> writes:
>>> [email protected] (Ludovic Courtès):
>>>> Guile assumes its command-line arguments are UTF-8-encoded and
>>>> decodes them accordingly.
>>>
>>> I'm afraid that choice (which Python made, as well) was a bad one
>>> because Linux doesn't guarantee UTF-8 purity.
>>
>> Have you looked at the error messages? They are all perfect UTF-8. As
>> was the command line locale.
>
> I was responding to Ludovic.
>
>> Apparently, Guile can open the file just fine, and it sees the command
>> line just fine as encoded in utf-8.
>
> My problem is when it is not valid UTF-8.
>
>> So I really, really, really suggest that before people post their
>> theories that they actually bother cross-checking them with Guile.
>
> Well, execute these commands from bash:
>
>    $ touch $'\xee'
>    $ touch xyz
>    $ ls -a
>    . .. ''$'\356' xyz
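
For readers following the quoted commands: the name created by
touch $'\xee' is the single byte 0xEE (octal 356, as ls shows it). In
UTF-8 that byte is the lead byte of a three-byte sequence, so on its own
it is not valid UTF-8. A minimal sketch of that check follows. It is in
Python rather than Guile, the helper name is_valid_utf8 is made up for
illustration, and it says nothing about how Guile itself decodes file
names.

```python
# Assumption: this only checks byte patterns with Python's strict UTF-8
# decoder; it does not model Guile's own decoding of file names.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# The single byte from the quoted `touch $'\xee'` command (octal 356).
raw_name = b"\xee"

# A properly encoded non-ASCII name for comparison: U+00E9 is the bytes C3 A9.
good_name = "\u00e9".encode("utf-8")

print(is_valid_utf8(raw_name))   # False
print(is_valid_utf8(good_name))  # True
```

The second case, a correctly encoded non-ASCII name, is the one this
thread is actually about.
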
We are not talking about file names not encoded in UTF-8. It is well
known that Guile is unable to work with strings in UTF-8 encoding when
their byte pattern is not valid UTF-8.

This is a red herring. The problem is not that Guile is unable to deal
with badly encoded UTF-8 file names. The problem is that Guile is
unable to deal with properly encoded UTF-8 file names when it is
supposed to execute them from the command line.

> Then, execute this guile program:
>
> ========================================================================
> (let ((dir (opendir ".")))
>   (let loop ()
>     (let ((filename (readdir dir)))
>       (if (not (eof-object? filename))
>           (begin
>             (if (access? filename R_OK)
>                 (format #t "~s\n" filename))
>             (loop))))))
> ========================================================================
>
> It outputs:
>
>    ".."
>    "."
>    "xyz"
>
> skipping a file. This is a security risk. Files like these appear easily
> when extracting zip files, for example.

I am surprised this does not just throw a bad encoding exception. But
at any rate, this cannot easily be fixed since Guile uses libraries for
encoding/decoding that cannot deal reproducibly with improper byte
patterns.

The problem here is that Guile cannot even deal with _properly_ encoded
UTF-8 file names on the command line.

-- 
David Kastrup
