On 30 March 2011 18:07, Michael Snoyman <[email protected]> wrote:
> On Wed, Mar 30, 2011 at 9:26 AM, Jason Dagit <[email protected]> wrote:
>>
>> On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman <[email protected]>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I think this is a well-known issue: it seems that there is no
>>> character decoding performed on the values returned from the functions
>>> in System.Directory (getDirectoryContents specifically). I could
>>> manually do something like (utf8Decode . S8.pack), but that presumes
>>> that the character encoding on the system in question is UTF8. So two
>>> questions:
>>>
>>> * Is there a package out there that handles all the gory details for
>>> me automatically, and simply returns a properly decoded String (or
>>> Text)?
>>> * If not, is there a standard way to determine the character encoding
>>> used by the filesystem, short of hard-coding in character encodings
>>> used by the major ones?
>>
>> I started to write a thoughtful reply, but I found that the answers here sum
>> up everything I was going to say:
>> http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux
>> This same issue comes up from time to time for darcs and, if I recall
>> correctly, the solution has been to treat unix file paths as arbitrary bytes
>> whenever possible and to escape non-ascii compatible bytes when they occur.
>> Otherwise it can be hard to encode them in textual patch descriptions or
>> xml (where an encoding is required and I believe utf8 is a standard
>> default).
>> I wish you luck. It's not an easy problem, at least on unix. I've heard
>> that windows has a much easier time here as MS has provided a standard for
>> it.
>> Jason
>
> Thanks to you (and everyone else) for the informative responses. For
> now, I've simply hard-coded in UTF-8 encoding for all non-Windows
> systems. I'm not sure how this will play with OSes besides Windows and
> Linux (especially Mac), but it's a good stop-gap measure.
>
> I *do* think it would be incredibly useful to provide alternatives to
> all the standard operations on FilePath which use opaque datatypes
> and properly handle filename encoding. I noticed John Millikin's
> system-filepath package[1]. Do people have experience with it? It
> seems that adding a few functions like getDirectoryContents, plus
> adding a version of toString which performs some character decoding,
> would get us pretty far.
>
> Michael
>
> [1] http://hackage.haskell.org/package/system-filepath
>
> _______________________________________________
> Haskell-Cafe mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
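
For reference, the stop-gap described above (treat each Char returned by getDirectoryContents as one raw byte and decode the bytes as UTF-8 on non-Windows systems) could be sketched roughly like this. It is only an illustration under a few assumptions: it needs the bytestring and text packages, it uses decodeUtf8With lenientDecode from text in place of the unspecified utf8Decode, and the names decodeRawUtf8 and decodeForPlatform are made up for this example. It also assumes that on Windows the Strings arrive already decoded, which should be checked before relying on it.

```haskell
import qualified Data.ByteString.Char8 as S8
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.Info (os)

-- | Hypothetical helper: treat every Char as one raw byte (S8.pack
-- truncates each Char to 8 bits) and decode those bytes as UTF-8.
-- Malformed sequences become U+FFFD instead of throwing an exception.
decodeRawUtf8 :: String -> T.Text
decodeRawUtf8 = decodeUtf8With lenientDecode . S8.pack

-- | Hypothetical helper: hard-code UTF-8 everywhere except Windows.
-- ASSUMPTION: on Windows the String is already properly decoded.
decodeForPlatform :: String -> T.Text
decodeForPlatform name
  | os == "mingw32" = T.pack name
  | otherwise       = decodeRawUtf8 name

main :: IO ()
main = print (decodeRawUtf8 "caf\195\169")  -- bytes C3 A9 encode U+00E9
```

Note that this is lossy: a name that is not valid UTF-8 cannot be turned back into its original bytes after decoding, so anything that has to round-trip is still better kept as raw bytes, as Jason describes for darcs.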
It would also be great to have a package which combines the proper
encoding/decoding of filepaths of the system-filepath package with the
type-safety of the pathtype package:

http://hackage.haskell.org/package/pathtype

Bas
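
To make the idea a bit more concrete, here is a minimal sketch of what such a combination might look like: the path is stored as raw bytes (so nothing is lost on unix), phantom type tags record absolute/relative and file/directory, and decoding only happens when a path is displayed. Every name here (Path, Abs, Rel, File, Dir, absDir, relFile, </>, toText) is hypothetical and is not the API of system-filepath or pathtype. The constructors do no validation, and UTF-8 is again simply assumed for display.

```haskell
import qualified Data.ByteString.Char8 as S8
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Phantom tags: they have no values and only label a Path at the type level.
data Abs
data Rel
data File
data Dir

-- | Hypothetical type: raw path bytes tagged with base (Abs/Rel)
-- and node (File/Dir).
newtype Path base node = Path S8.ByteString
  deriving (Eq, Show)

-- | Tag bytes as an absolute directory (no validation in this sketch).
absDir :: S8.ByteString -> Path Abs Dir
absDir = Path

-- | Tag bytes as a relative file (no validation in this sketch).
relFile :: S8.ByteString -> Path Rel File
relFile = Path

-- | Append a relative path to a directory. Appending to a file, or
-- appending an absolute path, is rejected by the type checker.
(</>) :: Path base Dir -> Path Rel node -> Path base node
Path dir </> Path rest = Path (S8.concat [dir, S8.pack "/", rest])

-- | Decode for display only; the bytes inside Path stay untouched.
-- ASSUMPTION: UTF-8, with malformed input replaced by U+FFFD.
toText :: Path base node -> T.Text
toText (Path bytes) = decodeUtf8With lenientDecode bytes

main :: IO ()
main = print (toText (absDir (S8.pack "/tmp") </> relFile (S8.pack "caf\195\169")))
```

With these types, something like relFile a </> relFile b does not compile, because the left operand of </> has to be a directory.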
