On Wed, Mar 30, 2011 at 9:26 AM, Jason Dagit <[email protected]> wrote:
>
> On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman <[email protected]> wrote:
>>
>> Hi all,
>>
>> I think this is a well-known issue: it seems that there is no
>> character decoding performed on the values returned from the functions
>> in System.Directory (getDirectoryContents specifically). I could
>> manually do something like (utf8Decode . S8.pack), but that presumes
>> that the character encoding on the system in question is UTF8. So two
>> questions:
>>
>> * Is there a package out there that handles all the gory details for
>> me automatically, and simply returns a properly decoded String (or
>> Text)?
>> * If not, is there a standard way to determine the character encoding
>> used by the filesystem, short of hard-coding in character encodings
>> used by the major ones?
>
> I started to write a thoughtful reply, but I found that the answers here sum
> up everything I was going to say:
> http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux
>
> This same issue comes up from time to time for darcs and, if I recall
> correctly, the solution has been to treat unix file paths as arbitrary bytes
> whenever possible and to escape non-ascii compatible bytes when they occur.
> Otherwise it can be hard to encode them in textual patch descriptions or
> xml (where an encoding is required and I believe utf8 is a standard
> default).
>
> I wish you luck. It's not an easy problem, at least on unix. I've heard
> that windows has a much easier time here as MS has provided a standard for
> it.
>
> Jason

Thanks to you (and everyone else) for the informative responses. For
now, I've simply hard-coded UTF-8 decoding for all non-Windows
systems. I'm not sure how this will play with OSes besides Windows and
Linux (especially Mac), but it's a good stop-gap measure.

I *do* think it would be incredibly useful to provide alternatives to
all the standard operations on FilePath that use opaque datatypes and
properly handle filename encoding. I noticed John Millikin's
system-filepath package[1]. Do people have experience with it? It
seems that adding a few functions like getDirectoryContents, plus
a version of toString which performs some character decoding, would
get us pretty far.

Michael

[1] http://hackage.haskell.org/package/system-filepath

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
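
For reference, the UTF-8 stop-gap described above might look roughly like
the sketch below. It is only an illustration, not the code actually in use.
The names decodeRawUtf8 and decodeFileName are hypothetical, and the sketch
assumes the behaviour discussed in this thread, where each Char returned by
getDirectoryContents holds one undecoded byte. On a GHC that already
decodes file names itself, this would mangle non-ASCII names. It also
assumes the bytestring and text packages are available.

```haskell
module Main where

import qualified Data.ByteString.Char8 as S8
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.Directory (getDirectoryContents)
import System.Info (os)

-- | Hypothetical helper: treat each Char as one raw byte and decode the
-- resulting byte string as UTF-8. Invalid sequences are replaced with
-- U+FFFD rather than raising an exception.
decodeRawUtf8 :: String -> T.Text
decodeRawUtf8 = decodeUtf8With lenientDecode . S8.pack

-- | Hypothetical helper: hard-code UTF-8 everywhere except Windows,
-- where the String is assumed (not verified here) to be decoded already.
decodeFileName :: FilePath -> T.Text
decodeFileName
  | os == "mingw32" = T.pack
  | otherwise       = decodeRawUtf8

main :: IO ()
main = do
  names <- getDirectoryContents "."
  mapM_ (TIO.putStrLn . decodeFileName) names
```

The lenient decoder is a deliberate choice for a stop-gap: a file name that
is not valid UTF-8 still produces a Text value, at the cost of losing the
original bytes, so the result cannot be used to reopen the file.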
