On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman <[email protected]> wrote:
> Hi all,
>
> I think this is a well-known issue: it seems that there is no
> character decoding performed on the values returned from the functions
> in System.Directory (getDirectoryContents specifically). I could
> manually do something like (utf8Decode . S8.pack), but that presumes
> that the character encoding on the system in question is UTF8. So two
> questions:
>
> * Is there a package out there that handles all the gory details for
> me automatically, and simply returns a properly decoded String (or
> Text)?
> * If not, is there a standard way to determine the character encoding
> used by the filesystem, short of hard-coding in character encodings
> used by the major ones?
>

I started to write a thoughtful reply, but I found that the answers here sum
up everything I was going to say:
http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux

This same issue comes up from time to time for darcs and, if I recall
correctly, the solution has been to treat unix file paths as arbitrary bytes
whenever possible and to escape non-ASCII bytes when they occur. Otherwise
it can be hard to encode them in textual patch descriptions or XML (where an
encoding is required, and I believe UTF-8 is the standard default).

I wish you luck. It's not an easy problem, at least on unix. I've heard that
Windows has a much easier time here, as MS has provided a standard for it.

Jason
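
For illustration, the "treat paths as bytes and escape the non-ASCII ones" idea above could be sketched roughly as follows. This is not darcs's actual escaping scheme: the \xHH format and the names escapePath and unescapePath are assumptions made up for this example. It only needs the bytestring package.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, digitToInt, intToDigit, isHexDigit, ord)
import Data.Word (Word8)

-- Hypothetical helper: render a raw path as pure-ASCII text.
-- Printable ASCII bytes (except backslash) pass through unchanged;
-- every other byte becomes \xHH, so no encoding has to be guessed.
escapePath :: B.ByteString -> String
escapePath = concatMap escapeByte . B.unpack
  where
    escapeByte :: Word8 -> String
    escapeByte w
      | w >= 0x20 && w < 0x7F && w /= 0x5C = [chr (fromIntegral w)]
      | otherwise = ['\\', 'x', hex (w `div` 16), hex (w `mod` 16)]

    hex :: Word8 -> Char
    hex = intToDigit . fromIntegral

-- Hypothetical inverse: recover the original bytes, or Nothing if the
-- text contains a malformed escape or a character outside printable ASCII.
unescapePath :: String -> Maybe B.ByteString
unescapePath = fmap B.pack . go
  where
    go :: String -> Maybe [Word8]
    go [] = Just []
    go ('\\' : 'x' : h : l : rest)
      | isHexDigit h && isHexDigit l =
          (fromIntegral (digitToInt h * 16 + digitToInt l) :) <$> go rest
    go ('\\' : _) = Nothing  -- a backslash must start a valid \xHH escape
    go (c : rest)
      | ord c >= 0x20 && ord c < 0x7F = (fromIntegral (ord c) :) <$> go rest
      | otherwise = Nothing
```

Because the escaped form is plain ASCII, it can be embedded in a textual patch description or in XML whatever encoding the file name was originally written in, and unescapePath gives back exactly the bytes the filesystem reported. Getting those raw bytes in the first place needs a byte-oriented API rather than System.Directory; I believe the unix package offers one under System.Posix.ByteString, but check its documentation before relying on that.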
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
