On Dec 31, 2007 1:41 PM, ChadDavis <[EMAIL PROTECTED]> wrote:
> When I run 'ls' on a given directory, some of the file names show a question
> mark in the place of a non-supported character. In trying to understand
> what is happening, I find that I don't understand a couple of fundamentals.
>
> 1) what is the default encoding of my debian system?
On new Etch installs, UTF-8 is the default. On older systems, it depends on your locale (I'm not sure whether a system upgraded to Etch would be UTF-8 or not). In the US it would be ISO-8859-1 or ISO-8859-15, I think. Run the command "locale" and see what it says. Mine says en_US.UTF-8.

> 2) It seems that a file itself doesn't have any encoding as it is sitting on
> the hard drive -- it's just bytes, right? When a given application picks it
> up, that application will try to read it as a certain encoding -- how is
> that determination made?

All files have an encoding. Text files do, of course, but so do binary files like .jpg or .mp3. Even binary executables and libraries have a specific format (binary executables are in ELF format on non-ancient Linux systems). When a text file is opened, I believe most simple apps try to interpret it based on your system's locale. Some smarter programs apply fairly complicated heuristics to guess the encoding. Some plain-text-based file types, such as XML, declare the encoding near the beginning of the file.

> 3) What is the encoding of the file name? Is this a feature of the
> filesystem?

This is also based on your locale. Native Linux filesystems such as ext3 store a filename as a plain sequence of bytes without recording its encoding, so each program decodes those bytes according to your locale. Note that if you download a text file that is in, say, Shift-JIS (a common Japanese encoding), the file and perhaps the filename will still be in Shift-JIS. Even if your system is UTF-8 and has Japanese fonts installed, it will not display the file correctly if it simply interprets it based on your locale. There are programs that can convert between encodings, including the "convmv" package (which converts only filenames), the "utf8-migration-tool" package, and the "recode" package.

> I realize these questions may not be that "smart"; please tell me what I'm
> missing if so. Also, point me to documentation if you know of some that
> explains all of this. I couldn't find anything on the topic searching the
> web or debian docs.
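To make the filename part concrete, here is a small Python 3 sketch (my own illustration, not how ls itself is written; the function name show_name and the sample filename are made up). It takes the bytes a Shift-JIS system would store for a Japanese filename and decodes them two ways: as UTF-8, the way a program trusting a UTF-8 locale would, and as Shift-JIS. Byte sequences that are invalid in the chosen encoding come out as the U+FFFD replacement character, which is roughly what ls is showing you as '?'.

```python
import locale


def show_name(raw: bytes, encoding: str) -> str:
    """Decode raw filename bytes the way a program trusting `encoding` would,
    substituting U+FFFD for any byte sequence invalid in that encoding."""
    return raw.decode(encoding, errors="replace")


# Hypothetical filename: "nihongo.txt" written in Japanese.
name = "日本語.txt"

# The bytes a Shift-JIS system would actually put in the directory entry.
sjis_bytes = name.encode("shift_jis")

# The encoding your current locale asks programs to use (e.g. UTF-8).
print(locale.getpreferredencoding(False))

# The raw bytes themselves carry no encoding label.
print(sjis_bytes)

# Read with the wrong encoding: mostly replacement characters.
print(show_name(sjis_bytes, "utf-8"))

# Read with the right encoding: the original name comes back.
print(show_name(sjis_bytes, "shift_jis"))
```

Nothing about the stored bytes changes between the last two calls; only the encoding used to read them does.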
For general info, start with these Wikipedia pages and some of the other pages they link to:
http://en.wikipedia.org/wiki/Locale
http://en.wikipedia.org/wiki/Character_encoding

If you want more in-depth, programmer-oriented info on Unicode, check out Joel's article:
http://www.joelonsoftware.com/articles/Unicode.html

There is more Debian-specific info about charsets, locales, etc. in the Debian Reference section on L10n (localization) [take out 10 letters]:
http://www.debian.org/doc/manuals/debian-reference/ch-tune.en.html#s-l10n
and in the Debian i18n (internationalization) [take out 18 letters] Guide:
http://www.debian.org/doc/manuals/intro-i18n/index.en.html

Cheers,
Kelly Clowers