On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
>>
>> How text is represented is very different from whether text is a
>> fundamental data type. A fundamental text file is such that ordinary
>> operating system facilities can't see inside the black box (that is,
>> they are *not* encoded as far as the applications go).
>
> Of course they are. It may be an ASCII-encoding of some flavor or other, or
> something really (to me) strange -- but an encoding is most assuredly in
> affect.

Allow me to explain what I think Marko's getting at here.

In most file systems, a file exists on the disk as a set of sectors of
data, plus some metadata including the file's actual size. When you ask
the OS to read you that file, it goes to the disk, reads those sectors,
truncates the data to the real size, and gives you those bytes. It's
possible to mount a file as a directory, in which case the physical
representation of the files inside it is very different, but each file
still appears the same. In that case, the OS reads some part of the
containing file, maybe decompresses it, and gives it to you. Same
difference. These files still contain bytes.

A "fundamental text file" would be one where, instead of reading and
writing bytes, you read and write Unicode text. Since the hard disk
still works with sectors and bytes, the text would still be stored as
such, but that's an implementation detail; you could format your disk
UTF-8 or UTF-16 or FSR or anything you like, and the only difference
you'd see is performance.

This could certainly be done, in theory. I don't know how well it'd fit
with any of the popular OSes of today, but it could be done. And these
files would not have an encoding; their on-platter representations
would, but that's purely an implementation detail: the text that you
wrote out and the text that you read in are the same text, and no
encoding has ever been visible to you.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
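
For a rough feel of the idea, here is a minimal Python sketch. It is
only an approximation: Python's text-mode files do the encoding in the
application's I/O layer, not in the OS as imagined above, and the
round_trip helper and the sample string are made up for illustration.
What it shows is that the bytes on disk depend on the encoding chosen,
while the text written and the text read back are the same.

```python
import os
import tempfile


def round_trip(text, encoding):
    """Write text using the given encoding (hypothetical helper).

    Returns a tuple of (raw bytes found on disk, text read back).
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text in: the caller hands over str, never bytes.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        # Peek at the "on-platter" representation.
        with open(path, "rb") as f:
            raw = f.read()
        # Text out: the caller gets str back.
        with open(path, "r", encoding=encoding, newline="") as f:
            back = f.read()
    finally:
        os.remove(path)
    return raw, back


sample = "na\u00efve caf\u00e9 \u2603"  # arbitrary sample text with non-ASCII characters

raw8, back8 = round_trip(sample, "utf-8")
raw16, back16 = round_trip(sample, "utf-16-le")

print(raw8 == raw16)               # do the stored bytes match?
print(back8 == sample == back16)   # does the text match?
```

If every program on the system only ever went through something like
the text-mode half of this, the choice between UTF-8 and UTF-16 would
be invisible apart from file sizes and speed.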