Sunday, November 7, 2004, 11:25:52 AM, Jens Rieks wrote:
> On Sunday 07 November 2004 09:48, Leopold Toetsch wrote:
>> * where exactly is the mismatch coming from?
> Unix uses "\n" to indicate end-of-line, windows uses "\r\n". The problem is,
> that the "perlhist.txt" file is checked in as a text file. I'll recommit it
> as a binary file in the hope that it fixes the problem.
> The root of the problem is the different line ending, I have no idea how
> parrot can deal with it, or if it should deal with it at all.

Here are my thoughts on that topic:  In my mind, "plain old text
files" are a model for text, just as PNG or JPEG are for images.

The model for a text file is a list of lines (strings).  The list is
produced by splitting the text on a delimiter (though some programs
treat it more like a terminator).  Thus, for a correct parse one needs
to know
  - the encoding
  - the character set
  - the delimiter string
all of which usually default to the current platform, but may be
different.
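
To make the three parameters concrete, here is a minimal sketch in
Python (not Parrot code; the function name parse_text and its
arguments are made up for illustration).  It decodes the raw bytes
with an explicitly given encoding and then splits on an explicitly
given delimiter, with a flag for the "terminator" reading:

```python
def parse_text(raw, encoding, delimiter, terminator=False):
    """Turn raw bytes into a list of lines (hypothetical helper).

    raw        -- the file contents as bytes
    encoding   -- codec name, e.g. "utf-16-le" or "cp1252"
    delimiter  -- the line separator as a string, e.g. "\n" or "\r\n"
    terminator -- if True, treat the delimiter as a line terminator,
                  so a trailing delimiter does not yield an empty line
    """
    text = raw.decode(encoding)
    lines = text.split(delimiter)
    if terminator and lines and lines[-1] == "":
        lines.pop()
    return lines


# The same bytes give different lists depending on the assumed delimiter.
windows_bytes = b"first\r\nsecond\r\n"
as_windows = parse_text(windows_bytes, "ascii", "\r\n", terminator=True)
as_unix = parse_text(windows_bytes, "ascii", "\n", terminator=True)
```

With the Windows delimiter the result is two clean lines; with the
Unix delimiter each line keeps a stray "\r" at the end, which is the
kind of mismatch discussed above.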

For example, there are Windows programs that write UTF-16 Unicode (see
C:\WINDOWS\wusetup.log on WinXP).  In Western Europe, Windows usually
uses an 8-bit encoding with Codepage 1252 (1250 in Central Europe),
except for the "Command Prompt", which uses Codepage 850 (which leads
to fun with, for example, German umlauts).
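
The umlaut problem can be reproduced with a small Python sketch.  It
assumes the text was written with Windows code page 1252 and is then
read as code page 850; the helper name reinterpret is made up for
illustration:

```python
def reinterpret(text, written_as, read_as):
    """Encode text with one code page and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


# The same character is stored as a different byte in each code page.
a_umlaut_gui = "ä".encode("cp1252")      # b"\xe4"
a_umlaut_console = "ä".encode("cp850")   # b"\x84"

# Bytes written by a GUI program, then displayed by the console.
original = "Grüße"
garbled = reinterpret(original, "cp1252", "cp850")
```

The garbled string has the same length as the original, but the
non-ASCII characters come out as something else, because the reader
guessed the wrong character set.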

For me it boils down to the question of whether Parrot should support
plain old text files.

Ron


