On Tue, Feb 3, 2009 at 11:13 AM, Joar Wingfors <j...@joar.com> wrote:
>
> On Feb 2, 2009, at 9:55 PM, Michael Ash wrote:
>
>> It hasn't been addressed because it's not really relevant to the
>> question at hand. Yes, you definitely need to either know or be able
>> to discover the text encoding of the text files you're dealing with.
>> But aside from both being about text files, that question is unrelated
>> to the question of how to process a large text file line-by-line.
>
>
> Would a correct implementation not depend on being able to iterate over
> characters, and not simply using a fixed step size? I wanted to call
> attention to this because I, perhaps incorrectly, assumed that it would be.
> I think that providing a general solution to this problem, one that works
> for all text encodings, is difficult enough that it should be provided by a
> library. That said, most developers (the OP included) probably wouldn't
> require a completely general solution, and might be able to cobble together
> something that works fine with the data that they have to deal with.

For 99% of cases, no, you don't have to care about the encoding to do
basic line splitting. A \r or \n byte (0x0D or 0x0A) indicates an
actual CR or LF character in ASCII, in any 8-bit ASCII-compatible
encoding, in most or all of the language-specific double-byte
encodings, and in UTF-8. It fails with EBCDIC and with UTF-16 (and
UTF-32).

Most text processing tools out there assume precisely this sort of
blind, encoding-agnostic scanning, so character encodings tend to be
designed with that in mind. It would break a lot of tools if a 0x0A
or 0x0D byte could show up in the stream with some other meaning, so
those encodings avoid it.

If you anticipate processing UTF-16 files, then you'll have to write a
completely different code path for that, of course, but such files
tend to be rare, and it would be entirely reasonable for such a tool
to not support UTF-16.

Mike
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
