On Tue, Feb 3, 2009 at 11:13 AM, Joar Wingfors <j...@joar.com> wrote:
>
> On Feb 2, 2009, at 9:55 PM, Michael Ash wrote:
>
>> It hasn't been addressed because it's not really relevant to the
>> question at hand. Yes, you definitely need to either know or be able
>> to discover the text encoding of the text files you're dealing with.
>> But aside from both being about text files, that question is unrelated
>> to the question of how to process a large text file line-by-line.
>
>
> Would a correct implementation not depend on being able to iterate over
> characters, and not simply using a fixed step size? I wanted to call
> attention to this because I, perhaps incorrectly, assumed that it would be.
> I think that providing a general solution to this problem, one that works
> for all text encodings, is difficult enough that it should be provided by a
> library. That said, most developers (the OP included) probably wouldn't
> require a completely general solution, and might be able to cobble together
> something that works fine with the data that they have to deal with.
For 99% of cases, no, you don't have to care about the encoding to do
the basic parsing. A \r or \n byte (0x0D or 0x0A) indicates an actual CR
or LF character in ASCII, in any 8-bit ASCII-compatible encoding, in
most or all of the language-specific double-byte encodings, and in
UTF-8. It will fail with EBCDIC, which does not share ASCII's byte
assignments, and with UTF-16, where every character takes at least two
bytes and a 0x0A or 0x0D byte can be one half of an unrelated character.

Most text processing tools out there assume precisely this sort of
blind, encoding-agnostic scanning, so character encodings tend to be
designed with it in mind. Having a 0x0A or 0x0D byte show up in the
stream with a different meaning would break a lot of tools, so
encodings avoid it.

If you anticipate processing UTF-16 files then you'll have to write a
completely different code path for them, of course, but such files tend
to be rare, and it would be entirely reasonable for a tool like this not
to support UTF-16.

Mike
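
For illustration, here is a minimal sketch in C of the kind of blind byte scanning described above. It is not code from this thread: the names for_each_line, line_fn, line_stats and record_stats are hypothetical, and it assumes the data is already in memory (for example a chunk read from the file, or a mapped file) and is in ASCII, an ASCII-compatible 8-bit encoding, or UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives one line, without its terminator,
 * as raw undecoded bytes. */
typedef void (*line_fn)(const unsigned char *line, size_t len, void *ctx);

/*
 * Split buf into lines by looking only for CR (0x0D) and LF (0x0A) bytes.
 * LF, CRLF and lone CR are all accepted as terminators. The bytes between
 * terminators are never decoded, so multi-byte UTF-8 sequences pass
 * through untouched. This does NOT work for UTF-16 or EBCDIC input.
 * Returns the number of lines handed to the callback.
 */
size_t for_each_line(const unsigned char *buf, size_t len,
                     line_fn fn, void *ctx)
{
    size_t start = 0, count = 0;
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n' && buf[i] != '\r')
            continue;
        fn(buf + start, i - start, ctx);
        count++;
        /* Treat CRLF as a single terminator. */
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            i++;
        start = i + 1;
    }
    /* Last line, if the buffer does not end with a terminator. */
    if (start < len) {
        fn(buf + start, len - start, ctx);
        count++;
    }
    return count;
}

/* Example callback (also hypothetical): counts lines and tracks the
 * longest line length in bytes. */
struct line_stats { size_t lines; size_t longest; };

void record_stats(const unsigned char *line, size_t len, void *ctx)
{
    struct line_stats *st = ctx;
    (void)line;
    st->lines++;
    if (len > st->longest)
        st->longest = len;
}
```

Calling for_each_line(data, length, record_stats, &stats) walks the buffer once and reports each line as a pointer plus a byte count. Lengths are in bytes, not characters, which is exactly the trade-off of ignoring the encoding. For a file too large to hold in memory, the same scan would need to carry a partial line over from one chunk to the next, which this sketch leaves out.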