On 22 Apr 2009, at 06:57, Seth Willits wrote:

In my app, I import data from potentially very large files. In the first pass, I simply mmap'd the entire file, created a string using CFStringCreateWithBytesNoCopy, and went about my business. This works great until it hits the address space limit when running as a 32-bit process, so now in the second pass I want to rework it a bit to only mmap a chunk (128 MB) at a time.

Now, if it were simply binary data, I could chop up the file however I wanted, but since the file I'm processing is actually a huge *text* file, I need to mmap an appropriate range so creating the string doesn't fail because a multi-byte character was split down the middle.

Hi Seth,

I think this highlights a significant deficiency in the CFString/NSString API, which is that it's impossible to get any kind of streaming encoder/decoder (which is really what you want for this kind of task).

Have you considered using libiconv instead to convert to UTF-16, then creating your strings from that? That would give you more control and would mean that you didn't have to guess where the encoder would want to start/finish working on your data (since it will tell you).
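
As a rough sketch of what I mean (not tested against your data): the helper name convert_chunk is made up for illustration, and the caller is assumed to have opened the descriptor with iconv_open(), e.g. from "UTF-8" to "UTF-16LE". The point is that iconv() stops at the last complete character and reports how far it got, so you re-feed the unconsumed tail together with the next chunk.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (the name is an assumption, not part of any API).
 * Converts one chunk using an already-open iconv descriptor.
 * Returns the number of input bytes consumed, or (size_t)-1 on a real
 * error such as an invalid byte sequence (EILSEQ).
 * *out_used receives the number of bytes written to out. */
size_t convert_chunk(iconv_t cd, const char *in, size_t in_len,
                     char *out, size_t out_cap, size_t *out_used)
{
    char *src = (char *)in;   /* iconv() takes char **, but does not modify the input */
    char *dst = out;
    size_t in_left = in_len;
    size_t out_left = out_cap;

    if (iconv(cd, &src, &in_left, &dst, &out_left) == (size_t)-1) {
        /* EINVAL: the chunk ends in the middle of a multi-byte character.
         * E2BIG:  the output buffer is full.
         * Both are resumable, so only other errors are treated as fatal. */
        if (errno != EINVAL && errno != E2BIG) {
            return (size_t)-1;
        }
    }

    *out_used = out_cap - out_left;
    return in_len - in_left;  /* bytes before the first unconverted character */
}
```

If the return value is smaller than in_len, start the next mmap window (or the next call) at the first unconsumed byte. On Mac OS X you would link with -liconv.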

I guess ICU might also be a way around this, though iconv() et al. have the significant benefit of being documented and supported API.

Kind regards,

Alastair.

--
http://alastairs-place.net


