On 9 Feb 2014, at 15:53, Greg Parker <gpar...@apple.com> wrote:

> On Feb 9, 2014, at 12:19 AM, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>> The real app (which I am trying to optimise) has actually two loops: one is
>> counting, the other one is modifying. Which seems to be good news.
>>
>> But I would really like to understand what I should do. Trial and error (or
>> blindly groping in the mist) is not really my preferred way of working.
>
> Optimizing small loops like this is a black art. Very small effects become
> critically important, such as the alignment of your loop instructions or the
> associativity of that CPU's L1 cache.
So it seems that my test app is not of much use.

The real loop looks like:

    NSUInteger nbrBytes = ...     // big, some GB
    unsigned char *bitField = calloc( nbrBytes, sizeof( unsigned char) );

    NSUInteger len = ...          // might be rather big, so I tried to use dispatch_apply
    NSUInteger incr = ...         // might be as small as 3, or much bigger
    NSUInteger bitPointer = ...   // bitPointer + len * incr < nbrBytes * 8

    for( NSUInteger i = 0; i < len; i++ )
    {
        unsigned char bitIndex = bitPointer & 0x7;
        NSUInteger byteIndex = bitPointer >> 3;
        unsigned char mask = maskP[ bitIndex ];   // mask = 0x1 << bitIndex;
        bitField[byteIndex] |= mask;
        bitPointer += incr;
    }

I looked at Accelerate, but it does not seem to fit. I am also looking at OpenCL, but have not yet understood whether it would help with my problem.

Kind regards,

Gerriet.
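
For reference, here is a minimal sketch in plain C of the strided bit-setting loop described above, assuming the bit position is meant to advance by incr on every iteration. The function names set_strided_bits and set_strided_bits_chunked are hypothetical, size_t stands in for NSUInteger, and the mask is computed with a shift instead of the maskP table. The chunked variant only illustrates how the range might be split into independent pieces (for example, for something like dispatch_apply); it runs the chunks serially here, because chunks that run concurrently can touch the same byte at their boundaries when incr is less than 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Set `len` bits in `bitField`, starting at bit `bitPointer` and stepping
   by `incr` bits. The caller must guarantee that the last bit touched,
   bitPointer + (len - 1) * incr, lies inside the buffer. */
static void set_strided_bits(unsigned char *bitField, size_t bitPointer,
                             size_t incr, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        size_t byteIndex = bitPointer >> 3;                 /* which byte */
        unsigned char mask =
            (unsigned char)(1u << (bitPointer & 0x7));      /* which bit  */
        bitField[byteIndex] |= mask;
        bitPointer += incr;                                 /* next bit   */
    }
}

/* Same result, but processed in chunks of at most `chunkLen` iterations.
   `chunkLen` must be greater than 0. Each chunk starts at its own bit
   position, so a chunk does not depend on the one before it. The chunks
   run serially in this sketch. */
static void set_strided_bits_chunked(unsigned char *bitField,
                                     size_t bitPointer, size_t incr,
                                     size_t len, size_t chunkLen)
{
    for (size_t start = 0; start < len; start += chunkLen) {
        size_t n = (len - start < chunkLen) ? (len - start) : chunkLen;
        set_strided_bits(bitField, bitPointer + start * incr, incr, n);
    }
}

/* Count the set bits in the first `nbrBytes` bytes (used for checking). */
static size_t count_set_bits(const unsigned char *bitField, size_t nbrBytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbrBytes; i++) {
        for (unsigned char b = bitField[i]; b != 0; b >>= 1) {
            count += b & 1u;
        }
    }
    return count;
}
```

If the chunks were handed to concurrent workers, the read-modify-write on bitField[byteIndex] would need either atomic operations or chunk boundaries aligned so that no two chunks share a byte.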