On Mon, Apr 09, 2007 at 09:47:07AM -0700, Andrew Pinski wrote:
> On 4/9/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> >#include <stddef.h>
> >
> >void *__allocate_array_OptionA(size_t num, size_t size) { // 1st best
> >  unsigned long long tmp = (unsigned long long)size * num;
> >  if (tmp >= 0x0000000080000000ULL) tmp=~size_t(0);
> >  return operator new[](tmp);
> >}
>
> First this just happens to be the best for x86, what about PPC or
> really any embedded target where people are more concern about code
> size than say x86.

It's nowhere close to best for x86. You have to multiply anyway, and the processor tells you if there's an overflow. To get the best, though, you'd need to use assembly language, and the penalty in time is one instruction: insert a jnc (jump short if no carry), with the prediction flag set as "taken", after the mull instruction. This would jump over the code that loads all-ones into the result.

A general approach would be to have an intrinsic for unsigned multiply with saturation, have a C fallback, and add an efficient implementation of the intrinsic on a per-target basis.
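
As a rough sketch of what the portable fallback might look like (not a proposal for the actual implementation): the names sat_mul_size and allocate_array below are made up for illustration, and no such intrinsic is assumed to exist. The fallback detects overflow with a division; a per-target version would replace the body with the multiply plus a carry check.

```cpp
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical name, not an existing intrinsic: unsigned multiply that
// saturates to the largest size_t value instead of wrapping around.
// This is the portable fallback; a target-specific version could use
// the hardware carry/overflow flag instead of a division.
inline std::size_t sat_mul_size(std::size_t a, std::size_t b) {
    const std::size_t max = std::numeric_limits<std::size_t>::max();
    // a * b overflows exactly when b != 0 and a > max / b.
    if (b != 0 && a > max / b)
        return max;
    return a * b;
}

// Hypothetical allocation helper built on the saturating multiply.
// On overflow it requests the maximum size, which the allocator is
// expected to refuse (typically by throwing std::bad_alloc).
inline void *allocate_array(std::size_t num, std::size_t size) {
    return ::operator new[](sat_mul_size(num, size));
}
```

Unlike the 0x80000000 threshold in the quoted code, this makes no assumption about the width of size_t, at the cost of a division in the fallback path.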