Kenneth Zadeck <zad...@naturalbridge.com> writes:
> I would like to see some comment to the effect that this is to allow
> inlining for the common case for widest_int and offset_int without
> inlining the uncommon case for regular wide_int.
OK, how about:

/* If the precision is known at compile time to be greater than
   HOST_BITS_PER_WIDE_INT, we can optimize the single-HWI case
   knowing that (a) all bits in those HWIs are significant and
   (b) the result has room for at least two HWIs.  This provides
   a fast path for things like offset_int and widest_int.

   The STATIC_CONSTANT_P test prevents this path from being used
   for wide_ints.  wide_ints with precisions greater than
   HOST_BITS_PER_WIDE_INT are relatively rare and there's not
   much point handling them inline.  */

Thanks,
Richard

> On 11/28/2013 12:38 PM, Richard Sandiford wrote:
>> Currently add and sub have no fast path for offset_int and widest_int;
>> they just call the out-of-line version.  This patch handles the
>> single-HWI cases inline.  At least on x86_64, this only adds one branch
>> per call; the fast path itself is straight-line code.
>>
>> On the same fold-const.ii testcase, this reduces the number of
>> add_large calls from 877507 to 42459.  It reduces the number of
>> sub_large calls from 25707 to 148.
>>
>> Tested on x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> Index: gcc/wide-int.h
>> ===================================================================
>> --- gcc/wide-int.h	2013-11-28 13:34:19.596839877 +0000
>> +++ gcc/wide-int.h	2013-11-28 16:08:11.387731775 +0000
>> @@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y)
>>        val[0] = xi.ulow () + yi.ulow ();
>>        result.set_len (1);
>>      }
>> +  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
>> +	   && xi.len + yi.len == 2)
>> +    {
>> +      unsigned HOST_WIDE_INT xl = xi.ulow ();
>> +      unsigned HOST_WIDE_INT yl = yi.ulow ();
>> +      unsigned HOST_WIDE_INT resultl = xl + yl;
>> +      val[0] = resultl;
>> +      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
>> +      result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl))
>> +			   >> (HOST_BITS_PER_WIDE_INT - 1)));
>> +    }
>>    else
>>      result.set_len (add_large (val, xi.val, xi.len,
>> 				 yi.val, yi.len, precision,
>> @@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y)
>>        val[0] = xi.ulow () - yi.ulow ();
>>        result.set_len (1);
>>      }
>> +  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
>> +	   && xi.len + yi.len == 2)
>> +    {
>> +      unsigned HOST_WIDE_INT xl = xi.ulow ();
>> +      unsigned HOST_WIDE_INT yl = yi.ulow ();
>> +      unsigned HOST_WIDE_INT resultl = xl - yl;
>> +      val[0] = resultl;
>> +      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
>> +      result.set_len (1 + (((resultl ^ xl) & (xl ^ yl))
>> +			   >> (HOST_BITS_PER_WIDE_INT - 1)));
>> +    }
>>    else
>>      result.set_len (sub_large (val, xi.val, xi.len,
>> 				 yi.val, yi.len, precision,
>>
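
For readers puzzling over the set_len expression in the quoted fast paths, here
is a minimal standalone sketch of the same signed-overflow test, written with
plain uint64_t in place of HOST_WIDE_INT.  The helper name add_fast_path and
its pointer-based block-count convention are illustrative stand-ins only, not
anything defined in wide-int.h:

#include <stdint.h>
#include <stdio.h>

/* Add two sign-extended 64-bit blocks X and Y.  Store the low block in
   *LOW and a sign-extension block in *HIGH, and return the number of
   blocks (1 or 2) the sum needs, mirroring the set_len computation in
   the wi::add fast path above.  */
static unsigned int
add_fast_path (uint64_t x, uint64_t y, uint64_t *low, uint64_t *high)
{
  uint64_t result = x + y;	/* Wraps modulo 2^64, like the HWI sum.  */
  *low = result;

  /* The high block only matters when the addition overflowed.  In that
     case the true sign is the opposite of the low block's apparent
     sign, so it is 0 when the low block looks negative and all-ones
     when it looks non-negative.  */
  *high = (int64_t) result < 0 ? 0 : (uint64_t) -1;

  /* Signed addition overflows exactly when X and Y have the same sign
     but the result's sign differs, i.e. when the top bit of
     (result ^ x) & (result ^ y) is set.  */
  return 1 + (unsigned int) (((result ^ x) & (result ^ y)) >> 63);
}

int
main (void)
{
  uint64_t low, high;
  unsigned int len;

  /* 1 + 2 fits in one block: len == 1.  */
  len = add_fast_path (1, 2, &low, &high);
  printf ("len %u, low %llu\n", len, (unsigned long long) low);

  /* INT64_MAX + 1 overflows the low block: len == 2 and the high block
     is 0 because the true sum is positive.  */
  len = add_fast_path ((uint64_t) INT64_MAX, 1, &low, &high);
  printf ("len %u, high %llu\n", len, (unsigned long long) high);
  return 0;
}

The subtraction path uses the same idea, except that subtraction overflows when
X and Y have different signs and the result's sign differs from X's, hence the
(resultl ^ xl) & (xl ^ yl) variant in the patch.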