On 2017-05-26 08:32, Richard Henderson wrote:
> On 05/25/2017 02:04 PM, Aurelien Jarno wrote:
> > -        if (srclen) {
> > -            v1 = cpu_ldub_data_ra(env, src, ra);
> > +        if (*srclen) {
> > +            v1 = cpu_ldub_data_ra(env, *src, ra);
> >          }
> > -        if (destlen) {
> > -            v2 = cpu_ldub_data_ra(env, dest, ra);
> > +        if (*destlen) {
> > +            v2 = cpu_ldub_data_ra(env, *dest, ra);
> >          }
> >          if (v1 != v2) {
> > @@ -746,16 +743,55 @@ uint32_t HELPER(clcle)(CPUS390XState *env, uint32_t r1, uint64_t a2,
> >              break;
> >          }
> > -        if (srclen) {
> > -            src++;
> > -            srclen--;
> > +        if (*srclen) {
> > +            *src += 1;
> > +            *srclen -= 1;
> >          }
> > -        if (destlen) {
> > -            dest++;
> > -            destlen--;
> > +        if (*destlen) {
> > +            *dest += 1;
> > +            *destlen -= 1;
> >          }
> >      }
>
> If you don't access these as pointers in the inner loop like this, the
> compiler will give you better code without needing to force the function to
> be inlined.

I agree that it would allow dropping the inline as far as the pointers are
concerned. That said, we still want the compiler to optimize out the calls
to cpu_ldusize_data_ra and check_alignment (both introduced in the CLCLU
patch), as well as the length limit. My goal in using common code for
clcl/clcle/clclu was mostly to unify the code and avoid having different
bugs depending on the function. That could also have been done using the
preprocessor.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net