On 31.05.2016 8:53, Bruce Evans wrote:
> On Tue, 31 May 2016, Andrey Chernov wrote:
>
>> On 31.05.2016 6:42, Bruce Evans wrote:
>>>
>>> Er, I already said which types are better -- [u]int_fast32_t here.
>>
>> [u]int_fast32_t have _at_least_ 32 bits. int32_t in the initial PRNG can
>> be changed since does not overflow and involve several calculations, but
>> uint_fast32_t is needed just for two operations:
>
> I think you mean a native uint32_t is needed for 2 operations.
>
>> *f += *r;
>> i = (*f >> 1) & 0x7fffffff;
>
> This takes 2 operations (add and shift) with native uint32_t. It takes 4
> logical operations (maybe more physically, or less after optimization)
> with emulated uint32_t (add, mask to 32 bits (maybe move to another
> register to do this), shift, mask to 32 bits). When you write the final
> mask explicitly, it is to 31 bits and optimizing this away is especially
> easy in both cases.
>
>> We need to assign values from uint32_t to uint_fast32_t (since array
>> size can't be changed),
>
> FP code using double_t is similar: data in tables should normally be
> in doubles since double_t might be too much larger; data in function
> parameters is almost always in doubles since APIs are deficient and
> don't even support double_t as an arg; then it is best to assign to
> a double_t variable since if you just use the double then expressions
> using it will promote it to double_t but it is too easy to lose this
> expansion too early. It takes extra variables and a little more code
> for the assignments, but the extra variables are optimized away in
> cases where there is no expansion.
>
>> do this single operation fast and store them
>> back into array of uint32_t. I doubt that much gain can comes from it
>> and even pessimization in some cases. Better let compiler do its job
>> here.
>
> It's never a pessimization if the compiler does its job.
>
> It is good to practice this on a simple 2-step operation. Think of a
> multi-step operation where each step requires clipping to 32 bits.
> Using uint32_t for the calculation is just a concise way of writing
> "& 0xffffffff" after every step (even ones that don't need it). It
> is difficult and sometimes impossible for the compiler to optimize
> away these masks across a large number of steps. Sometimes this is
> easy for the programmer.
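
To make the operation count discussed above concrete, here is a minimal sketch in C. It is only an illustration, not code from the commit: the helper names step_native, step_emulated and check_wraparound are made up, and a 64-bit temporary stands in for a uint_fast32_t that happens to be wider than 32 bits. The first variant gets the wrap modulo 2^32 for free; the second has to write the clipping out by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Native uint32_t: add and shift; the add wraps modulo 2^32 by itself. */
static uint32_t
step_native(uint32_t *f, const uint32_t *r)
{
    *f += *r;
    return (*f >> 1) & 0x7fffffff;
}

/*
 * Hypothetical "emulated" variant: the arithmetic is done in a wider
 * temporary, so the clipping to 32 bits must be spelled out.
 */
static uint32_t
step_emulated(uint32_t *f, const uint32_t *r)
{
    uint64_t t = (uint64_t)*f + *r;            /* add */

    t &= 0xffffffff;                           /* mask to 32 bits */
    *f = (uint32_t)t;                          /* store back into the array slot */
    return (uint32_t)((t >> 1) & 0x7fffffff);  /* shift, mask to 31 bits */
}

/* Both variants must agree, including when the add overflows 32 bits. */
static void
check_wraparound(void)
{
    uint32_t fa = 0xffffffffu, fb = 0xffffffffu, r = 2;
    uint32_t ia = step_native(&fa, &r);
    uint32_t ib = step_emulated(&fb, &r);

    assert(fa == fb);
    assert(ia == ib);
}
```

Calling check_wraparound() from a main() is enough to confirm that the two variants produce the same state and the same result; whether the explicit mask costs anything after optimization is a separate question for the compiler.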
The biggest problem so far is that the fast types for [u]int32_t are exactly _the_same_ as the non-fast ones on i386 and amd64; see /usr/include/x86/_types.h. Without any gain on major platforms, I don't think this change is needed.
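
For anyone who wants to check this on their own machine, a small sketch like the following should do. It is my own illustration: the helper names fast32_bits, fast32_is_plain32 and report are made up, and it only compares storage widths. The C standard requires only that uint_fast32_t have at least 32 bits, so the answer depends on the platform's headers.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Storage width of uint_fast32_t in bits on this platform. */
static unsigned
fast32_bits(void)
{
    return (unsigned)(sizeof(uint_fast32_t) * CHAR_BIT);
}

/* Nonzero if the "fast" type is no wider than plain uint32_t. */
static int
fast32_is_plain32(void)
{
    return sizeof(uint_fast32_t) == sizeof(uint32_t);
}

/* Print what this platform's headers chose. */
static void
report(void)
{
    /* Guaranteed by the standard: the fast type is never narrower. */
    assert(sizeof(uint_fast32_t) >= sizeof(uint32_t));
    printf("uint_fast32_t: %u bits (%s uint32_t)\n", fast32_bits(),
        fast32_is_plain32() ? "same width as" : "wider than");
}
```

Calling report() from a main() prints the width the headers picked. Where the two widths are equal, switching the declarations to the fast types cannot change the generated code.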