On Tue, 28 Feb 2012, Tijl Coosemans wrote:

Log:
 Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace
 amd64/i386/pc98 endian.h with stubs.

 In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
 resolved by reimplementing the macro in terms of __bswap32(x). As a side
 effect __bswap64_var(x) is now implemented using two bswap instructions on
 i386 and should be much faster. __bswap32_const(x) has been reimplemented
 in terms of __bswap16(x) for consistency.

The files now have enough in common for me to be happy with centralizing
them.  The trip through <machine> would have to be avoided to complete
the cleanup.  I don't know how to do that.  Maybe install can produce
a direct symlink.  We should try harder to not install both x86/foo.h
and machine/foo.h, so pathnames <x86> and <machine> don't start being
used in applications and then having to be supported for compatibility.

The patch is unreadable due to significant reimplementations, even without
the move.  So I looked at the resulting x86/endian.h:

% ...
% #ifdef __cplusplus
% extern "C" {
% #endif

This ifdef is old style bug:
- it is a hard-coded spelling of __BEGIN_DECLS
- it is placed around the whole file.  Most of the file consists of macro
  definitions.  __BEGIN_DECLS would obviously make no sense around macro
  definitions.
- the rest of the file consists of inline function declarations.  Since
  these aren't extern, 'extern "C"' makes no sense for them, though it
  isn't so clear that __BEGIN_DECLS makes no sense for them.
- the log message for the commit that added it is not good, but at least
  says that C++ support is added.

% #if defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE_BUILTIN_CONSTANT_P)

This ifdef is still misplaced.  Defining the 'const' versions doesn't
require the gcc builtin, and it is further from requiring gcc asm.

% % #define __bswap64_const(_x) \
%       (((_x) >> 56) |                           \
%       (((_x) >> 40) & (0xffULL << 8)) |       \
%       (((_x) >> 24) & (0xffULL << 16)) |      \
%       (((_x) >> 8) & (0xffULL << 24)) |       \
%       (((_x) << 8) & (0xffULL << 32)) |       \
%       (((_x) << 24) & (0xffULL << 40)) |      \
%       (((_x) << 40) & (0xffULL << 48)) |      \
%       ((_x) << 56))

I think this can be simplified in the same way as the `var' case should
have been simplified, by building it up out of 2 __bswap32_const()s...

% % #define __bswap32_const(_x) \
%       (((_x) >> 24) |                           \
%       (((_x) & (0xff << 16)) >> 8) |          \
%       (((_x) & (0xff << 8)) << 8) |           \
%       ((_x) << 24))

... and this can be built out of 2 __bswap16_const()s', though it is
short enough already.  Building up would be essential for __bswap_2^N
and good for N > 64.

% % #define __bswap16_const(_x) (__uint16_t)((_x) << 8 | (_x) >> 8) % % static __inline __uint64_t
% __bswap64_var(__uint64_t __x)
% {
% % return __bswap64_const(__x);
% }

This seems to be a pessimization for amd64, and doesn't match your
claim that it now uses 2 bswap instructions on i386.  It is the old
i386 version.  amd64 used to use a single 64-bit bswap in asm here.
Last time I looked (only on i386), gcc was incapable of turning either
__bswap64_const() or __bswap32_const() into 2 or 1 bswap instructions,
but generated pessimial code using shifts and masks.  Only clang
generated the 2 or 1 bswaps.

There is a PR about bswap64 being pessimal on i386.  The following seems
to be optimal for gcc on i386.  It uses the building-up method.  See the
PR for other details.

@ #define       _KERNEL
@ #define       I486_CPU
@ @ #include <machine/endian.h>
@ #include <stdint.h>
@ @ uint64_t x, y; @ @ int
@ main(void)
@ {
@       /*
@        * Follow is a better __bswap64().  It doesn't even need const and
@        * var subcases.
@        */
@       y = __bswap32(x) | (uint64_t)__bswap32(x >> 32) << 32;
@       return (0);
@ }

Expanding on its comment, we probably don't even need the const subcase
for consts, but can just use the above expression for consts too.
However, this only works right for i386.   For amd64, there are the
following cases:
- const case, compiler either gcc or clang: the above expression should
  work
- variable case, gcc: needs the inline with a 64-bit bswap, unless gcc
  has become smarter
- variable case, clang: the above expression won't work if __bswap32()
  uses an inline asm with a 32-bit bswap like it does now, since clang
  will be forced to generate 2 32-bit bswaps.  To recover to single
  64-bit bswap, either write it in inline asm for clang too, or maybe
  use the general 64-bit 'const' expression (written like it is now,
  not as in my version above).  I think clang translates the big const
  expression into a single 64-bit bswap, not 2 32-bit ones like it does
  on i386.

% %

Extra blank line (old style bug duplicated).

% ... % #define __bswap64(_x) \
%       (__builtin_constant_p(_x) ?                     \
%           __bswap64_const((__uint64_t)(_x)) : __bswap64_var(_x))

Can be replaced by a "building-up" expression except for complications
with clang on amd64.

% % #define __bswap32(_x) \
%       (__builtin_constant_p(_x) ?                     \
%           __bswap32_const((__uint32_t)(_x)) : __bswap32_var(_x))

Can be replaced by a "building-up" expression except for complications
for gcc.  (We already killed the asm for __bswap16_var(), so the
optimization problem now is to turn the general expression for
__bswap32() into a single 32-bit bswap instruction.  Only clang does
this automatically.

% % #define __bswap16(_x) \
%       (__builtin_constant_p(_x) ?                     \
%           __bswap16_const((__uint16_t)(_x)) : __bswap16_var(_x))

Need a primitive somewhere -- no building up for this :-).

% #ifdef __cplusplus
% }
% #endif

Brackets the 'extern "C"' style bug. but looks uglier with a brace all
by itself.

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to