Hi everyone,

I can't remember whether this has been brought up on the fpc-devel mailing
list yet, but as discussed under Feature Ideas
( http://wiki.lazarus.freepascal.org/Feature_Ideas#Per-type_Byte_Alignment )
and in bug number 32780 ( https://bugs.freepascal.org/view.php?id=32780 ),
I would like to propose the ability to control the byte alignment of
variables more finely, on a per-type basis. While this can be controlled to
an extent with compiler directives, doing so is somewhat messy and might
cause conflicts in some situations, especially where third-party modules are
concerned. Such a feature would be extremely useful for the Intel SSE and
AVX extensions, where reading from and writing to memory that isn't aligned
to a 16-byte boundary incurs a performance penalty, and as hinted in the bug
report, it would allow C-style inline intrinsics to be fully supported (at
least on Linux - Free Pascal will need support for "vectorcall" on Windows
as well).
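
To illustrate the directive-based approach mentioned above, here is a
minimal sketch. It assumes the existing {$CODEALIGN VARMIN=...} directive is
used to raise the minimum alignment of global variables; the program and
variable names are made up for the example. Note that the directive affects
every global variable declared after it, not one particular type, which is
the coarseness a per-type "align" would avoid.

```pascal
program DirectiveAlignSketch;
{$mode objfpc}

{ Assumption: raise the minimum alignment of all following global
  variables to 16 bytes. This is unit-wide, not per-type. }
{$CODEALIGN VARMIN=16}

var
  V: array[0..3] of Single; { hypothetical 4-float vector }

begin
  { Report whether the address of V is a multiple of 16 }
  WriteLn('V is 16-byte aligned: ', (PtrUInt(@V) and 15) = 0);
end.
```

Local variables and heap allocations are governed by other settings, so this
only covers part of what the proposal asks for.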

Apparently Delphi supports such a feature, but it is not well documented
(can someone confirm this?). For syntax, I would recommend what Delphi
apparently uses, which is to append "align #" after the type definition,
with # being a power of 2.

  type
    { Single is aligned to at least a 4-byte boundary because of its size,
      but an AlignedSingle will be on a 16-byte boundary }
    AlignedSingle = Single align 16;

    { Double is aligned to at least an 8-byte boundary because of its size,
      but an AlignedDouble will be on a 16-byte boundary }
    AlignedDouble = Double align 16;

    M128 = packed record
      case Integer of
        0: (Scalar: Single);
        1: (X, Y, Z, W: Single);
        2: (E: array[0..3] of Single);
    end align 16;

    { Is also aligned to a 16-byte boundary because M128 is }
    TVector4f = M128;

    { Tighter restrictions than M128, so should be fine, although this
      should probably align Var[0] to the 32-byte boundary rather than Var
      itself. One would use such a definition if passing the data into YMM
      registers, which require alignment to a 32-byte boundary, but where
      there may be an odd number of vectors }
    TVectorArray = array of M128 align 32;

There are some nuances to consider, namely where typecasting is concerned
(while typecasting from an aligned to an unaligned type is fine, what about
the reverse? And let's not get started with pointers to such types!), and
since this talks about Intel x86 and x86-64 in particular, what happens if
it's used on another platform? Are there other platforms that support memory
alignment, and if not, should a warning or error be raised for the
appearance of 'align', or should it be ignored?
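
The pointer concern can be made concrete with a small sketch. Without
compiler support, code that receives a plain pointer would have to check it
at run time before treating it as an aligned type, and heap memory would
have to be over-allocated and rounded up by hand. IsAligned and AlignUp are
hypothetical helper names chosen for this example, and both assume that
Boundary is a power of 2.

```pascal
program PointerAlignSketch;
{$mode objfpc}

{ True if P is a multiple of Boundary (Boundary must be a power of 2) }
function IsAligned(P: Pointer; Boundary: PtrUInt): Boolean;
begin
  Result := (PtrUInt(P) and (Boundary - 1)) = 0;
end;

{ Round P up to the next multiple of Boundary (a power of 2) }
function AlignUp(P: Pointer; Boundary: PtrUInt): Pointer;
begin
  Result := Pointer((PtrUInt(P) + Boundary - 1) and not (Boundary - 1));
end;

var
  Raw, Vec: Pointer;

begin
  { Over-allocate by Boundary - 1 bytes so an aligned block of
    four Singles always fits inside the raw allocation }
  GetMem(Raw, 4 * SizeOf(Single) + 15);
  Vec := AlignUp(Raw, 16);

  WriteLn('Raw pointer 16-byte aligned:     ', IsAligned(Raw, 16));
  WriteLn('Rounded pointer 16-byte aligned: ', IsAligned(Vec, 16));

  { Always free the original pointer, not the rounded one }
  FreeMem(Raw);
end.
```

A cast from an unaligned pointer to a pointer to an aligned type would need
either a check of this kind or a compiler warning, since the compiler cannot
prove the alignment statically.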

Just as an example of an intrinsic (on 64-bit Linux) so one can take full
advantage of SSE and AVX (because there will always be cases where the
compiler won't create the best machine code no matter how good it is):

  function _sse_addps(Input1, Input2: M128): M128;
    assembler; nostackframe; inline;
  asm
    ADDPS XMM0, XMM1 { Intel syntax - AT&T would be "addps %xmm1,%xmm0" }
  end;

Mind you, when it comes to inlining these intrinsics, it depends on how
smart the compiler is in assigning variables to the XMM registers and
switching them around in the intrinsic subroutines. For Windows,
"vectorcall" is required (see https://bugs.freepascal.org/view.php?id=32781
and
http://wiki.lazarus.freepascal.org/Feature_Ideas#.22vectorcall.3B.22_modifier_for_Win32_and_Win64
for more information) because the standard Microsoft calling convention does
not properly take advantage of the XMM registers. For 32-bit Linux I'm not
sure what the best approach would be, except perhaps to adopt Microsoft's
"vectorcall", just because there doesn't seem to be another appropriate
32-bit standard.

Gareth aka. Kit