Hi everyone,

I can't remember whether this has been brought up on the fpc-devel mailing
list yet, but as discussed under Feature Ideas
( http://wiki.lazarus.freepascal.org/Feature_Ideas#Per-type_Byte_Alignment )
and in bug number 32780 ( https://bugs.freepascal.org/view.php?id=32780 ),
I would like to propose the ability to control the byte alignment of
variables more finely, on a per-type basis.  While this can be controlled
to an extent with compiler directives, that approach is somewhat messy and
might cause conflicts in some situations, especially where third-party
modules are concerned.  Such a feature would be extremely useful for the
Intel SSE and AVX extensions, where reading and writing memory that isn't
aligned to a 16-byte boundary incurs a performance penalty, and, as hinted
in the bug report, it would allow C-style inline intrinsics to be fully
supported (at least on Linux - Free Pascal will need support for
"vectorcall" on Windows as well).

Apparently Delphi supports such a feature, but it is not well documented
(can someone confirm this?).  For syntax, I would recommend what Delphi
apparently uses, which is to append "align #" after the type definition,
with # being a power of 2.

type
  { Single is aligned to at least a 4-byte boundary because of its size,
    but an AlignedSingle will be on a 16-byte boundary }
  AlignedSingle = Single align 16;

  { Double is aligned to at least an 8-byte boundary because of its size,
    but an AlignedDouble will be on a 16-byte boundary }
  AlignedDouble = Double align 16;

  M128 = packed record
    case Integer of
    0: (Scalar: Single);
    1: (X, Y, Z, W: Single);
    2: (E: array[0..3] of Single);
  end align 16;

  TVector4f = M128; { Is also aligned to a 16-byte boundary because M128 is }

  { Tighter restrictions than M128, so should be fine, although this should
    probably align Var[0] to the 32-byte boundary rather than Var itself.
    One would use such a definition if passing the data into YMM registers,
    which require alignment to a 32-byte boundary, but where there may be
    an odd number of vectors }
  TVectorArray = array of M128 align 32;

There are some nuances to consider, namely where typecasting is concerned:
while typecasting from an aligned type to an unaligned type is fine, what
about the reverse?  (And let's not get started on pointers to such types!)
Also, since this proposal is about Intel x86 and x86-64 in particular, what
happens if it's used on another platform?  Are there other platforms that
support this kind of memory alignment, and if not, should a warning or
error be raised when 'align' appears, or should it be ignored?
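
To illustrate the "reverse" typecast problem with what Free Pascal offers
today, here is a minimal sketch of a run-time check.  The names TVec4,
IsAligned and AlignUp are made up for this example and are not part of any
proposal or existing unit; it only shows that a pointer to an ordinary
record cannot be assumed to sit on a 16-byte boundary unless the memory is
over-allocated and adjusted by hand.

```pascal
program AlignCheckSketch;
{$mode objfpc}

type
  { Ordinary 16-byte record; its natural alignment is that of Single }
  TVec4 = record
    X, Y, Z, W: Single;
  end;
  PVec4 = ^TVec4;

{ True if P lies on a multiple of Boundary (Boundary must be a power of 2) }
function IsAligned(P: Pointer; Boundary: PtrUInt): Boolean;
begin
  Result := (PtrUInt(P) and (Boundary - 1)) = 0;
end;

{ Rounds P up to the next multiple of Boundary (a power of 2) }
function AlignUp(P: Pointer; Boundary: PtrUInt): Pointer;
begin
  Result := Pointer((PtrUInt(P) + (Boundary - 1)) and not (Boundary - 1));
end;

var
  Raw: Pointer;
  V: PVec4;
begin
  { Over-allocate by 15 bytes so that a 16-byte boundary always fits }
  GetMem(Raw, SizeOf(TVec4) + 15);
  V := PVec4(AlignUp(Raw, 16));
  if IsAligned(V, 16) then
    WriteLn('safe to treat as an aligned vector')
  else
    WriteLn('must fall back to unaligned loads');
  { Free the original pointer, not the adjusted one }
  FreeMem(Raw);
end.
```

With a per-type "align" the compiler could make this guarantee itself, and
a cast from an unaligned type to an aligned one is exactly the place where
it would have to either trust the programmer or insert a check like
IsAligned.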

Just as an example of an intrinsic (on 64-bit Linux) so one can take full
advantage of SSE and AVX (because there will always be cases where the
compiler won't create the best machine code no matter how good it is):

function _sse_addps(Input1, Input2: M128): M128; assembler; nostackframe; inline;
asm
  ADDPS XMM0, XMM1 { Intel syntax - AT&T would be "addps %xmm1,%xmm0" }
end;

Mind you, when it comes to inlining these intrinsics, it depends on how
smart the compiler is at assigning variables to the XMM registers and
switching them around in the intrinsic subroutines.  For Windows,
"vectorcall" is required (see https://bugs.freepascal.org/view.php?id=32781
and
http://wiki.lazarus.freepascal.org/Feature_Ideas#.22vectorcall.3B.22_modifier_for_Win32_and_Win64
for more information) because the standard Microsoft calling convention
does not properly take advantage of the XMM registers.  For 32-bit Linux
I'm not sure what the best approach would be, except perhaps to adopt
Microsoft's "vectorcall", simply because there doesn't seem to be another
appropriate 32-bit standard.

Gareth aka. Kit
_______________________________________________
fpc-devel maillist  -  [email protected]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
