Hi Srinath,
On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
Hello,
This patches series is to support Arm MVE ACLE intrinsics.
Please refer to Arm reference manual [1] and MVE intrinsics [2] for
more details.
Please refer to Chapter 13 MVE ACLE [3] for MVE intrinsics concepts.
This patch series depends on upstream patches "Armv8.1-M Mainline
Security Extension" [4],
"CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5]
and "support for Armv8.1-M
Mainline scalar shifts" [6].
[1]
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
[2]
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
[3]
https://static.docs.arm.com/101028/0009/Q3-ACLE_2019Q3_release-0009.pdf?_ga=2.239684871.588348166.1573726994-1501600630.1548848914
[4] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html
[5] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html
[6] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01194.html
Srinath Parvathaneni(38):
[PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
[PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with
unary operand.
[PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
[PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
[PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.
[PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.
[PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte,
halfword, or word from memory.
[PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads
half word and word or double word from memory.
[PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half
word or word to memory.
[PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores
an half word, word and double word to memory.
[PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus
operator.
[PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
[PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup
intrinsics with writeback.
[PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store
intrinsics with writeback.
[PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care)
variant.
[PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across
beats" and "beat-wise substract".
[PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and
deinterleaving load intrinsics and also aliases to vstr and vldr
intrinsics.
[PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
[PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
[PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry
intrinsics.
Thank you for working on these.
I will reply to individual patches with more targeted comments.
As this is a fairly large amount of code, here's my high-level view:
The MVE intrinsics spec has more complexities than the Neon intrinsics one:
* It needs support for both the user-namespace versions, and the __arm_*
ones.
* There are also overloaded forms that in C are implemented using _Generic.
The above two facts make for a rather bulky and messy arm_mve.h
implementation.
In the case of the _Generic usage we hit the performance problems
reported in PR c/91937.
Ideally, I'd like to see the frontend parts of these intrinsics
implemented in a similar way to the SVE ACLE
(https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00413.html)
i.e. have the compiler inject the right functions into the language and
do overload resolution through the appropriate hooks, thus keeping the
(unavoidable) complexity in the backend rather than arm_mve.h
That being said, this is a major feature that I would very much like to
see in GCC 10 and the current implementation, outside of the new .md
file and arm_mve.h file, doesn't disturb anything in the backend it
shouldn't.
That is, implementing the SVE ACLE approach shouldn't require any risky
backend surgery, just ripping out a chunk of code and replacing it with
another chunk of code.
So I'll accept the current approach for GCC 10 as long as we improve the
frontend parts for theĀ GCC 11 timeframe.
My reviews on the individual patches will be in this context.
Thanks,
Kyrill
Regards,
Srinath.
Entire patch series attached to cover letter.