On Fri, Jan 05, 2018 at 11:41:25AM +0000, Richard Sandiford wrote: > Here's the patch updated to apply on top of the v8.4 and > __builtin_load_no_speculate support. It also handles the new > vec_perm_indices and CONST_VECTOR encoding and uses VNx... names > for the SVE modes. > > Richard Sandiford <richard.sandif...@linaro.org> writes: > > This patch adds support for ARM's Scalable Vector Extension. > > The patch just contains the core features that work with the > > current vectoriser framework; later patches will add extra > > capabilities to both the target-independent code and AArch64 code. > > The patch doesn't include: > > > > - support for unwinding frames whose size depends on the vector length > > - modelling the effect of __tls_get_addr on the SVE registers > > > > These are handled by later patches instead. > > > > Some notes: > > > > - The copyright years for aarch64-sve.md start at 2009 because some of > > the code is based on aarch64.md, which also starts from then. > > > > - The patch inserts spaces between items in the AArch64 section > > of sourcebuild.texi. This matches at least the surrounding > > architectures and looks a little nicer in the info output. > > > > - aarch64-sve.md includes a pattern: > > > > while_ult<GPI:mode><PRED_ALL:mode> > > > > A later patch adds a matching "while_ult" optab, but the pattern > > is also needed by the predicate vec_duplicate expander.
I'm keen to take this. The code is good quality overall, I'm confident in your reputation and implementation. There are some parts of the design that I'm less happy about, but pragmatically, we should take this now to get the behaviour correct, and look to optimise, refactor, and clean-up in future. Sorry it took me a long time to get to the review. I've got no meaningful design concerns here, and certainly nothing so critical that we couldn't fix it after the fact in GCC 9 and up. That said... > (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p) I'm not a big fan of these sorts of functions which return a char* where we've dumped the text we want to print out in the short term. The interface (fill a static char[] we can then leak on return) is pretty ugly. One consideration for future work would be refactoring out aarch64.c - it is getting to be too big for my liking (near 18,000 lines). > (aarch64_expand_sve_mem_move) Do we have a good description of how SVE big-endian vectors work, <snip more comments - I found the detailed comment at the top of aarch64-sve.md> The sort of comment you write later ("see the comment at the head of aarch64-sve.md for details") would also be useful here as a reference. > aarch64_get_reg_raw_mode Do we assert/warn anywhere for users of __builtin_apply that they are fundamentally unsupported? > offset_4bit_signed_scaled_p So much code duplication here and in similair functions. Would a single interface (unsigned bits, bool signed, bool scaled) let you avoid the many identical functions? > aarch64_evpc_rev_local I'm likely missing something obvious, but what is the distinction you're drawing between global and local? Could you comment it? > aarch64-sve.md - scheduling types None of the instructions here have types for scheduling. That's going to make for a future headache. Adding them to the existing scheduling types is going to cause all sorts of trouble when building GCC (we already have too many types for some compilers to handle the structure!). We'll need to finds a solution to how we'll direct scheduling for SVE. > aarch64-sve.md - predicated operands It is a shame this ends up being so ugly and requiring UNSPEC_MERGE_PTRUE everywhere. That will block a lot of useful optimisation. Otherwise, this is OK for trunk. I'm happy to take it as is, and have the above suggestions applied as follow-ups if you think they are worth doing. Thanks, James