Thanks for the review!

James Greenhalgh <james.greenha...@arm.com> writes:
> On Fri, Jan 05, 2018 at 11:41:25AM +0000, Richard Sandiford wrote:
>> Here's the patch updated to apply on top of the v8.4 and
>> __builtin_load_no_speculate support.  It also handles the new
>> vec_perm_indices and CONST_VECTOR encoding and uses VNx... names
>> for the SVE modes.
>> 
>> Richard Sandiford <richard.sandif...@linaro.org> writes:
>> > This patch adds support for ARM's Scalable Vector Extension.
>> > The patch just contains the core features that work with the
>> > current vectoriser framework; later patches will add extra
>> > capabilities to both the target-independent code and AArch64 code.
>> > The patch doesn't include:
>> >
>> > - support for unwinding frames whose size depends on the vector length
>> > - modelling the effect of __tls_get_addr on the SVE registers
>> >
>> > These are handled by later patches instead.
>> >
>> > Some notes:
>> >
>> > - The copyright years for aarch64-sve.md start at 2009 because some of
>> >   the code is based on aarch64.md, which also starts from then.
>> >
>> > - The patch inserts spaces between items in the AArch64 section
>> >   of sourcebuild.texi.  This matches at least the surrounding
>> >   architectures and looks a little nicer in the info output.
>> >
>> > - aarch64-sve.md includes a pattern:
>> >
>> >     while_ult<GPI:mode><PRED_ALL:mode>
>> >
>> >   A later patch adds a matching "while_ult" optab, but the pattern
>> >   is also needed by the predicate vec_duplicate expander.
>
> I'm keen to take this. The code is good quality overall, I'm confident in your
> reputation and implementation. There are some parts of the design that I'm
> less happy about, but pragmatically, we should take this now to get the
> behaviour correct, and look to optimise, refactor, and clean-up in future.
>
> Sorry it took me a long time to get to the review. I've got no meaningful
> design concerns here, and certainly nothing so critical that we couldn't
> fix it after the fact in GCC 9 and up.
>
> That said...
>
>>      (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
>
> I'm not a big fan of these sorts of functions which return a char* where
> we've dumped the text we want to print out in the short term. The interface
> (fill a static char[] we can then leak on return) is pretty ugly.

Yeah, it's not pretty, but I think the various possible ways of doing
the addition do justify using output functions here.  The distinction
between INC[BHWD], DEC[BHWD], ADDVL and ADDPL doesn't really affect
anything other than the final output, so it isn't something that
should be exposed as different constraints (for example).

We should probably "just" have a nicer interface for target code
to construct instruction format strings.

> One consideration for future work would be refactoring out aarch64.c - it is
> getting to be too big for my liking (near 18,000 lines).
>
>>      (aarch64_expand_sve_mem_move)
>
> Do we have a good description of how SVE big-endian vectors work, <snip more
> comments - I found the detailed comment at the top of aarch64-sve.md> 
>
> The sort of comment you write later ("see the comment at the head of
> aarch64-sve.md for details") would also be useful here as a reference.

Ah, yeah, will add a reference there too.

>> aarch64_get_reg_raw_mode
>
> Do we assert/warn anywhere for users of __builtin_apply that they are
> fundamentally unsupported?

Not as far as I know.  FWIW, this doesn't affect SVE (yet), because we
don't yet support any types that would be passed in the SVE-specific
part of the registers.

>> offset_4bit_signed_scaled_p
>
> So much code duplication here and in similair functions. Would a single
> interface (unsigned bits, bool signed, bool scaled) let you avoid the many
> identical functions?

We just kept to the existing style here.  I agree it might be a good idea
to consolidate them, but personally I'd prefer to keep the signed/scaled
distinction in the function name, since it's more readable than booleans
and shorter than a new enum.

>> aarch64_evpc_rev_local 
>
> I'm likely missing something obvious, but what is the distinction you're
> drawing between global and local? Could you comment it?

"global" reverses the whole vector: the first and last elements switch
places.  "local" reverses within groups of N consecutive elements but
not between them.

But yet again names are probably my downfall here. :-)  I'm happy to
call them something else instead.  Either way I'll expand the comments.

>> aarch64-sve.md - scheduling types
>
> None of the instructions here have types for scheduling. That's going to
> make for a future headache. Adding them to the existing scheduling types
> is going to cause all sorts of trouble when building GCC (we already have
> too many types for some compilers to handle the structure!). We'll need
> to finds a solution to how we'll direct scheduling for SVE.

Yeah.  I didn't want to add scheduling attributes now without scheduling
descriptions to go with them, since there's no way of knowing what the
division should be.

>> aarch64-sve.md - predicated operands
>
> It is a shame this ends up being so ugly and requiring UNSPEC_MERGE_PTRUE
> everywhere. That will block a lot of useful optimisation.

I don't think it blocks many in practice (at least, not the kind that
really do belong in RTL rather than gimple).  Most instructions map
directly to an optab and those that don't do combine OK in the
UNSPEC_MERGE_PTRUE form (e.g. AND + NOT -> BIC).

> Otherwise, this is OK for trunk. I'm happy to take it as is, and have the
> above suggestions applied as follow-ups if you think they are worth doing.

Thanks.  If we can reach quick agreement about the offset checks then
I'll roll in that change.

Richard

Reply via email to