On 06/23/2018 06:26 AM, Dimitar Dimitrov wrote:
> On Friday, 22 June 2018, 19:41:55 EEST Jakub Jelinek wrote:
>> On Fri, Jun 22, 2018 at 11:33:06AM -0600, Jeff Law wrote:
>>> On 06/13/2018 12:58 PM, Dimitar Dimitrov wrote:
>>>> The PRU load/store instructions can access memory with byte
>>>> granularity for all 30 of its 32-bit GP registers. Examples:
>>>> # Load 17 bytes from address r0[0] into registers r10.b1-r14.b2
>>>> lbbo r10.b1, r0, 0, 17
>>>>
>>>> # Load 100 bytes from address r28[0] into registers r0-r25
>>>> lbbo r0.b0, r28, 0, 100
>>>>
>>>> The load/store multiple patterns declare all subsequent registers
>>>> as distinct operands. Hence the need to increase the limit.
>>
>> Can't you have a look on how other targets, e.g. arm, aarch64, s390x
>> etc. handle load/store multiple patterns, e.g. with match_parallel or
>> match_par_dup?
>> The instructions then don't have dozens of operands, and the predicate
>> is just supposed to check everything is the way it should be.
> I took arm/ldmstm.md as an inspiration. See attached machine description for
> PRU that requires the increase. I omitted this machine-generated MD file from
> my first patch set, but per comments will include it in v2.
>
> PRU has a total of 32 32-bit registers with flexible subregister addressing.
> The PRU GCC port represents the register file as 128 individual 8-bit
> registers. Rationale: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html
>
> Load/store instructions can load anywhere between 1 and 124 consecutive 8-bit
> registers. The load/store-multiple patterns seem to require const_int_operand
> offsets for each loaded register, hence the explosion of operands.
>
> I make no distinction for class - patterns accept any GP register.

Right, but is that level of generality really all that useful?
Based on what I know about the PRU, I'd probably stick mostly to 32-bit registers and only expose the byte-level addressability when it's clearly a win, particularly for bitfield insertions/extractions. I probably wouldn't expose operations which cross 32-bit boundaries, except perhaps for arithmetic through the carry.

I guess my point is I'd like to see a stronger justification for exposing this much of the architecture before bumping up the maximum operand limits.

jeff