emit_no_conflict_block breaks some conditional moves

2005-04-20 Thread Greg McGary
My port failed the DImode part of the rotate regression-tests (gcc.c-torture/execute/20020508-[123].c). I found that emit_no_conflict_block() reordered insns gen'd by expand_doubleword_shift() in a way that violated dependency between compares and associated conditional-move insns that had the tar

Re: emit_no_conflict_block breaks some conditional moves

2005-04-23 Thread Greg McGary
James E Wilson <[EMAIL PROTECTED]> writes: > Greg McGary wrote: > > I found that > > emit_no_conflict_block() reordered insns gen'd by > > expand_doubleword_shift() in a way that violated dependency between > > compares and associated conditional-move ins

How to use a fast scratchpad-RAM for fill/spill ?

2005-05-11 Thread Greg McGary
I have a port for a multi-processor with high-latency memory accesses, even for cache hits. Each CPU core has a small private scratchpad RAM with 1 cycle access. I'd like to persuade GCC to use the scratchpad (I'll probably allocate somewhere between 8 and 32 words) for reload, rather than stack

Re: How to use a fast scratchpad-RAM for fill/spill ?

2005-05-11 Thread Greg McGary
Daniel Jacobowitz <[EMAIL PROTECTED]> writes: > ... Or you could try telling the entire compiler to treat them as > registers, instead of just reload. That's likely to work as well or > better. So, I define these as a separate register class, and only the movM insn patterns get constraints that

Insn for direct increment of memory?

2005-09-24 Thread Greg McGary
I'm working with a machine that has a memory-increment insn. It's a network-processor performance hack that allows no-latency accumulation of statistical counters. The insn sends the increment and address to the memory controller which does the add, avoiding the usual long-latency read-increment-

Re: Insn for direct increment of memory?

2005-09-24 Thread Greg McGary
Paul Brook <[EMAIL PROTECTED]> writes: > It should just work if you have the appropriate movsi pattern/alternative. > m68k has a memory-increment instruction (aka add :-). Touché. I've had my head in RISC-land too long... 8^) G

How to deal with 48-bit pointers and 32-bit integers

2009-08-12 Thread Greg McGary
I'm doing a port for an unusual new machine which is 32-bit RISCy in every way, except that it has 48-bit pointers. Pointers have a high-order 16-bit segID and low-order 32-bit seg offset. Most ALU instructions only work on 32 bits, zeroing the upper 16-bit seg ID in the result. A few ALU op
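As a rough illustration of the layout described above, the following C sketch models a 48-bit pointer as a 16-bit segment ID above a 32-bit offset, held in a 64-bit integer. The names (`ptr48_t`, `ptr48_make`, `ptr48_seg`, `ptr48_off`, `alu32_add`) are hypothetical and not taken from the port. `alu32_add` models only the stated behaviour that a 32-bit ALU result has its segment ID zeroed.

```c
#include <stdint.h>

/* Hypothetical model: a 48-bit pointer stored in the low 48 bits of a
   64-bit integer, segment ID in bits 47..32, offset in bits 31..0. */
typedef uint64_t ptr48_t;

#define PTR48_SEG_SHIFT   32
#define PTR48_OFFSET_MASK UINT64_C(0xFFFFFFFF)

/* Build a pointer from a segment ID and a segment offset. */
static ptr48_t ptr48_make(uint16_t seg, uint32_t off)
{
    return ((uint64_t)seg << PTR48_SEG_SHIFT) | off;
}

/* Extract the high-order 16-bit segment ID. */
static uint16_t ptr48_seg(ptr48_t p)
{
    return (uint16_t)(p >> PTR48_SEG_SHIFT);
}

/* Extract the low-order 32-bit segment offset. */
static uint32_t ptr48_off(ptr48_t p)
{
    return (uint32_t)(p & PTR48_OFFSET_MASK);
}

/* Model of an ordinary 32-bit ALU add: only the low 32 bits take part,
   and the segment ID of the result is zero. */
static ptr48_t alu32_add(ptr48_t a, ptr48_t b)
{
    return (ptr48_t)(uint32_t)(ptr48_off(a) + ptr48_off(b));
}
```

In this model, passing a pointer through `alu32_add` loses its segment ID, which is the hazard a port of this kind has to keep away from pointer arithmetic.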

redundant divmodsi4 not optimized away

2010-04-26 Thread Greg McGary
I have a port without div or mod machine instructions. I wrote divmodsi4 patterns that do the libcall directly, hoping that GCC would recognize the opportunity to use a single divmodsi4 to compute both quotient and remainder. Alas, GCC calls divmodsi4 twice with the same divisor and dividend
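The situation can be reduced to a few lines of C. In this sketch, `udivmod32` is a hypothetical stand-in for a combined libcall that returns quotient and remainder together (it is a plain shift-and-subtract routine, not the port's actual one), and `split_digit` is the kind of caller where one combined call is enough because both results come from the same dividend and divisor.

```c
#include <stdint.h>

/* Hypothetical result type for a combined divide/modulo routine. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod_result;

/* Shift-and-subtract unsigned division producing quotient and remainder
   in one pass.  The caller must not pass d == 0. */
static udivmod_result udivmod32(uint32_t n, uint32_t d)
{
    uint64_t rem = 0;   /* 64 bits so the left shift cannot overflow */
    uint32_t quot = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1u);
        if (rem >= d) {
            rem -= d;
            quot |= UINT32_C(1) << i;
        }
    }

    udivmod_result r = { quot, (uint32_t)rem };
    return r;
}

/* A caller that needs both n / 10 and n % 10: a single combined call
   serves both uses. */
static udivmod_result split_digit(uint32_t n)
{
    return udivmod32(n, 10);
}
```

Written with the `/` and `%` operators instead, the same caller is the case where one would hope for a single divmodsi4 expansion rather than two.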

Re: redundant divmodsi4 not optimized away

2010-04-27 Thread Greg McGary
On 04/26/10 22:09, Ian Lance Taylor wrote: Greg McGary writes: I have a port without div or mod machine instructions. I wrote divmodsi4 patterns that do the libcall directly, hoping that GCC would recognize the opportunity to use a single divmodsi4 to compute both quotient and remainder

Re: redundant divmodsi4 not optimized away

2010-04-28 Thread Greg McGary
On 04/28/10 05:58, Michael Matz wrote: On Tue, 27 Apr 2010, Greg McGary wrote: (define_insn "*udivmodsi4_libcall" [(set (reg:SI 4) (udiv:SI (reg:SI 1) (reg:SI 2))) (set (reg:SI 1) (umod:SI (reg:SI 1) (reg:SI 2))) (clobber (reg:SI 2))

where are caller-save addresses legitimized?

2010-05-05 Thread Greg McGary
reload() > setup_save_areas() > assign_stack_local_1() creates a mem address whose offset is too large to fit into the machine insn's offset operand. Later, reload() > save_call_clobbered_regs() > insert_save() > adjust_address_1() > change_address_1() asserts because the address is not legitimat

Re: where are caller-save addresses legitimized?

2010-05-05 Thread Greg McGary
On 05/05/10 20:21, Jeff Law wrote: On 05/05/10 17:45, Greg McGary wrote: reload() > setup_save_areas() > assign_stack_local_1() creates a mem address whose offset is too large to fit into the machine insn's offset operand. Later, reload() > save_call_clobbered_regs()

Re: where are caller-save addresses legitimized?

2010-05-07 Thread Greg McGary
On 05/05/10 21:27, Jeff Law wrote: On 05/05/10 21:34, Greg McGary wrote: On 05/05/10 20:21, Jeff Law wrote: I'm not sure they are ever legitimized -- IIRC caller-save tries to only generate addressing modes which are safe for precisely this reason. Apparently not so: c

insns for register-move between general and floating

2006-03-21 Thread Greg McGary
I'm working on a port that has instructions to move bits between 64-bit floating-point and 64-bit general-purpose regs. I say "bits" because there's no conversion between float and int: the bit pattern is unaltered. Therefore, it's possible to use scratch FPRs for spilling GPRs & vice-versa, and

IRA and two-phase load/store

2012-04-27 Thread Greg McGary
I'm working on a port that does loads & stores in two phases. Every load/store is funneled through the intermediate registers "ld" and "st" standing between memory and the rest of the register file. Example: ld=4(rB) ... ... rC=ld st=rD 8(rB)=st rB

Re: IRA and two-phase load/store

2012-04-27 Thread Greg McGary
On 04/27/12 14:31, Greg McGary wrote: > I'm working on a port that does loads & stores in two phases. > Every load/store is funneled through the intermediate registers "ld" and "st" > standing between memory and the rest of the registe

Maybe expand MAX_RECOG_ALTERNATIVES ?

2012-05-11 Thread Greg McGary
I'm working on a DSP port whose unit reservations are very sensitive to operand signature. E.g., for an assembler mnemonic, there can be 35-50 different combinations of operand register classes, each having different impacts on latencies and function units. For assembler code generation, very few

Re: Maybe expand MAX_RECOG_ALTERNATIVES ?

2012-05-11 Thread Greg McGary
On 05/11/12 16:00, Greg McGary wrote: > My question is this: does it make sense to double MAX_RECOG_ALTERNATIVES so > that I can use insn attributes to identify operand signatures, or should I use > another approach? After some exploration, I don't see that another approach is

INSN_EXACT_TICK & scheduler backtrack

2012-09-13 Thread Greg McGary
When the timing requirements are not met upon queueing an insn with INSN_EXACT_TICK, the scheduler backtracks. This seems wasteful. Why not prioritize INSN_EXACT_TICK insns so that we queue them first on the cycle they need?

Dependences for call-preserved regs on exposed pipeline target?

2012-11-25 Thread Greg McGary
I'm working on a port to a VLIW DSP with an exposed pipeline (i.e., no interlocks). Some operations OP have as much as 2-cycle latency on values of the call-preserved regs CPR. E.g., if the callee's epilogue restores a CPR in the delay slot of the return instruction, then any OP with that CPR as input

Re: Dependences for call-preserved regs on exposed pipeline target?

2012-11-26 Thread Greg McGary
On 11/25/12 23:33, Maxim Kuvyrkov wrote: > You essentially need a fix-up pass just before the end of compilation > (machine-dependent reorg, if memory serves me right) to space instructions > consuming values from CPRs from the CALL_INSNS that set those CPRs. I.e., > for the 99% of compilation

Re: Dependences for call-preserved regs on exposed pipeline target?

2012-11-26 Thread Greg McGary
On 11/26/12 12:46, Maxim Kuvyrkov wrote: > I wonder if "kludgy fixups" refers to the dummy-instruction solution I > mentioned above. The complete dependence graph is a myth. You cannot have a > complete dependence graph for a function -- scheduler works on DAG regions > (and I doubt it will e

Trouble with powerpc64 mfpgpr patch

2007-07-12 Thread Greg McGary
I extracted the MFPGPR hunks from Peter Bergner's "[PATCH] Add POWER6 machine description", posted on 2006-11-01 and dropped them into gcc-4.0.3, but the result fails with "error: insn does not satisfy its constraints": .../src/gcc-4.0.3/gcc/config/rs6000/darwin-ldouble.c: In function '__gcc_

[RISC-V] vector segment load/store width as a riscv_tune_param

2025-03-24 Thread Greg McGary
I am revisiting an effort to make the number of lanes for vector segment load/store a tunable parameter. A year ago, Robin added minimal and not-yet-tunable common_vector_cost::segment_permute_[2-8] Some issues & questions: * Since this pertains only to segment load/store, why is the word "permu

Re: [RISC-V] vector segment load/store width as a riscv_tune_param

2025-03-26 Thread Greg McGary
On Wed, Mar 26, 2025 at 1:44 AM Robin Dapp wrote: > > You won't see failures in the testsuite. The failures only show up when I > > attempt to impose huge costs on NF above threshold. A quick & dirty way > to > > expose the bug is to apply the appended patch, then observe that you get > output > > f

Re: [RISC-V] vector segment load/store width as a riscv_tune_param

2025-03-25 Thread Greg McGary
On Tue, Mar 25, 2025 at 2:47 AM Robin Dapp wrote: > > A year ago, Robin added minimal and not-yet-tunable > > common_vector_cost::segment_permute_[2-8] > > But it is tunable, just not a param? :) I meant "param" generically, not necessarily a command-line --param=thingy, though point taken! :)

[RISC-V][RVV] wide bitfield insertion & extractions to/from vector regs

2025-07-24 Thread Greg McGary
I have been working on tuning vector transpose within groups of vregs. The canonical approach is to make multiple passes across pairs of rows, zipping row pairs first at the API element width, then at double SEW, continuing to double SEW at each new pass until the width reaches VLEN/2 at the final
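The doubling-width idea can be modelled in scalar C. This is only a sketch of the general technique, not the RVV instruction sequence under discussion: it works on a plain `N` x `N` array (`N` and `transpose_by_doubling` are hypothetical names), and each pass exchanges element groups of width `w` between row `i` and row `i + w`, standing in for whatever zip or permute instruction the real code uses. The width `w` doubles each pass and the last pass runs at `N / 2`, which mirrors the widening from the element width up to VLEN/2.

```c
#include <stddef.h>
#include <stdint.h>

#define N 8  /* hypothetical: rows and elements per row, a power of two */

/* In-place transpose in log2(N) passes.  In the pass for width w, rows
   are paired as (i, i + w) for every i whose bit w is clear, and within
   each pair the off-diagonal groups of width w are exchanged.  Each
   pass swaps one bit between the row index and the column index, so
   after all passes the two indices have traded places. */
static void transpose_by_doubling(uint32_t m[N][N])
{
    for (size_t w = 1; w < N; w <<= 1) {
        for (size_t i = 0; i < N; i++) {
            if (i & w)
                continue;           /* i is the upper row of its pair */
            for (size_t j = 0; j < N; j++) {
                if (j & w)
                    continue;       /* j is the left column of its pair */
                uint32_t t = m[i][j + w];
                m[i][j + w] = m[i + w][j];
                m[i + w][j] = t;
            }
        }
    }
}
```

With `N` equal to 8 this takes three passes (w = 1, 2, 4) instead of visiting each off-diagonal pair individually.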