My port failed the DImode part of the rotate regression-tests
(gcc.c-torture/execute/20020508-[123].c). I found that
emit_no_conflict_block() reordered insns gen'd by
expand_doubleword_shift() in a way that violated dependency between
compares and associated conditional-move insns that had the tar
James E Wilson <[EMAIL PROTECTED]> writes:
> Greg McGary wrote:
> > I found that
> > emit_no_conflict_block() reordered insns gen'd by
> > expand_doubleword_shift() in a way that violated dependency between
> > compares and associated conditional-move ins
I have a port for a multi-processor with high-latency memory accesses,
even for cache hits. Each CPU core has a small private scratchpad RAM
with 1 cycle access. I'd like to persuade GCC to use the scratchpad
(I'll probably allocate somewhere between 8 and 32 words) for reload,
rather than stack
Daniel Jacobowitz <[EMAIL PROTECTED]> writes:
> ... Or you could try telling the entire compiler to treat them as
> registers, instead of just reload. That's likely to work as well or
> better.
So, I define these as a separate register class, and only the movM
insn patterns get constraints that
I'm working with a machine that has a memory-increment insn. It's a
network-processor performance hack that allows no-latency accumulation
of statistical counters. The insn sends the increment and address to
the memory controller which does the add, avoiding the usual
long-latency read-increment-
Paul Brook <[EMAIL PROTECTED]> writes:
> It should just work if you have the appropriate movsi pattern/alternative.
> m68k has a memory-increment instruction (aka add :-).
Touche. I've had my head in RISC-land too long... 8^)
G
I'm doing a port for an unusual new machine which is 32-bit RISCy in
every way, except that it has 48-bit pointers. Pointers have a
high-order 16-bit segID and low-order 32-bit seg offset. Most ALU
instructions only work on 32 bits, zeroing the upper 16-bit seg ID in
the result. A few ALU op
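
A small model of the pointer layout described above may help make the hazard concrete. This is only a sketch: the helper names (`make_pointer`, `get_seg_id`, `get_offset`, `alu_add32`) are hypothetical, and the only behaviour modelled is the one stated in the message, namely that a 32-bit ALU operation zeroes the upper 16-bit segID of its result.

```python
# Hypothetical model of a 48-bit pointer: 16-bit segID above a 32-bit offset.
SEG_SHIFT = 32
OFFSET_MASK = 0xFFFFFFFF
SEG_MASK = 0xFFFF


def make_pointer(seg, off):
    """Pack a segID and a segment offset into one 48-bit value."""
    return ((seg & SEG_MASK) << SEG_SHIFT) | (off & OFFSET_MASK)


def get_seg_id(ptr):
    """Extract the high-order 16-bit segID."""
    return (ptr >> SEG_SHIFT) & SEG_MASK


def get_offset(ptr):
    """Extract the low-order 32-bit segment offset."""
    return ptr & OFFSET_MASK


def alu_add32(a, b):
    """Assumed 32-bit ALU add: the result keeps only the low 32 bits,
    so the segID field of the result is zero."""
    return (a + b) & OFFSET_MASK


ptr = make_pointer(0x00AB, 0x1000)
bumped = alu_add32(ptr, 4)
print(hex(get_seg_id(ptr)), hex(get_offset(ptr)))
print(hex(get_seg_id(bumped)), hex(get_offset(bumped)))
```

In this model, pointer arithmetic done with an ordinary 32-bit add yields the right offset but loses the segment, which is why such a port has to keep pointer-typed values away from the 32-bit-only operations.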
I have a port without div or mod machine instructions. I wrote
divmodsi4 patterns that do the libcall directly, hoping that GCC would
recognize the opportunity to use a single divmodsi4 to compute both
quotient and remainder. Alas, GCC calls divmodsi4 twice with the same
divisor and dividend
On 04/26/10 22:09, Ian Lance Taylor wrote:
Greg McGary writes:
I have a port without div or mod machine instructions. I wrote
divmodsi4 patterns that do the libcall directly, hoping that GCC would
recognize the opportunity to use a single divmodsi4 to compute both
quotient and remainder
On 04/28/10 05:58, Michael Matz wrote:
On Tue, 27 Apr 2010, Greg McGary wrote:
(define_insn "*udivmodsi4_libcall"
  [(set (reg:SI 4)
        (udiv:SI (reg:SI 1)
                 (reg:SI 2)))
   (set (reg:SI 1)
        (umod:SI (reg:SI 1)
                 (reg:SI 2)))
   (clobber (reg:SI 2))
reload() > setup_save_areas() > assign_stack_local_1() creates a mem
address whose offset is too large to fit into the machine insn's offset
operand. Later, reload() > save_call_clobbered_regs() > insert_save() >
adjust_address_1() > change_address_1() asserts because the address is
not legitimat
On 05/05/10 20:21, Jeff Law wrote:
On 05/05/10 17:45, Greg McGary wrote:
reload() > setup_save_areas() > assign_stack_local_1() creates a mem
address whose offset is too large to fit into the machine insn's offset
operand. Later, reload() > save_call_clobbered_regs()
On 05/05/10 21:27, Jeff Law wrote:
On 05/05/10 21:34, Greg McGary wrote:
On 05/05/10 20:21, Jeff Law wrote:
I'm not sure they are ever legitimized -- IIRC caller-save tries to only
generate addressing modes which are safe for precisely this reason.
Apparently not so: c
I'm working on a port that has instructions to move bits between
64-bit floating-point and 64-bit general-purpose regs. I say "bits"
because there's no conversion between float and int: the bit pattern
is unaltered. Therefore, it's possible to use scratch FPRs for
spilling GPRs & vice-versa, and
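
The distinction between moving bits and converting values can be shown with a short sketch. It uses Python's standard `struct` module purely as a stand-in for the hardware moves; the function names `fpr_to_gpr_bits` and `gpr_to_fpr_bits` are hypothetical and do not correspond to any instruction or GCC pattern name.

```python
import struct


def fpr_to_gpr_bits(value):
    """Reinterpret the 64-bit pattern of a float as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def gpr_to_fpr_bits(bits):
    """Reinterpret a 64-bit unsigned integer pattern as a float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# A bit move is not a conversion: 1.0 does not become the integer 1.
print(hex(fpr_to_gpr_bits(1.0)))   # IEEE 754 encoding of 1.0
print(int(1.0))                    # value conversion, for contrast

# Spilling an integer through a float register and back keeps every bit.
spilled = 0x0123456789ABCDEF       # arbitrary non-NaN bit pattern
restored = fpr_to_gpr_bits(gpr_to_fpr_bits(spilled))
print(restored == spilled)
```

Because the round trip is lossless, a register of one file can act as a spill slot for the other, which is the property the message relies on.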
I'm working on a port that does loads & stores in two phases.
Every load/store is funneled through the intermediate registers "ld" and "st"
standing between memory and the rest of the register file.
Example:
ld=4(rB)
...
...
rC=ld
st=rD
8(rB)=st
rB
On 04/27/12 14:31, Greg McGary wrote:
> I'm working on a port that does loads & stores in two phases.
> Every load/store is funneled through the intermediate registers "ld" and "st"
> standing between memory and the rest of the registe
I'm working on a DSP port whose unit reservations are very sensitive to
operand signature. E.g., for an assembler mnemonic, there can be 35-50
different combinations of operand register classes, each having different
impacts on latencies and function units. For assembler code generation, very
few
On 05/11/12 16:00, Greg McGary wrote:
> My question is this: does it make sense to double MAX_RECOG_ALTERNATIVES so
> that I can use insn attributes to identify operand signatures, or should I use
> another approach?
After some exploration, I don't see that another approach is
When the timing requirements are not met upon queueing an insn with
INSN_EXACT_TICK, the scheduler backtracks. This seems wasteful.
Why not prioritize INSN_EXACT_TICK insns so that we queue them
first on the cycle they need?
I'm working on a port to a VLIW DSP with an exposed pipeline (i.e., no
interlocks). Some operations OP have as much as 2-cycle latency on values
of the call-preserved regs CPR. E.g., if the callee's epilogue restores a
CPR in the delay slot of the return instruction, then any OP with that CPR
as input
On 11/25/12 23:33, Maxim Kuvyrkov wrote:
> You essentially need a fix-up pass just before the end of compilation
> (machine-dependent reorg, if memory serves me right) to space instructions
> consuming values from CPRs from the CALL_INSNS that set those CPRs. I.e.,
> for the 99% of compilation
On 11/26/12 12:46, Maxim Kuvyrkov wrote:
> I wonder if "kludgy fixups" refers to the dummy-instruction solution I
> mentioned above. The complete dependence graph is a myth. You cannot have a
> complete dependence graph for a function -- scheduler works on DAG regions
> (and I doubt it will e
I extracted the MFPGPR hunks from Peter Bergner's "[PATCH] Add POWER6
machine description", posted on 2006-11-01 and dropped them into
gcc-4.0.3, but the result fails with "error: insn does not satisfy its
constraints":
.../src/gcc-4.0.3/gcc/config/rs6000/darwin-ldouble.c: In function
'__gcc_
I am revisiting an effort to make the number of lanes for vector segment
load/store a tunable parameter.
A year ago, Robin added minimal and not-yet-tunable
common_vector_cost::segment_permute_[2-8]
Some issues & questions:
* Since this pertains only to segment load/store, why is the word "permu
On Wed, Mar 26, 2025 at 1:44 AM Robin Dapp wrote:
> > You won't see failures in the testsuite. The failures only show up when I
> > attempt to impose huge costs on NF above threshold. A quick & dirty way to
> > expose the bug is to apply the appended patch, then observe that you get output
> > f
On Tue, Mar 25, 2025 at 2:47 AM Robin Dapp wrote:
> > A year ago, Robin added minimal and not-yet-tunable
> > common_vector_cost::segment_permute_[2-8]
>
> But it is tunable, just not a param? :)
I meant "param" generically, not necessarily a command-line --param=thingy,
though point taken! :)
I have been working on tuning vector transpose within groups of vregs.
The canonical approach is to make multiple passes across pairs of rows,
zipping row pairs first at the API element width, then at double SEW,
continuing to double SEW at each new pass until the width reaches VLEN/2
at the final
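
The multi-pass zip scheme described above can be modelled in scalar code for the smallest interesting case, a 4x4 group. This is a sketch under assumptions: rows are plain Python lists standing in for vregs, and `zip_lo`, `zip_hi`, and `transpose4x4` are hypothetical helpers, not RVV intrinsics or GCC internals. The first pass zips at one element, the second at double that width, which for four-element rows is already half the row.

```python
def zip_lo(a, b, width):
    """Interleave width-element groups taken from the low halves of a and b."""
    half = len(a) // 2
    out = []
    for i in range(0, half, width):
        out += a[i:i + width] + b[i:i + width]
    return out


def zip_hi(a, b, width):
    """Interleave width-element groups taken from the high halves of a and b."""
    half = len(a) // 2
    out = []
    for i in range(half, len(a), width):
        out += a[i:i + width] + b[i:i + width]
    return out


def transpose4x4(rows):
    """Transpose four 4-element rows with two zip passes."""
    r0, r1, r2, r3 = rows
    # Pass 1: zip adjacent row pairs at single-element width.
    t0, t1 = zip_lo(r0, r1, 1), zip_hi(r0, r1, 1)
    t2, t3 = zip_lo(r2, r3, 1), zip_hi(r2, r3, 1)
    # Pass 2: zip at double width (two elements, i.e. half a row).
    return [zip_lo(t0, t2, 2), zip_hi(t0, t2, 2),
            zip_lo(t1, t3, 2), zip_hi(t1, t3, 2)]


rows = [[0, 1, 2, 3],
        [10, 11, 12, 13],
        [20, 21, 22, 23],
        [30, 31, 32, 33]]
for row in transpose4x4(rows):
    print(row)
```

Larger groups would need more passes with the width doubling each time; the row pairing changes from pass to pass, so this sketch deliberately does not generalize beyond the 4x4 case.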
27 matches