I have a port for a multiprocessor in which memory accesses have high latency, even on cache hits. Each CPU core has a small private scratchpad RAM with single-cycle access. I'd like to persuade GCC to use the scratchpad (I'll probably allocate somewhere between 8 and 32 words of it) for reload spills, rather than stack slots, which have much higher latency. I have some half-formed ideas about how to do this; one would be to describe the scratchpad words as another class of register that can only be moved to and from the general registers. I'm still trying to understand secondary reload well enough to determine whether that's the mechanism I want.
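To make the register-class idea a bit more concrete, here is a toy model in plain C of the constraint I have in mind. It is not GCC code and does not use any GCC interface: every name in it (loc_kind, reg_class_sketch, direct_move_ok, intermediate_class) is made up for illustration. It only models the question I believe a secondary-reload mechanism would have to answer, namely "does this copy need to go through a general register?"

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical places a value can live.  Illustration only; these
   names do not come from GCC or from any real port.  */
enum loc_kind { LOC_GENERAL_REG, LOC_SCRATCHPAD, LOC_MEMORY };

/* Hypothetical register classes, again for illustration only.  */
enum reg_class_sketch { SK_NO_REGS, SK_GENERAL_REGS, SK_SCRATCHPAD_REGS };

/* Assumed rule: a copy is possible in one instruction only if at
   least one side is a general register.  So scratchpad<->memory,
   scratchpad<->scratchpad and memory<->memory are not direct.  */
static bool
direct_move_ok (enum loc_kind from, enum loc_kind to)
{
  return from == LOC_GENERAL_REG || to == LOC_GENERAL_REG;
}

/* Return the class of intermediate register needed for the copy,
   or SK_NO_REGS if the copy can be done directly.  */
static enum reg_class_sketch
intermediate_class (enum loc_kind from, enum loc_kind to)
{
  return direct_move_ok (from, to) ? SK_NO_REGS : SK_GENERAL_REGS;
}

/* Small self-check of the rule above.  */
static void
self_check (void)
{
  /* Spilling a general register to the scratchpad is direct.  */
  assert (intermediate_class (LOC_GENERAL_REG, LOC_SCRATCHPAD) == SK_NO_REGS);
  /* Loading a scratchpad word from memory needs a general register.  */
  assert (intermediate_class (LOC_MEMORY, LOC_SCRATCHPAD) == SK_GENERAL_REGS);
}
```

If the scratchpad words were exposed as a register class, I'd expect the port to have to express something equivalent to intermediate_class, but I don't yet know whether that is the right way to go about it.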
Comments and suggestions are welcome! Pithy clues (e.g., "Look at the port for machine XYZ") are fine; I can dig out the details if given broad hints.

Greg