I chose to define Pmode as PDImode, and write PDI patterns for pointer moves & arithmetic. POINTER_SIZE is 64 bits, UNITS_PER_WORD is 4. FUNCTION_ARG_ADVANCE arranges for both SImode and PDImode values to occupy a single register. I have the port mostly working (passes 90% of execution tests), but find myself painted into a corner in some cases.
What currently vexes me is when GCC wants to promote a PDImode register (say r1) to DImode, then needs to truncate down to SImode for some kind of ALU op, say pointer subtraction. The desired quantity is the low-order 32 bits of r1, but GCC thinks the promotion to DImode implies a pair of 32-bit regs (r1, r2) and since this is a big-endian machine, it wants to deliver the low-order bits as the subreg r2.
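The mismatch described above can be modelled in a few lines of plain C. This is only a sketch, not GCC code: the register file, its size, and the function names are hypothetical, and it assumes each register can hold a full 64-bit pointer value whose low-order 32 bits are the quantity wanted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file; each entry models one register that can
   hold a full 64-bit PDImode value.  Size and names are assumptions. */
enum { NUM_REGS = 4 };

/* Low word as GCC picks it after promoting register n to DImode: it
   assumes a big-endian pair (n, n+1) and reads the second register. */
static inline uint32_t lowpart_as_di_pair(const uint64_t regs[NUM_REGS], int n)
{
    assert(n >= 0 && n + 1 < NUM_REGS);
    return (uint32_t)(regs[n + 1] & 0xFFFFFFFFu);
}

/* Low word as the port wants it: the low 32 bits of register n itself. */
static inline uint32_t lowpart_as_pdi(const uint64_t regs[NUM_REGS], int n)
{
    assert(n >= 0 && n < NUM_REGS);
    return (uint32_t)(regs[n] & 0xFFFFFFFFu);
}
```

With a pointer in r1 and an unrelated value in r2, `lowpart_as_pdi(regs, 1)` returns the pointer's low bits, while `lowpart_as_di_pair(regs, 1)` returns whatever happens to be in r2.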
Maybe you can define TRULY_NOOP_TRUNCATION to be zero for source PDImode and destination SImode, and define a truncatepdisi2 pattern that just throws away the segment. I'm not sure, however, whether GCC will go through DImode anyway.
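A rough standalone sketch of that idea follows. TRULY_NOOP_TRUNCATION takes the output and input precisions in bits rather than modes, so this version assumes PDImode has a precision of 64 and can only match on the 64-to-32 case; whether that can be told apart from a DImode source depends on how the port declares the mode. PDI_PREC, SI_PREC, truncate_pdi_to_si and check_truncation_model are hypothetical names, and the C function only models what a truncatepdisi2 pattern would compute.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed precisions in bits; PDI_PREC = 64 is an assumption about the port. */
#define PDI_PREC 64
#define SI_PREC  32

/* Nonzero when truncating INPREC bits to OUTPREC bits needs no insn.
   Here the 64 -> 32 case is declared NOT a no-op. */
#define TRULY_NOOP_TRUNCATION(OUTPREC, INPREC) \
  (!((INPREC) == PDI_PREC && (OUTPREC) == SI_PREC))

/* Model of a truncatepdisi2 pattern: drop the segment, keep the
   low-order 32 bits (assumes the segment lives in the high bits). */
static inline uint32_t truncate_pdi_to_si(uint64_t pdi)
{
    return (uint32_t)(pdi & 0xFFFFFFFFu);
}

/* Small self-check of the model. */
static inline void check_truncation_model(void)
{
    assert(!TRULY_NOOP_TRUNCATION(SI_PREC, PDI_PREC));
    assert(truncate_pdi_to_si(0x0000000712345678ULL) == 0x12345678u);
}
```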
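Alternatively, maybe you can define extendpdidi2 so that it will put the segment in r1 and the low-order bits in r2. That matches the big-endian (r1, r2) pair GCC expects for DImode, so the subreg r2 would then really hold the low-order 32 bits.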
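What such an extendpdidi2 expansion would have to compute can be sketched as below. This is a model in plain C, not a machine-description pattern; the register file, the function name, and the assumption that the segment sits in the high 32 bits of the PDImode value are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file, as a stand-in for the target's registers. */
enum { NUM_REGS = 4 };

/* Model of an extendpdidi2-style expansion: split the PDImode value so
   that the DImode result occupies a big-endian pair (dst, dst+1).
   Assumes the segment is the high 32 bits of the PDImode value. */
static inline void extend_pdi_to_di(uint64_t regs[NUM_REGS], int dst, uint64_t pdi)
{
    assert(dst >= 0 && dst + 1 < NUM_REGS);
    regs[dst]     = pdi >> 32;          /* segment -> first register      */
    regs[dst + 1] = pdi & 0xFFFFFFFFu;  /* low-order bits -> second one   */
}
```

After `extend_pdi_to_di(regs, 1, p)`, the low word that GCC reads from r2 is the low-order 32 bits of `p`, which is the value a later truncation to SImode needs.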
Paolo