Hi everyone, I'm porting a 64-bit algorithm to 32-bit PowerPC (an old PowerMac). The algorithm is simple when 64-bit arithmetic is available, but it gets a little ugly under 32-bit.
PowerPC has a "Vector Subtract Carryout Unsigned Word" instruction (vsubcuw), documented in the AltiVec Technology Programming Environments Manual: https://www.nxp.com/docs/en/reference-manual/ALTIVECPEM.pdf. The corresponding AltiVec intrinsics are vec_vsubcuw and vec_subc. The problem is, I don't know how to use it. I've been experimenting with it, but I don't see its use (yet). How does one use vsubcuw to implement a subtract with borrow?

Thanks in advance.

==========================================

Here's what an "add with carry" looks like. vec_addc computes the per-word carry-outs, vec_perm moves the carry bits from columns 1 and 3 to columns 0 and 2, and the final vec_add adds them into the result.

```cpp
typedef __vector unsigned char uint8x16_p;
typedef __vector unsigned int uint32x4_p;
...

inline uint32x4_p VecAdd64(const uint32x4_p& vec1, const uint32x4_p& vec2)
{
    // 64-bit elements available at POWER7 with VSX, but addudm requires POWER8
#if defined(_ARCH_PWR8)
    return (uint32x4_p)vec_add((uint64x2_p)vec1, (uint64x2_p)vec2);
#else
    const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
    const uint32x4_p zero = {0, 0, 0, 0};

    // Carry-out of each 32-bit word, moved from the low words (1, 3)
    // into the high words (0, 2) of each 64-bit lane
    uint32x4_p cy = vec_addc(vec1, vec2);
    cy = vec_perm(cy, zero, cmask);
    return vec_add(vec_add(vec1, vec2), cy);
#endif
}
```

==========================================

Here's what I have for subtract with borrow, expressed in terms of addition (vec1 + ~vec2 + 1 in each 64-bit lane). There are 4 loads and then 9 instructions. I know it's inefficient.

```cpp
const uint32x4_p mask = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff};
const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
const uint32x4_p zero = {0, 0, 0, 0};
const uint32x4_p one = {0, 1, 0, 1};

// One's complement, still need to add 1
uint32x4_p comp = vec_andc(mask, vec2);

// 64-bit add of 1 to the complement gives the two's complement of vec2
uint32x4_p cy = vec_addc(one, comp);
cy = vec_perm(cy, zero, cmask);
comp = vec_add(vec_add(one, comp), cy);

// 64-bit add of vec1 and the negated vec2
cy = vec_addc(vec1, comp);
cy = vec_perm(cy, zero, cmask);
return vec_add(vec_add(vec1, comp), cy);
```
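A minimal sketch, assuming the manual's description of vsubcuw: it stores the *complement* of the borrow (1 when vA >= vB as unsigned words, 0 when the subtraction borrows), not the borrow itself. The code below is a portable scalar model, not AltiVec code. `Vec32x4`, `model_vec_sub`, `model_vec_subc`, `model_sub64`, `from_lanes` and `lane` are hypothetical names standing in for `uint32x4_p`, `vec_sub`, `vec_subc` and the permute. It assumes big-endian element order (words 0 and 2 are the high halves of the two 64-bit lanes), as on an old PowerMac.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for uint32x4_p. Element 0 is the most
// significant word (big-endian order), so the 64-bit lanes are
// {words 0,1} and {words 2,3}, each stored as {high, low}.
using Vec32x4 = std::array<std::uint32_t, 4>;

// Models vec_sub (vsubuwm): per-word subtraction modulo 2^32.
Vec32x4 model_vec_sub(const Vec32x4& a, const Vec32x4& b) {
    Vec32x4 r{};
    for (int i = 0; i < 4; ++i) r[i] = static_cast<std::uint32_t>(a[i] - b[i]);
    return r;
}

// Models vec_subc (vsubcuw): per word, 1 if a - b does NOT borrow
// (a >= b), 0 if it borrows. This is the complement of the borrow.
Vec32x4 model_vec_subc(const Vec32x4& a, const Vec32x4& b) {
    Vec32x4 r{};
    for (int i = 0; i < 4; ++i) r[i] = (a[i] >= b[i]) ? 1u : 0u;
    return r;
}

// Assumed AltiVec sequence this mirrors (untested on hardware):
//   res = vec_sub(a, b);              // word differences, borrows ignored
//   bw  = vec_subc(a, b);             // 1 = no borrow, 0 = borrow
//   bw  = vec_andc(amask, bw);        // amask = {1,1,1,1}: now 1 = borrow
//   bw  = vec_perm(bw, zero, cmask);  // same cmask as VecAdd64
//   res = vec_sub(res, bw);           // subtract borrow from high words
Vec32x4 model_sub64(const Vec32x4& a, const Vec32x4& b) {
    Vec32x4 res = model_vec_sub(a, b);
    Vec32x4 bc = model_vec_subc(a, b);
    Vec32x4 bw{};  // words 1 and 3 stay zero, like the zero bytes in cmask
    // vec_andc({1,1,1,1}, bc) flips 0/1; the perm moves words 1,3 to 0,2.
    bw[0] = 1u & ~bc[1];
    bw[2] = 1u & ~bc[3];
    return model_vec_sub(res, bw);
}

// Hypothetical helpers: pack/unpack two 64-bit lanes in the order above.
Vec32x4 from_lanes(std::uint64_t lane0, std::uint64_t lane1) {
    return {static_cast<std::uint32_t>(lane0 >> 32), static_cast<std::uint32_t>(lane0),
            static_cast<std::uint32_t>(lane1 >> 32), static_cast<std::uint32_t>(lane1)};
}

std::uint64_t lane(const Vec32x4& v, int i) {
    assert(i == 0 || i == 1);
    return (static_cast<std::uint64_t>(v[2 * i]) << 32) | v[2 * i + 1];
}
```

For example, `model_sub64(from_lanes(0x100000000, 5), from_lanes(1, 7))` borrows in both low words and yields lanes `0xFFFFFFFF` and `0xFFFFFFFFFFFFFFFE`. The design point is that flipping the carry-out with `vec_andc` turns it into an ordinary borrow, so no two's complement of `vec2` is needed; if the model is right, that is five vector operations instead of nine, though it has not been verified on real AltiVec hardware.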