On Wed, Apr 8, 2020 at 7:31 AM Mathieu Malaterre <ma...@debian.org> wrote:
>
> Jeffrey,
>
> On Wed, Apr 8, 2020 at 11:56 AM Jeffrey Walton <noloa...@gmail.com> wrote:
> >
> > On Tue, Apr 7, 2020 at 8:27 AM Lennart Sorensen
> > <lsore...@csclub.uwaterloo.ca> wrote:
> > >
> > > On Tue, Apr 07, 2020 at 05:51:54AM -0400, Jeffrey Walton wrote:
> > > > Hi Everyone,
> > > >
> > > > I'm porting a 64-bit algorithm to 32-bit PowerPC (an old PowerMac).
> > > > The algorithm is simple when 64-bit is available, but it gets a little
> > > > ugly under 32-bit.
> > > >
> > > > PowerPC has a "Vector Subtract Carryout Unsigned Word" (vsubcuw),
> > > > https://www.nxp.com/docs/en/reference-manual/ALTIVECPEM.pdf. The
> > > > altivec intrinsics are vec_vsubcuw and vec_subc.
> > > >
> > > > The problem is, I don't know how to use it. I've been experimenting
> > > > with it but I don't see the use (yet).
> > > >
> > > > How does one use vsubcuw to implement a subtract with borrow?
> > >
> > > Does your 32 bit powerpc have altivec? A lot do not. It is certainly
> > > not a universal feature. As far as I remember, G4 and G5 powermacs have
> > > it, but nothing older.
> >
> > Yes, this is an old PowerMac G4 with Power4. It has a Altivec unit,
> > but it is only 32-bit. Add, subtract, shift and rotate (and friends)
> > on 64-bit values are missing.
> >
> > As old as the hardware is (circa 2000), that old PowerPC chip
> > outperforms some modern hardware, like Atoms, Celerons and low-end ARM
> > cpu's in modern gadgets.
> >
> > Testing some algorithms, like Simon-128 and Speck-128, show a need for
> > Altivec. For example, Integer-based Speck-128 was running at about 70
> > cpb. Altivec-based Speck-128 dropped to 10 cpb even with me doing all
> > the 64-bit fixups. (Speck-128 runs around 2.5 cpb when the native
> > hardware supports 64-bit operations, like on Power8).
>
> [Somewhat off-topic here.]
>
> Did you ever tried crc32 with altivec ? crc32 with altivec in the
> kernel is only for ppc64.

No. I think the CRC32 support comes from Power8 and its in-core crypto,
which uses polynomial multiplies. Here's the fellow who has the reference
implementation and tutorial:
https://github.com/antonblanchard/crc32-vpmsum

I don't use CRC32 much. I do have GCM mode using polynomial multiplies
(along with Power8 AES). It runs around 1.3 cpb.

Jeff