On 6/15/20 1:53 PM, Lijun Pan wrote: >>> +static inline void uint128_add(uint64_t ah, uint64_t al, uint64_t bh, >>> + uint64_t bl, uint64_t *rh, uint64_t *rl, uint64_t *ca) >>> +{ >>> + __uint128_t a = (__uint128_t)ah << 64 | (__uint128_t)al; >>> + __uint128_t b = (__uint128_t)bh << 64 | (__uint128_t)bl; >>> + __uint128_t r = a + b; >>> + >>> + *rh = (uint64_t)(r >> 64); >>> + *rl = (uint64_t)r; >>> + *ca = (~a < b); >>> +} >> >> This is *not* what I had in mind at all. >> >> int128.h should be operating on Int128, and *not* component uint64_t values. > > Should uint128_add() be included in a new file called uint128.h? or still at > host-utils.h?
If you want this sort of specific operation, you should leave it in target/ppc/. I had been hoping that you could make use of Int128 as-is, or with minimal adjustment in the same style. > vmsumudm/vmsumcud operate as follows: > 1. 128-bit prod1 = (high 64 bits of a) * (high 64 bits of b), // I reuse > mulu64() > 2. 128-bit prod2 = (high 64 bits of b) * (high 64 bits of b), // I reuse > mulu64() > 3. 128-bit result = prod1 + prod2 + c; // I added addu128() in v1, renamed it > to uint128_add() in v2 Really? That seems a very odd computation. Your code, > + prod1 = (__uint128_t)ah * (__uint128_t)bh; > + prod2 = (__uint128_t)al * (__uint128_t)bl; > + r = prod1 + prod2 + c; is slightly different, but still very odd. Why would we be adding the intermediate 128th bit of the 256-bit product (prod1, bit 0) with the 0th bit of the 256-bit product (prod2, bit 0). Unfortunately, I can't find the v3.1 spec online yet, so I can't look at this myself. What is the instruction supposed to produce? > To better understand your request, may I ask you several questions: > 1. keep mulsum() in target/ppc/int_helper.c? Probably. > If so, it will inevitably have #ifdef CONFIG_INT128 #else #endif in that > function. No, you don't have to ifdef. You can use uint64_t alone and not rely on compiler support for __uint128_t at all. r~