On 07/07/2018 12:13 AM, Jakub Kicinski wrote: > Jiong says: > > NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per > step, therefore we need 2 steps for u16 and 4 steps for u32. > > We also need one start instruction to initialize the sequence and one or > two instructions to fetch the result depending on either you need the high > halve of u32 multiplication. > > For ALU64, if either operand is beyond u32's value range, we reject it. One > thing to note, if the source operand is BPF_K, then we need to check "imm" > field directly, and we'd reject it if it is negative. Because for ALU64, > "imm" (with s32 type) is expected to be sign extended to s64 which NFP mul > doesn't support. For ALU32, it is fine for "imm" be negative though, > because the result is 32-bits and here is no difference on the low halve > of result for signed/unsigned mul, so we will get correct result. > > NFP doesn't have integer divide instruction, this patch set uses reciprocal > algorithm (the basic one, reciprocal_div) to emulate it. > > For each u32 divide, we would need 11 instructions to finish the operation. > > 7 (for multiplication) + 4 (various ALUs) = 11 > > Given NFP only supports multiplication no bigger than u32, we'd require > divisor and dividend no bigger than that as well. > > Also eBPF doesn't support signed divide and has enforced this on C language > level by failing compilation. However LLVM assembler hasn't enforced this, > so it is possible for negative constant to leak in as a BPF_K operand > through assembly code, we reject such cases as well. > > Meanwhile reciprocal_div.h only implemented the basic version of: > > "Division by Invariant Integers Using Multiplication" > - Torbjörn Granlund and Peter L. Montgomery > > This patch set further implements the optimized version (Figure 4.2 in the > paper) inside existing reciprocal_div.h. When the divider is even and the > calculated reciprocal magic number doesn't fit u32, we could reduce the > required ALU instructions from 4 to 2 or 1 for some cases. > > The advanced version requires more complex calculation to get the > reciprocal multiplier and other control variables, but then could reduce > the required emulation operations. It makes sense to use it for JIT divide > code generation (for example eBPF JIT backends) for which we are willing to > trade performance of JITed code with that of host. > > v2: > - add warning in l == 32 code path. (Song Liu/Jakub) > - jit separate insn sequence for l == 32. (Jakub/Edwin) > - should use unrestricted operand for mul. > - once place should use "1U << exp" instead of "1 << exp". > > Jiong Wang (6): > lib: reciprocal_div: implement the improved algorithm on the paper > mentioned > nfp: bpf: rename umin/umax to umin_src/umax_src > nfp: bpf: copy range info for all operands of all ALU operations > nfp: bpf: support u16 and u32 multiplications > nfp: bpf: support u32 divide using reciprocal_div.h > nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h
Applied to bpf-next, thanks everyone!