On 07/07/2018 12:13 AM, Jakub Kicinski wrote:
> Jiong says:
> 
> NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per
> step, therefore we need 2 steps for u16 and 4 steps for u32.
> 
> We also need one start instruction to initialize the sequence and one or
> two instructions to fetch the result depending on either you need the high
> halve of u32 multiplication.
> 
> For ALU64, if either operand is beyond u32's value range, we reject it. One
> thing to note, if the source operand is BPF_K, then we need to check "imm"
> field directly, and we'd reject it if it is negative.  Because for ALU64,
> "imm" (with s32 type) is expected to be sign extended to s64 which NFP mul
> doesn't support. For ALU32, it is fine for "imm" be negative though,
> because the result is 32-bits and here is no difference on the low halve
> of result for signed/unsigned mul, so we will get correct result.
> 
> NFP doesn't have integer divide instruction, this patch set uses reciprocal
> algorithm (the basic one, reciprocal_div) to emulate it.
> 
> For each u32 divide, we would need 11 instructions to finish the operation.
> 
>    7 (for multiplication) + 4 (various ALUs) = 11
> 
> Given NFP only supports multiplication no bigger than u32, we'd require
> divisor and dividend no bigger than that as well.
> 
> Also eBPF doesn't support signed divide and has enforced this on C language
> level by failing compilation. However LLVM assembler hasn't enforced this,
> so it is possible for negative constant to leak in as a BPF_K operand
> through assembly code, we reject such cases as well.
> 
> Meanwhile reciprocal_div.h only implemented the basic version of:
> 
>    "Division by Invariant Integers Using Multiplication"
>                           - Torbjörn Granlund and Peter L. Montgomery
> 
> This patch set further implements the optimized version (Figure 4.2 in the
> paper) inside existing reciprocal_div.h. When the divider is even and the
> calculated reciprocal magic number doesn't fit u32, we could reduce the
> required ALU instructions from 4 to 2 or 1 for some cases.
> 
> The advanced version requires more complex calculation to get the
> reciprocal multiplier and other control variables, but then could reduce
> the required emulation operations. It makes sense to use it for JIT divide
> code generation (for example eBPF JIT backends) for which we are willing to
> trade performance of JITed code with that of host.
> 
> v2:
>   - add warning in l == 32 code path. (Song Liu/Jakub)
>   - jit separate insn sequence for l == 32. (Jakub/Edwin)
>   - should use unrestricted operand for mul.
>   - once place should use "1U << exp" instead of "1 << exp".
>           
> Jiong Wang (6):
>   lib: reciprocal_div: implement the improved algorithm on the paper
>     mentioned
>   nfp: bpf: rename umin/umax to umin_src/umax_src
>   nfp: bpf: copy range info for all operands of all ALU operations
>   nfp: bpf: support u16 and u32 multiplications
>   nfp: bpf: support u32 divide using reciprocal_div.h
>   nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h

Applied to bpf-next, thanks everyone!

Reply via email to