On 28/09/2018 14:36, Edward Cree wrote:
> On 26/09/18 23:16, Jiong Wang wrote:
>> On 22/08/2018 20:00, Edward Cree wrote:
>>> In the future this idea may be extended to form use-def chains.
>>
>>   1. instruction level use->def chain
>>
>>      - new use->def chains for each instruction. one eBPF insn could have two
>>        uses at maximum.
> I was thinking of something a lot weaker/simpler, just making
>     ld rX, rY
>  copy rY.parent into rX.parent and not read-mark rY (whereas actual
>  arithmetic, pointer deref etc. would still create read marks).

Thanks for the feedback Edward.

> But what you've described sounds interesting; perhaps it would also
>  help later with loop-variable handling?

Haven't considered how to use this for loop-variable handling, guess you mean applying what I have described to your previous loop detection RFC? I will look
into your RFC later.

At the moment the design of the use->def chain is mainly to optimize 32-bit
code-gen. I was about to satisfied with a local implementation and to share it
to ML for further discussion. However, when manually check the optimization
result on testcase with medium size (~1000 eBPF insns) and proper complexity
(make sure path prunes etc are triggered inside verifier), I found the code-gen
doesn't meet my expectation.

For example, for the following sequence, insn at 25 should operate on full-64
bit but I found it is marked as 32-bit safe.

  25:    r7 = 1
  26:    if r4 > r8 goto +1200 <L>
  27:    r1 = *(u8 *)(r1 + 0)
  28:    r1 &= 15
  29:    r7 = 1
  ...

L:
  1227:  r0 = r7
  1228:  exit

As described at previous email, the algorithm assume all insns are 32-bit safe
first, then start to insns back to "64-bit" if there is any 64-bit use found
for a insn.

Insn 25 is defining r7 which is used at the 1227 where its value propagated to
r0 and then r0 is implicitly used at insn 1228 as it is a exit from main
function to external.

For above example, as we don't know the external use of r0 at 1228 (exit from
main to external), so r0 is treated as 64-bit implicit use. The define is at
1227, so insn 1227 is marked as "64-bit". The "64-bit" attribute should
propagate to source register operand through register move and arithmetic, so r7 at insn 1227 is a "64-bit" use and should make its definition instruction,
insn 25, marked as "64-bit". This is my thinking of how insn 25 should be
marked.

Now this hasn't happened. I am still debugging the root cause, but kind of feel "64-bit" attribute propagation is the issue, it seems to me it can't be nicely integrated into the existing register read/write propagation infrastructure. For example, for a slightly more complex sequence which is composed of three states:

State A
  ...
  10: r6 = *(u32 *)(r10 - 4)
  11: r7 = *(u32 *)(r10 - 8)
  12: *(u64 *)(r10 - 16) = r6
  13: *(u64 *)(r10 - 24) = r7

State B
  14: r6 += 1
  15: r7 += r6
  16: *(u32 *)(r10 - 28) = r7

State C
  ...
  17: r3 += r7
  18: r4 = 1
  19: *(u64 *)(r10 - 32) = r3
  20: *(u64 *)(r10 - 40) = r4

State A is parent of state B which is parent of state C.

Inside state C, at insn 20, r4 is a 64-bit read/use, so its define at 18 is
marked as "64-bit". There is no register source at 18, so "64-bit" attribute
propagation is stopped.

Then at insn 19, r3 is a 64-bit read/use, so its define at 17 is marked as
"64-bit" read/use. Insn 17 has two register sources, r3 and r7, they become
"64-bit" now, and their definition should be marked as "64-bit".

Now if the definition of r3 or r7 comes from parent state, then the parent state
should receive a "REG_LIVE_READ64", this is necessary if later another path
reaches state C and triggers prune path, for which case that path should know
there is "64-bit" use inside state C on some registers, and should use this
information to mark "64-bit" insn.

If the definition of r3 or r7 is still inside state C, we need to keep walking up the instruction sequences, and propagate "64-bit" attribute upward until it
goes beyond the state C.

The above propagation logic is quite different from existing register read/write propagation. For the latter, a write just screen up all following read, and a read would propagate directly to its parent is there is not previous write, no
instruction analysis is required.

I am just describing what I have run into and trying to resolve, any thoughts
and suggestions are appreciated.

Regards,
Jiong

Reply via email to