On Sun, Mar 18, 2018 at 12:57:25PM -0700, John Fastabend wrote:
> Currently, if a bpf sk msg program is run the program
> can only parse data that the (start,end) pointers already
> consumed. For sendmsg hooks this is likely the first
> scatterlist element. For sendpage this will be the range
> (0,0) because the data is shared with userspace and by
> default we want to avoid allowing userspace to modify
> data while (or after) BPF verdict is being decided.
>
> To support pulling in additional bytes for parsing use
> a new helper bpf_sk_msg_pull(start, end, flags) which
> works similar to cls tc logic. This helper will attempt
> to point the data start pointer at 'start' bytes offset
> into msg and data end pointer at 'end' bytes offset into
> message.
>
> After basic sanity checks to ensure 'start' <= 'end' and
> 'end' <= msg_length there are a few cases we need to
> handle.
>
> First the sendmsg hook has already copied the data from
> userspace and has exclusive access to it. Therefore, it
> is not necessary to copy the data. However, it may
> be required. After finding the scatterlist element with
> 'start' offset byte in it there are two cases. One the
> range (start,end) is entirely contained in the sg element
> and is already linear. All that is needed is to update the
> data pointers, no allocate/copy is needed. The other case
> is (start, end) crosses sg element boundaries. In this
> case we allocate a block of size 'end - start' and copy
> the data to linearize it.
>
> Next sendpage hook has not copied any data in initial
> state so that data pointers are (0,0). In this case we
> handle it similar to the above sendmsg case except the
> allocation/copy must always happen. Then when sending
> the data we have possibly three memory regions that
> need to be sent, (0, start - 1), (start, end), and
> (end + 1, msg_length). This is required to ensure any
> writes by the BPF program are correctly transmitted.
>
> Lastly this operation will invalidate any previous
> data checks so BPF programs will have to revalidate
> pointers after making this BPF call.
>
> Signed-off-by: John Fastabend <john.fastab...@gmail.com>

..

> +
> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
> +	if (unlikely(!page))
> +		return -ENOMEM;
I think that's fine. Just curious, what order do you see in practice?

Acked-by: Alexei Starovoitov <a...@kernel.org>
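
For reference, a rough userspace sketch of the sendmsg-side decision described
in the quoted commit message, plus the page order that an allocation of
'end - start' bytes would need. This is not the patch code. The names
order_for() and pull_needs_copy() are hypothetical, the scatterlist is modeled
as a plain array of element lengths, and 4 KiB pages are assumed. order_for()
only mimics what the kernel's get_order() computes for a nonzero size.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical stand-in for get_order(): smallest order such that
 * (page size << order) >= size.
 */
static unsigned int order_for(size_t size)
{
	size_t page = (size_t)1 << SKETCH_PAGE_SHIFT;
	size_t pages = (size + page - 1) >> SKETCH_PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/*
 * Hypothetical model of the sendmsg case. sg_len[] holds the byte length
 * of each scatterlist element.
 *
 * Returns -1 if the range fails the sanity checks or no element holds
 * the 'start' byte, 0 if (start, end) sits inside one element (already
 * linear, only the data pointers need updating), 1 if the range crosses
 * an element boundary (allocate 'end - start' bytes and copy).
 */
static int pull_needs_copy(const size_t *sg_len, size_t nelems,
			   size_t start, size_t end)
{
	size_t total = 0, off = 0, i;

	assert(sg_len != NULL);

	for (i = 0; i < nelems; i++)
		total += sg_len[i];

	/* Sanity checks: 'start' <= 'end' and 'end' <= msg_length. */
	if (start > end || end > total)
		return -1;

	/* Find the element that contains the 'start' offset byte. */
	for (i = 0; i < nelems; i++) {
		if (start < off + sg_len[i])
			break;
		off += sg_len[i];
	}
	if (i == nelems)
		return -1;

	/* Linear if the range ends within the same element. */
	return end <= off + sg_len[i] ? 0 : 1;
}
```

With elements of 1000, 3000 and 8000 bytes, the range (0, 500) stays inside
the first element and needs no copy, while (1000, 12000) crosses a boundary
and needs an 11000-byte copy, which under the 4 KiB assumption is an order-2
allocation. For the sendpage case the commit message says the copy always
happens, so only the order computation would apply there.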