On 03/19/2018 01:24 PM, Alexei Starovoitov wrote:
> On Sun, Mar 18, 2018 at 12:57:25PM -0700, John Fastabend wrote:
>> Currently, if a bpf sk msg program is run, the program
>> can only parse data that the (start,end) pointers already
>> consumed. For sendmsg hooks this is likely the first
>> scatterlist element. For sendpage this will be the range
>> (0,0) because the data is shared with userspace and by
>> default we want to avoid allowing userspace to modify
>> data while (or after) the BPF verdict is being decided.
>>
>> To support pulling in additional bytes for parsing, use
>> a new helper bpf_sk_msg_pull(start, end, flags), which
>> works similarly to the cls tc logic. This helper will attempt
>> to point the data start pointer at 'start' bytes offset
>> into msg and the data end pointer at 'end' bytes offset into
>> the message.
>>
>> After basic sanity checks to ensure 'start' <= 'end' and
>> 'end' <= msg_length, there are a few cases we need to
>> handle.
>>
>> First, the sendmsg hook has already copied the data from
>> userspace and has exclusive access to it. Therefore, it
>> is not necessary to copy the data for safety. However, a
>> copy may still be required to linearize it. After finding
>> the scatterlist element that contains byte offset 'start',
>> there are two cases. In the first, the range (start,end) is
>> entirely contained in that sg element and is already linear,
>> so all that is needed is to update the data pointers; no
>> allocation/copy is needed. In the second, (start,end)
>> crosses sg element boundaries. In this case we allocate a
>> block of size 'end - start' and copy the data to linearize it.
>>
>> Next, the sendpage hook has not copied any data in its
>> initial state, so the data pointers are (0,0). In this case
>> we handle it similarly to the sendmsg case above, except the
>> allocation/copy must always happen. Then, when sending
>> the data, there are possibly three memory regions that
>> need to be sent: (0, start - 1), (start, end), and
>> (end + 1, msg_length). This is required to ensure any
>> writes by the BPF program are correctly transmitted.
>>
>> Lastly, this operation will invalidate any previous
>> data checks, so BPF programs will have to revalidate
>> pointers after making this BPF call.
>>
>> Signed-off-by: John Fastabend <john.fastab...@gmail.com>
> ..
>> +
>> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
>> +	if (unlikely(!page))
>> +		return -ENOMEM;
>
> I think that's fine. Just curious what order do you see in practice?
At the moment I'm mostly reading headers, so this only happens when a
header is split across multiple scatterlist elements. In these cases a
copy size of less than 4k is good enough. Some of the nginx
configurations I have use a max sendfile size of 128kb. So these are
larger, but unless we look at the payload we can avoid reading/writing
this. If it becomes commonplace we could look at optimizing it. Should
be doable without changing the user-facing API.

>
> Acked-by: Alexei Starovoitov <a...@kernel.org>
>