Evgeniy,

Some good ideas in there. You should talk/sync with Max Krasnyansky (CCed). I think there's a lot of stuff you are doing that he is trying to do as well with the new tuntap that he is working on. I think that, put together, some cool ideas can be implemented.

cheers,
jamal

On Thu, 2005-14-07 at 14:21 +0400, Evgeniy Polyakov wrote:
> Hello, network developers.
>
> I'm pleased to announce first pre-alpha version
> of the Zero-copy sniffer "device".
> It acts as packet socket, i.e. gets all packets
> using prot_hook.func(), but never copy it.
>
> Basic idea behind zero-copy is remapping of the
> physical pages where skb->data lives to the
> userspace process.
>
> According to my tests, which can be found commented
> in the code (packet_mmap()),
> remapping of one page gets from 5 upto 20
> times faster than copying the same amount of data
> (i.e. PAGE_SIZE).
>
> Since current VM code requires PTE to be unmapped,
> when remapping, but only exports unmap_mapping_range()
> and __flush_tlb(), I used them, although they are quite
> heavy monsters.
> It also required mm->mmap_sem to be held,
> so I placed main remapping code into workqueue.
>
> skbs are queued in prot_hook.func() and then workqueue
> is being scheduled, where skb is unlinked and remapped.
> It is not freed there, as it should be, since userspace
> will never found real data then, but instead
> some smart algo should be investigated to defer skb freeing,
> or simple defering using timer and redefined skb destructor.
> It also should remap several skbs at once, so rescheduling
> would not appeared very frequently.
> First mapped page is information page, where offset in page
> of the skb->data is placed, so userspace can detect
> where actual data lives on the next page.
>
> Such schema is very suitable for applications that
> do not require the whole data flow, but only select some data
> from the flow, based on packet content.
> I'm quite sure it will be slower than copying for small packets,
> so this two ideas must be combined to achieve
> the maximum sniffer performance.
>
> Current code is basically proof-of-concept, so
> it has tons of dirty quirks, and I'm not a VM hacker,
> so I would gladly listen your thoughts about the code and idea itself.
>
> Attached files:
> af_tlb.[ch] - kernel side sniffer implementation.
> tlb_test.c - userspace "sniffer".
> Makefile - build kernel side with "all" target and userspace
> with "test" target.
>
> Thank you.
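
For reference, the two-page layout described in the quoted mail (an information page holding the offset of skb->data, followed by the page with the packet itself) could be read from userspace roughly as sketched below. This is not taken from af_tlb.[ch] or tlb_test.c, which are not reproduced here. The struct zc_info layout, the data_len field, the 4096-byte page size and every function name are assumptions made for illustration, and zc_fake_mapping() only imitates with ordinary memory what the kernel side would provide through mmap().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, not queried from the system */

/* Hypothetical layout of the information page. The mail only says that it
 * holds the offset of skb->data; data_len is an added assumption. */
struct zc_info {
    uint32_t data_offset; /* offset of the packet within the next page */
    uint32_t data_len;    /* number of valid packet bytes (assumed field) */
};

/* Given a mapping that starts at the information page, return a pointer to
 * the packet bytes in the following page, or NULL if the header is bogus. */
static const unsigned char *zc_packet(const unsigned char *map, uint32_t *len)
{
    struct zc_info info;

    memcpy(&info, map, sizeof(info)); /* avoids alignment and aliasing issues */
    if (info.data_offset >= SKETCH_PAGE_SIZE ||
        info.data_len > SKETCH_PAGE_SIZE - info.data_offset)
        return NULL;

    *len = info.data_len;
    return map + SKETCH_PAGE_SIZE + info.data_offset;
}

/* Userspace stand-in for the kernel side: builds an information page plus a
 * data page in heap memory. Returns NULL if the packet would not fit in one
 * page or if allocation fails. The caller releases the result with free(). */
static unsigned char *zc_fake_mapping(const void *pkt, uint32_t len,
                                      uint32_t offset)
{
    struct zc_info info = { offset, len };
    unsigned char *map;

    if (offset >= SKETCH_PAGE_SIZE || len > SKETCH_PAGE_SIZE - offset)
        return NULL;

    map = calloc(2, SKETCH_PAGE_SIZE);
    if (map == NULL)
        return NULL;

    memcpy(map, &info, sizeof(info));
    memcpy(map + SKETCH_PAGE_SIZE + offset, pkt, len);
    return map;
}
```

A reader would call zc_packet() on the start of the mapping and inspect the returned bytes in place, which is the point of the scheme: the application looks at the packet content without a copy and keeps only what it selects.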