Guy Helmer wrote:

 A while back, Maxim Konovalov made a commit to usr.sbin/ngctl/main.c
 to increase its socket receive buffer size to help 'ngctl list' deal
 with a big number of nodes, and Ruslan Ermilov responded that setting
 sysctls net.graph.recvspace=200000 and net.graph.maxdgram=200000 was
 a good idea on a system with a large number of nodes.
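For reference, that tuning looks like this (values are the ones suggested in the earlier thread; adjust to your workload):

```shell
# Enlarge the netgraph socket buffers on a system with many nodes.
# Values from Ruslan's suggestion; tune for your own node count.
sysctl net.graph.recvspace=200000
sysctl net.graph.maxdgram=200000

# To make them persistent across reboots, add to /etc/sysctl.conf:
# net.graph.recvspace=200000
# net.graph.maxdgram=200000
```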

 I'm getting what I consider to be sub-par performance under FreeBSD
 5.3 from a userland program using ngsockets connected into ng_tee to
 play with packets that are traversing a ng_bridge, and I finally have
 an opportunity to look into this. I say "sub-par" because when we've
 tested this configuration using three 2.8GHz Xeon machines with
 Gigabit Ethernet interfaces at 1000Mbps full-duplex, we obtained peak
 performance of a single TCP stream of about 12MB/sec through the
 bridging machine as measured by NetPIPE and netperf.


That's not bad if you are pushing everything through userland. That's quite expensive, and the scheduling overheads need to be taken
into account too.




 I'm wondering if bumping the recvspace should help, if changing the
 ngsocket hook to queue incoming data should help, if it would be best
 to replace ngsocket with a memory-mapped interface, or if anyone has
 any other ideas that would help performance.

Netgraph was designed to be a "lego for link layer stuff", where link layer stuff
was considered to be WAN protocols and the like.


In particular, the userland interface was written with an eye to prototyping and debugging
and doesn't take any special care to be fast (though I don't know how you could be
faster going to userland).


Since then people have broadened
its use considerably, and questions about its performance have become quite regular.


It wasn't designed to be super fast, though it is not bad considering what it does.
There is, however, a push to look at performance, so it would be interesting to see
in more detail what you are doing.
In particular, what are you doing in userland?
Might it make sense to write your own custom netgraph node that does exactly
what you want in the kernel?



Thanks in advance for any advice, Guy Helmer



I have considered a memory-mapped interface that would bolt onto ng_dev.

I have done an almost identical interface once before (1986->1992)

There would have to be several commands supported.

define bufferspace size (ioctl/message)
mmap buffer space (mmap)
allocate bufferspace to user (size) (returns buffer ID)
free bufferspace (ID)
getoffset (ID) (returns offset in bufferspace)
writebuffer(ID, hook, maxmbufsize): pick up the buffer, put it into mbufs (maybe as external pointers),
and send it out the hook in question.
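A rough shape for the bookkeeping behind those commands might be a fixed arena carved into equal slots, each named by a small integer ID. This is just an in-memory sketch with made-up names; the real node would implement these as ioctls/control messages over ng_dev and back the arena with the mmap()ed region:

```c
#include <stddef.h>

/* Hypothetical sketch of the buffer-space bookkeeping the mmap node
 * would keep: a fixed arena carved into equal slots, each identified
 * by a small integer ID.  Names are illustrative, not a real API. */

#define SLOT_SIZE 2048
#define NSLOTS    8

static unsigned char arena[NSLOTS * SLOT_SIZE]; /* the mmap()ed region */
static int slot_used[NSLOTS];

/* "allocate bufferspace to user (size)" -> returns buffer ID, -1 on failure */
int buf_alloc(size_t size)
{
    if (size > SLOT_SIZE)
        return -1;
    for (int id = 0; id < NSLOTS; id++) {
        if (!slot_used[id]) {
            slot_used[id] = 1;
            return id;
        }
    }
    return -1;
}

/* "free bufferspace (ID)" */
void buf_free(int id)
{
    slot_used[id] = 0;
}

/* "getoffset (ID)" -> offset of the buffer within the mmap()ed space */
size_t buf_offset(int id)
{
    return (size_t)id * SLOT_SIZE;
}
```

Userland would mmap the whole arena once, then use getoffset to turn an ID into a pointer within its mapping.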


Incoming data would be written into buffers (a CPU copy would be needed) and the ID added to a list of
arrived IDs.
In addition, you need a way to notify a listening thread/process of arrived IDs.
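The list of arrived IDs could be as simple as a small ring that the receive path fills and the listener drains. Another illustrative sketch (in the kernel the notification would go through a wakeup/selwakeup or a socket upcall; locking is omitted here):

```c
/* Sketch of the arrived-ID list: a fixed ring buffer.  The receive
 * path enqueues a buffer ID after copying the packet in; the
 * listening thread dequeues IDs when woken.  Names are made up. */

#define RINGLEN 16

static int ring[RINGLEN];
static int ring_head, ring_tail; /* head == tail means empty */

/* called by the receive path once data has been copied into buffer `id` */
int arrived_enqueue(int id)
{
    int next = (ring_head + 1) % RINGLEN;
    if (next == ring_tail)
        return -1; /* ring full: drop or apply back-pressure */
    ring[ring_head] = id;
    ring_head = next;
    /* here the kernel would wake the listening thread */
    return 0;
}

/* called by the listener; returns -1 if nothing has arrived */
int arrived_dequeue(void)
{
    if (ring_tail == ring_head)
        return -1;
    int id = ring[ring_tail];
    ring_tail = (ring_tail + 1) % RINGLEN;
    return id;
}
```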


In my original system, the listening process had a socket open with a particular protocol family
and waited for N bytes. When the data arrived, the socket returned the buffer ID, followed
by N - sizeof(ID) bytes from the header of the packet, so that the app could check the header and see if it was interested.


In later versions it used a recvmsg() call, and the metadata was in the form of a protocol-specific structure received in parallel
with the actual data copied.


Arrived IDs/buffers were 'owned' by N owners, where N was the number of open listener sockets.
Each listener had to respond to the message by 'freeing' the ID if it wasn't interested. Closing the socket freed all
IDs still owned by it; closing the file did the same.


I forget some of the details.

I guess in this version, instead of sockets, we could use hooks on the mmap node and use ngsockets
to connect to them.


The external data 'free' method in the mbuf could decrement the ID reference count and actually free it if
it reached 0 (when all parts had been transmitted). The userland process would free it immediately after issuing the
'send this' command; the references owned by the mbufs would stop it from being freed until the packets were sent.
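That reference-count dance might look roughly like this (hypothetical helpers; in a real node, buf_unref would be wired up as the mbuf external-storage free routine):

```c
/* Sketch of per-buffer reference counting: the userland sender drops
 * its reference right after the "send this" command, while each mbuf
 * built over the buffer holds one reference until transmit completes.
 * Names are illustrative only. */

#define MAXBUFS 8

static int refcnt[MAXBUFS];
static int reclaimed[MAXBUFS]; /* records that the buffer was reclaimed */

void buf_ref(int id)
{
    refcnt[id]++;
}

/* both the mbuf external 'free' method and the userland free land here */
void buf_unref(int id)
{
    if (--refcnt[id] == 0)
        reclaimed[id] = 1; /* really: return it to the buffer pool */
}
```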


In our previous version, we had a disk/VFS interface too, with a "write this to file descriptor N" command
and a "write this to raw disk X at offset Y" command. The disk would own a reference until the
data was written, of course. There was also a "read from raw disk X at offset Y into buffer ID" command;
you had to own the buffer already for it to work.


In 1987 we were saturating several Ethernets off disk with this at 5% CPU load :-)

disk->[dma]->mem->[dma]->ethernet

Since machines are now hundreds of times faster (a 30MHz 68010 with a 32-bit memory
bus vs. a 3GHz machine with a 64-bit bus), some of this doesn't make sense any more, but
it was an achievement at the time.


just an idea.



_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
