Markus wrote:
On Wed, 18 15:13, Geoff Steckel wrote:
Markus wrote:
Good evening,
I'm about to write prototype code for an imaging application. For
this I need an extremely fast way to transport large amounts of UDP
data to a userland application. A socket implementation does not
perform very well, which is why I'm looking for a better solution to
minimize processing time.
My initial idea was to filter the traffic with bpf while it is still
in the kernel and hand only the datagrams (or parts of them) I really
want out of kernel space.
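To sketch what I mean (a rough sketch only, assuming a BSD with
bpf(4); the interface name em0, the port 9000, and the thin error
handling are all placeholders for illustration), the userland side
would look something like this:

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#include <net/bpf.h>
#include <net/if.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	/* Accept only IPv4 UDP datagrams to port 9000 (placeholder).
	 * IP fragments other than the first are not matched here. */
	struct bpf_insn insns[] = {
		BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),		/* ethertype */
		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x0800, 0, 6),	/* IPv4? */
		BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),		/* IP protocol */
		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 17, 0, 4),	/* UDP? */
		BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),		/* X = IP hdr len */
		BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),		/* UDP dst port */
		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 9000, 0, 1),
		BPF_STMT(BPF_RET+BPF_K, (u_int)-1),		/* keep whole packet */
		BPF_STMT(BPF_RET+BPF_K, 0),			/* drop */
	};
	struct bpf_program prog = { sizeof(insns) / sizeof(insns[0]), insns };
	struct ifreq ifr;
	u_int bufsize;
	char *buf;
	ssize_t n;
	int fd;

	if ((fd = open("/dev/bpf0", O_RDONLY)) == -1)
		err(1, "open /dev/bpf0");
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, "em0", sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1)
		err(1, "BIOCSETIF");
	if (ioctl(fd, BIOCSETF, &prog) == -1)
		err(1, "BIOCSETF");
	/* read() must be given a buffer of exactly the bpf buffer size */
	if (ioctl(fd, BIOCGBLEN, &bufsize) == -1)
		err(1, "BIOCGBLEN");
	if ((buf = malloc(bufsize)) == NULL)
		err(1, "malloc");

	while ((n = read(fd, buf, bufsize)) > 0)
		printf("read %zd bytes of filtered packets\n", n);
	return 0;
}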
Hi,
Could you forward or post the profile data that shows the hot
spot(s) in the kernel? It would be very instructive.
thanks
geoff steckel
Hi Geoff,
this sounds as if you would prefer sockets if possible. Is there
a reason why you would reject using bpf directly?
A couple of reasons, really...
Most important, and overshadowing all the other considerations:
It is very likely that the application will incur more overhead
creating graphics than it will receiving data from the network.
Graphics are very compute-intensive. Therefore, optimizing the
network part of the application is very likely a waste of effort.
If you have data showing that the graphics are cheap and the
network is expensive, then optimizing the network makes sense.
Otherwise, my advice would be to build the application as simply
and cleanly as possible, profile it, and work on the hot spots
which are probably in the graphics code.
Other reasons, less important:
Sockets are far more portable than BPF, even from OS version to
version, much less from OS to OS.
A long time ago and far away, I did a -lot- of performance tuning
with a group making network appliances. We found that the device
drivers were as likely to be the bottleneck as was the socket code.
It's moderately unlikely that there would be a large performance
advantage to BPF, though I haven't tested it recently. The one
place where BPF might be better is the ability to receive more than
one packet per kernel/user transition.
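To make that concrete (a sketch, not something I have benchmarked
recently): one read() on a bpf descriptor can return several packets
back to back, each prefixed with a struct bpf_hdr, and the consumer
walks them like this; process() is a hypothetical handler standing in
for the application:

#include <net/bpf.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-packet consumer. */
static void
process(const unsigned char *pkt, size_t caplen)
{
	(void)pkt;
	printf("packet with %zu captured bytes\n", caplen);
}

/* Walk all packets returned by a single read() of n bytes into buf. */
static void
handle_buffer(unsigned char *buf, size_t n)
{
	unsigned char *p = buf;

	while (p < buf + n) {
		struct bpf_hdr *bh = (struct bpf_hdr *)p;

		process(p + bh->bh_hdrlen, bh->bh_caplen);
		/* each record is word-aligned inside the buffer */
		p += BPF_WORDALIGN(bh->bh_hdrlen + bh->bh_caplen);
	}
}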
And.... if there is a really bad hot spot or spots in the socket
code, it would be a big favor to everyone if it were exposed for
discussion and possible improvement.
I guess for testing the socket approach you could pretty much take
any UDP client and throw datagrams at it over a gigabit link (the
load will be around 30 frames per second at 3 MB per image frame,
plus headers).
The client system does no disk I/O, by the way; it just gets the data
from the socket and, in its simplest form, hands it over to a display.
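For the receiving end, a minimal sketch like the following would do
as the measurement client (port 9000 is again a placeholder); note
that 30 frames of 3 MB each is about 90 MB/s, i.e. roughly 720 Mbit/s
before header overhead, so this is already close to saturating the
gigabit link:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int
main(void)
{
	struct sockaddr_in sin;
	static char buf[65536];		/* big enough for any datagram */
	unsigned long long bytes = 0, pkts = 0;
	time_t last = time(NULL);
	ssize_t n;
	int s;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		err(1, "socket");
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(9000);	/* placeholder port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");

	for (;;) {
		if ((n = recv(s, buf, sizeof(buf), 0)) == -1)
			err(1, "recv");
		bytes += n;
		pkts++;
		if (time(NULL) != last) {	/* crude once-a-second report */
			printf("%llu packets, %.1f MB/s\n", pkts, bytes / 1e6);
			bytes = pkts = 0;
			last = time(NULL);
		}
	}
}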
Unfortunately I have neither decent computer hardware here at my
flat nor a device that could pump out the number of packets I want
to produce via e.g. tcpreplay (or simply a camera). I will, however,
try to get some data before I head off this weekend to travel for
four weeks, if you're interested in it.
Thanks very much! I would note that even if you do not have hardware
sufficient to saturate a system, 10-20% of saturation would give
useful information.
Now, as a very extreme example of optimization:
If the socket approach doesn't work well for you, and you are
feeling very masochistic, have infinite development time,
and are willing to survive large heavy objects thrown at you
by all sane developers, there is a (insert LARGE
skull-and-crossbones here) utterly non-portable and unclean
method, which I have tried and which can work:
- map the kernel socket buffer space into the application
- take data directly from the mbuf chains in kernel memory
- add an ioctl to acknowledge reading and wait for more data
- for persons totally beyond reason: add a semaphore-like object
  in writable memory shared with the kernel which signals
  "data consumed" and "data available" without blocking or forcing
  a transition from user mode to kernel mode
This can eliminate -all- kernel/user data copying and a large number
of context switches, especially under heavy load; the result is about
as efficient as possible.
This approach can (approximately) halve the kernel overhead
for an application processing packets.
It is only useful if the amount of CPU time spent per packet is
almost nothing.
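To make the last point a little more concrete: nothing like this
exists as a stock kernel API, so the sketch below only illustrates
the shared-memory signalling protocol, with an ordinary userland
thread standing in for the kernel side; every name and size in it is
invented for illustration, and the busy-waiting would be replaced by
something smarter in practice:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define SLOTS	64		/* ring size (placeholder) */
#define SLOTSZ	2048		/* slot size (placeholder) */

static struct {
	_Atomic unsigned int head;	/* next slot the producer fills */
	_Atomic unsigned int tail;	/* next slot the consumer reads */
	char data[SLOTS][SLOTSZ];
} ring;

/* Stands in for the kernel filling socket buffers. */
static void *
producer(void *arg)
{
	unsigned int h;
	int i;

	(void)arg;
	for (i = 0; i < 1000; i++) {
		h = atomic_load(&ring.head);
		while (h - atomic_load(&ring.tail) == SLOTS)
			;			/* ring full: spin */
		snprintf(ring.data[h % SLOTS], SLOTSZ, "packet %d", i);
		atomic_store(&ring.head, h + 1);	/* "data available" */
	}
	return NULL;
}

/* The application: consumes data with no system calls at all. */
static void *
consumer(void *arg)
{
	unsigned int t;

	(void)arg;
	for (t = 0; t < 1000; t++) {
		while (atomic_load(&ring.head) == t)
			;			/* nothing yet: spin */
		/* ring.data[t % SLOTS] is valid here */
		atomic_store(&ring.tail, t + 1);	/* "data consumed" */
	}
	return NULL;
}

int
main(void)
{
	pthread_t p, c;

	pthread_create(&p, NULL, producer, NULL);
	pthread_create(&c, NULL, consumer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	printf("moved 1000 slots with no kernel/user transitions between them\n");
	return 0;
}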
I hope this is helpful.
geoff steckel