On Fri, Sep 23, 2011 at 6:49 AM, Michael Mol <mike...@gmail.com> wrote:
> On Fri, Sep 23, 2011 at 12:06 AM, Pandu Poluan <pa...@poluan.info> wrote:
>> Saw this on the pfSense list:
>>
>> http://shader.kaist.edu/packetshader/
>>
>> anyone interested in trying?
>
> I see a lot of graphs touting high throughput, but what about latency?
> That's the kind of stuff that gets in my way when I'm messing with
> things like VOIP.
>
> My first thought when I saw they were using a GPU for processing was
> concerns about latency:
> 1) RTT between a video card and the CPU will cause an increase in
> latency compared to doing processing on-CPU. Maybe DMA between the
> video card and NICs could help with this, but I don't know. Certainly
> newer CPUs with on-die GPUs will have an advantage here.
> 2) GPGPU coding favors batch processing over small streams. That's
> part of its nature, after all. That means that processed packets would
> come out of the GPU side of the engine in bursts.
>
> They also tout a huge preallocated packet buffer, and I'm not sure
> that's a good thing, either. It may or may not cause latency problems,
> depending on how they use it.
>
> They don't talk about latency at all, except for one sentence:
> "Forwarding table lookup is highly memory-intensive, and GPU can
> acclerate it with both latency hiding capability and bandwidth."
>
> --
> :wq
While I'm not a programmer at all, I have been playing with some CUDA
programming this year. The couple of comments below are based on that GPU
framework and might differ for others.

1) I don't think the GPU latencies are much different from CPU latencies.
A lot of it can be done with DMA, so the CPU is hardly involved once the
pointers are set up. It depends on the system, of course, but the GPU is
pretty close to the action, so getting started should be quite fast.

2) The big deal with GPUs is that they really pay off when you need to do
a lot of the same calculation on different data in parallel. A book I
read, plus some online material, suggested they don't pay off speed-wise
until you're doing at least 100 operations in parallel.

3) You do have to get the data into the GPU, so for things that use fixed
data blocks, like shading graphical elements, the data can be loaded once
and reused over and over. That can be very fast. In my case it's financial
data being evaluated 1000 ways, so it's effective. For data like a packet,
I don't know how many ways there are to evaluate it, so I can't say what
the value would be.

Nonetheless it's an interesting idea, and it certainly offloads work from
CPU cycles that might be better used for other things. My NVidia 465GTX
has 352 CUDA cores while the GS8200 has only 8, so there can be a huge
difference depending on what GPU you have available.

Just some thoughts,
Mark
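
P.S. For anyone curious what point 2 looks like in code, here's a minimal
CUDA sketch of the "same calculation over many items in parallel" pattern,
using packets as the items since that's the topic. Everything here is made
up for illustration (the kernel name, batch size, the toy checksum); it is
not PacketShader's actual code, just the general shape of a batch kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-packet operation: each thread sums 16-bit words over
// its own packet. The key property is that every thread runs the SAME
// code on DIFFERENT data -- the pattern GPUs reward.
__global__ void checksum16(const unsigned char *pkts, int pkt_len,
                           unsigned int *sums, int n_pkts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pkts) return;

    const unsigned char *p = pkts + (size_t)i * pkt_len;
    unsigned int sum = 0;
    for (int j = 0; j + 1 < pkt_len; j += 2)
        sum += (p[j] << 8) | p[j + 1];
    sums[i] = sum;
}

int main(void)
{
    const int n_pkts = 1024, pkt_len = 64;      // one batch of packets
    size_t bytes = (size_t)n_pkts * pkt_len;

    unsigned char *d_pkts;
    unsigned int *d_sums;
    cudaMalloc(&d_pkts, bytes);
    cudaMalloc(&d_sums, n_pkts * sizeof(unsigned int));
    cudaMemset(d_pkts, 0xab, bytes);            // dummy packet data

    // One thread per packet: well past the ~100 parallel operations
    // where (per the discussion above) the GPU starts to win.
    checksum16<<<(n_pkts + 255) / 256, 256>>>(d_pkts, pkt_len,
                                              d_sums, n_pkts);
    cudaDeviceSynchronize();

    unsigned int h_sum = 0;
    cudaMemcpy(&h_sum, d_sums, sizeof(h_sum), cudaMemcpyDeviceToHost);
    printf("sum of packet 0: 0x%x\n", h_sum);

    cudaFree(d_pkts);
    cudaFree(d_sums);
    return 0;
}
```

Note this also shows the trade-off Michael raised: results only come back
after the whole batch is copied off the device, so per-packet latency goes
up even while throughput does.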