[dpdk-dev] zero copy of received segmented IP packet

2014-03-30 Thread Yossi Barshishat
Hi,

Assume I know in advance that all IP fragments belonging to a single IP
datagram (one IP ID) arrive consecutively, and that I need to forward the
entire IP payload to the application layer.

One way to handle this is to use a hash table to reassemble the packet
data (like the ipv4_reassembly example); another way would be to assume a
single bucket (following the above assumption).
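
A minimal sketch of the single-bucket idea, assuming (as stated above) that fragments of one datagram arrive back to back, without duplicates or interleaving. The class `SingleBucket` and its method names are hypothetical and are not part of DPDK; the only protocol fact relied on is that the IPv4 fragment offset is expressed in units of 8 bytes.

```python
from typing import List, Optional, Tuple


class SingleBucket:
    """Tracks the fragments of one datagram at a time (hypothetical helper)."""

    def __init__(self) -> None:
        self._reset(None)

    def _reset(self, ip_id: Optional[int]) -> None:
        self.ip_id = ip_id
        self.frags: List[Tuple[int, bytes]] = []
        self.received = 0                  # payload bytes seen so far
        self.total: Optional[int] = None   # known once the last fragment arrives

    def add(self, ip_id: int, frag_offset_units: int,
            more_fragments: bool, payload: bytes):
        """Return the ordered (byte_offset, payload) list when complete, else None."""
        if ip_id != self.ip_id:
            # A different IP ID means a new datagram: drop any partial state.
            self._reset(ip_id)
        byte_offset = frag_offset_units * 8
        self.frags.append((byte_offset, payload))
        self.received += len(payload)
        if not more_fragments:
            # The fragment with MF=0 fixes the total payload length.
            self.total = byte_offset + len(payload)
        if self.total is not None and self.received == self.total:
            done = sorted(self.frags)
            self._reset(None)
            return done
        return None
```

Because there is only one bucket, no hashing or lookup is needed; the cost is that any interleaving of two datagrams discards the partial one.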



However, none of the mechanisms that DPDK provides enables zero copy (the
fragment payloads would have to be copied into one larger buffer).

Does anybody have any idea how to control where each part of the packet
is written?

For example: allocate the first fragment as usual, but with its packet data
buffer sized for the maximum packet length (rather than the MTU), and then
read each following fragment, starting n bytes after its start, into that
same data buffer.

That way I can hand the buffer to the application layer without copying it.
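
The placement idea can be sketched as follows, assuming one preallocated buffer per datagram and a receive path that can be told where to write. The helper `fragment_window` is hypothetical; it only computes the destination region from the IPv4 fragment offset (units of 8 bytes). In this Python model the final write is still a memory copy, whereas the proposal above would need the NIC or driver to write there directly.

```python
MAX_DATAGRAM = 65535  # largest possible IPv4 datagram, in bytes


def fragment_window(buf: bytearray, frag_offset_units: int,
                    length: int) -> memoryview:
    """Return a writable view of the region where this fragment's payload belongs."""
    start = frag_offset_units * 8
    end = start + length
    if end > len(buf):
        raise ValueError("fragment does not fit in the datagram buffer")
    # Slicing a memoryview does not copy; writes go straight into buf.
    return memoryview(buf)[start:end]


# Usage: fragments may be placed in any order, each at its own offset.
datagram = bytearray(MAX_DATAGRAM)
fragment_window(datagram, 1, 4)[:] = b"WXYZ"       # second fragment first
fragment_window(datagram, 0, 8)[:] = b"abcdefgh"   # then the first fragment
```

Once the last fragment has landed, the application can be handed `datagram` (or a view of its first N bytes) without a further copy.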



Thanks,

[dpdk-dev] zero copy of received segmented IP packet

2014-03-30 Thread David P. Reed
Yossi -

You may already understand this, but the fragments of an IP datagram ("IP
packet" is non-standard slang that confuses IP fragments - packets - with the
end-to-end data unit of IP) need to be checksummed together with items from
the "virtual header" before delivery to TCP and then userspace.  Also, TCP
segments can overlap each other's sequence space and also be partially "old".
There is no rule that says that a later IP datagram cannot retransmit part of
the sequence-number range of earlier received IP datagrams.  The bytes must be
identical, of course.
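
As an illustration of the checksum point, here is a minimal sketch of the Internet checksum computed over an IPv4 pseudo ("virtual") header plus the whole transport segment. The function names are made up for this example. The pseudo-header layout used (source address, destination address, a zero byte, the protocol number, the segment length) is the standard one for TCP and UDP over IPv4.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def transport_checksum(src: bytes, dst: bytes, proto: int, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the full segment."""
    pseudo = src + dst + struct.pack("!BBH", 0, proto, len(segment))
    return internet_checksum(pseudo + segment)
```

The sum covers the entire segment, so no single fragment can be verified on its own: the receiver needs every fragment's payload before it can accept the datagram. A segment whose checksum field is already filled in verifies when `transport_checksum` returns 0.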

So, for example, if a prior TCP segment had been received covering sequence
numbers 504-508, a subsequent TCP segment might cover sequence numbers 500-535
(if the sender has not seen the ack up to 508, which can happen for many
reasons).  Bytes 504-508 would be covered by the later segment's TCP checksum
(along with that segment's virtual header).

Whatever you do to implement zero-copy delivery of TCP data directly into the
receiver's buffers must, for example, be able to deliver bytes 509-535 directly
into the user buffer if bytes 504-508 have already been delivered.  Otherwise
it is a non-standard implementation.
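
The 500-535 example can be worked through with a small sketch. The function `undelivered_tail` is hypothetical, and it makes two simplifying assumptions that a real TCP cannot make: sequence numbers do not wrap around, and segments that start beyond the next expected byte are simply ignored rather than queued.

```python
from typing import Tuple


def undelivered_tail(rcv_nxt: int, seg_seq: int,
                     payload: bytes) -> Tuple[int, memoryview]:
    """Return (new rcv_nxt, view of the bytes that were not delivered before).

    rcv_nxt is the next sequence number the receiver expects;
    seg_seq is the sequence number of the first payload byte.
    """
    seg_end = seg_seq + len(payload)          # one past the last byte
    view = memoryview(payload)
    if seg_end <= rcv_nxt or seg_seq > rcv_nxt:
        # Entirely old data, or a gap before this segment: nothing to deliver.
        return rcv_nxt, view[:0]
    skip = rcv_nxt - seg_seq                  # bytes already delivered earlier
    return seg_end, view[skip:]               # slicing a view does not copy


# Bytes up to 508 were delivered, so 509 is expected next.
segment = bytes(36)                           # covers sequence numbers 500-535
new_nxt, fresh = undelivered_tail(509, 500, segment)
# new_nxt is 536 and fresh holds the 27 bytes for 509-535
```

The new bytes start at an arbitrary offset inside the received segment, which is why a scheme that steers whole packets into the user buffer cannot cover this case without a copy.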

A simpler approach might work with certain sender stacks (those that use the
same "datagram boundaries" for retransmission), but hardly all, since the
standard does not require retransmission on such boundaries.  In the old days,
terminal concentrators that used telnet over TCP would retransmit larger
segments than the "single character" segments in order to reduce the overhead
of catching up with dropped packets.  It's dangerous to presume that one's
"sending stack" and one's "receiving stack" are in the same version of the same
OS - especially dangerous to promote a technique that fails on certain standard
cases as a performance-improving win.

I suspect that a zero-copy TCP must, at least sometimes, actually copy data,
given this "overlapping sequence number" issue, and especially when
fragmentation is involved.

So if you are talking about "almost always zero-copy with certain senders",
that might make the complexity far less.  Zero-copy fragment assembly only in
the IP layer is much more doable, but it still requires a copy from the
reassembled IP datagram into TCP sequence number space.
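
One way to picture zero-copy assembly at the IP layer is a chain of views over the original fragment buffers, similar in spirit to a chain of buffer segments. This is a sketch only: `FragmentChain` is a hypothetical class, and it assumes fragments are appended in offset order with a known header length per packet.

```python
from typing import Iterator, List


class FragmentChain:
    """Ordered views of fragment payloads; the payload bytes are never moved."""

    def __init__(self) -> None:
        self._views: List[memoryview] = []

    def append(self, packet, header_len: int) -> None:
        # Keep a view past the header; the packet buffer itself stays in place.
        self._views.append(memoryview(packet)[header_len:])

    def __len__(self) -> int:
        return sum(len(v) for v in self._views)

    def __iter__(self) -> Iterator[memoryview]:
        return iter(self._views)

    def linearize(self) -> bytes:
        # The one place a copy happens: when a contiguous buffer is required.
        return b"".join(self._views)
```

A consumer that can iterate over the pieces never triggers a copy; `linearize()` marks the point where a consumer needing contiguous bytes, such as delivery into TCP sequence space, pays for one.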


David P. Reed, Ph.D.
TidalScale, Inc.




[dpdk-dev] zero copy of received segmented IP packet

2014-03-30 Thread Yossi Barshishat
Thanks, David, for the detailed answer.
In fact I am talking about a proprietary exchange of UDP datagrams
between two machines.
It is not a standard protocol but a proprietary one, and it will be the only
protocol exchanged between the two machines.
Performance is important in this application and I had hoped there would be a
way to really make it zero-copy.
Given my very specific needs, I believe it is more doable than the
general use case you described in your answer.
Anyway, I am now more convinced that this is not doable using the means that
DPDK (or alternative tools) provide.

Thanks
Yossi


[dpdk-dev] Core Performance

2014-03-30 Thread Fred Pedrisa
Hi, guys.

What is the expected per-core packet forwarding performance of a 2650
(2.0 GHz) with an 82599?

-  Small 64-byte packets?
-  Large 1540-byte packets?

Sincerely,

Fred



[dpdk-dev] Core Performance

2014-03-30 Thread Jayakumar, Muthurajan
Hi, 

http://www.intel.com/content/dam/www/public/us/en/documents/presentation/dpdk-packet-processing-ia-overview-presentation.pdf

Foil #27 has the forwarding performance for an E5-2658 core.
Foil #7 has the problem statement indicating small packet size.
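
For context on why small packets are the hard case, here is a rough upper-bound calculation. It is not a measured result. It assumes a 10 Gbit/s link (the 82599 is a 10GbE controller) and the standard Ethernet per-frame overhead of 20 bytes on the wire (8 bytes of preamble and start delimiter plus a 12-byte inter-frame gap). The function names are illustrative.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble + SFD (8) and inter-frame gap (12)


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second the link itself can carry."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def cycles_per_packet(core_hz: float, pps: float) -> float:
    """CPU cycles available per packet if a single core handles the full rate."""
    return core_hz / pps


small = line_rate_pps(10e9, 64)      # about 14.88 million packets per second
large = line_rate_pps(10e9, 1540)    # about 0.80 million packets per second
budget = cycles_per_packet(2.0e9, small)   # about 134 cycles per packet
```

At 64 bytes a single 2.0 GHz core would have only about 134 cycles per packet to keep up with line rate, while at 1540 bytes the packet rate is roughly 18 times lower.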

Thx

[dpdk-dev] RES: Core Performance

2014-03-30 Thread Fred Pedrisa
Hello,

OK, but does the current DPDK code (1.6.0 r0) for FreeBSD achieve this
performance?

Sincerely,

Fred Pedrisa
