> On 18 Sep 2015, at 10:10 pm, Stuart Henderson <st...@openbsd.org> wrote:
> 
> On 2015/09/18 13:36, Martin Pieuchot wrote:
>> On 18/09/15(Fri) 15:55, David Gwynne wrote:
>>> hashing bits of packet headers to tie connections to particular
>>> physical interfaces within a trunk turns out to be fairly expensive.
>>> in my very unscientific testing it is about 20% of the cost of udp
>>> traffic generated with tcpbench -u.
>>> 
>>> we could tune or change the hash. eg, going from siphash 2 4 to
>>> siphash 1 1 halves the overhead of hashing. however, it occurred
>>> to me that sometimes we already know about connections. why not
>>> reuse that info if it is available?
>> 
>> Why not, but I'd argue that's orthogonal to the fact that siphash
>> 2 4 has a high cost. 
>> 
>>> this lets pf embed the state id into the mbuf as a "flow id" so
>>> other subsystems can use it. eg, trunk can pull it out and use it.
>>> 
>>> this diff steals the pad field in mbuf packet headers and uses it
>>> to embed a flow id. it makes pf fill it in, and trunk use it. this
>>> avoids the cost of hashing in trunk altogether.
>>> 
>>> it could be used in other places too, eg, picking an upstream when
>>> we're going multipath routing.
>> 
>> I've been through RFC 2992 again and indeed I believe we could use that.
> 
> as far as trunk(4) goes, 802.3-2000 section 43.2.1 says
> 
> f) Frame ordering must be maintained for certain sequences of frame
> exchanges between MAC Clients (known as conversations, see 1.4). The
> Distributor ensures that all frames of a given conversation are passed
> to a single port. For any given port, the Collector is required to pass
> frames to the MAC Client in the order that they are received from that
> port. The Collector is otherwise free to select frames received from the
> aggregated ports in any order. Since there are no means for frames to be
> mis-ordered on a single link, this guarantees that frame ordering is
> maintained for any conversation.
> 
> so we're OK from that perspective.
> 
>> What about carp(4) and bridge(4)?
> 
> I don't think it applies to bridge; load balancing is done at a lower
> level there (i.e. you'd have trunk as a member of a bridge if you wanted
> to balance across links).
> 
> Probably the same for carp, there might be some opportunity, but
> it's already a bit of a minefield to have things working nicely with
> pfsync/defer in various different situations.

so are we going to go ahead with this?

can i put it in? do we need a way to change the load balancing
algorithm trunk uses?
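
for anyone following along, here is a rough userland sketch of the idea,
not the diff itself. struct pkt_hdr, flowid_valid, hdr_hash() and
pick_port() are made-up names for illustration, and FNV-1a only stands in
for the real siphash so the example stays self-contained. the point is
that the port choice uses the flow id when one has been set, and only
falls back to hashing the headers when it has not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for the mbuf packet header; not the real struct */
struct pkt_hdr {
	uint16_t flowid;       /* flow id, e.g. taken from the pf state id */
	int      flowid_valid; /* nonzero if something upstream set flowid */
};

/* FNV-1a, used here only as a cheap stand-in for the real siphash */
static uint32_t
hdr_hash(const uint8_t *buf, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return (h);
}

/* pick an output port: use the flow id if present, else hash the headers */
static unsigned int
pick_port(const struct pkt_hdr *ph, const uint8_t *hdrs, size_t len,
    unsigned int nports)
{
	assert(nports > 0);

	if (ph->flowid_valid)
		return (ph->flowid % nports);

	return (hdr_hash(hdrs, len) % nports);
}
```

either branch maps every packet of a given connection to the same port,
which is what the ordering requirement quoted above needs. a knob for the
algorithm would amount to choosing which branch is allowed.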

dlg
