> On 18 Sep 2015, at 10:10 pm, Stuart Henderson <st...@openbsd.org> wrote:
> 
> On 2015/09/18 13:36, Martin Pieuchot wrote:
>> On 18/09/15(Fri) 15:55, David Gwynne wrote:
>>> hashing bits of packet headers to tie connections to particular
>>> physical interfaces within a trunk turns out to be fairly expensive.
>>> in my very unscientific testing it is about 20% of the cost of udp
>>> traffic generated with tcpbench -u.
>>> 
>>> we could tune or change the hash. eg, going from siphash 2 4 to
>>> siphash 1 1 halves the overhead of hashing. however, it occurred
>>> to me that sometimes we already know about connections. why not
>>> reuse that info if it is available?
>> 
>> Why not, but I'd argue that's orthogonal to the fact that siphash
>> 2 4 has a high cost.
>> 
>>> this lets pf embed the state id into the mbuf as a "flow id" so
>>> other subsystems can use it. eg, trunk can pull it out and use it.
>>> 
>>> this diff steals the pad field in mbuf packet headers and uses it
>>> to embed a flow id. it makes pf fill it in, and trunk use it. this
>>> avoids the cost of hashing in trunk altogether.
>>> 
>>> it could be used in other places too, eg, picking an upstream when
>>> we're going multipath routing.
>> 
>> I've been through RFC 2992 again and indeed I believe we could use that.
> 
> as far as trunk(4) goes, we're ok from the perspective of 802.3-2000
> section 43.2.1 says
> 
> f)Frame ordering must be maintained for certain sequences of frame
> exchanges between MAC Clients (known as conversations, see 1.4). The
> Distributor ensures that all frames of a given conversation are passed
> to a single port. For any given port, the Collector is required to pass
> frames to the MAC Client in the order that they are received from that
> port. The Collector is otherwise free to select frames received from the
> aggregated ports in any order. Since there are no means for frames to be
> mis-ordered on a single link, this guarantees that frame ordering is
> maintained for any conversation.
> 
> so we're OK from that perspective.
> 
>> What about carp(4) and bridge(4)?
> 
> I don't think it applies to bridge, load balancing is done at a lower
> level there (i.e. you'd have trunk as a member of a bridge if you wanted
> to balance across links).
> 
> Probably the same for carp, there might be some opportunity, but
> it's already a bit of a minefield to have things working nicely with
> pfsync/defer in various different situations.
so are we going to go ahead with this? can i put it in? do we need a
way to change the load balancing algorithm trunk uses?

dlg
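For illustration only, here is a minimal C sketch of the port selection idea discussed in the thread: if pf has already stamped a flow id into the packet header, trunk reuses it to pick a member port; otherwise it falls back to hashing header bytes. Everything here is hypothetical and is not the actual diff or the OpenBSD kernel API: `struct pkthdr_sketch`, its `flowid`/`flowid_valid` fields, the assumed 16-bit flow id width, and `select_port()` are invented for the example, and FNV-1a is used as a cheap, unkeyed stand-in for the SipHash that trunk really uses. Because a given flow id always maps to the same port, every frame of a conversation stays on a single link, which is the 802.3 ordering property Stuart quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified packet metadata. In the proposed diff the
 * flow id lives in a repurposed pad field of the mbuf packet header;
 * the names and the 16-bit width here are assumptions.
 */
struct pkthdr_sketch {
	uint16_t	 flowid;	/* flow id taken from the pf state id */
	int		 flowid_valid;	/* nonzero if pf filled in flowid */
	const uint8_t	*hdr;		/* header bytes to hash as a fallback */
	size_t		 hdrlen;	/* number of header bytes */
};

/* 32-bit FNV-1a: a cheap stand-in for trunk's keyed SipHash. */
static uint32_t
fnv1a32(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/* Pick a member port index in [0, nports) for this packet. */
static unsigned int
select_port(const struct pkthdr_sketch *ph, unsigned int nports)
{
	uint32_t key;

	assert(nports > 0);
	if (ph->flowid_valid)
		key = ph->flowid;	/* reuse pf's state: no hashing */
	else
		key = fnv1a32(ph->hdr, ph->hdrlen);	/* fall back to a hash */
	return key % nports;
}
```

A packet carrying flow id 7 on a 4-port trunk always lands on port 3 (7 % 4), with no per-packet hashing. Packets pf never saw still get a stable port from the fallback hash, so both paths keep a conversation on one link.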