On Fri, Sep 23, 2011 at 11:04:43AM -0700, Jesse Gross wrote:
> On Fri, Sep 23, 2011 at 9:16 AM, Ben Pfaff <b...@nicira.com> wrote:
> > On Thu, Sep 22, 2011 at 03:59:30PM -0700, Jesse Gross wrote:
> >> On Thu, Sep 22, 2011 at 1:11 PM, Ben Pfaff <b...@nicira.com> wrote:
> >> > On Mon, Sep 19, 2011 at 03:00:08PM -0700, Jesse Gross wrote:
> >> >> Currently it is possible for a client on a single port to generate
> >> >> a huge number of packets that miss in the kernel flow table and
> >> >> monopolize the userspace/kernel communication path.  This
> >> >> effectively DoS's the machine because no new flow setups can take
> >> >> place.  This adds some additional fairness by separating each upcall
> >> >> type for each object in the datapath onto a separate socket, each
> >> >> with its own queue.  Userspace then reads round-robin from each
> >> >> socket so other flow setups can still succeed.
> >> >>
> >> >> Since the number of objects can potentially be large, we don't always
> >> >> have a unique socket for each.  Instead, we create 16 sockets and
> >> >> spread the load around them in a round-robin fashion.  It's
> >> >> theoretically possible to do better than this with some kind of
> >> >> active load balancing scheme but this seems like a good place to
> >> >> start.
> >> >>
> >> >> Feature #6485
> >> >
> >> > I'm not sure that we should increment last_assigned_upcall from
> >> > dpif_linux_execute__() as well as from dpif_flow_put().  Most
> >> > dpif_flow_put() calls are right after a dpif_linux_execute__() for the
> >> > same flow (manually sending the first packet), so this will tend to
> >> > use only the even-numbered upcall sockets.
> >> >
> >> > Also, manually sending the first packet of a flow with one
> >> > upcall_sock, then setting up a kernel flow that uses a different
> >> > upcall_sock could mean that userspace sees packet reordering, if it
> >> > checks the upcall_socks in the "wrong" order.
> >> >
> >> > One way to avoid these problems would be to choose the upcall_sock
> >> > using a hash of the flow instead of a counter.
> >>
> >> I think that probably makes sense and at the same time it addresses
> >> Pravin's concerns about assigning ports to sockets in a purely
> >> round-robin fashion.
> >>
> >> This is easier to do correctly if we have "struct flow" here instead
> >> of the Netlink attributes.  It actually seems more correct to do the
> >> conversion in dpif-linux anyway.  In theory, it could result in some
> >> unnecessary conversions for things that are bouncing back and forth
> >> between userspace and kernel, but I think in practice we end up doing
> >> the conversions for most operations anyway.  Is there a reason to
> >> layer it the way it is?
> >
> > It's layered this way to support decoupling the kernel and user flow
> > keys, as described in commit 856081f683:
> >
> >     datapath: Report kernel's flow key when passing packets up to
> >     userspace.
> >
> >     One of the goals for Open vSwitch is to decouple kernel and
> >     userspace software, so that either one can be upgraded or rolled
> >     back independent of the other.  To do this in full generality, it
> >     must be possible to change the kernel's idea of the flow key
> >     separately from the userspace version.
> >
> >     This commit takes one step in that direction by making the kernel
> >     report its idea of the flow that a packet belongs to whenever it
> >     passes a packet up to userspace.  This means that userspace can
> >     intelligently figure out what to do:
> >
> >       - If userspace's notion of the flow for the packet matches the
> >         kernel's, then nothing special is necessary.
> >
> >       - If the kernel has a more specific notion for the flow than
> >         userspace, for example if the kernel decoded IPv6 headers but
> >         userspace stopped at the Ethernet type (because it does not
> >         understand IPv6), then again nothing special is necessary:
> >         userspace can still set up the flow in the usual way.
> >
> >       - If userspace has a more specific notion for the flow than the
> >         kernel, for example if userspace decoded an IPv6 header but the
> >         kernel stopped at the Ethernet type, then userspace can forward
> >         the packet manually, without setting up a flow in the kernel.
> >         (This case is bad from a performance point of view, but at
> >         least it is correct.)
> >
> >     This commit does not actually make userspace flexible enough to
> >     handle changes in the kernel flow key structure, although userspace
> >     does now have enough information to do that intelligently.  This
> >     will have to wait for later commits.
> >
> > At the time I thought that we'd want to implement the userspace half
> > of this quickly, but now it's become clear that it's a low priority,
> > so it's fine with me if you want to change the layering for now.
> 
> I don't want to move away from the direction that we plan to go long
> term.  I think that we can achieve both platform-independence and
> userspace/kernel decoupling at the same time but for what I'm trying
> to achieve right now it's not really necessary so I'll wait.
> 
> Pravin suggested tying flows to the socket for their associated input
> port, which is probably the best policy for DoS prevention.  It also
> solves the issues with flow add/execute packet combinations and is easy
> enough to pull out of the Netlink flow key, so I'm going to do that
> instead.

OK, sounds good to me.