Eitan,
Thanks so much for writing this up. This should clarify the questions folks had during the IRC meeting.
Alin,
Please feel free to send out a writeup if you have anything to discuss regarding the changes in dpif-linux.c. If not, if you can clean up dpif-linux.c and submit it with the changes/interface that were working with the Cloudbase kernel implementation, that would also be a major step forward. We can then take up how to change dpif-linux.c to fit the (efficient) I/O model that Eitan has described.

thanks,
Nithin

On Aug 6, 2014, at 11:15 AM, Eitan Eliahu <elia...@vmware.com> wrote:

Hello all,
Here is a summary of our initial design. Not all areas are covered, so we would be glad to discuss anything listed here and any other code/features we could leverage.
Thanks!
Eitan

A. Objectives:
[1] Create a NetLink (NL) driver interface for Windows which interoperates with
    the OVS NL user mode.
[2] User mode code should be mostly cross platform, with some minimal changes
    to support specific Windows OS calls.
[3] The driver should not have to maintain state or resources for transactions
    or dumps.
[4] Reduce the number of system calls: user mode NL code should use a device
    IOCTL system call to send an NL command and to receive the associated NL
    reply in the same system call, whenever possible (*).
[5] An event may be associated with an NL socket I/O request to signal the
    completion of an outstanding receive operation on the socket.
    (For simplicity, a single outstanding I/O request could be associated with
    a socket for the signaling purpose.)

(*) We assume multiple NL transactions for the same socket can never be
    interleaved.

B. Netlink operation types:
There are four types of interactions carried by processes through the NL layer:

[1] Transaction based DPIF primitives: these DPIF commands are mapped to the
    nl_sock_transact NL interface (which calls into nl_sock_transact_multiple).
    A transaction based command creates an ad hoc socket and submits a
    synchronous device I/O to the driver. The driver constructs the NL reply
    and copies it to the output buffer of the IRP representing the I/O
    transaction. (A sketch of this single-system-call model appears at the end
    of this message.)
    (Provisioning of transaction based commands can be brought up and exercised
    through the ovs-dpctl command in parallel to the existing DPIF device.)

[2] State aware DPIF dump commands: port and flow dumps call the following NL
    interfaces:
    a) nl_dump_start()
    b) nl_dump_next()
    c) nl_dump_done()

    With the exception of nl_dump_start, these NL primitives are based on a
    synchronous IOCTL system call rather than Write/Read. Thus, the driver
    does not have to maintain any outstanding dump transaction request, nor
    does it need to allocate any resources for it.

[3] UpCall Port/PID/Unicast socket:
    The driver maintains a per-socket queue for all packets which have no
    matching flow in the flow table. The socket has a single overlapped (event)
    structure which will be signalled through the completion of a pending I/O
    request sent by user mode on subscription (similar to the current
    implementation). When dpif_recv_wait is called, the event associated with
    the pending I/O request is passed to poll_fd_wait_event in order to wake
    the thread which polls the port queue. (See the second sketch at the end
    of this message for an illustration of this pending-I/O mechanism.)

    dpif_recv calls nl_sock_recv, which in turn drains the queue maintained by
    the kernel in a synchronous fashion (through the use of a system ioctl
    call). The overlapped structure is rearmed when the recv_set DPIF callback
    function is called.
[4] Event notification / NL multicast subscription:
    Events (such as port addition/deletion or link up/down) are propagated from
    the kernel to user mode through the subscription of a socket to a multicast
    group (nl_sock_join_mcgroup()) and a synchronous receive (nl_sock_recv())
    for retrieving the events. The driver maintains a single event queue for
    all events. Similar to the UpCall mechanism, a user mode process keeps an
    outstanding I/O request in the driver which is triggered whenever a new
    event is generated. The event associated with the overlapped structure of
    the socket is passed to poll_fd_wait_event() whenever the
    dpif_port_poll_wait() callback function is called. dpif_port_poll() will
    drain the event queue through a call to nl_sock_recv().

C. Implementation work flow:
The driver creates a device object which provides a NetLink interface for user
mode processes. During the development phase this device is created in addition
to the existing DPIF device. (This means that the bring-up of the NL based user
mode can be done on a live kernel with resident DPs, ports and flows.)
All transaction and dump based DPIF functions could be developed and brought up
while the NL device is a secondary device (ovs-dpctl show and dump XXX should
work). After the initial phase is completed (i.e. all transaction and dump based
DPIF primitives are implemented), the original device interface will be removed
and the packet and event propagation path will be brought up (driven by
ovs-vswitchd.exe).

[1] Socket creation
    Since the PID should be allocated on a system-wide basis and be unique
    across all processes, the kernel assigns the PID for a newly created
    socket. A new IOCTL command, OVS_GET_PID, returns the PID to a user mode
    client, to be associated with the socket.

[2] Detailed description
    nl_sock_transact_multiple() calls into a series of nl_sock_send__() and
    nl_sock_recv__() operations. These can be implemented using ReadFile() and
    WriteFile(), or an ioctl modeled on a transaction which does both the write
    and the read. One thing to note, though, is that nl_sock_transact_multiple()
    might have to be modified to interleave the nl_sock_send__() and
    nl_sock_recv__() calls, rather than doing a bunch of sends first and then
    doing the recvs. This is because Windows may not preserve message boundaries
    when we do the recv.
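
To make the single-system-call transaction model (objective A[4], operation type
B[1], and the ioctl variant mentioned in C[2]) a bit more concrete, here is a
minimal user-mode sketch. The device path and the OVS_IOCTL_TRANSACT control
code are hypothetical placeholders, not names from the actual driver; the point
is only that the NL request goes into the input buffer and the NL reply comes
back in the output buffer of the same DeviceIoControl() call.

/*
 * Minimal sketch: one Netlink transaction per DeviceIoControl() call.
 * OVS_NL_DEVICE_PATH and OVS_IOCTL_TRANSACT are hypothetical names used
 * for illustration only.
 */
#include <windows.h>
#include <winioctl.h>
#include <stdint.h>
#include <stdio.h>

#define OVS_NL_DEVICE_PATH  L"\\\\.\\OvsNetlinkDevice"   /* hypothetical */
#define OVS_IOCTL_TRANSACT \
    CTL_CODE(FILE_DEVICE_NETWORK, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Send an NL request and receive its NL reply in a single system call.
 * Returns 0 on success or a Win32 error code. */
static int
nl_transact_ioctl(HANDLE dev, const void *request, DWORD request_len,
                  void *reply, DWORD reply_cap, DWORD *reply_len)
{
    BOOL ok = DeviceIoControl(dev, OVS_IOCTL_TRANSACT,
                              (LPVOID) request, request_len,
                              reply, reply_cap, reply_len, NULL);
    return ok ? 0 : (int) GetLastError();
}

int
main(void)
{
    HANDLE dev = CreateFileW(OVS_NL_DEVICE_PATH,
                             GENERIC_READ | GENERIC_WRITE, 0, NULL,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (dev == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "open failed: %lu\n", GetLastError());
        return 1;
    }

    uint8_t request[128] = { 0 };   /* would hold a serialized nlmsghdr + attributes */
    uint8_t reply[4096];
    DWORD reply_len = 0;

    int err = nl_transact_ioctl(dev, request, sizeof request,
                                reply, sizeof reply, &reply_len);
    printf("transact: err=%d, reply_len=%lu\n", err, reply_len);

    CloseHandle(dev);
    return 0;
}

Because the driver answers inside the same IRP, no per-socket state has to be
kept in the kernel between the send and the receive, which is exactly what
objective A[3] asks for.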
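
Similarly, here is a hedged sketch of the upcall/event subscription path from
B[3], B[4], and C[1]: the kernel assigns the socket's netlink PID (the
OVS_GET_PID command mentioned above, given an assumed control-code spelling
here), a single pending overlapped I/O is kept in the driver so that its event
can be handed to the poll loop (what dpif_recv_wait()/dpif_port_poll_wait()
would pass to poll_fd_wait_event()), and the queue is then drained with a
synchronous ioctl once the event fires. All control codes and the device path
are placeholders.

/*
 * Sketch of the pending-I/O + event signaling model for upcalls/events.
 * The device path and control codes below are placeholders, not the
 * driver's real interface.
 */
#include <windows.h>
#include <winioctl.h>
#include <stdint.h>
#include <stdio.h>

#define OVS_NL_DEVICE_PATH   L"\\\\.\\OvsNetlinkDevice"  /* hypothetical */
#define OVS_IOCTL_GET_PID    CTL_CODE(FILE_DEVICE_NETWORK, 0x810, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define OVS_IOCTL_RECV_WAIT  CTL_CODE(FILE_DEVICE_NETWORK, 0x811, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define OVS_IOCTL_RECV       CTL_CODE(FILE_DEVICE_NETWORK, 0x812, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Issue an ioctl on an overlapped handle and wait for it to finish,
 * giving the synchronous "drain" semantics described in B[3]. */
static BOOL
sync_ioctl(HANDLE dev, DWORD code, void *in, DWORD in_len,
           void *out, DWORD out_len, DWORD *ret_len)
{
    OVERLAPPED ov = { 0 };
    BOOL ok;

    ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);
    ok = DeviceIoControl(dev, code, in, in_len, out, out_len, ret_len, &ov);
    if (!ok && GetLastError() == ERROR_IO_PENDING) {
        ok = GetOverlappedResult(dev, &ov, ret_len, TRUE);
    }
    CloseHandle(ov.hEvent);
    return ok;
}

int
main(void)
{
    HANDLE dev = CreateFileW(OVS_NL_DEVICE_PATH, GENERIC_READ | GENERIC_WRITE,
                             0, NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (dev == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "open failed: %lu\n", GetLastError());
        return 1;
    }

    /* C.[1]: the kernel allocates the system-wide unique netlink PID. */
    uint32_t pid = 0;
    DWORD n = 0;
    sync_ioctl(dev, OVS_IOCTL_GET_PID, NULL, 0, &pid, sizeof pid, &n);

    /* Keep one outstanding request in the driver; its event stands in for
     * the socket in the poll loop (poll_fd_wait_event()). */
    OVERLAPPED pending = { 0 };
    pending.hEvent = CreateEventW(NULL, FALSE, FALSE, NULL);
    DeviceIoControl(dev, OVS_IOCTL_RECV_WAIT, NULL, 0, NULL, 0, NULL, &pending);

    /* Wake up once the driver completes the pending request, i.e. a packet
     * or event has been queued for this PID. */
    WaitForSingleObject(pending.hEvent, INFINITE);

    /* Drain the per-socket queue synchronously, one message per call. */
    uint8_t buf[4096];
    while (sync_ioctl(dev, OVS_IOCTL_RECV, &pid, sizeof pid,
                      buf, sizeof buf, &n) && n > 0) {
        printf("received %lu bytes of netlink data\n", n);
    }
    /* At this point the pending request would be rearmed (recv_set). */

    CloseHandle(pending.hEvent);
    CloseHandle(dev);
    return 0;
}

This is only a sketch of the flow described above, not the planned code; the
real implementation would live behind the existing nl_sock_*() and DPIF
callbacks rather than in a standalone program.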