Eitan,
Thanks so much for writing this up. This should clarify the questions folks
had during the IRC meeting.

Alin,
Please feel free to send out a writeup if you have anything to discuss
regarding the changes in dpif-linux.c. If not, cleaning up dpif-linux.c and
submitting it with the changes/interface that were working with the Cloudbase
kernel implementation would also be a major step forward.

We can separately take up how to change dpif-linux.c to fit the (efficient)
I/O model that Eitan has described.

thanks,
Nithin


On Aug 6, 2014, at 11:15 AM, Eitan Eliahu <elia...@vmware.com> wrote:

> 
> Hello all,
> Here is a summary of our initial design. Not all areas are covered, so we
> would be glad to discuss anything listed here and any other code/features
> we could leverage.
> Thanks!
> Eitan
> 
> 
> A. Objectives:
> [1] Create a NetLink (NL) driver interface for Windows which interoperates
>     with the OVS NL user mode.
> [2] User mode code should be mostly cross-platform, with some minimal
>     changes to support specific Windows OS calls.
> [3] The driver should not have to maintain state or resources for
>     transactions or dumps.
> [4] Reduce the number of system calls: user mode NL code should use a
>     device IOCTL system call to send an NL command and to receive the
>     associated NL reply in the same system call, whenever possible (*).
>     A sketch of such a combined call follows this list.
> [5] An event may be associated with an NL socket I/O request to signal
>     completion of an outstanding receive operation on the socket.
>     (For simplicity, a single outstanding I/O request could be associated
>     with a socket for the signaling purpose.)
> 
> (*) We assume multiple NL transactions on the same socket can never be
>     interleaved.
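> 
> A minimal user-mode sketch of that combined call, assuming a hypothetical
> control code OVS_IOCTL_TRANSACT (the real code and device path would come
> from the driver, not from this design note):
> 
>     #include <windows.h>
>     #include <winioctl.h>
> 
>     /* Hypothetical control code; placeholder value. */
>     #define OVS_IOCTL_TRANSACT \
>         CTL_CODE(FILE_DEVICE_NETWORK, 0x801, METHOD_BUFFERED, \
>                  FILE_ANY_ACCESS)
> 
>     /* Send one NL request and receive its reply in the same system call.
>      * Returns 0 on success or a Win32 error code on failure. */
>     static int
>     nl_transact_one(HANDLE nl_device,
>                     const void *request, DWORD request_len,
>                     void *reply, DWORD reply_cap, DWORD *reply_len)
>     {
>         if (!DeviceIoControl(nl_device, OVS_IOCTL_TRANSACT,
>                              (LPVOID) request, request_len,
>                              reply, reply_cap, reply_len, NULL)) {
>             return GetLastError();
>         }
>         return 0;
>     }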
>       
> B. Netlink operation types:
> There are four types of interactions carried by processes through the NL 
> layer:
> 
> [1] Transaction-based DPIF primitives: these DPIF commands are mapped to
>     the nl_sock_transact and nl_sock_transact_multiple NL interfaces. A
>     transaction-based command creates an ad hoc socket and submits a
>     synchronous device I/O to the driver. The driver constructs the NL
>     reply and copies it to the output buffer of the IRP representing the
>     I/O transaction, as sketched below.
>     (Transaction-based commands can be brought up and exercised through
>     the ovs-dpctl command in parallel to the existing DPIF device.)
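> 
>     A driver-side sketch of that reply copy, assuming a METHOD_BUFFERED
>     IOCTL so that Irp->AssociatedIrp.SystemBuffer carries the NL request
>     in and the NL reply out; OvsHandleNlCommand() is a hypothetical
>     helper, not an existing function:
> 
>         #include <ntddk.h>
> 
>         /* Hypothetical: parse the NL request and build the NL reply in
>          * place in the same buffer. */
>         static NTSTATUS OvsHandleNlCommand(PVOID buffer, ULONG inLen,
>                                            ULONG outLen, PULONG replyLen);
> 
>         NTSTATUS
>         OvsNlDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
>         {
>             PIO_STACK_LOCATION irpSp = IoGetCurrentIrpStackLocation(Irp);
>             ULONG inLen = irpSp->Parameters.DeviceIoControl.InputBufferLength;
>             ULONG outLen = irpSp->Parameters.DeviceIoControl.OutputBufferLength;
>             PVOID buffer = Irp->AssociatedIrp.SystemBuffer;
>             ULONG replyLen = 0;
>             NTSTATUS status;
> 
>             UNREFERENCED_PARAMETER(DeviceObject);
> 
>             /* No per-transaction state is kept; the reply is completed
>              * within this single dispatch call. */
>             status = OvsHandleNlCommand(buffer, inLen, outLen, &replyLen);
> 
>             Irp->IoStatus.Status = status;
>             Irp->IoStatus.Information = replyLen;  /* bytes back to user */
>             IoCompleteRequest(Irp, IO_NO_INCREMENT);
>             return status;
>         }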
>       
> 
> [2] State-aware DPIF dump commands: port and flow dumps call the following
>     NL interfaces:
>     a) nl_dump_start()
>     b) nl_dump_next()
>     c) nl_dump_done()
> 
>     With the exception of nl_dump_start(), these NL primitives are based
>     on a synchronous IOCTL system call rather than Write/Read. Thus, the
>     driver neither has to keep an outstanding request for a dump
>     transaction nor needs to allocate any resources for it.
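> 
>     A sketch of the resulting dump loop, with signatures approximating
>     those in lib/netlink-socket.h (exact prototypes may differ across
>     versions; process_record() is a hypothetical consumer):
> 
>         #include <stdint.h>
>         #include "netlink-socket.h"
>         #include "ofpbuf.h"
> 
>         static void process_record(const struct ofpbuf *reply);
> 
>         static int
>         dump_all(const struct ofpbuf *request)
>         {
>             struct nl_dump dump;
>             struct ofpbuf reply, buf;
>             uint64_t stub[1024 / 8];
> 
>             /* On Windows each nl_dump_next() batch maps to one
>              * synchronous IOCTL, so the driver keeps no dump state. */
>             nl_dump_start(&dump, NETLINK_GENERIC, request);
>             ofpbuf_use_stub(&buf, stub, sizeof stub);
>             while (nl_dump_next(&dump, &reply, &buf)) {
>                 process_record(&reply);  /* one flow/port per iteration */
>             }
>             ofpbuf_uninit(&buf);
>             return nl_dump_done(&dump);  /* 0 on success, else errno */
>         }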
> 
> [3] UpCall Port/PID/Unicast socket:
>     The driver maintains a per-socket queue for all packets which have no
>     matching flow in the flow table. The socket has a single overlapped
>     (event) structure which will be signalled through the completion of a
>     pending I/O request sent by user mode on subscription (similar to the
>     current implementation). When dpif_recv_wait is called, the event
>     associated with the pending I/O request is passed to
>     poll_fd_wait_event() in order to wake the thread which polls the port
>     queue.
> 
>     dpif_recv calls nl_sock_recv, which in turn drains the queue
>     maintained by the kernel in a synchronous fashion (through the use of
>     a system ioctl call). The overlapped structure is rearmed when the
>     recv_set DPIF callback function is called. This arm/wait/drain cycle
>     is sketched below.
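> 
>     The sketch assumes hypothetical control codes OVS_IOCTL_SUBSCRIBE and
>     OVS_IOCTL_READ, and that poll_fd_wait_event() from lib/poll-loop takes
>     the event handle as shown (its exact Windows signature is an
>     assumption):
> 
>         #include <windows.h>
>         #include <winioctl.h>
> 
>         /* Hypothetical control codes; placeholder values. */
>         #define OVS_IOCTL_SUBSCRIBE \
>             CTL_CODE(FILE_DEVICE_NETWORK, 0x810, METHOD_BUFFERED, \
>                      FILE_ANY_ACCESS)
>         #define OVS_IOCTL_READ \
>             CTL_CODE(FILE_DEVICE_NETWORK, 0x811, METHOD_BUFFERED, \
>                      FILE_ANY_ACCESS)
> 
>         struct win_nl_sock {
>             HANDLE handle;          /* NL device handle */
>             OVERLAPPED overlapped;  /* single outstanding request */
>         };
> 
>         /* recv_set: (re)arm the single pending I/O whose completion
>          * signals the event; ERROR_IO_PENDING is the expected result. */
>         static void
>         upcall_arm(struct win_nl_sock *sock)
>         {
>             DWORD bytes;
> 
>             if (!sock->overlapped.hEvent) {
>                 sock->overlapped.hEvent =
>                     CreateEvent(NULL, TRUE, FALSE, NULL);
>             }
>             if (!DeviceIoControl(sock->handle, OVS_IOCTL_SUBSCRIBE,
>                                  NULL, 0, NULL, 0, &bytes,
>                                  &sock->overlapped)
>                 && GetLastError() != ERROR_IO_PENDING) {
>                 /* real failure: surface it to the caller */
>             }
>         }
> 
>         /* dpif_recv_wait: let the poll loop sleep on the event. */
>         static void
>         upcall_wait(struct win_nl_sock *sock)
>         {
>             poll_fd_wait_event(0, sock->overlapped.hEvent, POLLIN);
>         }
> 
>         /* dpif_recv: drain one queued packet synchronously. */
>         static int
>         upcall_recv(struct win_nl_sock *sock, void *buf, DWORD cap,
>                     DWORD *len)
>         {
>             return DeviceIoControl(sock->handle, OVS_IOCTL_READ, NULL, 0,
>                                    buf, cap, len, NULL)
>                    ? 0 : GetLastError();
>         }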
> 
> [4] Event notification / NL multicast subscription:
>     Events (such as port addition/deletion, link up/down) are propagated
>     from the kernel to user mode through the subscription of a socket to
>     a multicast group (nl_sock_join_mcgroup()) and a synchronous Receive
>     (nl_sock_recv()) for retrieving the events. The driver maintains a
>     single event queue for all events. Similar to the UpCall mechanism, a
>     user mode process keeps an outstanding I/O request in the driver
>     which is triggered whenever a new event is generated. The event
>     associated with the overlapped structure of the socket is passed to
>     poll_fd_wait_event() whenever the dpif_port_poll_wait() callback
>     function is called. dpif_port_poll() will drain the event queue
>     through calls to nl_sock_recv(), as sketched below.
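> 
>     A user-mode sketch, with signatures approximating lib/netlink-socket.h
>     (the non-blocking wait == false loop is an assumption about how the
>     queue would be drained without stalling):
> 
>         #include "netlink-socket.h"
>         #include "ofpbuf.h"
> 
>         /* Subscribe once, e.g. when the dpif is opened. */
>         static int
>         subscribe_events(struct nl_sock *sock, unsigned int mcgroup)
>         {
>             return nl_sock_join_mcgroup(sock, mcgroup);
>         }
> 
>         /* dpif_port_poll: drain all queued events without blocking. */
>         static void
>         drain_events(struct nl_sock *sock)
>         {
>             struct ofpbuf buf;
> 
>             ofpbuf_init(&buf, 2048);
>             /* wait == false: nl_sock_recv() returns EAGAIN once the
>              * kernel event queue is empty, ending the loop. */
>             while (!nl_sock_recv(sock, &buf, false)) {
>                 /* decode one event, e.g. a vport add/delete */
>             }
>             ofpbuf_uninit(&buf);
>         }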
> 
> C. Implementation work flow:
> The driver creates a device object which provides a NetLink interface for
> user mode processes. During the development phase this device is created
> in addition to the existing DPIF device. (This means that the bring-up of
> the NL-based user mode can be done on a live kernel with resident DPs,
> ports and flows.) All transaction- and dump-based DPIF functions could be
> developed and brought up while the NL device is a secondary device
> (ovs-dpctl show and dump XXX should work). After the initial phase is
> completed (i.e. all transaction- and dump-based DPIF primitives are
> implemented), the original device interface will be removed, and the
> packet and event propagation paths will be brought up (driven by
> vswitchd.exe).
> 
> [1] Socket creation
>     Since the PID must be allocated on a system-wide basis and be unique
>     across all processes, the kernel assigns the PID for a newly created
>     socket. A new IOCTL command, OVS_GET_PID, returns the PID to a user
>     mode client to be associated with the socket, as sketched below.
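> 
>     Only the OVS_GET_PID name comes from the design above; the control
>     code value here is a made-up placeholder:
> 
>         #include <windows.h>
>         #include <winioctl.h>
>         #include <stdint.h>
> 
>         #define OVS_IOCTL_GET_PID \
>             CTL_CODE(FILE_DEVICE_NETWORK, 0x802, METHOD_BUFFERED, \
>                      FILE_ANY_ACCESS)
> 
>         /* Ask the kernel for the system-wide unique PID it assigned to
>          * the socket behind 'handle'.  Returns 0 on success. */
>         static int
>         nl_sock_fetch_pid(HANDLE handle, uint32_t *pid)
>         {
>             DWORD bytes;
> 
>             if (!DeviceIoControl(handle, OVS_IOCTL_GET_PID, NULL, 0,
>                                  pid, sizeof *pid, &bytes, NULL)) {
>                 return GetLastError();
>             }
>             return bytes == sizeof *pid ? 0 : ERROR_INVALID_DATA;
>         }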
>       
> [2] Detailed description
>     nl_sock_transact_multiple() calls into a series of nl_sock_send__()
>     and nl_sock_recv__() calls. These can be implemented using ReadFile()
>     and WriteFile(), or an ioctl modeled on a transaction which does both
>     the read and the write. One caveat is that nl_sock_transact_multiple()
>     might have to be modified to interleave the series of nl_sock_send__()
>     and nl_sock_recv__() calls, rather than doing all the sends first and
>     then doing the recvs, because Windows may not preserve message
>     boundaries when we do the recv. The interleaved loop is sketched
>     below.
> 
> 
