Hi Sam,
Please find my replies inline.

>Do you mean we will no longer use nl_sock_transact_multiple in userspace for 
>these DPIF transactions?
No, we will still use nl_sock_transact_multiple, but it will be implemented
through a DeviceIoControl() call rather than a series of
WriteFile()/ReadFile() pairs.
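A rough sketch of what a single transaction could look like (the IOCTL code
and buffer layout below are only placeholders, not the actual driver
interface):

#include <windows.h>
#include <winioctl.h>

/* Placeholder IOCTL code, not the real one. */
#define OVS_IOCTL_TRANSACT \
    CTL_CODE(FILE_DEVICE_NETWORK, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* One self-contained transaction: the Netlink request goes down in the
 * input buffer and the reply comes back in the output buffer of the same
 * DeviceIoControl() call, instead of a WriteFile()/ReadFile() pair. */
static int
transact_one(HANDLE dev, const void *request, DWORD request_len,
             void *reply, DWORD reply_size, DWORD *reply_len)
{
    if (!DeviceIoControl(dev, OVS_IOCTL_TRANSACT,
                         (LPVOID) request, request_len,
                         reply, reply_size, reply_len, NULL)) {
        return (int) GetLastError();
    }
    return 0;
}

nl_sock_transact_multiple() could then, for example, simply loop over the
pending transactions and call something like transact_one() for each.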

>[QUOTE]>You mean, whenever, say, a Flow dump request is issued, in one reply 
>>to give back all flows?
>Not necessarily. I meant that the driver does not have to maintain the state 
>of the dump command.
>Each dump command sent down to the driver would be self-contained.[/QUOTE] We 
>currently have this in our implementation. The only thing 'left' would be the 
>fact that we provide all the output buffer for dump at once. The userspace 
>can read sequentially from it. Unless there is a reason to write sequentially 
>from the kernel to the userspace, and wait for the userspace to read, I think 
>that how we have this one is ok.
Sounds good, we can leverage your implementation.

>[QUOTE]Yes, these are OVS events that are placed in a custom queue.
>There is a single Operating System event associated with the global socket 
>which collects all OVS events.
>It will be triggered through a completion of a pending I/O request in the 
>driver.[/QUOTE] I used to be a bit confused by your implementation in OvsEvent 
>and OvsUser. Perhaps this discussion will clarify things a bit more. :) Ok, 
>so we'll hold OVERLAPPED structs in the kernel, as events. What kind of IRP 
>requests would be returned as "pending" in the kernel? Requests coming as 
>"nl_sock_recv()" on the multicast groups?
>Will there be multiple multicast groups used? Or would all multicast 
>operations queue events on the same event queue, where all the events are 
>read from the same part of code in userspace?
Currently there are two IOCTLs implemented in the driver. One simply reads an 
event from the event queue and is called synchronously. The other is used only 
for signaling and is always pended in the driver. Once the pended IRP is 
completed, the event in the OVERLAPPED structure is signaled and user mode 
reads the event queue synchronously (through a call to nl_sock_recv()).
nl_sock_recv() always returns immediately (it has a wait parameter, but it is 
set to false).
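For illustration, a minimal sketch of the user-mode side, assuming two
made-up IOCTL codes (one that is always pended for signaling and one that
reads the queue synchronously) and a device handle opened with
FILE_FLAG_OVERLAPPED:

#include <windows.h>
#include <winioctl.h>

/* Placeholder IOCTL codes, not the real ones. */
#define OVS_IOCTL_EVENT_WAIT \
    CTL_CODE(FILE_DEVICE_NETWORK, 0x810, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define OVS_IOCTL_EVENT_READ \
    CTL_CODE(FILE_DEVICE_NETWORK, 0x811, METHOD_BUFFERED, FILE_ANY_ACCESS)

static void
wait_and_drain_events(HANDLE dev)
{
    OVERLAPPED ovl = { 0 };
    DWORD bytes = 0;
    char event_buf[1024];

    ovl.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

    /* The signaling IOCTL: the driver pends this IRP and completes it only
     * when an OVS event has been queued, which signals ovl.hEvent. */
    if (!DeviceIoControl(dev, OVS_IOCTL_EVENT_WAIT, NULL, 0, NULL, 0,
                         &bytes, &ovl)
        && GetLastError() != ERROR_IO_PENDING) {
        CloseHandle(ovl.hEvent);
        return;
    }

    WaitForSingleObject(ovl.hEvent, INFINITE);

    /* Once signaled, the queued events are read synchronously; this is
     * roughly where nl_sock_recv() with wait == false would end up. */
    while (DeviceIoControl(dev, OVS_IOCTL_EVENT_READ, NULL, 0,
                           event_buf, sizeof event_buf, &bytes, NULL)
           && bytes > 0) {
        /* Hand event_buf[0..bytes) to the rest of userspace here. */
    }

    CloseHandle(ovl.hEvent);
}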

>How exactly are events queued by the kernel associated with the userspace? I 
>mean, how do you register a "nic connected" event so that when an event 
>happens, you know you need to update userspace data for a nic, not do 
>something else. Would there be IDs stored in the OvsEvent structs that would 
>specify what kind of events they are? Would we also need context data 
>associated with these events?
This should be no different from the current implementation. These events are 
generated when NDIS calls the switch extension callbacks on switch port 
creation/deletion, link up/down, etc.
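Just to make that concrete, the per-event context could look roughly like
this (the type names and fields are made up for illustration, not the actual
OvsEvent layout):

#include <stdint.h>

enum ovs_event_type {
    OVS_EVENT_PORT_CREATED,
    OVS_EVENT_PORT_DELETED,
    OVS_EVENT_LINK_UP,
    OVS_EVENT_LINK_DOWN,
};

/* One entry in the custom event queue: enough for userspace to tell what
 * happened and which switch port it happened to. */
struct ovs_event_entry {
    uint32_t type;      /* One of enum ovs_event_type. */
    uint32_t port_no;   /* Switch port the event refers to. */
};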

>[QUOTE]>However, I think we need to take into account the situation where the 
>>userspace might be providing a smaller buffer than it is the total to read. 
>>Also, I think the "dump" mechanism requires it.
>I (want) to assume that each transaction is self-contained, which means that 
>the driver should not maintain a state of the transaction. Since we will be 
>using an IOCTL for that transaction, the user mode buffer length will be 
>specified in the command itself.
>All Write/Read dump pairs are replaced with a single IOCTL call.[/QUOTE] That 
>still did not answer my question :) You mean to use a very large read buffer, 
>so that you would be able to read everything in one single operation? I am 
>more concerned here about flow dumps, because you may not know whether you 
>need a 1024-byte buffer, a 10240-byte buffer, a 102400-byte buffer, etc.
Yes, I looked into this issue yesterday with Ben's help. It seems that the 
dump function allocates an initial 1024-byte buffer. In turn, nl_sock_recv__() 
is called with another large buffer allocated on the stack. If the data 
returned from the driver exceeds the initially allocated buffer, the one on 
the stack is copied into a new, larger buffer in the ofpbuf.
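In other words, the pattern is roughly the following (the names are
illustrative; this is not the actual netlink-socket code):

#include <stdlib.h>
#include <string.h>

struct reply_buf {
    char *data;         /* Initially a small (e.g. 1024-byte) allocation. */
    size_t allocated;
    size_t used;
};

/* Receive into a large temporary buffer; if the reply is bigger than what
 * was initially allocated, grow the destination buffer before copying. */
static int
store_reply(struct reply_buf *buf, const char *tmp, size_t received)
{
    if (received > buf->allocated) {
        char *bigger = realloc(buf->data, received);
        if (!bigger) {
            return -1;
        }
        buf->data = bigger;
        buf->allocated = received;
    }
    memcpy(buf->data, tmp, received);
    buf->used = received;
    return 0;
}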


>So I do not see how a DeviceIoControl operation could do both the 'write' and 
>the 'read' part for the dump.
We should probably modify the implementation of nl_dump_recv() so that it 
issues a DeviceIoControl() call. In the input buffer we need to pass down 
information about the "offset" of the current transaction, so the driver will 
know from which position it needs to resume the dump.

>If you pass to the DeviceIoControl a buffer length = 8000, and the flow dump 
>reply buffer is 32000 bytes, you need to do additional reads AND maintain 
>state in the kernel (e.g. offset in the kernel read buffer).
I would like to hold the "offset" in user mode (probably by adding a field to 
the nl_dump structure for WIN32).
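A rough sketch of that approach (the IOCTL code and request layout are
assumptions, only meant to show the offset being passed down with every call
so that the driver stays stateless):

#include <windows.h>
#include <winioctl.h>
#include <stdint.h>

/* Placeholder IOCTL code, not the real one. */
#define OVS_IOCTL_DUMP \
    CTL_CODE(FILE_DEVICE_NETWORK, 0x820, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Hypothetical input-buffer layout for a dump request. */
struct dump_request {
    uint32_t command;   /* Which object type to dump, e.g. flows. */
    uint32_t offset;    /* Where the previous DeviceIoControl() stopped. */
};

static int
dump_next_chunk(HANDLE dev, uint32_t command, uint32_t *offset,
                void *out, DWORD out_size, DWORD *out_len)
{
    struct dump_request req = { command, *offset };

    if (!DeviceIoControl(dev, OVS_IOCTL_DUMP, &req, sizeof req,
                         out, out_size, out_len, NULL)) {
        return (int) GetLastError();
    }
    /* The driver could report the new offset in the reply itself; here we
     * simply assume it advances by the number of bytes returned. */
    *offset += *out_len;
    return 0;
}

The offset would live in the WIN32-only field of struct nl_dump, so each
nl_dump_recv() call just passes the current value down and updates it.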

> [QUOTE]o) I believe we shouldn't use the netlink overhead (nlmsghdr, 
> genlmsghdr, attributes) when not needed (say, when registering a KEVENT 
> notification), and, if we choose not to use the netlink protocol always, we 
> may need a way to differentiate between netlink and non-netlink requests.
>Possible, as a phase for optimization.[/QUOTE] Not necessarily: if we can 
>make a clear separation in code between netlink and non-netlink km-um, not 
>using netlink where we don't need to might save us some development & 
>maintainability effort - both in kernel and in userspace. Because otherwise 
>we'd need to turn non-netlink messages of (windows) userspace code into 
>netlink messages.
My concern is that we should not break the Netlink protocol, as it was 
selected in order to maintain user/kernel-mode interoperability.
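For reference, the per-message cost of keeping that framing even for simple
requests such as an event subscription is only the two standard headers
(about 20 bytes). A sketch, with made-up family/command values and the header
layouts spelled out locally:

#include <stdint.h>
#include <string.h>

struct nlmsghdr {           /* 16 bytes, standard Netlink header. */
    uint32_t nlmsg_len;     /* Total message length, headers included. */
    uint16_t nlmsg_type;    /* Generic Netlink family id. */
    uint16_t nlmsg_flags;
    uint32_t nlmsg_seq;
    uint32_t nlmsg_pid;
};

struct genlmsghdr {         /* 4 bytes, Generic Netlink header. */
    uint8_t  cmd;
    uint8_t  version;
    uint16_t reserved;
};

/* Frame a hypothetical "subscribe to events" request with no attributes.
 * The caller must provide a buffer of at least 20 bytes. */
static size_t
frame_subscribe(char *buf, uint16_t family, uint32_t seq)
{
    struct nlmsghdr nlh = { 0 };
    struct genlmsghdr genl = { 0 };

    nlh.nlmsg_len = sizeof nlh + sizeof genl;
    nlh.nlmsg_type = family;
    nlh.nlmsg_seq = seq;
    genl.cmd = 1;           /* Illustrative command value. */
    genl.version = 1;

    memcpy(buf, &nlh, sizeof nlh);
    memcpy(buf + sizeof nlh, &genl, sizeof genl);
    return nlh.nlmsg_len;
}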

Eitan
