Re: [ovs-dev] [PATCH RFC 2/2] openvswitch: Userspace tunneling.

Pravin Shelar Mon, 13 Oct 2014 14:33:07 -0700

On Thu, Oct 9, 2014 at 2:53 PM, Jarno Rajahalme <[email protected]> wrote:
> Pravin,
>
> Please find my comments below. I did not snip any code to make it easier for 
> you to keep into context while reading this review.
>
Thanks for detailed review.


>   Jarno
>
> On Oct 3, 2014, at 8:24 PM, Pravin B Shelar <[email protected]> wrote:
>
>> Following patch adds support for userspace tunneling. Tunneling
>> needs three more component first is routing table which is configured by
>> user and second is ARP cache which build automatically by snooping arp.
>> And third is tunnel protocol table which list all listening protocols
>> which is populated by vswitchd as tunnel ports are added.
>>
>> Tunneling works as follows:
>> On packet receive vswitchd check if this packet is targeted to tunnel
>> port. If it is then vswitchd inserts tunnel pop action which pops
>> header and sends packet to tunnel port.
>> On packet xmit rather than generating Set tunnel action it generate
>> tunnel push action which has tunnel header data. datapath can use
>> tunnel-push action data to generate header for each packet and
>> forward this packet to output port. Since tunnel-push action
>> contains most of packet header it need to lookup routing table and
>> arp table to build this action.
>>
>> Signed-off-by: Pravin B Shelar <[email protected]>
>> ---
>> Makefile.am                                       |   1 +
>> README-Userspace-Tunneling                        |  81 ++++++++
>> datapath/linux/compat/include/linux/openvswitch.h |  10 +
>> debian/openvswitch-common.docs                    |   1 +
>> lib/automake.mk                                   |   9 +-
>> lib/dpif-netdev.c                                 | 125 +++++++++++-
>> lib/dpif.c                                        |  25 +++
>> lib/dpif.h                                        |   4 +
>> lib/netdev-bsd.c                                  |   3 +
>> lib/netdev-dpdk.c                                 |   3 +
>> lib/netdev-dummy.c                                |   3 +
>> lib/netdev-linux.c                                |   3 +
>> lib/netdev-provider.h                             |   9 +
>> lib/netdev-vport.c                                | 186 +++++++++++++++--
>> lib/netdev.c                                      |  34 ++++
>> lib/netdev.h                                      |   7 +
>> lib/odp-execute.c                                 |   2 +
>> lib/odp-util.c                                    | 118 +++++++++++
>> lib/odp-util.h                                    |   4 +
>> lib/ofpbuf.h                                      |   8 +
>> lib/packets.c                                     |  35 ++++
>> lib/packets.h                                     |   9 +-
>> lib/tnl-arp-cache.c                               | 214 +++++++++++++++++++
>> lib/tnl-arp-cache.h                               |  47 +++++
>> lib/tnl-ports.c                                   | 203 ++++++++++++++++++
>> lib/tnl-ports.h                                   |  49 +++++
>> lib/tnl-router.c                                  | 237 
>> ++++++++++++++++++++++
>> lib/tnl-router.h                                  |  42 ++++
>> lib/vlog.c                                        |   1 +
>> lib/vxlan.h                                       |  46 +++++
>> ofproto/ofproto-dpif-xlate.c                      | 176 +++++++++++++++-
>> ofproto/ofproto-dpif-xlate.h                      |   3 +-
>> ofproto/ofproto-dpif.c                            |  38 +++-
>> ofproto/tunnel.c                                  |  79 +++++++-
>> ofproto/tunnel.h                                  |  10 +-
>> rhel/openvswitch.spec.in                          |   2 +-
>> vswitchd/ovs-vswitchd.8.in                        |   4 +
>> vswitchd/ovs-vswitchd.c                           |   6 +
>> 38 files changed, 1785 insertions(+), 52 deletions(-)
>> create mode 100644 README-Userspace-Tunneling
>> create mode 100644 lib/tnl-arp-cache.c
>> create mode 100644 lib/tnl-arp-cache.h
>> create mode 100644 lib/tnl-ports.c
>> create mode 100644 lib/tnl-ports.h
>> create mode 100644 lib/tnl-router.c
>> create mode 100644 lib/tnl-router.h
>> create mode 100644 lib/vxlan.h
>>
>> diff --git a/Makefile.am b/Makefile.am
>> index 77ceec6..3941e0f 100644
>> --- a/Makefile.am
>> +++ b/Makefile.am
>> @@ -84,6 +84,7 @@ EXTRA_DIST = \
>>       PORTING \
>>       README.md \
>>       README-lisp \
>> +     README-Userspace-Tunneling \
>>       REPORTING-BUGS \
>>       TODO \
>>       .travis.yml \
>> diff --git a/README-Userspace-Tunneling b/README-Userspace-Tunneling
>> new file mode 100644
>> index 0000000..1aebbca
>> --- /dev/null
>> +++ b/README-Userspace-Tunneling
>> @@ -0,0 +1,81 @@
>> +
>> +Open vSwitch support tunneling in userspace. Tunneling is implemented in
>> +platform independent way. But it does expect following from platform:
>> +1. dpif-netdev bridge is backed by netdev should be able to
>> +   reply to ARP requests.
>
> What does it mean for a netdev to be able to reply to ARP requests? Or do you 
> mean that the bridge should be able to reply to ARP requests?
>
yes, bridge is suppose to reply to ARP, but bridge is backed by a
netdev implementation the netdev should be able to reply to arp. Now I
think it is too technical, I will try to simplify it.

>> +2. OVS tunneling routing table is configured.
>> +
>> +Setup:
>> +======
>> +First start vswitch with --userspace-tunneling option to enable tunneling.
>
> I’ll come back to this later, but I think we should treat the new tunneling 
> feature like any other optional datapath feature: Probe for it and (only) use 
> it if the probe was successful. That way it will be easier for other 
> datapaths to implement this and make use of it. I don’t know any details of, 
> e.g., Windows datapath tunneling plans, but it should be feasible to use the 
> same methods also with kernel datapaths.
>

I do not see tnl-push/pop getting implemented in other datapath any
time soo, so probing can be static.

>> +Setup physical bridges for app physical interfaces. create integration 
>> bridge.
>
> Would it be possible to use the new tunneling feature with a single bridge? I 
> see it may require recirculation, but is that really different from the case 
> of different physical and integration bridges? I’m also trying to figure out 
> if this model could fit the case of switching ASIC supporting tunneling, 
> where a single switch view might be preferable.
>
If there is no physical bridge then netdev needs to run IP routing and
ARP, I wanted to use it from base OS.
It is doable, Lets introduce simpler model.

>> +Add vxlan port to int-bridge. Assign IP address to physical bridge where
>> +vxlan traffic is expected. Add tunnel routes for the hysical bridge.
>
> “physical”
>
ok.

>> +
>> +Example:
>> +========
>> +If you want to connect to vxlan tunnel endpoint  (logical ip: 192.168.1.2
>> +and remote physical ip: 172.168.1.2) that you want to connect to.
>> +
>> +In this case you need to configure OVS bridges as follows.
>> +
>> +1. Let assume 172.168.1.2/24 network is reachable via eth1 create physical 
>> bridge br-eth1
>> +   assign ip address (172.168.1.1/24) to br-eth1, Add eth1 to br-eth1
>> +2. Add route to ovs
>> +   ovs-appctl tnl/route/add 172.168.1.1 255.255.255.0 br-eth1
>
> Could OVS do this step automatically based on the assigned IP address, as all 
> the parameters are already there?
>
done.

> I see that if the route involves a gateway router it needs to be explicitly 
> added, though.
>
OVS already does route cache by listening to RTNL msg, we can use
same. I can extract gw information from there.

>> +3. Add integration brdge int-br and add tunnel port using standard syntax.
>> +   ovs-vsctl add-port int-br vxlan0 -- set interface vxlan0 type=vxlan  
>> options:remote_ip=172.168.1.2
>> +4. Assign IP address to int-br, So final topology looks like:
>> +
>> +
>> +       192.168.1.1/24
>> +       +--------------+
>> +       |    int-br    |                                      192.168.1.2/24
>> +       +--------------+                                      
>> +--------------+
>> +       |    vxlan0    |                                      |    vxlan0    
>> |
>> +       +--------------+                                      
>> +--------------+
>> +             |                                                     |
>> +             |                                                     |
>> +             |                                                     |
>> +        172.168.1.1/24                                             |
>> +       +--------------+                                            |
>> +       |    br-eth1   |                                      172.168.1.2/24
>> +       +--------------+                                      
>> +---------------+
>> +       |    eth1      |--------------------------------------|    eth1      
>>  |
>> +       +--------------+                                      
>> +----------------
>> +
>> +       Host A with OVS.                                      Remote host.
>> +
>> +With this setup, ping to vxlan target device (192.168.1.2) should wotk
>
> “wotk” -> “work"
>
ok.

>> +There are following commands that shows internal tables:
>> +
>> +Tunneling related commands:
>> +===========================
>> +Tunnel routing table:
>> +    To Add route:
>> +       ovs-appctl tnl/route/add <IP address> <mask> <output-bridge-name>
>
> Does this syntax support routing via a gateway router, when the peer is not 
> in the same subnet?
>
yes, I will update doc.

>> +    To see all routes configured:
>> +       ovs-appctl tnl/route/show
>> +    To del route
>> +       ovs-appctl tnl/route/del <IP address> <mask>
>> +
>> +ARP:
>> +    To see arp cache content:
>> +       ovs-appctl tnl/arp/show
>> +    To flush arp cache
>> +        ovs-appctl tnl/arp/flush
>> +
>> +To check tunnel ports listening in vswitchd:
>> +ovs-appctl tnl/ports/show
>> +
>> +Limits:
>> +=======
>> + - Does not handle IP fragments
>> + - Only VxLan support
>> + - Does not works correctly with vlans.
>> +
>> +I hope we will fix it soon.
>> +
>> +Contact
>> +=======
>> [email protected]
>> diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
>> b/datapath/linux/compat/include/linux/openvswitch.h
>> index b2257e6..22647e9 100644
>> --- a/datapath/linux/compat/include/linux/openvswitch.h
>> +++ b/datapath/linux/compat/include/linux/openvswitch.h
>> @@ -580,6 +580,12 @@ struct ovs_action_hash {
>>       uint32_t  hash_basis;
>> };
>>
>> +enum ovs_tunnel_action_attr {
>> +     OVS_ACTION_TUNNEL_HEADER_PORT,
>> +     OVS_ACTION_TUNNEL_HEADER_PHY_PORT,
>> +     OVS_ACTION_TUNNEL_HEADER_DATA,
>> +};
>> +
>> /**
>>  * enum ovs_action_attr - Action types.
>>  *
>> @@ -633,6 +639,10 @@ enum ovs_action_attr {
>>                                      * data immediately followed by a mask.
>>                                      * The data must be zero for the unmasked
>>                                      * bits. */
>> +#ifndef __KERNEL__
>> +     OVS_ACTION_ATTR_TUNNEL_PUSH = 64,
>> +     OVS_ACTION_ATTR_TUNNEL_POP,
>> +#endif
>
> Managing the number allocation will become tricky if we’ll get e.g. Windows 
> kernel DP supporting these.
>
yes, that is general problem. I do not want to solve in this patch.

>>       __OVS_ACTION_ATTR_MAX
>> };
>>
>> diff --git a/debian/openvswitch-common.docs b/debian/openvswitch-common.docs
>> index 3bd2ca3..9d053e2 100644
>> --- a/debian/openvswitch-common.docs
>> +++ b/debian/openvswitch-common.docs
>> @@ -1,2 +1,3 @@
>> FAQ
>> INSTALL.DPDK
>> +README-Userspace-Tunneling
>> diff --git a/lib/automake.mk b/lib/automake.mk
>> index c44a77f..da54204 100644
>> --- a/lib/automake.mk
>> +++ b/lib/automake.mk
>> @@ -231,6 +231,12 @@ lib_libopenvswitch_la_SOURCES = \
>>       lib/timer.h \
>>       lib/timeval.c \
>>       lib/timeval.h \
>> +     lib/tnl-arp-cache.c \
>> +     lib/tnl-arp-cache.h \
>> +     lib/tnl-ports.c \
>> +     lib/tnl-ports.h \
>> +     lib/tnl-router.c \
>> +     lib/tnl-router.h \
>>       lib/token-bucket.c \
>>       lib/token-bucket.h \
>>       lib/type-props.h \
>> @@ -257,7 +263,8 @@ lib_libopenvswitch_la_SOURCES = \
>>       lib/vswitch-idl.c \
>>       lib/vswitch-idl.h \
>>       lib/vtep-idl.c \
>> -     lib/vtep-idl.h
>> +     lib/vtep-idl.h \
>> +     lib/vxlan.h
>>
>> if WIN32
>> lib_libopenvswitch_la_SOURCES += \
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index a7c9c92..81600f0 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -2791,15 +2791,83 @@ dpif_netdev_register_upcall_cb(struct dpif *dpif, 
>> upcall_callback *cb,
>> static void
>> dp_netdev_drop_packets(struct dpif_packet ** packets, int cnt, bool 
>> may_steal)
>> {
>> -    int i;
>> -
>>     if (may_steal) {
>> +        int i;
>> +
>>         for (i = 0; i < cnt; i++) {
>>             dpif_packet_delete(packets[i]);
>>         }
>>     }
>> }
>>
>> +static int
>> +odp_push_tunnel_action(const struct dp_netdev *dp,
>> +                       const struct nlattr *attr,
>> +                       struct dpif_packet **packets, int cnt)
>> +{
>> +    struct dp_netdev_port *tun_port;
>> +    unsigned int left;
>> +    const struct nlattr *a;
>> +    const void *header = NULL;
>> +    int header_len = 0;
>> +    odp_port_t tunnel_oprt = 0;
>> +    odp_port_t phy_oprt = 0;
>> +    int i;
>> +
>> +    NL_NESTED_FOR_EACH(a, left, attr) {
>> +        uint16_t type = nl_attr_type(a);
>> +        size_t len = nl_attr_get_size(a);
>> +
>> +        switch (type) {
>> +        case OVS_ACTION_TUNNEL_HEADER_PORT:
>> +            tunnel_oprt = u32_to_odp(nl_attr_get_u32(a));
>> +        break;
>> +
>> +        case OVS_ACTION_TUNNEL_HEADER_PHY_PORT:
>> +            phy_oprt = u32_to_odp(nl_attr_get_u32(a));
>> +        break;
>> +
>> +        case OVS_ACTION_TUNNEL_HEADER_DATA:
>> +            header = nl_attr_get(a);
>> +            header_len = len;
>> +        break;
>> +
>> +        default:
>> +            VLOG_ERR("Unknown tunnel push param.");
>> +            return -EINVAL;
>> +        }
>> +    }
>> +    if (!header_len || !tunnel_oprt || !phy_oprt) {
>> +        VLOG_ERR("Insufficient tunnel push param.");
>> +        return -EINVAL;
>> +    }
>> +
>> +    tun_port = dp_netdev_lookup_port(dp, tunnel_oprt);
>> +    if (!tun_port) {
>> +        return -EINVAL;
>> +    }
>> +    netdev_push_header(tun_port->netdev, packets, cnt, header, header_len);
>> +
>> +    for (i = 0; i < cnt; i++) {
>> +        packets[i]->md = PKT_METADATA_INITIALIZER(phy_oprt);
>
> I see the need to clear out the metadata, as the inner packet is now hidden 
> by encapsulation. However, I’m not sure if it is a good idea to overload the 
> ‘in_port’ member for the output tunnel port, though.
>
This is recirc type of action, so output port for punnel push is
inport for next action execution cycle.
> Also, maybe this step should be performed by netdev_push_header(), as it is 
> transforming the packets anyway.
>
ok.

>> +    }
>> +    return 0;
>> +}
>> +
>> +static void
>> +dp_netdev_clone_pkt_batch(struct dpif_packet **tnl_pkt,
>> +                          struct dpif_packet **packets, int cnt)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < cnt; i++) {
>> +        struct dpif_packet *inner_pkt;
>> +
>> +         inner_pkt = dpif_packet_clone(packets[i]);
>> +         tnl_pkt[i] = inner_pkt;
>> +    }
>> +}
>> +
>> static void
>> dp_execute_cb(void *aux_, struct dpif_packet **packets, int cnt,
>>               const struct nlattr *a, bool may_steal)
>> @@ -2822,6 +2890,59 @@ dp_execute_cb(void *aux_, struct dpif_packet 
>> **packets, int cnt,
>>         }
>>         break;
>>
>> +    case OVS_ACTION_ATTR_TUNNEL_PUSH:
>> +        if (*depth < MAX_RECIRC_DEPTH) {
>> +            struct dpif_packet *tnl_pkt[NETDEV_MAX_RX_BATCH];
>> +            int err;
>> +
>> +            if (may_steal) {
>> +                dp_netdev_clone_pkt_batch(tnl_pkt, packets, cnt);
>> +                packets = tnl_pkt;
>> +            }
>
> Should this be the reverse? Clone if can NOT take the packets?
>
right,

>> +
>> +            err = odp_push_tunnel_action(dp, a, packets, cnt);
>> +            if (err) {
>> +                dp_netdev_drop_packets(tnl_pkt, cnt, may_steal);
>> +                break;
>> +            }
>> +
>> +            (*depth)++;
>> +            dp_netdev_input(pmd, packets, cnt);
>> +            (*depth)--;
>> +            return;
>> +        }
>
> Should “break” here.
>
packets are already consumed so we can not break here.

>> +
>> +    case OVS_ACTION_ATTR_TUNNEL_POP:
>> +        if (*depth >= MAX_RECIRC_DEPTH) {
>> +            break;
>> +        }
>> +
>> +        p = dp_netdev_lookup_port(dp, u32_to_odp(nl_attr_get_u32(a)));
>> +        if (p) {
>> +            struct dpif_packet *tnl_pkt[NETDEV_MAX_RX_BATCH];
>> +            int err;
>> +
>> +            if (may_steal) {
>> +                dp_netdev_clone_pkt_batch(tnl_pkt, packets, cnt);
>> +                packets = tnl_pkt;
>> +            }
>
> Ditto.
>
>> +
>> +            err = netdev_pop_header(p->netdev, packets, cnt);
>> +            if (!err) {
>> +
>> +                for (i = 0; i < cnt; i++) {
>> +                    packets[i]->md.in_port.odp_port = 
>> u32_to_odp(nl_attr_get_u32(a));
>
> Maybe use the same port already taken from nl_attr above?
>
ok.

>> +                }
>> +
>> +                (*depth)++;
>> +                dp_netdev_input(pmd, packets, cnt);
>> +                (*depth)--;
>> +                return;
>> +            }
>> +            dp_netdev_drop_packets(tnl_pkt, cnt, may_steal);
>> +        }
>> +        break;
>> +
>>     case OVS_ACTION_ATTR_USERSPACE:
>>         if (!fat_rwlock_tryrdlock(&dp->upcall_rwlock)) {
>>             const struct nlattr *userdata;
>> diff --git a/lib/dpif.c b/lib/dpif.c
>> index 34d8c13..669ca3f 100644
>> --- a/lib/dpif.c
>> +++ b/lib/dpif.c
>> @@ -41,6 +41,8 @@
>> #include "shash.h"
>> #include "sset.h"
>> #include "timeval.h"
>> +#include "tnl-arp-cache.h"
>> +#include "tnl-ports.h"
>> #include "util.h"
>> #include "valgrind.h"
>> #include "vlog.h"
>> @@ -72,6 +74,9 @@ struct registered_dpif_class {
>> static struct shash dpif_classes = SHASH_INITIALIZER(&dpif_classes);
>> static struct sset dpif_blacklist = SSET_INITIALIZER(&dpif_blacklist);
>>
>> +/* User configured value, by default off. */
>> +bool userspace_tunneling = false;
>> +
>
> I’d make this a datapath feature we probe with a flow set-up. Then use the 
> feature whenever it is available.
>
>> /* Protects 'dpif_classes', including the refcount, and 'dpif_blacklist'. */
>> static struct ovs_mutex dpif_mutex = OVS_MUTEX_INITIALIZER;
>>
>> @@ -114,6 +119,8 @@ dp_initialize(void)
>>         }
>>         dpctl_unixctl_register();
>>         ovsthread_once_done(&once);
>> +        tnl_port_init();
>> +        tnl_arp_cache_init();
>>     }
>> }
>>
>> @@ -408,6 +415,7 @@ dpif_run(struct dpif *dpif)
>>     if (dpif->dpif_class->run) {
>>         dpif->dpif_class->run(dpif);
>>     }
>> +    tnl_arp_cache_run();
>> }
>>
>> /* Arranges for poll_block() to wake up when dp_run() needs to be called for
>> @@ -1002,6 +1010,8 @@ dpif_execute_helper_cb(void *aux_, struct dpif_packet 
>> **packets, int cnt,
>>
>>     switch ((enum ovs_action_attr)type) {
>>     case OVS_ACTION_ATTR_OUTPUT:
>> +    case OVS_ACTION_ATTR_TUNNEL_PUSH:
>> +    case OVS_ACTION_ATTR_TUNNEL_POP:
>>     case OVS_ACTION_ATTR_USERSPACE:
>>     case OVS_ACTION_ATTR_RECIRC: {
>>         struct dpif_execute execute;
>> @@ -1591,3 +1601,18 @@ log_flow_get_message(const struct dpif *dpif, const 
>> struct dpif_flow_get *get,
>>                          get->flow->actions, get->flow->actions_len);
>>     }
>> }
>> +
>> +void dpif_enable_userspace_tunneling(void)
>> +{
>> +    userspace_tunneling = true;
>> +}
>> +
>> +bool dpif_is_userspace_tunneling_on(void)
>> +{
>> +    return userspace_tunneling;
>> +}
>> +
>> +bool dpif_support_userspace_tunneling(const struct dpif *dpif)
>> +{
>> +   return !strcmp(dpif->dpif_class->type, "netdev");
>> +}
>
> All these would not be necessary if ofproto-dpif probes for the new actions, 
> and each datapath then either supports them or not.
>
already replied to comment above.

>> diff --git a/lib/dpif.h b/lib/dpif.h
>> index f88fa78..7402a85 100644
>> --- a/lib/dpif.h
>> +++ b/lib/dpif.h
>> @@ -790,6 +790,10 @@ void dpif_get_netflow_ids(const struct dpif *,
>> int dpif_queue_to_priority(const struct dpif *, uint32_t queue_id,
>>                            uint32_t *priority);
>>
>> +void dpif_enable_userspace_tunneling(void);
>> +bool dpif_is_userspace_tunneling_on(void);
>> +bool dpif_support_userspace_tunneling(const struct dpif *);
>> +
>> #ifdef  __cplusplus
>> }
>> #endif
>> diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
>> index 1bd9108..473c0f6 100644
>> --- a/lib/netdev-bsd.c
>> +++ b/lib/netdev-bsd.c
>> @@ -1562,6 +1562,9 @@ netdev_bsd_update_flags(struct netdev *netdev_, enum 
>> netdev_flags off,
>>     NULL, /* get_config */                           \
>>     NULL, /* set_config */                           \
>>     NULL, /* get_tunnel_config */                    \
>> +    NULL, /* build header */                         \
>> +    NULL, /* push header */                          \
>> +    NULL, /* pop header */                           \
>>     NULL, /* get_numa_id */                          \
>>     NULL, /* set_multiq */                           \
>>                                                      \
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 9c93768..c8610d7 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1392,6 +1392,9 @@ unlock_dpdk:
>>     netdev_dpdk_get_config,                                   \
>>     NULL,                       /* netdev_dpdk_set_config */  \
>>     NULL,                       /* get_tunnel_config */       \
>> +    NULL, /* build header */                                  \
>> +    NULL, /* push header */                                   \
>> +    NULL, /* pop header */                                    \
>>     netdev_dpdk_get_numa_id,    /* get_numa_id */             \
>>     MULTIQ,                     /* set_multiq */              \
>>                                                               \
>> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
>> index a2b1f2c..22d980a 100644
>> --- a/lib/netdev-dummy.c
>> +++ b/lib/netdev-dummy.c
>> @@ -1032,6 +1032,9 @@ static const struct netdev_class dummy_class = {
>>     netdev_dummy_get_config,
>>     netdev_dummy_set_config,
>>     NULL,                       /* get_tunnel_config */
>> +    NULL,                       /* build header */
>> +    NULL,                       /* push header */
>> +    NULL,                       /* pop header */
>>     NULL,                       /* get_numa_id */
>>     NULL,                       /* set_multiq */
>>
>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>> index ab5ef7d..e89063a 100644
>> --- a/lib/netdev-linux.c
>> +++ b/lib/netdev-linux.c
>> @@ -2715,6 +2715,9 @@ netdev_linux_update_flags(struct netdev *netdev_, enum 
>> netdev_flags off,
>>     NULL,                       /* get_config */                \
>>     NULL,                       /* set_config */                \
>>     NULL,                       /* get_tunnel_config */         \
>> +    NULL,                       /* build header */              \
>> +    NULL,                       /* push header */               \
>> +    NULL,                       /* pop header */                \
>>     NULL,                       /* get_numa_id */               \
>>     NULL,                       /* set_multiq */                \
>>                                                                 \
>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
>> index 0e4cda5..50b63a6 100644
>> --- a/lib/netdev-provider.h
>> +++ b/lib/netdev-provider.h
>> @@ -254,6 +254,15 @@ struct netdev_class {
>>     const struct netdev_tunnel_config *
>>         (*get_tunnel_config)(const struct netdev *netdev);
>>
>> +    void (*build_header)(const struct netdev *, struct ofpbuf *);
>> +
>> +    int (*push_header)(const struct netdev *netdev,
>> +                       struct dpif_packet **buffers, int cnt,
>> +                       const void *header, int size);
>> +
>> +    int  (*pop_header)(struct netdev *netdev,
>> +                       struct dpif_packet **buffers, int cnt);
>> +
>>     /* Returns the id of the numa node the 'netdev' is on.  If there is no
>>      * such info, returns NETDEV_NUMA_UNSPEC. */
>>     int (*get_numa_id)(const struct netdev *netdev);
>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index 83f1296..33ff49b 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -25,6 +25,7 @@
>> #include <sys/ioctl.h>
>>
>> #include "byte-order.h"
>> +#include "csum.h"
>> #include "daemon.h"
>> #include "dirs.h"
>> #include "dpif.h"
>> @@ -32,13 +33,17 @@
>> #include "hmap.h"
>> #include "list.h"
>> #include "netdev-provider.h"
>> +#include "vxlan.h"
>> #include "ofpbuf.h"
>> #include "packets.h"
>> +#include "packet-dpif.h"
>> #include "poll-loop.h"
>> #include "route-table.h"
>> #include "shash.h"
>> #include "socket-util.h"
>> #include "vlog.h"
>> +#include "unaligned.h"
>> +#include "util.h"
>>
>> VLOG_DEFINE_THIS_MODULE(netdev_vport);
>>
>> @@ -221,10 +226,24 @@ netdev_vport_alloc(void)
>> static int
>> netdev_vport_construct(struct netdev *netdev_)
>> {
>> -    struct netdev_vport *netdev = netdev_vport_cast(netdev_);
>> +    struct netdev_vport *dev = netdev_vport_cast(netdev_);
>> +    const char *type = netdev_get_type(netdev_);
>> +
>> +    ovs_mutex_init(&dev->mutex);
>> +    eth_addr_random(dev->etheraddr);
>> +
>> +    /* Add a default destination port for tunnel ports if none specified. */
>> +    if (!strcmp(type, "geneve")) {
>> +        dev->tnl_cfg.dst_port = htons(GENEVE_DST_PORT);
>> +    }
>> +
>> +    if (!strcmp(type, "vxlan")) {
>> +        dev->tnl_cfg.dst_port = htons(VXLAN_DST_PORT);
>> +    }
>>
>> -    ovs_mutex_init(&netdev->mutex);
>> -    eth_addr_random(netdev->etheraddr);
>> +    if (!strcmp(type, "lisp")) {
>> +        dev->tnl_cfg.dst_port = htons(LISP_DST_PORT);
>> +    }
>>
>>     route_table_register();
>>
>> @@ -761,9 +780,134 @@ get_stats(const struct netdev *netdev, struct 
>> netdev_stats *stats)
>>
>>     return 0;
>> }
>> +
>> +static bool
>> +vxlan_extract_md(struct dpif_packet *dpif_pkt)
>> +{
>> +    struct ofpbuf *packet = &dpif_pkt->ofpbuf;
>> +    struct pkt_metadata *md = &dpif_pkt->md;
>> +    struct flow_tnl *tnl = &md->tunnel;
>> +    struct ip_header *nh;
>> +    struct udp_header *udp;
>> +    struct vxlanhdr *vxh;
>> +
>> +    nh = ofpbuf_l3(packet);
>> +    udp = ofpbuf_l4(packet);
>> +
>> +    memset(md, 0, sizeof(*md));
>> +
>> +    if (!nh || !udp) {
>> +        return false;
>> +    }
>> +    if (((const char *) ofpbuf_tail(packet) - (const char *) udp) <
>> +        (UDP_HEADER_LEN + sizeof (*vxh))) {
>> +        return false;
>> +    }
>> +    vxh = (struct vxlanhdr *) ((const char *)udp + sizeof(*udp));
>> +
>> +    if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
>> +       (vxh->vx_vni & htonl(0xff))) {
>> +        VLOG_ERR("invalid vxlan flags=%#x vni=%#x\n",
>> +                 ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
>> +        return false;
>> +    }
>> +
>> +    tnl->tun_id = vxh->vx_vni;
>> +    tnl->ip_src = get_16aligned_be32(&nh->ip_src);
>> +    tnl->ip_dst = get_16aligned_be32(&nh->ip_dst);
>> +    tnl->ip_tos = nh->ip_tos;
>> +    tnl->ip_ttl = nh->ip_ttl;
>> +
>> +    ofpbuf_reset_packet(packet, ((const char *) (vxh + 1) - ((const char *) 
>> packet->frame)));
>> +    return true;
>> +}
>> +
>> +static int
>> +netdev_vxlan_pop_header(struct netdev *netdev_ OVS_UNUSED,
>> +                        struct dpif_packet **pkt, int cnt)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < cnt; i++) {
>> +        vxlan_extract_md(pkt[i]);
>> +    }
>> +    return 0;
>> +}
>> +
>> +static void
>> +netdev_vxlan_build_header(const struct netdev *netdev, struct ofpbuf 
>> *header)
>> +{
>> +    struct netdev_vport *dev = netdev_vport_cast(netdev);
>> +    struct netdev_tunnel_config *tnl_cfg;
>> +    const struct eth_header *eth;
>> +    struct ip_header *ip;
>> +    struct udp_header *udp;
>> +    struct vxlanhdr *vxh;
>> +
>> +    ovs_mutex_lock(&dev->mutex);
>> +    tnl_cfg = &dev->tnl_cfg;
>
> We ought to make tnl_cfg RCU to get rid of the mutex here (and likely 
> elsewhere as well). Just a note, I think it would be better to do this in a 
> separate patch anyway.
>
OK, I added TODO.

>> +
>> +    eth = (const struct eth_header *) ofpbuf_data(header);
>> +
>> +    ip = (struct ip_header *) ((const char *)eth + sizeof *eth);
>> +    ip->ip_proto = IPPROTO_UDP;
>> +
>> +    udp = ofpbuf_put_zeros(header, sizeof *udp);
>> +    /* Set src port in push tunnel. */
>> +    udp->udp_src = 44444;
>
> If push sets udp_src, why set it here at all?
>
ok.

>> +    udp->udp_dst = tnl_cfg->dst_port;
>> +    /* udp_csum is zero */
>
> Would it not be the push that should be responsible for setting the checksum, 
> if we used one?
>
ok, moved to push.
>> +
>> +    vxh = ofpbuf_put_zeros(header, sizeof *vxh);
>
> Could use ofpbuf_put_uninit() here instead...
>
I changed struct definition so it is no longer required.

>> +    vxh->vx_flags = htonl(VXLAN_FLAGS);
>> +    vxh->vx_vni = tnl_cfg->out_key & htonl(0xff);
>> +
>
> … as we are setting all the header values here.
>
>> +    ovs_mutex_unlock(&dev->mutex);
>> +}
>> +
>> +static void
>> +netdev_vxlan_push_header__(struct ofpbuf *packet,
>> +                           const void *header, int size)
>> +{
>> +    struct eth_header *eth;
>> +    struct ip_header *ip;
>> +    struct udp_header *udp;
>> +    struct vxlanhdr *vxh;
>> +    int hdr_size;
>
> I’d call this “l2_size”
>
ok.

>> +    int tot_size;
>
> and this “ip_tot_size”, would make code easier to follow.
>
>> +
>> +    eth = ofpbuf_push_uninit(packet, size);
>> +    hdr_size = sizeof *ip + sizeof *udp + sizeof *vxh;
>
> l2_size = size - (sizeof *ip + sizeof *udp + sizeof *vxh);
>
ok

>> +    tot_size  = ofpbuf_size(packet) - size + hdr_size;
>
> ip_tot_size = ofpbuf_size(packet) - l2_size;
>
ok.

>> +
>> +    memcpy(eth, header, size);
>> +    ip = (struct ip_header *) ((const char *)eth + sizeof *eth);
>
> Use this instead, to be better compatible with VLANs:
>
ok. But I do not think we need to deal with VLAN here.

> ip = (struct ip_header *) ((const char *)eth + l2_size);
>
>> +    ip->ip_tot_len = htons(tot_size);
>> +    ip->ip_csum = csum(ip, sizeof *ip);
>
> Could we precompute the ip_csum for zero ip_tot_len and then update the 
> checksum here based on the real value?
>
ok.

>> +
>> +    udp = (struct udp_header *) (ip + 1);
>> +    /* set udp src port*/
>> +    udp->udp_len = htons(ntohs(ip->ip_tot_len) - sizeof *ip);
>
> IMO we should also set the UDP checksum (to zero) here.
>
ok.

>> +}
>> +
>> +static int
>> +netdev_vxlan_push_header(const struct netdev *netdev OVS_UNUSED,
>> +                         struct dpif_packet **buffers, int cnt,
>
> ‘packets' would be a better name for the ‘buffers’ argument.
>
ok

>> +                         const void *header, int size)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < cnt; i++) {
>> +        netdev_vxlan_push_header__(&buffers[i]->ofpbuf, header, size);
>> +    }
>> +    return 0;
>> +}
>> +
>>
>> #define VPORT_FUNCTIONS(GET_CONFIG, SET_CONFIG,             \
>> -                        GET_TUNNEL_CONFIG, GET_STATUS)      \
>> +                        GET_TUNNEL_CONFIG, GET_STATUS,      \
>> +                        BUILD_HEADER,                       \
>> +                        PUSH_HEADER, POP_HEADER)            \
>>     NULL,                                                   \
>>     netdev_vport_run,                                       \
>>     netdev_vport_wait,                                      \
>> @@ -775,6 +919,9 @@ get_stats(const struct netdev *netdev, struct 
>> netdev_stats *stats)
>>     GET_CONFIG,                                             \
>>     SET_CONFIG,                                             \
>>     GET_TUNNEL_CONFIG,                                      \
>> +    BUILD_HEADER,                                           \
>> +    PUSH_HEADER,                                            \
>> +    POP_HEADER,                                             \
>>     NULL,                       /* get_numa_id */           \
>>     NULL,                       /* set_multiq */            \
>>                                                             \
>> @@ -826,12 +973,14 @@ get_stats(const struct netdev *netdev, struct 
>> netdev_stats *stats)
>>     NULL,                   /* rx_wait */                   \
>>     NULL,                   /* rx_drain */
>>
>> -#define TUNNEL_CLASS(NAME, DPIF_PORT)                       \
>> -    { DPIF_PORT,                                            \
>> -        { NAME, VPORT_FUNCTIONS(get_tunnel_config,          \
>> -                                set_tunnel_config,          \
>> -                                get_netdev_tunnel_config,   \
>> -                                tunnel_get_status) }}
>> +
>> +#define TUNNEL_CLASS(NAME, DPIF_PORT, BUILD_HEADER, PUSH_HEADER, 
>> POP_HEADER)   \
>> +    { DPIF_PORT,                                                            
>>    \
>> +        { NAME, VPORT_FUNCTIONS(get_tunnel_config,                          
>>    \
>> +                                set_tunnel_config,                          
>>    \
>> +                                get_netdev_tunnel_config,                   
>>    \
>> +                                tunnel_get_status,                          
>>    \
>> +                                BUILD_HEADER, PUSH_HEADER, POP_HEADER) }}
>>
>> void
>> netdev_vport_tunnel_register(void)
>> @@ -839,13 +988,14 @@ netdev_vport_tunnel_register(void)
>>     /* The name of the dpif_port should be short enough to accomodate adding
>>      * a port number to the end if one is necessary. */
>>     static const struct vport_class vport_classes[] = {
>> -        TUNNEL_CLASS("geneve", "genev_sys"),
>> -        TUNNEL_CLASS("gre", "gre_sys"),
>> -        TUNNEL_CLASS("ipsec_gre", "gre_sys"),
>> -        TUNNEL_CLASS("gre64", "gre64_sys"),
>> -        TUNNEL_CLASS("ipsec_gre64", "gre64_sys"),
>> -        TUNNEL_CLASS("vxlan", "vxlan_sys"),
>> -        TUNNEL_CLASS("lisp", "lisp_sys”)
>
> Add line for geneve here
>
right.

>> +        TUNNEL_CLASS("gre", "gre_sys", NULL, NULL, NULL),
>> +        TUNNEL_CLASS("ipsec_gre", "gre_sys", NULL, NULL, NULL),
>> +        TUNNEL_CLASS("gre64", "gre64_sys", NULL,  NULL, NULL),
>> +        TUNNEL_CLASS("ipsec_gre64", "gre64_sys", NULL, NULL, NULL),
>> +        TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header,
>> +                                           netdev_vxlan_push_header,
>> +                                           netdev_vxlan_pop_header),
>> +        TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL)
>>     };
>>     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>
>> @@ -867,6 +1017,6 @@ netdev_vport_patch_register(void)
>>             { "patch", VPORT_FUNCTIONS(get_patch_config,
>>                                        set_patch_config,
>>                                        NULL,
>> -                                       NULL) }};
>> +                                       NULL, NULL, NULL, NULL) }};
>>     netdev_register_provider(&patch_class.netdev_class);
>> }
>> diff --git a/lib/netdev.c b/lib/netdev.c
>> index a9ad7d1..970fafc 100644
>> --- a/lib/netdev.c
>> +++ b/lib/netdev.c
>> @@ -724,6 +724,40 @@ netdev_send(struct netdev *netdev, int qid, struct 
>> dpif_packet **buffers,
>>     return error;
>> }
>>
>> +int
>> +netdev_pop_header(struct netdev *netdev, struct dpif_packet **buffers, int 
>> cnt)
>> +{
>> +    int error;
>> +
>> +    error = (netdev->netdev_class->pop_header
>> +             ? netdev->netdev_class->pop_header(netdev, buffers, cnt)
>> +             : EOPNOTSUPP);
>> +    if (!error) {
>> +        COVERAGE_INC(netdev_sent);
>
> Is this coverage intentional?
>
nope, removed it.

>> +    }
>> +    return error;
>> +}
>> +
>> +void
>> +netdev_build_header(const struct netdev *netdev, struct ofpbuf *header)
>> +{
>> +    if (netdev->netdev_class->build_header) {
>> +        netdev->netdev_class->build_header(netdev, header);
>> +    }
>> +}
>> +
>> +int
>> +netdev_push_header(const struct netdev *netdev,
>> +                   struct dpif_packet **buffers, int cnt,
>> +                   const void *header, int size)
>> +{
>> +    if (netdev->netdev_class->push_header) {
>> +        return netdev->netdev_class->push_header(netdev, buffers, cnt, 
>> header, size);
>> +    } else {
>> +        return -EINVAL;
>> +    }
>> +}
>> +
>> /* Registers with the poll loop to wake up from the next call to poll_block()
>>  * when the packet transmission queue has sufficient room to transmit a 
>> packet
>>  * with netdev_send().
>> diff --git a/lib/netdev.h b/lib/netdev.h
>> index fc4180a..3af0665 100644
>> --- a/lib/netdev.h
>> +++ b/lib/netdev.h
>> @@ -180,6 +180,13 @@ int netdev_send(struct netdev *, int qid, struct 
>> dpif_packet **, int cnt,
>>                 bool may_steal);
>> void netdev_send_wait(struct netdev *, int qid);
>>
>> +void netdev_build_header(const struct netdev *, struct ofpbuf *);
>> +int netdev_push_header(const struct netdev *netdev,
>> +                       struct dpif_packet **buffers, int cnt,
>> +                       const void *header, int size);
>> +int netdev_pop_header(struct netdev *netdev, struct dpif_packet **buffers,
>> +                      int cnt);
>> +
>> /* Hardware address. */
>> int netdev_set_etheraddr(struct netdev *, const uint8_t mac[6]);
>> int netdev_get_etheraddr(const struct netdev *, uint8_t mac[6]);
>> diff --git a/lib/odp-execute.c b/lib/odp-execute.c
>> index b179cf2..861257c 100644
>> --- a/lib/odp-execute.c
>> +++ b/lib/odp-execute.c
>> @@ -412,6 +412,8 @@ odp_execute_actions(void *dp, struct dpif_packet 
>> **packets, int cnt, bool steal,
>>         switch ((enum ovs_action_attr) type) {
>>             /* These only make sense in the context of a datapath. */
>>         case OVS_ACTION_ATTR_OUTPUT:
>> +        case OVS_ACTION_ATTR_TUNNEL_PUSH:
>> +        case OVS_ACTION_ATTR_TUNNEL_POP:
>>         case OVS_ACTION_ATTR_USERSPACE:
>>         case OVS_ACTION_ATTR_RECIRC:
>>             if (dp_execute_action) {
>> diff --git a/lib/odp-util.c b/lib/odp-util.c
>> index b9e1e21..20d6146 100644
>> --- a/lib/odp-util.c
>> +++ b/lib/odp-util.c
>> @@ -30,10 +30,12 @@
>> #include "dynamic-string.h"
>> #include "flow.h"
>> #include "netlink.h"
>> +#include "vxlan.h"
>> #include "ofpbuf.h"
>> #include "packets.h"
>> #include "simap.h"
>> #include "timeval.h"
>> +#include "unaligned.h"
>> #include "util.h"
>> #include "vlog.h"
>>
>> @@ -74,6 +76,8 @@ odp_action_len(uint16_t type)
>>
>>     switch ((enum ovs_action_attr) type) {
>>     case OVS_ACTION_ATTR_OUTPUT: return sizeof(uint32_t);
>> +    case OVS_ACTION_ATTR_TUNNEL_PUSH: return -2;
>> +    case OVS_ACTION_ATTR_TUNNEL_POP: return sizeof(uint32_t);
>>     case OVS_ACTION_ATTR_USERSPACE: return -2;
>>     case OVS_ACTION_ATTR_PUSH_VLAN: return sizeof(struct 
>> ovs_action_push_vlan);
>>     case OVS_ACTION_ATTR_POP_VLAN: return 0;
>> @@ -507,6 +511,86 @@ format_odp_hash_action(struct ds *ds, const struct 
>> ovs_action_hash *hash_act)
>> }
>>
>> static void
>> +format_odp_tunnel_vxlan_header(struct ds *ds, const struct nlattr *a)
>> +{
>> +    const struct eth_header *eth;
>> +    const struct ip_header *ip;
>> +    const struct udp_header *udp;
>> +    const struct vxlanhdr *vxh;
>> +
>> +    /* TODO: handle other tunnel data. */
>> +    eth = nl_attr_get(a);
>> +    ip = (const struct ip_header *)   ((const char *)eth + sizeof(*eth));
>
> Need to parse VLANs here?
>
I do not think tunnel header will have vlan set.
>> +    udp = (const struct udp_header *) ((const char *)ip  + sizeof(*ip));
>> +    vxh = (const struct vxlanhdr *)   ((const char *)udp + sizeof(*udp));
>> +
>> +    /* Ehernet */
>
> “Ethernet”
>
ok.

>> +    ds_put_format(ds, "header(eth(src=");
>> +    ds_put_format(ds, ETH_ADDR_FMT, ETH_ADDR_ARGS(eth->eth_src));
>> +    ds_put_format(ds, ",dst=");
>> +    ds_put_format(ds, ETH_ADDR_FMT, ETH_ADDR_ARGS(eth->eth_dst));
>> +    ds_put_format(ds, ",dl_type=0x%04"PRIx16"),", ntohs(eth->eth_type));
>> +
>> +    /* IPv4 */
>> +    ds_put_format(ds, "ipv4(src="IP_FMT",dst="IP_FMT",proto=%"PRIu8
>> +                  ",tos=%#"PRIx8",ttl=%"PRIu8",frag=%"PRIx16"),",
>> +                  IP_ARGS(get_16aligned_be32(&ip->ip_src)),
>> +                  IP_ARGS(get_16aligned_be32(&ip->ip_dst)),
>> +                  ip->ip_proto, ip->ip_tos,
>> +                  ip->ip_ttl,
>> +                  ip->ip_frag_off);
>> +
>> +    /* UDP */
>> +    ds_put_format(ds, "udp(src=%"PRIu16",dst=%"PRIu16"),",
>> +                  ntohs(udp->udp_src), ntohs(udp->udp_dst));
>> +
>> +    /* VxLan */
>> +    ds_put_format(ds, "vxlan(flags=%"PRIx32",vni=%"PRIx32")",
>> +                  vxh->vx_flags, vxh->vx_vni);
>> +
>> +    ds_put_format(ds, ")");
>> +}
>> +
>> +static void
>> +format_odp_tunnel_push_action(struct ds *ds, const struct nlattr *attrs,
>> +                              size_t len)
>> +{
>> +    enum ovs_tunnel_action_attr type;
>> +    const struct nlattr *a;
>> +    unsigned int left;
>> +
>> +    ds_put_format(ds, "tunnel_push(");
>> +    NL_ATTR_FOR_EACH (a, left, attrs, len) {
>> +        if (a != attrs) {
>> +            ds_put_cstr(ds, ",");
>> +        }
>> +
>> +        type = nl_attr_type(a);
>> +        switch(type) {
>> +            case OVS_ACTION_TUNNEL_HEADER_PORT:
>> +                ds_put_format(ds, "tnl_port(%"PRIu32, nl_attr_get_u32(a));
>> +                ds_put_cstr(ds, ")");
>> +            break;
>> +
>> +            case OVS_ACTION_TUNNEL_HEADER_PHY_PORT:
>> +                ds_put_format(ds, "phy_port(%"PRIu32, nl_attr_get_u32(a));
>> +                ds_put_cstr(ds, ")");
>> +            break;
>> +
>> +            case OVS_ACTION_TUNNEL_HEADER_DATA:
>> +                format_odp_tunnel_vxlan_header(ds, a);
>> +            break;
>> +
>> +            default:
>> +                ds_put_cstr(ds, "<error>");
>> +            break;
>> +        }
>> +    }
>> +    ds_put_format(ds, ")");
>> +
>> +}
>> +
>> +static void
>> format_odp_action(struct ds *ds, const struct nlattr *a)
>> {
>>     int expected_len;
>> @@ -526,6 +610,13 @@ format_odp_action(struct ds *ds, const struct nlattr *a)
>>     case OVS_ACTION_ATTR_OUTPUT:
>>         ds_put_format(ds, "%"PRIu32, nl_attr_get_u32(a));
>>         break;
>> +    case OVS_ACTION_ATTR_TUNNEL_POP:
>> +        ds_put_format(ds, "tnl_pop(%"PRIu32, nl_attr_get_u32(a));
>> +        ds_put_cstr(ds, ")");
>> +        break;
>> +    case OVS_ACTION_ATTR_TUNNEL_PUSH:
>> +        format_odp_tunnel_push_action(ds, nl_attr_get(a), 
>> nl_attr_get_size(a));
>> +        break;
>>     case OVS_ACTION_ATTR_USERSPACE:
>>         format_odp_userspace_action(ds, a);
>>         break;
>> @@ -748,6 +839,16 @@ parse_odp_action(const char *s, const struct simap 
>> *port_names,
>>         uint32_t port;
>>         int n;
>>
>> +        if (ovs_scan(s, "tnl_queue(%"SCNi32")%n", &port, &n)) {
>> +            nl_msg_put_u32(actions, OVS_ACTION_ATTR_OUTPUT, port);
>> +            return n;
>> +        }
>
> What is this?
>
it should be parsing tnl_pop action. I fixed it.

>> +    }
>> +
>> +    {
>> +        uint32_t port;
>> +        int n;
>> +
>>         if (ovs_scan(s, "%"SCNi32"%n", &port, &n)) {
>>             nl_msg_put_u32(actions, OVS_ACTION_ATTR_OUTPUT, port);
>>             return n;
>> @@ -3543,6 +3644,23 @@ odp_put_tunnel_action(const struct flow_tnl *tunnel,
>>     tun_key_to_attr(odp_actions, tunnel);
>>     nl_msg_end_nested(odp_actions, offset);
>> }
>> +
>> +void
>> +odp_put_build_tunnel_header_action(struct ofpbuf *odp_actions, struct 
>> ofpbuf *header,
>> +                                   odp_port_t tunnel_port, odp_port_t 
>> phy_port)
>
> Should be called “odp_put_tunnel_push_action” instead?
>
ok.

>> +{
>> +    size_t offset = nl_msg_start_nested(odp_actions, 
>> OVS_ACTION_ATTR_TUNNEL_PUSH);
>> +
>> +    nl_msg_put_odp_port(odp_actions, OVS_ACTION_TUNNEL_HEADER_PORT,
>> +                        tunnel_port);
>> +    nl_msg_put_unspec(odp_actions, OVS_ACTION_TUNNEL_HEADER_DATA,
>> +                      ofpbuf_data(header), ofpbuf_size(header));
>> +    nl_msg_put_odp_port(odp_actions, OVS_ACTION_TUNNEL_HEADER_PHY_PORT,
>> +                        phy_port);
>> +
>> +    nl_msg_end_nested(odp_actions, offset);
>> +}
>> +
>>
>> /* The commit_odp_actions() function and its helpers. */
>>
>> diff --git a/lib/odp-util.h b/lib/odp-util.h
>> index 11b54dd..64e7dc3 100644
>> --- a/lib/odp-util.h
>> +++ b/lib/odp-util.h
>> @@ -253,5 +253,9 @@ size_t odp_put_userspace_action(uint32_t pid,
>>                                 struct ofpbuf *odp_actions);
>> void odp_put_tunnel_action(const struct flow_tnl *tunnel,
>>                            struct ofpbuf *odp_actions);
>> +void odp_put_build_tunnel_header_action(struct ofpbuf *odp_actions,
>> +                                        struct ofpbuf *header,
>> +                                        odp_port_t tunnel_odp_port,
>> +                                        odp_port_t phy_port);
>>
>> #endif /* odp-util.h */
>> diff --git a/lib/ofpbuf.h b/lib/ofpbuf.h
>> index adaf526..72c254f 100644
>> --- a/lib/ofpbuf.h
>> +++ b/lib/ofpbuf.h
>> @@ -422,6 +422,14 @@ static inline void ofpbuf_set_size(struct ofpbuf *b, 
>> uint32_t v)
>> }
>> #endif
>>
>> +static inline void ofpbuf_reset_packet(struct ofpbuf *b, int off)
>> +{
>> +    ofpbuf_set_size(b, ofpbuf_size(b) - off);
>> +    ofpbuf_set_data(b, (void *) ((unsigned char *) b->frame + off));
>> +    b->frame = NULL;
>> +    b->l2_5_ofs = b->l3_ofs = b->l4_ofs = UINT16_MAX;
>> +}
>> +
>> #ifdef  __cplusplus
>> }
>> #endif
>> diff --git a/lib/packets.c b/lib/packets.c
>> index 65d8109..a0cfadc 100644
>> --- a/lib/packets.c
>> +++ b/lib/packets.c
>> @@ -956,3 +956,38 @@ packet_format_tcp_flags(struct ds *s, uint16_t 
>> tcp_flags)
>>         ds_put_cstr(s, "[800]");
>>     }
>> }
>> +
>> +#define ARP_PACKET_SIZE  (2 + ETH_HEADER_LEN + VLAN_HEADER_LEN + \
>> +                          ARP_ETH_HEADER_LEN)
>> +
>> +void
>> +compose_arp(struct ofpbuf *b, const uint8_t eth_src[ETH_ADDR_LEN],
>> +            ovs_be32 ip_src, ovs_be32 ip_dst)
>> +{
>> +    struct eth_header *eth;
>> +    struct arp_eth_header *arp;
>> +
>> +    ofpbuf_clear(b);
>> +    ofpbuf_prealloc_tailroom(b, ARP_PACKET_SIZE);
>> +    ofpbuf_reserve(b, 2 + VLAN_HEADER_LEN);
>> +
>> +    eth = ofpbuf_put_uninit(b, sizeof *eth);
>> +    memcpy(eth->eth_dst, eth_addr_broadcast, ETH_ADDR_LEN);
>> +    memcpy(eth->eth_src, eth_src, ETH_ADDR_LEN);
>> +    eth->eth_type = htons(ETH_TYPE_ARP);
>> +
>> +    arp = ofpbuf_put_uninit(b, sizeof *arp);
>> +    arp->ar_hrd = htons(ARP_HRD_ETHERNET);
>> +    arp->ar_pro = htons(ARP_PRO_IP);
>> +    arp->ar_hln = sizeof arp->ar_sha;
>> +    arp->ar_pln = sizeof arp->ar_spa;
>> +    arp->ar_op = htons(ARP_OP_REQUEST);
>> +    memcpy(arp->ar_sha, eth_src, ETH_ADDR_LEN);
>> +    memcpy(arp->ar_tha, eth_src, ETH_ADDR_LEN);
>
> I think the tha should be set to all-zeroes instead. RFC 826 says:
>
> “It does not set ar$tha to anything in particular, because it is this value 
> that it is trying to determine.”
>
ok.

>> +
>> +    put_16aligned_be32(&arp->ar_spa, ip_src);
>> +    put_16aligned_be32(&arp->ar_tpa, ip_dst);
>> +
>> +    ofpbuf_set_frame(b, eth);
>> +    ofpbuf_set_l3(b, arp);
>> +}
>> diff --git a/lib/packets.h b/lib/packets.h
>> index 26c6ff1..b24015e 100644
>> --- a/lib/packets.h
>> +++ b/lib/packets.h
>> @@ -486,6 +486,7 @@ void ip_format_masked(ovs_be32 ip, ovs_be32 mask, struct 
>> ds *);
>>         ((ip_frag_off) & htons(IP_MORE_FRAGMENTS | IP_FRAG_OFF_MASK))
>>
>> #define IP_HEADER_LEN 20
>> +OVS_PACKED(
>> struct ip_header {
>>     uint8_t ip_ihl_ver;
>>     uint8_t ip_tos;
>> @@ -497,7 +498,8 @@ struct ip_header {
>>     ovs_be16 ip_csum;
>>     ovs_16aligned_be32 ip_src;
>>     ovs_16aligned_be32 ip_dst;
>> -};
>> +});
>> +
>
> I’d rather not pack this, if possible.
>
ok.

>> BUILD_ASSERT_DECL(IP_HEADER_LEN == sizeof(struct ip_header));
>>
>> #define ICMP_HEADER_LEN 8
>> @@ -545,12 +547,13 @@ struct sctp_header {
>> BUILD_ASSERT_DECL(SCTP_HEADER_LEN == sizeof(struct sctp_header));
>>
>> #define UDP_HEADER_LEN 8
>> +OVS_PACKED(
>> struct udp_header {
>>     ovs_be16 udp_src;
>>     ovs_be16 udp_dst;
>>     ovs_be16 udp_len;
>>     ovs_be16 udp_csum;
>> -};
>> +});
>
> Same here.
>
>> BUILD_ASSERT_DECL(UDP_HEADER_LEN == sizeof(struct udp_header));
>>
>> #define TCP_FIN 0x001
>> @@ -735,5 +738,7 @@ void packet_set_sctp_port(struct ofpbuf *, ovs_be16 src, 
>> ovs_be16 dst);
>>
>> void packet_format_tcp_flags(struct ds *, uint16_t);
>> const char *packet_tcp_flag_to_string(uint32_t flag);
>> +void compose_arp(struct ofpbuf *b, const uint8_t eth_src[ETH_ADDR_LEN],
>> +                 ovs_be32 ip_src, ovs_be32 ip_dst);
>>
>> #endif /* packets.h */
>> diff --git a/lib/tnl-arp-cache.c b/lib/tnl-arp-cache.c
>> new file mode 100644
>> index 0000000..2327452
>> --- /dev/null
>> +++ b/lib/tnl-arp-cache.c
>> @@ -0,0 +1,214 @@
>> +/*
>> + * Copyright (c) 2014 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#include <config.h>
>> +#include <inttypes.h>
>> +#include <stdlib.h>
>> +
>> +#include "bitmap.h"
>> +#include "coverage.h"
>> +#include "dpif-netdev.h"
>> +#include "dynamic-string.h"
>> +#include "errno.h"
>> +#include "flow.h"
>> +#include "hash.h"
>> +#include "hmap.h"
>> +#include "list.h"
>> +#include "netdev.h"
>> +#include "ovs-thread.h"
>> +#include "packets.h"
>> +#include "packet-dpif.h"
>> +#include "poll-loop.h"
>> +#include "timeval.h"
>> +#include "tnl-arp-cache.h"
>> +#include "unaligned.h"
>> +#include "unixctl.h"
>> +#include "util.h"
>> +#include "vlog.h"
>> +
>> +VLOG_DEFINE_THIS_MODULE(arp);
>> +
>> +#define ARP_ENTRY_DEFAULT_IDLE_TIME  30000
>> +
>> +struct tnl_arp_entry {
>> +    struct hmap_node hmap_node;
>
> IMO it would be better to use cmap, that way the lookups could be lockless.
>
ok, I replaced it with cmap.

>> +    ovs_be32 ip;
>> +    uint8_t mac[ETH_ADDR_LEN];
>> +    time_t expires;             /* Expiration time. */
>> +    char br_name[IFNAMSIZ];
>> +};
>> +
>> +static struct hmap table = HMAP_INITIALIZER(&table);
>> +static struct ovs_rwlock rwlock = OVS_RWLOCK_INITIALIZER;
>
> Would still need a mutex for excluding concurrent modifications.
>
ok.

>> +static uint32_t secret;
>> +
>> +static uint32_t
>> +tnl_arp_table_hash(ovs_be32 ip4)
>> +{
>> +    return hash_2words(ntohl(ip4), secret);
>> +}
>> +
>
> cmap is already internally rehashing the given hash, so it would be possible 
> to just give the dst address as the ‘hash’ to cmap functions.
>
ok.

>> +static struct tnl_arp_entry *
>> +__tnl_arp_lookup(const char br_name[IFNAMSIZ], ovs_be32 dst)
>> +{
>> +    struct tnl_arp_entry *arp;
>> +    uint32_t hash;
>> +
>> +    hash = tnl_arp_table_hash(dst);
>> +    HMAP_FOR_EACH_WITH_HASH (arp, hmap_node, hash, &table) {
>> +        if (arp->ip == dst && !strncmp(arp->br_name, br_name, IFNAMSIZ)) {
>> +            return arp;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>> +int
>> +tnl_arp_lookup(const char br_name[IFNAMSIZ], ovs_be32 dst, uint8_t 
>> mac[ETH_ADDR_LEN])
>> +{
>> +    struct tnl_arp_entry *arp;
>> +    int res = -ENOENT;
>> +
>> +    ovs_rwlock_rdlock(&rwlock);
>> +    VLOG_ERR("pbs: %s port %s\n",__func__,br_name);
>
> Logging should be conditional or removed.
>
removed it.

>> +    arp = __tnl_arp_lookup(br_name, dst);
>> +    if (arp) {
>> +            memcpy(mac, arp->mac, ETH_ADDR_LEN);
>> +            res = 0;
>> +    }
>> +    ovs_rwlock_unlock(&rwlock);
>> +
>> +    return res;
>> +}
>> +
>> +static void
>> +tnl_arp_delete(struct tnl_arp_entry *arp)
>> +{
>> +    hmap_remove(&table, &arp->hmap_node);
>> +    free(arp);
>> +}
>> +
>> +int
>> +tnl_arp_snoop(const struct flow *flow, struct flow_wildcards *wc,
>> +          const char name[IFNAMSIZ])
>> +{
>> +    struct tnl_arp_entry *arp;
>> +    uint32_t hash;
>> +
>> +    if (flow->dl_type != htons(ETH_TYPE_ARP)) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    memset(&wc->masks.dl_type, 0xff, sizeof wc->masks.dl_type);
>> +    memset(&wc->masks.nw_proto, 0xff, sizeof wc->masks.nw_proto);
>> +    memset(&wc->masks.nw_src, 0xff, sizeof wc->masks.nw_src);
>> +    memset(&wc->masks.nw_dst, 0xff, sizeof wc->masks.nw_dst);
>> +    memset(&wc->masks.arp_sha, 0xff, sizeof wc->masks.arp_sha);
>> +    memset(&wc->masks.arp_tha, 0xff, sizeof wc->masks.arp_tha);
>> +
>> +    ovs_rwlock_wrlock(&rwlock);
>> +    arp = __tnl_arp_lookup(name, flow->nw_src);
>> +    if (arp) {
>> +        if (!memcmp(arp->mac, flow->arp_sha, ETH_ADDR_LEN)) {
>> +            arp->expires = time_now() + ARP_ENTRY_DEFAULT_IDLE_TIME;
>> +            ovs_rwlock_unlock(&rwlock);
>> +            return 0;
>> +        }
>> +        tnl_arp_delete(arp);
>> +    }
>> +
>> +    arp = xmalloc(sizeof *arp);
>> +
>> +    arp->ip = flow->nw_src;
>> +    memcpy(arp->mac, flow->arp_sha, ETH_ADDR_LEN);
>> +    arp->expires = time_now() + ARP_ENTRY_DEFAULT_IDLE_TIME;
>> +    strncpy(arp->br_name, name, IFNAMSIZ);
>> +    hash = tnl_arp_table_hash(arp->ip);
>> +    hmap_insert(&table, &arp->hmap_node, hash);
>> +    ovs_rwlock_unlock(&rwlock);
>> +    return 0;
>> +}
>> +
>> +void
>> +tnl_arp_cache_run(void)
>> +{
>> +    struct tnl_arp_entry *arp;
>> +
>> +    ovs_rwlock_wrlock(&rwlock);
>> +    HMAP_FOR_EACH(arp, hmap_node, &table) {
>> +        if (arp->expires <= time_now()) {
>> +             tnl_arp_delete(arp);
>> +        }
>> +    }
>> +    ovs_rwlock_unlock(&rwlock);
>> +}
>> +
>> +static void
>> +tnl_arp_cache_flush(struct unixctl_conn *conn OVS_UNUSED, int argc 
>> OVS_UNUSED,
>> +                const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
>> +{
>> +    struct tnl_arp_entry *arp, *temp;
>> +
>> +    ovs_rwlock_wrlock(&rwlock);
>> +    HMAP_FOR_EACH_SAFE(arp, temp, hmap_node, &table) {
>> +          tnl_arp_delete(arp);
>> +    }
>> +    ovs_rwlock_unlock(&rwlock);
>> +    unixctl_command_reply(conn, "OK");
>> +}
>> +
>> +#define MAX_IP_ADDR_LEN 16
>> +
>> +static void
>> +tnl_arp_cache_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
>> +               const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
>> +{
>> +    struct ds ds = DS_EMPTY_INITIALIZER;
>> +    struct tnl_arp_entry *arp;
>> +
>> +    ds_put_cstr(&ds, "IP               MAC                 Bridge\n");
>> +    ds_put_cstr(&ds, "=============================================\n");
>> +    ovs_rwlock_rdlock(&rwlock);
>> +    HMAP_FOR_EACH(arp, hmap_node, &table) {
>> +        int start_len, need_ws;
>> +
>> +        start_len = ds.length;
>> +        ds_put_format(&ds, IP_FMT, IP_ARGS(arp->ip));
>> +
>> +        need_ws = MAX_IP_ADDR_LEN - (ds.length - start_len);
>> +
>> +        while (need_ws >= 0) {
>> +            ds_put_format(&ds, " ");
>> +            need_ws--;
>> +        }
>> +
>> +        ds_put_format(&ds, ETH_ADDR_FMT"   %s\n",
>> +                      ETH_ADDR_ARGS(arp->mac), arp->br_name);
>> +
>> +    }
>> +    ovs_rwlock_unlock(&rwlock);
>> +    unixctl_command_reply(conn, ds_cstr(&ds));
>> +    ds_destroy(&ds);
>> +}
>> +
>> +void
>> +tnl_arp_cache_init(void)
>> +{
>> +    /* XXX: Add documentation for these commands. */
>> +    unixctl_command_register("tnl/arp/show", "", 0, 0, tnl_arp_cache_show, 
>> NULL);
>> +    unixctl_command_register("tnl/arp/flush", "", 0, 0, 
>> tnl_arp_cache_flush, NULL);
>> +    secret = random_uint32();
>> +}
>> diff --git a/lib/tnl-arp-cache.h b/lib/tnl-arp-cache.h
>> new file mode 100644
>> index 0000000..f68c3b4
>> --- /dev/null
>> +++ b/lib/tnl-arp-cache.h
>> @@ -0,0 +1,47 @@
>> +/*
>> + * Copyright (c) 2014 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef OVS_TNL_ARP_CACHE_H
>> +#define OVS_TNL_ARP_CACHE_H 1
>> +
>> +#include <errno.h>
>> +#include <inttypes.h>
>> +#include <stddef.h>
>> +#include <stdint.h>
>> +#include <string.h>
>> +#include <net/if.h>
>> +#include <sys/socket.h>
>> +
>> +#include "flow.h"
>> +#include "netdev.h"
>> +#include "packets.h"
>> +#include "util.h"
>> +
>> +#ifdef  __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +int tnl_arp_snoop(const struct flow *flow, struct flow_wildcards *wc,
>> +                  const char dev_name[]);
>> +int tnl_arp_lookup(const char dev_name[], ovs_be32 dst, uint8_t 
>> mac[ETH_ADDR_LEN]);
>> +void tnl_arp_cache_init(void);
>> +void tnl_arp_cache_run(void);
>> +
>> +#ifdef  __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c
>> new file mode 100644
>> index 0000000..4dbda7d
>> --- /dev/null
>> +++ b/lib/tnl-ports.c
>> @@ -0,0 +1,203 @@
>> +/*
>> + * Copyright (c) 2014 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#include <config.h>
>> +#include <stddef.h>
>> +#include <stdint.h>
>> +
>> +#include "bitmap.h"
>> +#include "classifier.h"
>> +#include "coverage.h"
>> +#include "dpif-netdev.h"
>> +#include "fat-rwlock.h"
>> +#include "dynamic-string.h"
>> +#include "errno.h"
>> +#include "flow.h"
>> +#include "hash.h"
>> +#include "list.h"
>> +#include "netdev.h"
>> +#include "ovs-thread.h"
>> +#include "odp-util.h"
>> +#include "packet-dpif.h"
>> +#include "packets.h"
>> +#include "poll-loop.h"
>> +#include "timeval.h"
>> +#include "tnl-arp-cache.h"
>> +#include "tnl-ports.h"
>> +#include "tnl-ports.h"
>> +#include "unaligned.h"
>> +#include "unixctl.h"
>> +#include "util.h"
>> +
>> +static struct classifier cls;
>> +static struct ovs_mutex tunnel_mutex = OVS_MUTEX_INITIALIZER;
>> +
>> +struct tunnel_ports {
>> +    struct cls_rule cr;
>> +    odp_port_t portno;
>> +    struct ovs_refcount ref_cnt;
>> +    char dev_name[IFNAMSIZ];
>> +};
>
> Why plural name for the struct?
>
fixed.

>> +
>> +static struct tunnel_ports *
>> +tnl_port_cast(const struct cls_rule *cr)
>> +{
>
> if (offsetof(struct tunnel_ports, cr) == 0) {
>   return (struct tunnel_ports *)cr;
> }
>
>> +    return cr ? CONTAINER_OF(cr, struct tunnel_ports, cr) : NULL;
>> +}
>> +
>> +void
>> +tnl_port_insert_udp(odp_port_t port, ovs_be32 ip_dst, ovs_be16 udp_port,
>> +                    const char dev_name[])
>> +{
>> +    const struct cls_rule *cr;
>> +    struct tunnel_ports *p;
>> +    struct flow_wildcards wc;
>> +    struct match match;
>> +    struct flow flow;
>> +
>> +    memset(&flow, 0, sizeof flow);
>> +
>> +    flow.dl_type = htons(ETH_TYPE_IP);
>> +    flow.nw_proto = IPPROTO_UDP;
>> +    flow.tp_dst = udp_port;
>> +    flow.nw_src = ip_dst;
>> +
>
> Please document the weirdness of setting flow.nw_src to ip_dst.
>
ok.

>> +    cr = classifier_lookup(&cls, &flow, NULL);
>> +    if (cr) {
>> +        struct tunnel_ports *p;
>> +
>> +        p = tnl_port_cast(cr);
>> +
>> +        ovs_refcount_ref(&p->ref_cnt);
>> +        return;
>> +    }
>
>
> You should loop with ovs_refcount_try_ref_rcu() instead, if it is possible 
> that a parallel thread is removing the same entry at the same time, like this:
>
ok.

>     do {
>         cr = classifier_lookup(&cls, &flow, NULL);
>
>         p = tnl_port_cast(cr);
>
>         /* Try again if the rule was released before we get the reference. */
>     } while (p && !ovs_refcount_try_ref_rcu(&p->ref_cnt));
>
>     if (p) {
>        return;
>     }
>
> Otherwise you could end up calling ovs_refcount_ref() when the count has 
> already reached zero, which will assert fail.
>
ok.

>> +
>> +    p = xzalloc(sizeof *p);
>> +    p->portno = port;
>> +
>> +    memset(&wc, 0, sizeof(wc));
>> +    wc.masks.dl_type = 0xffff;
>
> need htons() here.
>
>> +    wc.masks.nw_proto = 0xff;
>> +    wc.masks.nw_frag = 0xff;
>> +    wc.masks.tp_dst = 0xffff;
>
> and here.
>
>> +    wc.masks.nw_src = 0xffffffff;
>
> and htonl() here.
>
>> +
>> +    /* Need to check IP address. */
>> +    match_init(&match, &flow, &wc);
>
> It would be a bit more efficient to use 'match.flow' and 'match.wc' directly 
> instead of ‘flow' and ‘wc’, and then calling match_init().
>
>> +    cls_rule_init(&p->cr, &match, 1);
>> +    ovs_refcount_init(&p->ref_cnt);
>> +    strncpy(p->dev_name, dev_name, IFNAMSIZ);
>> +
>> +    ovs_mutex_lock(&tunnel_mutex);
>> +    classifier_insert(&cls, &p->cr);
>> +    ovs_mutex_unlock(&tunnel_mutex);
>
> Classifier has it’s own internal mutex that serializes mutations, so the 
> locking here is not necessary.
>
ok.

>> +}
>> +
>> +static void
>> +tnl_port_unref(struct cls_rule *cr)
>> +{
>> +    if (cr) {
>> +        struct tunnel_ports *p;
>> +
>> +        p = tnl_port_cast(cr);
>> +        if (ovs_refcount_unref_relaxed(&p->ref_cnt) == 1) {
>> +           classifier_remove(&cls, cr);
>> +        }
>> +    }
>> +}
>> +
>> +void
>> +tnl_port_delete(ovs_be32 ip_dst, ovs_be16 udp_port)
>> +{
>> +    struct cls_rule *cr;
>> +    struct flow flow;
>> +
>> +    memset(&flow, 0, sizeof flow);
>> +    flow.dl_type = htons(ETH_TYPE_IP);
>> +    flow.nw_proto = IPPROTO_UDP;
>> +    flow.tp_dst = udp_port;
>> +    flow.nw_src = ip_dst;
>> +
>> +    ovs_mutex_lock(&tunnel_mutex);
>> +    cr = classifier_lookup(&cls, &flow, NULL);
>> +    tnl_port_unref(cr);
>> +    ovs_mutex_unlock(&tunnel_mutex);
>
> The refcount already protects against multiple deletes, so the mutex 
> protection here is not necessary,
>
ok.

>> +}
>> +
>> +odp_port_t
>> +tnl_port_lookup(const struct flow *flow, struct flow_wildcards *wc)
>> +{
>> +    const struct cls_rule *cr;
>> +    const struct tunnel_ports *p;
>> +
>> +    /* Un wildcard protocol and stuff */
>> +    cr = classifier_lookup(&cls, flow, wc);
>> +    if (!cr) {
>> +        return ODPP_NONE;
>> +    }
>> +    p = tnl_port_cast(cr);
>> +    return p->portno;
>> +}
>> +
>> +static void
>> +tnl_port_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
>> +              const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
>> +{
>> +    struct ds ds = DS_EMPTY_INITIALIZER;
>> +    const struct tunnel_ports *p;
>> +
>> +    ovs_mutex_lock(&tunnel_mutex);
>
> Classifier internally prevents mutations during iteration, so the mutex here 
> is not necessary.
>
ok.

>> +    CLS_FOR_EACH(p, cr, &cls) {
>> +        struct odputil_keybuf keybuf;
>> +        struct odputil_keybuf maskbuf;
>> +        struct flow flow;
>> +        const struct nlattr *key, *mask;
>> +        size_t key_len, mask_len;
>> +        struct flow_wildcards wc;
>> +        struct ofpbuf buf;
>> +
>> +        ds_put_format(&ds, "%s: ", p->dev_name);
>> +        minimask_expand(&p->cr.match.mask, &wc);
>> +        miniflow_expand(&p->cr.match.flow, &flow);
>> +
>> +        /* Key. */
>> +        ofpbuf_use_stack(&buf, &keybuf, sizeof keybuf);
>> +        odp_flow_key_from_flow(&buf, &flow, &wc.masks,
>> +                               flow.in_port.odp_port, true);
>> +        key = ofpbuf_data(&buf);
>> +        key_len = ofpbuf_size(&buf);
>> +        /* mask*/
>> +        ofpbuf_use_stack(&buf, &maskbuf, sizeof maskbuf);
>> +        odp_flow_key_from_mask(&buf, &wc.masks, &flow,
>> +                               odp_to_u32(wc.masks.in_port.odp_port),
>> +                               SIZE_MAX, false);
>> +        mask = ofpbuf_data(&buf);
>> +        mask_len = ofpbuf_size(&buf);
>> +
>> +        /* build string. */
>> +        odp_flow_format(key, key_len, mask, mask_len, NULL, &ds, false);
>> +    }
>> +    ovs_mutex_unlock(&tunnel_mutex);
>> +    unixctl_command_reply(conn, ds_cstr(&ds));
>> +    ds_destroy(&ds);
>> +}
>> +
>> +void
>> +tnl_port_init(void)
>> +{
>> +    classifier_init(&cls, flow_segment_u32s);
>> +    unixctl_command_register("tnl/ports/show", "", 0, 0, tnl_port_show, 
>> NULL);
>> +}
>> diff --git a/lib/tnl-ports.h b/lib/tnl-ports.h
>> new file mode 100644
>> index 0000000..73355e3
>> --- /dev/null
>> +++ b/lib/tnl-ports.h
>> @@ -0,0 +1,49 @@
>> +/*
>> + * Copyright (c) 2014 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef OVS_TNL_PORT_H
>> +#define OVS_TNL_PORT_H 1
>> +
>> +#include <errno.h>
>> +#include <stddef.h>
>> +#include <stdint.h>
>> +#include <inttypes.h>
>> +#include <net/if.h>
>> +#include <sys/socket.h>
>> +
>> +#include "flow.h"
>> +#include "packets.h"
>> +#include "util.h"
>> +
>> +#ifdef  __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +odp_port_t tnl_port_lookup(const struct flow *flow,
>> +                           struct flow_wildcards *wc);
>> +
>> +void tnl_port_insert_udp(odp_port_t port, ovs_be32 ip_dst, ovs_be16 
>> udp_port,
>> +                         const char dev_name[]);
>> +
>> +void tnl_port_delete(ovs_be32 ip_dst OVS_UNUSED, ovs_be16 udp_port 
>> OVS_UNUSED);
>> +
>> +void tnl_port_init(void);
>> +
>> +#ifdef  __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/lib/tnl-router.c b/lib/tnl-router.c
>> new file mode 100644
>> index 0000000..4397509
>> --- /dev/null
>> +++ b/lib/tnl-router.c
>> @@ -0,0 +1,237 @@
>> +/*
>> + * Copyright (c) 2014 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#include <config.h>
>> +#include <arpa/inet.h>
>> +#include <errno.h>
>> +#include <inttypes.h>
>> +#include <sys/socket.h>
>> +#include <net/if.h>
>> +#include <netinet/in.h>
>> +#include <stdarg.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <unistd.h>
>> +
>> +#include "classifier.h"
>> +#include "command-line.h"
>> +#include "compiler.h"
>> +#include "dirs.h"
>> +#include "dpif.h"
>> +#include "dynamic-string.h"
>> +#include "fat-rwlock.h"
>> +#include "flow.h"
>> +#include "match.h"
>> +#include "netdev.h"
>> +#include "netlink.h"
>> +#include "odp-util.h"
>> +#include "ofp-parse.h”
>
> Is this (and may other) includes necessary? Seems like a long list of 
> includes for a simple source file!
>
ok, fixed.

>> +#include "ofpbuf.h"
>> +#include "ovs-thread.h"
>> +#include "packets.h"
>> +#include "shash.h"
>> +#include "simap.h"
>> +#include "smap.h"
>> +#include "sset.h"
>> +#include "timeval.h"
>> +#include "tnl-router.h"
>> +#include "tnl-ports.h"
>> +#include "unixctl.h"
>> +#include "util.h"
>> +#include "list.h"
>> +#include "util.h"
>> +
>> +static struct classifier cls;
>> +static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
>> +static int version, c_ver;
>> +
>> +struct tnl_route_entry {
>> +    struct cls_rule cr;
>> +    char output_bridge[IFNAMSIZ];
>> +    ovs_be32 gw;
>> +    ovs_be32 nw_addr;
>> +    ovs_be32 mask;
>> +};
>> +
>> +bool
>> +tnl_route_reconfigured(void)
>> +{
>> +    if (version == c_ver) {
>> +        return false;
>> +    }
>> +    /* version is not commited, send refconfigure event. */
>> +    c_ver++;
>> +    return true;
>> +}
>> +
>> +static struct tnl_route_entry *
>> +tnl_route_entry_cast(const struct cls_rule *cr)
>> +{
>> +    return cr ? CONTAINER_OF(cr, struct tnl_route_entry, cr) : NULL;
>> +}
>> +
>> +int
>> +tnl_route_lookup(ovs_be32 ip_dst, char output_bridge[], ovs_be32 *gw)
>> +{
>> +    const struct cls_rule *cr;
>> +    struct flow flow;
>> +
>> +    memset(&flow, 0, sizeof(flow));
>> +    flow.nw_dst = ip_dst;
>> +
>> +    cr = classifier_lookup(&cls, &flow, NULL);
>> +    if (cr) {
>> +        struct tnl_route_entry *p;
>> +
>> +        p = tnl_route_entry_cast(cr);
>> +        strncpy(output_bridge, p->output_bridge, IFNAMSIZ);
>> +        *gw = p->gw;
>> +        return 0;
>> +    }
>> +    return -ENOENT;
>> +}
>> +
>> +bool
>> +tnl_route_check_dev(const char output_bridge[])
>> +{
>> +    struct tnl_route_entry *rt;
>> +    bool res = false;
>> +
>> +    ovs_mutex_lock(&mutex);
>> +
>
> Locking here is not necessary.
>
ok, you patch is fixing it.
>> +    CLS_FOR_EACH(rt, cr, &cls) {
>> +        if (!strncmp(output_bridge, rt->output_bridge, IFNAMSIZ)) {
>> +            res = true;
>> +        }
>> +    }
>> +    ovs_mutex_unlock(&mutex);
>> +    return res;
>> +}
>> +
>> +static void
>> +rt_entry_insert(ovs_be32 ip_dst, ovs_be32 mask,
>> +                const char output_bridge[], ovs_be32 gw)
>> +{
>> +    const struct cls_rule *cr;
>> +    struct tnl_route_entry *p;
>> +    struct flow_wildcards wc;
>> +    struct flow flow;
>> +    struct match match;
>> +
>> +    memset(&flow, 0, sizeof(flow));
>> +    flow.nw_dst = ip_dst;
>> +
>> +    cr = classifier_lookup(&cls, &flow, NULL);
>> +    if (cr) {
>> +        return;
>> +    }
>> +
>> +    p = xzalloc(sizeof *p);
>> +    strncpy(p->output_bridge, output_bridge, IFNAMSIZ);
>> +    p->gw = gw;
>> +    p->nw_addr = ip_dst;
>> +    p->mask = mask;
>> +
>> +    memset(&wc, 0, sizeof(wc));
>> +    wc.masks.nw_dst = mask;
>> +    /* Need to check IP address. */
>> +    match_init(&match, &flow, &wc);
>> +    cls_rule_init(&p->cr, &match, 1);
>> +
>> +    ovs_mutex_lock(&mutex);
>> +    classifier_insert(&cls, &p->cr);
>> +    version++;
>> +    ovs_mutex_unlock(&mutex);
>
> if version was an atomic variable, then the locking would not be necessary.
>
I removed this part from the patch.

>> +}
>> +
>> +static void
>> +tnl_route_add(struct unixctl_conn *conn, int argc,
>> +              const char *argv[], void *aux OVS_UNUSED)
>> +{
>> +    ovs_be32 ip, mask, gw;
>> +
>> +    inet_aton(argv[1], (struct in_addr *)&ip);
>> +    inet_aton(argv[2], (struct in_addr *)&mask);
>> +
>> +    if (argc == 5) {
>> +        inet_aton(argv[4], (struct in_addr *)&gw);
>> +    } else {
>> +        gw = 0;
>> +    }
>> +
>> +    rt_entry_insert(ip, mask, argv[3], gw);
>> +    unixctl_command_reply(conn, "OK");
>> +}
>> +
>> +static void
>> +tnl_route_del(struct unixctl_conn *conn, int argc OVS_UNUSED,
>> +              const char *argv[], void *aux OVS_UNUSED)
>> +{
>> +    struct cls_rule *cr;
>> +    struct flow flow;
>> +    ovs_be32 ip, mask;
>> +
>> +    inet_aton(argv[1], (struct in_addr *)&ip);
>> +    inet_aton(argv[2], (struct in_addr *)&mask);
>> +
>> +    ovs_mutex_lock(&mutex);
>> +    memset(&flow, 0, sizeof(flow));
>> +    flow.nw_dst = ip & mask;
>> +
>> +    cr = classifier_lookup(&cls, &flow, NULL);
>> +    if (cr) {
>> +        classifier_remove(&cls, cr);
>> +        version++;
>> +        unixctl_command_reply(conn, "OK");
>> +    } else {
>> +        unixctl_command_reply(conn, "Not found");
>> +    }
>> +
>> +    ovs_mutex_unlock(&mutex);
>
> This is a bit of a bummer, but locking here is necessary to make sure that 
> the ‘cr’ still exists in the classifier when classifier_remove is called. We 
> should add a new function to take care of this (like 
> classifier_remove_flow(&cls, &flow, priority) finds the exact matching rule 
> and removes it if found.)
>
ok.

>
>> +}
>> +
>> +static void
>> +tnl_route_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
>> +               const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
>> +{
>> +    struct tnl_route_entry *rt;
>> +    struct ds ds = DS_EMPTY_INITIALIZER;
>> +
>> +    ovs_mutex_lock(&mutex);
>> +
>
> Locking is implicitly done for the duration of the iteration.
>
ok.

>> +    CLS_FOR_EACH(rt, cr, &cls) {
>> +        ds_put_format(&ds, "IP "IP_FMT" Mask "IP_FMT" dev %s",
>> +                      IP_ARGS(rt->nw_addr), IP_ARGS(rt->mask), 
>> rt->output_bridge);
>> +        if (rt->gw) {
>> +         ds_put_format(&ds, " GW "IP_FMT, IP_ARGS(rt->gw));
>> +        }
>> +     ds_put_format(&ds, "\n");
>> +    }
>> +    unixctl_command_reply(conn, ds_cstr(&ds));
>> +    ds_destroy(&ds);
>> +
>> +    ovs_mutex_unlock(&mutex);
>> +}
>> +
>> +void
>> +tnl_route_unixctl_register(void)
>> +{
>> +    /* XXX: Add documentation for these commands. */
>> +    unixctl_command_register("tnl/route/add", "ip mask dev gw", 3, 4, 
>> tnl_route_add, NULL);
>> +    unixctl_command_register("tnl/route/show", "", 0, 0, tnl_route_show, 
>> NULL);
>> +    unixctl_command_register("tnl/route/del", "ip mask", 2, 2, 
>> tnl_route_del, NULL);
>> +    classifier_init(&cls, NULL);
>
> Just to be on the safe side, I’d init classifier first, then register the 
> commands.
>
>> +}
>> diff --git a/lib/tnl-router.h b/lib/tnl-router.h
>> new file mode 100644
>> index 0000000..f780121
>> --- /dev/null
>> +++ b/lib/tnl-router.h
>> @@ -0,0 +1,42 @@
>> +/*
>> + * Copyright (c) 2014 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef OVS_TNL_ROUTER_H
>> +#define OVS_TNL_ROUTER_H 1
>> +
>> +#include <stddef.h>
>> +#include <stdint.h>
>> +#include <net/if.h>
>> +
>> +#include "packets.h"
>> +#include "timeval.h"
>> +#include "unixctl.h"
>> +#include "util.h"
>> +
>> +#ifdef  __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +int tnl_route_lookup(ovs_be32 ip_dst, char out_dev[], ovs_be32 *gw);
>> +void tnl_route_unixctl_register(void);
>> +bool tnl_route_reconfigured(void);
>> +bool tnl_route_check_dev(const char out_dev[]);
>> +
>> +#ifdef  __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/lib/vlog.c b/lib/vlog.c
>> index 42e4869..69b40b1 100644
>> --- a/lib/vlog.c
>> +++ b/lib/vlog.c
>> @@ -935,6 +935,7 @@ vlog(const struct vlog_module *module, enum vlog_level 
>> level,
>>     va_start(args, message);
>>     vlog_valist(module, level, message, args);
>>     va_end(args);
>> +    fflush(NULL);
>
> Have you considered the cost of this?
>

this was for debugging, I removed it.

>> }
>>
>> /* Logs 'message' to 'module' at maximum verbosity, then exits with a failure
>> diff --git a/lib/vxlan.h b/lib/vxlan.h
>> new file mode 100644
>> index 0000000..0ae5a84
>> --- /dev/null
>> +++ b/lib/vxlan.h
>> @@ -0,0 +1,46 @@
>> +/*
>> + * Copyright (c) 2014 Nicira, Inc.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at:
>> + *
>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef NETDEV_VXLAN_H
>> +#define NETDEV_VXLAN_H 1
>> +
>> +#include <stddef.h>
>> +#include <stdint.h>
>> +#include "list.h"
>> +#include "packets.h"
>> +#include "util.h"
>> +#include "netdev-dpdk.h"
>> +
>> +#ifdef  __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +#define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
>> +
>> +#define VXLAN_FLAGS 0x08000000  /* struct vxlanhdr.vx_flags required value. 
>> */
>> +
>> +/* VXLAN protocol header */
>> +OVS_PACKED(
>> +struct vxlanhdr {
>> +        ovs_be32 vx_flags;
>> +        ovs_be32 vx_vni;
>> +});
>
> Does this really need to be packed?
>
I got compiler warnings, I will try to remove them.

>> +
>> +#ifdef  __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index 880824d..9d41c30 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -17,7 +17,12 @@
>> #include "ofproto/ofproto-dpif-xlate.h"
>>
>> #include <errno.h>
>> +#include <arpa/inet.h>
>> +#include <net/if.h>
>> +#include <sys/socket.h>
>> +#include <netinet/in.h>
>>
>> +#include "tnl-arp-cache.h"
>> #include "bfd.h"
>> #include "bitmap.h"
>> #include "bond.h"
>> @@ -48,7 +53,9 @@
>> #include "ofproto/ofproto-dpif.h"
>> #include "ofproto/ofproto-provider.h"
>> #include "packet-dpif.h"
>> +#include "tnl-router.h"
>> #include "tunnel.h"
>> +#include "tnl-ports.h"
>> #include "vlog.h"
>>
>> COVERAGE_DEFINE(xlate_actions);
>> @@ -112,6 +119,11 @@ struct xbridge {
>>     /* True if the datapath supports masked data in OVS_ACTION_ATTR_SET
>>      * actions. */
>>     bool masked_set_action;
>> +
>> +    /* True if tunnel route configure on this bridge. */\
>
> “is configured”
>
ok.

>> +    bool has_tunnel_port;
>> +
>> +    bool enable_userspace_tunneling;
>> };
>>
>> struct xbundle {
>> @@ -374,7 +386,8 @@ static void xlate_xbridge_set(struct xbridge *, struct 
>> dpif *,
>>                               bool enable_recirc,
>>                               bool variable_length_userdata,
>>                               size_t max_mpls_depth,
>> -                              bool masked_set_action);
>> +                              bool masked_set_action,
>> +                              bool enable_userspace_tunneling);
>> static void xlate_xbundle_set(struct xbundle *xbundle,
>>                               enum port_vlan_mode vlan_mode, int vlan,
>>                               unsigned long *trunks, bool use_priority_tags,
>> @@ -440,7 +453,8 @@ xlate_xbridge_set(struct xbridge *xbridge,
>>                   bool enable_recirc,
>>                   bool variable_length_userdata,
>>                   size_t max_mpls_depth,
>> -                  bool masked_set_action)
>> +                  bool masked_set_action,
>> +                  bool enable_userspace_tunneling)
>> {
>>     if (xbridge->ml != ml) {
>>         mac_learning_unref(xbridge->ml);
>> @@ -492,6 +506,9 @@ xlate_xbridge_set(struct xbridge *xbridge,
>>     xbridge->variable_length_userdata = variable_length_userdata;
>>     xbridge->max_mpls_depth = max_mpls_depth;
>>     xbridge->masked_set_action = masked_set_action;
>> +    xbridge->has_tunnel_port = enable_userspace_tunneling &&
>> +                               tnl_route_check_dev(xbridge->name);
>> +    xbridge->enable_userspace_tunneling = enable_userspace_tunneling;
>> }
>>
>> static void
>> @@ -574,7 +591,8 @@ xlate_xbridge_copy(struct xbridge *xbridge)
>>                       xbridge->frag, xbridge->forward_bpdu,
>>                       xbridge->has_in_band, xbridge->enable_recirc,
>>                       xbridge->variable_length_userdata,
>> -                      xbridge->max_mpls_depth, xbridge->masked_set_action);
>> +                      xbridge->max_mpls_depth, xbridge->masked_set_action,
>> +                      xbridge->enable_userspace_tunneling);
>>     LIST_FOR_EACH (xbundle, list_node, &xbridge->xbundles) {
>>         xlate_xbundle_copy(new_xbridge, xbundle);
>>     }
>> @@ -728,7 +746,7 @@ xlate_ofproto_set(struct ofproto_dpif *ofproto, const 
>> char *name,
>>                   const struct netflow *netflow, enum ofp_config_flags frag,
>>                   bool forward_bpdu, bool has_in_band, bool enable_recirc,
>>                   bool variable_length_userdata, size_t max_mpls_depth,
>> -                  bool masked_set_action)
>> +                  bool masked_set_action, bool enable_userspace_tunneling)
>> {
>>     struct xbridge *xbridge;
>>
>> @@ -749,7 +767,7 @@ xlate_ofproto_set(struct ofproto_dpif *ofproto, const 
>> char *name,
>>                       rstp, ms, mbridge, sflow, ipfix, netflow, frag,
>>                       forward_bpdu, has_in_band, enable_recirc,
>>                       variable_length_userdata, max_mpls_depth,
>> -                      masked_set_action);
>> +                      masked_set_action, enable_userspace_tunneling);
>> }
>>
>> static void
>> @@ -2133,6 +2151,10 @@ xlate_normal(struct xlate_ctx *ctx)
>>         return;
>>     }
>>
>> +    if (ctx->xbridge->has_tunnel_port) {
>> +        tnl_arp_snoop(flow, wc, ctx->xbridge->name);
>> +    }
>> +
>>     /* Learn source MAC. */
>>     if (ctx->xin->may_learn) {
>>         update_learning_table(ctx->xbridge, flow, wc, vlan, in_xbundle);
>> @@ -2469,6 +2491,120 @@ process_special(struct xlate_ctx *ctx, const struct 
>> flow *flow,
>>     }
>> }
>>
>> +static int
>> +tnl_route_lookup_flow(const struct flow *oflow,
>> +                      ovs_be32 *ip, struct xport **out_port)
>> +{
>> +    char out_dev[IFNAMSIZ];
>> +    struct xbridge *xbridge;
>> +    struct xlate_cfg *xcfg;
>> +    ovs_be32 gw;
>> +    int res;
>> +
>> +    res = tnl_route_lookup(oflow->tunnel.ip_dst, out_dev, &gw);
>> +    if (res) {
>> +        return res;
>> +    }
>> +
>> +    if (gw) {
>> +        *ip = gw;
>> +    } else {
>> +        *ip = oflow->tunnel.ip_dst;
>> +    }
>> +
>> +    xcfg = ovsrcu_get(struct xlate_cfg *, &xcfgp);
>> +    ovs_assert(xcfg);
>> +
>> +    HMAP_FOR_EACH (xbridge, hmap_node, &xcfg->xbridges) {
>> +        if (!strncmp(xbridge->name, out_dev, IFNAMSIZ)) {
>> +            struct xport *port;
>> +
>> +            HMAP_FOR_EACH (port, ofp_node, &xbridge->xports) {
>> +                if (!strncmp(netdev_get_name(port->netdev), out_dev, 
>> IFNAMSIZ)) {
>> +                    *out_port = port;
>> +                    return 0;
>> +                }
>> +            }
>> +        }
>> +    }
>
> Could the tnl_route_lookup return the ‘port’ directly? Would be much cheaper.
>
it does not know port number, Once we move odp port number info to
netdev layer, we can use it here. But I think it can be done later.

>> +    return -ENOENT;
>> +}
>> +
>> +static int
>> +xlate_recv_packet(struct xbridge *xbridge, struct ofpbuf *packet)
>
> Should this be called slate_flood_packet() instead? Should we make 
> xlate_send_packet() support flooding instead? E.g., NULL port -> flood?
>
ok.

>> +{
>> +    struct ofpact_output output;
>> +    struct flow flow;
>> +
>> +    ofpact_init(&output.ofpact, OFPACT_OUTPUT, sizeof output);
>> +    /* Use OFPP_NONE as the in_port to avoid special packet processing. */
>> +    flow_extract(packet, NULL, &flow);
>> +    flow.in_port.ofp_port = OFPP_NONE;
>> +    output.port = OFPP_NORMAL;
>
> Could this be OFPP_FLOOD instead?
>
ok.

>> +    output.max_len = 0;
>> +
>> +    return ofproto_dpif_execute_actions(xbridge->ofproto, &flow, NULL,
>> +                                        &output.ofpact, sizeof output,
>> +                                        packet);
>> +}
>> +
>> +
>> +static void
>> +tnl_arp_request(const struct xport *out_dev, const uint8_t 
>> eth_src[ETH_ADDR_LEN],
>> +            ovs_be32 ip_src, ovs_be32 ip_dst)
>> +{
>> +    struct xbridge *xbridge = out_dev->xbridge;
>> +    struct ofpbuf packet;
>> +
>> +    ofpbuf_init(&packet, 0);
>> +    compose_arp(&packet, eth_src, ip_src, ip_dst);
>> +
>> +    xlate_recv_packet(xbridge, &packet);
>> +    ofpbuf_uninit(&packet);
>> +}
>> +
>> +static int
>> +build_tunnel_send(const struct xlate_ctx *ctx, const struct xport *xport,
>> +                  const struct flow *flow, odp_port_t tunnel_odp_port)
>> +{
>> +    struct xport *out_dev = NULL;
>> +    struct ofpbuf tnl_header;
>> +    char header_stub[256];
>> +    ovs_be32 s_ip, d_ip = 0;
>> +    uint8_t smac[ETH_ADDR_LEN];
>> +    bool send_arp;
>> +    int res;
>> +
>> +    res = tnl_route_lookup_flow(flow, &d_ip, &out_dev);
>> +    if (res) {
>> +        return res;
>> +    }
>> +
>> +    /* Use mac addr of bridge port of the peer. */
>> +    res = netdev_get_etheraddr(out_dev->netdev, smac);
>> +    if (res) {
>> +        return res;
>> +    }
>> +    res = netdev_get_in4(out_dev->netdev, (struct in_addr *) &s_ip, NULL);
>> +    if (res) {
>> +        return res;
>> +    }
>> +
>> +    ofpbuf_use_stub(&tnl_header, header_stub, sizeof header_stub);
>> +    res = tnl_port_build_header(xport->ofport, d_ip, flow,
>> +                                out_dev->xbridge->name, smac, s_ip,
>> +                                &tnl_header, &send_arp);
>> +    if (res) {
>> +        if (send_arp) {
>> +            tnl_arp_request(out_dev, smac, s_ip, d_ip);
>> +        }
>> +        return res;
>> +    }
>> +    odp_put_build_tunnel_header_action(ctx->xout->odp_actions, &tnl_header,
>> +                                       tunnel_odp_port, out_dev->odp_port);
>> +    return 0;
>> +}
>> +
>> static void
>> compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
>>                         bool check_stp)
>> @@ -2613,9 +2749,15 @@ compose_output_action__(struct xlate_ctx *ctx, 
>> ofp_port_t ofp_port,
>>             entry->u.dev.tx = netdev_ref(xport->netdev);
>>         }
>>         out_port = odp_port;
>> -        commit_odp_tunnel_action(flow, &ctx->base_flow,
>> -                                 ctx->xout->odp_actions);
>> -        flow->tunnel = flow_tnl; /* Restore tunnel metadata */
>> +        if (ctx->xbridge->enable_userspace_tunneling) {
>> +            build_tunnel_send(ctx, xport, flow, odp_port);
>> +            goto out;
>> +        } else {
>> +            commit_odp_tunnel_action(flow, &ctx->base_flow,
>> +                                     ctx->xout->odp_actions);
>> +            flow->tunnel = flow_tnl; /* Restore tunnel metadata */
>> +        }
>> +
>>     } else {
>>         odp_port = xport->odp_port;
>>         out_port = odp_port;
>> @@ -2635,7 +2777,7 @@ compose_output_action__(struct xlate_ctx *ctx, 
>> ofp_port_t ofp_port,
>>     if (out_port != ODPP_NONE) {
>>         ctx->xout->slow |= commit_odp_actions(flow, &ctx->base_flow,
>>                                               ctx->xout->odp_actions,
>> -                                              &ctx->xout->wc,
>> +                                              wc,
>>                                               
>> ctx->xbridge->masked_set_action);
>>
>>         if (ctx->use_recirc) {
>> @@ -2653,6 +2795,18 @@ compose_output_action__(struct xlate_ctx *ctx, 
>> ofp_port_t ofp_port,
>>             nl_msg_put_u32(ctx->xout->odp_actions, OVS_ACTION_ATTR_RECIRC,
>>                            xr->recirc_id);
>>         } else {
>> +            /* XXX: Write better Filter for tunnel port. We can use inport
>> +             * int tunnel-port flow to avoid these checks completely. */
>> +            if (ofp_port == OFPP_LOCAL && ctx->xbridge->has_tunnel_port) {
>> +                odp_port_t odp_tnl_port;
>> +
>> +                odp_tnl_port = tnl_port_lookup(flow, wc);
>> +                if (odp_tnl_port != ODPP_NONE) {
>> +                    nl_msg_put_odp_port(ctx->xout->odp_actions, 
>> OVS_ACTION_ATTR_TUNNEL_POP,
>> +                                        odp_tnl_port);
>> +                    goto out;
>> +                }
>> +            }
>>             add_ipfix_output_action(ctx, out_port);
>>             nl_msg_put_odp_port(ctx->xout->odp_actions, 
>> OVS_ACTION_ATTR_OUTPUT,
>>                                 out_port);
>> @@ -3013,9 +3167,9 @@ execute_controller_action(struct xlate_ctx *ctx, int 
>> len,
>>                           enum ofp_packet_in_reason reason,
>>                           uint16_t controller_id)
>> {
>> +    struct pkt_metadata md = PKT_METADATA_INITIALIZER(0);
>>     struct ofproto_packet_in *pin;
>>     struct dpif_packet *packet;
>> -    struct pkt_metadata md = PKT_METADATA_INITIALIZER(0);
>>
>>     ctx->xout->slow |= SLOW_CONTROLLER;
>>     if (!ctx->xin->packet) {
>> @@ -3023,13 +3177,13 @@ execute_controller_action(struct xlate_ctx *ctx, int 
>> len,
>>     }
>>
>>     packet = dpif_packet_clone_from_ofpbuf(ctx->xin->packet);
>> +    packet->md = md;
>
> dpif_packet_clone_from_ofpbuf() already zero-initializes the md so it need 
> not be set again.
>
>>
>>     ctx->xout->slow |= commit_odp_actions(&ctx->xin->flow, &ctx->base_flow,
>>                                           ctx->xout->odp_actions,
>>                                           &ctx->xout->wc,
>>                                           ctx->xbridge->masked_set_action);
>>
>> -    packet->md = md;
>>     odp_execute_actions(NULL, &packet, 1, false,
>>                         ofpbuf_data(ctx->xout->odp_actions),
>>                         ofpbuf_size(ctx->xout->odp_actions), NULL);
>> diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h
>> index 5ef20b1..6cf80ad 100644
>> --- a/ofproto/ofproto-dpif-xlate.h
>> +++ b/ofproto/ofproto-dpif-xlate.h
>> @@ -154,7 +154,8 @@ void xlate_ofproto_set(struct ofproto_dpif *, const char 
>> *name,
>>                        bool has_in_band, bool enable_recirc,
>>                        bool variable_length_userdata,
>>                        size_t mpls_label_stack_length,
>> -                       bool masked_set_action);
>> +                       bool masked_set_action,
>> +                       bool enable_userspace_tunneling);
>> void xlate_remove_ofproto(struct ofproto_dpif *);
>>
>> void xlate_bundle_set(struct ofproto_dpif *, struct ofbundle *,
>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>> index d965d38..f22ac9c 100644
>> --- a/ofproto/ofproto-dpif.c
>> +++ b/ofproto/ofproto-dpif.c
>> @@ -59,6 +59,7 @@
>> #include "ofproto-dpif-upcall.h"
>> #include "ofproto-dpif-xlate.h"
>> #include "poll-loop.h"
>> +#include "tnl-router.h"
>> #include "seq.h"
>> #include "simap.h"
>> #include "smap.h"
>> @@ -280,6 +281,9 @@ struct dpif_backer {
>>     /* Maximum number of MPLS label stack entries that the datapath supports
>>      * in a match */
>>     size_t max_mpls_depth;
>> +
>> +    /* True if the datapath supports tnl_push and pop actions. */
>> +    bool enable_userspace_tunneling;
>> };
>>
>> /* All existing ofproto_backer instances, indexed by ofproto->up.type. */
>> @@ -581,7 +585,7 @@ type_run(const char *type)
>>
>>                 iter->odp_port = node ? u32_to_odp(node->data) : ODPP_NONE;
>>                 if (tnl_port_reconfigure(iter, iter->up.netdev,
>> -                                         iter->odp_port)) {
>> +                                         iter->odp_port, dp_port)) {
>>                     backer->need_revalidate = REV_RECONFIGURE;
>>                 }
>>             }
>> @@ -624,7 +628,8 @@ type_run(const char *type)
>>                               ofproto->backer->enable_recirc,
>>                               ofproto->backer->variable_length_userdata,
>>                               ofproto->backer->max_mpls_depth,
>> -                              ofproto->backer->masked_set_action);
>> +                              ofproto->backer->masked_set_action,
>> +                              ofproto->backer->enable_userspace_tunneling);
>>
>>             HMAP_FOR_EACH (bundle, hmap_node, &ofproto->bundles) {
>>                 xlate_bundle_set(ofproto, bundle, bundle->name,
>> @@ -949,6 +954,9 @@ open_dpif_backer(const char *type, struct dpif_backer 
>> **backerp)
>>     backer->masked_set_action = check_masked_set_action(backer);
>>     backer->rid_pool = recirc_id_pool_create();
>>
>> +    backer->enable_userspace_tunneling = dpif_is_userspace_tunneling_on() &&
>> +                                         
>> dpif_support_userspace_tunneling(backer->dpif);
>> +
>>     error = dpif_recv_set(backer->dpif, backer->recv_set_enable);
>>     if (error) {
>>         VLOG_ERR("failed to listen on datapath of type %s: %s",
>> @@ -1225,6 +1233,7 @@ construct(struct ofproto *ofproto_)
>>     guarded_list_init(&ofproto->pins);
>>
>>     ofproto_dpif_unixctl_init();
>> +    tnl_route_unixctl_register();
>>
>>     hmap_init(&ofproto->vlandev_map);
>>     hmap_init(&ofproto->realdev_vid_map);
>> @@ -1517,6 +1526,9 @@ run(struct ofproto *ofproto_)
>>         }
>>     }
>>
>> +    if (tnl_route_reconfigured()) {
>> +        ofproto->backer->need_revalidate = REV_MCAST_SNOOPING;
>> +    }
>>     return 0;
>> }
>>
>> @@ -1668,7 +1680,7 @@ port_construct(struct ofport *port_)
>>     port->odp_port = dpif_port.port_no;
>>
>>     if (netdev_get_tunnel_config(netdev)) {
>> -        tnl_port_add(port, port->up.netdev, port->odp_port);
>> +        tnl_port_add(port, port->up.netdev, port->odp_port, namebuf);
>>         port->is_tunnel = true;
>>         if (ofproto->ipfix) {
>>            dpif_ipfix_add_tunnel_port(ofproto->ipfix, port_, port->odp_port);
>> @@ -1759,24 +1771,28 @@ static void
>> port_modified(struct ofport *port_)
>> {
>>     struct ofport_dpif *port = ofport_dpif_cast(port_);
>> +    char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
>> +    struct netdev *netdev = port->up.netdev;
>>
>>     if (port->bundle && port->bundle->bond) {
>> -        bond_slave_set_netdev(port->bundle->bond, port, port->up.netdev);
>> +        bond_slave_set_netdev(port->bundle->bond, port, netdev);
>>     }
>>
>>     if (port->cfm) {
>> -        cfm_set_netdev(port->cfm, port->up.netdev);
>> +        cfm_set_netdev(port->cfm, netdev);
>>     }
>>
>>     if (port->bfd) {
>> -        bfd_set_netdev(port->bfd, port->up.netdev);
>> +        bfd_set_netdev(port->bfd, netdev);
>>     }
>>
>>     ofproto_dpif_monitor_port_update(port, port->bfd, port->cfm,
>>                                      port->up.pp.hw_addr);
>>
>> -    if (port->is_tunnel && tnl_port_reconfigure(port, port->up.netdev,
>> -                                                port->odp_port)) {
>> +    netdev_vport_get_dpif_port(netdev, namebuf, sizeof namebuf);
>> +
>> +    if (port->is_tunnel && tnl_port_reconfigure(port, netdev,
>> +                                                port->odp_port, namebuf)) {
>>         ofproto_dpif_cast(port->up.ofproto)->backer->need_revalidate =
>>             REV_RECONFIGURE;
>>     }
>> @@ -3462,8 +3478,10 @@ ofproto_dpif_execute_actions(struct ofproto_dpif 
>> *ofproto,
>>     struct xlate_in xin;
>>     ofp_port_t in_port;
>>     struct dpif_execute execute;
>> +    struct ds ds;
>>     int error;
>>
>> +    ds_init(&ds);
>>     ovs_assert((rule != NULL) != (ofpacts != NULL));
>>
>>     dpif_flow_stats_extract(flow, packet, time_msec(), &stats);
>> @@ -3481,6 +3499,10 @@ ofproto_dpif_execute_actions(struct ofproto_dpif 
>> *ofproto,
>>
>>     execute.actions = ofpbuf_data(xout.odp_actions);
>>     execute.actions_len = ofpbuf_size(xout.odp_actions);
>> +
>> +    format_odp_actions(&ds, ofpbuf_data(xout.odp_actions),
>> +                           ofpbuf_size(xout.odp_actions));
>> +
>
> These additions look like dead code now.
>
right, removed. it.

>>     execute.packet = packet;
>>     execute.md = pkt_metadata_from_flow(flow);
>>     execute.needs_help = (xout.slow & SLOW_ACTION) != 0;
>> diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
>> index 46b0719..120af46 100644
>> --- a/ofproto/tunnel.c
>> +++ b/ofproto/tunnel.c
>> @@ -17,19 +17,25 @@
>>
>> #include <errno.h>
>>
>> +#include "tnl-arp-cache.h"
>> #include "byte-order.h"
>> #include "connectivity.h"
>> +#include "dpif.h"
>> #include "dynamic-string.h"
>> #include "hash.h"
>> #include "hmap.h"
>> #include "netdev.h"
>> #include "odp-util.h"
>> +#include "ofpbuf.h"
>> #include "packets.h"
>> #include "seq.h"
>> #include "smap.h"
>> #include "socket-util.h"
>> #include "tunnel.h"
>> +#include "tnl-ports.h"
>> #include "vlog.h"
>> +#include "unaligned.h"
>> +#include "ofproto-dpif.h"
>>
>> VLOG_DEFINE_THIS_MODULE(tunnel);
>>
>> @@ -124,7 +130,7 @@ static void tnl_port_del__(const struct ofport_dpif *) 
>> OVS_REQ_WRLOCK(rwlock);
>>
>> static bool
>> tnl_port_add__(const struct ofport_dpif *ofport, const struct netdev *netdev,
>> -               odp_port_t odp_port, bool warn)
>> +               odp_port_t odp_port, bool warn, const char name[])
>>     OVS_REQ_WRLOCK(rwlock)
>> {
>>     const struct netdev_tunnel_config *cfg;
>> @@ -173,6 +179,12 @@ tnl_port_add__(const struct ofport_dpif *ofport, const 
>> struct netdev *netdev,
>>     }
>>     hmap_insert(*map, &tnl_port->match_node, tnl_hash(&tnl_port->match));
>>     tnl_port_mod_log(tnl_port, "adding");
>> +
>> +    if (dpif_is_userspace_tunneling_on()) {
>> +        /* XXX: Add generic API to handle other tunneling protocols. */
>> +        tnl_port_insert_udp(odp_port, tnl_port->match.ip_dst,
>> +                               cfg->dst_port, name);
>> +    }
>>     return true;
>> }
>>
>> @@ -181,10 +193,10 @@ tnl_port_add__(const struct ofport_dpif *ofport, const 
>> struct netdev *netdev,
>>  * tunnel. */
>> void
>> tnl_port_add(const struct ofport_dpif *ofport, const struct netdev *netdev,
>> -             odp_port_t odp_port) OVS_EXCLUDED(rwlock)
>> +             odp_port_t odp_port, const char name[]) OVS_EXCLUDED(rwlock)
>> {
>>     ovs_rwlock_wrlock(&rwlock);
>> -    tnl_port_add__(ofport, netdev, odp_port, true);
>> +    tnl_port_add__(ofport, netdev, odp_port, true, name);
>>     ovs_rwlock_unlock(&rwlock);
>> }
>>
>> @@ -194,7 +206,8 @@ tnl_port_add(const struct ofport_dpif *ofport, const 
>> struct netdev *netdev,
>>  * tnl_port_add(). */
>> bool
>> tnl_port_reconfigure(const struct ofport_dpif *ofport,
>> -                     const struct netdev *netdev, odp_port_t odp_port)
>> +                     const struct netdev *netdev, odp_port_t odp_port,
>> +                     const char name[])
>>     OVS_EXCLUDED(rwlock)
>> {
>>     struct tnl_port *tnl_port;
>> @@ -203,13 +216,13 @@ tnl_port_reconfigure(const struct ofport_dpif *ofport,
>>     ovs_rwlock_wrlock(&rwlock);
>>     tnl_port = tnl_find_ofport(ofport);
>>     if (!tnl_port) {
>> -        changed = tnl_port_add__(ofport, netdev, odp_port, false);
>> +        changed = tnl_port_add__(ofport, netdev, odp_port, false, name);
>>     } else if (tnl_port->netdev != netdev
>>                || tnl_port->match.odp_port != odp_port
>>                || tnl_port->change_seq != seq_read(connectivity_seq_get())) {
>>         VLOG_DBG("reconfiguring %s", tnl_port_get_name(tnl_port));
>>         tnl_port_del__(ofport);
>> -        tnl_port_add__(ofport, netdev, odp_port, true);
>> +        tnl_port_add__(ofport, netdev, odp_port, true, name);
>>         changed = true;
>>     }
>>     ovs_rwlock_unlock(&rwlock);
>> @@ -227,8 +240,11 @@ tnl_port_del__(const struct ofport_dpif *ofport) 
>> OVS_REQ_WRLOCK(rwlock)
>>
>>     tnl_port = tnl_find_ofport(ofport);
>>     if (tnl_port) {
>> +        const struct netdev_tunnel_config *cfg =
>> +            netdev_get_tunnel_config(tnl_port->netdev);
>>         struct hmap **map;
>>
>> +        tnl_port_delete(tnl_port->match.ip_dst, cfg->dst_port);
>>         tnl_port_mod_log(tnl_port, "removing");
>>         map = tnl_match_map(&tnl_port->match);
>>         hmap_remove(*map, &tnl_port->match_node);
>> @@ -639,3 +655,54 @@ tnl_port_get_name(const struct tnl_port *tnl_port) 
>> OVS_REQ_RDLOCK(rwlock)
>> {
>>     return netdev_get_name(tnl_port->netdev);
>> }
>> +
>> +int
>> +tnl_port_build_header(const struct ofport_dpif *ofport, ovs_be32 router_ip,
>> +                      const struct flow *tnl_flow,
>> +                      const char br_name[], uint8_t smac[ETH_ADDR_LEN],
>> +                      ovs_be32 ip_src, struct ofpbuf *header,
>> +                      bool *send_arp)
>> +{
>> +    struct tnl_port *tnl_port;
>> +    uint8_t dmac[ETH_ADDR_LEN];
>> +    struct eth_header *eth;
>> +    struct ip_header *ip;
>> +    int res;
>> +
>> +    *send_arp = false;
>> +    ovs_rwlock_rdlock(&rwlock);
>> +    tnl_port = tnl_find_ofport(ofport);
>> +    ovs_assert(tnl_port);
>> +
>> +    res = tnl_arp_lookup(br_name, router_ip, dmac);
>> +    if (res) {
>> +        *send_arp = true;
>> +        goto out;
>> +    }
>> +
>> +    /* Build Ethernet and IP headers. */
>> +    ofpbuf_clear(header);
>> +
>> +    eth = ofpbuf_put_uninit(header, sizeof *eth);
>> +    memcpy(eth->eth_dst, dmac, ETH_ADDR_LEN);
>> +    memcpy(eth->eth_src, smac, ETH_ADDR_LEN);
>> +    eth->eth_type = htons(ETH_TYPE_IP);
>> +
>> +    ip = ofpbuf_put_zeros(header, sizeof *ip);
>> +    ip->ip_ihl_ver = IP_IHL_VER(5, 4);
>> +    ip->ip_tos = tnl_flow->tunnel.ip_tos;
>> +    ip->ip_ttl = tnl_flow->tunnel.ip_ttl;
>> +    ip->ip_proto = IPPROTO_UDP;
>> +    ip->ip_frag_off = (tnl_flow->tunnel.flags & FLOW_TNL_F_DONT_FRAGMENT) ?
>> +                      htons(IP_DF) : 0;
>> +
>> +    put_16aligned_be32(&ip->ip_src, ip_src);
>> +    put_16aligned_be32(&ip->ip_dst, tnl_flow->tunnel.ip_dst);
>> +    ip->ip_csum = 0;
>> +
>> +    netdev_build_header(tnl_port->netdev, header);
>> +out:
>> +    ovs_rwlock_unlock(&rwlock);
>> +
>> +    return res;
>> +}
>> diff --git a/ofproto/tunnel.h b/ofproto/tunnel.h
>> index 27a2f7d..611c139 100644
>> --- a/ofproto/tunnel.h
>> +++ b/ofproto/tunnel.h
>> @@ -29,10 +29,10 @@ struct ofport_dpif;
>> struct netdev;
>>
>> bool tnl_port_reconfigure(const struct ofport_dpif *, const struct netdev *,
>> -                          odp_port_t);
>> +                          odp_port_t, const char name[]);
>>
>> void tnl_port_add(const struct ofport_dpif *, const struct netdev *,
>> -                  odp_port_t odp_port);
>> +                  odp_port_t odp_port, const char name[]);
>> void tnl_port_del(const struct ofport_dpif *);
>>
>> const struct ofport_dpif *tnl_port_receive(const struct flow *);
>> @@ -48,4 +48,10 @@ tnl_port_should_receive(const struct flow *flow)
>>     return flow->tunnel.ip_dst != 0;
>> }
>>
>> +int tnl_port_build_header(const struct ofport_dpif *ofport, ovs_be32 
>> router_ip,
>> +                          const struct flow *tnl_flow,
>> +                          const char br_name[], uint8_t smac[ETH_ADDR_LEN],
>> +                          ovs_be32 ip_src, struct ofpbuf *header,
>> +                          bool *send_arp);
>> +
>> #endif /* tunnel.h */
>> diff --git a/rhel/openvswitch.spec.in b/rhel/openvswitch.spec.in
>> index 56b7404..b63f00c 100644
>> --- a/rhel/openvswitch.spec.in
>> +++ b/rhel/openvswitch.spec.in
>> @@ -175,6 +175,6 @@ exit 0
>> /usr/share/openvswitch/vswitch.ovsschema
>> /usr/share/openvswitch/vtep.ovsschema
>> %doc COPYING DESIGN INSTALL.SSL NOTICE README.md WHY-OVS FAQ NEWS
>> -%doc INSTALL.DPDK rhel/README.RHEL
>> +%doc INSTALL.DPDK rhel/README.RHEL README-Userspace-Tunneling
>> /var/lib/openvswitch
>> /var/log/openvswitch
>> diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
>> index 6e4929c..56e4a91 100644
>> --- a/vswitchd/ovs-vswitchd.8.in
>> +++ b/vswitchd/ovs-vswitchd.8.in
>> @@ -82,6 +82,10 @@ Some systems do not support \fBmlockall()\fR at all, and 
>> other systems
>> only allow privileged users, such as the superuser, to use it.
>> \fBovs\-vswitchd\fR emits a log message if \fBmlockall()\fR is
>> unavailable or unsuccessful.
>> +.IP "\fB\-\-userspace-tunneling\fR"
>> +Enable userspace tunneling in \fBovs\-vswitchd\fR for userspace datapath. 
>> User
>> +need to do special configuration to make it work. For more info refer to
>> +README-Userspace-Tunneling.
>> .
>> .SS "DPDK Options"
>> .IP "\fB\-\-dpdk\fR"
>> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
>> index 3c82f0f..b90a592 100644
>> --- a/vswitchd/ovs-vswitchd.c
>> +++ b/vswitchd/ovs-vswitchd.c
>> @@ -149,6 +149,7 @@ parse_options(int argc, char *argv[], char 
>> **unixctl_pathp)
>>         OPT_DISABLE_SYSTEM,
>>         DAEMON_OPTION_ENUMS,
>>         OPT_DPDK,
>> +        OPT_ENABLE_USER_TUNNEL,
>>     };
>>     static const struct option long_options[] = {
>>         {"help",        no_argument, NULL, 'h'},
>> @@ -163,6 +164,7 @@ parse_options(int argc, char *argv[], char 
>> **unixctl_pathp)
>>         {"enable-dummy", optional_argument, NULL, OPT_ENABLE_DUMMY},
>>         {"disable-system", no_argument, NULL, OPT_DISABLE_SYSTEM},
>>         {"dpdk", required_argument, NULL, OPT_DPDK},
>> +        {"userspace-tunneling", no_argument, NULL, OPT_ENABLE_USER_TUNNEL},
>>         {NULL, 0, NULL, 0},
>>     };
>>     char *short_options = long_options_to_short_options(long_options);
>> @@ -211,6 +213,10 @@ parse_options(int argc, char *argv[], char 
>> **unixctl_pathp)
>>             dp_blacklist_provider("system");
>>             break;
>>
>> +        case OPT_ENABLE_USER_TUNNEL:
>> +            dpif_enable_userspace_tunneling();
>> +            break;
>> +
>
> IMO we should use probing instead of explicit configuration parameter.
>
removed this vswitch option.

>    Jarno
>
>
>>         case '?':
>>             exit(EXIT_FAILURE);
>>
>> --
>> 1.9.3
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> http://openvswitch.org/mailman/listinfo/dev
>
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH RFC 2/2] openvswitch: Userspace tunneling.

Reply via email to