Hi Mark, RFC 2863 standard describes in "Interface Numbering" chapter (https://tools.ietf.org/html/rfc2863#page-10) that if there's possibility to reuse ifindex after reinitialization then it should be reused. And in our case it's definitely possible. I also agree with you that limitation to 1024 vhostuser interfaces is a major problem here and I should avoid such arbitrary values. I think the solution might be to use hashmap or OVS list implementation or maybe even dynamic allocation of new vhost ifindex entries.
I have also analyzed current vhost-cuse initialization and adding support for ifindex assignment to this type of port should be a trivial task and it could be easily introduced in V2 of this patch. Please share your thoughts. Thanks, Przemek -----Original Message----- From: Kavanagh, Mark B Sent: Wednesday, May 11, 2016 11:56 AM To: Lal, PrzemyslawX <przemyslawx....@intel.com>; dev@openvswitch.org Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user ports > >Hi Mark, > >Replies inline prefixed with [PL]. > >Thanks, >Przemek > >-----Original Message----- >From: Kavanagh, Mark B >Sent: Thursday, May 5, 2016 2:19 PM >To: Lal, PrzemyslawX <przemyslawx....@intel.com>; dev@openvswitch.org >Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user >ports > >Hi Przemek, > >Some additional comments/queries inline. > >Thanks again, >Mark > >> >>This patch adds sFlow support for DPDK vHost-user interfaces by >>assigning ifindex value. Values of ifindexes for vHost-user interfaces >>start with 2000 to avoid overlapping with kernel datapath interfaces. >> >>Patch also fixes issue with 'dpdk0' interface being ignored by sFlow >>agent, because of ifindex 0. Ifindexes values for physical DPDK >>interfaces start with 1000 to avoid overlapping with kernel datapath >>interfaces. >> >>Signed-off-by: Przemyslaw Lal <przemyslawx....@intel.com> >>--- >> lib/netdev-dpdk.c | 70 >>++++++++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 69 insertions(+), 1 deletion(-) >> >>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index >>208c5f5..1b60c5a 100644 >>--- a/lib/netdev-dpdk.c >>+++ b/lib/netdev-dpdk.c >>@@ -115,6 +115,18 @@ static char *vhost_sock_dir = NULL; /* Location of >>vhost-user sockets >>*/ >> */ >> #define VHOST_ENQ_RETRY_USECS 100 >> >>+/* For DPDK ETH interfaces use ifindex values starting with 1000 >>+ * to avoid overlapping with kernel-space interfaces. >>+ * Also starting with 0 would cause sFlow to ignore 'dpdk0' interface. >>+ */ >>+#define DPDK_PORT_ID_TO_IFINDEX(port_id) ((port_id) + 1000) >>+ >>+/* For DPDK vhost-user interfaces use ifindexes starting with 2000. >>+ */ >>+#define VHOST_ID_TO_IFINDEX(port_id) ((port_id) + 2000) >>+ >>+#define VHOST_IDS_MAX_LEN 1024 >>+ >> static const struct rte_eth_conf port_conf = { >> .rxmode = { >> .mq_mode = ETH_MQ_RX_RSS, >>@@ -149,6 +161,7 @@ enum dpdk_dev_type { static int rte_eal_init_ret >>= ENODEV; >> >> static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER; >>+static struct ovs_mutex vhost_mutex = OVS_MUTEX_INITIALIZER; >> >> /* Quality of Service */ >> >>@@ -790,6 +803,50 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev) >> return err; >> } >> >>+/* Counter for VHOST interfaces as DPDK library doesn't provide >>+mechanism >>+ * similar to rte_eth_dev_count for vhost-user sockets. >>+ */ >>+static int vhost_counter OVS_GUARDED_BY(vhost_mutex) = 0; >>+ >>+/* Array storing vhost_ids, so their ifindex can be reused after >>+socket >>+ * recreation. >>+ */ >>+static char vhost_ids[VHOST_IDS_MAX_LEN][PATH_MAX] >>+OVS_GUARDED_BY(vhost_mutex); >>+ >>+/* Simple lookup in vhost_ids array. >>+ * On success returns id of vhost_id stored in the array, >>+ * otherwise returns -1. >>+ */ >>+static int >>+netdev_dpdk_lookup_vhost_id(struct netdev_dpdk *dev) >>+OVS_REQUIRES(vhost_mutex) { >>+ for (int i = 0; i < vhost_counter; i++) { >>+ if (strncmp(vhost_ids[i], dev->vhost_id, strlen(dev->vhost_id)) == >>0) { >>+ return i; >>+ } >>+ } >>+ return -1; >>+} >>+ >>+/* Inserts vhost_id at the first free position in the vhost_ids array. >>+ */ >>+static void >>+netdev_dpdk_push_vhost_id(struct netdev_dpdk *dev) { >>+ ovs_mutex_lock(&vhost_mutex); >>+ if (netdev_dpdk_lookup_vhost_id(dev) < 0) { >>+ if (vhost_counter < VHOST_IDS_MAX_LEN) { >>+ strncpy(vhost_ids[vhost_counter++], dev->vhost_id, >>+ strlen(dev->vhost_id)); >>+ } else { >>+ VLOG_WARN("Could not assign ifindex to \"%s\" port. " >>+ "List of vhost IDs list is full.", >>+ dev->vhost_id); >>+ } >>+ } >>+ ovs_mutex_unlock(&vhost_mutex); >>+} >>+ >> static int >> netdev_dpdk_vhost_user_construct(struct netdev *netdev) { @@ -825,6 >>+882,8 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev) >> err = vhost_construct_helper(netdev); >> } >> >>+ netdev_dpdk_push_vhost_id(dev); >>+ >> ovs_mutex_unlock(&dpdk_mutex); >> return err; >> } >>@@ -1773,9 +1832,18 @@ netdev_dpdk_get_ifindex(const struct netdev >>*netdev) { >> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> int ifindex; >>+ int ret; >> >> ovs_mutex_lock(&dev->mutex); >>- ifindex = dev->port_id; >>+ if (dev->type == DPDK_DEV_ETH) { >>+ ifindex = DPDK_PORT_ID_TO_IFINDEX(dev->port_id); >>+ } else { > > >This 'else' statement is executed in both the case of vhost user >devices and also vhost cuse devices. >In the latter case, do the port numbers returned by VHOST_ID_IFINDEX >make sense? (i.e. can sFlow 'talk' to vhost-cuse ports? If not, then this code >has a bug). > >[PL] At this point for vhost cuse there's always '-1' returned from >lookup function, so in the nested if statement you'll always hit 'else' >and 0 will be returned as ifindex for vhost cuse devices. Agreed. But then all vhost-cuse devices share index of 0 - this is surely problematic? >Anyway setting ifindexes for vhost cuse devices should be a trivial >task and if vhost cuse is supported by sflow I can handle it - but then >it will require changing patch subject and message. Have you tested this patch against vhost-cuse? I'd be interested as to how it behaves. > >>+ if ((ret = netdev_dpdk_lookup_vhost_id(dev)) >= 0) { >>+ ifindex = VHOST_ID_TO_IFINDEX(ret); >>+ } else { >>+ ifindex = 0; > >I don't think that 0 is an appropriate value to return here, since it >overlaps with a kernel interface's index. > >[PL] IMHO it's ok to use 0 here - it will be assigned to ifindex only >when lookup for vhost device failed. Have you seen Ben's comment on this? http://openvswitch.org/pipermail/dev/2016-May/070626.html > >>+ } >>+ } >> ovs_mutex_unlock(&dev->mutex); >> >> return ifindex; >>-- >>2.1.0 > >[MK] There's a gap in this code - when the vhost-user device is >destroyed, its entry should be removed from 'vhost_ids'. > >[PL] Actually the whole idea behind "caching" these vhost_ids is to >have them stored to survive destroy and recreation of vhost devices, >for instance when VM is hard-rebooted, to keep the same ifindex and not >assign a new one, which could be confusing (for example one could have >vhu1 port with ifindex 2001 visible in sFlow stats, but then >hard-reboots this VM and vhu1 gets ifindex 2002 - it's the same vhost >user socket, the same VM, but somehow it gets new ifindex and gets new entry >in sflow). But if you think I can be improve this, don't hesitate to share >your ideas. I completely understand what your intention is here. There is still an inherent issue with the design though, in that once vhost ports have been added to 'vhost_ids', they are never removed. Consider the following scenario: - multiple vhost ports have been added to the bridge, such that vhost_counter = VHOST_IDS_MAX_LEN (I'm not sure what the use case for adding 1024 ports to a bridge would be, but I digress) - one or more of the previously-added ports are no longer needed, so we bring the VMs that they are attached to down - we want to add another vhost port, described by a vhost_id that is not already present in 'vhost_ids', and connect it to a VM => Now, however, we cannot add the additional port, due to the fact that vhost_counter was never decremented, such that it is still equal to VHOST_IDS_MAX_LEN: the bounds check in netdev_dpdk_push_vhost_id fails, and the user is warned that their port cannot be added. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev