On 2017年01月11日 11:13, Tan, Jianfeng wrote:


On 1/11/2017 10:42 AM, Jason Wang wrote:


On 2017年01月10日 14:11, Tan, Jianfeng wrote:

Hi Jason,


On 1/9/2017 12:39 PM, Jason Wang wrote:


On 2016年12月23日 15:14, Jianfeng Tan wrote:
This patch add support vhost kernel as the backend for virtio_user.
Three main hook functions are added:
   - vhost_kernel_setup() to open char device, each vq pair needs one
     vhostfd;
   - vhost_kernel_ioctl() to communicate control messages with vhost
     kernel module;
   - vhost_kernel_enable_queue_pair() to open tap device and set it
as the backend of corresonding vhost fd (that is to say, vq pair).

Signed-off-by: Jianfeng Tan <jianfeng....@intel.com>
---

[...]
+/* TUNSETIFF ifr flags */
+#define IFF_TAP          0x0002
+#define IFF_NO_PI        0x1000
+#define IFF_ONE_QUEUE    0x2000
+#define IFF_VNET_HDR     0x4000
+#define IFF_MULTI_QUEUE  0x0100
+#define IFF_ATTACH_QUEUE 0x0200
+#define IFF_DETACH_QUEUE 0x0400

Do we really want to duplicate those things which has been exposed by uapi here?

You mean those defined in <linux/if_tun.h>? Redefine those common macros, or include standard header file, with respective pros and cons. DPDK prefers the redefinition way as far as I understand, doesn't it?


Well, if you really want to do this, you may want to use an independent file. Then you can sync it with linux headers with a bash script.

Agreed!

[...]
+static struct vhost_memory_kernel *
+prepare_vhost_memory_kernel(void)
+{
+    uint32_t i, j, k = 0;
+    struct rte_memseg *seg;
+    struct vhost_memory_region *mr;
+    struct vhost_memory_kernel *vm;
+
+    vm = malloc(sizeof(struct vhost_memory_kernel) +
+            VHOST_KERNEL_MAX_REGIONS *
+            sizeof(struct vhost_memory_region));
+
+    for (i = 0; i < RTE_MAX_MEMSEG; ++i) {
+        seg = &rte_eal_get_configuration()->mem_config->memseg[i];
+        if (!seg->addr)
+            break;

If we're sure the number of regions is less than 64(or the module parameter read from /sys), can we avoid the iteration here?

The "if" statement under the "for" statement can save us from all RTE_MAX_MEMSEG iteration.

But if we know the number of regions is short than the limit, there's even no need for this?

The problem is: we don't have a variable to know how many segments are there in DPDK. Anyway, we need to report error in the situation that # of segments > # of regions. Or can you give a more specific example?

[...]

If we don't know #segments, I'm fine with current code.

+    if (dev->ifname)
+        strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
+    else
+        strncpy(ifr.ifr_name, "tap%d", IFNAMSIZ);
+    if (ioctl(tapfd, TUNSETIFF, (void *)&ifr) == -1) {
+        PMD_DRV_LOG(ERR, "TUNSETIFF failed: %s", strerror(errno));
+        goto error;
+    }

This requires CAP_NET_ADMIN, so we should really consider to accept a pre-created fd here.

It sounds like a future work for me. So far, all DPDK apps are running in privileged mode (including CAP_NET_ADMIN?).


That's not safe.

Yes.

Accepting a per-created fd can solve this, and can even support macvtap

So you are proposing a way to specify the virtio_user parameter like, --vdev=virtio_user,tapfd=fd1,xxx, right? As replied in your another comment, it's a great way to go. But I'm afraid we have no time to implement that in this cycle.

Thanks,
Jianfeng

Ok, just want to show its advantages. It can be added on top.

And two more suggestions:
- better to split tap support out of vhost file
- kernel support more than 8 queues on recent kernel, so there's no need to limit it to 8. When running on old kernel, TUNSETIFF will fail which can give user a hint to reduce #queues.

Thanks


Reply via email to