On 2017年01月11日 11:13, Tan, Jianfeng wrote:
On 1/11/2017 10:42 AM, Jason Wang wrote:
On 2017年01月10日 14:11, Tan, Jianfeng wrote:
Hi Jason,
On 1/9/2017 12:39 PM, Jason Wang wrote:
On 2016年12月23日 15:14, Jianfeng Tan wrote:
This patch add support vhost kernel as the backend for virtio_user.
Three main hook functions are added:
- vhost_kernel_setup() to open char device, each vq pair needs one
vhostfd;
- vhost_kernel_ioctl() to communicate control messages with vhost
kernel module;
- vhost_kernel_enable_queue_pair() to open tap device and set it
as the backend of corresonding vhost fd (that is to say, vq
pair).
Signed-off-by: Jianfeng Tan <jianfeng....@intel.com>
---
[...]
+/* TUNSETIFF ifr flags */
+#define IFF_TAP 0x0002
+#define IFF_NO_PI 0x1000
+#define IFF_ONE_QUEUE 0x2000
+#define IFF_VNET_HDR 0x4000
+#define IFF_MULTI_QUEUE 0x0100
+#define IFF_ATTACH_QUEUE 0x0200
+#define IFF_DETACH_QUEUE 0x0400
Do we really want to duplicate those things which has been exposed
by uapi here?
You mean those defined in <linux/if_tun.h>? Redefine those common
macros, or include standard header file, with respective pros and
cons. DPDK prefers the redefinition way as far as I understand,
doesn't it?
Well, if you really want to do this, you may want to use an
independent file. Then you can sync it with linux headers with a bash
script.
Agreed!
[...]
+static struct vhost_memory_kernel *
+prepare_vhost_memory_kernel(void)
+{
+ uint32_t i, j, k = 0;
+ struct rte_memseg *seg;
+ struct vhost_memory_region *mr;
+ struct vhost_memory_kernel *vm;
+
+ vm = malloc(sizeof(struct vhost_memory_kernel) +
+ VHOST_KERNEL_MAX_REGIONS *
+ sizeof(struct vhost_memory_region));
+
+ for (i = 0; i < RTE_MAX_MEMSEG; ++i) {
+ seg = &rte_eal_get_configuration()->mem_config->memseg[i];
+ if (!seg->addr)
+ break;
If we're sure the number of regions is less than 64(or the module
parameter read from /sys), can we avoid the iteration here?
The "if" statement under the "for" statement can save us from all
RTE_MAX_MEMSEG iteration.
But if we know the number of regions is short than the limit, there's
even no need for this?
The problem is: we don't have a variable to know how many segments are
there in DPDK. Anyway, we need to report error in the situation that #
of segments > # of regions. Or can you give a more specific example?
[...]
If we don't know #segments, I'm fine with current code.
+ if (dev->ifname)
+ strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
+ else
+ strncpy(ifr.ifr_name, "tap%d", IFNAMSIZ);
+ if (ioctl(tapfd, TUNSETIFF, (void *)&ifr) == -1) {
+ PMD_DRV_LOG(ERR, "TUNSETIFF failed: %s", strerror(errno));
+ goto error;
+ }
This requires CAP_NET_ADMIN, so we should really consider to accept
a pre-created fd here.
It sounds like a future work for me. So far, all DPDK apps are
running in privileged mode (including CAP_NET_ADMIN?).
That's not safe.
Yes.
Accepting a per-created fd can solve this, and can even support macvtap
So you are proposing a way to specify the virtio_user parameter like,
--vdev=virtio_user,tapfd=fd1,xxx, right? As replied in your another
comment, it's a great way to go. But I'm afraid we have no time to
implement that in this cycle.
Thanks,
Jianfeng
Ok, just want to show its advantages. It can be added on top.
And two more suggestions:
- better to split tap support out of vhost file
- kernel support more than 8 queues on recent kernel, so there's no need
to limit it to 8. When running on old kernel, TUNSETIFF will fail which
can give user a hint to reduce #queues.
Thanks