Adds documentation on how to run IVSHMEM communication through a VM.

Signed-off-by: Mike A. Polehn <mike.a.pol...@intel.com>
diff --git a/INSTALL.DPDK b/INSTALL.DPDK
index 4551f4c..8d866e9 100644
--- a/INSTALL.DPDK
+++ b/INSTALL.DPDK
@@ -19,10 +19,14 @@ Recommended to use DPDK 1.6.
 DPDK:
 Set dir i.g.: export DPDK_DIR=/usr/src/dpdk-1.6.0r2
 cd $DPDK_DIR
-update config/defconfig_x86_64-default-linuxapp-gcc so that dpdk generate single lib file.
+update config/defconfig_x86_64-default-linuxapp-gcc so that DPDK generates a
+single lib file (this modification is also required for the IVSHMEM build).
 CONFIG_RTE_BUILD_COMBINE_LIBS=y
-make install T=x86_64-default-linuxapp-gcc
+For the default install without IVSHMEM:
+  make install T=x86_64-default-linuxapp-gcc
+To include IVSHMEM (shared memory):
+  make install T=x86_64-ivshmem-linuxapp-gcc
 For details refer to http://dpdk.org/

 Linux kernel:
@@ -32,7 +36,10 @@ DPDK kernel requirement.

 OVS:
 cd $(OVS_DIR)/openvswitch
 ./boot.sh
-export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
+Without IVSHMEM:
+  export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
+With IVSHMEM:
+  export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-ivshmem-linuxapp-gcc
 ./configure --with-dpdk=$DPDK_BUILD
 make
@@ -44,12 +51,18 @@ Using the DPDK with ovs-vswitchd:
 Setup system boot:
   kernel bootline, add: default_hugepagesz=1GB hugepagesz=1G hugepages=1
+To include 3 GB memory for a VM (2 socket system, half on each NUMA node):
+  kernel bootline, add: default_hugepagesz=1GB hugepagesz=1G hugepages=8

 First setup DPDK devices:
  - insert uio.ko
    e.g. modprobe uio
- - insert igb_uio.ko
+
+ - insert igb_uio.ko (non-IVSHMEM case)
    e.g. insmod DPDK/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
+ - insert igb_uio.ko (IVSHMEM case)
+   e.g. insmod DPDK/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
+
 - Bind network device to ibg_uio.
   e.g. DPDK/tools/pci_unbind.py --bind=igb_uio eth1
   Alternate binding method:
@@ -73,7 +86,7 @@ First setup DPDK devices:

 Prepare system:
  - mount hugetlbfs
-   e.g. mount -t hugetlbfs -o pagesize=1G none /mnt/huge/
+   e.g. mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

 Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
@@ -91,7 +104,7 @@ Start ovsdb-server as discussed in INSTALL doc:
     ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
         --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
         --private-key=db:Open_vSwitch,SSL,private_key \
-        --certificate=dbitch,SSL,certificate \
+        --certificate=db:Open_vSwitch,SSL,certificate \
         --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
 First time after db creation, initialize:
     cd $OVS_DIR
@@ -105,12 +118,13 @@ for dpdk initialization.

   e.g.
   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
-  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
+  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach

-If allocated more than 1 GB huge pages, set amount and use NUMA node 0 memory:
+If more than one 1 GB hugepage is allocated (as for IVSHMEM), set the amount
+and use NUMA node 0 memory:
   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
-    -- unix:$DB_SOCK --pidfile --detach
+    -- unix:$DB_SOCK --pidfile --detach

 To use ovs-vswitchd with DPDK, create a bridge with datapath_type "netdev" in
 the configuration database.  For example:
@@ -136,9 +150,7 @@ Test flow script across NICs (assuming ovs in /usr/src/ovs):

 ############################# Script:

 #! /bin/sh
-# Move to command directory
-cd /usr/src/ovs/utilities/

 # Clear current flows
@@ -158,7 +170,8 @@ help.
 At this time all ovs-vswitchd tasks end up being affinitized to cpu core 0 but
 this may change. Lets pick a target core for 100% task to run on, i.e. core 7.
-Also assume a dual 8 core sandy bridge system with hyperthreading enabled.
+Also assume a dual 8 core sandy bridge system with hyperthreading enabled,
+where CPU socket 1 has cores 0-7 and 16-23 and socket 2 has cores 8-15 and 24-31.
 (A different cpu configuration will have different core mask requirements).

 To give better ownership of 100%, isolation maybe useful.
@@ -178,11 +191,11 @@ taskset -p 080 1762
  pid 1762's new affinity mask: 80

 Assume that all other ovs-vswitchd threads to be on other socket 0 cores.
-Affinitize the rest of the ovs-vswitchd thread ids to 0x0FF007F
+Affinitize the rest of the ovs-vswitchd thread ids to 0x07F007F

-taskset -p 0x0FF007F {thread pid, e.g 1738}
+taskset -p 0x07F007F {thread pid, e.g 1738}
  pid 1738's current affinity mask: 1
- pid 1738's new affinity mask: ff007f
+ pid 1738's new affinity mask: 7f007f
 . . .

 The core 23 is left idle, which allows core 7 to run at full rate.
@@ -207,8 +220,8 @@ with the ring naming used within ovs.

 location tests/ovs_client

 To run the client :
-
- ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
+ cd /usr/src/ovs/tests/
+ ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"

 In the case of the dpdkr example above the "port id you gave dpdkr" is 0.
@@ -218,6 +231,9 @@ The application simply receives an mbuf on the receive queue of the ethernet
 ring and then places that same mbuf on the transmit ring of the ethernet ring.
 It is a trivial loopback application.

+DPDK rings in VM (IVSHMEM shared memory communications)
+-------------------------------------------------------
+
 In addition to executing the client in the host, you can execute it within
 a guest VM. To do so you will need a patched qemu. You can download the patch
 and getting started guide at :
@@ -228,6 +244,281 @@ A general rule of thumb for better performance is that the client
 application should not be assigned the same dpdk core mask "-c" as the
 vswitchd.

+Alternative method to get QEMU: download and build from OVDK
+-------------------------------------------------------------
+
+##### On Host
+
+Rebuild DPDK and OVS with IVSHMEM support (above).
+
+Example Fedora 20 host packages needed to build qemu:
+Infrastructure Server install plus Virtualization, C development tools,
+development tools and RPM tools
+  yum install tunctl rpmdevtools yum-utils ncurses-devel qt3-devel \
+   libXi-devel gcc-c++ openssl-devel glibc.i686 libgcc.i686 libstdc++.i686 \
+   glibc-devel.i686 kernel-devel libcap-devel gcc coreutils make nasm \
+   glibc-devel autoconf automake zlib-devel glib2-devel libtool fuse-devel \
+   pixman-devel fuse kernel-modules-extra
+
+Get and build qemu for OVDK:
+  cd /usr/src
+  git clone git://github.com/01org/dpdk-ovs
+  cd /usr/src/dpdk-ovs/qemu/
+  export DPDK_DIR=/usr/src/dpdk-1.6.0r2
+  ./configure --enable-kvm --target-list=x86_64-softmmu --disable-pie
+  make
+
+Start OVS for IVSHMEM test
+--------------------------
+
+Start ovsdb-server as above.
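+
+As an optional sanity check (not part of the original steps; the sysfs path
+below assumes 1 GB hugepages), confirm that pages are available on each NUMA
+node before starting ovs-vswitchd:
+  cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
+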
+Start ovs-vswitchd with 1 GB of NUMA node 0 memory:
+  cd /usr/src/ovs
+  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 -- \
+    unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
+
+Copy information for IVSHMEM VM use:
+  mkdir /tmp/share
+  chmod 777 /tmp/share
+  cp -a /run/.rte_* /tmp/share
+
+Prepare and start VM
+--------------------
+
+Example Fedora 20 VM created with:
+Minimal Install+Guest Agents, Standard Tools, C Dev Tools, Dev tools, RPM Tools
+  yum install rpmdevtools yum-utils ncurses-devel gcc make qt3-devel \
+   libXi-devel gcc-c++ openssl-devel coreutils kernel-devel glibc.i686 \
+   libgcc.i686 libstdc++.i686 glibc-devel.i686 kernel-devel libcap-devel
+
+Start VM for IVSHMEM test
+-------------------------
+Note: the VM runs under the QEMU built in the OVDK directory above.
+
+Example: VM at /vm/Fed20-vm.qcow2, VM scripts in /vm/vm_ctl.
+A Linux br-mgt management bridge is assumed to be set up.
+
+############################## Example VM start script
+
+#!/bin/sh
+vm=/vm/Fed20-vm.qcow2
+vm_name="IVSHMEM1"
+
+vnc=10
+
+n1=tap46
+bra=br-mgt
+dn_scrp_a=/vm/vm_ctl/br-mgt-ifdown
+mac1=00:1f:33:16:64:44
+
+if [ ! -f $vm ];
+then
+  echo "VM $vm not found!"
+else
+  echo "VM $vm started! VNC: $vnc, management network: $n1"
+  tunctl -t $n1
+  brctl addif $bra $n1
+  ifconfig $n1 0.0.0.0 up
+
+  taskset 0x30 /usr/src/dpdk-ovs/qemu/x86_64-softmmu/qemu-system-x86_64 \
+-cpu host -hda $vm -m 3072 -boot c -smp 2 -pidfile /tmp/vm1.pid \
+-monitor unix:/tmp/vm1monitor,server,nowait -mem-path /dev/hugepages \
+-mem-prealloc -enable-kvm -net nic,model=virtio,netdev=eth0,macaddr=$mac1 \
+-netdev tap,ifname=$n1,id=eth0,vhost=on,script=no,downscript=$dn_scrp_a \
+-device ivshmem,size=1024M,shm=fd:/dev/hugepages/rtemap_0:0x0:0x40000000 \
+-name $vm_name -vnc :$vnc -drive file=fat:/tmp/share &
+fi
+
+############################## Example br-mgt-ifdown script
+
+#!/bin/sh
+
+bridge='br-mgt'
+/sbin/ifconfig $1 0.0.0.0 down
+brctl delif ${bridge} $1
+
+##############################
+
+On the host, taskset sets the host CPU cores (0x030 in this case) used by
+QEMU. Core selection is system specific; the selected cores should be on the
+same NUMA node (CPU socket). On Linux, memory is generally allocated from the
+NUMA node the process is running on at the time of allocation, provided
+memory is available on that node.
+
+VM Setup
+--------
+
+Set VM kernel bootline parameters and reboot:
+  default_hugepagesz=1GB hugepagesz=1G hugepages=1 isolcpus=1
+
+Copy from host to VM (10.4.0.160):
+  scp /usr/src/dpdk-1.6.0r2.tar.gz 10.4.0.160:/root
+
+Build DPDK in the VM:
+  cd /root
+  tar -xf dpdk-1.6.0r2.tar.gz
+  cd /root/dpdk-1.6.0r2/
+update config/defconfig_x86_64-default-linuxapp-gcc so that DPDK generates a
+single lib file (this modification is also required for the IVSHMEM build).
+CONFIG_RTE_BUILD_COMBINE_LIBS=y
+  make install T=x86_64-ivshmem-linuxapp-gcc
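+
+Optionally (not part of the original steps), verify that the combined DPDK
+library was produced; the library name below is an assumption based on the
+DPDK 1.6 combined-library (CONFIG_RTE_BUILD_COMBINE_LIBS) build:
+  ls /root/dpdk-1.6.0r2/x86_64-ivshmem-linuxapp-gcc/lib/
+  (expect a single combined libintel_dpdk.* rather than many librte_* files)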
+
+  mkdir -p /root/ovs_client
+Copy the ovsclient code and Makefile from the host to the VM:
+  scp /usr/src/ovs/tests/ovs_client/ovs_client.c 10.4.0.160:/root/ovs_client
+  scp /usr/src/dpdk-ovs/guest/ovs_client/Makefile 10.4.0.160:/root/ovs_client
+
+On the VM, patch or change /root/ovs_client/Makefile:
+
+################# Makefile Patch
+
+diff --git a/Makefile b/Makefile
+index 9df37ef..cef1903 100755
+--- a/Makefile
++++ b/Makefile
+@@ -39,13 +39,12 @@ endif
+ include $(RTE_SDK)/mk/rte.vars.mk
+
+ # binary name
+-APP = ovs_client
++APP = ovsclient
+
+ # all source are stored in SRCS-y
+-SRCS-y := ovs_client.c libvport/ovs-vport.c
++SRCS-y := ovs_client.c
+
+ CFLAGS += -O3
+ CFLAGS += $(WERROR_FLAGS)
+-CFLAGS += -I$(SRCDIR)/libvport
+
+ include $(RTE_SDK)/mk/rte.extapp.mk
+
+##################
+
+On the VM, patch or change /root/ovs_client/ovs_client.c to remove the printf
+in the processing loop. The printf makes processing unstable in the VM,
+although it works fine in the host test.
+
+################## ovs_client.c Patch
+
+diff --git a/ovs_client.c b/ovs_client.c
+index dbd99b1..8240275 100644
+--- a/ovs_client.c
++++ b/ovs_client.c
+@@ -217,10 +217,5 @@ main(int argc, char *argv[])
+         } else {
+             no_pkt++;
+         }
+-
+-        if (!(pkt % 100000)) {
+-            printf("pkt %d %d\n", pkt, no_pkt);
+-            pkt = no_pkt = 0;
+-        }
+     }
+ }
+
+##################
+
+Build ovsclient:
+  cd /root/ovs_client
+  export RTE_SDK=/root/dpdk-1.6.0r2
+  export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
+  make
+
+Setup of VM to run ovsclient
+----------------------------
+
+Mount VM internal hugepage memory:
+  mkdir -p /mnt/hugepages
+  mount -t hugetlbfs hugetlbfs /mnt/hugepages
+
+Copy host info for the IVSHMEM memory:
+  mount -o iocharset=utf8 /dev/sdb1 /mnt/ovs_client
+  cp -a /mnt/ovs_client/.rte_* /run
+
+Find the IVSHMEM memory PCI device and set the hugepage link (first time):
+  lspci | grep RAM
+    00:04.0 RAM memory: Red Hat, Inc Inter-VM shared memory
+  Find device info by tab completion
+    dir /sys/devices/pci0000\:00/0000\:00\:04.0/resource2
+  Use the last line and modify it to create the link:
+    ln -s /sys/devices/pci0000\:00/0000\:00\:04.0/resource2 \
+      /dev/hugepages/rtemap_0
+  Verify the link:
+    ls -l /dev/hugepages
+
+Run ovsclient
+-------------
+
+Start the IVSHMEM client task:
+  cd /root/ovs_client
+  ./build/ovsclient -c 1 -n 4 --proc-type=secondary -- -n 0 &
+
+Find the ovsclient task process ids:
+  ps -eLF | grep ovsclient
+  root 3538 392 3538 11 2 399513 1092 0 13:09 pts/0 00:00:15 ...
+  root 3538 392 3539  0 2 399513 1092 0 13:09 pts/0 00:00:00 ...
+Affinitize the thread accumulating CPU time to vCPU1 and make it high
+priority; set the other to vCPU0:
+  taskset -p 2 3538
+  taskset -p 1 3539
+  renice -20 -p 3538
+In the VM, verify with top (press 1) that the task load is on vCPU1 and not
+vCPU0.
+
+Affinitize host QEMU for VM vCPUs
+---------------------------------
+CPU core affinitization is different for different systems. For this example
+the cpu cores 4 (0x10) and 5 (0x20) are on the same physical CPU.
+
+Example:
+  ps -eLF | grep qemu
+  root 2256 1 2256 0 5 1123016 21828 4 13:38 pts/1 00:00:02 ...
+  root 2256 1 2261 5 5 1123016 21828 4 13:38 pts/1 00:08:36 ...
+  root 2256 1 2262 4 5 1123016 21828 4 13:38 pts/1 00:07:21 ...
+  root 2256 1 2264 0 5 1123016 21828 4 13:38 pts/1 00:00:00 ...
+Note: the VM's 100% task started on vCPU0 and was then moved to vCPU1, so both
+vCPU0 (2261) and vCPU1 (2262) have accumulated process time.
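+
+Alternatively (an optional step, not in the original procedure), the QEMU
+monitor socket opened by the VM start script can report which host thread
+backs each vCPU, instead of inferring it from accumulated CPU time (assumes
+socat is installed):
+  echo "info cpus" | socat - unix-connect:/tmp/vm1monitor
+Each line of output lists a vCPU number and its host thread_id.
+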
+  taskset -p 10 2256
+   pid 2256's current affinity mask: 30
+   pid 2256's new affinity mask: 10
+  taskset -p 10 2261
+   pid 2261's current affinity mask: 30
+   pid 2261's new affinity mask: 10
+  taskset -p 20 2262
+   pid 2262's current affinity mask: 30
+   pid 2262's new affinity mask: 20
+  taskset -p 10 2264
+   pid 2264's current affinity mask: 30
+   pid 2264's new affinity mask: 10
+In the VM, verify with top (press 1) that the task load is now on vCPU1
+(host core 5) and not vCPU0 (host core 4).
+
+Set packet flows from the input ports through the VM to the output ports
+------------------------------------------------------------------------
+
+Set the flow IP addresses as needed for the test.
+
+############################## Example flow test script:
+
+#! /bin/sh
+# Move to command directory
+cd /usr/src/ovs/utilities/
+
+# Clear current flows
+./ovs-ofctl del-flows br0
+
+# Add bidirectional flows between
+# port 1 (dpdk0) <--> VM port 3 (dpdkr0) <--> port 2 (dpdk1)
+./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
+nw_dst=1.1.1.2,idle_timeout=0,action=output:3
+./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
+nw_dst=1.1.1.1,idle_timeout=0,action=output:3
+
+./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.1,\
+nw_dst=1.1.1.2,idle_timeout=0,action=output:2
+./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.2,\
+nw_dst=1.1.1.1,idle_timeout=0,action=output:1
+
+###############################
+
 Restrictions:
 -------------