Like I said, while 12 VMs are running on this host, I'm conducting this test only from the host, not the VMs, since I can already see the delay on the host side. I really want to make the delay go away on the host first, and once that's solved, work my way down the vnet -> vm:eth0 path.
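To be concrete, the probe is roughly the following (a Python sketch of the idea; the real test is the C++ tool described below, and the peer address, port and payload size here are placeholders, not our actual values):

#!/usr/bin/env python
# Rough sketch of the host-to-host latency probe discussed in this
# thread: 50 small UDP packets per second, flagging replies over 10ms.
# PEER, PORT and the payload size are placeholders.
import socket
import time

PEER = "10.0.0.2"      # peer compute node (placeholder)
PORT = 9999            # echo port (placeholder)
PPS = 50               # packets per second
THRESHOLD = 0.010      # 10ms

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)
payload = b"x" * 64    # well under the 1500-byte MTU
slow = total = 0

while True:
    start = time.time()
    sock.sendto(payload, (PEER, PORT))
    try:
        sock.recvfrom(2048)
        rtt = time.time() - start
        total += 1
        if rtt > THRESHOLD:
            slow += 1
            print("slow reply: %.1fms (%d/%d over threshold)"
                  % (rtt * 1000, slow, total))
    except socket.timeout:
        print("lost packet")
    # pace the loop to PPS, accounting for time spent waiting
    time.sleep(max(0, (1.0 / PPS) - (time.time() - start)))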
Obviously the ethX tuning (ethtool) is done on the host, but the sysctl tuning is done on both the host and the guest. I was thinking the bonding might be one culprit, but I want to test buffer settings first (queues and ring parameters). Regarding bonding, what kind of problems did you have? Are you now running a direct eth0 -> br100 setup? Is vhost_net any good? We are just using virtio. Thanks!

On Wed, Jan 15, 2014 at 1:32 PM, Narayan Desai <narayan.de...@gmail.com> wrote:

> Are you using virtio, and vhost_net?
>
> Also, where are you tuning those parameters, host or guest? The ethernet
> level ones will definitely need to be done in the host, but the TCP and
> socket buffer ones need to be in the guest.
>
> Also, these buffers may be too large for 2x1GbE. You might also check if
> the link aggregation is messing you up here. I've generally had problems
> with it.
>
> One last thing: how does the app run from the hypervisor? You can rule a
> lot of things out by testing that.
> -nld
>
> On Wed, Jan 15, 2014 at 9:58 AM, Alejandro Comisario
> <alejandro.comisa...@mercadolibre.com> wrote:
>
>> Hi Narayan, thanks for the prompt response; let me try to give you some
>> insights.
>> As you said, our setup can reach the maximum bandwidth in some tests,
>> but we can't achieve the THROUGHPUT we want. We run an average of 14 VMs
>> on compute nodes with 128GB of RAM, and while all those VMs are running,
>> we run a test between two compute nodes with a C++ application that
>> sends 50 packets per second (each well under our 1500-byte MTU) and
>> waits for the response from the target server, expecting it within 10ms.
>> This test runs on the br100 interface of both compute nodes (passing
>> through eth0/eth1, bond0 and br100), and while all the VMs are running
>> (high-throughput, low-bandwidth applications) this simple test shows
>> tens of thousands of responses slower than 10ms; in fact 99% of the
>> slow responses take 20/21ms. I can't work out what that magic delay
>> value means, so we are starting to trace which interfaces add the delay.
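For reference, the responder side of a probe like the one described above is just a UDP echo loop; a minimal sketch (same placeholder port as the client sketch earlier in this message):

#!/usr/bin/env python
# Minimal UDP echo responder for the probe sketched above; the slow
# (20/21ms) replies reported in this thread are measured against a
# responder like this on the target host. The port is a placeholder.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))   # placeholder port

while True:
    data, addr = sock.recvfrom(2048)
    sock.sendto(data, addr)    # echo the payload straight back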
>> Let me show you what our settings look like regarding networking (I'll
>> leave the VMs out of the picture).
>>
>> COMPUTE HOST
>> ------------
>> 2x1Gb bonded interfaces (no jumbo frames, 1500 MTU, since jumbo frames
>> are a separate project)
>>
>> Ethernet ring settings on both interfaces:
>>
>> RX 256
>> TX 256
>>
>> Ethernet txqueuelen on both interfaces:
>>
>> txqueuelen 1000
>>
>> sysctl settings:
>>
>> net.ipv4.tcp_max_tw_buckets = 3600000
>> net.ipv4.tcp_max_syn_backlog = 30000
>> net.core.netdev_max_backlog = 50000
>> net.core.somaxconn = 16384
>> net.core.rmem_max = 16777216
>> net.core.wmem_max = 16777216
>> net.ipv4.tcp_rmem = 4096 87380 16777216
>> net.ipv4.tcp_wmem = 4096 65536 16777216
>> net.core.rmem_default = 16777216
>> net.core.wmem_default = 16777216
>> net.ipv4.tcp_congestion_control = cubic
>> net.ipv4.ip_local_port_range = 1024 65000
>> net.ipv4.tcp_fin_timeout = 5
>> net.ipv4.tcp_keepalive_time = 5
>> net.ipv4.tcp_tw_recycle = 1
>> net.ipv4.tcp_tw_reuse = 1
>> vm.swappiness = 0
>> net.ipv4.tcp_syncookies = 1
>> net.ipv4.tcp_timestamps = 1
>> net.ipv4.tcp_max_orphans = 60000
>> net.ipv4.tcp_synack_retries = 3
>> net.ipv4.tcp_ecn = 1
>> net.ipv4.tcp_sack = 1
>> net.ipv4.tcp_dsack = 1
>> net.ipv4.route.flush = 1
>> net.ipv6.route.flush = 1
>> net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
>> net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
>> net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
>> net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
>> net.ipv4.netfilter.ip_conntrack_max = 1200000
>> net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
>> net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
>> net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
>> net.ipv4.tcp_keepalive_time = 90
>>
>> One other tip I can add: the delay is always on the RX side, that is,
>> on the server doing the responding.
>> So we were thinking about going higher with the ring or txqueuelen
>> settings.
>>
>> Any ideas?
>>
>> *Alejandro Comisario #melicloud CloudBuilders*
>> Arias 3751, Piso 7 (C1430CRG)
>> Ciudad de Buenos Aires - Argentina
>> Cel: +549(11) 15-3770-1857
>> Tel : +54(11) 4640-8443
>>
>> On Wed, Jan 15, 2014 at 12:32 AM, Narayan Desai
>> <narayan.de...@gmail.com> wrote:
>>
>>> We don't have a workload remotely like that (generally we have a lot
>>> more demand for bandwidth, but we also generally run faster networks
>>> than that), but 1k pps sounds awfully low. Like low by several orders
>>> of magnitude.
>>>
>>> I didn't measure pps in our benchmarking, but I did manage to saturate
>>> a 10GE link from a VM (actually we did this on 10 nodes at a time to
>>> saturate a 100GE wide-area link), and all of those settings are here:
>>>
>>> http://buriedlede.blogspot.com/2012/11/driving-100-gigabit-network-with.html
>>>
>>> I'd start trying to do some fault isolation; see if you can get NAT
>>> out of the mix, for example, or see if it is a network stack tuning
>>> problem. You probably need to crank up some of your buffer sizes, even
>>> if you don't need to mess with your TCP windows.
>>>
>>> Can you actually saturate your 2x1GbE LAG with bandwidth? (single or
>>> ganged flows?)
>>> -nld
>>>
>>> On Tue, Jan 14, 2014 at 3:52 PM, Alejandro Comisario
>>> <alejandro.comisa...@mercadolibre.com> wrote:
>>>
>>>> Wow, it's kind of hard to imagine we are the only ones with just
>>>> 100Mb/s of bandwidth but 50,000 requests per minute on each compute
>>>> node; I mean, lots of throughput, almost no bandwidth.
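One cheap host-side check related to the RX-side delay and the netdev backlog setting quoted above is whether the softirq receive path is dropping or deferring packets at all. A small sketch reading /proc/net/softnet_stat (one line of hex columns per CPU; column two counts drops when the netdev backlog overflows, column three counts time squeezes, where NAPI ran out of budget mid-poll):

#!/usr/bin/env python
# Sketch: check whether the host's softirq RX path is dropping or
# deferring packets. /proc/net/softnet_stat has one line per CPU;
# column 1 is packets processed, column 2 is drops (netdev backlog
# full), column 3 is time_squeeze (NAPI budget exhausted mid-poll).
with open("/proc/net/softnet_stat") as f:
    for cpu, line in enumerate(f):
        cols = [int(v, 16) for v in line.split()]
        processed, dropped, squeezed = cols[0], cols[1], cols[2]
        if dropped or squeezed:
            print("cpu%d: processed=%d dropped=%d time_squeeze=%d"
                  % (cpu, processed, dropped, squeezed))

If neither counter moves during the probe, raising net.core.netdev_max_backlog further is unlikely to help, and the ring/txqueuelen values become the more interesting knobs.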
>>>>
>>>> Has everyone got their networking performance figured out?
>>>> Nobody wants to share some "SUPER THROUGHPUT" sysctl / ethtool /
>>>> power / etc. settings for the compute side?
>>>>
>>>> Best regards.
>>>>
>>>> *alejandrito*
>>>>
>>>> On Sat, Jan 11, 2014 at 4:12 PM, Alejandro Comisario
>>>> <alejandro.comisa...@mercadolibre.com> wrote:
>>>>
>>>>> Well, it's been a long time since we started using nova with KVM; we
>>>>> got past many thousands of VMs, and still something doesn't feel
>>>>> right.
>>>>> We are using Ubuntu 12.04, kernel 3.2.0-[40-48], with lots of sysctl
>>>>> parameters tuned, and everything works, you could say, quite well.
>>>>>
>>>>> But here's the deal: we have a special networking scenario, which is
>>>>> that EVERYTHING IS APIs; everything is throughput, not bandwidth.
>>>>> Each 2x1Gb bonded compute node never gets over 200-400Mb/s, but it
>>>>> is handling hundreds of thousands of requests per minute to the VMs.
>>>>>
>>>>> And once in a while everything seems to go to hell: timeouts from
>>>>> applications here, API response times going from 10ms to 200ms
>>>>> there, 20ms delays appearing between the VM eth0 and the vnet
>>>>> interface, etc.
>>>>> So, since it's a massive scenario to tune, we never quite nailed
>>>>> WHERE TO apply that final 1, 2 or 3 buffer/ring/affinity tweaks to
>>>>> make everything work from the compute side.
>>>>>
>>>>> I know it's a little awkward, but I'm craving real-life community
>>>>> examples of "HIGH THROUGHPUT" tuning in KVM scenarios, dark Linux
>>>>> magic, or someone who can walk me through configurations that might
>>>>> sound weird / unnecessary / incorrect.
>>>>>
>>>>> For those who are wondering what we have, let's start with this.
>>>>>
>>>>> COMPUTE NODES (99% of them; different vendors, but ...)
>>>>> * 128/256 GB of RAM
>>>>> * 2 hexacores with HT enabled
>>>>> * 2x1Gb bonded interfaces (want to know the 20+ models we are
>>>>>   using? just ask)
>>>>> * Multi-queue interfaces, pinned via IRQ to different cores
>>>>> * Ubuntu 12.04, kernel 3.2.0-[40-48]
>>>>> * Linux bridges, no VLANs, no Open vSwitch
>>>>>
>>>>> I want to keep the networking appliances (TORs, AGGRs, COREs) as far
>>>>> out of the picture as possible.
>>>>> I'm thinking "I hope this thread gets great, in time".
>>>>>
>>>>> So, ready to learn as much as I can.
>>>>> Thank you, OpenStack community, as always.
>>>>>
>>>>> alejandrito
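Since the original post mentions multi-queue interfaces pinned via IRQ to different cores, it is worth double-checking that the pinning actually took effect (irqbalance can silently rewrite it). A small sketch, where "eth" is a placeholder pattern for the bonded slave interface names:

#!/usr/bin/env python
# Sketch: list the IRQs of the NIC queues and the CPU mask each one is
# pinned to, to verify the per-queue IRQ affinity mentioned in the
# original post. "eth" is a placeholder for the slave interface names.
import re

with open("/proc/interrupts") as f:
    for line in f:
        if re.search(r"eth\d", line):
            irq = line.split(":")[0].strip()      # IRQ number
            name = line.split()[-1]               # e.g. eth0-TxRx-0
            with open("/proc/irq/%s/smp_affinity" % irq) as aff:
                mask = aff.read().strip()         # hex CPU bitmask
            print("irq %s (%s) -> cpu mask %s" % (irq, name, mask))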