Reading the e1000g driver, it appears that this hardware (or the driver, at any rate) only supports a single receive ring. This means it's impossible to spread the interrupt load, because you don't have multiple rings to spread it across. (Put another way, in Windows-friendly parlance: it appears that this hardware does not support RSS.)
Intel hardware using igb does support RSS, but e1000g seems a bit more budget-minded. However, we're talking about 1 Gbps hardware; a single CPU should have no problem keeping up with this. Even the context switches to service threads running on separate CPUs are not going to have much impact.

In my testing, running on the "wrong" CPU generally added on the order of a microsecond of latency to the frames where it occurred. I doubt you're using tiny frames; if your average frame size is 500 bytes, then you're going to peak at 250K pps (the 1 Gbps limit), or spend about 4 us per packet. An extra 1 us would definitely be measurable, but it would not have anything like the impact you show above. The results are far too devastating to be the result of a lack of RSS.

Now, having said that, when you say throughput, what are you measuring? Is it packets per second out of the interface? Into the interface? Or something somewhere else? Are these "vms" using kvm, or are they native or lx branded zones?

If the former, I wouldn't be at all surprised if you are simply running into other resource limitations, as kvm has a much higher per-zone impact. It also would not surprise me in the least if there turn out to be bottlenecks or hot locks inside the kvm code. The other thing is that because kvm guests run with whole-machine emulation, their NIC performance is going to drop noticeably as they fight for available CPU. This also means that a busy process in one vm can have a negative bandwidth impact on other vms, even if that first vm is *not* sending or receiving any data. (Note that this is due to the hardware being emulated in software. Native and lx zones shouldn't suffer from this at all, since they access real hardware, and any kernel-side code run on their behalf runs at higher priority than user-space code.)

One thing I'd look at during this is determining what your system is actually doing. Richard already proposed looking at intrstat.
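For what it's worth, the back-of-envelope arithmetic above can be checked with a few lines of plain Python (nothing SmartOS-specific here; the 500-byte frame size and the ~1 us cross-CPU penalty are the figures from the discussion above, not measurements of your system):

```python
# Back-of-envelope: per-packet time budget on a 1 Gbps link,
# assuming 500-byte average frames (figure from the discussion above).
LINK_BPS = 1_000_000_000          # 1 Gbps line rate
FRAME_BYTES = 500                 # assumed average frame size

pps = LINK_BPS / (FRAME_BYTES * 8)    # packets per second at line rate
budget_us = 1_000_000 / pps           # microseconds available per packet

print(f"peak rate: {pps:,.0f} pps")              # 250,000 pps
print(f"per-packet budget: {budget_us:.1f} us")  # 4.0 us

# A ~1 us penalty for landing on the "wrong" CPU eats a quarter of
# that budget -- measurable, but nowhere near a 935 -> 60 Mbps collapse.
extra_us = 1.0
print(f"overhead fraction: {extra_us / budget_us:.0%}")  # 25%
```

So even in the worst case, misdirected interrupts alone account for maybe a 25% hit, not the order-of-magnitude drop in the table below.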
You could also use powertop. Personally, my gut instinct (not backed by any data, note) is that you may be suffering from lock contention. So I'd do a little poking with lockstat to look for hot locks. (As I indicated, I would not be at all surprised to learn that kvm has some very hot locks in it. But as I said, you could be fighting purely for CPU in the kvm instances, and that won't necessarily show up as a contended lock.)

- Garrett

On Tue, May 3, 2016 at 10:16 AM, Stefan <[email protected]> wrote:
> Dear List,
>
> on our machines with about 40 vservers we observe low network
> throughput. It seems to scale inversely with the number of vms on the
> respective node:
>
> # vms  bw (Mbps)
> -----  ---------
>    37         60
>    24         94
>    18        174
>    12        608
>     5        933
>     2        934
>     1        935
>
> The measurements were obtained using iperf from a different physical
> machine. It has been suggested that the slowdown may be due to
> interrupt 51 being clamped to CPU 12:
>
> # echo '::interrupts -d' | mdb -k
> IRQ  Vect IPL Bus Trg Type CPU Share APIC/INT# Driver Name(s)
> :
> 49   0x40 5   PCI Edg MSI  8   1     -         mpt#0
> 50   0x60 6   PCI Edg MSI  11  1     -         e1000g#0
> 51   0x61 6   PCI Edg MSI  12  1     -         e1000g#1
> 160  0xa0 0       Edg IPI  all 0     -         poke_cpu
> :
>
> If this is the cause of the problem we would like to deliver the
> interrupts to all of the cpus. How do we achieve this with smartos?
>
> Kind Regards,
> Stefan

smartos-discuss Archives: https://www.listbox.com/member/archive/184463/=now
