On Tue, Nov 28, 2017 at 08:50:54PM +0100, Paolo Bonzini wrote:
> On 28/11/2017 19:54, Michael S. Tsirkin wrote:
> >> Now there's a downside: with L3 cache the Linux scheduler is more eager
> >> to wake up tasks on sibling CPUs, resulting in unnecessary cross-vCPU
> >> interactions and therefore excessive halts and IPIs.  E.g. "perf bench
> >> sched pipe -i 100000" gives
> >>
> >>   l3-cache   #res IPI /s   #HLT /s   #time /100000 loops
> >>   off        200 (no K)    230       0.2 sec
> >>   on         400K          330K      0.5 sec
> >>
> >> In a more realistic test, we observe 15% degradation in VM density
> >> (measured as the number of VMs, each running Drupal CMS serving 2 http
> >> requests per second to its main page, with 95%-percentile response
> >> latency under 100 ms) with l3-cache=on.
> >>
> >> We think that the mostly-idle scenario is more common in cloud and
> >> personal usage, and should be optimized for by default; users of
> >> highly loaded VMs should be able to tune them up themselves.
>
> Hi Denis,
>
> thanks for the report.  I think there are two cases:
>
> 1) The dedicated pCPU case: do you still get the performance degradation
> with dedicated pCPUs?
I wonder why dedicated pCPUs would matter at all?  The behavior change
is in the guest scheduler.

> 2) The non-dedicated pCPU case: do you still get the performance
> degradation with threads=1?  If not, why do you have sibling vCPUs at
> all, if you don't have a dedicated physical CPU for each vCPU?

We have sibling vCPUs in terms of cores, not threads, i.e. the
configuration in the test was sockets=1,cores=8,threads=1.  Are you
suggesting that it shouldn't be used without pCPU binding?

Roman.
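
For reference, a guest matching the topology discussed above could be
started roughly as follows.  This is only a sketch: the CPU model, memory
size, and disk image path are illustrative placeholders, not the exact
command line used in the density test.

    # l3-cache=off drops the virtual L3 cache from the guest CPU model;
    # use l3-cache=on (the default since pc-2.8) to compare.
    qemu-system-x86_64 \
        -enable-kvm \
        -cpu host,l3-cache=off \
        -smp sockets=1,cores=8,threads=1 \
        -m 4096 \
        -drive file=guest.img,format=qcow2,if=virtio

With l3-cache=off the guest CPUID no longer advertises a shared L3 cache
across the eight cores, which is what changes the guest scheduler's
wakeup behavior described in the quoted commit message.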