@Jay, I'm actually here (at the PTG) for CAT. I also have another idea about the proposal, so I'll catch up with you about it and we can sync all the ideas. :)
Thanks,
Alex

2017-02-22 11:17 GMT-05:00 Jay Pipes <jaypi...@gmail.com>:
> Hi Eli,
>
> Sorry for top-posting. Just a quick note to say I had a good conversation
> on Monday about this with Sean Mooney. I think we have some ideas on how
> to model all of these resources in the new placement/resource providers
> schema.
>
> Are you at the PTG? If so, it would be great to meet up to discuss...
>
> Best,
> -jay
>
> On 02/21/2017 05:38 AM, Qiao, Liyong wrote:
>>
>> Hi folks,
>>
>> We are seeking community input on an initial design for Intel Resource
>> Director Technology (RDT), in particular Cache Allocation Technology
>> (CAT), in OpenStack Nova, to protect workloads from co-resident noisy
>> neighbors and ensure quality of service (QoS).
>>
>> 1. What is Cache Allocation Technology (CAT)?
>>
>> Intel's Resource Director Technology (RDT) [1] is an umbrella of
>> *hardware* support to facilitate the monitoring and reservation of shared
>> resources such as cache, memory, and network bandwidth, towards obtaining
>> quality of service. RDT enables fine-grained control of resources, which
>> is particularly valuable in cloud environments for meeting Service Level
>> Agreements while increasing resource utilization through sharing. CAT is
>> the part of RDT that reserves, for a process or group of processes, a
>> portion of the last level cache, with further fine-grained control over
>> how much is used for code versus data. Consider a single processor
>> composed of 4 cores and its cache hierarchy: the L1 cache is split into
>> instruction and data caches, and the L2 cache is next in speed to L1; the
>> L1 and L2 caches are per core, while the Last Level Cache (LLC) is shared
>> among all cores. With CAT, on currently available hardware, the LLC can
>> be partitioned on a per-process (virtual machine, container, or normal
>> application) or per-process-group basis.
>>
>> Libvirt and OpenStack [2] already support monitoring cache occupancy
>> (CMT), memory bandwidth usage local to a processor socket (MBM local),
>> and total memory bandwidth usage across all processor sockets (MBM total)
>> for a process or process group.
>>
>> 2. How CAT works
>>
>> To learn more about CAT, please refer to the Intel Software Developer's
>> Manual, volume 3b, chapters 17.16 and 17.17 [3]. Linux kernel support is
>> expected in release 4.10 and is documented at [4].
>>
>> 3. Libvirt interface
>>
>> Libvirt support for CAT is underway, with the patch currently at
>> revision 7.
>>
>> Interface changes in libvirt:
>>
>> 3.1 The capabilities XML has been extended to reveal cache information
>>
>> <cache>
>>   <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'>
>>     <control min='2816' reserved='2816' unit='KiB' scope='L3'/>
>>   </bank>
>>   <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'>
>>     <control min='2816' reserved='2816' unit='KiB' scope='L3'/>
>>   </bank>
>> </cache>
>>
>> The new `cache` element shows that the host has two *banks* of *type* l3,
>> i.e. Last Level Cache (LLC), one per processor socket. Each bank's *size*
>> is 56320 KiB, and the *cpus* attribute gives the physical CPUs associated
>> with it, here '0-21,44-65' and '22-43,66-87' respectively.
>>
>> The *control* tag shows that the bank belongs to scope L3, with a minimum
>> possible allocation of 2816 KiB, and with 2816 KiB currently reserved.
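(Purely as an illustration, not part of the proposal or of any existing Nova/libvirt code: a minimal Python sketch of how a virt driver might parse the <cache> capabilities XML above into plain dicts. The XML and attribute names come from the example; parse_cache_banks is a hypothetical helper name.)

import xml.etree.ElementTree as ET

# Capabilities <cache> element copied from the example above.
CAPS_CACHE_XML = """
<cache>
  <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'>
    <control min='2816' reserved='2816' unit='KiB' scope='L3'/>
  </bank>
  <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'>
    <control min='2816' reserved='2816' unit='KiB' scope='L3'/>
  </bank>
</cache>
"""

def parse_cache_banks(cache_xml):
    """Return a list of cache bank descriptions from a <cache> element."""
    banks = []
    for bank in ET.fromstring(cache_xml).findall('bank'):
        controls = [{'scope': c.get('scope'),
                     'min_kib': int(c.get('min')),
                     'reserved_kib': int(c.get('reserved'))}
                    for c in bank.findall('control')]
        banks.append({'id': int(bank.get('id')),
                      'type': bank.get('type'),
                      'size_kib': int(bank.get('size')),
                      'cpus': bank.get('cpus'),
                      'controls': controls})
    return banks

print(parse_cache_banks(CAPS_CACHE_XML))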
>> If the host has CDP (Code and Data Prioritization) enabled, the L3 cache
>> is divided into code (L3CODE) and data (L3DATA) parts.
>>
>> The control tag is then extended to:
>>
>> ...
>> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/>
>> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/>
>> ...
>>
>> The L3CODE and L3DATA scopes show that we can allocate cache for code and
>> data usage respectively; they share the same amount of L3 cache.
>>
>> 3.2 Domain XML extended to include a new cachetune element
>>
>> <cputune>
>>   <vcpupin vcpu='0' cpuset='0'/>
>>   <vcpupin vcpu='1' cpuset='1'/>
>>   <vcpupin vcpu='2' cpuset='22'/>
>>   <vcpupin vcpu='3' cpuset='23'/>
>>   <cachetune id='0' host_id='0' type='l3' size='2816' unit='KiB' vcpus='0,1'/>
>>   <cachetune id='1' host_id='1' type='l3' size='2816' unit='KiB' vcpus='2,3'/>
>>   ...
>> </cputune>
>>
>> This means the guest will have vcpus 0 and 1 running on the host's socket
>> 0, with 2816 KiB of cache exclusively allocated to them, and vcpus 2 and
>> 3 running on the host's socket 1, with 2816 KiB of cache exclusively
>> allocated to them.
>>
>> Here we need to make sure vcpus 0 and 1 are pinned to pCPUs of socket 0;
>> refer to the capabilities entry
>> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'>.
>>
>> Likewise, we need to make sure vcpus 2 and 3 are pinned to pCPUs of
>> socket 1; refer to the capabilities entry
>> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'>.
>>
>> 3.3 Libvirt work flow for CAT
>>
>> 1. Create the qemu process and get its PIDs.
>> 2. Define a new resource control group, also known as a Class of Service
>>    (CLOS), under /sys/fs/resctrl; set the desired Cache Bit Mask (CBM)
>>    according to the cachetune settings in the libvirt domain XML; and
>>    update the default schemata of the host.
>>
>> 4. Proposed Nova changes
>>
>> 1. Get the host's cache capabilities from libvirt and extend the compute
>>    node fields accordingly.
>> 2. Add a new scheduler filter and weigher to help select a host for the
>>    requested guest.
>> 3. Extend the flavor (and image metadata) extra spec fields.
>>
>> We need to specify NUMA settings for NUMA hosts if we want to enable CAT;
>> see [5] to learn more about NUMA.
>>
>> In the flavor, we can have:
>>
>> vcpus=8
>> mem=4
>> hw:numa_nodes=2        // number of NUMA nodes to expose to the guest
>> hw:numa_cpus.0=0,1,2,3,4,5
>> hw:numa_cpus.1=6,7
>> hw:numa_mem.0=3072
>> hw:numa_mem.1=1024
>>
>> // newly added in this proposal
>> hw:cache_banks=2       // cache banks to be allocated to the guest (can be
>>                        // fewer than the number of NUMA nodes)
>> hw:cache_type.0=l3     // cache bank type: l3, or l3 code + data
>> hw:cache_type.1=l3_c+d
>> hw:cache_vcpus.0=0,1   // vCPU list for this cache bank; may be omitted
>> hw:cache_vcpus.1=6,7
>> hw:cache_l3.0=2816     // cache size in KiB
>> hw:cache_l3_code.1=2816
>> hw:cache_l3_data.1=2816
>>
>> With this, the user can see clearly which vCPUs benefit from cache
>> allocation. A cache bank is a logical concept here: it works together
>> with a NUMA cell, and the cache is actually allocated on a physical CPU
>> socket. Each cache bank allocates cache for a list of vCPUs, and the
>> vCPUs in one list should be grouped on the same socket.
>>
>> In addition, Nova will fill in the <cachetune> elements of the libvirt
>> domain XML; see 3.2 for details.
>>
>> The settings above allocate 2 cache banks from the host's cache banks and
>> associate the vCPUs with them.
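(Again just an illustrative sketch, not existing Nova code: how the hw:cache_* extra specs proposed above could be turned into per-bank cache requests for use by a scheduler filter or the libvirt driver. The keys follow the flavor example; the helper name and dict layout are assumptions.)

def parse_cache_extra_specs(extra_specs):
    """Return a list of requested cache banks from flavor extra specs."""
    banks = []
    for i in range(int(extra_specs.get('hw:cache_banks', 0))):
        bank_type = extra_specs.get('hw:cache_type.%d' % i, 'l3')
        vcpus = extra_specs.get('hw:cache_vcpus.%d' % i, '')
        bank = {'id': i,
                'type': bank_type,
                # The vCPU list may be omitted ("can be none" above).
                'vcpus': [int(v) for v in vcpus.split(',') if v]}
        if bank_type == 'l3':
            bank['l3_kib'] = int(extra_specs['hw:cache_l3.%d' % i])
        else:  # 'l3_c+d': CDP with separate code/data allocations
            bank['l3_code_kib'] = int(extra_specs['hw:cache_l3_code.%d' % i])
            bank['l3_data_kib'] = int(extra_specs['hw:cache_l3_data.%d' % i])
        banks.append(bank)
    return banks

# The flavor example from the proposal:
print(parse_cache_extra_specs({
    'hw:cache_banks': '2',
    'hw:cache_type.0': 'l3', 'hw:cache_vcpus.0': '0,1', 'hw:cache_l3.0': '2816',
    'hw:cache_type.1': 'l3_c+d', 'hw:cache_vcpus.1': '6,7',
    'hw:cache_l3_code.1': '2816', 'hw:cache_l3_data.1': '2816'}))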
>> In this example, the guest will have vcpus 0 and 1 running on socket 0 of
>> the host with 2816 KiB of cache for exclusive use, and vcpus 6 and 7
>> running on socket 1 of the host with 2816 KiB of L3 code cache and
>> 2816 KiB of L3 data cache allocated.
>>
>> If a NUMA cell were to contain multiple CPU sockets (this is rare), we
>> would adjust the NUMA vCPU placement policy to ensure that vCPUs and the
>> cache allocated to them are co-located on the same socket.
>>
>> * We can define fewer cache banks than NUMA cells on a node with multiple
>>   NUMA cells.
>> * No cache_vcpus parameter needs to be specified if no reservation is
>>   desired.
>>
>> NOTE: the cache allocation for a guest is in isolated/exclusive mode.
>>
>> References
>>
>> [1] http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html
>> [2] https://blueprints.launchpad.net/nova/+spec/support-perf-event
>> [3] http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
>> [4] https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/tree/Documentation/x86/intel_rdt_ui.txt?h=x86/cache
>> [5] https://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
>>
>> Best Regards,
>>
>> Eli Qiao (乔立勇), OpenStack Core team, OTC, Intel
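On the scheduler-filter idea above, here is a rough sketch (hypothetical, not existing Nova code) of the kind of fit check such a filter could run, using the bank and request dicts from the two sketches above. It assumes that size minus the reserved amount approximates the cache still allocatable on a bank, given that allocations are exclusive per the proposal.

def host_can_fit(host_banks, requested_banks):
    """Greedy check that every requested cache bank fits on some host bank."""
    # Assumed free budget per bank: total size minus the reserved amount.
    free = {b['id']: b['size_kib'] - b['controls'][0]['reserved_kib']
            for b in host_banks}
    for req in requested_banks:
        needed = req.get('l3_kib',
                         req.get('l3_code_kib', 0) + req.get('l3_data_kib', 0))
        # Find a host bank with enough room for an exclusive allocation.
        for bank_id, free_kib in free.items():
            if free_kib >= needed:
                free[bank_id] -= needed
                break
        else:
            return False
    return True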
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev