OK
Dear Friend, I am Mr. Ahmed Zama. I am sending this brief letter to solicit your partnership to transfer €15 million into your account. I shall send you more information and procedures when I receive a positive response from you. If you are interested, send me the following immediately: Full Names, Age, Nationality, Occupation, Scanned copy of your International Passport, Direct Telephone Lines. Mr Ahmed Zama
OK
Greetings, I humbly solicit your partnership to transfer €15 million into your personal or company's account. As soon as the fund is successfully transferred, you shall be entitled to 30% of the total sum, 60% will be for me, while 10% will be set aside for expenses that may be incurred in the process of transferring the fund. Contact me for a more detailed explanation. Kindly send me the following: Full Names, Address, Occupation, Direct Mobile Telephone Lines, Nationality. Ahmed Zama +22675844869
OK
Greetings, I humbly solicit your partnership to transfer €15 million into your personal or company's account. I will offer you 30% of the total sum, 60% will be for me, while 10% will be set aside for expenses that may be incurred in the process of transferring the fund. Contact me for a more detailed explanation. Kindly send me the following: Full Names, Address, Occupation, Direct Mobile Telephone Lines, Nationality. Ahmed Zama
OK
Greetings, I humbly solicit your partnership to transfer €15 million into your personal or company's account. Contact me for a more detailed explanation. Kindly send me the following: Full Names, Address, Occupation, Direct Mobile Telephone Lines, Nationality. Ahmed Zama +22675844869
OK
Greetings, I humbly solicit your partnership to transfer €15 million into your personal or company's account. Contact me for a more detailed explanation. Kindly send me the following: Full Names, Address, Occupation, Direct Mobile Telephone Lines, Nationality. Ahmed Zama
RE: MEETING IN DUBAI
Dear, I only need your help to meet with Mr Kelly Adams, who is right now in Dubai. You will play the role of the beneficiary of the funds, of which I have agreed to give you 40% of the total sum; this money will be used for investment in the UAE. The box is right now with a London security company. Once you cooperate with my representative and have a meeting together, we can then proceed to instruct the company to ship the consignment down to Dubai through diplomatic shipment. The duty of my representative is to stay with you until you receive the cash; then, once you have everything in your control, you will give him one million dollars cash from the box to bring down to me. As you know, my government over here has confiscated all my bank accounts; I only have the said funds secret for now. Please keep this transaction private to yourself, then invest the rest of the funds in a good business of your choice in any country in the world. Just book your ticket to Dubai; in three days we can finish. Not everything can be talked about on the phone or by email; most things will be discussed face to face. Thanks for your understanding. Feel free to call me at a convenient time. Please cooperate with Mr Kelly Adams. Attached is my identity; I will send you a proposal. Thanks for your understanding. We will put all the money in real estate. REPLY BACK HERE : nikolai.nikolai...@gmail.com Best Regards MOHAMED ABDUL
HELLO DEAR
With Due Respect, I know that this mail will come to you as a surprise, as we have never met before, but you need not worry, as I am contacting you independently of my investigation and no one is informed of this communication. I need your urgent assistance in transferring the sum of $11.3 million immediately to your private account. The money has been here in our Bank lying dormant for years now without anybody coming to claim it. I want to release the money to you as the relative of our deceased customer (the account owner), who died along with his supposed Next of Kin on 16th October 2005. The banking laws here do not allow such money to stay for more than 14 years, because the money will be recalled to the Bank treasury account as an unclaimed fund. By indicating your interest I will send you the full details on how the business will be executed. Please respond urgently and delete if you are not interested. Best Regards, Mr. Ahmed Ouedraogo.
Please Respond Urgently.
With due respect, I am inviting you for a business deal of Eleven Million Three Hundred Thousand United States Dollars, where this money can be shared between us. By indicating your interest I will send you the full details on how the business will be executed. Please send your reply to my private email --- ouedraogoah...@outlook.com
With due respect.
Dear Friend, I know that this mail will come to you as a surprise, as we have never met before, but you need not worry, as I am contacting you independently of my investigation and no one is informed of this communication. I need your urgent assistance in transferring the sum of $11.3 million immediately to your private account. The money has been here in our Bank lying dormant for years now without anybody coming to claim it. I want to release the money to you as the relative of our deceased customer (the account owner), who died along with his supposed Next of Kin on 16th October 2005. The banking laws here do not allow such money to stay for more than 15 years, because the money will be recalled to the Bank treasury account as an unclaimed fund. By indicating your interest I will send you the full details on how the business will be executed. Please respond urgently and delete if you are not interested. Best Regards, Ahmed Ouedraogo.
WITH DUE RESPECT.
Dear Friend, I know that this mail will come to you as a surprise, as we have never met before, but you need not worry, as I am contacting you independently of my investigation and no one is informed of this communication. I need your urgent assistance in transferring the sum of $11.3 million immediately to your private account. The money has been here in our Bank lying dormant for years now without anybody coming to claim it. I want to release the money to you as the relative of our deceased customer (the account owner), who died along with his supposed Next of Kin on 16th October 2005. The banking laws here do not allow such money to stay for more than 15 years, because the money will be recalled to the Bank treasury account as an unclaimed fund. By indicating your interest I will send you the full details on how the business will be executed. Please respond urgently and delete if you are not interested. Best Regards, Ahmed Ouedraogo.
HELLO DEAR.
Dear Friend, I know that this mail will come to you as a surprise, as we have never met before, but you need not worry, as I am contacting you independently of my investigation and no one is informed of this communication. I need your urgent assistance in transferring the sum of $11.3 million immediately to your private account. The money has been here in our Bank lying dormant for years now without anybody coming to claim it. I want to release the money to you as the relative of our deceased customer (the account owner), who died along with his supposed Next of Kin on 16th October 2005. The banking laws here do not allow such money to stay for more than 15 years, because the money will be recalled to the Bank treasury account as an unclaimed fund. By indicating your interest I will send you the full details on how the business will be executed. Please respond urgently and delete if you are not interested. Best Regards, Ahmed Ouedraogo.
HELLO DEAR.
Dear Friend, I know that this mail will come to you as a surprise, as we have never met before, but you need not worry, as I am contacting you independently of my investigation and no one is informed of this communication. I need your urgent assistance in transferring the sum of $11.3 million immediately to your private account. The money has been here in our Bank lying dormant for years now without anybody coming to claim it. I want to release the money to you as the relative of our deceased customer (the account owner), who died along with his supposed Next of Kin on 16th October 2005. The banking laws here do not allow such money to stay for more than 15 years, because the money will be recalled to the Bank treasury account as an unclaimed fund. By indicating your interest I will send you the full details on how the business will be executed. Please respond urgently and delete if you are not interested. Best Regards, Ahmed Ouedraogo.
I need your cooperation.
With due respect, I am inviting you for a business deal of Eleven Million Three Hundred Thousand United States Dollars, where this money can be shared between us if you agree to my business proposal. By indicating your interest I will send you the full details on how the business will be executed. If you are interested, please send your reply to my private email --- ouedraogoah...@outlook.com
Greetings!!!
Greetings!!! I am Mr Ahmed Hassan. I have a business transaction of ($11.3 million). By indicating your interest I will send you the full details on how the business will be executed. Please respond urgently for more details and delete if you are not interested. Best Regards, Mr Ahmed Hassan +22968776349
Re: [PATCH 2/11] FUSE - core
unsubscribe
NTOP for Redhat
I tried to install the network monitoring system called Ntop on my Red Hat Linux box, and the following messages are what I get each time I execute make. I thought libpcap was what was needed, and I installed it, but it did not help. Can anybody out there help me with this? The following is the output that I am receiving from the build. Thanks. creating config.h config.h is unchanged make all-recursive make[1]: Entering directory `/etc/ntop/ntop-1.3.1' Making all in gdchart0.94b make[2]: Entering directory `/etc/ntop/ntop-1.3.1/gdchart0.94b' cc -Igd1.3 -I. -g -c gdc.c cc -Igd1.3 -I. -g -c gdchart.c cc -g -c price_conv.c cc -Igd1.3 -I. -g -c gdc_pie.c cd gd1.3 ; make -f Makefile libgd.a make[3]: Entering directory `/etc/ntop/ntop-1.3.1/gdchart0.94b/gd1.3' cc -O -c -o gd.o gd.c cc -O -c -o gdfontt.o gdfontt.c cc -O -c -o gdfonts.o gdfonts.c cc -O -c -o gdfontmb.o gdfontmb.c cc -O -c -o gdfontl.o gdfontl.c cc -O -c -o gdfontg.o gdfontg.c rm -f libgd.a ar rc libgd.a gd.o gdfontt.o gdfonts.o gdfontmb.o \ gdfontl.o gdfontg.o make[3]: Leaving directory `/etc/ntop/ntop-1.3.1/gdchart0.94b/gd1.3' make[2]: Leaving directory `/etc/ntop/ntop-1.3.1/gdchart0.94b' Making all in . make[2]: Entering directory `/etc/ntop/ntop-1.3.1' /bin/sh ./libtool --mode=compile gcc -DHAVE_CONFIG_H -I. -I./gdchart0.94b -I/usr/include/pcap -g -O2 -pipe -c admin.c mkdir .libs gcc -DHAVE_CONFIG_H -I. -I./gdchart0.94b -I/usr/include/pcap -g -O2 -pipe -c admin.c -fPIC -DPIC -o .libs/admin.lo In file included from admin.c:23: ntop.h:380: pcap.h: No such file or directory In file included from admin.c:23: ntop.h:465: field `h' has incomplete type ntop.h:567: parse error before `pcap_t' ntop.h:567: warning: no semicolon at end of struct or union ntop.h:572: `filter' redeclared as different kind of symbol /usr/include/ncurses.h:447: previous declaration of `filter' ntop.h:655: parse error before `}' ntop.h:655: warning: data definition has no type or storage class ntop.h:1083: field `fcode' has incomplete type ntop.h:1277: field `h' has incomplete type In file included from ntop.h:1534, from admin.c:23: globals-core.h:38: parse error before `device' globals-core.h:38: warning: data definition has no type or storage class make[2]: *** [admin.lo] Error 1 make[2]: Leaving directory `/etc/ntop/ntop-1.3.1' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/etc/ntop/ntop-1.3.1' make: *** [all-recursive-am] Error 2
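The first real error in that log is "ntop.h:380: pcap.h: No such file or directory"; everything after it is fallout from the missing header. The libpcap runtime may well be installed while the development headers that ntop compiles against are not, or they live in a directory the build is not searching. A quick way to check is to compile a small test program that includes pcap.h; the package name and include path below are assumptions for a typical Red Hat setup, not something stated in the message above.

/* pcap_check.c - build with: cc pcap_check.c -o pcap_check
 * If this fails with the same "pcap.h: No such file or directory",
 * install the libpcap development package (e.g. libpcap-devel on
 * Red Hat) or add -I/usr/include/pcap if the header lives there,
 * just as ntop's own compile line does. */
#include <pcap.h>
#include <stdio.h>

int main(void)
{
        printf("pcap.h found; ntop should get past admin.c\n");
        return 0;
}

Once this test program compiles, re-running ntop's configure and make from a clean tree should let the real build get past admin.c as well.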
[PATCH v2] sparse: Track the boundaries of memory sections for accurate checks
When the sparse memory model is used, an array of memory sections is created to track each block of contiguous physical pages. Each element of this array covers PAGES_PER_SECTION pages. During the creation of this array the actual boundaries of the memory block are lost, so the whole block is considered either present or not. pfn_valid() in the sparse memory configuration checks which memory section the pfn belongs to and then checks whether that section is present or not. This yields sub-optimal results when the available memory doesn't cover a whole memory section, because pfn_valid() will return 'true' even for the unavailable pfns at the boundaries of the memory section. If pfn_valid() returns 'true', this is supposed to mean that this is a valid RAM page controlled by the kernel (there's a 'struct page' backing it), which is not the case if the pfn happens to be unavailable and at the boundary of the memory section. Given the common pattern of using pfn_valid just before accessing the 'struct page' (through pfn_to_page), this can lead to a lot of surprises. For example, this hunk of code in '__ioremap_check_ram': if (pfn_valid(start_pfn + i) && !PageReserved(pfn_to_page(start_pfn + i))) return 1; can return '1' even for a pfn that's not valid! And this other hunk (which is almost the same pattern) in 'kvm_is_reserved_pfn': if (pfn_valid(pfn)) return PageReserved(pfn_to_page(pfn)); can return false for the same reason (which will trigger a BUG_ON at the call-site). Using the 'mem=' kernel parameter has the same effect on pfn_valid(), because even though the memory at the memory section boundary can be RAM, it's not valid since there's no 'struct page' for it. Cc: Andrew Morton Cc: Mel Gorman Cc: Vlastimil Babka Cc: Michal Hocko Cc: Johannes Weiner Cc: Yaowei Bai Cc: Dan Williams Cc: Joe Perches Cc: Tejun Heo Cc: Anthony Liguori Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed Signed-off-by: Jan H. Schönherr --- v2: A little bit more verbose commit message to explain why 'sub-optimal' results can actually cause problems. --- include/linux/mmzone.h | 22 -- mm/sparse.c| 37 - 2 files changed, 52 insertions(+), 7 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 02069c2..f76a0e1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1067,8 +1067,12 @@ struct mem_section { * section. (see page_ext.h about this.) */ struct page_ext *page_ext; - unsigned long pad; + unsigned long pad[3]; #endif + + unsigned long first_pfn; + unsigned long last_pfn; + /* * WARNING: mem_section must be a power-of-2 in size for the * calculation and use of SECTION_ROOT_MASK to make sense.
@@ -1140,23 +1144,29 @@ static inline int valid_section_nr(unsigned long nr) static inline struct mem_section *__pfn_to_section(unsigned long pfn) { + if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) + return NULL; + return __nr_to_section(pfn_to_section_nr(pfn)); } #ifndef CONFIG_HAVE_ARCH_PFN_VALID static inline int pfn_valid(unsigned long pfn) { - if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) + struct mem_section *ms; + + ms = __pfn_to_section(pfn); + + if (ms && !(ms->first_pfn <= pfn && ms->last_pfn >= pfn)) return 0; - return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); + + return valid_section(ms); } #endif static inline int pfn_present(unsigned long pfn) { - if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) - return 0; - return present_section(__nr_to_section(pfn_to_section_nr(pfn))); + return present_section(__pfn_to_section(pfn)); } /* diff --git a/mm/sparse.c b/mm/sparse.c index 5d0cf45..3c91837 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -166,24 +166,59 @@ void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn, } } +static int __init +overlaps(u64 start1, u64 end1, u64 start2, u64 end2) +{ + u64 start, end; + + start = max(start1, start2); + end = min(end1, end2); + return start <= end; +} + /* Record a memory area against a node. */ void __init memory_present(int nid, unsigned long start, unsigned long end) { + unsigned long first_pfn = start; unsigned long pfn; start &= PAGE_SECTION_MASK; mminit_validate_memmodel_limits(&start, &end); for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) { unsigned long section = pfn_to_section_nr(pfn); + unsign
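To make the boundary problem in the commit message above concrete, here is a small stand-alone sketch (plain user-space C, not kernel code) of the arithmetic: with 128 MiB sections, as on x86-64, a 'mem=' limit that does not land on a section boundary leaves a tail of pfns inside the last present section that the unpatched pfn_valid() would still report as valid even though no 'struct page' backs them. The 4000 MiB figure is only an assumed example.

#include <stdio.h>

#define SECTION_SIZE_MB 128UL   /* 2^27 bytes per section on x86-64 */

int main(void)
{
        unsigned long mem_end_mb = 4000;   /* assumed "mem=4000M" boundary */
        unsigned long section_start = (mem_end_mb / SECTION_SIZE_MB) * SECTION_SIZE_MB;
        unsigned long section_end   = section_start + SECTION_SIZE_MB;

        printf("last present section: [%lu MiB, %lu MiB)\n", section_start, section_end);
        printf("tail with no struct page, still 'valid' before this patch: [%lu MiB, %lu MiB)\n",
               mem_end_mb, section_end);
        return 0;
}

Tracking first_pfn/last_pfn per section, as the patch does, is what lets pfn_valid() reject that tail.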
[PATCH] kvm, x86: Properly check whether a pfn is an MMIO or not
The pfn_valid check is not sufficient because it only checks whether a page has a struct page or not; if, for example, "mem=" was passed to the kernel, some valid pages won't have a struct page. This means that if guests were assigned valid memory that lies after the mem= boundary, it would be passed uncached to the guest no matter what the guest caching attributes are for this memory. Use the original e820 map to check whether a certain pfn belongs to RAM or not. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Borislav Petkov Cc: Denys Vlasenko Cc: Andrew Morton Cc: Toshi Kani Cc: Tony Luck Cc: linux-kernel@vger.kernel.org Cc: k...@vger.kernel.org Cc: x...@kernel.org Signed-off-by: KarimAllah Ahmed --- arch/x86/include/asm/e820.h | 1 + arch/x86/kernel/e820.c | 18 ++ arch/x86/kvm/mmu.c | 2 +- 3 files changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h index 3ab0537..2d4f7d8 100644 --- a/arch/x86/include/asm/e820.h +++ b/arch/x86/include/asm/e820.h @@ -16,6 +16,7 @@ extern struct e820map e820_saved; extern unsigned long pci_mem_start; extern int e820_any_mapped(u64 start, u64 end, unsigned type); extern int e820_all_mapped(u64 start, u64 end, unsigned type); +extern bool e820_is_ram(u64 addr); extern void e820_add_region(u64 start, u64 size, int type); extern void e820_print_map(char *who); extern int diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 621b501..387cdba 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -105,6 +105,24 @@ int __init e820_all_mapped(u64 start, u64 end, unsigned type) return 0; } +bool +e820_is_ram(u64 addr) +{ + int i; + + for (i = 0; i < e820_saved.nr_map; i++) { + struct e820entry *ei = &e820_saved.map[i]; + + if (ei->type != E820_RAM) + continue; + if ((addr >= ei->addr) && (addr < (ei->addr + ei->size))) + return true; + } + + return false; +} +EXPORT_SYMBOL_GPL(e820_is_ram); + /* * Add a memory region to the kernel e820 map. */ diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 24e8001..5e07bf5 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2507,7 +2507,7 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) if (pfn_valid(pfn)) return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)); - return true; + return !e820_is_ram(pfn << PAGE_SHIFT); } static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, -- 2.8.2
RE: Congratulations!!!
Congratulations, money was donated to you. Reply to harold-diam...@outlook.com for more info.
OK
Greetings, Please assist me to receive about 15 million euros into your personal account. I will give you details as soon as I hear from you. Regards, Mr Ahmed Zama
Please respond urgently!
Dear Friend, I know that this mail will come to you as a surprise, as we have never met before, but you need not worry, as I am contacting you independently of my investigation and no one is informed of this communication. I need your urgent assistance in transferring the sum of $11.3 million immediately to your private account. The money has been here in our Bank lying dormant for years now without anybody coming to claim it. I want to release the money to you as the relative of our deceased customer (the account owner), who died along with his supposed Next of Kin on 16th October 2005. The banking laws here do not allow such money to stay for more than 13 years, because the money will be recalled to the Bank treasury account as an unclaimed fund. By indicating your interest I will send you the full details on how the business will be executed. Please respond urgently and delete if you are not interested. Best Regards, Mr. Ahmed Hassan.
OK
Greetings, Please assist me to receive about 15 million euros into your personal account. I will give you details as soon as I hear from you. Send me the following: Age, Nationality, Occupation, Telephone Line. Regards, Mr Ahmed Zama
RE: Privileged and Confidential:
RE: Privileged and Confidential: Here I bring a potential business proposal to your doorstep for consideration. I have a client who is interested in investing in your country and would like to engage you and your company on this project. The investment amount is valued at US$500 million. If you are interested, kindly include your direct telephone numbers for a full discussion of this offer when responding to this email. Respectfully, AHMED KARIM Email reply here: mohamedabdul1...@gmail.com
RE: Privileged and Confidential:
Dear, Respectfully, my name is Ahmed Abdul from Syria. Please, I need your urgent assistance to help me and my two daughters relocate out of Syria because of the recent bombing by President Trump and his intentions to bomb more. I need you to help us relocate, including our belongings and funds, for we are good people and would not like to be treated as refugees, and we have the cash to buy a new house, a good school for my kids, and a good business for us to start a new life; millions of dollars are involved. Right now, move us to any Muslim country asap. Please help us. The money is now in cash with a secret security company in India; as you are an Indian, I need your help to talk with the delivering agent in India to deliver it to your home. 40% is for you while the rest is for me. Please keep this mail secret and confidential. Send me your cell phone number and they will contact you from India. Reply here " abdulmohamed66...@gmail.com Yours Sincerely, AHMED ABDUL
hello
-- DEAR FRIEND, I am MR. MUSA AHMED, with a business proposal deal of US$18.5 million to transfer into your account. If you are interested, get back to me for more details at my E-mail (mr.musa.ahme...@gmail.com). Best Regards, MR. MUSA AHMED --
Re: [iptables] extensions: add support for 'srh' match
On Wed, 10 Jan 2018 16:32:24 +0100 Pablo Neira Ayuso wrote: > On Fri, Dec 29, 2017 at 12:08:25PM +0100, Ahmed Abdelsalam wrote: > > This patch adds a new extension to iptables to support the 'srh' match. > > The implementation considers revision 7 of the SRH draft. > > https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07 > > > > Signed-off-by: Ahmed Abdelsalam > > --- > > extensions/libip6t_srh.c| 283 > > > > include/linux/netfilter_ipv6/ip6t_srh.h | 63 +++ > > Please, add an extensions/libip6t_srh.t test file and send a v2. > > Thanks. Ok. Are there any minimum requirements for the test cases to be added to the extensions/libip6t_srh.t file? -- Ahmed
Re: [PATCH v3 0/4] KVM: Expose speculation control feature to guests
On 01/30/2018 10:00 AM, David Woodhouse wrote: On Tue, 2018-01-30 at 01:10 +0100, KarimAllah Ahmed wrote: Add direct access to speculation control MSRs for KVM guests. This allows the guest to protect itself against Spectre V2 using IBRS+IBPB instead of a retpoline+IBPB based approach. It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future Intel processors to indicate RDCL_NO and IBRS_ALL. Thanks. I think you've already fixed the SPEC_CTRL patch in the git tree so that it adds F(IBRS) to kvm_cpuid_8000_0008_ebx_x86_features, right? Yup, this is already fixed in the tree. The SVM part of Ashok's IBPB patch is still exposing the PRED_CMD MSR to guests based on boot_cpu_has(IBPB), not based on the *guest* capabilities. Looking back at Paolo's patch set from January 9th, it was done differently there but I think it had the same behaviour? The rest of Paolo's patch set I think has been covered, except 6/8: lkml.kernel.org/r/20180109120311.27565-7-pbonz...@redhat.com That exposes SPEC_CTRL for SVM too (since AMD now apparently has it). If adding that ends up with duplicate MSR handling for get/set, perhaps that wants shifting up into kvm_[sg]et_msr_common()? Although I don't see offhand where you'd put the ->spec_ctrl field in that case. It doesn't want to live in the generic (even to non-x86) struct kvm_vcpu. So maybe a little bit of duplication is the best answer. Other than those details, I think we're mostly getting close. Do we want to add STIBP on top? There is some complexity there which meant I was happier getting these first bits ready first, before piling that on too. I believe Ashok sent you a change which made us do IBPB on *every* vmexit; I don't think we need that. It's currently done in vcpu_load() which means we'll definitely have done it between running one vCPU and the next, and when vCPUs are pinned we basically never need to do it. We know that VMM (e.g. qemu) userspace could be vulnerable to attacks from guest ring 3, because there is no flush between the vmexit and the host kernel "returning" to the userspace thread. Doing a full IBPB on *every* vmexit would protect from that, but it's overkill. If that's the reason, let's come up with something better. Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
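For readers following the IBPB placement discussion above, here is a minimal user-space sketch of the vcpu_load() approach being described: the barrier is issued only when a different guest's VMCB lands on a physical CPU, not on every VM exit. This mirrors the sd->current_vmcb check in the SVM patch later in this thread; the names and the printf stand-in for the real MSR_IA32_PRED_CMD write are illustrative only, not the actual KVM code.

#include <stdio.h>

/* Stand-in for the real barrier (a write of PRED_CMD_IBPB to MSR_IA32_PRED_CMD). */
static void indirect_branch_prediction_barrier(void)
{
        printf("IBPB issued\n");
}

struct pcpu_data {
        const void *current_vmcb;   /* last VMCB run on this physical CPU */
};

/* Issue the barrier only when this CPU switches to a different guest's VMCB. */
static void vcpu_load(struct pcpu_data *pcpu, const void *vmcb)
{
        if (pcpu->current_vmcb != vmcb) {
                pcpu->current_vmcb = vmcb;
                indirect_branch_prediction_barrier();
        }
}

int main(void)
{
        struct pcpu_data pcpu = { 0 };
        int vmcb_a, vmcb_b;

        vcpu_load(&pcpu, &vmcb_a);   /* barrier: first guest on this CPU     */
        vcpu_load(&pcpu, &vmcb_a);   /* no barrier: same guest (pinned vCPU) */
        vcpu_load(&pcpu, &vmcb_b);   /* barrier: switching to another guest  */
        return 0;
}

With pinned vCPUs the second call is the common case, which is why doing the barrier here is much cheaper than doing it on every VM exit.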
Re: [PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/30/2018 06:49 PM, Jim Mattson wrote: On Mon, Jan 29, 2018 at 4:10 PM, KarimAllah Ahmed wrote: [ Based on a patch from Ashok Raj ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). v3: - Save/restore manually - Fix CPUID handling - Fix a copy & paste error in the name of SPEC_CTRL MSR in disable_intercept. - support !cpu_has_vmx_msr_bitmap() --- arch/x86/kvm/cpuid.c | 7 +-- arch/x86/kvm/vmx.c | 59 arch/x86/kvm/x86.c | 2 +- 3 files changed, 65 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1909635..662d0c0 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | + F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBPB isn't necessarily present in hardware cpuid */ + /* IBRS and IBPB aren't necessarily present in hardware cpuid */ if (boot_cpu_has(X86_FEATURE_IBPB)) entry->ebx |= F(IBPB); + if (boot_cpu_has(X86_FEATURE_IBRS)) + entry->ebx |= F(IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 798a00b..9ac9747 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -582,6 +582,8 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif u64 arch_capabilities; + u64 spec_ctrl; + bool save_spec_ctrl_on_exit; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; @@ -922,6 +924,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); 
@@ -3226,6 +3230,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + msr_info->data = to_vmx(vcpu)->spec_ctrl; + break; case MSR_IA32_ARCH_CAPABILITIES: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) @@ -3339,6 +3350,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
Re: [PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/30/2018 11:49 PM, Jim Mattson wrote: On Tue, Jan 30, 2018 at 1:00 PM, KarimAllah Ahmed wrote: Ooops! I did not think at all about nested :) This should be addressed now, I hope: http://git.infradead.org/linux-retpoline.git/commitdiff/f7f0cbba3e0cffcee050a8a5a9597a162d57e572 + if (cpu_has_vmx_msr_bitmap() && data && + !vmx->save_spec_ctrl_on_exit) { + vmx->save_spec_ctrl_on_exit = true; + + msr_bitmap = is_guest_mode(vcpu) ? vmx->nested.vmcs02.msr_bitmap : + vmx->vmcs01.msr_bitmap; + vmx_disable_intercept_for_msr(msr_bitmap, + MSR_IA32_SPEC_CTRL, + MSR_TYPE_RW); + } There are two ways to get to this point in vmx_set_msr while is_guest_mode(vcpu) is true: 1) L0 is processing vmcs12's VM-entry MSR load list on emulated VM-entry (see enter_vmx_non_root_mode). 2) L2 tried to execute WRMSR, writes to the MSR are intercepted in vmcs02's MSR permission bitmap, and writes to the MSR are not intercepted in vmcs12's MSR permission bitmap. In the first case, disabling the intercepts for the MSR in vmx->nested.vmcs02.msr_bitmap is incorrect, because we haven't yet determined that the intercepts are clear in vmcs12's MSR permission bitmap. In the second case, disabling *both* of the intercepts for the MSR in vmx->nested.vmcs02.msr_bitmap is incorrect, because we don't know that the read intercept is clear in vmcs12's MSR permission bitmap. Furthermore, disabling the write intercept for the MSR in vmx->nested.vmcs02.msr_bitmap is somewhat fruitless, because nested_vmx_merge_msr_bitmap is just going to undo that change on the next emulated VM-entry. Okay, I took a second look at the code (specially nested_vmx_merge_msr_bitmap). This means that I simply should not touch the MSR bitmap in set_msr in case of nested, I just need to properly update the l02 msr_bitmap in nested_vmx_merge_msr_bitmap. As in here: http://git.infradead.org/linux-retpoline.git/commitdiff/d90eedebdd16bb00741a2c93bc13c5e444c99c2b or am I still missing something? (sorry, did not actually look at the nested code before!) Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Re: [PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/31/2018 01:27 AM, Jim Mattson wrote: On Tue, Jan 30, 2018 at 4:19 PM, Paolo Bonzini wrote: The new code in nested_vmx_merge_msr_bitmap should be conditional on vmx->save_spec_ctrl_on_exit. But then if L1 doesn't use MSR_IA32_SPEC_CTRL itself and it uses the VM-entry MSR load list to set up L2's MSR_IA32_SPEC_CTRL, you will never set vmx->save_spec_ctrl_on_exit, and L2's accesses to the MSR will always be intercepted by L0. I can add another variable (actually two) to indicate if msr interception should be disabled or not for SPEC_CTRL and PRED_CMD in nested case. That would allow us to have a fast alternative to guest_cpuid_has in nested_vmx_merge_msr_bitmap and at the same time maintain the current semantics of save_spec_ctrl_on_exit (i.e we would still differentiate between set_msr that is called from the loading MSRs for the emulated vm-entry vs L2 actually writing to it). What do you think? Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
[PATCH v4 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Paolo Bonzini ] ... basically doing exactly what we do for VMX: - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest actually used it. Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/svm.c | 58 ++ 1 file changed, 58 insertions(+) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 89495cf..e1ba4c6 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -184,6 +184,9 @@ struct vcpu_svm { u64 gs_base; } host; + u64 spec_ctrl; + bool save_spec_ctrl_on_exit; + u32 *msrpm; ulong nmi_iret_rip; @@ -1583,6 +1586,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u32 dummy; u32 eax = 1; + svm->spec_ctrl = 0; + if (!init_event) { svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; @@ -3604,6 +3609,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_VM_CR: msr_info->data = svm->nested.vm_cr_msr; break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + msr_info->data = svm->spec_ctrl; + break; case MSR_IA32_UCODE_REV: msr_info->data = 0x0165; break; @@ -3695,6 +3707,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_SPEC_CTRL: + if (!msr->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + svm->spec_ctrl = data; + + /* +* When it's written (to non-zero) for the first time, pass +* it through. This means we don't have to take the perf +* hit of saving it on vmexit for the common case of guests +* that don't use it. +*/ + if (data && !svm->save_spec_ctrl_on_exit) { + svm->save_spec_ctrl_on_exit = true; + if (is_guest_mode(vcpu)) + break; + set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); + } + break; case MSR_IA32_PRED_CMD: if (!msr->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) @@ -4963,6 +4999,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) local_irq_enable(); + /* +* If this vCPU has touched SPEC_CTRL, restore the guest's value if +* it's non-zero. Since vmentry is serialising on affected CPUs, there +* is no need to worry about the conditional branch over the wrmsr +* being speculatively taken. +*/ + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + asm volatile ( "push %%" _ASM_BP "; \n\t" "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" @@ -5055,6 +5100,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); + /* +* We do not use IBRS in the kernel. If this vCPU has used the +* SPEC_CTRL MSR it may have left it on; save the value and +* turn it off. This is much more efficient than blindly adding +* it to the atomic save/restore list. Especially as the former +* (Saving guest MSRs on vmexit) doesn't even exist in KVM. +*/ + if (svm->save_spec_ctrl_on_exit) + rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); -- 2.7.4
[PATCH v4 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Ashok Raj ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v4: - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features - Handling nested guests v3: - Save/restore manually - Fix CPUID handling - Fix a copy & paste error in the name of SPEC_CTRL MSR in disable_intercept. - support !cpu_has_vmx_msr_bitmap() v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). --- arch/x86/kvm/cpuid.c | 9 --- arch/x86/kvm/vmx.c | 68 arch/x86/kvm/x86.c | 2 +- 3 files changed, 75 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1909635..13f5d42 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 0x8008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(IBPB); + F(IBPB) | F(IBRS); /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | + F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBPB isn't necessarily present in hardware cpuid */ + /* IBRS and IBPB aren't necessarily present in hardware cpuid */ if (boot_cpu_has(X86_FEATURE_IBPB)) entry->ebx |= F(IBPB); + if (boot_cpu_has(X86_FEATURE_IBRS)) + entry->ebx |= F(IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 40643b8..9080938 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -593,6 +593,8 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif u64 arch_capabilities; + u64 spec_ctrl; + bool save_spec_ctrl_on_exit; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; @@ -938,6 +940,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static 
void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3238,6 +3242,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + msr_info->data =
[PATCH v4 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
[dwmw2: Stop using KF() for bits in it, too] Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: x...@kernel.org Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Paolo Bonzini Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 8 +++- arch/x86/kvm/cpuid.h | 1 + 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..c0eb337 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void) #define F(x) bit(X86_FEATURE_##x) -/* These are scattered features in cpufeatures.h. */ -#define KVM_CPUID_BIT_AVX512_4VNNIW 2 -#define KVM_CPUID_BIT_AVX512_4FMAPS 3 +/* For scattered features from cpufeatures.h; we currently expose none */ #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) entry->ecx &= ~F(PKU); entry->edx &= kvm_cpuid_7_0_edx_x86_features; - entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX); + cpuid_mask(&entry->edx, CPUID_7_EDX); } else { entry->ebx = 0; entry->ecx = 0; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index c2cea66..9a327d5 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) -- 2.7.4
[PATCH v4 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
Future intel processors will use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Cc: Ashok Raj Reviewed-by: Paolo Bonzini Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx.c | 15 +++ arch/x86/kvm/x86.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 033004d..1909635 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 96e672e..40643b8 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -592,6 +592,8 @@ struct vcpu_vmx { u64 msr_host_kernel_gs_base; u64 msr_guest_kernel_gs_base; #endif + u64 arch_capabilities; + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; u32 secondary_exec_control; @@ -3236,6 +3238,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) + return 1; + msr_info->data = to_vmx(vcpu)->arch_capabilities; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -3362,6 +3370,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_W); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vmx->arch_capabilities = data; + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -5624,6 +5637,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c53298d..4ec142e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, + MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; -- 2.7.4
[PATCH v4 0/5] KVM: Expose speculation control feature to guests
Add direct access to speculation control MSRs for KVM guests. This allows the guest to protect itself against Spectre V2 using IBRS+IBPB instead of a retpoline+IBPB based approach. It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future Intel processors to indicate RDCL_NO and IBRS_ALL. v4: - Add IBRS passthrough for SVM (5/5). - Handle nested guests properly. - expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features Ashok Raj (1): KVM: x86: Add IBPB support KarimAllah Ahmed (4): KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL arch/x86/kvm/cpuid.c | 22 +++--- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/svm.c | 85 ++ arch/x86/kvm/vmx.c | 114 ++- arch/x86/kvm/x86.c | 1 + 5 files changed, 216 insertions(+), 7 deletions(-) Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Arjan van de Ven Cc: Ashok Raj Cc: Asit Mallick Cc: Borislav Petkov Cc: Dan Williams Cc: Dave Hansen Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Janakarajan Natarajan Cc: Joerg Roedel Cc: Jun Nakajima Cc: Laura Abbott Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Tim Chen Cc: Tom Lendacky Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: x...@kernel.org -- 2.7.4
[PATCH v4 2/5] KVM: x86: Add IBPB support
From: Ashok Raj Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor barriers on switching between VMs to avoid inter VM Spectre-v2 attacks. [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD if guest has it in CPUID - svm: only pass through IBPB if guest has it in CPUID - vmx: support !cpu_has_vmx_msr_bitmap()] - vmx: support nested] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/cpuid.c | 11 ++- arch/x86/kvm/svm.c | 27 +++ arch/x86/kvm/vmx.c | 31 ++- 3 files changed, 67 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index c0eb337..033004d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + /* cpuid 0x8008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = + F(IBPB); + /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!g_phys_as) g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); - entry->ebx = entry->edx = 0; + entry->edx = 0; + /* IBPB isn't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entry->ebx |= F(IBPB); + entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; + cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; } case 0x8019: diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f40d0da..89495cf 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -529,6 +529,7 @@ struct svm_cpu_data { struct kvm_ldttss_desc *tss_desc; struct page *save_area; + struct vmcb *current_vmcb; }; static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); @@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); + /* +* The vmcb page can be recycled, causing a false negative in +* svm_vcpu_load(). So do a full IBPB now. 
+*/ + indirect_branch_prediction_barrier(); } static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_svm *svm = to_svm(vcpu); + struct svm_cpu_data *sd = per_cpu(svm_data, cpu); int i; if (unlikely(cpu != vcpu->cpu)) { @@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (static_cpu_has(X86_FEATURE_RDTSCP)) wrmsrl(MSR_TSC_AUX, svm->tsc_aux); + if (sd->current_vmcb != svm->vmcb) { + sd->current_vmcb = svm->vmcb; + indirect_branch_prediction_barrier(); + } avic_vcpu_load(vcpu, cpu); } @@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_PRED_CMD: + if (!msr->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) + return 1; + + if (data & ~PRED_CMD_IBPB) + return 1; + + if (!data) + break; + + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); + if (is_guest_mode(vcpu)) + break; + set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1); + break; case MSR_STAR: svm->vmcb->save.star = data; break; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d46a61b..96e672e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_
Re: [PATCH v4 2/5] KVM: x86: Add IBPB support
On 01/31/2018 05:50 PM, Jim Mattson wrote: On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed wrote: + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, + MSR_TYPE_W); Why not disable this intercept eagerly, rather than lazily? Unlike MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there is no cost to disabling the intercept if the guest cpuid info declares support for it. + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) { + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_PRED_CMD, + MSR_TYPE_R); + } I don't think this should be predicated on "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Paolo suggested this on the previous revision because guest_cpuid_has() would be slow. Also, the final argument to nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than MSR_TYPE_R. Oops! will fix! Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Re: [PATCH v4 2/5] KVM: x86: Add IBPB support
On 01/31/2018 05:55 PM, Paolo Bonzini wrote: On 31/01/2018 11:50, Jim Mattson wrote: + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) { + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_PRED_CMD, + MSR_TYPE_R); + } I don't think this should be predicated on "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than MSR_TYPE_R. In fact this MSR can even be passed down unconditionally, since it needs no save/restore and has no ill performance effect on the sibling hyperthread. Only MSR_IA32_SPEC_CTRL needs to be conditional on "to_vmx(vcpu)->save_spec_ctrl_on_exit". That used to be the case in an earlier version. There seems to be two opinions here: 1) Pass it only if CPUID for the guest has it. 2) Pass it unconditionally. I do not really have a preference. Paolo Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
[PATCH v5 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
[dwmw2: Stop using KF() for bits in it, too] Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: x...@kernel.org Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Paolo Bonzini Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 8 +++- arch/x86/kvm/cpuid.h | 1 + 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..c0eb337 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void) #define F(x) bit(X86_FEATURE_##x) -/* These are scattered features in cpufeatures.h. */ -#define KVM_CPUID_BIT_AVX512_4VNNIW 2 -#define KVM_CPUID_BIT_AVX512_4FMAPS 3 +/* For scattered features from cpufeatures.h; we currently expose none */ #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) entry->ecx &= ~F(PKU); entry->edx &= kvm_cpuid_7_0_edx_x86_features; - entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX); + cpuid_mask(&entry->edx, CPUID_7_EDX); } else { entry->ebx = 0; entry->ecx = 0; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index c2cea66..9a327d5 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) -- 2.7.4
[PATCH v5 0/5] KVM: Expose speculation control feature to guests
Add direct access to speculation control MSRs for KVM guests. This allows the guest to protect itself against Spectre V2 using IBRS+IBPB instead of a retpoline+IBPB based approach. It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future Intel processors to indicate RDCL_NO and IBRS_ALL. v5: - svm: add PRED_CMD and SPEC_CTRL to direct_access_msrs list. - vmx: check also for X86_FEATURE_SPEC_CTRL for msr reads and writes. - vmx: Use MSR_TYPE_W instead of MSR_TYPE_R for the nested IBPB MSR - rewrite commit message for IBPB patch [2/5] (Ashok) v4: - Add IBRS passthrough for SVM (5/5). - Handle nested guests properly. - expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features Ashok Raj (1): KVM: x86: Add IBPB support KarimAllah Ahmed (4): KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL arch/x86/kvm/cpuid.c | 22 +++--- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/svm.c | 87 ++ arch/x86/kvm/vmx.c | 117 +-- arch/x86/kvm/x86.c | 1 + 5 files changed, 218 insertions(+), 10 deletions(-) Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Arjan van de Ven Cc: Ashok Raj Cc: Asit Mallick Cc: Borislav Petkov Cc: Dan Williams Cc: Dave Hansen Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Janakarajan Natarajan Cc: Joerg Roedel Cc: Jun Nakajima Cc: Laura Abbott Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Tim Chen Cc: Tom Lendacky Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: x...@kernel.org -- 2.7.4
[PATCH v5 2/5] KVM: x86: Add IBPB support
From: Ashok Raj The Indirect Branch Predictor Barrier (IBPB) is an indirect branch control mechanism. It keeps earlier branches from influencing later ones. Unlike IBRS and STIBP, IBPB does not define a new mode of operation. It's a command that ensures predicted branch targets aren't used after the barrier. Although IBRS and IBPB are enumerated by the same CPUID enumeration, IBPB is very different. IBPB helps mitigate against three potential attacks: * Mitigate guests from being attacked by other guests. - This is addressed by issing IBPB when we do a guest switch. * Mitigate attacks from guest/ring3->host/ring3. These would require a IBPB during context switch in host, or after VMEXIT. The host process has two ways to mitigate - Either it can be compiled with retpoline - If its going through context switch, and has set !dumpable then there is a IBPB in that path. (Tim's patch: https://patchwork.kernel.org/patch/10192871) - The case where after a VMEXIT you return back to Qemu might make Qemu attackable from guest when Qemu isn't compiled with retpoline. There are issues reported when doing IBPB on every VMEXIT that resulted in some tsc calibration woes in guest. * Mitigate guest/ring0->host/ring0 attacks. When host kernel is using retpoline it is safe against these attacks. If host kernel isn't using retpoline we might need to do a IBPB flush on every VMEXIT. Even when using retpoline for indirect calls, in certain conditions 'ret' can use the BTB on Skylake-era CPUs. There are other mitigations available like RSB stuffing/clearing. * IBPB is issued only for SVM during svm_free_vcpu(). VMX has a vmclear and SVM doesn't. Follow discussion here: https://lkml.org/lkml/2018/1/15/146 Please refer to the following spec for more details on the enumeration and control. Refer here to get documentation about mitigations. https://software.intel.com/en-us/side-channel-security-support [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD if guest has it in CPUID - svm: only pass through IBPB if guest has it in CPUID - vmx: support !cpu_has_vmx_msr_bitmap()] - vmx: support nested] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed v5: - Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR. - Always merge the bitmaps unconditionally. - Add PRED_CMD to direct_access_msrs. 
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes - rewrite the commit message (from ashok.raj@) --- arch/x86/kvm/cpuid.c | 11 ++- arch/x86/kvm/svm.c | 28 arch/x86/kvm/vmx.c | 29 + 3 files changed, 63 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index c0eb337..033004d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + /* cpuid 0x8008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = + F(IBPB); + /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!g_phys_as) g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); - entry->ebx = entry->edx = 0; + entry->edx = 0; + /* IBPB isn't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entry->ebx |= F(IBPB); + entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; + cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; } case 0x8019: diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f40d0da..bfbb7b9 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -250,6 +250,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_SYSCALL_MASK,.always = true }, #endif { .index = MSR_IA32_LASTBRANCHFROMIP,
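The guest-to-guest case listed first above shows up in the VMX half of this patch as a barrier whenever a physical CPU switches to a different VMCS; roughly (this hunk is quoted in full in the review further down; indirect_branch_prediction_barrier() is the helper that issues the PRED_CMD/IBPB write on CPUs that support it):

	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
		vmcs_load(vmx->loaded_vmcs->vmcs);
		/* New VMCS on this CPU: flush predicted branch targets so the
		 * previously running guest cannot steer this one's indirect
		 * branches. */
		indirect_branch_prediction_barrier();
	}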
[PATCH v5 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
Future intel processors will use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Cc: Ashok Raj Reviewed-by: Paolo Bonzini Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx.c | 15 +++ arch/x86/kvm/x86.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 033004d..1909635 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 2e4e8af..a0b2bd1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -592,6 +592,8 @@ struct vcpu_vmx { u64 msr_host_kernel_gs_base; u64 msr_guest_kernel_gs_base; #endif + u64 arch_capabilities; + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; u32 secondary_exec_control; @@ -3236,6 +3238,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) + return 1; + msr_info->data = to_vmx(vcpu)->arch_capabilities; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -3363,6 +3371,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_W); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vmx->arch_capabilities = data; + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -5625,6 +5638,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c53298d..4ec142e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, + MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; -- 2.7.4
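Since the commit message notes that user-space can still override the value, here is a minimal sketch of what that looks like from the VMM side (vcpu_fd is a hypothetical, already-created vCPU file descriptor; 0x10a is MSR_IA32_ARCH_CAPABILITIES; going through the ioctl is what makes the write host_initiated above):

	#include <linux/kvm.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>

	static int clear_arch_capabilities(int vcpu_fd)
	{
		struct kvm_msrs *msrs;
		int ret;

		msrs = calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
		if (!msrs)
			return -1;

		msrs->nmsrs = 1;
		msrs->entries[0].index = 0x10a;	/* MSR_IA32_ARCH_CAPABILITIES */
		msrs->entries[0].data = 0;	/* hide RDCL_NO / IBRS_ALL from this guest */

		ret = ioctl(vcpu_fd, KVM_SET_MSRS, msrs);
		free(msrs);
		return ret;
	}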
[PATCH v5 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Paolo Bonzini ] ... basically doing exactly what we do for VMX: - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest actually used it. Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v5: - Add SPEC_CTRL to direct_access_msrs. --- arch/x86/kvm/svm.c | 59 ++ 1 file changed, 59 insertions(+) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index bfbb7b9..0016a8a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -184,6 +184,9 @@ struct vcpu_svm { u64 gs_base; } host; + u64 spec_ctrl; + bool save_spec_ctrl_on_exit; + u32 *msrpm; ulong nmi_iret_rip; @@ -250,6 +253,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_SYSCALL_MASK,.always = true }, #endif { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, + { .index = MSR_IA32_SPEC_CTRL, .always = false }, { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, { .index = MSR_IA32_LASTINTFROMIP, .always = false }, @@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u32 dummy; u32 eax = 1; + svm->spec_ctrl = 0; + if (!init_event) { svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; @@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_VM_CR: msr_info->data = svm->nested.vm_cr_msr; break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + msr_info->data = svm->spec_ctrl; + break; case MSR_IA32_UCODE_REV: msr_info->data = 0x0165; break; @@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_SPEC_CTRL: + if (!msr->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + svm->spec_ctrl = data; + + /* +* When it's written (to non-zero) for the first time, pass +* it through. This means we don't have to take the perf +* hit of saving it on vmexit for the common case of guests +* that don't use it. +*/ + if (data && !svm->save_spec_ctrl_on_exit) { + svm->save_spec_ctrl_on_exit = true; + if (is_guest_mode(vcpu)) + break; + set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); + } + break; case MSR_IA32_PRED_CMD: if (!msr->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) @@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) local_irq_enable(); + /* +* If this vCPU has touched SPEC_CTRL, restore the guest's value if +* it's non-zero. Since vmentry is serialising on affected CPUs, there +* is no need to worry about the conditional branch over the wrmsr +* being speculatively taken. +*/ + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + asm volatile ( "push %%" _ASM_BP "; \n\t" "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" @@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); + /* +* We do not use IBRS in the kernel. 
If this vCPU has used the +* SPEC_CTRL MSR it may have left it on; save the value and +* turn it off. This is much more efficient than blindly adding +* it to the atomic save/restore list. Especiall
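The comment is truncated here, but the exit side it describes is roughly the mirror image of the entry side shown above (a sketch using the same fields as this patch; only vCPUs that have ever written SPEC_CTRL pay for the extra rdmsr on vmexit):

	if (svm->save_spec_ctrl_on_exit)
		rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);

	/* Don't keep running the host with the guest's IBRS setting. */
	if (svm->spec_ctrl)
		wrmsrl(MSR_IA32_SPEC_CTRL, 0);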
[PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Ashok Raj ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v5: - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes v4: - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features - Handling nested guests v3: - Save/restore manually - Fix CPUID handling - Fix a copy & paste error in the name of SPEC_CTRL MSR in disable_intercept. - support !cpu_has_vmx_msr_bitmap() v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). --- arch/x86/kvm/cpuid.c | 9 --- arch/x86/kvm/vmx.c | 73 arch/x86/kvm/x86.c | 2 +- 3 files changed, 80 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1909635..13f5d42 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 0x8008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(IBPB); + F(IBPB) | F(IBRS); /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | + F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBPB isn't necessarily present in hardware cpuid */ + /* IBRS and IBPB aren't necessarily present in hardware cpuid */ if (boot_cpu_has(X86_FEATURE_IBPB)) entry->ebx |= F(IBPB); + if (boot_cpu_has(X86_FEATURE_IBRS)) + entry->ebx |= F(IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a0b2bd1..4ee93cb 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -593,6 +593,8 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif u64 arch_capabilities; + u64 spec_ctrl; + bool save_spec_ctrl_on_exit; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; @@ -938,6 +940,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool 
nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3238,6 +3242,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu,
Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/31/2018 08:53 PM, Jim Mattson wrote: On Wed, Jan 31, 2018 at 11:37 AM, KarimAllah Ahmed wrote: + + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) { + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_SPEC_CTRL, + MSR_TYPE_R | MSR_TYPE_W); + } + As this is written, L2 will never get direct access to this MSR until after L1 writes it. What if L1 never writes it? The condition should really be something that captures, "if L0 is willing to yield this MSR to the guest..." but save_spec_ctrl_on_exit is also set for L2 write. So once L2 writes to it, this condition will be true and then the bitmap will be updated.
Re: [PATCH v5 2/5] KVM: x86: Add IBPB support
On 01/31/2018 09:28 PM, Konrad Rzeszutek Wilk wrote: diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d46a61b..2e4e8af 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) { per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; vmcs_load(vmx->loaded_vmcs->vmcs); + indirect_branch_prediction_barrier(); } if (!already_loaded) { @@ -3342,6 +3343,26 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; + case MSR_IA32_PRED_CMD: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) && + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) + return 1; + + if (data & ~PRED_CMD_IBPB) + return 1; + + if (!data) + break; + + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); + + if (is_guest_mode(vcpu)) + break; Don't you want this the other way around? That is first do the disable_intercept and then add the 'if (is_guest_mode(vcpu))' ? Otherwise the very first MSR write from the guest is going to hit condition above and never end up executing the disabling of the intercept? is_guest_mode is checking if this is an L2 guest. I *should not* do disable_intercept on the L1 guest bitmap if it is an L2 guest that is why this check happens before disable_intercept. For the short-circuited L2 path, nested_vmx_merge_msr_bitmap will properly update the L02 MSR bitmap and use it. So the checks are fine AFAICT. + + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, + MSR_TYPE_W); + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
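Pulled out of the switch statement, the ordering being defended here looks like this (a sketch only; the function name is made up for illustration, while the field and helper names are the ones from the quoted hunk):

	static void pred_cmd_write(struct kvm_vcpu *vcpu, u64 data)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		if (!data)
			return;

		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

		/* L2 wrote it: the L02 bitmap is rebuilt by
		 * nested_vmx_merge_msr_bitmap(), so leave vmcs01 alone. */
		if (is_guest_mode(vcpu))
			return;

		/* L1 wrote it: stop intercepting its writes from now on. */
		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
					      MSR_IA32_PRED_CMD, MSR_TYPE_W);
	}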
Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/31/2018 09:18 PM, Jim Mattson wrote: On Wed, Jan 31, 2018 at 12:01 PM, KarimAllah Ahmed wrote: but save_spec_ctrl_on_exit is also set for L2 write. So once L2 writes to it, this condition will be true and then the bitmap will be updated. So if L1 or any L2 writes to the MSR, then save_spec_ctrl_on_exit is set to true, even if the MSR permission bitmap for a particular VMCS *doesn't* allow the MSR to be written without an intercept. That's functionally correct, but inefficient. It seems to me that save_spec_ctrl_on_exit should indicate whether or not the *current* MSR permission bitmap allows unintercepted writes to IA32_SPEC_CTRL. To that end, perhaps save_spec_ctrl_on_exit rightfully belongs in the loaded_vmcs structure, alongside the msr_bitmap pointer that it is associated with. For vmcs02, nested_vmx_merge_msr_bitmap() should set the vmcs02 save_spec_ctrl_on_exit based on (a) whether L0 is willing to yield the MSR to L1, and (b) whether L1 is willing to yield the MSR to L2. I actually got rid of this save_spec_ctrl_on_exit variable and replaced it with another variable like the one suggested for IBPB. Just to avoid doing an expensive guest_cpuid_has. Now I peek into the MSR bitmap instead to figure out if this MSR was supposed to be intercepted or not. This test should provide similar semantics to save_spec_ctrl_on_exit. Anyway, cleaning up/testing now and will post a new version.
Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/31/2018 11:52 PM, KarimAllah Ahmed wrote: On 01/31/2018 09:18 PM, Jim Mattson wrote: On Wed, Jan 31, 2018 at 12:01 PM, KarimAllah Ahmed wrote: but save_spec_ctrl_on_exit is also set for L2 write. So once L2 writes to it, this condition will be true and then the bitmap will be updated. So if L1 or any L2 writes to the MSR, then save_spec_ctrl_on_exit is set to true, even if the MSR permission bitmap for a particular VMCS *doesn't* allow the MSR to be written without an intercept. That's functionally correct, but inefficient. It seems to me that save_spec_ctrl_on_exit should indicate whether or not the *current* MSR permission bitmap allows unintercepted writes to IA32_SPEC_CTRL. To that end, perhaps save_spec_ctrl_on_exit rightfully belongs in the loaded_vmcs structure, alongside the msr_bitmap pointer that it is associated with. For vmcs02, nested_vmx_merge_msr_bitmap() should set the vmcs02 save_spec_ctrl_on_exit based on (a) whether L0 is willing to yield the MSR to L1, and (b) whether L1 is willing to yield the MSR to L2. I actually got rid of this save_spec_ctrl_on_exit variable and replaced it with another variable like the one suggested for IBPB. Just to avoid doing an expensive guest_cpuid_has. Now I peak instead in the MSR bitmap to figure out if this MSR was supposed to be intercepted or not. This test should provide a similar semantics to save_spec_ctrl_on_exit. Anyway, cleaning up/testing now and will post a new version. I think this patch should address all your concerns. Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B >From 9c19a8ac3f021efba6f70ad7e28f7ad06bb97e43 Mon Sep 17 00:00:00 2001 From: KarimAllah Ahmed Date: Mon, 29 Jan 2018 19:58:10 + Subject: [PATCH] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL [ Based on a patch from Ashok Raj ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v6: - got rid of save_spec_ctrl_on_exit - introduce spec_ctrl_intercepted - introduce spec_ctrl_used v5: - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes v4: - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features - Handling nested guests v3: - Save/restore manually - Fix CPUID handling - Fix a copy & paste error in the name of SPEC_CTRL MSR in disable_intercept. - support !cpu_has_vmx_msr_bitmap() v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). 
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). --- arch/x86/kvm/cpuid.c | 9 +++-- arch/x86/kvm/vmx.c | 94 +++- arch/x86/kvm/x86.c | 2 +- 3 files changed, 100 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1909635..13f5d42 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 0x8008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(IBPB); + F(IBPB) | F(IBRS); /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | + F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as <<
Re: [PATCH v5 2/5] KVM: x86: Add IBPB support
On 01/31/2018 08:55 PM, Jim Mattson wrote: On Wed, Jan 31, 2018 at 11:53 AM, David Woodhouse wrote: Rather than doing the expensive guest_cpu_has() every time (which is worse now as we realised we need two of them) perhaps we should introduce a local flag for that too? That sounds good to me. Done. Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B >From d51391ae3667f85cd1d6160e83c1d6c28b47b7d8 Mon Sep 17 00:00:00 2001 From: Ashok Raj Date: Thu, 11 Jan 2018 17:32:19 -0800 Subject: [PATCH] KVM: x86: Add IBPB support The Indirect Branch Predictor Barrier (IBPB) is an indirect branch control mechanism. It keeps earlier branches from influencing later ones. Unlike IBRS and STIBP, IBPB does not define a new mode of operation. It's a command that ensures predicted branch targets aren't used after the barrier. Although IBRS and IBPB are enumerated by the same CPUID enumeration, IBPB is very different. IBPB helps mitigate against three potential attacks: * Mitigate guests from being attacked by other guests. - This is addressed by issing IBPB when we do a guest switch. * Mitigate attacks from guest/ring3->host/ring3. These would require a IBPB during context switch in host, or after VMEXIT. The host process has two ways to mitigate - Either it can be compiled with retpoline - If its going through context switch, and has set !dumpable then there is a IBPB in that path. (Tim's patch: https://patchwork.kernel.org/patch/10192871) - The case where after a VMEXIT you return back to Qemu might make Qemu attackable from guest when Qemu isn't compiled with retpoline. There are issues reported when doing IBPB on every VMEXIT that resulted in some tsc calibration woes in guest. * Mitigate guest/ring0->host/ring0 attacks. When host kernel is using retpoline it is safe against these attacks. If host kernel isn't using retpoline we might need to do a IBPB flush on every VMEXIT. Even when using retpoline for indirect calls, in certain conditions 'ret' can use the BTB on Skylake-era CPUs. There are other mitigations available like RSB stuffing/clearing. * IBPB is issued only for SVM during svm_free_vcpu(). VMX has a vmclear and SVM doesn't. Follow discussion here: https://lkml.org/lkml/2018/1/15/146 Please refer to the following spec for more details on the enumeration and control. Refer here to get documentation about mitigations. https://software.intel.com/en-us/side-channel-security-support [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD if guest has it in CPUID - svm: only pass through IBPB if guest has it in CPUID - vmx: support !cpu_has_vmx_msr_bitmap()] - vmx: support nested] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- v6: - introduce pred_cmd_used v5: - Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR. - Always merge the bitmaps unconditionally. - Add PRED_CMD to direct_access_msrs. 
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes - rewrite the commit message (from ashok.raj@) --- arch/x86/kvm/cpuid.c | 11 ++- arch/x86/kvm/svm.c | 28 arch/x86/kvm/vmx.c | 42 -- 3 files changed, 78 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index c0eb337..033004d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + /* cpuid 0x8008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = + F(IBPB); + /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!g_phys_as) g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); - entry->ebx = entry->edx = 0; + entry->edx = 0; + /* IBPB isn't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entr
Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 02/01/2018 03:19 PM, Konrad Rzeszutek Wilk wrote: .snip.. +/* Is SPEC_CTRL intercepted for the currently running vCPU? */ +static bool spec_ctrl_intercepted(struct kvm_vcpu *vcpu) +{ + unsigned long *msr_bitmap; + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return true; + + msr_bitmap = is_guest_mode(vcpu) ? + to_vmx(vcpu)->nested.vmcs02.msr_bitmap : + to_vmx(vcpu)->vmcs01.msr_bitmap; + + return !!test_bit(MSR_IA32_SPEC_CTRL, msr_bitmap + 0x800 / f); +} + ..snip.. @@ -3359,6 +3393,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) && + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) + return 1; + + vmx->spec_ctrl_used = true; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + vmx->spec_ctrl = data; + + /* +* When it's written (to non-zero) for the first time, pass +* it through. This means we don't have to take the perf .. But only if it is a nested guest (as you have && is_guest_mode). Do you want to update the comment a bit? +* hit of saving it on vmexit for the common case of guests +* that don't use it. +*/ + if (cpu_has_vmx_msr_bitmap() && data && + spec_ctrl_intercepted(vcpu) && + is_guest_mode(vcpu)) ^^ <=== here Would it be perhaps also good to mention the complexity of how we ought to be handling L1 and L2 guests in the commit? We are all stressed and I am sure some of us haven't gotten much sleep - but it can help in say three months when some unluckly new soul is trying to understand this and gets utterly confused. Yup, I will go through the patches and add as much details as possible. And yes, the is_guest_mode(vcpu) here is inverted :D I blame the late night :) + vmx_disable_intercept_for_msr( + vmx->vmcs01.msr_bitmap, + MSR_IA32_SPEC_CTRL, + MSR_TYPE_RW); + break; case MSR_IA32_PRED_CMD: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) && Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
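As a concrete example of what the quoted helper is testing: the write bitmap for MSRs 0x00000000-0x00001fff occupies bytes 0x800-0xbff of the 4K VMX MSR bitmap, one bit per MSR, so the write-intercept bit for SPEC_CTRL (MSR 0x48) sits at byte 0x800 + 0x48/8 = 0x809, bit 0. Restated against the vmcs01 bitmap only (a sketch, not part of the patch):

	static bool spec_ctrl_write_intercepted_l01(struct vcpu_vmx *vmx)
	{
		unsigned long *bitmap = vmx->vmcs01.msr_bitmap;

		return test_bit(MSR_IA32_SPEC_CTRL,
				bitmap + 0x800 / sizeof(unsigned long));
	}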
Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 02/01/2018 02:25 PM, David Woodhouse wrote: On Wed, 2018-01-31 at 23:26 -0500, Konrad Rzeszutek Wilk wrote: diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6a9f4ec..bfc80ff 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -594,6 +594,14 @@ struct vcpu_vmx { #endif u64 arch_capabilities; + u64 spec_ctrl; + + /* + * This indicates that: + * 1) guest_cpuid_has(X86_FEATURE_IBRS) == true && + * 2) The guest has actually initiated a write against the MSR. + */ + bool spec_ctrl_used; /* * This indicates that: Thanks for persisting with the details here, Karim. In addition to Konrad's heckling at the comments, I'll add my own request to his... I'd like the comment for spec_ctrl_used to explain why it isn't entirely redundant with the spec_ctrl_intercepted() function. Without nesting, I believe it *would* be redundant, but the difference comes when an L2 is running for which L1 has not permitted the MSR to be passed through. That's when we have spec_ctrl_used = true but the MSR *isn't* actually passed through in the active msr_bitmap. Question: if spec_ctrl_used is always equivalent to the intercept bit in the vmcs01.msr_bitmap, just not the guest bitmap... should we ditch it and always use the bit from the vmcs01.msr_bitmap? If I used the vmcs01.msr_bitmap, spec_ctrl_used will always be true if L0 passed it to L1. Even if L1 did not actually pass it to L2 and even if L2 has not written to it yet (!used). This pretty much renders the short-circuit at nested_vmx_merge_msr_bitmap useless: if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && !to_vmx(vcpu)->pred_cmd_used && !to_vmx(vcpu)->spec_ctrl_used) return false; ... and the default path will be kvm_vcpu_gpa_to_page + kmap. That being said, I have to admit the logic for spec_ctrl_used is not perfect either. If L1 or any of the L2s touched the MSR, spec_ctrl_used will be set to true. So if one L2 used the MSR, all other L2s will also skip the short- circuit mentioned above and end up *always* going through kvm_vcpu_gpa_to_page + kmap. Maybe all of this is over-thinking and in reality the short-circuit above is really useless and all L2 guests are happily using x2apic :) Sorry :) Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
On 02/01/2018 06:37 PM, KarimAllah Ahmed wrote: On 02/01/2018 02:25 PM, David Woodhouse wrote: On Wed, 2018-01-31 at 23:26 -0500, Konrad Rzeszutek Wilk wrote: diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6a9f4ec..bfc80ff 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -594,6 +594,14 @@ struct vcpu_vmx { #endif u64 arch_capabilities; + u64 spec_ctrl; + + /* + * This indicates that: + * 1) guest_cpuid_has(X86_FEATURE_IBRS) == true && + * 2) The guest has actually initiated a write against the MSR. + */ + bool spec_ctrl_used; /* * This indicates that: Thanks for persisting with the details here, Karim. In addition to Konrad's heckling at the comments, I'll add my own request to his... I'd like the comment for spec_ctrl_used to explain why it isn't entirely redundant with the spec_ctrl_intercepted() function. Without nesting, I believe it *would* be redundant, but the difference comes when an L2 is running for which L1 has not permitted the MSR to be passed through. That's when we have spec_ctrl_used = true but the MSR *isn't* actually passed through in the active msr_bitmap. Question: if spec_ctrl_used is always equivalent to the intercept bit in the vmcs01.msr_bitmap, just not the guest bitmap... should we ditch it and always use the bit from the vmcs01.msr_bitmap? If I used the vmcs01.msr_bitmap, spec_ctrl_used will always be true if L0 passed it to L1. Even if L1 did not actually pass it to L2 and even if L2 has not written to it yet (!used). This pretty much renders the short-circuit at nested_vmx_merge_msr_bitmap useless: if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && !to_vmx(vcpu)->pred_cmd_used && !to_vmx(vcpu)->spec_ctrl_used) return false; ... and the default path will be kvm_vcpu_gpa_to_page + kmap. That being said, I have to admit the logic for spec_ctrl_used is not perfect either. If L1 or any of the L2s touched the MSR, spec_ctrl_used will be set to true. So if one L2 used the MSR, all other L2s will also skip the short- circuit mentioned above and end up *always* going through kvm_vcpu_gpa_to_page + kmap. Maybe all of this is over-thinking and in reality the short-circuit above is really useless and all L2 guests are happily using x2apic :) hehe .. >> if spec_ctrl_used is always equivalent to the intercept bit in the vmcs01.msr_bitmap actually yes, we can. I just forgot that we update the msr bitmap lazily! :) Sorry :) Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
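Where this lands in v6, roughly: the short-circuit stops consulting the *_used flags and asks the L01 bitmap directly, via the msr_write_intercepted_l01() helper the next revision introduces (a sketch of the condition, not the exact patch text):

	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
	    msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD) &&
	    msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL))
		/* Nothing to merge: skip the kvm_vcpu_gpa_to_page + kmap path. */
		return false;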
[PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Ashok Raj ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only start saving and restoring when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v6: - got rid of save_spec_ctrl_on_exit - introduce msr_write_intercepted v5: - Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes v4: - Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features - Handling nested guests v3: - Save/restore manually - Fix CPUID handling - Fix a copy & paste error in the name of SPEC_CTRL MSR in disable_intercept. - support !cpu_has_vmx_msr_bitmap() v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). --- arch/x86/kvm/cpuid.c | 9 +++-- arch/x86/kvm/vmx.c | 105 ++- arch/x86/kvm/x86.c | 2 +- 3 files changed, 110 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1909635..13f5d42 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 0x8008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(IBPB); + F(IBPB) | F(IBRS); /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | + F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBPB isn't necessarily present in hardware cpuid */ + /* IBRS and IBPB aren't necessarily present in hardware cpuid */ if (boot_cpu_has(X86_FEATURE_IBPB)) entry->ebx |= F(IBPB); + if (boot_cpu_has(X86_FEATURE_IBRS)) + entry->ebx |= F(IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b13314a..5d8a6a91 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -594,6 +594,7 @@ struct vcpu_vmx { #endif u64 arch_capabilities; + u64 spec_ctrl; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; @@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu) } /* + * Check 
if MSR is intercepted for currently loaded MSR bitmap. + */ +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr) +{ + unsigned long *msr_bitmap; + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return true; + + msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap; + + if (msr <= 0x1fff) { + return !!test_bit(msr, msr_bitmap + 0x800 / f); + } else if ((msr >= 0xc000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + return !!test_bit(msr, msr_bitmap + 0xc00 / f); + } + + return true; +} + +/* * Check if MSR is intercepted for L01 MSR bitmap. */ static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr) @@ -3264,6 +3288,14 @@ static int vmx_get_msr(struct kvm
[PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Cc: Ashok Raj Reviewed-by: Paolo Bonzini Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx.c | 15 +++ arch/x86/kvm/x86.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 033004d..1909635 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 263eb1f..b13314a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -593,6 +593,8 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif + u64 arch_capabilities; + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; u32 secondary_exec_control; @@ -3262,6 +3264,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) + return 1; + msr_info->data = to_vmx(vcpu)->arch_capabilities; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -3397,6 +3405,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_W); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vmx->arch_capabilities = data; + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -5659,6 +5672,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c53298d..4ec142e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, + MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; -- 2.7.4
[PATCH v6 2/5] KVM: x86: Add IBPB support
From: Ashok Raj The Indirect Branch Predictor Barrier (IBPB) is an indirect branch control mechanism. It keeps earlier branches from influencing later ones. Unlike IBRS and STIBP, IBPB does not define a new mode of operation. It's a command that ensures predicted branch targets aren't used after the barrier. Although IBRS and IBPB are enumerated by the same CPUID enumeration, IBPB is very different. IBPB helps mitigate against three potential attacks: * Mitigate guests from being attacked by other guests. - This is addressed by issing IBPB when we do a guest switch. * Mitigate attacks from guest/ring3->host/ring3. These would require a IBPB during context switch in host, or after VMEXIT. The host process has two ways to mitigate - Either it can be compiled with retpoline - If its going through context switch, and has set !dumpable then there is a IBPB in that path. (Tim's patch: https://patchwork.kernel.org/patch/10192871) - The case where after a VMEXIT you return back to Qemu might make Qemu attackable from guest when Qemu isn't compiled with retpoline. There are issues reported when doing IBPB on every VMEXIT that resulted in some tsc calibration woes in guest. * Mitigate guest/ring0->host/ring0 attacks. When host kernel is using retpoline it is safe against these attacks. If host kernel isn't using retpoline we might need to do a IBPB flush on every VMEXIT. Even when using retpoline for indirect calls, in certain conditions 'ret' can use the BTB on Skylake-era CPUs. There are other mitigations available like RSB stuffing/clearing. * IBPB is issued only for SVM during svm_free_vcpu(). VMX has a vmclear and SVM doesn't. Follow discussion here: https://lkml.org/lkml/2018/1/15/146 Please refer to the following spec for more details on the enumeration and control. Refer here to get documentation about mitigations. https://software.intel.com/en-us/side-channel-security-support [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD if guest has it in CPUID - svm: only pass through IBPB if guest has it in CPUID - vmx: support !cpu_has_vmx_msr_bitmap()] - vmx: support nested] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- v6: - introduce msr_write_intercepted_l01 v5: - Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR. - Always merge the bitmaps unconditionally. - Add PRED_CMD to direct_access_msrs. 
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes - rewrite the commit message (from ashok.raj@) --- arch/x86/kvm/cpuid.c | 11 +++- arch/x86/kvm/svm.c | 28 ++ arch/x86/kvm/vmx.c | 80 ++-- 3 files changed, 116 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index c0eb337..033004d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + /* cpuid 0x8008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = + F(IBPB); + /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!g_phys_as) g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); - entry->ebx = entry->edx = 0; + entry->edx = 0; + /* IBPB isn't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entry->ebx |= F(IBPB); + entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; + cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; } case 0x8019: diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f40d0da..254eefb 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_CSTAR, .always = true
[PATCH v6 0/5] KVM: Expose speculation control feature to guests
Add direct access to speculation control MSRs for KVM guests. This allows the guest to protect itself against Spectre V2 using IBRS+IBPB instead of a retpoline+IBPB based approach. It also exposes the ARCH_CAPABILITIES MSR which is used by Intel processors to indicate RDCL_NO and IBRS_ALL. Keep in mind that the SVM part of the patch is unchanged this time. Mostly to get feedback/confirmation about the nested handling for VMX first, once this is done I will update SVM as well. v6: - Do not penalize (save/restore IBRS) all L2 guests when anyone of them starts using the SPEC_CTRL. v5: - svm: add PRED_CMD and SPEC_CTRL to direct_access_msrs list. - vmx: check also for X86_FEATURE_SPEC_CTRL for msr reads and writes. - vmx: Use MSR_TYPE_W instead of MSR_TYPE_R for the nested IBPB MSR - rewrite commit message for IBPB patch [2/5] (Ashok) v4: - Add IBRS passthrough for SVM (5/5). - Handle nested guests properly. - expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features Ashok Raj (1): KVM: x86: Add IBPB support KarimAllah Ahmed (4): KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL arch/x86/kvm/cpuid.c | 22 -- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/svm.c | 87 +++ arch/x86/kvm/vmx.c | 196 ++- arch/x86/kvm/x86.c | 1 + 5 files changed, 299 insertions(+), 8 deletions(-) Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Arjan van de Ven Cc: Ashok Raj Cc: Asit Mallick Cc: Borislav Petkov Cc: Dan Williams Cc: Dave Hansen Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Janakarajan Natarajan Cc: Joerg Roedel Cc: Jun Nakajima Cc: Laura Abbott Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Tim Chen Cc: Tom Lendacky Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: x...@kernel.org -- 2.7.4
[PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Paolo Bonzini ] ... basically doing exactly what we do for VMX: - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest actually used it. Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v5: - Add SPEC_CTRL to direct_access_msrs. --- arch/x86/kvm/svm.c | 59 ++ 1 file changed, 59 insertions(+) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 254eefb..c6ab343 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -184,6 +184,9 @@ struct vcpu_svm { u64 gs_base; } host; + u64 spec_ctrl; + bool save_spec_ctrl_on_exit; + u32 *msrpm; ulong nmi_iret_rip; @@ -249,6 +252,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_CSTAR, .always = true }, { .index = MSR_SYSCALL_MASK,.always = true }, #endif + { .index = MSR_IA32_SPEC_CTRL, .always = false }, { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, @@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u32 dummy; u32 eax = 1; + svm->spec_ctrl = 0; + if (!init_event) { svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; @@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_VM_CR: msr_info->data = svm->nested.vm_cr_msr; break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + msr_info->data = svm->spec_ctrl; + break; case MSR_IA32_UCODE_REV: msr_info->data = 0x0165; break; @@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_SPEC_CTRL: + if (!msr->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + svm->spec_ctrl = data; + + /* +* When it's written (to non-zero) for the first time, pass +* it through. This means we don't have to take the perf +* hit of saving it on vmexit for the common case of guests +* that don't use it. +*/ + if (data && !svm->save_spec_ctrl_on_exit) { + svm->save_spec_ctrl_on_exit = true; + if (is_guest_mode(vcpu)) + break; + set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); + } + break; case MSR_IA32_PRED_CMD: if (!msr->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) @@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) local_irq_enable(); + /* +* If this vCPU has touched SPEC_CTRL, restore the guest's value if +* it's non-zero. Since vmentry is serialising on affected CPUs, there +* is no need to worry about the conditional branch over the wrmsr +* being speculatively taken. +*/ + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + asm volatile ( "push %%" _ASM_BP "; \n\t" "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" @@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); + /* +* We do not use IBRS in the kernel. 
If this vCPU has used the +* SPEC_CTRL MSR it may have left it on; save the value and +* turn it off. This is much more efficient than blindly adding +* it to the atomic save/restore list. Especiall
[PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
[dwmw2: Stop using KF() for bits in it, too] Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: x...@kernel.org Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Paolo Bonzini Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 8 +++- arch/x86/kvm/cpuid.h | 1 + 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..c0eb337 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void) #define F(x) bit(X86_FEATURE_##x) -/* These are scattered features in cpufeatures.h. */ -#define KVM_CPUID_BIT_AVX512_4VNNIW 2 -#define KVM_CPUID_BIT_AVX512_4FMAPS 3 +/* For scattered features from cpufeatures.h; we currently expose none */ #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) entry->ecx &= ~F(PKU); entry->edx &= kvm_cpuid_7_0_edx_x86_features; - entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX); + cpuid_mask(&entry->edx, CPUID_7_EDX); } else { entry->ebx = 0; entry->ecx = 0; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index c2cea66..9a327d5 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) -- 2.7.4
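The reverse_cpuid table is what lets guest_cpuid_has() turn a kernel X86_FEATURE_* constant back into the CPUID leaf/subleaf/register that the guest actually sees; this patch adds the entry for leaf 7 EDX so features such as SPEC_CTRL (CPUID.7.EDX bit 26) can be looked up the normal way instead of through KF() scattered bits. A stripped-down model of the mapping (the enum values and helper below are simplified stand-ins, not KVM's exact definitions):

  #include <stdint.h>
  #include <stdio.h>

  enum cpuid_regs_idx { CPUID_EAX, CPUID_EBX, CPUID_ECX, CPUID_EDX };

  struct cpuid_reg {
          uint32_t function;
          uint32_t index;     /* subleaf */
          int      reg;
  };

  /* Only the words needed for this example; KVM has many more. */
  enum { CPUID_7_ECX, CPUID_7_EDX, NR_WORDS };

  static const struct cpuid_reg reverse_cpuid[NR_WORDS] = {
          [CPUID_7_ECX] = { 7, 0, CPUID_ECX },
          [CPUID_7_EDX] = { 7, 0, CPUID_EDX },   /* the entry added by this patch */
  };

  /* An X86_FEATURE_* value encodes (word * 32 + bit); split it back out. */
  static struct cpuid_reg feature_to_cpuid(unsigned feature, unsigned *bit)
  {
          *bit = feature % 32;
          return reverse_cpuid[feature / 32];
  }

  int main(void)
  {
          /* e.g. a feature defined as (CPUID_7_EDX * 32 + 26), like SPEC_CTRL */
          unsigned bit, feature = CPUID_7_EDX * 32 + 26;
          struct cpuid_reg r = feature_to_cpuid(feature, &bit);

          printf("leaf 0x%x subleaf %u reg %d bit %u\n", r.function, r.index, r.reg, bit);
          return 0;
  }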
[PATCH] kvm: x86: Use X86_CR4_PAE instead of X86_CR4_PAE_BIT while validating sregs
Use the mask (X86_CR4_PAE) instead of the bit itself (X86_CR4_PAE_BIT) while validating sregs. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: x...@kernel.org Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index abd1723..6f452bc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7664,7 +7664,7 @@ int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) * 64-bit mode (though maybe in a 32-bit code segment). * CR4.PAE and EFER.LMA must be set. */ - if (!(sregs->cr4 & X86_CR4_PAE_BIT) + if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA)) return -EINVAL; } else { -- 2.7.4
Re: [PATCH] kvm: x86: Use X86_CR4_PAE instead of X86_CR4_PAE_BIT while validating sregs
Please ignore. I just noticed that a similar patch is already in Radim's tree and queued for linus. On 01/20/2018 07:08 PM, KarimAllah Ahmed wrote: Use the mask (X86_CR4_PAE) instead of the bit itself (X86_CR4_PAE_BIT) while validating sregs. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: x...@kernel.org Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index abd1723..6f452bc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7664,7 +7664,7 @@ int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) * 64-bit mode (though maybe in a 32-bit code segment). * CR4.PAE and EFER.LMA must be set. */ - if (!(sregs->cr4 & X86_CR4_PAE_BIT) + if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA)) return -EINVAL; } else { Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
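The bug the (already queued) fix addresses is the classic bit-position versus bit-mask mix-up: X86_CR4_PAE_BIT is the position (5) while X86_CR4_PAE is the mask (1 << 5), so testing CR4 against the position value really tests bits 0 and 2. A short demonstration, with the constants copied from the kernel headers:

  #include <stdio.h>

  #define X86_CR4_PAE_BIT 5                        /* bit position */
  #define X86_CR4_PAE     (1UL << X86_CR4_PAE_BIT) /* 0x20, the mask */

  int main(void)
  {
          unsigned long cr4 = X86_CR4_PAE;         /* PAE enabled */

          /* Buggy test: 0x20 & 0x5 == 0, so PAE looks disabled. */
          printf("buggy:   %s\n", (cr4 & X86_CR4_PAE_BIT) ? "PAE set" : "PAE clear");
          /* Correct test against the mask. */
          printf("correct: %s\n", (cr4 & X86_CR4_PAE) ? "PAE set" : "PAE clear");
          return 0;
  }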
[RFC 01/10] x86/speculation: Add basic support for IBPB
From: Thomas Gleixner Expose indirect_branch_prediction_barrier() for use in subsequent patches. [karahmed: remove the special-casing of skylake for using IBPB (wtf?), switch to using ALTERNATIVES instead of static_cpu_has] [dwmw2:set up ax/cx/dx in the asm too so it gets NOP'd out] Signed-off-by: Thomas Gleixner Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/nospec-branch.h | 16 arch/x86/kernel/cpu/bugs.c | 7 +++ 3 files changed, 24 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 624d978..8ec9588 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -207,6 +207,7 @@ #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */ #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ +#define X86_FEATURE_IBPB ( 7*32+16) /* Using Indirect Branch Prediction Barrier */ #define X86_FEATURE_AMD_PRED_CMD ( 7*32+17) /* Prediction Command MSR (AMD) */ #define X86_FEATURE_MBA( 7*32+18) /* Memory Bandwidth Allocation */ #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */ diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 4ad4108..c333c95 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -218,5 +218,21 @@ static inline void vmexit_fill_RSB(void) #endif } +static inline void indirect_branch_prediction_barrier(void) +{ + unsigned long ax, cx, dx; + + asm volatile(ALTERNATIVE("", +"movl %[msr], %%ecx\n\t" +"movl %[val], %%eax\n\t" +"movl $0, %%edx\n\t" +"wrmsr", +X86_FEATURE_IBPB) +: "=a" (ax), "=c" (cx), "=d" (dx) +: [msr] "i" (MSR_IA32_PRED_CMD), + [val] "i" (PRED_CMD_IBPB) +: "memory"); +} + #endif /* __ASSEMBLY__ */ #endif /* __NOSPEC_BRANCH_H__ */ diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 390b3dc..96548ff 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -249,6 +249,13 @@ static void __init spectre_v2_select_mitigation(void) setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); pr_info("Filling RSB on context switch\n"); } + + /* Initialize Indirect Branch Prediction Barrier if supported */ + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) || + boot_cpu_has(X86_FEATURE_AMD_PRED_CMD)) { + setup_force_cpu_cap(X86_FEATURE_IBPB); + pr_info("Enabling Indirect Branch Prediction Barrier\n"); + } } #undef pr_fmt -- 2.7.4
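Once the ALTERNATIVE is patched in, the helper above amounts to a guarded write of PRED_CMD_IBPB to MSR_IA32_PRED_CMD. The sketch below shows that reduced form with stub helpers (cpu_has_ibpb() and write_msr() are invented stand-ins, not kernel APIs); the real code open-codes the wrmsr inside an ALTERNATIVE precisely so that no conditional branch is left to be mispredicted, and so the whole sequence is NOP'd out on CPUs without the feature.

  #include <stdbool.h>
  #include <stdint.h>

  #define MSR_IA32_PRED_CMD 0x49
  #define PRED_CMD_IBPB     (1ULL << 0)

  /* Stubs standing in for the kernel's feature test and MSR write. */
  static bool cpu_has_ibpb(void)                    { return true; }
  static void write_msr(uint32_t msr, uint64_t val) { (void)msr; (void)val; }

  /*
   * Roughly what the patched-in alternative does: nothing on CPUs without
   * IBPB, one MSR write on CPUs that have it. The write acts as a barrier
   * that discards indirect branch predictions learned before this point.
   */
  static void indirect_branch_prediction_barrier(void)
  {
          if (cpu_has_ibpb())
                  write_msr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
  }

  int main(void)
  {
          indirect_branch_prediction_barrier();
          return 0;
  }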
[RFC 00/10] Speculation Control feature support
Start using the newly-added microcode features for speculation control on both Intel and AMD CPUs to protect against Spectre v2. This patch series covers interrupts, system calls, context switching between processes, and context switching between VMs. It also exposes Indirect Branch Prediction Barrier MSR, aka IBPB MSR, to KVM guests. TODO: - Introduce a microcode blacklist to disable the feature for broken microcodes. - Restrict/Unrestrict the speculation (by toggling IBRS) around VMExit and VMEnter for KVM and expose IBRS to guests. Ashok Raj (1): x86/kvm: Add IBPB support David Woodhouse (1): x86/speculation: Add basic IBRS support infrastructure KarimAllah Ahmed (1): x86: Simplify spectre_v2 command line parsing Thomas Gleixner (4): x86/speculation: Add basic support for IBPB x86/speculation: Use Indirect Branch Prediction Barrier in context switch x86/speculation: Add inlines to control Indirect Branch Speculation x86/idle: Control Indirect Branch Speculation in idle Tim Chen (3): x86/mm: Only flush indirect branches when switching into non dumpable process x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation x86/enter: Use IBRS on syscall and interrupts Documentation/admin-guide/kernel-parameters.txt | 1 + arch/x86/entry/calling.h| 73 ++ arch/x86/entry/entry_64.S | 35 - arch/x86/entry/entry_64_compat.S| 21 ++- arch/x86/include/asm/cpufeatures.h | 2 + arch/x86/include/asm/mwait.h| 14 ++ arch/x86/include/asm/nospec-branch.h| 54 ++- arch/x86/kernel/cpu/bugs.c | 183 +++- arch/x86/kernel/process.c | 14 ++ arch/x86/kvm/svm.c | 14 ++ arch/x86/kvm/vmx.c | 4 + arch/x86/mm/tlb.c | 21 ++- 12 files changed, 359 insertions(+), 77 deletions(-) Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Arjan van de Ven Cc: Ashok Raj Cc: Asit Mallick Cc: Borislav Petkov Cc: Dan Williams Cc: Dave Hansen Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Janakarajan Natarajan Cc: Joerg Roedel Cc: Jun Nakajima Cc: Laura Abbott Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Tim Chen Cc: Tom Lendacky Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: x...@kernel.org -- 2.7.4
[RFC 03/10] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
From: Thomas Gleixner [peterz: comment] Signed-off-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: David Woodhouse --- arch/x86/mm/tlb.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index a156195..304de7d 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -6,13 +6,14 @@ #include #include #include +#include #include #include +#include #include #include #include -#include /* * TLB flushing, formerly SMP-only @@ -220,6 +221,13 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, u16 new_asid; bool need_flush; + /* +* Avoid user/user BTB poisoning by flushing the branch predictor +* when switching between processes. This stops one process from +* doing Spectre-v2 attacks on another. +*/ + indirect_branch_prediction_barrier(); + if (IS_ENABLED(CONFIG_VMAP_STACK)) { /* * If our current stack is in vmalloc space and isn't -- 2.7.4
[RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
From: Tim Chen Create macros to control Indirect Branch Speculation. Name them so they reflect what they are actually doing. The macros are used to restrict and unrestrict the indirect branch speculation. They do not *disable* (or *enable*) indirect branch speculation. A trip back to user-space after *restricting* speculation would still affect the BTB. Quoting from a commit by Tim Chen: """ If IBRS is set, near returns and near indirect jumps/calls will not allow their predicted target address to be controlled by code that executed in a less privileged prediction mode *BEFORE* the IBRS mode was last written with a value of 1 or on another logical processor so long as all Return Stack Buffer (RSB) entries from the previous less privileged prediction mode are overwritten. Thus a near indirect jump/call/return may be affected by code in a less privileged prediction mode that executed *AFTER* IBRS mode was last written with a value of 1. """ [ tglx: Changed macro names and rewrote changelog ] [ karahmed: changed macro names *again* and rewrote changelog ] Signed-off-by: Tim Chen Signed-off-by: Thomas Gleixner Signed-off-by: KarimAllah Ahmed Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Peter Zijlstra Cc: Greg KH Cc: Dave Hansen Cc: Andy Lutomirski Cc: Paolo Bonzini Cc: Dan Williams Cc: Arjan Van De Ven Cc: Linus Torvalds Cc: David Woodhouse Cc: Ashok Raj Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.c...@linux.intel.com Signed-off-by: David Woodhouse --- arch/x86/entry/calling.h | 73 1 file changed, 73 insertions(+) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 3f48f69..5aafb51 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -6,6 +6,8 @@ #include #include #include +#include +#include /* @@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with .Lafter_call_\@: #endif .endm + +/* + * IBRS related macros + */ +.macro PUSH_MSR_REGS + pushq %rax + pushq %rcx + pushq %rdx +.endm + +.macro POP_MSR_REGS + popq%rdx + popq%rcx + popq%rax +.endm + +.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req + movl\msr_nr, %ecx + movl\edx_val, %edx + movl\eax_val, %eax + wrmsr +.endm + +.macro RESTRICT_IB_SPEC + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS + PUSH_MSR_REGS + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS + POP_MSR_REGS +.Lskip_\@: +.endm + +.macro UNRESTRICT_IB_SPEC + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS + PUSH_MSR_REGS + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0 + POP_MSR_REGS +.Lskip_\@: +.endm + +.macro RESTRICT_IB_SPEC_CLOBBER + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS +.Lskip_\@: +.endm + +.macro UNRESTRICT_IB_SPEC_CLOBBER + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0 +.Lskip_\@: +.endm + +.macro RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg:req + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS + movl$MSR_IA32_SPEC_CTRL, %ecx + rdmsr + movl%eax, \save_reg + movl$0, %edx + movl$SPEC_CTRL_IBRS, %eax + wrmsr +.Lskip_\@: +.endm + +.macro RESTORE_IB_SPEC_CLOBBER save_reg:req + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS + /* Set IBRS to the value saved in the save_reg */ + movl$MSR_IA32_SPEC_CTRL, %ecx + movl$0, %edx + movl\save_reg, %eax + wrmsr +.Lskip_\@: +.endm -- 2.7.4
[RFC 07/10] x86: Simplify spectre_v2 command line parsing
Signed-off-by: KarimAllah Ahmed --- arch/x86/kernel/cpu/bugs.c | 106 + 1 file changed, 58 insertions(+), 48 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 1d5e12f..349c7f4 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -99,13 +99,13 @@ static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE; static void __init spec2_print_if_insecure(const char *reason) { if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) - pr_info("%s\n", reason); + pr_info("%s selected on command line.\n", reason); } static void __init spec2_print_if_secure(const char *reason) { if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) - pr_info("%s\n", reason); + pr_info("%s selected on command line.\n", reason); } static inline bool retp_compiler(void) @@ -120,61 +120,71 @@ static inline bool match_option(const char *arg, int arglen, const char *opt) return len == arglen && !strncmp(arg, opt, len); } +static struct { + char *option; + enum spectre_v2_mitigation_cmd cmd; + bool secure; +} mitigation_options[] = { + { "off", SPECTRE_V2_CMD_NONE, false }, + { "on",SPECTRE_V2_CMD_FORCE, true }, + { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, + { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false }, + { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false }, + { "ibrs", SPECTRE_V2_CMD_IBRS, false }, + { "auto", SPECTRE_V2_CMD_AUTO, false }, +}; + +static const int mitigation_options_count = sizeof(mitigation_options) / + sizeof(mitigation_options[0]); + static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) { char arg[20]; - int ret; + int ret, i; + enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO; + + if (cmdline_find_option_bool(boot_command_line, "nospectre_v2")) + return SPECTRE_V2_CMD_NONE; ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg)); - if (ret > 0) { - if (match_option(arg, ret, "off")) { - goto disable; - } else if (match_option(arg, ret, "on")) { - spec2_print_if_secure("force enabled on command line."); - return SPECTRE_V2_CMD_FORCE; - } else if (match_option(arg, ret, "retpoline")) { - if (!IS_ENABLED(CONFIG_RETPOLINE)) { - pr_err("retpoline selected but not compiled in. Switching to AUTO select\n"); - return SPECTRE_V2_CMD_AUTO; - } - spec2_print_if_insecure("retpoline selected on command line."); - return SPECTRE_V2_CMD_RETPOLINE; - } else if (match_option(arg, ret, "retpoline,amd")) { - if (!IS_ENABLED(CONFIG_RETPOLINE)) { - pr_err("retpoline,amd selected but not compiled in. Switching to AUTO select\n"); - return SPECTRE_V2_CMD_AUTO; - } - if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) { - pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n"); - return SPECTRE_V2_CMD_AUTO; - } - spec2_print_if_insecure("AMD retpoline selected on command line."); - return SPECTRE_V2_CMD_RETPOLINE_AMD; - } else if (match_option(arg, ret, "retpoline,generic")) { - if (!IS_ENABLED(CONFIG_RETPOLINE)) { - pr_err("retpoline,generic selected but not compiled in. Switching to AUTO select\n"); - return SPECTRE_V2_CMD_AUTO; - } - spec2_print_if_insecure("generic retpoline selected on command line."); - return SPECTRE_V2_CMD_RETPOLINE_GENERIC; - } else if (match_option(arg, ret, "ibrs")) { - if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL)) { - pr_err("IBRS selected but no CPU support. Switching to AUTO select\n"); - return SPECTRE_V2_CMD_AUTO; -
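The shape of the rewrite, boiled down to a standalone program: the per-option if/else ladder becomes a single loop over a table, and option-specific sanity checks (retpoline compiled in, CPU vendor, SPEC_CTRL present) can then be applied once after the lookup. The names mirror the patch, but this is only an illustration:

  #include <stdio.h>
  #include <string.h>

  enum cmd { CMD_NONE, CMD_FORCE, CMD_RETPOLINE, CMD_IBRS, CMD_AUTO };

  static const struct {
          const char *option;
          enum cmd    cmd;
  } mitigation_options[] = {
          { "off",       CMD_NONE      },
          { "on",        CMD_FORCE     },
          { "retpoline", CMD_RETPOLINE },
          { "ibrs",      CMD_IBRS      },
          { "auto",      CMD_AUTO      },
  };

  static enum cmd parse_spectre_v2(const char *arg)
  {
          unsigned int i;

          for (i = 0; i < sizeof(mitigation_options) / sizeof(mitigation_options[0]); i++)
                  if (!strcmp(arg, mitigation_options[i].option))
                          return mitigation_options[i].cmd;

          printf("unknown option (%s), defaulting to auto\n", arg);
          return CMD_AUTO;
  }

  int main(void)
  {
          printf("spectre_v2=ibrs -> cmd %d\n", parse_spectre_v2("ibrs"));
          return 0;
  }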
[RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
From: David Woodhouse Not functional yet; just add the handling for it in the Spectre v2 mitigation selection, and the X86_FEATURE_IBRS flag which will control the code to be added in later patches. Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS mode will want that too. For now we are auto-selecting IBRS on Skylake. We will probably end up changing that but for now let's default to the safest option. XX: Do we want a microcode blacklist? [karahmed: simplify the switch block and get rid of all the magic] Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- Documentation/admin-guide/kernel-parameters.txt | 1 + arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/nospec-branch.h| 2 - arch/x86/kernel/cpu/bugs.c | 108 +++- 4 files changed, 68 insertions(+), 44 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 8122b5f..e597650 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3932,6 +3932,7 @@ retpoline - replace indirect branches retpoline,generic - google's original retpoline retpoline,amd - AMD-specific minimal thunk + ibrs - Intel: Indirect Branch Restricted Speculation Not specifying this option is equivalent to spectre_v2=auto. diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 8ec9588..ae86ad9 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -211,6 +211,7 @@ #define X86_FEATURE_AMD_PRED_CMD ( 7*32+17) /* Prediction Command MSR (AMD) */ #define X86_FEATURE_MBA( 7*32+18) /* Memory Bandwidth Allocation */ #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */ +#define X86_FEATURE_IBRS ( 7*32+21) /* Use IBRS for Spectre v2 safety */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index c333c95..8759449 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -205,7 +205,6 @@ extern char __indirect_thunk_end[]; */ static inline void vmexit_fill_RSB(void) { -#ifdef CONFIG_RETPOLINE unsigned long loops; asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE @@ -215,7 +214,6 @@ static inline void vmexit_fill_RSB(void) "910:" : "=r" (loops), ASM_CALL_CONSTRAINT : : "memory" ); -#endif } static inline void indirect_branch_prediction_barrier(void) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 96548ff..1d5e12f 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd { SPECTRE_V2_CMD_RETPOLINE, SPECTRE_V2_CMD_RETPOLINE_GENERIC, SPECTRE_V2_CMD_RETPOLINE_AMD, + SPECTRE_V2_CMD_IBRS, }; static const char *spectre_v2_strings[] = { @@ -87,6 +88,7 @@ static const char *spectre_v2_strings[] = { [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", + [SPECTRE_V2_IBRS] = "Mitigation: Indirect Branch Restricted Speculation", }; #undef pr_fmt @@ -132,9 +134,17 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) spec2_print_if_secure("force enabled on command line."); return SPECTRE_V2_CMD_FORCE; } else if (match_option(arg, ret, "retpoline")) { + if 
(!IS_ENABLED(CONFIG_RETPOLINE)) { + pr_err("retpoline selected but not compiled in. Switching to AUTO select\n"); + return SPECTRE_V2_CMD_AUTO; + } spec2_print_if_insecure("retpoline selected on command line."); return SPECTRE_V2_CMD_RETPOLINE; } else if (match_option(arg, ret, "retpoline,amd")) { + if (!IS_ENABLED(CONFIG_RETPOLINE)) { + pr_err("retpoline,amd selected but not compiled in. Switching to AUTO select\n"); + return SPECTRE_V2_CMD_AUTO
[RFC 04/10] x86/mm: Only flush indirect branches when switching into non-dumpable process
From: Tim Chen Flush indirect branches when switching into a process that marked itself non dumpable. This protects high value processes like gpg better, without having too high performance overhead. Signed-off-by: Andi Kleen Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- arch/x86/mm/tlb.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 304de7d..f64e80c 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -225,8 +225,19 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, * Avoid user/user BTB poisoning by flushing the branch predictor * when switching between processes. This stops one process from * doing Spectre-v2 attacks on another. +* +* As an optimization: Flush indirect branches only when +* switching into processes that disable dumping. +* +* This will not flush when switching into kernel threads. +* But it would flush when switching into idle and back +* +* It might be useful to have a one-off cache here +* to also not flush the idle case, but we would need some +* kind of stable sequence number to remember the previous mm. */ - indirect_branch_prediction_barrier(); + if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER) + indirect_branch_prediction_barrier(); if (IS_ENABLED(CONFIG_VMAP_STACK)) { /* -- 2.7.4
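From user space, a process opts into this protection simply by making itself non-dumpable; SUID_DUMP_DISABLE is 0 and SUID_DUMP_USER is 1, so anything that clears the dumpable flag gets the IBPB on every switch into it. For example (standard prctl interface, nothing kernel-internal):

  #include <stdio.h>
  #include <sys/prctl.h>

  int main(void)
  {
          /*
           * Mark this process non-dumpable. With the patch above, the kernel
           * issues an IBPB whenever it context-switches into this process,
           * so predictor state planted by other users cannot steer its
           * indirect branches.
           */
          if (prctl(PR_SET_DUMPABLE, 0L) != 0) {
                  perror("prctl(PR_SET_DUMPABLE)");
                  return 1;
          }
          printf("dumpable: %d\n", prctl(PR_GET_DUMPABLE));
          return 0;
  }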
[RFC 10/10] x86/enter: Use IBRS on syscall and interrupts
From: Tim Chen Stop Indirect Branch Speculation on every user space to kernel space transition and reenable it when returning to user space./ The NMI interrupt save/restore of IBRS state was based on Andrea Arcangeli's implementation. Here's an explanation by Dave Hansen on why we save IBRS state for NMI. The normal interrupt code uses the 'error_entry' path which uses the Code Segment (CS) of the instruction that was interrupted to tell whether it interrupted the kernel or userspace and thus has to switch IBRS, or leave it alone. The NMI code is different. It uses 'paranoid_entry' because it can interrupt the kernel while it is running with a userspace IBRS (and %GS and CR3) value, but has a kernel CS. If we used the same approach as the normal interrupt code, we might do the following; SYSENTER_entry <-- NMI HERE IBRS=1 do_something() IBRS=0 SYSRET The NMI code might notice that we are running in the kernel and decide that it is OK to skip the IBRS=1. This would leave it running unprotected with IBRS=0, which is bad. However, if we unconditionally set IBRS=1, in the NMI, we might get the following case: SYSENTER_entry IBRS=1 do_something() IBRS=0 <-- NMI HERE (set IBRS=1) SYSRET and we would return to userspace with IBRS=1. Userspace would run slowly until we entered and exited the kernel again. Instead of those two approaches, we chose a third one where we simply save the IBRS value in a scratch register (%r13) and then restore that value, verbatim. [karahmed use the new SPEC_CTRL_IBRS defines] Co-developed-by: Andrea Arcangeli Signed-off-by: Andrea Arcangeli Signed-off-by: Tim Chen Signed-off-by: Thomas Gleixner Signed-off-by: KarimAllah Ahmed Cc: Andi Kleen Cc: Peter Zijlstra Cc: Greg KH Cc: Dave Hansen Cc: Andy Lutomirski Cc: Paolo Bonzini Cc: Dan Williams Cc: Arjan Van De Ven Cc: Linus Torvalds Cc: David Woodhouse Cc: Ashok Raj Link: https://lkml.kernel.org/r/d5e4c03ec290c61dfbe5a769f7287817283fa6b7.1515542293.git.tim.c.c...@linux.intel.com --- arch/x86/entry/entry_64.S| 35 ++- arch/x86/entry/entry_64_compat.S | 21 +++-- 2 files changed, 53 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 63f4320..b3d90cf 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -171,6 +171,8 @@ ENTRY(entry_SYSCALL_64_trampoline) /* Load the top of the task stack into RSP */ movqCPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp + /* Restrict indirect branch speculation */ + RESTRICT_IB_SPEC /* Start building the simulated IRET frame. */ pushq $__USER_DS /* pt_regs->ss */ @@ -214,6 +216,8 @@ ENTRY(entry_SYSCALL_64) */ movq%rsp, PER_CPU_VAR(rsp_scratch) movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp + /* Restrict Indirect Branch Speculation */ + RESTRICT_IB_SPEC TRACE_IRQS_OFF @@ -409,6 +413,8 @@ syscall_return_via_sysret: pushq RSP-RDI(%rdi) /* RSP */ pushq (%rdi) /* RDI */ + /* Unrestrict Indirect Branch Speculation */ + UNRESTRICT_IB_SPEC /* * We are on the trampoline stack. All regs except RDI are live. * We can do future final exit work right here. @@ -757,11 +763,12 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode) /* Push user RDI on the trampoline stack. */ pushq (%rdi) + /* Unrestrict Indirect Branch Speculation */ + UNRESTRICT_IB_SPEC /* * We are on the trampoline stack. All regs except RDI are live. * We can do future final exit work right here. */ - SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi /* Restore RDI. 
*/ @@ -849,6 +856,13 @@ native_irq_return_ldt: SWAPGS /* to kernel GS */ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */ + /* +* There is no point in disabling Indirect Branch Speculation +* here as this is going to return to user space immediately +* after fixing ESPFIX stack. There is no vulnerable code +* to protect so spare two MSR writes. +*/ + movqPER_CPU_VAR(espfix_waddr), %rdi movq%rax, (0*8)(%rdi) /* user RAX */ movq(1*8)(%rsp), %rax /* user RIP */ @@ -982,6 +996,8 @@ ENTRY(switch_to_thread_stack) SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi movq%rsp, %rdi movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp + /* Restrict Indirect Branch Speculation */ + RESTRICT_IB_SPEC UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI pushq 7*8(%rdi) /* regs->ss */
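The NMI reasoning quoted above is easier to follow as a toy model: paranoid_entry snapshots whatever SPEC_CTRL value it interrupted, forces IBRS on for the handler, and paranoid_exit writes the snapshot back verbatim, so neither the "skipped IBRS=1" nor the "returned to user with IBRS=1" failure mode can occur. The variable names below are invented for the illustration; the real code keeps the snapshot in %r13 via RESTRICT_IB_SPEC_SAVE_AND_CLOBBER/RESTORE_IB_SPEC_CLOBBER.

  #include <stdint.h>
  #include <stdio.h>

  #define SPEC_CTRL_IBRS (1ULL << 0)

  static uint64_t spec_ctrl_msr;          /* stand-in for MSR_IA32_SPEC_CTRL */

  static void nmi_handler(void)
  {
          /* paranoid_entry: save the interrupted context's value (0 or 1) */
          uint64_t saved = spec_ctrl_msr;

          spec_ctrl_msr = SPEC_CTRL_IBRS; /* restrict speculation for the handler */

          /* ... NMI work runs with IBRS=1 ... */

          spec_ctrl_msr = saved;          /* paranoid_exit: restore verbatim */
  }

  int main(void)
  {
          spec_ctrl_msr = 0;              /* NMI lands while user space runs, IBRS=0 */
          nmi_handler();
          printf("after NMI from user:   IBRS=%llu (unchanged)\n",
                 (unsigned long long)spec_ctrl_msr);

          spec_ctrl_msr = SPEC_CTRL_IBRS; /* NMI lands inside the kernel, IBRS=1 */
          nmi_handler();
          printf("after NMI from kernel: IBRS=%llu (unchanged)\n",
                 (unsigned long long)spec_ctrl_msr);
          return 0;
  }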
[RFC 08/10] x86/idle: Control Indirect Branch Speculation in idle
From: Thomas Gleixner Indirect Branch Speculation (IBS) is controlled per physical core. If one thread disables it then it's disabled for the core. If a thread enters idle it makes sense to reenable IBS so the sibling thread can run with full speculation enabled in user space. This makes only sense in mwait_idle_with_hints() because mwait_idle() can serve an interrupt immediately before speculation can be stopped again. SKL which requires IBRS should use mwait_idle_with_hints() so this is a non issue and in the worst case a missed optimization. Originally-by: Tim Chen Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/mwait.h | 14 ++ arch/x86/kernel/process.c| 14 ++ 2 files changed, 28 insertions(+) diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h index 39a2fb2..f173072 100644 --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -6,6 +6,7 @@ #include #include +#include #define MWAIT_SUBSTATE_MASK0xf #define MWAIT_CSTATE_MASK 0xf @@ -106,7 +107,20 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) mb(); } + /* +* Indirect Branch Speculation (IBS) is controlled per +* physical core. If one thread disables it, then it's +* disabled on all threads of the core. The kernel disables +* it on entry from user space. Reenable it on the thread +* which goes idle so the other thread has a chance to run +* with full speculation enabled in userspace. +*/ + unrestrict_branch_speculation(); __monitor((void *)¤t_thread_info()->flags, 0, 0); + /* +* Restrict IBS again to protect kernel execution. +*/ + restrict_branch_speculation(); if (!need_resched()) __mwait(eax, ecx); } diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 3cb2486..f941c5d 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -461,6 +461,20 @@ static __cpuidle void mwait_idle(void) mb(); /* quirk */ } + /* +* Indirect Branch Speculation (IBS) is controlled per +* physical core. If one thread disables it, then it's +* disabled on all threads of the core. The kernel disables +* it on entry from user space. For __sti_mwait() it's +* wrong to reenable it because an interrupt can be served +* before speculation can be stopped again. +* +* To plug that hole the interrupt entry code would need to +* save current state and restore. Not worth the trouble as +* SKL should not use mwait_idle(). It should use +* mwait_idle_with_hints() which can do speculation control +* safely. +*/ __monitor((void *)¤t_thread_info()->flags, 0, 0); if (!need_resched()) __sti_mwait(0, 0); -- 2.7.4
[RFC 02/10] x86/kvm: Add IBPB support
From: Ashok Raj Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor barriers on switching between VMs to avoid inter VM specte-v2 attacks. [peterz: rebase and changelog rewrite] [dwmw2: fixes] [karahmed: - vmx: expose PRED_CMD whenever it is available - svm: only pass through IBPB if it is available] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: David Woodhouse Cc: Paolo Bonzini Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/svm.c | 14 ++ arch/x86/kvm/vmx.c | 4 2 files changed, 18 insertions(+) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2744b973..cfdb9ab 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -529,6 +529,7 @@ struct svm_cpu_data { struct kvm_ldttss_desc *tss_desc; struct page *save_area; + struct vmcb *current_vmcb; }; static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); @@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm) set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1); } + + if (boot_cpu_has(X86_FEATURE_AMD_PRED_CMD)) + set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1); } static void add_msr_offset(u32 offset) @@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); + /* +* The vmcb page can be recycled, causing a false negative in +* svm_vcpu_load(). So do a full IBPB now. +*/ + indirect_branch_prediction_barrier(); } static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_svm *svm = to_svm(vcpu); + struct svm_cpu_data *sd = per_cpu(svm_data, cpu); int i; if (unlikely(cpu != vcpu->cpu)) { @@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (static_cpu_has(X86_FEATURE_RDTSCP)) wrmsrl(MSR_TSC_AUX, svm->tsc_aux); + if (sd->current_vmcb != svm->vmcb) { + sd->current_vmcb = svm->vmcb; + indirect_branch_prediction_barrier(); + } avic_vcpu_load(vcpu, cpu); } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d1e25db..3b64de2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2279,6 +2279,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) { per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; vmcs_load(vmx->loaded_vmcs->vmcs); + indirect_branch_prediction_barrier(); } if (!already_loaded) { @@ -6791,6 +6792,9 @@ static __init int hardware_setup(void) kvm_tsc_scaling_ratio_frac_bits = 48; } + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) + vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false); + vmx_disable_intercept_for_msr(MSR_FS_BASE, false); vmx_disable_intercept_for_msr(MSR_GS_BASE, false); vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); -- 2.7.4
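The per-CPU current_vmcb cache added to svm_vcpu_load() implements a "barrier only on change" policy, so re-running the same vCPU on the same physical CPU does not pay for an IBPB every time. A compact single-CPU model of that policy (the barrier is just a stub counter here):

  #include <stdio.h>

  struct vmcb { int id; };

  static struct vmcb *current_vmcb;       /* per-CPU in the real code */
  static int ibpb_count;

  static void indirect_branch_prediction_barrier(void)
  {
          ibpb_count++;                   /* stands in for the PRED_CMD write */
  }

  static void vcpu_load(struct vmcb *vmcb)
  {
          /* Only flush predictions when a different guest context is loaded. */
          if (current_vmcb != vmcb) {
                  current_vmcb = vmcb;
                  indirect_branch_prediction_barrier();
          }
  }

  int main(void)
  {
          struct vmcb a = { 1 }, b = { 2 };

          vcpu_load(&a);                  /* barrier */
          vcpu_load(&a);                  /* same VMCB, no barrier */
          vcpu_load(&b);                  /* barrier */
          printf("IBPBs issued: %d\n", ibpb_count);       /* 2 */
          return 0;
  }

The companion hunk in svm_free_vcpu() issues an unconditional IBPB because a freed VMCB page can be recycled at the same address, which would otherwise make the cache report a false "same VMCB".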
[RFC 06/10] x86/speculation: Add inlines to control Indirect Branch Speculation
From: Thomas Gleixner XX: I am utterly unconvinced that having "friendly, self-explanatory" names for the IBRS-frobbing inlines is useful. There be dragons here for anyone who isn't intimately familiar with what's going on, and it's almost better to just call it IBRS, put a reference to the spec, and have a clear "you must be →this← tall to ride." [karahmed: switch to using ALTERNATIVES instead of static_cpu_has] [dwmw2: wrmsr args inside the ALTERNATIVE again, bikeshed naming] Signed-off-by: Thomas Gleixner Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/include/asm/nospec-branch.h | 36 1 file changed, 36 insertions(+) diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 8759449..5be3443 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -232,5 +232,41 @@ static inline void indirect_branch_prediction_barrier(void) : "memory"); } +/* + * This also performs a barrier, and setting it again when it was already + * set is NOT a no-op. + */ +static inline void restrict_branch_speculation(void) +{ + unsigned long ax, cx, dx; + + asm volatile(ALTERNATIVE("", +"movl %[msr], %%ecx\n\t" +"movl %[val], %%eax\n\t" +"movl $0, %%edx\n\t" +"wrmsr", +X86_FEATURE_IBRS) +: "=a" (ax), "=c" (cx), "=d" (dx) +: [msr] "i" (MSR_IA32_SPEC_CTRL), + [val] "i" (SPEC_CTRL_IBRS) +: "memory"); +} + +static inline void unrestrict_branch_speculation(void) +{ + unsigned long ax, cx, dx; + + asm volatile(ALTERNATIVE("", +"movl %[msr], %%ecx\n\t" +"movl %[val], %%eax\n\t" +"movl $0, %%edx\n\t" +"wrmsr", +X86_FEATURE_IBRS) +: "=a" (ax), "=c" (cx), "=d" (dx) +: [msr] "i" (MSR_IA32_SPEC_CTRL), + [val] "i" (0) +: "memory"); +} + #endif /* __ASSEMBLY__ */ #endif /* __NOSPEC_BRANCH_H__ */ -- 2.7.4
Re: [RFC 10/10] x86/enter: Use IBRS on syscall and interrupts
On 01/21/2018 02:50 PM, Konrad Rzeszutek Wilk wrote: On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote: From: Tim Chen Stop Indirect Branch Speculation on every user space to kernel space transition and reenable it when returning to user space./ How about interrupts? That is should .macro interrupt have the same treatment? RESTRICT_IB_SPEC is called in switch_to_thread_stack which is almost the first thing called from ".macro interrupt". Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Re: [PATCH v2 5/8] x86/speculation: Add basic support for IBPB
On 01/21/2018 07:06 PM, Borislav Petkov wrote: On Sun, Jan 21, 2018 at 09:49:06AM +, David Woodhouse wrote: From: Thomas Gleixner Expose indirect_branch_prediction_barrier() for use in subsequent patches. [karahmed: remove the special-casing of skylake for using IBPB (wtf?), switch to using ALTERNATIVES instead of static_cpu_has] [dwmw2:set up ax/cx/dx in the asm too so it gets NOP'd out] Signed-off-by: Thomas Gleixner Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/nospec-branch.h | 16 arch/x86/kernel/cpu/bugs.c | 7 +++ 3 files changed, 24 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 8c9e5c0..cf28399 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -207,6 +207,7 @@ #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */ #define X86_FEATURE_INTEL_PPIN( 7*32+14) /* Intel Processor Inventory Number */ +#define X86_FEATURE_IBPB ( 7*32+16) /* Using Indirect Branch Prediction Barrier */ Right, and as AMD has a separate bit for this in CPUID_8008_EBX[12], we probably don't really need the synthetic bit here but simply use the one at (13*32+12) - word 13. #define X86_FEATURE_AMD_PRED_CMD ( 7*32+17) /* Prediction Command MSR (AMD) */ #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */ diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 4ad4108..c333c95 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -218,5 +218,21 @@ static inline void vmexit_fill_RSB(void) #endif } +static inline void indirect_branch_prediction_barrier(void) I like ibp_barrier() better. +{ + unsigned long ax, cx, dx; + + asm volatile(ALTERNATIVE("", +"movl %[msr], %%ecx\n\t" +"movl %[val], %%eax\n\t" +"movl $0, %%edx\n\t" +"wrmsr", +X86_FEATURE_IBPB) +: "=a" (ax), "=c" (cx), "=d" (dx) +: [msr] "i" (MSR_IA32_PRED_CMD), + [val] "i" (PRED_CMD_IBPB) +: "memory"); +} Btw, we can simplify this a bit by dropping the inputs and marking the 3 GPRs as clobbered: alternative_input("", "mov $0x49, %%ecx\n\t" "mov $1, %%eax\n\t" "xor %%edx, %%edx\n\t" "wrmsr\n\t", X86_FEATURE_IBPB, ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory")); The "memory" clobber is probably not really needed but it wouldn't hurt... Also, above says: switch to using ALTERNATIVES instead of static_cpu_has] Why? if (static_cpu_has(X86_FEATURE_IBPB)) wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0); It can't get any more readable than this. Why even f*ck with alternatives? Because static_cpu_has is an indirect branch which will cause speculation and we have to avoid that. David told me that Peter was working on a fix for static_cpu_has to avoid the speculation but I do not know what is the status of this. 
+ #endif /* __ASSEMBLY__ */ #endif /* __NOSPEC_BRANCH_H__ */ diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 390b3dc..96548ff 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -249,6 +249,13 @@ static void __init spectre_v2_select_mitigation(void) setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); pr_info("Filling RSB on context switch\n"); } + + /* Initialize Indirect Branch Prediction Barrier if supported */ + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) || + boot_cpu_has(X86_FEATURE_AMD_PRED_CMD)) { + setup_force_cpu_cap(X86_FEATURE_IBPB); + pr_info("Enabling Indirect Branch Prediction Barrier\n"); We don't really need the pr_info as "ibpb" will appear in /proc/cpuinfo. Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Re: [net-next v2] ipv6: sr: export some functions of seg6local
On Thu, 04 Jan 2018 13:37:33 -0500 (EST) David Miller wrote: > From: Ahmed Abdelsalam > Date: Sat, 30 Dec 2017 00:08:32 +0100 > > > Some functions of seg6local are very useful to process SRv6 > > encapsulated packets > > > > This patch exports some functions of seg6local that are useful and > > can be re-used at different parts of the kernel. > > > > The set of exported functions are: > > (1) seg6_get_srh() > > (2) seg6_advance_nextseg() > > (3) seg6_lookup_nexthop > > > > Signed-off-by: Ahmed Abdelsalam > > There is no way I am applying this as-is. > > Until you can submit this alongside an in-tree user of these symbols, > these symbol exports are not going to happen. > > Thank you. I will submit the other patches once I'm done with the testing. Thanks -- Ahmed
Re: [net-next] netfilter: add segment routing header 'srh' match
On Sun, 7 Jan 2018 00:40:03 +0100 Pablo Neira Ayuso wrote: > Hi Ahmed, > > On Fri, Dec 29, 2017 at 12:07:52PM +0100, Ahmed Abdelsalam wrote: > > It allows matching packets based on Segment Routing Header > > (SRH) information. > > The implementation considers revision 7 of the SRH draft. > > https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07 > > > > Currently supported match options include: > > (1) Next Header > > (2) Hdr Ext Len > > (3) Segments Left > > (4) Last Entry > > (5) Tag value of SRH > > > > Signed-off-by: Ahmed Abdelsalam > > --- > > include/uapi/linux/netfilter_ipv6/ip6t_srh.h | 63 ++ > > net/ipv6/netfilter/Kconfig | 9 ++ > > net/ipv6/netfilter/Makefile | 1 + > > net/ipv6/netfilter/ip6t_srh.c| 165 > > +++ > > 4 files changed, 238 insertions(+) > > create mode 100644 include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > create mode 100644 net/ipv6/netfilter/ip6t_srh.c > > > > diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > new file mode 100644 > > index 000..1b5dbd8 > > --- /dev/null > > +++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > @@ -0,0 +1,63 @@ > > +/** > > + * Definitions for Segment Routing Header 'srh' match > > + * > > + * Author: > > + * Ahmed Abdelsalam > > + */ > > Please, add this in SPDX format instead. > > See include/uapi/linux/netfilter/xt_owner.h for instance. > Ok > > +#ifndef _IP6T_SRH_H > > +#define _IP6T_SRH_H > > + > > +#include > > +#include > > + > > +/* Values for "mt_flags" field in struct ip6t_srh */ > > +#define IP6T_SRH_NEXTHDR0x0001 > > +#define IP6T_SRH_LEN_EQ 0x0002 > > +#define IP6T_SRH_LEN_GT 0x0004 > > +#define IP6T_SRH_LEN_LT 0x0008 > > +#define IP6T_SRH_SEGS_EQ0x0010 > > +#define IP6T_SRH_SEGS_GT0x0020 > > +#define IP6T_SRH_SEGS_LT0x0040 > > +#define IP6T_SRH_LAST_EQ0x0080 > > +#define IP6T_SRH_LAST_GT0x0100 > > +#define IP6T_SRH_LAST_LT0x0200 > > +#define IP6T_SRH_TAG0x0400 > > +#define IP6T_SRH_MASK 0x07FF > > + > > +/* Values for "mt_invflags" field in struct ip6t_srh */ > > +#define IP6T_SRH_INV_NEXTHDR0x0001 > > +#define IP6T_SRH_INV_LEN_EQ 0x0002 > > +#define IP6T_SRH_INV_LEN_GT 0x0004 > > +#define IP6T_SRH_INV_LEN_LT 0x0008 > > +#define IP6T_SRH_INV_SEGS_EQ0x0010 > > +#define IP6T_SRH_INV_SEGS_GT0x0020 > > +#define IP6T_SRH_INV_SEGS_LT0x0040 > > +#define IP6T_SRH_INV_LAST_EQ0x0080 > > +#define IP6T_SRH_INV_LAST_GT0x0100 > > +#define IP6T_SRH_INV_LAST_LT0x0200 > > +#define IP6T_SRH_INV_TAG0x0400 > > +#define IP6T_SRH_INV_MASK 0x07FF > > Looking at all these EQ, GT, LT... I think this should be very easy to > implement in nf_tables with no kernel changes. > > You only need to add the protocol definition to: > > nftables/src/exthdr.c > > Would you have a look into this? This would be very much appreciated > to we keep nftables in sync with what we have in iptables. Yes, I look into it. I will send you a patch for nf_tables as well. 
> > > + > > +/** > > + * struct ip6t_srh - SRH match options > > + * @ next_hdr: Next header field of SRH > > + * @ hdr_len: Extension header length field of SRH > > + * @ segs_left: Segments left field of SRH > > + * @ last_entry: Last entry field of SRH > > + * @ tag: Tag field of SRH > > + * @ mt_flags: match options > > + * @ mt_invflags: Invert the sense of match options > > + */ > > + > > +struct ip6t_srh { > > + __u8next_hdr; > > + __u8hdr_len; > > + __u8segs_left; > > + __u8last_entry; > > + __u16 tag; > > + __u16 mt_flags; > > + __u16 mt_invflags; > > +}; > > + > > +#endif /*_IP6T_SRH_H*/ > > diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig > > index 6acb2ee..e1818eb 100644 > > --- a/net/ipv6/netfilter/Kconfig > > +++ b/net/ipv6/netfilter/Kconfig > > @@ -232,6 +232,15 @@ config IP6_NF_MATCH_RT > > > > To compile it as a module, choose M here. If unsure,
[net-next v2] netfilter: add segment routing header 'srh' match
It allows matching packets based on Segment Routing Header (SRH) information. The implementation considers revision 7 of the SRH draft. https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07 Currently supported match options include: (1) Next Header (2) Hdr Ext Len (3) Segments Left (4) Last Entry (5) Tag value of SRH Signed-off-by: Ahmed Abdelsalam --- include/uapi/linux/netfilter_ipv6/ip6t_srh.h | 57 ++ net/ipv6/netfilter/Kconfig | 9 ++ net/ipv6/netfilter/Makefile | 1 + net/ipv6/netfilter/ip6t_srh.c| 161 +++ 4 files changed, 228 insertions(+) create mode 100644 include/uapi/linux/netfilter_ipv6/ip6t_srh.h create mode 100644 net/ipv6/netfilter/ip6t_srh.c diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h new file mode 100644 index 000..cebf4e8 --- /dev/null +++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h @@ -0,0 +1,57 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _IP6T_SRH_H +#define _IP6T_SRH_H + +#include +#include + +/* Values for "mt_flags" field in struct ip6t_srh */ +#define IP6T_SRH_NEXTHDR0x0001 +#define IP6T_SRH_LEN_EQ 0x0002 +#define IP6T_SRH_LEN_GT 0x0004 +#define IP6T_SRH_LEN_LT 0x0008 +#define IP6T_SRH_SEGS_EQ0x0010 +#define IP6T_SRH_SEGS_GT0x0020 +#define IP6T_SRH_SEGS_LT0x0040 +#define IP6T_SRH_LAST_EQ0x0080 +#define IP6T_SRH_LAST_GT0x0100 +#define IP6T_SRH_LAST_LT0x0200 +#define IP6T_SRH_TAG0x0400 +#define IP6T_SRH_MASK 0x07FF + +/* Values for "mt_invflags" field in struct ip6t_srh */ +#define IP6T_SRH_INV_NEXTHDR0x0001 +#define IP6T_SRH_INV_LEN_EQ 0x0002 +#define IP6T_SRH_INV_LEN_GT 0x0004 +#define IP6T_SRH_INV_LEN_LT 0x0008 +#define IP6T_SRH_INV_SEGS_EQ0x0010 +#define IP6T_SRH_INV_SEGS_GT0x0020 +#define IP6T_SRH_INV_SEGS_LT0x0040 +#define IP6T_SRH_INV_LAST_EQ0x0080 +#define IP6T_SRH_INV_LAST_GT0x0100 +#define IP6T_SRH_INV_LAST_LT0x0200 +#define IP6T_SRH_INV_TAG0x0400 +#define IP6T_SRH_INV_MASK 0x07FF + +/** + * struct ip6t_srh - SRH match options + * @ next_hdr: Next header field of SRH + * @ hdr_len: Extension header length field of SRH + * @ segs_left: Segments left field of SRH + * @ last_entry: Last entry field of SRH + * @ tag: Tag field of SRH + * @ mt_flags: match options + * @ mt_invflags: Invert the sense of match options + */ + +struct ip6t_srh { + __u8next_hdr; + __u8hdr_len; + __u8segs_left; + __u8last_entry; + __u16 tag; + __u16 mt_flags; + __u16 mt_invflags; +}; + +#endif /*_IP6T_SRH_H*/ diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig index 6acb2ee..e1818eb 100644 --- a/net/ipv6/netfilter/Kconfig +++ b/net/ipv6/netfilter/Kconfig @@ -232,6 +232,15 @@ config IP6_NF_MATCH_RT To compile it as a module, choose M here. If unsure, say N. +config IP6_NF_MATCH_SRH +tristate '"srh" Segment Routing header match support' +depends on NETFILTER_ADVANCED +help + srh matching allows you to match packets based on the segment + routing header of the packet. + + To compile it as a module, choose M here. If unsure, say N. 
+ # The targets config IP6_NF_TARGET_HL tristate '"HL" hoplimit target support' diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile index c6ee0cd..e0d51a9 100644 --- a/net/ipv6/netfilter/Makefile +++ b/net/ipv6/netfilter/Makefile @@ -54,6 +54,7 @@ obj-$(CONFIG_IP6_NF_MATCH_MH) += ip6t_mh.o obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o obj-$(CONFIG_IP6_NF_MATCH_RPFILTER) += ip6t_rpfilter.o obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o +obj-$(CONFIG_IP6_NF_MATCH_SRH) += ip6t_srh.o # targets obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o diff --git a/net/ipv6/netfilter/ip6t_srh.c b/net/ipv6/netfilter/ip6t_srh.c new file mode 100644 index 000..9642164 --- /dev/null +++ b/net/ipv6/netfilter/ip6t_srh.c @@ -0,0 +1,161 @@ +/* Kernel module to match Segment Routing Header (SRH) parameters. */ + +/* Author: + * Ahmed Abdelsalam + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +/* Test a struct->mt_invflags and a boolean for inequali
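For readers new to xtables match extensions, the mt_flags/mt_invflags pair works the same way for every SRH field: a criterion is only evaluated when its bit is set in mt_flags, and the matching bit in mt_invflags inverts the verdict (the "!" form of the rule). An illustrative helper for the segments-left equality check (not the module's code, just the pattern, with the flag values taken from the header above):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define IP6T_SRH_SEGS_EQ     0x0010
  #define IP6T_SRH_INV_SEGS_EQ 0x0010

  struct srh_rule {
          uint8_t  segs_left;
          uint16_t mt_flags;
          uint16_t mt_invflags;
  };

  /* Skip the criterion if it was not requested, otherwise compare and
   * apply the invert flag. The module repeats this pattern for every
   * field: next header, length, segments left, last entry, tag. */
  static bool match_segs_left_eq(const struct srh_rule *r, uint8_t pkt_segs_left)
  {
          bool hit;

          if (!(r->mt_flags & IP6T_SRH_SEGS_EQ))
                  return true;            /* criterion not requested */

          hit = (pkt_segs_left == r->segs_left);
          if (r->mt_invflags & IP6T_SRH_INV_SEGS_EQ)
                  hit = !hit;             /* inverted ("!") match */
          return hit;
  }

  int main(void)
  {
          struct srh_rule r = { .segs_left = 3, .mt_flags = IP6T_SRH_SEGS_EQ };

          printf("segs_left=3: %d, segs_left=2: %d\n",
                 match_segs_left_eq(&r, 3), match_segs_left_eq(&r, 2));
          return 0;
  }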
[PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Signed-off-by: KarimAllah Ahmed Signed-off-by: Ashok Raj --- arch/x86/kvm/cpuid.c | 4 +++- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/vmx.c | 63 3 files changed, 67 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..dc78095 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void) /* These are scattered features in cpufeatures.h. */ #define KVM_CPUID_BIT_AVX512_4VNNIW 2 #define KVM_CPUID_BIT_AVX512_4FMAPS 3 +#define KVM_CPUID_BIT_SPEC_CTRL 26 #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \ + (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index cdc70a3..dcfe227 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aa8638a..1b743a0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); + static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, m->host[i].value = host_val; } +/* do not touch guest_val and host_val if the msr is not found */ +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, + u64 *guest_val, u64 *host_val) +{ + unsigned i; + struct msr_autoload *m = &vmx->msr_autoload; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + break; + + if (i == m->nr) + return 1; + + if (guest_val) + *guest_val = m->guest[i].value; + if (host_val) + *host_val = m->host[i].value; + + return 0; +} + static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) { u64 guest_efer = vmx->vcpu.arch.efer; @@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, */ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { + u64 spec_ctrl = 0; struct 
shared_msr_entry *msr; + struct vcpu_vmx *vmx = to_vmx(vcpu); switch (msr_info->index) { #ifdef CONFIG_X86_64 @@ -3223,6 +3250,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) + return 1; + + /* +* If the MSR is not in the atomic list yet, then it was never +* written to. So the MSR value will be '0'. +*/ + read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &spec_ctrl, NULL); + + msr_info->data = sp
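The wrmsr side that pairs with read_atomic_switch_msr() (the hunk is truncated above) follows the commit message: MSR_IA32_SPEC_CTRL is only added to the atomic switch list the first time the guest writes a non-zero value, so guests that never enable IBRS never pay the VM-entry/exit save-restore cost, and a read before any write simply returns 0. A minimal model of that policy (the slot structure and helper names are invented for the sketch):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Models one entry of the VMX MSR autoload list. */
  static struct { bool present; uint64_t guest; } spec_ctrl_slot;

  static void guest_wrmsr_spec_ctrl(uint64_t data)
  {
          if (!data && !spec_ctrl_slot.present)
                  return;                 /* writing 0 before first use: stay off the list */
          spec_ctrl_slot.present = true;  /* add_atomic_switch_msr() in the real code */
          spec_ctrl_slot.guest = data;
  }

  static uint64_t guest_rdmsr_spec_ctrl(void)
  {
          /* Not on the list means the guest never wrote non-zero: read back 0. */
          return spec_ctrl_slot.present ? spec_ctrl_slot.guest : 0;
  }

  int main(void)
  {
          printf("before any write: %llu\n", (unsigned long long)guest_rdmsr_spec_ctrl());
          guest_wrmsr_spec_ctrl(1);       /* guest enables IBRS */
          printf("after write:      %llu\n", (unsigned long long)guest_rdmsr_spec_ctrl());
          return 0;
  }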
[PATCH v2 2/4] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed --- v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). --- arch/x86/kvm/cpuid.c | 4 +++- arch/x86/kvm/vmx.c | 65 arch/x86/kvm/x86.c | 1 + 3 files changed, 69 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..32c0c14 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void) /* These are scattered features in cpufeatures.h. */ #define KVM_CPUID_BIT_AVX512_4VNNIW 2 #define KVM_CPUID_BIT_AVX512_4FMAPS 3 +#define KVM_CPUID_BIT_IBRS 26 #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \ + (boot_cpu_has(X86_FEATURE_IBRS) ? 
KF(IBRS) : 0); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aa8638a..dac564d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -920,6 +920,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -2007,6 +2009,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, m->host[i].value = host_val; } +/* do not touch guest_val and host_val if the msr is not found */ +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, + u64 *guest_val, u64 *host_val) +{ + unsigned i; + struct msr_autoload *m = &vmx->msr_autoload; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + break; + + if (i == m->nr) + return 1; + + if (guest_val) + *guest_val = m->guest[i].value; + if (host_val) + *host_val = m->host[i].value; + + return 0; +} + static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) { u64 guest_efer = vmx->vcpu.arch.efer; @@ -3203,7 +3227,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, */ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { + u64 spec_ctrl = 0; struct shared_msr_entry *msr; + struct vcpu_vmx *vmx = to_vmx(vcpu); switch (msr_info->index) { #ifdef CONFIG_X86_64 @@ -3223,6 +3249,20 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) + return 1; + + /* +* If the MSR is not in the atomic list yet, then the guest +* never wrote a non-zero value to it yet i.e. the MSR value is +* '0'. +*/ + read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &spec_ctrl, NULL); + + msr_info->data = spec_ctrl; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); bre
[PATCH v2 1/4] x86: kvm: Update the reverse_cpuid list to include CPUID_7_EDX
Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: x...@kernel.org Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/cpuid.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index cdc70a3..dcfe227 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) -- 2.7.4
[PATCH v2 4/4] x86: vmx: Allow direct access to MSR_IA32_ARCH_CAPABILITIES
Add direct access to MSR_IA32_ARCH_CAPABILITIES for guests. Future Intel processors will use this MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/cpuid.c | 4 +++- arch/x86/kvm/vmx.c | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 32c0c14..2339b1a 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -71,6 +71,7 @@ u64 kvm_supported_xcr0(void) #define KVM_CPUID_BIT_AVX512_4VNNIW 2 #define KVM_CPUID_BIT_AVX512_4FMAPS 3 #define KVM_CPUID_BIT_IBRS 26 +#define KVM_CPUID_BIT_ARCH_CAPABILITIES 29 #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -394,7 +395,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \ - (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0); + (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0) | \ + (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) ? KF(ARCH_CAPABILITIES) : 0); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f82a44c..99cb761 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -9617,6 +9617,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (boot_cpu_has(X86_FEATURE_IBPB)) vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_RW); + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_ARCH_CAPABILITIES, MSR_TYPE_R); vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); -- 2.7.4
[PATCH v2 0/4] KVM: Expose speculation control feature to guests
Add direct access to speculation control MSRs for KVM guests. This allows the guest to protect itself against Spectre V2 using IBRS+IBPB instead of a retpoline+IBPB based approach. It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future Intel processors to indicate RDCL_NO and IBRS_ALL. Ashok Raj (1): x86/kvm: Add IBPB support KarimAllah Ahmed (3): x86: kvm: Update the reverse_cpuid list to include CPUID_7_EDX x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL x86: vmx: Allow direct access to MSR_IA32_ARCH_CAPABILITIES arch/x86/kvm/cpuid.c | 6 - arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/svm.c | 14 +++ arch/x86/kvm/vmx.c | 71 arch/x86/kvm/x86.c | 1 + 5 files changed, 92 insertions(+), 1 deletion(-) Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Arjan van de Ven Cc: Ashok Raj Cc: Asit Mallick Cc: Borislav Petkov Cc: Dan Williams Cc: Dave Hansen Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Janakarajan Natarajan Cc: Joerg Roedel Cc: Jun Nakajima Cc: Laura Abbott Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Tim Chen Cc: Tom Lendacky Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: x...@kernel.org -- 2.7.4
[PATCH v2 3/4] x86/kvm: Add IBPB support
From: Ashok Raj Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor barriers on switching between VMs to avoid inter VM Spectre-v2 attacks. [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD whenever it is available - svm: only pass through IBPB if it is available] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/svm.c | 14 ++ arch/x86/kvm/vmx.c | 4 2 files changed, 18 insertions(+) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2744b973..c886e46 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -529,6 +529,7 @@ struct svm_cpu_data { struct kvm_ldttss_desc *tss_desc; struct page *save_area; + struct vmcb *current_vmcb; }; static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); @@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm) set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1); } + + if (boot_cpu_has(X86_FEATURE_IBPB)) + set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1); } static void add_msr_offset(u32 offset) @@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); + /* +* The vmcb page can be recycled, causing a false negative in +* svm_vcpu_load(). So do a full IBPB now. +*/ + indirect_branch_prediction_barrier(); } static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_svm *svm = to_svm(vcpu); + struct svm_cpu_data *sd = per_cpu(svm_data, cpu); int i; if (unlikely(cpu != vcpu->cpu)) { @@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (static_cpu_has(X86_FEATURE_RDTSCP)) wrmsrl(MSR_TSC_AUX, svm->tsc_aux); + if (sd->current_vmcb != svm->vmcb) { + sd->current_vmcb = svm->vmcb; + indirect_branch_prediction_barrier(); + } avic_vcpu_load(vcpu, cpu); } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index dac564d..f82a44c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2296,6 +2296,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) { per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; vmcs_load(vmx->loaded_vmcs->vmcs); + indirect_branch_prediction_barrier(); } if (!already_loaded) { @@ -9613,6 +9614,9 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) goto free_msrs; msr_bitmap = vmx->vmcs01.msr_bitmap; + + if (boot_cpu_has(X86_FEATURE_IBPB)) + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_RW); vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); -- 2.7.4
Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/28/2018 09:21 PM, Konrad Rzeszutek Wilk wrote: On January 28, 2018 2:29:10 PM EST, KarimAllah Ahmed wrote: Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. We tried this and found that it was about 3% slower that doing the old way of rdmsr and wrmsr. I actually have not measured the performance difference between using the atomic_switch vs just just doing rdmsr/wrmsr. I was mostly focused on not saving and restoring when the guest does not actually use the MSRs. Interesting data point though, I will update the code to use rdmsr/wrmsr and see if I see it in my hardware (I am using a skylake processor). But that was also with the host doing IBRS as well. On what type of hardware did you run this? Ccing Daniel. Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Signed-off-by: KarimAllah Ahmed Signed-off-by: Ashok Raj --- arch/x86/kvm/cpuid.c | 4 +++- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/vmx.c | 63 3 files changed, 67 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..dc78095 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void) /* These are scattered features in cpufeatures.h. */ #define KVM_CPUID_BIT_AVX512_4VNNIW 2 #define KVM_CPUID_BIT_AVX512_4FMAPS 3 +#define KVM_CPUID_BIT_SPEC_CTRL 26 #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \ + (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? 
KF(SPEC_CTRL) : 0); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index cdc70a3..dcfe227 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aa8638a..1b743a0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); + static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, m->host[i].value = host_val; } +/* do not touch guest_val and host_val if the msr is not found */ +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, + u64 *guest_val, u64 *host_val) +{ + unsigned i; + struct msr_autoload *m = &vmx->msr_autoload; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + break; + + if (i == m->nr) + return 1; + + if (guest_val) + *guest_val = m->guest[i].value; + if (host_val) + *host_val = m->host[i].value; + + return 0; +} + static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) { u64 guest_efer = vmx->vcpu.arch.efer; @@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, */ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { + u64 spec_ctrl = 0; struct shared_msr_entry *msr; + struct vcpu_vmx *vmx = to_vmx(vcpu); switch (msr_info->index) { #ifdef CONFIG_X86_64 @@ -3223,6 +3250,19 @@ static int vmx_get_msr(struct kvm_vcp
Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/29/2018 09:46 AM, David Woodhouse wrote: On Sun, 2018-01-28 at 16:39 -0800, Liran Alon wrote: Windows use IBRS and Microsoft don't have any plans to switch to retpoline. Running a Windows guest should be a pretty common use-case no? In addition, your handle of the first WRMSR intercept could be different. It could signal you to start doing the following: 1. Disable intercept on SPEC_CTRL MSR. 2. On VMEntry, Write vCPU SPEC_CTRL value into physical MSR. 3. On VMExit, read physical MSR into vCPU SPEC_CTRL value. (And if IBRS is used at host, also set physical SPEC_CTRL MSR here to 1) That way, you will both have fastest option as long as guest don't use IBRS and also won't have the 3% performance hit compared to Konrad's proposal. Am I missing something? Reads from the SPEC_CTRL MSR are strangely slow. I suspect a large part of the 3% speedup you observe is because in the above, the vmentry path doesn't need to *read* the host's value and store it; the host is expected to restore it for itself anyway? I'd actually quite like to repeat the benchmark on the new fixed microcode, if anyone has it yet, to see if that read/swap slowness is still quite as excessive. I'm certainly not ruling this out, but I'm just a little wary of premature optimisation, and I'd like to make sure we have everything *else* in the KVM patches right first. The fact that the save-and-restrict macros I have in the tip of my working tree at the moment are horrid and causing 0-day nastygrams, probably doesn't help persuade me to favour the approach ;) ... hm, the CPU actually has separate MSR save/restore lists for entry/exit, doesn't it? Is there any way to sanely make use of that and do the restoration manually on vmentry but let it be automatic on vmexit, by having it *only* in the guest's MSR-store area to be saved on exit and restored on exit, but *not* in the host's MSR-store area? Reading the code and comparing with the SDM, I can't see where we're ever setting VM_EXIT_MSR_STORE_{ADDR,COUNT} except in the nested case... Hmmm ... you are probably right! I think all users of this interface always trap + update save area and never passthrough the MSR. That is why only LOAD is needed *so far*. Okay, let me sort this out in v3 then. Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
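The VMCS fields referred to here do exist as VM_EXIT_MSR_STORE_ADDR and VM_EXIT_MSR_STORE_COUNT. A minimal sketch of the scheme being floated (automatic save of the guest value on VM-exit, manual restore before VM-entry) could look like the following; msr_store is a hypothetical per-vCPU buffer, since, as noted in the thread, KVM only programs these fields for the nested case today:

        struct vmx_msr_entry *store = vmx->msr_store;   /* hypothetical per-vCPU buffer */

        /* Have the CPU dump the guest's SPEC_CTRL into 'store' on every VM-exit. */
        store[0].index = MSR_IA32_SPEC_CTRL;
        store[0].value = 0;
        vmcs_write64(VM_EXIT_MSR_STORE_ADDR, __pa(store));
        vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 1);

        /* ... later, in the entry path ... */

        /* Restore the guest value by hand shortly before each VM-entry. */
        if (store[0].value)
                wrmsrl(MSR_IA32_SPEC_CTRL, store[0].value);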
NOTE
Attn: I was able to trace a huge sum of money in my department that belongs to our deceased customer, according to my findings. I want to present you as the beneficiary of this huge sum of money. I will give you the full explanation as soon as you respond to this email. Ahmed Zama
Re: [PATCH v2 2/4] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/29/2018 11:44 AM, Paolo Bonzini wrote: On 29/01/2018 01:58, KarimAllah Ahmed wrote: Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. You are not storing the guest's MSR value on though vmexit, aren't you? I originally thought that atomic_switch was also saving the guest MSR on VM-exit. Now I know it is not. Also, there's an obvious typo here: + add_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, msr_info->data, 0); + + msr_bitmap = vmx->vmcs01.msr_bitmap; + vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); + oops! copy & paste error :) Finally, apparently add_atomic_switch_msr is slower than just rdmsr/wrmsr on vmexit. Can you reuse the patches I had posted mid January instead? They are also assuming no IBRS usage on the host, so the changes shouldn't be large, and limited mostly to using actual X86_FEATURE_* bits instead of cpuid_count(). They lack the code to only read/write SPEC_CTRL if the direct access is enabled, but that's small too... Enabling the direct access on the first write, as in this patches, is okay. Thanks, Paolo Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed --- v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). --- arch/x86/kvm/cpuid.c | 4 +++- arch/x86/kvm/vmx.c | 65 arch/x86/kvm/x86.c | 1 + 3 files changed, 69 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..32c0c14 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void) /* These are scattered features in cpufeatures.h. */ #define KVM_CPUID_BIT_AVX512_4VNNIW 2 #define KVM_CPUID_BIT_AVX512_4FMAPS 3 +#define KVM_CPUID_BIT_IBRS 26 #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \ + (boot_cpu_has(X86_FEATURE_IBRS) ? 
KF(IBRS) : 0); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aa8638a..dac564d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -920,6 +920,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -2007,6 +2009,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, m->host[i].value = host_val; } +/* do not touch guest_val and host_val if the msr is not found */ +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, + u64 *guest_val, u64 *host_val) +{ + unsigned i; + struct msr_autoload *m = &vmx->msr_autoload; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + break; + + if (i == m->nr) + return 1; + + if (guest_val) + *guest_val = m->guest[i].value; + if (host_val) + *host_val = m->host[i].value; + + return 0; +} + static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) { u64 guest_efer = vmx->vcpu.arch.efer; @@ -3203,7 +3227,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, */ static int vmx_g
Re: [PATCH v2 4/4] x86: vmx: Allow direct access to MSR_IA32_ARCH_CAPABILITIES
On 01/29/2018 07:55 PM, Jim Mattson wrote: Why should this MSR be pass-through? I doubt that it would be accessed frequently. True. Will update it to be emulated and allow user-space to set the value exposed. On Sun, Jan 28, 2018 at 4:58 PM, KarimAllah Ahmed wrote: Add direct access to MSR_IA32_SPEC_CTRL for guests. Future intel processors will use this MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/cpuid.c | 4 +++- arch/x86/kvm/vmx.c | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 32c0c14..2339b1a 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -71,6 +71,7 @@ u64 kvm_supported_xcr0(void) #define KVM_CPUID_BIT_AVX512_4VNNIW 2 #define KVM_CPUID_BIT_AVX512_4FMAPS 3 #define KVM_CPUID_BIT_IBRS 26 +#define KVM_CPUID_BIT_ARCH_CAPABILITIES 29 #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -394,7 +395,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \ - (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0); + (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0) | \ + (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) ? KF(ARCH_CAPABILITIES) : 0); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f82a44c..99cb761 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -9617,6 +9617,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (boot_cpu_has(X86_FEATURE_IBPB)) vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_RW); + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_ARCH_CAPABILITIES, MSR_TYPE_R); vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); -- 2.7.4 Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
On 01/29/2018 08:04 PM, Jim Mattson wrote: Can I assume you'll send out a new version with the fixes? Yes, I am currently doing some tests and once I am done I will send a new round. ... and the typo is already fixed in 'ibpb-wip' :) On Mon, Jan 29, 2018 at 11:01 AM, David Woodhouse wrote: (Top-posting; sorry.) Much of that is already fixed during our day, in http://git.infradead.org/linux-retpoline.git/shortlog/refs/heads/ibpb I forgot to fix up the wrong-MSR typo though, and we do still need to address reset. On Mon, 2018-01-29 at 10:43 -0800, Jim Mattson wrote: On Sun, Jan 28, 2018 at 11:29 AM, KarimAllah Ahmed wrote: Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Signed-off-by: KarimAllah Ahmed Signed-off-by: Ashok Raj --- arch/x86/kvm/cpuid.c | 4 +++- arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/vmx.c | 63 3 files changed, 67 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..dc78095 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void) /* These are scattered features in cpufeatures.h. */ #define KVM_CPUID_BIT_AVX512_4VNNIW 2 #define KVM_CPUID_BIT_AVX512_4FMAPS 3 +#define KVM_CPUID_BIT_SPEC_CTRL 26 #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \ + (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0); Isn't 'boot_cpu_has()' superflous here? And aren't there two bits to pass through for existing CPUs (26 and 27)? 
/* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index cdc70a3..dcfe227 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aa8638a..1b743a0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); + static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, m->host[i].value = host_val; } +/* do not touch guest_val and host_val if the msr is not found */ +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, + u64 *guest_val, u64 *host_val) +{ + unsigned i; + struct msr_autoload *m = &vmx->msr_autoload; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + break; + + if (i == m->nr) + return 1; + + if (guest_val) + *guest_val = m->guest[i].value; + if (host_val) + *host_val = m->host[i].value; + + return 0; +} + static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) { u64 guest_efer = vmx->vcpu.arch.efer; @@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, */ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { + u64 spec_ctrl = 0; struct shared_msr_entry *
[PATCH v3 1/4] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
[dwmw2: Stop using KF() for bits in it, too] Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: x...@kernel.org Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 8 +++- arch/x86/kvm/cpuid.h | 1 + 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..c0eb337 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void) #define F(x) bit(X86_FEATURE_##x) -/* These are scattered features in cpufeatures.h. */ -#define KVM_CPUID_BIT_AVX512_4VNNIW 2 -#define KVM_CPUID_BIT_AVX512_4FMAPS 3 +/* For scattered features from cpufeatures.h; we currently expose none */ #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) entry->ecx &= ~F(PKU); entry->edx &= kvm_cpuid_7_0_edx_x86_features; - entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX); + cpuid_mask(&entry->edx, CPUID_7_EDX); } else { entry->ebx = 0; entry->ecx = 0; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index cdc70a3..dcfe227 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) -- 2.7.4
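The reason the table needs a CPUID_7_EDX row is that guest_cpuid_has() resolves an X86_FEATURE_* bit to a guest CPUID leaf through reverse_cpuid[]. A simplified sketch of that existing lookup (not code added by this patch, and the helper name here is only for illustration):

static __always_inline bool guest_has_feature_sketch(struct kvm_vcpu *vcpu,
                                                     unsigned x86_feature)
{
        /* x86_feature_cpuid() indexes reverse_cpuid[] by feature word. */
        struct cpuid_reg cpuid = x86_feature_cpuid(x86_feature);
        struct kvm_cpuid_entry2 *entry;
        u32 reg;

        entry = kvm_find_cpuid_entry(vcpu, cpuid.function, cpuid.index);
        if (!entry)
                return false;

        /* Pick the CPUID output register named by the table row. */
        switch (cpuid.reg) {
        case CPUID_EAX: reg = entry->eax; break;
        case CPUID_EBX: reg = entry->ebx; break;
        case CPUID_ECX: reg = entry->ecx; break;
        default:        reg = entry->edx; break;
        }

        /* Test the feature's bit within that register. */
        return reg & bit(x86_feature & 31);
}

Without the [CPUID_7_EDX] entry, feature bits that live in CPUID.7.0:EDX could not be resolved this way, which is what the later patches in this series rely on.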
[PATCH v3 2/4] KVM: x86: Add IBPB support
From: Ashok Raj Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor barriers on switching between VMs to avoid inter VM Spectre-v2 attacks. [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD whenever it is available - svm: only pass through IBPB if it is available - vmx: support !cpu_has_vmx_msr_bitmap()] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Signed-off-by: Ashok Raj Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com Signed-off-by: David Woodhouse Signed-off-by: KarimAllah Ahmed --- arch/x86/kvm/cpuid.c | 11 ++- arch/x86/kvm/svm.c | 14 ++ arch/x86/kvm/vmx.c | 12 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index c0eb337..033004d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + /* cpuid 0x8008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = + F(IBPB); + /* cpuid 0xC001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, if (!g_phys_as) g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); - entry->ebx = entry->edx = 0; + entry->edx = 0; + /* IBPB isn't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entry->ebx |= F(IBPB); + entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; + cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; } case 0x8019: diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2744b973..c886e46 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -529,6 +529,7 @@ struct svm_cpu_data { struct kvm_ldttss_desc *tss_desc; struct page *save_area; + struct vmcb *current_vmcb; }; static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); @@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm) set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1); } + + if (boot_cpu_has(X86_FEATURE_IBPB)) + set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1); } static void add_msr_offset(u32 offset) @@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); + /* +* The vmcb page can be recycled, causing a false negative in +* svm_vcpu_load(). So do a full IBPB now. 
+*/ + indirect_branch_prediction_barrier(); } static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_svm *svm = to_svm(vcpu); + struct svm_cpu_data *sd = per_cpu(svm_data, cpu); int i; if (unlikely(cpu != vcpu->cpu)) { @@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (static_cpu_has(X86_FEATURE_RDTSCP)) wrmsrl(MSR_TSC_AUX, svm->tsc_aux); + if (sd->current_vmcb != svm->vmcb) { + sd->current_vmcb = svm->vmcb; + indirect_branch_prediction_barrier(); + } avic_vcpu_load(vcpu, cpu); } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aa8638a..ea278ce 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2272,6 +2272,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) { per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; vmcs_load(vmx->loaded_vmcs->vmcs); + indirect_branch_prediction_barrier(); } if (!already_loaded) { @@ -3330,6 +3331,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; + case MSR_IA32_PRED_CMD: + if (!msr_info
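The vmx_set_msr() hunk is truncated above; judging from the context lines quoted with the next patch, its shape is essentially the following, with PRED_CMD treated as a write-only trigger rather than stored guest state. The guard checks shown here are a sketch rather than the exact hunk:

        case MSR_IA32_PRED_CMD:
                if (!msr_info->host_initiated &&
                    !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
                        return 1;

                /*
                 * PRED_CMD is write-only and only the IBPB bit is defined;
                 * there is no value to remember for later reads.
                 */
                if (msr_info->data & PRED_CMD_IBPB)
                        wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
                break;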
[PATCH v3 3/4] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
Future intel processors will use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx.c | 15 +++ arch/x86/kvm/x86.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 033004d..1909635 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ea278ce..798a00b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -581,6 +581,8 @@ struct vcpu_vmx { u64 msr_host_kernel_gs_base; u64 msr_guest_kernel_gs_base; #endif + u64 arch_capabilities; + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; u32 secondary_exec_control; @@ -3224,6 +3226,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) + return 1; + msr_info->data = to_vmx(vcpu)->arch_capabilities; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -3339,6 +3347,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) if (data & PRED_CMD_IBPB) wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vmx->arch_capabilities = data; + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -5599,6 +5612,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 03869eb..8e889dc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1006,6 +1006,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, + MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; -- 2.7.4
[PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Ashok Raj ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only add_atomic_switch_msr when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Cc: Asit Mallick Cc: Arjan Van De Ven Cc: Dave Hansen Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Linus Torvalds Cc: Tim Chen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Paolo Bonzini Cc: David Woodhouse Cc: Greg KH Cc: Andy Lutomirski Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- v2: - remove 'host_spec_ctrl' in favor of only a comment (dwmw@). - special case writing '0' in SPEC_CTRL to avoid confusing live-migration when the instance never used the MSR (dwmw@). - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@). - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident). v3: - Save/restore manually - Fix CPUID handling - Fix a copy & paste error in the name of SPEC_CTRL MSR in disable_intercept. - support !cpu_has_vmx_msr_bitmap() --- arch/x86/kvm/cpuid.c | 7 +-- arch/x86/kvm/vmx.c | 59 arch/x86/kvm/x86.c | 2 +- 3 files changed, 65 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1909635..662d0c0 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | + F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBPB isn't necessarily present in hardware cpuid */ + /* IBRS and IBPB aren't necessarily present in hardware cpuid */ if (boot_cpu_has(X86_FEATURE_IBPB)) entry->ebx |= F(IBPB); + if (boot_cpu_has(X86_FEATURE_IBRS)) + entry->ebx |= F(IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 798a00b..9ac9747 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -582,6 +582,8 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif u64 arch_capabilities; + u64 spec_ctrl; + bool save_spec_ctrl_on_exit; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; @@ -922,6 +924,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3226,6 +3230,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case 
MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + msr_info->data = to_vmx(vcpu)->spec_ctrl; + break; case MSR_IA32_ARCH_CAPABILITIES: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) @@ -3339,6 +3350,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info);
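The vcpu_run side of the "save/restore manually" change is cut off above. Using the fields this patch introduces (spec_ctrl and save_spec_ctrl_on_exit), the intent can be sketched as:

        /* Shortly before VMLAUNCH/VMRESUME: install the guest's value. */
        if (vmx->spec_ctrl)
                wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);

        /* ... guest runs ... */

        /*
         * Right after VM-exit: capture what the guest left in the MSR and
         * put the host back to 0, but only for vCPUs that ever enabled
         * the direct access.
         */
        if (vmx->save_spec_ctrl_on_exit) {
                rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
                if (vmx->spec_ctrl)
                        wrmsrl(MSR_IA32_SPEC_CTRL, 0);
        }

Compared with add_atomic_switch_msr(), this keeps the costly RDMSR off the exit path for guests that never write the MSR, which is the concern raised earlier in the thread.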
[PATCH v3 0/4] KVM: Expose speculation control feature to guests
Add direct access to speculation control MSRs for KVM guests. This allows the guest to protect itself against Spectre V2 using IBRS+IBPB instead of a retpoline+IBPB based approach. It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future Intel processors to indicate RDCL_NO and IBRS_ALL. Ashok Raj (1): KVM: x86: Add IBPB support KarimAllah Ahmed (3): KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL arch/x86/kvm/cpuid.c | 22 ++ arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/svm.c | 14 + arch/x86/kvm/vmx.c | 86 arch/x86/kvm/x86.c | 1 + 5 files changed, 118 insertions(+), 6 deletions(-) Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Arjan van de Ven Cc: Ashok Raj Cc: Asit Mallick Cc: Borislav Petkov Cc: Dan Williams Cc: Dave Hansen Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Janakarajan Natarajan Cc: Joerg Roedel Cc: Jun Nakajima Cc: Laura Abbott Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Thomas Gleixner Cc: Tim Chen Cc: Tom Lendacky Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: x...@kernel.org -- 2.7.4
Re: [PATCH v3 3/4] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
On 01/30/2018 01:22 AM, Raj, Ashok wrote: On Tue, Jan 30, 2018 at 01:10:27AM +0100, KarimAllah Ahmed wrote: Future intel processors will use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Cc: Asit Mallick Cc: Dave Hansen Cc: Arjan Van De Ven Cc: Tim Chen Cc: Linus Torvalds Cc: Andrea Arcangeli Cc: Andi Kleen Cc: Thomas Gleixner Cc: Dan Williams Cc: Jun Nakajima Cc: Andy Lutomirski Cc: Greg KH Cc: Paolo Bonzini Cc: Ashok Raj Signed-off-by: KarimAllah Ahmed Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx.c | 15 +++ arch/x86/kvm/x86.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 033004d..1909635 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ea278ce..798a00b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -581,6 +581,8 @@ struct vcpu_vmx { u64 msr_host_kernel_gs_base; u64 msr_guest_kernel_gs_base; #endif + u64 arch_capabilities; + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; u32 secondary_exec_control; @@ -3224,6 +3226,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) + return 1; + msr_info->data = to_vmx(vcpu)->arch_capabilities; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -3339,6 +3347,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) if (data & PRED_CMD_IBPB) wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vmx->arch_capabilities = data; + break; arch capabilities is read only. You don't need the set_msr handling for this. This is only for host driven writes. This would allow QEMU/whatever to override the default value (i.e. the value from the hardware). case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -5599,6 +5612,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 03869eb..8e889dc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1006,6 +1006,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, + MSR_IA32_ARCH_CAPABILITIES Same here.. no need to save/restore this. }; static unsigned num_msrs_to_save; -- 2.7.4 Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 
38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
[PATCH] kvm: Map PFN-type memory regions as writable (if possible)
For EPT-violations that are triggered by a read, the pages are also mapped with write permissions (if their memory region is also writable). That would avoid getting yet another fault on the same page when a write occurs. This optimization only happens when you have a "struct page" backing the memory region. So also enable it for memory regions that do not have a "struct page". Cc: Paolo Bonzini Cc: Radim Krčmář Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed --- virt/kvm/kvm_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 97da45e..0efb089 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1534,6 +1534,8 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, goto retry; if (r < 0) pfn = KVM_PFN_ERR_FAULT; + if (writable) + *writable = true; } else { if (async && vma_is_valid(vma, write_fault)) *async = true; -- 2.7.4
[PATCH] pci: Store more data about VFs into the SRIOV struct
... to avoid reading them from the config space of all the PCI VFs. This is specially a useful optimization when bringing up thousands of VFs. Cc: Bjorn Helgaas Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed --- drivers/pci/iov.c | 20 ++-- drivers/pci/pci.h | 6 +- drivers/pci/probe.c | 42 -- 3 files changed, 55 insertions(+), 13 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 168328a..78e9595 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -129,7 +129,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno) if (!dev->is_physfn) return 0; - return dev->sriov->barsz[resno - PCI_IOV_RESOURCES]; + return dev->sriov->vf_barsz[resno - PCI_IOV_RESOURCES]; } int batch_pci_iov_add_virtfn(struct pci_dev *dev, struct pci_bus **bus, @@ -325,6 +325,20 @@ static void pci_iov_wq_fn(struct work_struct *work) kfree(req); } +static void pci_read_vf_config_common(struct pci_bus *bus, + struct pci_dev *dev) +{ + int devfn = pci_iov_virtfn_devfn(dev, 0); + + pci_bus_read_config_dword(bus, devfn, PCI_CLASS_REVISION, + &dev->sriov->vf_class); + pci_bus_read_config_word(bus, devfn, PCI_SUBSYSTEM_ID, +&dev->sriov->vf_subsystem_device); + pci_bus_read_config_word(bus, devfn, PCI_SUBSYSTEM_VENDOR_ID, +&dev->sriov->vf_subsystem_vendor); + pci_bus_read_config_byte(bus, devfn, PCI_HEADER_TYPE, &dev->sriov->vf_hdr_type); +} + static struct workqueue_struct *pci_iov_wq; static int __init init_pci_iov_wq(void) @@ -361,6 +375,8 @@ static int enable_vfs(struct pci_dev *dev, int nr_vfs) goto add_bus_fail; } + pci_read_vf_config_common(bus[0], dev); + while (remaining_vfs > 0) { bool ret; struct pci_iov_wq_item *req; @@ -617,7 +633,7 @@ static int sriov_init(struct pci_dev *dev, int pos) rc = -EIO; goto failed; } - iov->barsz[i] = resource_size(res); + iov->vf_barsz[i] = resource_size(res); res->end = res->start + resource_size(res) * total - 1; dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n", i, res, i, total); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index f6b58b3..3264c9e 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -271,7 +271,11 @@ struct pci_sriov { u16 driver_max_VFs; /* max num VFs driver supports */ struct pci_dev *dev;/* lowest numbered PF */ struct pci_dev *self; /* this PF */ - resource_size_t barsz[PCI_SRIOV_NUM_BARS]; /* VF BAR size */ + u8 vf_hdr_type; /* VF header type */ + u32 vf_class; /* VF device */ + u16 vf_subsystem_vendor;/* VF subsystem vendor */ + u16 vf_subsystem_device;/* VF subsystem device */ + resource_size_t vf_barsz[PCI_SRIOV_NUM_BARS]; /* VF BAR size */ bool drivers_autoprobe; /* auto probing of VFs by driver */ }; diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 14e0ea1..65099d0 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -175,6 +175,7 @@ static inline unsigned long decode_bar(struct pci_dev *dev, u32 bar) int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, struct resource *res, unsigned int pos) { + int bar = res - dev->resource; u32 l = 0, sz = 0, mask; u64 l64, sz64, mask64; u16 orig_cmd; @@ -194,9 +195,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, res->name = pci_name(dev); pci_read_config_dword(dev, pos, &l); - pci_write_config_dword(dev, pos, l | mask); - pci_read_config_dword(dev, pos, &sz); - pci_write_config_dword(dev, pos, l); + if (dev->is_virtfn) { + sz = dev->physfn->sriov->vf_barsz[bar] & 0x; + } else { + pci_write_config_dword(dev, pos, l | mask); + 
pci_read_config_dword(dev, pos, &sz); + pci_write_config_dword(dev, pos, l); + } /* * All bits set in sz means the device isn't working properly. @@ -236,9 +241,14 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, if (res->flags & IORESOURCE_MEM_64) { pci_read_config_dword(dev, pos + 4, &l); - pci_write_config_dword(dev, pos + 4, ~0); - pci_read_config_dword(dev, pos + 4, &sz); - pci_write
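The hunks above show the producer (pci_read_vf_config_common()) and one consumer (vf_barsz in __pci_read_base()); a consumer for the other cached fields is not visible in this excerpt. The intended use during VF enumeration would look roughly like the sketch below, which is an illustration of the idea rather than a hunk from the patch:

        if (dev->is_virtfn) {
                struct pci_sriov *iov = dev->physfn->sriov;

                /* All VFs of a PF are identical; reuse what was read once from VF 0. */
                dev->revision = iov->vf_class & 0xff;
                dev->class = iov->vf_class >> 8;
                dev->hdr_type = iov->vf_hdr_type & 0x7f;
                dev->subsystem_vendor = iov->vf_subsystem_vendor;
                dev->subsystem_device = iov->vf_subsystem_device;
        }

This replaces four config-space reads per VF with struct accesses, which is where the win comes from when bringing up thousands of VFs.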
[PATCH v2] kvm: Map PFN-type memory regions as writable (if possible)
For EPT-violations that are triggered by a read, the pages are also mapped with write permissions (if their memory region is also writable). That would avoid getting yet another fault on the same page when a write occurs. This optimization only happens when you have a "struct page" backing the memory region. So also enable it for memory regions that do not have a "struct page". Cc: Paolo Bonzini Cc: Radim Krčmář Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed --- v2: - Move setting writable to hva_to_pfn_remapped - Extend hva_to_pfn_remapped interface to accept writable as a parameter --- virt/kvm/kvm_main.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 97da45e..88702d5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1438,7 +1438,8 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault) static int hva_to_pfn_remapped(struct vm_area_struct *vma, unsigned long addr, bool *async, - bool write_fault, kvm_pfn_t *p_pfn) + bool write_fault, bool *writable, + kvm_pfn_t *p_pfn) { unsigned long pfn; int r; @@ -1464,6 +1465,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, } + if (writable) + *writable = true; /* * Get a reference here because callers of *hva_to_pfn* and @@ -1529,7 +1532,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (vma == NULL) pfn = KVM_PFN_ERR_FAULT; else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) { - r = hva_to_pfn_remapped(vma, addr, async, write_fault, &pfn); + r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn); if (r == -EAGAIN) goto retry; if (r < 0) -- 2.7.4
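For context, the 'writable' out-parameter that hva_to_pfn_remapped() now fills in is the same one the MMU fault path receives as its map_writable flag. A rough sketch of the consumer side (simplified, and not part of this patch):

        bool map_writable;
        kvm_pfn_t pfn;

        /* Resolve the faulting gfn; map_writable reports whether a writable mapping is allowed. */
        pfn = __gfn_to_pfn_memslot(slot, gfn, false /* atomic */, NULL /* async */,
                                   write_fault, &map_writable);

        /*
         * With this patch, a read fault on a VM_IO/VM_PFNMAP region can still
         * return map_writable == true, so the SPTE is created with write
         * permission and the subsequent write does not fault again.
         */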
[PATCH] pci: Do not read INTx PIN and LINE registers for virtual functions
... since INTx is not supported by-spec for virtual functions. Cc: Bjorn Helgaas Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed Signed-off-by: Jan H. Schönherr --- drivers/pci/probe.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 65099d0..61002fb 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1232,6 +1232,13 @@ static void pci_read_irq(struct pci_dev *dev) { unsigned char irq; + /* Virtual functions do not have INTx support */ + if (dev->is_virtfn) { + dev->pin = 0; + dev->irq = 0; + return; + } + pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq); dev->pin = irq; if (irq) -- 2.7.4
Re: [PATCH] pci: Do not read INTx PIN and LINE registers for virtual functions
On 01/17/2018 07:49 PM, Alex Williamson wrote: On Wed, 17 Jan 2018 19:30:29 +0100 KarimAllah Ahmed wrote: ... since INTx is not supported by-spec for virtual functions. But the spec also states that VFs must implement the interrupt pin register as read-only zero, so either this is redundant or it's a workaround for VFs that aren't quite compliant? Thanks, The end goal for me is just to NOT do the read across the PCI bus for no good reason. We have devices with thousands of virtual functions and this read is simply not useful in this case and can be optimized as I did. So from a functionality point of view probably the patch does not add any value as you mentioned, but it is really useful as a micro-optimization. Alex Cc: Bjorn Helgaas Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed Signed-off-by: Jan H. Schönherr --- drivers/pci/probe.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 65099d0..61002fb 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1232,6 +1232,13 @@ static void pci_read_irq(struct pci_dev *dev) { unsigned char irq; + /* Virtual functions do not have INTx support */ + if (dev->is_virtfn) { + dev->pin = 0; + dev->irq = 0; + return; + } + pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq); dev->pin = irq; if (irq) Amazon Development Center Germany GmbH Berlin - Dresden - Aachen main office: Krausenstr. 38, 10117 Berlin Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger Ust-ID: DE289237879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
[nf-next 1/3] netfilter: export SRH processing functions from seg6local
Some functions of seg6local are very useful to process SRv6 encapsulated packets This patch exports some functions of seg6local that are useful and can be re-used at different parts of the kernel, including netfilter. The set of exported functions are: (1) seg6_get_srh() (2) seg6_advance_nextseg() (3) seg6_lookup_nexthop Signed-off-by: Ahmed Abdelsalam --- include/net/seg6.h| 5 + net/ipv6/seg6_local.c | 37 - 2 files changed, 25 insertions(+), 17 deletions(-) diff --git a/include/net/seg6.h b/include/net/seg6.h index 099bad5..b637778 100644 --- a/include/net/seg6.h +++ b/include/net/seg6.h @@ -63,5 +63,10 @@ extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len); extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto); extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh); +extern struct ipv6_sr_hdr *seg6_get_srh(struct sk_buff *skb); +extern void seg6_advance_nextseg(struct ipv6_sr_hdr *srh, + struct in6_addr *daddr); +extern void seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr, + u32 tbl_id); #endif diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index ba3767e..1f1eaa3 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -59,7 +59,7 @@ static struct seg6_local_lwt *seg6_local_lwtunnel(struct lwtunnel_state *lwt) return (struct seg6_local_lwt *)lwt->data; } -static struct ipv6_sr_hdr *get_srh(struct sk_buff *skb) +struct ipv6_sr_hdr *seg6_get_srh(struct sk_buff *skb) { struct ipv6_sr_hdr *srh; int len, srhoff = 0; @@ -82,12 +82,13 @@ static struct ipv6_sr_hdr *get_srh(struct sk_buff *skb) return srh; } +EXPORT_SYMBOL_GPL(seg6_get_srh); static struct ipv6_sr_hdr *get_and_validate_srh(struct sk_buff *skb) { struct ipv6_sr_hdr *srh; - srh = get_srh(skb); + srh = seg6_get_srh(skb); if (!srh) return NULL; @@ -107,7 +108,7 @@ static bool decap_and_validate(struct sk_buff *skb, int proto) struct ipv6_sr_hdr *srh; unsigned int off = 0; - srh = get_srh(skb); + srh = seg6_get_srh(skb); if (srh && srh->segments_left > 0) return false; @@ -131,7 +132,7 @@ static bool decap_and_validate(struct sk_buff *skb, int proto) return true; } -static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr) +void seg6_advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr) { struct in6_addr *addr; @@ -139,9 +140,10 @@ static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr) addr = srh->segments + srh->segments_left; *daddr = *addr; } +EXPORT_SYMBOL_GPL(seg6_advance_nextseg); -static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr, - u32 tbl_id) +void seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr, +u32 tbl_id) { struct net *net = dev_net(skb->dev); struct ipv6hdr *hdr = ipv6_hdr(skb); @@ -188,6 +190,7 @@ static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr, skb_dst_drop(skb); skb_dst_set(skb, dst); } +EXPORT_SYMBOL_GPL(seg6_lookup_nexthop); /* regular endpoint function */ static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt) @@ -198,9 +201,9 @@ static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt) if (!srh) goto drop; - advance_nextseg(srh, &ipv6_hdr(skb)->daddr); + seg6_advance_nextseg(srh, &ipv6_hdr(skb)->daddr); - lookup_nexthop(skb, NULL, 0); + seg6_lookup_nexthop(skb, NULL, 0); return dst_input(skb); @@ -218,9 +221,9 @@ static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt) if (!srh) goto drop; - advance_nextseg(srh, &ipv6_hdr(skb)->daddr); 
+ seg6_advance_nextseg(srh, &ipv6_hdr(skb)->daddr); - lookup_nexthop(skb, &slwt->nh6, 0); + seg6_lookup_nexthop(skb, &slwt->nh6, 0); return dst_input(skb); @@ -237,9 +240,9 @@ static int input_action_end_t(struct sk_buff *skb, struct seg6_local_lwt *slwt) if (!srh) goto drop; - advance_nextseg(srh, &ipv6_hdr(skb)->daddr); + seg6_advance_nextseg(srh, &ipv6_hdr(skb)->daddr); - lookup_nexthop(skb, NULL, slwt->table); + seg6_lookup_nexthop(skb, NULL, slwt->table); return dst_input(skb); @@ -331,7 +334,7 @@ static int input_action_end_dx6(struct sk_buff *skb, if (!ipv6_addr_any(&slwt->nh6)) nhaddr = &slwt->nh6; - lookup_nexthop(skb, nhaddr, 0); + seg6_lookup_nexthop(skb, n
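The actual netfilter consumers of these exports are in the later patches of this series and are not shown here. As a purely hypothetical example of the kind of reuse the changelog has in mind, a netfilter hook could now do something like:

static unsigned int srv6_policy_hook(void *priv, struct sk_buff *skb,
                                     const struct nf_hook_state *state)
{
        /* seg6_get_srh() returns NULL if the packet carries no (valid) SRH. */
        struct ipv6_sr_hdr *srh = seg6_get_srh(skb);

        if (!srh)
                return NF_ACCEPT;

        /* Example policy only: drop SRv6 packets that are past their last segment. */
        if (srh->segments_left == 0)
                return NF_DROP;

        return NF_ACCEPT;
}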