OK

2019-09-11 Thread Ahmed Ahmed
Dear Friend.

  I am Mr. Ahmed Zama. I am sending this brief letter to solicit your
partnership in receiving €15 million into your account. I shall send you
more information and procedures when I receive a positive response from
you. If you are interested, send me the following immediately:
Full Names
Age
Nationality
Occupation
Scanned copy of your International Passport
Direct Telephone Lines
Mr Ahmed Zama


OK

2019-02-22 Thread Ahmed Ahmed
Greetings,

I humbly solicit your partnership to transfer €15 million into your
personal or company's account. As soon as the fund is successfully
transferred, you shall be entitled to 30% of the total sum, 60% will be
for me, and 10% will be set aside for expenses that may be incurred in
the process of transferring the fund. Contact me for a more detailed
explanation.

Kindly send me the following:

Full Names
Address
Occupation
Direct Mobile Telephone Lines
Nationality

Ahmed Zama
+22675844869


OK

2019-03-07 Thread Ahmed Ahmed
Greetings,

I humbly solicit your partnership to transfer €15 million into your
personal or company's account. I will offer you 30% of the total sum,
60% will be for me, and 10% will be set aside for expenses that may be
incurred in the process of transferring the fund. Contact me for a more
detailed explanation.

Kindly send me the following:

Full Names
Address
Occupation
Direct Mobile Telephone Lines
Nationality

Ahmed Zama


OK

2019-06-05 Thread Ahmed Ahmed
Greetings,

I humbly solicit your partnership to transfer €15 million into your
personal or company's account. Contact me for a more detailed
explanation.

Kindly send me the following:

Full Names
Address
Occupation
Direct Mobile Telephone Lines
Nationality

Ahmed Zama
+22675844869


OK

2019-04-13 Thread Ahmed Ahmed
Greetings,

I humbly solicit your partnership to transfer €15 million into your
personal or company's account. Contact me for a more detailed
explanation.

Kindly send me the following:

Full Names
Address
Occupation
Direct Mobile Telephone Lines
Nationality

Ahmed Zama


RE: MEETING IN DUBAI

2018-08-10 Thread AHMED





Dear,
I only need your help to meet with Mr Kelly Adams, who is right now in
Dubai. You will play the role of the beneficiary of the funds, and I
have agreed to give you 40% of the total sum. This money will be used
for investment in the UAE. The box is right now with a security company
in London; once you cooperate with my representative and have a meeting
together, we can then instruct the company to ship the consignment down
to Dubai through diplomatic shipment. The duty of my representative is
to stay with you until you receive the cash. Once you have everything in
your control, you will give him one million dollars in cash from the box
to bring back to me. As you know, my government over here has confiscated
all my bank accounts; I only have the said funds, kept secret for now.
Please keep this transaction private to yourself, then invest the rest
of the funds in a good business of your choice in any country in the
world. Just book your ticket to Dubai; in three days we can finish.
Not everything can be discussed on the phone or by email; most things
will be discussed face to face. Thanks for your understanding. Feel free
to call me at a time convenient for you.
Please cooperate with Mr Kelly Adams. Attached is my identity.
I am sending you a proposal.
Thanks for your understanding.
We will put all the money in real estate.
REPLY BACK HERE :  nikolai.nikolai...@gmail.com



Best Regards
MOHAMED ABDUL


HELLO DEAR

2019-04-25 Thread Ahmed



With Due Respect,

I know that this mail will come to you as a surprise, as we have never met 
before, but you need not worry, as I am contacting you independently of my 
investigation and no one is informed of this communication. I need your urgent 
assistance in transferring the sum of $11.3 million immediately to your private 
account. The money has been lying dormant in our bank for years now without 
anybody coming to claim it.

I want to release the money to you as the relative of our deceased customer 
(the account owner), who died along with his supposed next of kin on 16th 
October 2005. The banking laws here do not allow such money to stay more than 
14 years; after that, the money will be recalled to the bank's treasury account 
as an unclaimed fund.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

Please respond urgently, and delete this if you are not interested.

Best Regards,
Mr.Ahmed Ouedraogo.


Please Respond Urgently.

2019-05-25 Thread Ahmed



With due respect, I am inviting you to a business deal of eleven million three 
hundred thousand United States dollars, where this money can be shared between 
us.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

Please send your reply to my private email --- ouedraogoah...@outlook.com


With due respect.

2020-09-01 Thread Ahmed



Dear Friend,

I know that this mail will come to you as a surprise, as we have never met 
before, but you need not worry, as I am contacting you independently of my 
investigation and no one is informed of this communication. I need your urgent 
assistance in transferring the sum of $11.3 million immediately to your private 
account. The money has been lying dormant in our bank for years now without 
anybody coming to claim it.

I want to release the money to you as the relative of our deceased customer 
(the account owner), who died along with his supposed next of kin on 16th 
October 2005. The banking laws here do not allow such money to stay more than 
15 years; after that, the money will be recalled to the bank's treasury account 
as an unclaimed fund.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

Please respond urgently, and delete this if you are not interested.

Best Regards,
Ahmed Ouedraogo.


WITH DUE RESPECT.

2020-08-29 Thread Ahmed



Dear Friend,

I know that this mail will come to you as a surprise, as we have never met 
before, but you need not worry, as I am contacting you independently of my 
investigation and no one is informed of this communication. I need your urgent 
assistance in transferring the sum of $11.3 million immediately to your private 
account. The money has been lying dormant in our bank for years now without 
anybody coming to claim it.

I want to release the money to you as the relative of our deceased customer 
(the account owner), who died along with his supposed next of kin on 16th 
October 2005. The banking laws here do not allow such money to stay more than 
15 years; after that, the money will be recalled to the bank's treasury account 
as an unclaimed fund.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

Please respond urgently, and delete this if you are not interested.

Best Regards,
Ahmed Ouedraogo.


HELLO DEAR.

2020-08-04 Thread Ahmed



Dear Friend,

I know that this mail will come to you as a surprise, as we have never met 
before, but you need not worry, as I am contacting you independently of my 
investigation and no one is informed of this communication. I need your urgent 
assistance in transferring the sum of $11.3 million immediately to your private 
account. The money has been lying dormant in our bank for years now without 
anybody coming to claim it.

I want to release the money to you as the relative of our deceased customer 
(the account owner), who died along with his supposed next of kin on 16th 
October 2005. The banking laws here do not allow such money to stay more than 
15 years; after that, the money will be recalled to the bank's treasury account 
as an unclaimed fund.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

Please respond urgently, and delete this if you are not interested.

Best Regards,
Ahmed Ouedraogo.


HELLO DEAR .

2020-09-08 Thread Ahmed



Dear Friend,

I know that this mail will come to you as a surprise, as we have never met 
before, but you need not worry, as I am contacting you independently of my 
investigation and no one is informed of this communication. I need your urgent 
assistance in transferring the sum of $11.3 million immediately to your private 
account. The money has been lying dormant in our bank for years now without 
anybody coming to claim it.

I want to release the money to you as the relative of our deceased customer 
(the account owner), who died along with his supposed next of kin on 16th 
October 2005. The banking laws here do not allow such money to stay more than 
15 years; after that, the money will be recalled to the bank's treasury account 
as an unclaimed fund.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

Please respond urgently, and delete this if you are not interested.

Best Regards,
Ahmed Ouedraogo.


I need your cooperation.

2019-05-28 Thread Ahmed



With due respect, I am inviting you to a business deal of eleven million three 
hundred thousand United States dollars, where this money can be shared between 
us if you agree to my business proposal.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

If you are interested, please send your reply to my private email --- 
ouedraogoah...@outlook.com


Greeting!!!

2013-08-20 Thread Ahmed Hassan
Greeting!!!
 
I am Mr Ahmed Hassan. I have a business transaction of $11.3 million. If you 
indicate your interest, I will send you the full details of how the business 
will be executed. Please respond urgently for more details, and delete this if 
you are not interested.
 
Best Regards
Mr Ahmed Hassan
+22968776349


Re: [PATCH 2/11] FUSE - core

2005-02-15 Thread Faraz Ahmed
unsubscribe


NTOP for Redhat

2001-04-25 Thread Ahmed Warsame

I tried to install the network monitoring system Ntop on my Red Hat Linux
system, and the message below is what I am getting each time I execute make.

I thought libpcap was what was needed, and I installed it, but it did not help.

Can anybody out there help me with this?

The following is the output that I am receiving from the build:

Thanks

creating config.h
config.h is unchanged
make  all-recursive
make[1]: Entering directory `/etc/ntop/ntop-1.3.1'
Making all in gdchart0.94b
make[2]: Entering directory `/etc/ntop/ntop-1.3.1/gdchart0.94b'
cc -Igd1.3 -I. -g -c gdc.c
cc -Igd1.3 -I. -g -c gdchart.c
cc -g -c price_conv.c
cc -Igd1.3 -I. -g -c gdc_pie.c
cd gd1.3 ; make -f Makefile libgd.a
make[3]: Entering directory `/etc/ntop/ntop-1.3.1/gdchart0.94b/gd1.3'
cc -O   -c -o gd.o gd.c
cc -O   -c -o gdfontt.o gdfontt.c
cc -O   -c -o gdfonts.o gdfonts.c
cc -O   -c -o gdfontmb.o gdfontmb.c
cc -O   -c -o gdfontl.o gdfontl.c
cc -O   -c -o gdfontg.o gdfontg.c
rm -f libgd.a
ar rc libgd.a gd.o gdfontt.o gdfonts.o gdfontmb.o \
gdfontl.o gdfontg.o
make[3]: Leaving directory `/etc/ntop/ntop-1.3.1/gdchart0.94b/gd1.3'
make[2]: Leaving directory `/etc/ntop/ntop-1.3.1/gdchart0.94b'
Making all in .
make[2]: Entering directory `/etc/ntop/ntop-1.3.1'
/bin/sh ./libtool --mode=compile gcc -DHAVE_CONFIG_H -I. -I./gdchart0.94b
-I/usr/include/pcap -g -O2 -pipe -c admin.c
mkdir .libs
gcc -DHAVE_CONFIG_H -I. -I./gdchart0.94b -I/usr/include/pcap -g -O2 -pipe -c
admin.c  -fPIC -DPIC -o .libs/admin.lo
In file included from admin.c:23:
ntop.h:380: pcap.h: No such file or directory
In file included from admin.c:23:
ntop.h:465: field `h' has incomplete type
ntop.h:567: parse error before `pcap_t'
ntop.h:567: warning: no semicolon at end of struct or union
ntop.h:572: `filter' redeclared as different kind of symbol
/usr/include/ncurses.h:447: previous declaration of `filter'
ntop.h:655: parse error before `}'
ntop.h:655: warning: data definition has no type or storage class
ntop.h:1083: field `fcode' has incomplete type
ntop.h:1277: field `h' has incomplete type
In file included from ntop.h:1534,
 from admin.c:23:
globals-core.h:38: parse error before `device'
globals-core.h:38: warning: data definition has no type or storage class
make[2]: *** [admin.lo] Error 1
make[2]: Leaving directory `/etc/ntop/ntop-1.3.1'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/etc/ntop/ntop-1.3.1'
make: *** [all-recursive-am] Error 2



[PATCH v2] sparse: Track the boundaries of memory sections for accurate checks

2016-06-21 Thread KarimAllah Ahmed
When the sparse memory model is used, an array of memory sections is created to
track each block of contiguous physical pages. Each element of this array covers
PAGES_PER_SECTION pages. During the creation of this array the actual boundaries
of the memory block are lost, so the whole block is either considered present or
not.

pfn_valid() in the sparse memory configuration checks which memory section the
pfn belongs to and then checks whether that section is present or not. This
yields sub-optimal results when the available memory doesn't cover the whole
memory section, because pfn_valid() will return 'true' even for the unavailable
pfns at the boundaries of the memory section.
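
To make the granularity concrete, here is a minimal sketch (not the kernel code
verbatim; the 128 MiB section size is assumed, matching the x86_64 default) of
why the stock check only operates per section:

/* Illustrative only: PAGES_PER_SECTION pages share one validity flag. */
#define SECTION_SIZE_BITS	27	/* 128 MiB sections (x86_64 default) */
#define PAGE_SHIFT		12
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)

static int sketch_pfn_valid(unsigned long pfn, unsigned long last_ram_pfn)
{
	/*
	 * The real code checks per-section present/valid flags; the point
	 * is that the check happens per whole section. If RAM ends in the
	 * middle of a section, pfns between last_ram_pfn and the end of
	 * that section still pass this check.
	 */
	return (pfn >> PFN_SECTION_SHIFT) <= (last_ram_pfn >> PFN_SECTION_SHIFT);
}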

If pfn_valid() returns 'true', that is supposed to mean the pfn is a valid RAM
page controlled by the kernel (there is a 'struct page' backing it). That is not
the case if the pfn happens to be unavailable and at the boundary of a memory
section. Given the common pattern of calling pfn_valid() just before accessing
the 'struct page' (through pfn_to_page()), this can lead to a lot of surprises.

For example this hunk of code in '__ioremap_check_ram':

if (pfn_valid(start_pfn + i) &&
!PageReserved(pfn_to_page(start_pfn + i)))
return 1;

which can return '1' even for a pfn that's not valid!

or this other hunk (which is almost the same pattern) in 'kvm_is_reserved_pfn':

if (pfn_valid(pfn))
return PageReserved(pfn_to_page(pfn));

which can return false for the same reason (which will trigger a BUG_ON at the
call-site).

Using the 'mem=' kernel parameter has the same effect on pfn_valid(), because
even though the memory at the memory section boundary can be RAM, it is not
valid since there is no 'struct page' for it.

Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Yaowei Bai 
Cc: Dan Williams 
Cc: Joe Perches 
Cc: Tejun Heo 
Cc: Anthony Liguori 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: Jan H. Schönherr 

---
v2: A little bit more verbose commit message to explain why 'sub-optimal'
results can actually cause problems.
---
 include/linux/mmzone.h | 22 --
 mm/sparse.c| 37 -
 2 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 02069c2..f76a0e1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1067,8 +1067,12 @@ struct mem_section {
 * section. (see page_ext.h about this.)
 */
struct page_ext *page_ext;
-   unsigned long pad;
+   unsigned long pad[3];
 #endif
+
+   unsigned long first_pfn;
+   unsigned long last_pfn;
+
/*
 * WARNING: mem_section must be a power-of-2 in size for the
 * calculation and use of SECTION_ROOT_MASK to make sense.
@@ -1140,23 +1144,29 @@ static inline int valid_section_nr(unsigned long nr)
 
 static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 {
+   if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+   return NULL;
+
return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
-   if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+   struct mem_section *ms;
+
+   ms = __pfn_to_section(pfn);
+
+   if (ms && !(ms->first_pfn <= pfn && ms->last_pfn >= pfn))
return 0;
-   return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+
+   return valid_section(ms);
 }
 #endif
 
 static inline int pfn_present(unsigned long pfn)
 {
-   if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
-   return 0;
-   return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
+   return present_section(__pfn_to_section(pfn));
 }
 
 /*
diff --git a/mm/sparse.c b/mm/sparse.c
index 5d0cf45..3c91837 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -166,24 +166,59 @@ void __meminit mminit_validate_memmodel_limits(unsigned 
long *start_pfn,
}
 }
 
+static int __init
+overlaps(u64 start1, u64 end1, u64 start2, u64 end2)
+{
+   u64 start, end;
+
+   start = max(start1, start2);
+   end = min(end1, end2);
+   return start <= end;
+}
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
+   unsigned long first_pfn = start;
unsigned long pfn;
 
start &= PAGE_SECTION_MASK;
mminit_validate_memmodel_limits(&start, &end);
for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
unsigned long section = pfn_to_section_nr(pfn);
+   unsign

[PATCH] kvm, x86: Properly check whether a pfn is an MMIO or not

2016-06-21 Thread KarimAllah Ahmed
The pfn_valid() check is not sufficient because it only checks whether a page
has a struct page or not; if, for example, "mem=" was passed to the kernel, some
valid pages won't have a struct page. This means that if guests were assigned
valid memory that lies after the mem= boundary, it will be passed uncached to
the guest no matter what the guest caching attributes are for this memory.

Use the original e820 map to check whether a certain pfn belongs to RAM or not.
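
To make the failure mode concrete, here is a hedged sketch of the decision this
patch changes (illustrative only, not the exact kernel code; e820_is_ram() is
the helper introduced in the diff below):

/* Old behaviour: anything without a 'struct page' was treated as MMIO,
 * so RAM beyond a mem= boundary was mapped uncached into the guest. */
static bool guest_pfn_is_mmio_before(kvm_pfn_t pfn)
{
	return !pfn_valid(pfn);
}

/* New behaviour: fall back to the firmware-provided e820 map, which still
 * knows the range is RAM even when no 'struct page' exists for it.
 * (PageReserved/zero-pfn details are omitted in this sketch.) */
static bool guest_pfn_is_mmio_after(kvm_pfn_t pfn)
{
	if (pfn_valid(pfn))
		return false;
	return !e820_is_ram((u64)pfn << PAGE_SHIFT);
}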

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Borislav Petkov 
Cc: Denys Vlasenko 
Cc: Andrew Morton 
Cc: Toshi Kani 
Cc: Tony Luck 
Cc: linux-kernel@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/include/asm/e820.h |  1 +
 arch/x86/kernel/e820.c  | 18 ++
 arch/x86/kvm/mmu.c  |  2 +-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 3ab0537..2d4f7d8 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -16,6 +16,7 @@ extern struct e820map e820_saved;
 extern unsigned long pci_mem_start;
 extern int e820_any_mapped(u64 start, u64 end, unsigned type);
 extern int e820_all_mapped(u64 start, u64 end, unsigned type);
+extern bool e820_is_ram(u64 addr);
 extern void e820_add_region(u64 start, u64 size, int type);
 extern void e820_print_map(char *who);
 extern int
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 621b501..387cdba 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -105,6 +105,24 @@ int __init e820_all_mapped(u64 start, u64 end, unsigned 
type)
return 0;
 }
 
+bool
+e820_is_ram(u64 addr)
+{
+   int i;
+
+   for (i = 0; i < e820_saved.nr_map; i++) {
+   struct e820entry *ei = &e820_saved.map[i];
+
+   if (ei->type != E820_RAM)
+   continue;
+   if ((addr >= ei->addr) && (addr < (ei->addr + ei->size)))
+   return true;
+   }
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(e820_is_ram);
+
 /*
  * Add a memory region to the kernel e820 map.
  */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24e8001..5e07bf5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2507,7 +2507,7 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
if (pfn_valid(pfn))
return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
 
-   return true;
+   return !e820_is_ram(pfn << PAGE_SHIFT);
 }
 
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-- 
2.8.2



RE: Congratulations!!!

2015-08-28 Thread Ahmed, Usmann
Congratulations, money was donated to you. Reply to harold-diam...@outlook.com 
for more info.


OK

2018-08-04 Thread Ahmed Zama
Greetings

Please assist me in receiving about 15 million euros into your personal
account. I will give you details as I hear from you.

Regards,

Mr Ahmed Zama


Please respond urgently!

2018-08-10 Thread Ahmed Hassan

Dear Friend,

I know that this mail will come to you as a surprise, as we have never met 
before, but you need not worry, as I am contacting you independently of my 
investigation and no one is informed of this communication. I need your urgent 
assistance in transferring the sum of $11.3 million immediately to your private 
account. The money has been lying dormant in our bank for years now without 
anybody coming to claim it.

I want to release the money to you as the relative of our deceased customer 
(the account owner), who died along with his supposed next of kin on 16th 
October 2005. The banking laws here do not allow such money to stay more than 
13 years; after that, the money will be recalled to the bank's treasury account 
as an unclaimed fund.

If you indicate your interest, I will send you the full details of how the 
business will be executed.

Please respond urgently, and delete this if you are not interested.

Best Regards,
Mr. Ahmed Hassan.




OK

2018-08-23 Thread Ahmed Zama
Greetings

Please assist me in receiving about 15 million euros into your personal
account. I will give you details as I hear from you.
Send me the following:
Age
Nationality
Occupation
Telephone Line

Regards,

Mr Ahmed Zama


RE: Privileged and Confidential:

2018-07-29 Thread AHMED KARIM


RE: Privileged and Confidential:
 
Here I am bringing a potential business proposal to your doorstep for 
consideration.
 
I have a client who is interested in investing in your country and would like to 
engage you and your company on this project. The investment amount is valued at 
US$500 million.
 
If you are interested, kindly include your direct telephone numbers for a full 
discussion of this offer when responding to this email.
 
Respectfully,
 
AHMED KARIM
Reply here by email: mohamedabdul1...@gmail.com



RE: Privileged and Confidential:

2018-07-21 Thread AHMED KARIM
With due respect,

My name is Ahmed Abdul, from Syria. Please, I need your urgent assistance to 
help me and my two daughters relocate out of Syria because of the recent bombing 
by President Trump and his intention to bomb more. I need you to help us 
relocate, including our belongings and funds, for we are good people and would 
not like to be treated as refugees, and we have the cash to buy a new house, a 
good school for my kids, and a good business for us to start a new life. 
Millions of dollars are involved; right now, move us to any Muslim country asap. 
Please help us. The money is now cash with a secret security company in India; 
as an Indian, I need your help to talk with the delivering agent in India to 
deliver it to your home. 40% is for you, while the rest is for me. Please keep 
this mail secret and confidential; send me your cell phone number and they will 
contact you from India.
Reply here: abdulmohamed66...@gmail.com
Yours Sincerely


AHMED ABDUL


hello

2018-07-24 Thread MR.MUSA AHMED




--
DEAR FRIEND

I am MR. MUSA AHMED, with a business proposal to transfer US$18.5 million
into your account. If you are interested, get back to me for more details
at my e-mail (mr.musa.ahme...@gmail.com).


Best Regards

MR.MUSA AHMED
--



Re: [iptables] extensions: add support for 'srh' match

2018-01-11 Thread Ahmed Abdelsalam
On Wed, 10 Jan 2018 16:32:24 +0100
Pablo Neira Ayuso  wrote:

> On Fri, Dec 29, 2017 at 12:08:25PM +0100, Ahmed Abdelsalam wrote:
> > This patch adds a new extension to iptables to support the 'srh' match.
> > The implementation considers revision 7 of the SRH draft.
> > https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07
> > 
> > Signed-off-by: Ahmed Abdelsalam 
> > ---
> >  extensions/libip6t_srh.c| 283 
> > 
> >  include/linux/netfilter_ipv6/ip6t_srh.h |  63 +++
> 
> Please, add a extensions/libip6t_srh.t test file and send a v2.
> 
> Thanks.
OK,
are there minimum requirements for the test cases to be added to the
extensions/libip6t_srh.t file?

-- 
Ahmed 


Re: [PATCH v3 0/4] KVM: Expose speculation control feature to guests

2018-01-30 Thread KarimAllah Ahmed

On 01/30/2018 10:00 AM, David Woodhouse wrote:



On Tue, 2018-01-30 at 01:10 +0100, KarimAllah Ahmed wrote:

Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future
Intel processors to indicate RDCL_NO and IBRS_ALL.


Thanks. I think you've already fixed the SPEC_CTRL patch in the git
tree so that it adds F(IBRS) to kvm_cpuid_8000_0008_ebx_x86_features,
right?

Yup, this is already fixed in the tree.



The SVM part of Ashok's IBPB patch is still exposing the PRED_CMD MSR
to guests based on boot_cpu_has(IBPB), not based on the *guest*
capabilities. Looking back at Paolo's patch set from January 9th, it
was done differently there but I think it had the same behaviour?

The rest of Paolo's patch set I think has been covered, except 6/8:
  lkml.kernel.org/r/20180109120311.27565-7-pbonz...@redhat.com

That exposes SPEC_CTRL for SVM too (since AMD now apparently has it).
If adding that ends up with duplicate MSR handling for get/set, perhaps
that wants shifting up into kvm_[sg]et_msr_common()? Although I don't
see offhand where you'd put the ->spec_ctrl field in that case. It
doesn't want to live in the generic (even to non-x86) struct kvm_vcpu.
So maybe a little bit of duplication is the best answer.

Other than those details, I think we're mostly getting close. Do we
want to add STIBP on top? There is some complexity there which meant I
was happier getting these first bits ready first, before piling that on
too.

I believe Ashok sent you a change which made us do IBPB on *every*
vmexit; I don't think we need that. It's currently done in vcpu_load()
which means we'll definitely have done it between running one vCPU and
the next, and when vCPUs are pinned we basically never need to do it.

We know that VMM (e.g. qemu) userspace could be vulnerable to attacks
from guest ring 3, because there is no flush between the vmexit and the
host kernel "returning" to the userspace thread. Doing a full IBPB on
*every* vmexit would protect from that, but it's overkill. If that's
the reason, let's come up with something better.




Re: [PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-30 Thread KarimAllah Ahmed

On 01/30/2018 06:49 PM, Jim Mattson wrote:

On Mon, Jan 29, 2018 at 4:10 PM, KarimAllah Ahmed  wrote:

[ Based on a patch from Ashok Raj  ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring the
MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only call
add_atomic_switch_msr when a non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
   when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
   disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
---
  arch/x86/kvm/cpuid.c |  7 +--
  arch/x86/kvm/vmx.c   | 59 
  arch/x86/kvm/x86.c   |  2 +-
  3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..662d0c0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,

 /* cpuid 7.0.edx*/
 const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+   F(ARCH_CAPABILITIES);

 /* all calls to cpuid_count() should be made on the same cpu */
 get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 g_phys_as = phys_as;
 entry->eax = g_phys_as | (virt_as << 8);
 entry->edx = 0;
-   /* IBPB isn't necessarily present in hardware cpuid */
+   /* IBRS and IBPB aren't necessarily present in hardware cpuid */
 if (boot_cpu_has(X86_FEATURE_IBPB))
 entry->ebx |= F(IBPB);
+   if (boot_cpu_has(X86_FEATURE_IBRS))
+   entry->ebx |= F(IBRS);
 entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
 cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
 break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 798a00b..9ac9747 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -582,6 +582,8 @@ struct vcpu_vmx {
 u64   msr_guest_kernel_gs_base;
  #endif
 u64   arch_capabilities;
+   u64   spec_ctrl;
+   bool  save_spec_ctrl_on_exit;

 u32 vm_entry_controls_shadow;
 u32 vm_exit_controls_shadow;
@@ -922,6 +924,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
 u16 error_code);
  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);

  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3226,6 +3230,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 case MSR_IA32_TSC:
 msr_info->data = guest_read_tsc(vcpu);
 break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   msr_info->data = to_vmx(vcpu)->spec_ctrl;
+   break;
 case MSR_IA32_ARCH_CAPABILITIES:
 if (!msr_info->host_initiated &&
 !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -3339,6 +3350,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,

Re: [PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-30 Thread KarimAllah Ahmed

On 01/30/2018 11:49 PM, Jim Mattson wrote:

On Tue, Jan 30, 2018 at 1:00 PM, KarimAllah Ahmed  wrote:

Ooops! I did not think at all about nested :)

This should be addressed now, I hope:

http://git.infradead.org/linux-retpoline.git/commitdiff/f7f0cbba3e0cffcee050a8a5a9597a162d57e572


+   if (cpu_has_vmx_msr_bitmap() && data &&
+   !vmx->save_spec_ctrl_on_exit) {
+   vmx->save_spec_ctrl_on_exit = true;
+
+   msr_bitmap = is_guest_mode(vcpu) ?
vmx->nested.vmcs02.msr_bitmap :
+
vmx->vmcs01.msr_bitmap;
+   vmx_disable_intercept_for_msr(msr_bitmap,
+ MSR_IA32_SPEC_CTRL,
+ MSR_TYPE_RW);
+   }

There are two ways to get to this point in vmx_set_msr while
is_guest_mode(vcpu) is true:
1) L0 is processing vmcs12's VM-entry MSR load list on emulated
VM-entry (see enter_vmx_non_root_mode).
2) L2 tried to execute WRMSR, writes to the MSR are intercepted in
vmcs02's MSR permission bitmap, and writes to the MSR are not
intercepted in vmcs12's MSR permission bitmap.

In the first case, disabling the intercepts for the MSR in
vmx->nested.vmcs02.msr_bitmap is incorrect, because we haven't yet
determined that the intercepts are clear in vmcs12's MSR permission
bitmap.
In the second case, disabling *both* of the intercepts for the MSR in
vmx->nested.vmcs02.msr_bitmap is incorrect, because we don't know that
the read intercept is clear in vmcs12's MSR permission bitmap.
Furthermore, disabling the write intercept for the MSR in
vmx->nested.vmcs02.msr_bitmap is somewhat fruitless, because
nested_vmx_merge_msr_bitmap is just going to undo that change on the
next emulated VM-entry.


Okay, I took a second look at the code (especially
nested_vmx_merge_msr_bitmap).

This means that I simply should not touch the MSR bitmap in set_msr in the
nested case; I just need to properly update the vmcs02 msr_bitmap in
nested_vmx_merge_msr_bitmap, as in here:

http://git.infradead.org/linux-retpoline.git/commitdiff/d90eedebdd16bb00741a2c93bc13c5e444c99c2b

or am I still missing something? (sorry, did not actually look at the
nested code before!)






Re: [PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-30 Thread KarimAllah Ahmed

On 01/31/2018 01:27 AM, Jim Mattson wrote:

On Tue, Jan 30, 2018 at 4:19 PM, Paolo Bonzini  wrote:

The new code in nested_vmx_merge_msr_bitmap should be conditional on
vmx->save_spec_ctrl_on_exit.


But then if L1 doesn't use MSR_IA32_SPEC_CTRL itself and it uses the
VM-entry MSR load list to set up L2's MSR_IA32_SPEC_CTRL, you will
never set vmx->save_spec_ctrl_on_exit, and L2's accesses to the MSR
will always be intercepted by L0.


I can add another variable (actually two) to indicate whether MSR
interception should be disabled or not for SPEC_CTRL and PRED_CMD in the
nested case.

That would allow us to have a fast alternative to guest_cpuid_has() in
nested_vmx_merge_msr_bitmap and at the same time maintain the current
semantics of save_spec_ctrl_on_exit (i.e., we would still differentiate
between a set_msr that is called while loading MSRs for the emulated
VM-entry and L2 actually writing to the MSR).


What do you think?


[PATCH v4 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-31 Thread KarimAllah Ahmed
[ Based on a patch from Paolo Bonzini  ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/svm.c | 58 ++
 1 file changed, 58 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 89495cf..e1ba4c6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,9 @@ struct vcpu_svm {
u64 gs_base;
} host;
 
+   u64 spec_ctrl;
+   bool save_spec_ctrl_on_exit;
+
u32 *msrpm;
 
ulong nmi_iret_rip;
@@ -1583,6 +1586,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
u32 dummy;
u32 eax = 1;
 
+   svm->spec_ctrl = 0;
+
if (!init_event) {
svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
   MSR_IA32_APICBASE_ENABLE;
@@ -3604,6 +3609,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_VM_CR:
msr_info->data = svm->nested.vm_cr_msr;
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   msr_info->data = svm->spec_ctrl;
+   break;
case MSR_IA32_UCODE_REV:
msr_info->data = 0x0165;
break;
@@ -3695,6 +3707,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   /* The STIBP bit doesn't fault even if it's not advertised */
+   if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+   return 1;
+
+   svm->spec_ctrl = data;
+
+   /*
+* When it's written (to non-zero) for the first time, pass
+* it through. This means we don't have to take the perf
+* hit of saving it on vmexit for the common case of guests
+* that don't use it.
+*/
+   if (data && !svm->save_spec_ctrl_on_exit) {
+   svm->save_spec_ctrl_on_exit = true;
+   if (is_guest_mode(vcpu))
+   break;
+   set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 
1);
+   }
+   break;
case MSR_IA32_PRED_CMD:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4963,6 +4999,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
local_irq_enable();
 
+   /*
+* If this vCPU has touched SPEC_CTRL, restore the guest's value if
+* it's non-zero. Since vmentry is serialising on affected CPUs, there
+* is no need to worry about the conditional branch over the wrmsr
+* being speculatively taken.
+*/
+   if (svm->spec_ctrl)
+   wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
asm volatile (
"push %%" _ASM_BP "; \n\t"
"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -5055,6 +5100,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
);
 
+   /*
+* We do not use IBRS in the kernel. If this vCPU has used the
+* SPEC_CTRL MSR it may have left it on; save the value and
+* turn it off. This is much more efficient than blindly adding
+* it to the atomic save/restore list. Especially as the former
+* (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+*/
+   if (svm->save_spec_ctrl_on_exit)
+   rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
+   if (svm->spec_ctrl)
+   wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
/* Eliminate branch target predictions from guest mode */
vmexit_fill_RSB();
 
-- 
2.7.4



[PATCH v4 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-31 Thread KarimAllah Ahmed
[ Based on a patch from Ashok Raj  ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring the
MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only call
add_atomic_switch_msr when a non-zero value is written to it.
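
As a hedged illustration of that lazy approach, here is a simplified sketch of
the vmx_set_msr() handling (not the literal diff; it follows the same pattern
the SVM patch in this series uses, with helper and field names simplified):

	case MSR_IA32_SPEC_CTRL:
		/* ... CPUID and reserved-bit checks ... */
		vmx->spec_ctrl = data;

		/*
		 * On the first non-zero write, start passing the MSR
		 * through and saving/restoring it around VM-entry/exit;
		 * guests that never write it keep the fast path.
		 */
		if (data && !vmx->save_spec_ctrl_on_exit) {
			vmx->save_spec_ctrl_on_exit = true;
			vmx_disable_intercept_for_msr(msr_bitmap,
						      MSR_IA32_SPEC_CTRL,
						      MSR_TYPE_RW);
		}
		break;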

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v4:
- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
- Handling nested guests
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
  disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
 arch/x86/kvm/cpuid.c |  9 ---
 arch/x86/kvm/vmx.c   | 68 
 arch/x86/kvm/x86.c   |  2 +-
 3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 0x8008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-   F(IBPB);
+   F(IBPB) | F(IBRS);
 
/* cpuid 0xC001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+   F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->edx = 0;
-   /* IBPB isn't necessarily present in hardware cpuid */
+   /* IBRS and IBPB aren't necessarily present in hardware cpuid */
if (boot_cpu_has(X86_FEATURE_IBPB))
entry->ebx |= F(IBPB);
+   if (boot_cpu_has(X86_FEATURE_IBRS))
+   entry->ebx |= F(IBRS);
entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40643b8..9080938 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,8 @@ struct vcpu_vmx {
u64   msr_guest_kernel_gs_base;
 #endif
u64   arch_capabilities;
+   u64   spec_ctrl;
+   bool  save_spec_ctrl_on_exit;
 
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -938,6 +940,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3238,6 +3242,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   msr_info->data =

[PATCH v4 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX

2018-01-31 Thread KarimAllah Ahmed
[dwmw2: Stop using KF() for bits in it, too]
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Paolo Bonzini 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c | 8 +++-
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW 2
-#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+   cpuid_mask(&entry->edx, CPUID_7_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
-- 
2.7.4



[PATCH v4 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

2018-01-31 Thread KarimAllah Ahmed
Future Intel processors will use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate
RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default
the contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Ashok Raj 
Reviewed-by: Paolo Bonzini 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 96e672e..40643b8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -592,6 +592,8 @@ struct vcpu_vmx {
u64   msr_host_kernel_gs_base;
u64   msr_guest_kernel_gs_base;
 #endif
+   u64   arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3236,6 +3238,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+   return 1;
+   msr_info->data = to_vmx(vcpu)->arch_capabilities;
+   break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3362,6 +3370,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, 
MSR_IA32_PRED_CMD,
  MSR_TYPE_W);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated)
+   return 1;
+   vmx->arch_capabilities = data;
+   break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5624,6 +5637,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
 
+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+   MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4



[PATCH v4 0/5] KVM: Expose speculation control feature to guests

2018-01-31 Thread KarimAllah Ahmed
Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future
Intel processors to indicate RDCL_NO and IBRS_ALL.

v4:
- Add IBRS passthrough for SVM (5/5).
- Handle nested guests properly.
- expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features

Ashok Raj (1):
  KVM: x86: Add IBPB support

KarimAllah Ahmed (4):
  KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

 arch/x86/kvm/cpuid.c |  22 +++---
 arch/x86/kvm/cpuid.h |   1 +
 arch/x86/kvm/svm.c   |  85 ++
 arch/x86/kvm/vmx.c   | 114 ++-
 arch/x86/kvm/x86.c   |   1 +
 5 files changed, 216 insertions(+), 7 deletions(-)

Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Ashok Raj 
Cc: Asit Mallick 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Janakarajan Natarajan 
Cc: Joerg Roedel 
Cc: Jun Nakajima 
Cc: Laura Abbott 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org

-- 
2.7.4



[PATCH v4 2/5] KVM: x86: Add IBPB support

2018-01-31 Thread KarimAllah Ahmed
From: Ashok Raj 

Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
barriers on switching between VMs to avoid inter VM Spectre-v2 attacks.

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
   - vmx: expose PRED_CMD if guest has it in CPUID
   - svm: only pass through IBPB if guest has it in CPUID
   - vmx: support !cpu_has_vmx_msr_bitmap()
   - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kvm/cpuid.c | 11 ++-
 arch/x86/kvm/svm.c   | 27 +++
 arch/x86/kvm/vmx.c   | 31 ++-
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+   /* cpuid 0x8008.ebx */
+   const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+   F(IBPB);
+
/* cpuid 0xC001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
-   entry->ebx = entry->edx = 0;
+   entry->edx = 0;
+   /* IBPB isn't necessarily present in hardware cpuid */
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   entry->ebx |= F(IBPB);
+   entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+   cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
}
case 0x8019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..89495cf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;
 
struct page *save_area;
+   struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+   /*
+* The vmcb page can be recycled, causing a false negative in
+* svm_vcpu_load(). So do a full IBPB now.
+*/
+   indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;
 
if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+   if (sd->current_vmcb != svm->vmcb) {
+   sd->current_vmcb = svm->vmcb;
+   indirect_branch_prediction_barrier();
+   }
avic_vcpu_load(vcpu, cpu);
 }
 
@@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+   case MSR_IA32_PRED_CMD:
+   if (!msr->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+   return 1;
+
+   if (data & ~PRED_CMD_IBPB)
+   return 1;
+
+   if (!data)
+   break;
+
+   wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+   if (is_guest_mode(vcpu))
+   break;
+   set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+   break;
case MSR_STAR:
svm->vmcb->save.star = data;
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..96e672e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_

Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

2018-01-31 Thread KarimAllah Ahmed

On 01/31/2018 05:50 PM, Jim Mattson wrote:

On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed  wrote:


+   vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, 
MSR_IA32_PRED_CMD,
+ MSR_TYPE_W);


Why not disable this intercept eagerly, rather than lazily? Unlike
MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
is no cost to disabling the intercept if the guest cpuid info declares
support for it.



+   if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
+   nested_vmx_disable_intercept_for_msr(
+   msr_bitmap_l1, msr_bitmap_l0,
+   MSR_IA32_PRED_CMD,
+   MSR_TYPE_R);
+   }


I don't think this should be predicated on
"to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
"guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"?


Paolo suggested this on the previous revision because guest_cpuid_has()
would be slow.


Also, the final argument to
nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
MSR_TYPE_R.


Oops! will fix!


Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

2018-01-31 Thread KarimAllah Ahmed

On 01/31/2018 05:55 PM, Paolo Bonzini wrote:

On 31/01/2018 11:50, Jim Mattson wrote:

+   if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
+   nested_vmx_disable_intercept_for_msr(
+   msr_bitmap_l1, msr_bitmap_l0,
+   MSR_IA32_PRED_CMD,
+   MSR_TYPE_R);
+   }

I don't think this should be predicated on
"to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
"guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
MSR_TYPE_R.


In fact this MSR can even be passed down unconditionally, since it needs
no save/restore and has no ill performance effect on the sibling
hyperthread.

Only MSR_IA32_SPEC_CTRL needs to be conditional on
"to_vmx(vcpu)->save_spec_ctrl_on_exit".


That used to be the case in an earlier version. There seem to be two
opinions here:

1) Pass it only if CPUID for the guest has it.
2) Pass it unconditionally.

I do not really have a preference.




Paolo




[PATCH v5 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX

2018-01-31 Thread KarimAllah Ahmed
[dwmw2: Stop using KF() for bits in it, too]
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Paolo Bonzini 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c | 8 +++-
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW 2
-#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+   cpuid_mask(&entry->edx, CPUID_7_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
-- 
2.7.4



[PATCH v5 0/5] KVM: Expose speculation control feature to guests

2018-01-31 Thread KarimAllah Ahmed
Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future
Intel processors to indicate RDCL_NO and IBRS_ALL.

v5:
- svm: add PRED_CMD and SPEC_CTRL to direct_access_msrs list.
- vmx: check also for X86_FEATURE_SPEC_CTRL for msr reads and writes.
- vmx: Use MSR_TYPE_W instead of MSR_TYPE_R for the nested IBPB MSR
- rewrite commit message for IBPB patch [2/5] (Ashok)

v4:
- Add IBRS passthrough for SVM (5/5).
- Handle nested guests properly.
- expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features

Ashok Raj (1):
  KVM: x86: Add IBPB support

KarimAllah Ahmed (4):
  KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

 arch/x86/kvm/cpuid.c |  22 +++---
 arch/x86/kvm/cpuid.h |   1 +
 arch/x86/kvm/svm.c   |  87 ++
 arch/x86/kvm/vmx.c   | 117 +--
 arch/x86/kvm/x86.c   |   1 +
 5 files changed, 218 insertions(+), 10 deletions(-)

Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Ashok Raj 
Cc: Asit Mallick 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Janakarajan Natarajan 
Cc: Joerg Roedel 
Cc: Jun Nakajima 
Cc: Laura Abbott 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org

-- 
2.7.4



[PATCH v5 2/5] KVM: x86: Add IBPB support

2018-01-31 Thread KarimAllah Ahmed
From: Ashok Raj 

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate three potential attacks:

* Guests being attacked by other guests.
  - This is addressed by issuing an IBPB when we do a guest switch.

* Attacks from guest/ring3->host/ring3.
  These would require an IBPB during a context switch in the host, or after
  VMEXIT. The host process has two ways to mitigate:
  - Either it can be compiled with retpoline.
  - If it is going through a context switch and has set !dumpable, then
    there is an IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where you return to Qemu after a VMEXIT might leave Qemu
    attackable from the guest when Qemu isn't compiled with retpoline.
  Issues have been reported when doing an IBPB on every VMEXIT, resulting
  in TSC calibration problems in the guest.

* Guest/ring0->host/ring0 attacks.
  When the host kernel is using retpoline it is safe against these attacks.
  If the host kernel isn't using retpoline we might need to do an IBPB
  flush on every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support
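
For reference, a minimal standalone model of the guest-visible write
semantics for IA32_PRED_CMD that the set_msr handlers in this patch
implement: reserved bits fault, writing zero is a no-op, and writing the
IBPB bit triggers a one-shot barrier on the host. The wrmsrl() call is
stubbed out; names other than the architectural PRED_CMD_IBPB bit are
illustrative:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRED_CMD_IBPB	(1ULL << 0)	/* bit 0 of IA32_PRED_CMD */

/* Stand-in for wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB) on the host. */
static void host_issue_ibpb(void)
{
	printf("IBPB barrier issued\n");
}

/*
 * Modelled guest write to IA32_PRED_CMD, mirroring the svm/vmx set_msr
 * handlers in this series: returns 1 to signal a #GP to the guest.
 */
static int guest_write_pred_cmd(bool guest_has_ibpb, uint64_t data)
{
	if (!guest_has_ibpb)
		return 1;		/* MSR not advertised: fault */
	if (data & ~PRED_CMD_IBPB)
		return 1;		/* only bit 0 is defined */
	if (!data)
		return 0;		/* writing 0 is a no-op */
	host_issue_ibpb();		/* write-only command MSR */
	return 0;
}

int main(void)
{
	printf("%d\n", guest_write_pred_cmd(true, PRED_CMD_IBPB));	/* 0 */
	printf("%d\n", guest_write_pred_cmd(true, 0x2));		/* 1 */
	printf("%d\n", guest_write_pred_cmd(false, PRED_CMD_IBPB));	/* 1 */
	return 0;
}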

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
   - vmx: expose PRED_CMD if guest has it in CPUID
   - svm: only pass through IBPB if guest has it in CPUID
   - vmx: support !cpu_has_vmx_msr_bitmap()]
   - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 

v5:
- Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
- Always merge the bitmaps unconditionally.
- Add PRED_CMD to direct_access_msrs.
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
- rewrite the commit message (from ashok.raj@)
---
 arch/x86/kvm/cpuid.c | 11 ++-
 arch/x86/kvm/svm.c   | 28 
 arch/x86/kvm/vmx.c   | 29 +
 3 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+   /* cpuid 0x8008.ebx */
+   const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+   F(IBPB);
+
/* cpuid 0xC001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
-   entry->ebx = entry->edx = 0;
+   entry->edx = 0;
+   /* IBPB isn't necessarily present in hardware cpuid */
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   entry->ebx |= F(IBPB);
+   entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+   cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
}
case 0x8019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..bfbb7b9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -250,6 +250,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_SYSCALL_MASK,.always = true  },
 #endif
{ .index = MSR_IA32_LASTBRANCHFROMIP,   

[PATCH v5 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

2018-01-31 Thread KarimAllah Ahmed
Future Intel processors will use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate
RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default
the contents come directly from the hardware, but user-space can still
override them.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Ashok Raj 
Reviewed-by: Paolo Bonzini 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2e4e8af..a0b2bd1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -592,6 +592,8 @@ struct vcpu_vmx {
u64   msr_host_kernel_gs_base;
u64   msr_guest_kernel_gs_base;
 #endif
+   u64   arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3236,6 +3238,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+   return 1;
+   msr_info->data = to_vmx(vcpu)->arch_capabilities;
+   break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3363,6 +3371,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, 
MSR_IA32_PRED_CMD,
  MSR_TYPE_W);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated)
+   return 1;
+   vmx->arch_capabilities = data;
+   break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5625,6 +5638,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
 
+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+   MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4



[PATCH v5 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-31 Thread KarimAllah Ahmed
[ Based on a patch from Paolo Bonzini  ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v5:
- Add SPEC_CTRL to direct_access_msrs.
---
 arch/x86/kvm/svm.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bfbb7b9..0016a8a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,9 @@ struct vcpu_svm {
u64 gs_base;
} host;
 
+   u64 spec_ctrl;
+   bool save_spec_ctrl_on_exit;
+
u32 *msrpm;
 
ulong nmi_iret_rip;
@@ -250,6 +253,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_SYSCALL_MASK,.always = true  },
 #endif
{ .index = MSR_IA32_LASTBRANCHFROMIP,   .always = false },
+   { .index = MSR_IA32_SPEC_CTRL,  .always = false },
{ .index = MSR_IA32_PRED_CMD,   .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP,  .always = false },
@@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
u32 dummy;
u32 eax = 1;
 
+   svm->spec_ctrl = 0;
+
if (!init_event) {
svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
   MSR_IA32_APICBASE_ENABLE;
@@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_VM_CR:
msr_info->data = svm->nested.vm_cr_msr;
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   msr_info->data = svm->spec_ctrl;
+   break;
case MSR_IA32_UCODE_REV:
msr_info->data = 0x0165;
break;
@@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   /* The STIBP bit doesn't fault even if it's not advertised */
+   if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+   return 1;
+
+   svm->spec_ctrl = data;
+
+   /*
+* When it's written (to non-zero) for the first time, pass
+* it through. This means we don't have to take the perf
+* hit of saving it on vmexit for the common case of guests
+* that don't use it.
+*/
+   if (data && !svm->save_spec_ctrl_on_exit) {
+   svm->save_spec_ctrl_on_exit = true;
+   if (is_guest_mode(vcpu))
+   break;
+   set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 
1);
+   }
+   break;
case MSR_IA32_PRED_CMD:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
local_irq_enable();
 
+   /*
+* If this vCPU has touched SPEC_CTRL, restore the guest's value if
+* it's non-zero. Since vmentry is serialising on affected CPUs, there
+* is no need to worry about the conditional branch over the wrmsr
+* being speculatively taken.
+*/
+   if (svm->spec_ctrl)
+   wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
asm volatile (
"push %%" _ASM_BP "; \n\t"
"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
);
 
+   /*
+* We do not use IBRS in the kernel. If this vCPU has used the
+* SPEC_CTRL MSR it may have left it on; save the value and
+* turn it off. This is much more efficient than blindly adding
+* it to the atomic save/restore list. Especiall

[PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-31 Thread KarimAllah Ahmed
[ Based on a patch from Ashok Raj  ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring
MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only call
add_atomic_switch_msr when a non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.
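
For reference, a minimal standalone sketch of the lazy save/restore scheme
described above (the corresponding vmexit-side hunk is truncated further
down in this archive): the guest value is only written before vmentry if it
is non-zero, and the MSR is only read back and cleared after vmexit for
vCPUs that are actually using it. The MSR accessors are stubs and all names
are illustrative, not the kernel code:

#include <stdint.h>
#include <stdio.h>

/* Stand-ins for rdmsrl/wrmsrl on IA32_SPEC_CTRL. */
static uint64_t host_spec_ctrl_msr;
static void wrmsr_spec_ctrl(uint64_t v) { host_spec_ctrl_msr = v; }
static uint64_t rdmsr_spec_ctrl(void)   { return host_spec_ctrl_msr; }

struct vcpu {
	uint64_t spec_ctrl;	/* guest's last written value */
};

/* Before vmentry: restore the guest value only if it is non-zero. */
static void pre_vmentry(struct vcpu *v)
{
	if (v->spec_ctrl)
		wrmsr_spec_ctrl(v->spec_ctrl);
}

/*
 * After vmexit: only vCPUs that are actually using the MSR pay for the
 * read; the host runs with SPEC_CTRL = 0, so clear it again if needed.
 */
static void post_vmexit(struct vcpu *v, bool msr_passed_through)
{
	if (!v->spec_ctrl && !msr_passed_through)
		return;			/* common case: nothing to do */
	v->spec_ctrl = rdmsr_spec_ctrl();
	if (v->spec_ctrl)
		wrmsr_spec_ctrl(0);
}

int main(void)
{
	struct vcpu v = { .spec_ctrl = 1 };	/* guest enabled IBRS */

	pre_vmentry(&v);
	post_vmexit(&v, true);
	printf("saved guest SPEC_CTRL = %llu\n",
	       (unsigned long long)v.spec_ctrl);
	return 0;
}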

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v5:
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
v4:
- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
- Handling nested guests
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
  disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
 arch/x86/kvm/cpuid.c |  9 ---
 arch/x86/kvm/vmx.c   | 73 
 arch/x86/kvm/x86.c   |  2 +-
 3 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 0x8008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-   F(IBPB);
+   F(IBPB) | F(IBRS);
 
/* cpuid 0xC001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+   F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->edx = 0;
-   /* IBPB isn't necessarily present in hardware cpuid */
+   /* IBRS and IBPB aren't necessarily present in hardware cpuid */
if (boot_cpu_has(X86_FEATURE_IBPB))
entry->ebx |= F(IBPB);
+   if (boot_cpu_has(X86_FEATURE_IBRS))
+   entry->ebx |= F(IBRS);
entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a0b2bd1..4ee93cb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,8 @@ struct vcpu_vmx {
u64   msr_guest_kernel_gs_base;
 #endif
u64   arch_capabilities;
+   u64   spec_ctrl;
+   bool  save_spec_ctrl_on_exit;
 
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -938,6 +940,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3238,6 +3242,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, 

Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-31 Thread KarimAllah Ahmed



On 01/31/2018 08:53 PM, Jim Mattson wrote:

On Wed, Jan 31, 2018 at 11:37 AM, KarimAllah Ahmed  wrote:


+
+   if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
+   nested_vmx_disable_intercept_for_msr(
+   msr_bitmap_l1, msr_bitmap_l0,
+   MSR_IA32_SPEC_CTRL,
+   MSR_TYPE_R | MSR_TYPE_W);
+   }
+


As this is written, L2 will never get direct access to this MSR until
after L1 writes it.  What if L1 never writes it? The condition should
really be something that captures, "if L0 is willing to yield this MSR
to the guest..."


but save_spec_ctrl_on_exit is also set for an L2 write. So once L2 writes
to it, this condition will be true and the bitmap will be updated.






Re: [PATCH v5 2/5] KVM: x86: Add IBPB support

2018-01-31 Thread KarimAllah Ahmed

On 01/31/2018 09:28 PM, Konrad Rzeszutek Wilk wrote:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..2e4e8af 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
vmcs_load(vmx->loaded_vmcs->vmcs);
+   indirect_branch_prediction_barrier();
}
  
  	if (!already_loaded) {

@@ -3342,6 +3343,26 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
+   case MSR_IA32_PRED_CMD:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+   return 1;
+
+   if (data & ~PRED_CMD_IBPB)
+   return 1;
+
+   if (!data)
+   break;
+
+   wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+   if (is_guest_mode(vcpu))
+   break;


Don't you want this the other way around? That is, first do the disable_intercept
and then add the 'if (is_guest_mode(vcpu))'? Otherwise the very first
MSR write from the guest is going to hit the condition above and never end
up executing the disabling of the intercept?


is_guest_mode() checks whether this is an L2 guest. I *should not* do
disable_intercept on the L1 guest bitmap if it is an L2 guest; that is
why this check happens before disable_intercept.

For the short-circuited L2 path, nested_vmx_merge_msr_bitmap will
properly update the L02 MSR bitmap and use it.

So the checks are fine AFAICT.
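
For reference, a small standalone model of the ordering being defended
here: the vmcs01 bitmap describes what L1 may access directly, so it is
only modified when the write came from L1 itself (is_guest_mode() false);
writes from L2 are instead reflected later when the L0/L1 bitmaps are
merged for vmcs02. All names are illustrative:

#include <stdbool.h>
#include <stdio.h>

struct vcpu {
	bool in_guest_mode;		/* true while an L2 is running */
	bool l1_pred_cmd_intercepted;	/* bit in the vmcs01 bitmap */
};

/*
 * Modelled tail of the PRED_CMD write handler: only touch L1's bitmap
 * when the write came from L1; L2 writes are handled via the merged
 * vmcs02 bitmap instead.
 */
static void after_ibpb(struct vcpu *v)
{
	if (v->in_guest_mode)
		return;				/* leave L1's bitmap alone */
	v->l1_pred_cmd_intercepted = false;	/* pass through for L1 */
}

int main(void)
{
	struct vcpu l1 = { .in_guest_mode = false, .l1_pred_cmd_intercepted = true };
	struct vcpu l2 = { .in_guest_mode = true,  .l1_pred_cmd_intercepted = true };

	after_ibpb(&l1);
	after_ibpb(&l2);
	printf("L1 write: intercepted=%d, L2 write: intercepted=%d\n",
	       l1.l1_pred_cmd_intercepted, l2.l1_pred_cmd_intercepted);
	return 0;
}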




+
+   vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, 
MSR_IA32_PRED_CMD,
+ MSR_TYPE_W);
+   break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))





Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-31 Thread KarimAllah Ahmed

On 01/31/2018 09:18 PM, Jim Mattson wrote:

On Wed, Jan 31, 2018 at 12:01 PM, KarimAllah Ahmed  wrote:


but save_spec_ctrl_on_exit is also set for L2 write. So once L2 writes
to it, this condition will be true and then the bitmap will be updated.


So if L1 or any L2 writes to the MSR, then save_spec_ctrl_on_exit is
set to true, even if the MSR permission bitmap for a particular VMCS
*doesn't* allow the MSR to be written without an intercept. That's
functionally correct, but inefficient. It seems to me that
save_spec_ctrl_on_exit should indicate whether or not the *current*
MSR permission bitmap allows unintercepted writes to IA32_SPEC_CTRL.
To that end, perhaps save_spec_ctrl_on_exit rightfully belongs in the
loaded_vmcs structure, alongside the msr_bitmap pointer that it is
associated with. For vmcs02, nested_vmx_merge_msr_bitmap() should set
the vmcs02 save_spec_ctrl_on_exit based on (a) whether L0 is willing
to yield the MSR to L1, and (b) whether L1 is willing to yield the MSR
to L2.


I actually got rid of this save_spec_ctrl_on_exit variable and replaced
it with another variable like the one suggested for IBPB, just to avoid
doing an expensive guest_cpuid_has(). Now I peek at the MSR bitmap instead
to figure out whether this MSR was supposed to be intercepted or not. This
test should provide similar semantics to save_spec_ctrl_on_exit.

Anyway, cleaning up/testing now and will post a new version.
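
For reference, a standalone model of the bitmap peek mentioned above,
following the layout used by the VMX MSR permission bitmap (write-intercept
bits for low MSRs at offset 0x800, for the 0xc0000000 range at offset
0xc00, one bit per MSR). The byte-wise bit test is a simplification of the
kernel's test_bit(); 0x48 is the architectural MSR number of
IA32_SPEC_CTRL:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns true if a guest write to 'msr' would be intercepted. */
static bool msr_write_intercepted(const uint8_t *bitmap, uint32_t msr)
{
	uint32_t bit, base;

	if (msr <= 0x1fff) {
		base = 0x800;			/* write bitmap, low MSRs */
		bit = msr;
	} else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
		base = 0xc00;			/* write bitmap, high MSRs */
		bit = msr & 0x1fff;
	} else {
		return true;			/* out-of-range MSRs always exit */
	}
	return bitmap[base + bit / 8] & (1u << (bit % 8));
}

int main(void)
{
	uint8_t bitmap[4096];
	uint32_t spec_ctrl = 0x48;		/* MSR_IA32_SPEC_CTRL */

	memset(bitmap, 0xff, sizeof(bitmap));	/* intercept everything */
	printf("%d\n", msr_write_intercepted(bitmap, spec_ctrl));	/* 1 */

	/* Clear the write-intercept bit: pass the MSR through. */
	bitmap[0x800 + spec_ctrl / 8] &= ~(1u << (spec_ctrl % 8));
	printf("%d\n", msr_write_intercepted(bitmap, spec_ctrl));	/* 0 */
	return 0;
}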


Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-31 Thread KarimAllah Ahmed

On 01/31/2018 11:52 PM, KarimAllah Ahmed wrote:

On 01/31/2018 09:18 PM, Jim Mattson wrote:
On Wed, Jan 31, 2018 at 12:01 PM, KarimAllah Ahmed 
 wrote:



but save_spec_ctrl_on_exit is also set for L2 write. So once L2 writes
to it, this condition will be true and then the bitmap will be updated.


So if L1 or any L2 writes to the MSR, then save_spec_ctrl_on_exit is
set to true, even if the MSR permission bitmap for a particular VMCS
*doesn't* allow the MSR to be written without an intercept. That's
functionally correct, but inefficient. It seems to me that
save_spec_ctrl_on_exit should indicate whether or not the *current*
MSR permission bitmap allows unintercepted writes to IA32_SPEC_CTRL.
To that end, perhaps save_spec_ctrl_on_exit rightfully belongs in the
loaded_vmcs structure, alongside the msr_bitmap pointer that it is
associated with. For vmcs02, nested_vmx_merge_msr_bitmap() should set
the vmcs02 save_spec_ctrl_on_exit based on (a) whether L0 is willing
to yield the MSR to L1, and (b) whether L1 is willing to yield the MSR
to L2.


I actually got rid of this save_spec_ctrl_on_exit variable and replaced
it with another variable like the one suggested for IBPB, just to avoid
doing an expensive guest_cpuid_has(). Now I peek at the MSR bitmap instead
to figure out whether this MSR was supposed to be intercepted or not. This
test should provide similar semantics to save_spec_ctrl_on_exit.

Anyway, cleaning up/testing now and will post a new version.


I think this patch should address all your concerns.
From 9c19a8ac3f021efba6f70ad7e28f7ad06bb97e43 Mon Sep 17 00:00:00 2001
From: KarimAllah Ahmed 
Date: Mon, 29 Jan 2018 19:58:10 +
Subject: [PATCH] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

[ Based on a patch from Ashok Raj  ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring
MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only call
add_atomic_switch_msr when a non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v6:
- got rid of save_spec_ctrl_on_exit
- introduce spec_ctrl_intercepted
- introduce spec_ctrl_used
v5:
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
v4:
- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
- Handling nested guests
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
  disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
 arch/x86/kvm/cpuid.c |  9 +++--
 arch/x86/kvm/vmx.c   | 94 +++-
 arch/x86/kvm/x86.c   |  2 +-
 3 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 0x8008.ebx */
 	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-		F(IBPB);
+		F(IBPB) | F(IBRS);
 
 	/* cpuid 0xC001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
-		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+		F(ARCH_CAPABILITIES);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 

Re: [PATCH v5 2/5] KVM: x86: Add IBPB support

2018-01-31 Thread KarimAllah Ahmed

On 01/31/2018 08:55 PM, Jim Mattson wrote:

On Wed, Jan 31, 2018 at 11:53 AM, David Woodhouse  wrote:

Rather than doing the expensive guest_cpuid_has() every time (which is
worse now as we realised we need two of them), perhaps we should
introduce a local flag for that too?


That sounds good to me.



Done.
From d51391ae3667f85cd1d6160e83c1d6c28b47b7d8 Mon Sep 17 00:00:00 2001
From: Ashok Raj 
Date: Thu, 11 Jan 2018 17:32:19 -0800
Subject: [PATCH] KVM: x86: Add IBPB support

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate three potential attacks:

* Guests being attacked by other guests.
  - This is addressed by issuing an IBPB when we do a guest switch.

* Attacks from guest/ring3->host/ring3.
  These would require an IBPB during a context switch in the host, or after
  VMEXIT. The host process has two ways to mitigate:
  - Either it can be compiled with retpoline.
  - If it is going through a context switch and has set !dumpable, then
    there is an IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where you return to Qemu after a VMEXIT might leave Qemu
    attackable from the guest when Qemu isn't compiled with retpoline.
  Issues have been reported when doing an IBPB on every VMEXIT, resulting
  in TSC calibration problems in the guest.

* Guest/ring0->host/ring0 attacks.
  When the host kernel is using retpoline it is safe against these attacks.
  If the host kernel isn't using retpoline we might need to do an IBPB
  flush on every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
   - vmx: expose PRED_CMD if guest has it in CPUID
   - svm: only pass through IBPB if guest has it in CPUID
   - vmx: support !cpu_has_vmx_msr_bitmap()]
   - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
v6:
- introduce pred_cmd_used

v5:
- Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
- Always merge the bitmaps unconditionally.
- Add PRED_CMD to direct_access_msrs.
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
- rewrite the commit message (from ashok.raj@)
---
 arch/x86/kvm/cpuid.c | 11 ++-
 arch/x86/kvm/svm.c   | 28 
 arch/x86/kvm/vmx.c   | 42 --
 3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+	/* cpuid 0x8008.ebx */
+	const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+		F(IBPB);
+
 	/* cpuid 0xC001.edx */
 	const u32 kvm_cpuid_C000_0001_edx_x86_features =
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (!g_phys_as)
 			g_phys_as = phys_as;
 		entry->eax = g_phys_as | (virt_as << 8);
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
+		/* IBPB isn't necessarily present in hardware cpuid */
+		if (boot_cpu_has(X86_FEATURE_IBPB))
+			entr

Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-02-01 Thread KarimAllah Ahmed

On 02/01/2018 03:19 PM, Konrad Rzeszutek Wilk wrote:

.snip..

+/* Is SPEC_CTRL intercepted for the currently running vCPU? */
+static bool spec_ctrl_intercepted(struct kvm_vcpu *vcpu)
+{
+   unsigned long *msr_bitmap;
+   int f = sizeof(unsigned long);
+
+   if (!cpu_has_vmx_msr_bitmap())
+   return true;
+
+   msr_bitmap = is_guest_mode(vcpu) ?
+   to_vmx(vcpu)->nested.vmcs02.msr_bitmap :
+   to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+   return !!test_bit(MSR_IA32_SPEC_CTRL, msr_bitmap + 0x800 / f);
+}
+

..snip..

@@ -3359,6 +3393,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+   return 1;
+
+   vmx->spec_ctrl_used = true;
+
+   /* The STIBP bit doesn't fault even if it's not advertised */
+   if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+   return 1;
+
+   vmx->spec_ctrl = data;
+
+   /*
+* When it's written (to non-zero) for the first time, pass
+* it through. This means we don't have to take the perf


.. But only if it is a nested guest (as you have && is_guest_mode).

Do you want to update the comment a bit?


+* hit of saving it on vmexit for the common case of guests
+* that don't use it.
+*/
+   if (cpu_has_vmx_msr_bitmap() && data &&
+   spec_ctrl_intercepted(vcpu) &&
+   is_guest_mode(vcpu))

 ^^ <=== here


Would it perhaps also be good to mention the complexity of how
we ought to be handling L1 and L2 guests in the commit message?

We are all stressed and I am sure some of us haven't gotten much
sleep - but it can help in, say, three months when some unlucky new
soul is trying to understand this and gets utterly confused.


Yup, I will go through the patches and add as much detail as possible.

And yes, the is_guest_mode(vcpu) here is inverted :D I blame the late
night :)




+   vmx_disable_intercept_for_msr(
+   vmx->vmcs01.msr_bitmap,
+   MSR_IA32_SPEC_CTRL,
+   MSR_TYPE_RW);
+   break;
case MSR_IA32_PRED_CMD:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&





Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-02-01 Thread KarimAllah Ahmed

On 02/01/2018 02:25 PM, David Woodhouse wrote:



On Wed, 2018-01-31 at 23:26 -0500, Konrad Rzeszutek Wilk wrote:



diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a9f4ec..bfc80ff 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,14 @@ struct vcpu_vmx {
   #endif
   
    u64   arch_capabilities;

+ u64   spec_ctrl;
+
+ /*
+  * This indicates that:
+  * 1) guest_cpuid_has(X86_FEATURE_IBRS) == true &&
+  * 2) The guest has actually initiated a write against the MSR.
+  */
+ bool spec_ctrl_used;
   
    /*

     * This indicates that:


Thanks for persisting with the details here, Karim. In addition to
Konrad's heckling at the comments, I'll add my own request to his...

I'd like the comment for spec_ctrl_used to explain why it isn't
entirely redundant with the spec_ctrl_intercepted() function.

Without nesting, I believe it *would* be redundant, but the difference
comes when an L2 is running for which L1 has not permitted the MSR to
be passed through. That's when we have spec_ctrl_used = true but the
MSR *isn't* actually passed through in the active msr_bitmap.

Question: if spec_ctrl_used is always equivalent to the intercept bit
in the vmcs01.msr_bitmap, just not the guest bitmap... should we ditch
it and always use the bit from the vmcs01.msr_bitmap?


If I used the vmcs01.msr_bitmap, spec_ctrl_used would always be true if
L0 passed the MSR through to L1, even if L1 did not actually pass it to L2
and even if L2 has not written to it yet (!used).

This pretty much renders the short-circuit at
nested_vmx_merge_msr_bitmap useless:

if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
!to_vmx(vcpu)->pred_cmd_used &&
!to_vmx(vcpu)->spec_ctrl_used)
return false;

... and the default path will be kvm_vcpu_gpa_to_page + kmap.

That being said, I have to admit the logic for spec_ctrl_used is not
perfect either.

If L1 or any of the L2s touched the MSR, spec_ctrl_used will be set to
true. So if one L2 used the MSR, all other L2s will also skip the short-
circuit mentioned above and end up *always* going through
kvm_vcpu_gpa_to_page + kmap.

Maybe all of this is over-thinking and in reality the short-circuit
above is really useless and all L2 guests are happily using x2apic :)



Sorry :)




Re: [PATCH v5 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-02-01 Thread KarimAllah Ahmed



On 02/01/2018 06:37 PM, KarimAllah Ahmed wrote:

On 02/01/2018 02:25 PM, David Woodhouse wrote:



On Wed, 2018-01-31 at 23:26 -0500, Konrad Rzeszutek Wilk wrote:



diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a9f4ec..bfc80ff 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,14 @@ struct vcpu_vmx {
   #endif
u64   arch_capabilities;
+ u64   spec_ctrl;
+
+ /*
+  * This indicates that:
+  * 1) guest_cpuid_has(X86_FEATURE_IBRS) == true &&
+  * 2) The guest has actually initiated a write against the MSR.
+  */
+ bool spec_ctrl_used;
/*
 * This indicates that:


Thanks for persisting with the details here, Karim. In addition to
Konrad's heckling at the comments, I'll add my own request to his...

I'd like the comment for spec_ctrl_used to explain why it isn't
entirely redundant with the spec_ctrl_intercepted() function.

Without nesting, I believe it *would* be redundant, but the difference
comes when an L2 is running for which L1 has not permitted the MSR to
be passed through. That's when we have spec_ctrl_used = true but the
MSR *isn't* actually passed through in the active msr_bitmap.

Question: if spec_ctrl_used is always equivalent to the intercept bit
in the vmcs01.msr_bitmap, just not the guest bitmap... should we ditch
it and always use the bit from the vmcs01.msr_bitmap?


If I used the vmcs01.msr_bitmap, spec_ctrl_used will always be true if
L0 passed it to L1. Even if L1 did not actually pass it to L2 and even
if L2 has not written to it yet (!used).

This pretty much renders the short-circuit at
nested_vmx_merge_msr_bitmap useless:

     if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
     !to_vmx(vcpu)->pred_cmd_used &&
     !to_vmx(vcpu)->spec_ctrl_used)
     return false;

... and the default path will be kvm_vcpu_gpa_to_page + kmap.

That being said, I have to admit the logic for spec_ctrl_used is not
perfect either.

If L1 or any of the L2s touched the MSR, spec_ctrl_used will be set to
true. So if one L2 used the MSR, all other L2s will also skip the short-
circuit mentioned above and end up *always* going through
kvm_vcpu_gpa_to_page + kmap.

Maybe all of this is over-thinking and in reality the short-circuit
above is really useless and all L2 guests are happily using x2apic :)



hehe ..

>> if spec_ctrl_used is always equivalent to the intercept bit in the
vmcs01.msr_bitmap

actually yes, we can.

I just forgot that we update the msr bitmap lazily! :)



Sorry :)




[PATCH v6 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-02-01 Thread KarimAllah Ahmed
[ Based on a patch from Ashok Raj  ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
it when a non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v6:
- got rid of save_spec_ctrl_on_exit
- introduce msr_write_intercepted
v5:
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
v4:
- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
- Handling nested guests
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
  disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
 arch/x86/kvm/cpuid.c |   9 +++--
 arch/x86/kvm/vmx.c   | 105 ++-
 arch/x86/kvm/x86.c   |   2 +-
 3 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 0x8008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-   F(IBPB);
+   F(IBPB) | F(IBRS);
 
/* cpuid 0xC001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+   F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->edx = 0;
-   /* IBPB isn't necessarily present in hardware cpuid */
+   /* IBRS and IBPB aren't necessarily present in hardware cpuid */
if (boot_cpu_has(X86_FEATURE_IBPB))
entry->ebx |= F(IBPB);
+   if (boot_cpu_has(X86_FEATURE_IBRS))
+   entry->ebx |= F(IBRS);
entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b13314a..5d8a6a91 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,7 @@ struct vcpu_vmx {
 #endif
 
u64   arch_capabilities;
+   u64   spec_ctrl;
 
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -1913,6 +1914,29 @@ static void update_exception_bitmap(struct kvm_vcpu 
*vcpu)
 }
 
 /*
+ * Check if MSR is intercepted for currently loaded MSR bitmap.
+ */
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+{
+   unsigned long *msr_bitmap;
+   int f = sizeof(unsigned long);
+
+   if (!cpu_has_vmx_msr_bitmap())
+   return true;
+
+   msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+
+   if (msr <= 0x1fff) {
+   return !!test_bit(msr, msr_bitmap + 0x800 / f);
+   } else if ((msr >= 0xc000) && (msr <= 0xc0001fff)) {
+   msr &= 0x1fff;
+   return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+   }
+
+   return true;
+}
+
+/*
  * Check if MSR is intercepted for L01 MSR bitmap.
  */
 static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
@@ -3264,6 +3288,14 @@ static int vmx_get_msr(struct kvm

[PATCH v6 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

2018-02-01 Thread KarimAllah Ahmed
Intel processors use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
contents come directly from the hardware, but user-space can still
override them.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Ashok Raj 
Reviewed-by: Paolo Bonzini 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 263eb1f..b13314a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,8 @@ struct vcpu_vmx {
u64   msr_guest_kernel_gs_base;
 #endif
 
+   u64   arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3262,6 +3264,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+   return 1;
+   msr_info->data = to_vmx(vcpu)->arch_capabilities;
+   break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3397,6 +3405,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, 
MSR_IA32_PRED_CMD,
  MSR_TYPE_W);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated)
+   return 1;
+   vmx->arch_capabilities = data;
+   break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5659,6 +5672,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
 
+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+   MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4



[PATCH v6 2/5] KVM: x86: Add IBPB support

2018-02-01 Thread KarimAllah Ahmed
From: Ashok Raj 

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate three potential attacks:

* Guests being attacked by other guests.
  - This is addressed by issuing an IBPB when we do a guest switch.

* Attacks from guest/ring3->host/ring3.
  These would require an IBPB during a context switch in the host, or after
  VMEXIT. The host process has two ways to mitigate:
  - Either it can be compiled with retpoline.
  - If it is going through a context switch and has set !dumpable, then
    there is an IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where you return to Qemu after a VMEXIT might leave Qemu
    attackable from the guest when Qemu isn't compiled with retpoline.
  Issues have been reported when doing an IBPB on every VMEXIT, resulting
  in TSC calibration problems in the guest.

* Guest/ring0->host/ring0 attacks.
  When the host kernel is using retpoline it is safe against these attacks.
  If the host kernel isn't using retpoline we might need to do an IBPB
  flush on every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
   - vmx: expose PRED_CMD if guest has it in CPUID
   - svm: only pass through IBPB if guest has it in CPUID
   - vmx: support !cpu_has_vmx_msr_bitmap()]
   - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
v6:
- introduce msr_write_intercepted_l01

v5:
- Use MSR_TYPE_W instead of MSR_TYPE_R for the MSR.
- Always merge the bitmaps unconditionally.
- Add PRED_CMD to direct_access_msrs.
- Also check for X86_FEATURE_SPEC_CTRL for the msr reads/writes
- rewrite the commit message (from ashok.raj@)
---
 arch/x86/kvm/cpuid.c | 11 +++-
 arch/x86/kvm/svm.c   | 28 ++
 arch/x86/kvm/vmx.c   | 80 ++--
 3 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+   /* cpuid 0x8008.ebx */
+   const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+   F(IBPB);
+
/* cpuid 0xC001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
-   entry->ebx = entry->edx = 0;
+   entry->edx = 0;
+   /* IBPB isn't necessarily present in hardware cpuid */
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   entry->ebx |= F(IBPB);
+   entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+   cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
}
case 0x8019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..254eefb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_CSTAR,   .always = true 

[PATCH v6 0/5] KVM: Expose speculation control feature to guests

2018-02-01 Thread KarimAllah Ahmed
Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is used by Intel processors to
indicate RDCL_NO and IBRS_ALL.

Keep in mind that the SVM part of the series is unchanged this time, mostly to
get feedback/confirmation about the nested handling for VMX first; once that is
done I will update SVM as well.

v6:
- Do not penalize (save/restore IBRS) all L2 guests when anyone of them starts
  using the SPEC_CTRL.

v5:
- svm: add PRED_CMD and SPEC_CTRL to direct_access_msrs list.
- vmx: check also for X86_FEATURE_SPEC_CTRL for msr reads and writes.
- vmx: Use MSR_TYPE_W instead of MSR_TYPE_R for the nested IBPB MSR
- rewrite commit message for IBPB patch [2/5] (Ashok)

v4:
- Add IBRS passthrough for SVM (5/5).
- Handle nested guests properly.
- expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features

Ashok Raj (1):
  KVM: x86: Add IBPB support

KarimAllah Ahmed (4):
  KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

 arch/x86/kvm/cpuid.c |  22 --
 arch/x86/kvm/cpuid.h |   1 +
 arch/x86/kvm/svm.c   |  87 +++
 arch/x86/kvm/vmx.c   | 196 ++-
 arch/x86/kvm/x86.c   |   1 +
 5 files changed, 299 insertions(+), 8 deletions(-)

Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Ashok Raj 
Cc: Asit Mallick 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Janakarajan Natarajan 
Cc: Joerg Roedel 
Cc: Jun Nakajima 
Cc: Laura Abbott 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org

-- 
2.7.4



[PATCH v6 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

2018-02-01 Thread KarimAllah Ahmed
[ Based on a patch from Paolo Bonzini  ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v5:
- Add SPEC_CTRL to direct_access_msrs.
---
 arch/x86/kvm/svm.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 254eefb..c6ab343 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,9 @@ struct vcpu_svm {
u64 gs_base;
} host;
 
+   u64 spec_ctrl;
+   bool save_spec_ctrl_on_exit;
+
u32 *msrpm;
 
ulong nmi_iret_rip;
@@ -249,6 +252,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_CSTAR,   .always = true  },
{ .index = MSR_SYSCALL_MASK,.always = true  },
 #endif
+   { .index = MSR_IA32_SPEC_CTRL,  .always = false },
{ .index = MSR_IA32_PRED_CMD,   .always = false },
{ .index = MSR_IA32_LASTBRANCHFROMIP,   .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
@@ -1584,6 +1588,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
u32 dummy;
u32 eax = 1;
 
+   svm->spec_ctrl = 0;
+
if (!init_event) {
svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
   MSR_IA32_APICBASE_ENABLE;
@@ -3605,6 +3611,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_VM_CR:
msr_info->data = svm->nested.vm_cr_msr;
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   msr_info->data = svm->spec_ctrl;
+   break;
case MSR_IA32_UCODE_REV:
msr_info->data = 0x01000065;
break;
@@ -3696,6 +3709,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   /* The STIBP bit doesn't fault even if it's not advertised */
+   if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+   return 1;
+
+   svm->spec_ctrl = data;
+
+   /*
+* When it's written (to non-zero) for the first time, pass
+* it through. This means we don't have to take the perf
+* hit of saving it on vmexit for the common case of guests
+* that don't use it.
+*/
+   if (data && !svm->save_spec_ctrl_on_exit) {
+   svm->save_spec_ctrl_on_exit = true;
+   if (is_guest_mode(vcpu))
+   break;
+   set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 
1);
+   }
+   break;
case MSR_IA32_PRED_CMD:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4964,6 +5001,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
local_irq_enable();
 
+   /*
+* If this vCPU has touched SPEC_CTRL, restore the guest's value if
+* it's non-zero. Since vmentry is serialising on affected CPUs, there
+* is no need to worry about the conditional branch over the wrmsr
+* being speculatively taken.
+*/
+   if (svm->spec_ctrl)
+   wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
asm volatile (
"push %%" _ASM_BP "; \n\t"
"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -5056,6 +5102,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
);
 
+   /*
+* We do not use IBRS in the kernel. If this vCPU has used the
+* SPEC_CTRL MSR it may have left it on; save the value and
+* turn it off. This is much more efficient than blindly adding
+* it to the atomic save/restore list. Especiall

[PATCH v6 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX

2018-02-01 Thread KarimAllah Ahmed
[dwmw2: Stop using KF() for bits in it, too]
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Paolo Bonzini 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c | 8 +++-
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW 2
-#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+   cpuid_mask(&entry->edx, CPUID_7_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
-- 
2.7.4



[PATCH] kvm: x86: Use X86_CR4_PAE instead of X86_CR4_PAE_BIT while validating sregs

2018-01-20 Thread KarimAllah Ahmed
Use the mask (X86_CR4_PAE) instead of the bit itself (X86_CR4_PAE_BIT) while
validating sregs.
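
For clarity, an illustration of the difference (not part of the patch; the
values below mirror the UAPI processor-flags definitions):

	#define X86_CR4_PAE_BIT	5				/* bit *position* */
	#define X86_CR4_PAE	(1UL << X86_CR4_PAE_BIT)	/* bit *mask*     */

	/* Buggy: 5 is 0b101, so this tests CR4 bits 0 and 2, not PAE */
	if (!(sregs->cr4 & X86_CR4_PAE_BIT))
		return -EINVAL;

	/* Fixed: tests CR4 bit 5 (PAE) */
	if (!(sregs->cr4 & X86_CR4_PAE))
		return -EINVAL;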

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index abd1723..6f452bc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7664,7 +7664,7 @@ int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct 
kvm_sregs *sregs)
 * 64-bit mode (though maybe in a 32-bit code segment).
 * CR4.PAE and EFER.LMA must be set.
 */
-   if (!(sregs->cr4 & X86_CR4_PAE_BIT)
+   if (!(sregs->cr4 & X86_CR4_PAE)
|| !(sregs->efer & EFER_LMA))
return -EINVAL;
} else {
-- 
2.7.4



Re: [PATCH] kvm: x86: Use X86_CR4_PAE instead of X86_CR4_PAE_BIT while validating sregs

2018-01-20 Thread KarimAllah Ahmed
Please ignore. I just noticed that a similar patch is already in Radim's 
tree and queued for linus.



On 01/20/2018 07:08 PM, KarimAllah Ahmed wrote:

Use the mask (X86_CR4_PAE) instead of the bit itself (X86_CR4_PAE_BIT) while
validating sregs.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
  arch/x86/kvm/x86.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index abd1723..6f452bc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7664,7 +7664,7 @@ int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct 
kvm_sregs *sregs)
 * 64-bit mode (though maybe in a 32-bit code segment).
 * CR4.PAE and EFER.LMA must be set.
 */
-   if (!(sregs->cr4 & X86_CR4_PAE_BIT)
+   if (!(sregs->cr4 & X86_CR4_PAE)
|| !(sregs->efer & EFER_LMA))
return -EINVAL;
} else {




[RFC 01/10] x86/speculation: Add basic support for IBPB

2018-01-20 Thread KarimAllah Ahmed
From: Thomas Gleixner 

Expose indirect_branch_prediction_barrier() for use in subsequent patches.

[karahmed: remove the special-casing of skylake for using IBPB (wtf?),
   switch to using ALTERNATIVES instead of static_cpu_has]
[dwmw2:set up ax/cx/dx in the asm too so it gets NOP'd out]
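
Ignoring the ALTERNATIVE patching (which NOPs the whole sequence out on CPUs
without X86_FEATURE_IBPB), the barrier is equivalent to this C sketch:

	if (boot_cpu_has(X86_FEATURE_IBPB))
		wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);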

Signed-off-by: Thomas Gleixner 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/nospec-branch.h | 16 
 arch/x86/kernel/cpu/bugs.c   |  7 +++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 624d978..8ec9588 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -207,6 +207,7 @@
 #define X86_FEATURE_RETPOLINE_AMD  ( 7*32+13) /* AMD Retpoline mitigation 
for Spectre variant 2 */
 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory 
Number */
 
+#define X86_FEATURE_IBPB   ( 7*32+16) /* Using Indirect Branch 
Prediction Barrier */
 #define X86_FEATURE_AMD_PRED_CMD   ( 7*32+17) /* Prediction Command MSR 
(AMD) */
 #define X86_FEATURE_MBA( 7*32+18) /* Memory Bandwidth 
Allocation */
 #define X86_FEATURE_RSB_CTXSW  ( 7*32+19) /* Fill RSB on context 
switches */
diff --git a/arch/x86/include/asm/nospec-branch.h 
b/arch/x86/include/asm/nospec-branch.h
index 4ad4108..c333c95 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -218,5 +218,21 @@ static inline void vmexit_fill_RSB(void)
 #endif
 }
 
+static inline void indirect_branch_prediction_barrier(void)
+{
+   unsigned long ax, cx, dx;
+
+   asm volatile(ALTERNATIVE("",
+"movl %[msr], %%ecx\n\t"
+"movl %[val], %%eax\n\t"
+"movl $0, %%edx\n\t"
+"wrmsr",
+X86_FEATURE_IBPB)
+: "=a" (ax), "=c" (cx), "=d" (dx)
+: [msr] "i" (MSR_IA32_PRED_CMD),
+  [val] "i" (PRED_CMD_IBPB)
+: "memory");
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc..96548ff 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -249,6 +249,13 @@ static void __init spectre_v2_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
pr_info("Filling RSB on context switch\n");
}
+
+   /* Initialize Indirect Branch Prediction Barrier if supported */
+   if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ||
+   boot_cpu_has(X86_FEATURE_AMD_PRED_CMD)) {
+   setup_force_cpu_cap(X86_FEATURE_IBPB);
+   pr_info("Enabling Indirect Branch Prediction Barrier\n");
+   }
 }
 
 #undef pr_fmt
-- 
2.7.4



[RFC 00/10] Speculation Control feature support

2018-01-20 Thread KarimAllah Ahmed
Start using the newly-added microcode features for speculation control on both
Intel and AMD CPUs to protect against Spectre v2.

This patch series covers interrupts, system calls, context switching between
processes, and context switching between VMs. It also exposes the Indirect
Branch Prediction Barrier MSR, aka the IBPB MSR, to KVM guests.
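
For reference, the two speculation-control MSRs used throughout the series
(architectural values from the public Intel/AMD documentation, shown here
only as a reminder; the kernel defines them in msr-index.h):

	#define MSR_IA32_SPEC_CTRL	0x00000048
	#define SPEC_CTRL_IBRS		(1 << 0)  /* Indirect Branch Restricted Speculation */
	#define SPEC_CTRL_STIBP		(1 << 1)  /* Single Thread Indirect Branch Predictors */

	#define MSR_IA32_PRED_CMD	0x00000049
	#define PRED_CMD_IBPB		(1 << 0)  /* Indirect Branch Prediction Barrier */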

TODO:

- Introduce a microcode blacklist to disable the feature for broken microcodes.
- Restrict/Unrestrict the speculation (by toggling IBRS) around VMExit and
  VMEnter for KVM and expose IBRS to guests.

Ashok Raj (1):
  x86/kvm: Add IBPB support

David Woodhouse (1):
  x86/speculation: Add basic IBRS support infrastructure

KarimAllah Ahmed (1):
  x86: Simplify spectre_v2 command line parsing

Thomas Gleixner (4):
  x86/speculation: Add basic support for IBPB
  x86/speculation: Use Indirect Branch Prediction Barrier in context
switch
  x86/speculation: Add inlines to control Indirect Branch Speculation
  x86/idle: Control Indirect Branch Speculation in idle

Tim Chen (3):
  x86/mm: Only flush indirect branches when switching into non dumpable
process
  x86/enter: Create macros to restrict/unrestrict Indirect Branch
Speculation
  x86/enter: Use IBRS on syscall and interrupts

 Documentation/admin-guide/kernel-parameters.txt |   1 +
 arch/x86/entry/calling.h|  73 ++
 arch/x86/entry/entry_64.S   |  35 -
 arch/x86/entry/entry_64_compat.S|  21 ++-
 arch/x86/include/asm/cpufeatures.h  |   2 +
 arch/x86/include/asm/mwait.h|  14 ++
 arch/x86/include/asm/nospec-branch.h|  54 ++-
 arch/x86/kernel/cpu/bugs.c  | 183 +++-
 arch/x86/kernel/process.c   |  14 ++
 arch/x86/kvm/svm.c  |  14 ++
 arch/x86/kvm/vmx.c  |   4 +
 arch/x86/mm/tlb.c   |  21 ++-
 12 files changed, 359 insertions(+), 77 deletions(-)


Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Ashok Raj 
Cc: Asit Mallick 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Janakarajan Natarajan 
Cc: Joerg Roedel 
Cc: Jun Nakajima 
Cc: Laura Abbott 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org

-- 
2.7.4



[RFC 03/10] x86/speculation: Use Indirect Branch Prediction Barrier in context switch

2018-01-20 Thread KarimAllah Ahmed
From: Thomas Gleixner 

[peterz: comment]

Signed-off-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: David Woodhouse 
---
 arch/x86/mm/tlb.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a156195..304de7d 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-#include 
 
 /*
  * TLB flushing, formerly SMP-only
@@ -220,6 +221,13 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
u16 new_asid;
bool need_flush;
 
+   /*
+* Avoid user/user BTB poisoning by flushing the branch 
predictor
+* when switching between processes. This stops one process from
+* doing Spectre-v2 attacks on another.
+*/
+   indirect_branch_prediction_barrier();
+
if (IS_ENABLED(CONFIG_VMAP_STACK)) {
/*
 * If our current stack is in vmalloc space and isn't
-- 
2.7.4



[RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

2018-01-20 Thread KarimAllah Ahmed
From: Tim Chen 

Create macros to control Indirect Branch Speculation.

Name them so they reflect what they are actually doing.
The macros are used to restrict and unrestrict indirect branch speculation;
they do not *disable* (or *enable*) it. A trip back to user space after
*restricting* speculation would still affect the BTB.

Quoting from a commit by Tim Chen:

"""
If IBRS is set, near returns and near indirect jumps/calls will not allow
their predicted target address to be controlled by code that executed in a
less privileged prediction mode *BEFORE* the IBRS mode was last written with
a value of 1 or on another logical processor so long as all Return Stack
Buffer (RSB) entries from the previous less privileged prediction mode are
overwritten.

Thus a near indirect jump/call/return may be affected by code in a less
privileged prediction mode that executed *AFTER* IBRS mode was last written
with a value of 1.
"""

[ tglx: Changed macro names and rewrote changelog ]
[ karahmed: changed macro names *again* and rewrote changelog ]

Signed-off-by: Tim Chen 
Signed-off-by: Thomas Gleixner 
Signed-off-by: KarimAllah Ahmed 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Peter Zijlstra 
Cc: Greg KH 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Paolo Bonzini 
Cc: Dan Williams 
Cc: Arjan Van De Ven 
Cc: Linus Torvalds 
Cc: David Woodhouse 
Cc: Ashok Raj 
Link: 
https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.c...@linux.intel.com
Signed-off-by: David Woodhouse 
---
 arch/x86/entry/calling.h | 73 
 1 file changed, 73 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 3f48f69..5aafb51 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
 
@@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is 
built with
 .Lafter_call_\@:
 #endif
 .endm
+
+/*
+ * IBRS related macros
+ */
+.macro PUSH_MSR_REGS
+   pushq   %rax
+   pushq   %rcx
+   pushq   %rdx
+.endm
+
+.macro POP_MSR_REGS
+   popq%rdx
+   popq%rcx
+   popq%rax
+.endm
+
+.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
+   movl\msr_nr, %ecx
+   movl\edx_val, %edx
+   movl\eax_val, %eax
+   wrmsr
+.endm
+
+.macro RESTRICT_IB_SPEC
+   ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+   PUSH_MSR_REGS
+   WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
+   POP_MSR_REGS
+.Lskip_\@:
+.endm
+
+.macro UNRESTRICT_IB_SPEC
+   ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+   PUSH_MSR_REGS
+   WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
+   POP_MSR_REGS
+.Lskip_\@:
+.endm
+
+.macro RESTRICT_IB_SPEC_CLOBBER
+   ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+   WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
+.Lskip_\@:
+.endm
+
+.macro UNRESTRICT_IB_SPEC_CLOBBER
+   ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+   WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
+.Lskip_\@:
+.endm
+
+.macro RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg:req
+   ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+   movl$MSR_IA32_SPEC_CTRL, %ecx
+   rdmsr
+   movl%eax, \save_reg
+   movl$0, %edx
+   movl$SPEC_CTRL_IBRS, %eax
+   wrmsr
+.Lskip_\@:
+.endm
+
+.macro RESTORE_IB_SPEC_CLOBBER save_reg:req
+   ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+   /* Set IBRS to the value saved in the save_reg */
+   movl$MSR_IA32_SPEC_CTRL, %ecx
+   movl$0, %edx
+   movl\save_reg, %eax
+   wrmsr
+.Lskip_\@:
+.endm
-- 
2.7.4



[RFC 07/10] x86: Simplify spectre_v2 command line parsing

2018-01-20 Thread KarimAllah Ahmed
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kernel/cpu/bugs.c | 106 +
 1 file changed, 58 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1d5e12f..349c7f4 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -99,13 +99,13 @@ static enum spectre_v2_mitigation spectre_v2_enabled = 
SPECTRE_V2_NONE;
 static void __init spec2_print_if_insecure(const char *reason)
 {
if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
-   pr_info("%s\n", reason);
+   pr_info("%s selected on command line.\n", reason);
 }
 
 static void __init spec2_print_if_secure(const char *reason)
 {
if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
-   pr_info("%s\n", reason);
+   pr_info("%s selected on command line.\n", reason);
 }
 
 static inline bool retp_compiler(void)
@@ -120,61 +120,71 @@ static inline bool match_option(const char *arg, int 
arglen, const char *opt)
return len == arglen && !strncmp(arg, opt, len);
 }
 
+static struct {
+   char *option;
+   enum spectre_v2_mitigation_cmd cmd;
+   bool secure;
+} mitigation_options[] = {
+   { "off",   SPECTRE_V2_CMD_NONE,  false },
+   { "on",SPECTRE_V2_CMD_FORCE, true },
+   { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false },
+   { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false },
+   { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false },
+   { "ibrs",  SPECTRE_V2_CMD_IBRS,  false },
+   { "auto",  SPECTRE_V2_CMD_AUTO,  false },
+};
+
+static const int mitigation_options_count = sizeof(mitigation_options) /
+   sizeof(mitigation_options[0]);
+
 static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 {
char arg[20];
-   int ret;
+   int ret, i;
+   enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO;
+
+   if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+   return SPECTRE_V2_CMD_NONE;
 
ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
  sizeof(arg));
-   if (ret > 0)  {
-   if (match_option(arg, ret, "off")) {
-   goto disable;
-   } else if (match_option(arg, ret, "on")) {
-   spec2_print_if_secure("force enabled on command line.");
-   return SPECTRE_V2_CMD_FORCE;
-   } else if (match_option(arg, ret, "retpoline")) {
-   if (!IS_ENABLED(CONFIG_RETPOLINE)) {
-   pr_err("retpoline selected but not compiled in. 
Switching to AUTO select\n");
-   return SPECTRE_V2_CMD_AUTO;
-   }
-   spec2_print_if_insecure("retpoline selected on command 
line.");
-   return SPECTRE_V2_CMD_RETPOLINE;
-   } else if (match_option(arg, ret, "retpoline,amd")) {
-   if (!IS_ENABLED(CONFIG_RETPOLINE)) {
-   pr_err("retpoline,amd selected but not compiled 
in. Switching to AUTO select\n");
-   return SPECTRE_V2_CMD_AUTO;
-   }
-   if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
-   pr_err("retpoline,amd selected but CPU is not 
AMD. Switching to AUTO select\n");
-   return SPECTRE_V2_CMD_AUTO;
-   }
-   spec2_print_if_insecure("AMD retpoline selected on 
command line.");
-   return SPECTRE_V2_CMD_RETPOLINE_AMD;
-   } else if (match_option(arg, ret, "retpoline,generic")) {
-   if (!IS_ENABLED(CONFIG_RETPOLINE)) {
-   pr_err("retpoline,generic selected but not 
compiled in. Switching to AUTO select\n");
-   return SPECTRE_V2_CMD_AUTO;
-   }
-   spec2_print_if_insecure("generic retpoline selected on 
command line.");
-   return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
-   } else if (match_option(arg, ret, "ibrs")) {
-   if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
-   pr_err("IBRS selected but no CPU support. 
Switching to AUTO select\n");
-   return SPECTRE_V2_CMD_AUTO;
-   

[RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

2018-01-20 Thread KarimAllah Ahmed
From: David Woodhouse 

Not functional yet; just add the handling for it in the Spectre v2
mitigation selection, and the X86_FEATURE_IBRS flag which will control
the code to be added in later patches.

Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
mode will want that too.

For now we are auto-selecting IBRS on Skylake. We will probably end up
changing that but for now let's default to the safest option.

XX: Do we want a microcode blacklist?

[karahmed: simplify the switch block and get rid of all the magic]

Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
 Documentation/admin-guide/kernel-parameters.txt |   1 +
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/nospec-branch.h|   2 -
 arch/x86/kernel/cpu/bugs.c  | 108 +++-
 4 files changed, 68 insertions(+), 44 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 8122b5f..e597650 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3932,6 +3932,7 @@
retpoline - replace indirect branches
retpoline,generic - google's original retpoline
retpoline,amd - AMD-specific minimal thunk
+   ibrs  - Intel: Indirect Branch Restricted 
Speculation
 
Not specifying this option is equivalent to
spectre_v2=auto.
diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 8ec9588..ae86ad9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
 #define X86_FEATURE_AMD_PRED_CMD   ( 7*32+17) /* Prediction Command MSR 
(AMD) */
 #define X86_FEATURE_MBA( 7*32+18) /* Memory Bandwidth 
Allocation */
 #define X86_FEATURE_RSB_CTXSW  ( 7*32+19) /* Fill RSB on context 
switches */
+#define X86_FEATURE_IBRS   ( 7*32+21) /* Use IBRS for Spectre v2 
safety */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h 
b/arch/x86/include/asm/nospec-branch.h
index c333c95..8759449 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -205,7 +205,6 @@ extern char __indirect_thunk_end[];
  */
 static inline void vmexit_fill_RSB(void)
 {
-#ifdef CONFIG_RETPOLINE
unsigned long loops;
 
asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
@@ -215,7 +214,6 @@ static inline void vmexit_fill_RSB(void)
  "910:"
  : "=r" (loops), ASM_CALL_CONSTRAINT
  : : "memory" );
-#endif
 }
 
 static inline void indirect_branch_prediction_barrier(void)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 96548ff..1d5e12f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd {
SPECTRE_V2_CMD_RETPOLINE,
SPECTRE_V2_CMD_RETPOLINE_GENERIC,
SPECTRE_V2_CMD_RETPOLINE_AMD,
+   SPECTRE_V2_CMD_IBRS,
 };
 
 static const char *spectre_v2_strings[] = {
@@ -87,6 +88,7 @@ static const char *spectre_v2_strings[] = {
[SPECTRE_V2_RETPOLINE_MINIMAL_AMD]  = "Vulnerable: Minimal AMD ASM 
retpoline",
[SPECTRE_V2_RETPOLINE_GENERIC]  = "Mitigation: Full generic 
retpoline",
[SPECTRE_V2_RETPOLINE_AMD]  = "Mitigation: Full AMD 
retpoline",
+   [SPECTRE_V2_IBRS]   = "Mitigation: Indirect Branch 
Restricted Speculation",
 };
 
 #undef pr_fmt
@@ -132,9 +134,17 @@ static enum spectre_v2_mitigation_cmd __init 
spectre_v2_parse_cmdline(void)
spec2_print_if_secure("force enabled on command line.");
return SPECTRE_V2_CMD_FORCE;
} else if (match_option(arg, ret, "retpoline")) {
+   if (!IS_ENABLED(CONFIG_RETPOLINE)) {
+   pr_err("retpoline selected but not compiled in. 
Switching to AUTO select\n");
+   return SPECTRE_V2_CMD_AUTO;
+   }
spec2_print_if_insecure("retpoline selected on command 
line.");
return SPECTRE_V2_CMD_RETPOLINE;
} else if (match_option(arg, ret, "retpoline,amd")) {
+   if (!IS_ENABLED(CONFIG_RETPOLINE)) {
+   pr_err("retpoline,amd selected but not compiled 
in. Switching to AUTO select\n");
+   return SPECTRE_V2_CMD_AUTO

[RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

2018-01-20 Thread KarimAllah Ahmed
From: Tim Chen 

Flush indirect branches when switching into a process that has marked
itself non-dumpable.  This protects high-value processes like gpg
better, without incurring too high a performance overhead.
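
For illustration, the userspace side of the heuristic (not part of this
patch): a high-value process opts in to the barrier simply by marking itself
non-dumpable:

	#include <stdio.h>
	#include <sys/prctl.h>

	int main(void)
	{
		/* Clearing the dumpable flag makes get_dumpable() != SUID_DUMP_USER,
		 * so the kernel issues an IBPB when switching into this task. */
		if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0)
			perror("prctl(PR_SET_DUMPABLE)");

		/* sensitive work happens here */
		return 0;
	}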

Signed-off-by: Andi Kleen 
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/mm/tlb.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 304de7d..f64e80c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -225,8 +225,19 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 * Avoid user/user BTB poisoning by flushing the branch 
predictor
 * when switching between processes. This stops one process from
 * doing Spectre-v2 attacks on another.
+*
+* As an optimization: Flush indirect branches only when
+* switching into processes that disable dumping.
+*
+* This will not flush when switching into kernel threads.
+* But it would flush when switching into idle and back
+*
+* It might be useful to have a one-off cache here
+* to also not flush the idle case, but we would need some
+* kind of stable sequence number to remember the previous mm.
 */
-   indirect_branch_prediction_barrier();
+   if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
+   indirect_branch_prediction_barrier();
 
if (IS_ENABLED(CONFIG_VMAP_STACK)) {
/*
-- 
2.7.4



[RFC 10/10] x86/enter: Use IBRS on syscall and interrupts

2018-01-20 Thread KarimAllah Ahmed
From: Tim Chen 

Stop Indirect Branch Speculation on every user-space-to-kernel-space
transition and re-enable it when returning to user space.

The NMI interrupt save/restore of IBRS state was based on Andrea
Arcangeli's implementation.  Here's an explanation by Dave Hansen on why we
save IBRS state for NMI.

The normal interrupt code uses the 'error_entry' path which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace and thus has to switch
IBRS, or leave it alone.

The NMI code is different.  It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS.  If we used the same approach as
the normal interrupt code, we might do the following;

SYSENTER_entry
<-- NMI HERE
IBRS=1
do_something()
IBRS=0
SYSRET

The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1.  This would leave it running
unprotected with IBRS=0, which is bad.

However, if we unconditionally set IBRS=1, in the NMI, we might get the
following case:

SYSENTER_entry
IBRS=1
do_something()
IBRS=0
<-- NMI HERE (set IBRS=1)
SYSRET

and we would return to userspace with IBRS=1.  Userspace would run
slowly until we entered and exited the kernel again.

Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim.

[karahmed use the new SPEC_CTRL_IBRS defines]
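
In terms of the macros added in patch 09/10, the flow is roughly as follows
(sketch only; the exact call sites are presumed to be the paranoid entry/exit
paths):

paranoid_entry
	RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg=%r13d   <-- save old IBRS value, force IBRS=1
... NMI handler runs ...
paranoid_exit
	RESTORE_IB_SPEC_CLOBBER save_reg=%r13d              <-- restore the saved value, verbatim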

Co-developed-by: Andrea Arcangeli 
Signed-off-by: Andrea Arcangeli 
Signed-off-by: Tim Chen 
Signed-off-by: Thomas Gleixner 
Signed-off-by: KarimAllah Ahmed 
Cc: Andi Kleen 
Cc: Peter Zijlstra 
Cc: Greg KH 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Paolo Bonzini 
Cc: Dan Williams 
Cc: Arjan Van De Ven 
Cc: Linus Torvalds 
Cc: David Woodhouse 
Cc: Ashok Raj 
Link: 
https://lkml.kernel.org/r/d5e4c03ec290c61dfbe5a769f7287817283fa6b7.1515542293.git.tim.c.c...@linux.intel.com
---
 arch/x86/entry/entry_64.S| 35 ++-
 arch/x86/entry/entry_64_compat.S | 21 +++--
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63f4320..b3d90cf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -171,6 +171,8 @@ ENTRY(entry_SYSCALL_64_trampoline)
 
/* Load the top of the task stack into RSP */
movqCPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
+   /* Restrict indirect branch speculation */
+   RESTRICT_IB_SPEC
 
/* Start building the simulated IRET frame. */
pushq   $__USER_DS  /* pt_regs->ss */
@@ -214,6 +216,8 @@ ENTRY(entry_SYSCALL_64)
 */
movq%rsp, PER_CPU_VAR(rsp_scratch)
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
+   /* Restrict Indirect Branch Speculation */
+   RESTRICT_IB_SPEC
 
TRACE_IRQS_OFF
 
@@ -409,6 +413,8 @@ syscall_return_via_sysret:
pushq   RSP-RDI(%rdi)   /* RSP */
pushq   (%rdi)  /* RDI */
 
+   /* Unrestrict Indirect Branch Speculation */
+   UNRESTRICT_IB_SPEC
/*
 * We are on the trampoline stack.  All regs except RDI are live.
 * We can do future final exit work right here.
@@ -757,11 +763,12 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
/* Push user RDI on the trampoline stack. */
pushq   (%rdi)
 
+   /* Unrestrict Indirect Branch Speculation */
+   UNRESTRICT_IB_SPEC
/*
 * We are on the trampoline stack.  All regs except RDI are live.
 * We can do future final exit work right here.
 */
-
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
 
/* Restore RDI. */
@@ -849,6 +856,13 @@ native_irq_return_ldt:
SWAPGS  /* to kernel GS */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi   /* to kernel CR3 */
 
+   /*
+* There is no point in disabling Indirect Branch Speculation
+* here as this is going to return to user space immediately
+* after fixing ESPFIX stack.  There is no vulnerable code
+* to protect so spare two MSR writes.
+*/
+
movqPER_CPU_VAR(espfix_waddr), %rdi
movq%rax, (0*8)(%rdi)   /* user RAX */
movq(1*8)(%rsp), %rax   /* user RIP */
@@ -982,6 +996,8 @@ ENTRY(switch_to_thread_stack)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
movq%rsp, %rdi
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
+   /* Restrict Indirect Branch Speculation */
+   RESTRICT_IB_SPEC
UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
 
pushq   7*8(%rdi)   /* regs->ss */

[RFC 08/10] x86/idle: Control Indirect Branch Speculation in idle

2018-01-20 Thread KarimAllah Ahmed
From: Thomas Gleixner 

Indirect Branch Speculation (IBS) is controlled per physical core. If one
thread disables it, then it is disabled for the whole core. If a thread enters
idle, it makes sense to re-enable IBS so the sibling thread can run with full
speculation enabled in user space.

This only makes sense in mwait_idle_with_hints(), because mwait_idle() can
serve an interrupt immediately before speculation can be stopped again. SKL,
which requires IBRS, should use mwait_idle_with_hints(), so this is a non-issue
and at worst a missed optimization.

Originally-by: Tim Chen 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/mwait.h | 14 ++
 arch/x86/kernel/process.c| 14 ++
 2 files changed, 28 insertions(+)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 39a2fb2..f173072 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -6,6 +6,7 @@
 #include 
 
 #include 
+#include 
 
 #define MWAIT_SUBSTATE_MASK0xf
 #define MWAIT_CSTATE_MASK  0xf
@@ -106,7 +107,20 @@ static inline void mwait_idle_with_hints(unsigned long 
eax, unsigned long ecx)
mb();
}
 
+   /*
+* Indirect Branch Speculation (IBS) is controlled per
+* physical core. If one thread disables it, then it's
+* disabled on all threads of the core. The kernel disables
+* it on entry from user space. Reenable it on the thread
+* which goes idle so the other thread has a chance to run
+* with full speculation enabled in userspace.
+*/
+   unrestrict_branch_speculation();
__monitor((void *)&current_thread_info()->flags, 0, 0);
+   /*
+* Restrict IBS again to protect kernel execution.
+*/
+   restrict_branch_speculation();
if (!need_resched())
__mwait(eax, ecx);
}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3cb2486..f941c5d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -461,6 +461,20 @@ static __cpuidle void mwait_idle(void)
mb(); /* quirk */
}
 
+   /*
+* Indirect Branch Speculation (IBS) is controlled per
+* physical core. If one thread disables it, then it's
+* disabled on all threads of the core. The kernel disables
+* it on entry from user space. For __sti_mwait() it's
+* wrong to reenable it because an interrupt can be served
+* before speculation can be stopped again.
+*
+* To plug that hole the interrupt entry code would need to
+* save current state and restore. Not worth the trouble as
+* SKL should not use mwait_idle(). It should use
+* mwait_idle_with_hints() which can do speculation control
+* safely.
+*/
__monitor((void *)&current_thread_info()->flags, 0, 0);
if (!need_resched())
__sti_mwait(0, 0);
-- 
2.7.4



[RFC 02/10] x86/kvm: Add IBPB support

2018-01-20 Thread KarimAllah Ahmed
From: Ashok Raj 

Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
barriers when switching between VMs to avoid inter-VM Spectre v2 attacks.

[peterz: rebase and changelog rewrite]
[dwmw2: fixes]
[karahmed: - vmx: expose PRED_CMD whenever it is available
   - svm: only pass through IBPB if it is available]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: David Woodhouse 
Cc: Paolo Bonzini 
Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com

Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kvm/svm.c | 14 ++
 arch/x86/kvm/vmx.c |  4 
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2744b973..cfdb9ab 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;
 
struct page *save_area;
+   struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
}
+
+   if (boot_cpu_has(X86_FEATURE_AMD_PRED_CMD))
+   set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+   /*
+* The vmcb page can be recycled, causing a false negative in
+* svm_vcpu_load(). So do a full IBPB now.
+*/
+   indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;
 
if (unlikely(cpu != vcpu->cpu)) {
@@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+   if (sd->current_vmcb != svm->vmcb) {
+   sd->current_vmcb = svm->vmcb;
+   indirect_branch_prediction_barrier();
+   }
avic_vcpu_load(vcpu, cpu);
 }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d1e25db..3b64de2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2279,6 +2279,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
vmcs_load(vmx->loaded_vmcs->vmcs);
+   indirect_branch_prediction_barrier();
}
 
if (!already_loaded) {
@@ -6791,6 +6792,9 @@ static __init int hardware_setup(void)
kvm_tsc_scaling_ratio_frac_bits = 48;
}
 
+   if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+   vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
+
vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
-- 
2.7.4



[RFC 06/10] x86/speculation: Add inlines to control Indirect Branch Speculation

2018-01-20 Thread KarimAllah Ahmed
From: Thomas Gleixner 

XX: I am utterly unconvinced that having "friendly, self-explanatory"
names for the IBRS-frobbing inlines is useful. There be dragons
here for anyone who isn't intimately familiar with what's going
on, and it's almost better to just call it IBRS, put a reference
to the spec, and have a clear "you must be →this← tall to ride."

[karahmed: switch to using ALTERNATIVES instead of static_cpu_has]
[dwmw2: wrmsr args inside the ALTERNATIVE again, bikeshed naming]

Signed-off-by: Thomas Gleixner 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/nospec-branch.h | 36 
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/nospec-branch.h 
b/arch/x86/include/asm/nospec-branch.h
index 8759449..5be3443 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -232,5 +232,41 @@ static inline void indirect_branch_prediction_barrier(void)
 : "memory");
 }
 
+/*
+ * This also performs a barrier, and setting it again when it was already
+ * set is NOT a no-op.
+ */
+static inline void restrict_branch_speculation(void)
+{
+   unsigned long ax, cx, dx;
+
+   asm volatile(ALTERNATIVE("",
+"movl %[msr], %%ecx\n\t"
+"movl %[val], %%eax\n\t"
+"movl $0, %%edx\n\t"
+"wrmsr",
+X86_FEATURE_IBRS)
+: "=a" (ax), "=c" (cx), "=d" (dx)
+: [msr] "i" (MSR_IA32_SPEC_CTRL),
+  [val] "i" (SPEC_CTRL_IBRS)
+: "memory");
+}
+
+static inline void unrestrict_branch_speculation(void)
+{
+   unsigned long ax, cx, dx;
+
+   asm volatile(ALTERNATIVE("",
+"movl %[msr], %%ecx\n\t"
+"movl %[val], %%eax\n\t"
+"movl $0, %%edx\n\t"
+"wrmsr",
+X86_FEATURE_IBRS)
+: "=a" (ax), "=c" (cx), "=d" (dx)
+: [msr] "i" (MSR_IA32_SPEC_CTRL),
+  [val] "i" (0)
+: "memory");
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __NOSPEC_BRANCH_H__ */
-- 
2.7.4



Re: [RFC 10/10] x86/enter: Use IBRS on syscall and interrupts

2018-01-21 Thread KarimAllah Ahmed

On 01/21/2018 02:50 PM, Konrad Rzeszutek Wilk wrote:


On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote:

From: Tim Chen 

Stop Indirect Branch Speculation on every user space to kernel space
transition and reenable it when returning to user space./

How about interrupts?

That is, should .macro interrupt have the same treatment?


RESTRICT_IB_SPEC is called in switch_to_thread_stack which is almost the 
first thing called from ".macro interrupt".








Re: [PATCH v2 5/8] x86/speculation: Add basic support for IBPB

2018-01-21 Thread KarimAllah Ahmed

On 01/21/2018 07:06 PM, Borislav Petkov wrote:


On Sun, Jan 21, 2018 at 09:49:06AM +, David Woodhouse wrote:

From: Thomas Gleixner 

Expose indirect_branch_prediction_barrier() for use in subsequent patches.

[karahmed: remove the special-casing of skylake for using IBPB (wtf?),
switch to using ALTERNATIVES instead of static_cpu_has]
[dwmw2:set up ax/cx/dx in the asm too so it gets NOP'd out]

Signed-off-by: Thomas Gleixner 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
  arch/x86/include/asm/cpufeatures.h   |  1 +
  arch/x86/include/asm/nospec-branch.h | 16 
  arch/x86/kernel/cpu/bugs.c   |  7 +++
  3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 8c9e5c0..cf28399 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -207,6 +207,7 @@
  #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation 
for Spectre variant 2 */
  #define X86_FEATURE_INTEL_PPIN( 7*32+14) /* Intel Processor 
Inventory Number */
  
+#define X86_FEATURE_IBPB		( 7*32+16) /* Using Indirect Branch Prediction Barrier */

Right, and as AMD has a separate bit for this in CPUID_8000_0008_EBX[12],
we probably don't really need the synthetic bit here but simply use the
one at (13*32+12) - word 13.


  #define X86_FEATURE_AMD_PRED_CMD  ( 7*32+17) /* Prediction Command MSR 
(AMD) */
  #define X86_FEATURE_MBA   ( 7*32+18) /* Memory Bandwidth 
Allocation */
  #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context 
switches */
diff --git a/arch/x86/include/asm/nospec-branch.h 
b/arch/x86/include/asm/nospec-branch.h
index 4ad4108..c333c95 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -218,5 +218,21 @@ static inline void vmexit_fill_RSB(void)
  #endif
  }
  
+static inline void indirect_branch_prediction_barrier(void)

I like ibp_barrier() better.


+{
+   unsigned long ax, cx, dx;
+
+   asm volatile(ALTERNATIVE("",
+"movl %[msr], %%ecx\n\t"
+"movl %[val], %%eax\n\t"
+"movl $0, %%edx\n\t"
+"wrmsr",
+X86_FEATURE_IBPB)
+: "=a" (ax), "=c" (cx), "=d" (dx)
+: [msr] "i" (MSR_IA32_PRED_CMD),
+  [val] "i" (PRED_CMD_IBPB)
+: "memory");
+}

Btw, we can simplify this a bit by dropping the inputs and marking the 3
GPRs as clobbered:

 alternative_input("",
   "mov $0x49, %%ecx\n\t"
   "mov $1, %%eax\n\t"
   "xor %%edx, %%edx\n\t"
   "wrmsr\n\t",
   X86_FEATURE_IBPB,
   ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));


The "memory" clobber is probably not really needed but it wouldn't
hurt...

Also, above says:


switch to using ALTERNATIVES instead of static_cpu_has]

Why?

if (static_cpu_has(X86_FEATURE_IBPB))
wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);

It can't get any more readable than this. Why even f*ck with
alternatives?


Because static_cpu_has is an indirect branch which will cause speculation,
and we have to avoid that.

David told me that Peter was working on a fix for static_cpu_has to avoid
the speculation, but I do not know what the status of this is.




+
  #endif /* __ASSEMBLY__ */
  #endif /* __NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc..96548ff 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -249,6 +249,13 @@ static void __init spectre_v2_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
pr_info("Filling RSB on context switch\n");
}
+
+   /* Initialize Indirect Branch Prediction Barrier if supported */
+   if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ||
+   boot_cpu_has(X86_FEATURE_AMD_PRED_CMD)) {
+   setup_force_cpu_cap(X86_FEATURE_IBPB);
+   pr_info("Enabling Indirect Branch Prediction Barrier\n");

We don't really need the pr_info as "ibpb" will appear in /proc/cpuinfo.





Re: [net-next v2] ipv6: sr: export some functions of seg6local

2018-01-06 Thread Ahmed Abdelsalam
On Thu, 04 Jan 2018 13:37:33 -0500 (EST)
David Miller  wrote:

> From: Ahmed Abdelsalam 
> Date: Sat, 30 Dec 2017 00:08:32 +0100
> 
> > Some functions of seg6local are very useful to process SRv6
> > encapsulated packets
> > 
> > This patch exports some functions of seg6local that are useful and
> > can be re-used at different parts of the kernel.
> > 
> > The set of exported functions are:
> > (1) seg6_get_srh()
> > (2) seg6_advance_nextseg()
> > (3) seg6_lookup_nexthop
> > 
> > Signed-off-by: Ahmed Abdelsalam 
> 
> There is no way I am applying this as-is.
> 
> Until you can submit this alongside an in-tree user of these symbols,
> these symbol exports are not going to happen.
> 
> Thank you. 
I will submit the other patches once I'm done with the testing. 
Thanks 
-- 
Ahmed


Re: [net-next] netfilter: add segment routing header 'srh' match

2018-01-07 Thread Ahmed Abdelsalam
On Sun, 7 Jan 2018 00:40:03 +0100
Pablo Neira Ayuso  wrote:

> Hi Ahmed,
> 
> On Fri, Dec 29, 2017 at 12:07:52PM +0100, Ahmed Abdelsalam wrote:
> > It allows matching packets based on Segment Routing Header
> > (SRH) information.
> > The implementation considers revision 7 of the SRH draft.
> > https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07
> > 
> > Currently supported match options include:
> > (1) Next Header
> > (2) Hdr Ext Len
> > (3) Segments Left
> > (4) Last Entry
> > (5) Tag value of SRH
> > 
> > Signed-off-by: Ahmed Abdelsalam 
> > ---
> >  include/uapi/linux/netfilter_ipv6/ip6t_srh.h |  63 ++
> >  net/ipv6/netfilter/Kconfig   |   9 ++
> >  net/ipv6/netfilter/Makefile  |   1 +
> >  net/ipv6/netfilter/ip6t_srh.c| 165 
> > +++
> >  4 files changed, 238 insertions(+)
> >  create mode 100644 include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> >  create mode 100644 net/ipv6/netfilter/ip6t_srh.c
> > 
> > diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h 
> > b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> > new file mode 100644
> > index 000..1b5dbd8
> > --- /dev/null
> > +++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> > @@ -0,0 +1,63 @@
> > +/**
> > + * Definitions for Segment Routing Header 'srh' match
> > + *
> > + * Author:
> > + *   Ahmed Abdelsalam   
> > + */
> 
> Please, add this in SPDX format instead.
> 
> See include/uapi/linux/netfilter/xt_owner.h for instance.
> 
Ok
> > +#ifndef _IP6T_SRH_H
> > +#define _IP6T_SRH_H
> > +
> > +#include 
> > +#include 
> > +
> > +/* Values for "mt_flags" field in struct ip6t_srh */
> > +#define IP6T_SRH_NEXTHDR0x0001
> > +#define IP6T_SRH_LEN_EQ 0x0002
> > +#define IP6T_SRH_LEN_GT 0x0004
> > +#define IP6T_SRH_LEN_LT 0x0008
> > +#define IP6T_SRH_SEGS_EQ0x0010
> > +#define IP6T_SRH_SEGS_GT0x0020
> > +#define IP6T_SRH_SEGS_LT0x0040
> > +#define IP6T_SRH_LAST_EQ0x0080
> > +#define IP6T_SRH_LAST_GT0x0100
> > +#define IP6T_SRH_LAST_LT0x0200
> > +#define IP6T_SRH_TAG0x0400
> > +#define IP6T_SRH_MASK   0x07FF
> > +
> > +/* Values for "mt_invflags" field in struct ip6t_srh */
> > +#define IP6T_SRH_INV_NEXTHDR0x0001
> > +#define IP6T_SRH_INV_LEN_EQ 0x0002
> > +#define IP6T_SRH_INV_LEN_GT 0x0004
> > +#define IP6T_SRH_INV_LEN_LT 0x0008
> > +#define IP6T_SRH_INV_SEGS_EQ0x0010
> > +#define IP6T_SRH_INV_SEGS_GT0x0020
> > +#define IP6T_SRH_INV_SEGS_LT0x0040
> > +#define IP6T_SRH_INV_LAST_EQ0x0080
> > +#define IP6T_SRH_INV_LAST_GT0x0100
> > +#define IP6T_SRH_INV_LAST_LT0x0200
> > +#define IP6T_SRH_INV_TAG0x0400
> > +#define IP6T_SRH_INV_MASK   0x07FF
> 
> Looking at all these EQ, GT, LT... I think this should be very easy to
> implement in nf_tables with no kernel changes.
> 
> You only need to add the protocol definition to:
> 
> nftables/src/exthdr.c
> 
> Would you have a look into this? This would be very much appreciated
> to we keep nftables in sync with what we have in iptables.
Yes, I will look into it. I will send you a patch for nf_tables as well. 
> 
> > +
> > +/**
> > + *  struct ip6t_srh - SRH match options
> > + *  @ next_hdr: Next header field of SRH
> > + *  @ hdr_len: Extension header length field of SRH
> > + *  @ segs_left: Segments left field of SRH
> > + *  @ last_entry: Last entry field of SRH
> > + *  @ tag: Tag field of SRH
> > + *  @ mt_flags: match options
> > + *  @ mt_invflags: Invert the sense of match options
> > + */
> > +
> > +struct ip6t_srh {
> > +   __u8next_hdr;
> > +   __u8hdr_len;
> > +   __u8segs_left;
> > +   __u8last_entry;
> > +   __u16   tag;
> > +   __u16   mt_flags;
> > +   __u16   mt_invflags;
> > +};
> > +
> > +#endif /*_IP6T_SRH_H*/
> > diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
> > index 6acb2ee..e1818eb 100644
> > --- a/net/ipv6/netfilter/Kconfig
> > +++ b/net/ipv6/netfilter/Kconfig
> > @@ -232,6 +232,15 @@ config IP6_NF_MATCH_RT
> >  
> >   To compile it as a module, choose M here.  If unsure, 

[net-next v2] netfilter: add segment routing header 'srh' match

2018-01-07 Thread Ahmed Abdelsalam
It allows matching packets based on Segment Routing Header (SRH) information.
The implementation is based on revision 7 of the SRH draft:
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07

Currently supported match options include:
(1) Next Header
(2) Hdr Ext Len
(3) Segments Left
(4) Last Entry
(5) Tag value of SRH
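
As a usage sketch (hypothetical, not part of this patch): a userspace
extension would fill struct ip6t_srh to request, for example, a match on
Segments Left == 0:

	struct ip6t_srh info = {
		.segs_left   = 0,
		.mt_flags    = IP6T_SRH_SEGS_EQ,
		.mt_invflags = 0,
	};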

Signed-off-by: Ahmed Abdelsalam 
---
 include/uapi/linux/netfilter_ipv6/ip6t_srh.h |  57 ++
 net/ipv6/netfilter/Kconfig   |   9 ++
 net/ipv6/netfilter/Makefile  |   1 +
 net/ipv6/netfilter/ip6t_srh.c| 161 +++
 4 files changed, 228 insertions(+)
 create mode 100644 include/uapi/linux/netfilter_ipv6/ip6t_srh.h
 create mode 100644 net/ipv6/netfilter/ip6t_srh.c

diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h 
b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
new file mode 100644
index 000..cebf4e8
--- /dev/null
+++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _IP6T_SRH_H
+#define _IP6T_SRH_H
+
+#include 
+#include 
+
+/* Values for "mt_flags" field in struct ip6t_srh */
+#define IP6T_SRH_NEXTHDR0x0001
+#define IP6T_SRH_LEN_EQ 0x0002
+#define IP6T_SRH_LEN_GT 0x0004
+#define IP6T_SRH_LEN_LT 0x0008
+#define IP6T_SRH_SEGS_EQ0x0010
+#define IP6T_SRH_SEGS_GT0x0020
+#define IP6T_SRH_SEGS_LT0x0040
+#define IP6T_SRH_LAST_EQ0x0080
+#define IP6T_SRH_LAST_GT0x0100
+#define IP6T_SRH_LAST_LT0x0200
+#define IP6T_SRH_TAG0x0400
+#define IP6T_SRH_MASK   0x07FF
+
+/* Values for "mt_invflags" field in struct ip6t_srh */
+#define IP6T_SRH_INV_NEXTHDR0x0001
+#define IP6T_SRH_INV_LEN_EQ 0x0002
+#define IP6T_SRH_INV_LEN_GT 0x0004
+#define IP6T_SRH_INV_LEN_LT 0x0008
+#define IP6T_SRH_INV_SEGS_EQ0x0010
+#define IP6T_SRH_INV_SEGS_GT0x0020
+#define IP6T_SRH_INV_SEGS_LT0x0040
+#define IP6T_SRH_INV_LAST_EQ0x0080
+#define IP6T_SRH_INV_LAST_GT0x0100
+#define IP6T_SRH_INV_LAST_LT0x0200
+#define IP6T_SRH_INV_TAG0x0400
+#define IP6T_SRH_INV_MASK   0x07FF
+
+/**
+ *  struct ip6t_srh - SRH match options
+ *  @ next_hdr: Next header field of SRH
+ *  @ hdr_len: Extension header length field of SRH
+ *  @ segs_left: Segments left field of SRH
+ *  @ last_entry: Last entry field of SRH
+ *  @ tag: Tag field of SRH
+ *  @ mt_flags: match options
+ *  @ mt_invflags: Invert the sense of match options
+ */
+
+struct ip6t_srh {
+   __u8next_hdr;
+   __u8hdr_len;
+   __u8segs_left;
+   __u8last_entry;
+   __u16   tag;
+   __u16   mt_flags;
+   __u16   mt_invflags;
+};
+
+#endif /*_IP6T_SRH_H*/
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 6acb2ee..e1818eb 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -232,6 +232,15 @@ config IP6_NF_MATCH_RT
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP6_NF_MATCH_SRH
+tristate '"srh" Segment Routing header match support'
+depends on NETFILTER_ADVANCED
+help
+  srh matching allows you to match packets based on the segment
+ routing header of the packet.
+
+  To compile it as a module, choose M here.  If unsure, say N.
+
 # The targets
 config IP6_NF_TARGET_HL
tristate '"HL" hoplimit target support'
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index c6ee0cd..e0d51a9 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_IP6_NF_MATCH_MH) += ip6t_mh.o
 obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o
 obj-$(CONFIG_IP6_NF_MATCH_RPFILTER) += ip6t_rpfilter.o
 obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
+obj-$(CONFIG_IP6_NF_MATCH_SRH) += ip6t_srh.o
 
 # targets
 obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
diff --git a/net/ipv6/netfilter/ip6t_srh.c b/net/ipv6/netfilter/ip6t_srh.c
new file mode 100644
index 000..9642164
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_srh.c
@@ -0,0 +1,161 @@
+/* Kernel module to match Segment Routing Header (SRH) parameters. */
+
+/* Author:
+ * Ahmed Abdelsalam 
+ *
+ *  This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+/* Test a struct->mt_invflags and a boolean for inequali

[PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-28 Thread KarimAllah Ahmed
Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests
that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a
retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring MSR_IA32_SPEC_CTRL
for guests that do not actually use the MSR, only call add_atomic_switch_msr()
when a non-zero value is written to it.

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: Ashok Raj 
---
 arch/x86/kvm/cpuid.c |  4 +++-
 arch/x86/kvm/cpuid.h |  1 +
 arch/x86/kvm/vmx.c   | 63 
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..dc78095 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
 /* These are scattered features in cpufeatures.h. */
 #define KVM_CPUID_BIT_AVX512_4VNNIW 2
 #define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KVM_CPUID_BIT_SPEC_CTRL 26
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
+   (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index cdc70a3..dcfe227 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aa8638a..1b743a0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);
+
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, 
unsigned msr,
m->host[i].value = host_val;
 }
 
+/* do not touch guest_val and host_val if the msr is not found */
+static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
+ u64 *guest_val, u64 *host_val)
+{
+   unsigned i;
+   struct msr_autoload *m = &vmx->msr_autoload;
+
+   for (i = 0; i < m->nr; ++i)
+   if (m->guest[i].index == msr)
+   break;
+
+   if (i == m->nr)
+   return 1;
+
+   if (guest_val)
+   *guest_val = m->guest[i].value;
+   if (host_val)
+   *host_val = m->host[i].value;
+
+   return 0;
+}
+
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 {
u64 guest_efer = vmx->vcpu.arch.efer;
@@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct 
kvm_vcpu *vcpu,
  */
 static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+   u64 spec_ctrl = 0;
struct shared_msr_entry *msr;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
 
switch (msr_info->index) {
 #ifdef CONFIG_X86_64
@@ -3223,6 +3250,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+   return 1;
+
+   /*
+* If the MSR is not in the atomic list yet, then it was never
+* written to. So the MSR value will be '0'.
+*/
+   read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &spec_ctrl, 
NULL);
+
+   msr_info->data = spec_ctrl;

[PATCH v2 2/4] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-28 Thread KarimAllah Ahmed
Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring the
MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only
call add_atomic_switch_msr() when a non-zero value is written to it.

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 

---
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
 arch/x86/kvm/cpuid.c |  4 +++-
 arch/x86/kvm/vmx.c   | 65 
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..32c0c14 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
 /* These are scattered features in cpufeatures.h. */
 #define KVM_CPUID_BIT_AVX512_4VNNIW 2
 #define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KVM_CPUID_BIT_IBRS  26
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
+   (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aa8638a..dac564d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -920,6 +920,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2007,6 +2009,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, 
unsigned msr,
m->host[i].value = host_val;
 }
 
+/* do not touch guest_val and host_val if the msr is not found */
+static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
+ u64 *guest_val, u64 *host_val)
+{
+   unsigned i;
+   struct msr_autoload *m = &vmx->msr_autoload;
+
+   for (i = 0; i < m->nr; ++i)
+   if (m->guest[i].index == msr)
+   break;
+
+   if (i == m->nr)
+   return 1;
+
+   if (guest_val)
+   *guest_val = m->guest[i].value;
+   if (host_val)
+   *host_val = m->host[i].value;
+
+   return 0;
+}
+
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 {
u64 guest_efer = vmx->vcpu.arch.efer;
@@ -3203,7 +3227,9 @@ static inline bool vmx_feature_control_msr_valid(struct 
kvm_vcpu *vcpu,
  */
 static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+   u64 spec_ctrl = 0;
struct shared_msr_entry *msr;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
 
switch (msr_info->index) {
 #ifdef CONFIG_X86_64
@@ -3223,6 +3249,20 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+   return 1;
+
+   /*
+* If the MSR is not in the atomic list yet, then the guest
+* never wrote a non-zero value to it yet i.e. the MSR value is
+* '0'.
+*/
+   read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &spec_ctrl, 
NULL);
+
+   msr_info->data = spec_ctrl;
+   break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;

[PATCH v2 1/4] x86: kvm: Update the reverse_cpuid list to include CPUID_7_EDX

2018-01-28 Thread KarimAllah Ahmed
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kvm/cpuid.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index cdc70a3..dcfe227 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
-- 
2.7.4



[PATCH v2 4/4] x86: vmx: Allow direct access to MSR_IA32_ARCH_CAPABILITIES

2018-01-28 Thread KarimAllah Ahmed
Add direct access to MSR_IA32_ARCH_CAPABILITIES for guests. Future Intel processors
will use this MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1).

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kvm/cpuid.c | 4 +++-
 arch/x86/kvm/vmx.c   | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 32c0c14..2339b1a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -71,6 +71,7 @@ u64 kvm_supported_xcr0(void)
 #define KVM_CPUID_BIT_AVX512_4VNNIW 2
 #define KVM_CPUID_BIT_AVX512_4FMAPS 3
 #define KVM_CPUID_BIT_IBRS  26
+#define KVM_CPUID_BIT_ARCH_CAPABILITIES 29
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -394,7 +395,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
-   (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0);
+   (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0) | \
+   (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) ? 
KF(ARCH_CAPABILITIES) : 0);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f82a44c..99cb761 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9617,6 +9617,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
 
if (boot_cpu_has(X86_FEATURE_IBPB))
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PRED_CMD, 
MSR_TYPE_RW);
+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   vmx_disable_intercept_for_msr(msr_bitmap, 
MSR_IA32_ARCH_CAPABILITIES, MSR_TYPE_R);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, 
MSR_TYPE_RW);
-- 
2.7.4



[PATCH v2 0/4] KVM: Expose speculation control feature to guests

2018-01-28 Thread KarimAllah Ahmed
Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future
Intel processors to indicate RDCL_NO and IBRS_ALL.
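From inside the guest, "IBRS+IBPB instead of retpoline+IBPB" boils down to writing the two
MSRs directly once they are exposed. A hedged sketch of the guest side (macro names as in
the host kernel; bit meanings: SPEC_CTRL bit 0 = IBRS, PRED_CMD bit 0 = IBPB):

    /* guest kernel, illustrative only */
    wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);  /* restrict indirect branch speculation */
    wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);    /* one-shot predictor barrier, e.g. on context switch */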

Ashok Raj (1):
  x86/kvm: Add IBPB support

KarimAllah Ahmed (3):
  x86: kvm: Update the reverse_cpuid list to include CPUID_7_EDX
  x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL
  x86: vmx: Allow direct access to MSR_IA32_ARCH_CAPABILITIES

 arch/x86/kvm/cpuid.c |  6 -
 arch/x86/kvm/cpuid.h |  1 +
 arch/x86/kvm/svm.c   | 14 +++
 arch/x86/kvm/vmx.c   | 71 
 arch/x86/kvm/x86.c   |  1 +
 5 files changed, 92 insertions(+), 1 deletion(-)

Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Ashok Raj 
Cc: Asit Mallick 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Janakarajan Natarajan 
Cc: Joerg Roedel 
Cc: Jun Nakajima 
Cc: Laura Abbott 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org

-- 
2.7.4



[PATCH v2 3/4] x86/kvm: Add IBPB support

2018-01-28 Thread KarimAllah Ahmed
From: Ashok Raj 

Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
barriers on switching between VMs to avoid inter VM Spectre-v2 attacks.

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
   - vmx: expose PRED_CMD whenever it is available
   - svm: only pass through IBPB if it is available]
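The core of the barrier placement is: issue an IBPB whenever the physical CPU may start
running a different VM's code, i.e. when the per-CPU notion of the currently loaded VMCB
(or VMCS on Intel) changes. Reduced to its essence (mirroring the SVM hunk below):

    if (sd->current_vmcb != svm->vmcb) {
            sd->current_vmcb = svm->vmcb;
            indirect_branch_prediction_barrier();  /* flush branch predictor state */
    }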
Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kvm/svm.c | 14 ++
 arch/x86/kvm/vmx.c |  4 
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2744b973..c886e46 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;
 
struct page *save_area;
+   struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
}
+
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+   /*
+* The vmcb page can be recycled, causing a false negative in
+* svm_vcpu_load(). So do a full IBPB now.
+*/
+   indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;
 
if (unlikely(cpu != vcpu->cpu)) {
@@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+   if (sd->current_vmcb != svm->vmcb) {
+   sd->current_vmcb = svm->vmcb;
+   indirect_branch_prediction_barrier();
+   }
avic_vcpu_load(vcpu, cpu);
 }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dac564d..f82a44c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2296,6 +2296,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
vmcs_load(vmx->loaded_vmcs->vmcs);
+   indirect_branch_prediction_barrier();
}
 
if (!already_loaded) {
@@ -9613,6 +9614,9 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
goto free_msrs;
 
msr_bitmap = vmx->vmcs01.msr_bitmap;
+
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PRED_CMD, 
MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, 
MSR_TYPE_RW);
-- 
2.7.4



Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-28 Thread KarimAllah Ahmed

On 01/28/2018 09:21 PM, Konrad Rzeszutek Wilk wrote:

On January 28, 2018 2:29:10 PM EST, KarimAllah Ahmed  wrote:

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests
that will only mitigate Spectre V2 through IBRS+IBPB and will not be
using a
retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring the
MSR_IA32_SPEC_CTRL
for guests that do not actually use the MSR, only add_atomic_switch_msr
when a
non-zero is written to it.



We tried this and found that it was about 3% slower than doing the old way of 
rdmsr and wrmsr.


I actually have not measured the performance difference between using 
the atomic_switch vs just doing rdmsr/wrmsr. I was mostly focused 
on not saving and restoring when the guest does not actually use the MSRs.


Interesting data point, though. I will update the code to use rdmsr/wrmsr 
and see whether I observe the same on my hardware (I am using a Skylake processor).




But that was also with the host doing IBRS  as well.

On what type of hardware did you run this?

Ccing Daniel.


Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: Ashok Raj 
---
arch/x86/kvm/cpuid.c |  4 +++-
arch/x86/kvm/cpuid.h |  1 +
arch/x86/kvm/vmx.c   | 63

3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..dc78095 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
/* These are scattered features in cpufeatures.h. */
#define KVM_CPUID_BIT_AVX512_4VNNIW 2
#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KVM_CPUID_BIT_SPEC_CTRL 26
#define KF(x) bit(KVM_CPUID_BIT_##x)

int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct
kvm_cpuid_entry2 *entry, u32 function,

/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
+   (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0);

/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index cdc70a3..dcfe227 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
};

static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned
x86_feature)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aa8638a..1b743a0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu,
bool masked);
static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned
long *msr_bitmap,
+ u32 msr, int type);
+

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct
vcpu_vmx *vmx, unsigned msr,
m->host[i].value = host_val;
}

+/* do not touch guest_val and host_val if the msr is not found */
+static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
+ u64 *guest_val, u64 *host_val)
+{
+   unsigned i;
+   struct msr_autoload *m = &vmx->msr_autoload;
+
+   for (i = 0; i < m->nr; ++i)
+   if (m->guest[i].index == msr)
+   break;
+
+   if (i == m->nr)
+   return 1;
+
+   if (guest_val)
+   *guest_val = m->guest[i].value;
+   if (host_val)
+   *host_val = m->host[i].value;
+
+   return 0;
+}
+
static bool update_transition_efer(struct vcpu_vmx *vmx, int
efer_offset)
{
u64 guest_efer = vmx->vcpu.arch.efer;
@@ -3203,7 +3228,9 @@ static inline bool
vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
  */
static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data
*msr_info)
{
+   u64 spec_ctrl = 0;
struct shared_msr_entry *msr;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);

switch (msr_info->index) {
#ifdef CONFIG_X86_64
@@ -3223,6 +3250,19 @@ static int vmx_get_msr(struct kvm_vcp

Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-29 Thread KarimAllah Ahmed

On 01/29/2018 09:46 AM, David Woodhouse wrote:

On Sun, 2018-01-28 at 16:39 -0800, Liran Alon wrote:


Windows uses IBRS and Microsoft doesn't have any plans to switch to retpoline.
Running a Windows guest should be a pretty common use-case, no?

In addition, your handling of the first WRMSR intercept could be different.
It could signal you to start doing the following:
1. Disable intercept on SPEC_CTRL MSR.
2. On VMEntry, Write vCPU SPEC_CTRL value into physical MSR.
3. On VMExit, read physical MSR into vCPU SPEC_CTRL value.
(And if IBRS is used at host, also set physical SPEC_CTRL MSR here to 1)

That way, you will both have the fastest option as long as the guest doesn't use IBRS
and also won't have the 3% performance hit compared to Konrad's proposal.

Am I missing something?


Reads from the SPEC_CTRL MSR are strangely slow. I suspect a large part
of the 3% speedup you observe is because in the above, the vmentry path
doesn't need to *read* the host's value and store it; the host is
expected to restore it for itself anyway?

I'd actually quite like to repeat the benchmark on the new fixed
microcode, if anyone has it yet, to see if that read/swap slowness is
still quite as excessive. I'm certainly not ruling this out, but I'm
just a little wary of premature optimisation, and I'd like to make sure
we have everything *else* in the KVM patches right first.

The fact that the save-and-restrict macros I have in the tip of my
working tree at the moment are horrid and causing 0-day nastygrams,
probably doesn't help persuade me to favour the approach ;)

... hm, the CPU actually has separate MSR save/restore lists for
entry/exit, doesn't it? Is there any way to sanely make use of that and
do the restoration manually on vmentry but let it be automatic on
vmexit, by having it *only* in the guest's MSR-store area to be saved
on exit and restored on exit, but *not* in the host's MSR-store area?

Reading the code and comparing with the SDM, I can't see where we're
ever setting VM_EXIT_MSR_STORE_{ADDR,COUNT} except in the nested
case...


Hmmm ... you are probably right! I think all users of this interface
always trap + update save area and never passthrough the MSR. That is
why only LOAD is needed *so far*.

Okay, let me sort this out in v3 then.
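For reference, the pass-through scheme sketched in Liran's points 1-3 above, and what v3
ends up doing manually, has roughly this shape around the run loop (a sketch only; field
and helper names are illustrative, and the host is assumed not to use IBRS itself):

    /* before VMLAUNCH/VMRESUME: install the guest's value */
    if (vmx->spec_ctrl)
            wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);

    /* ... guest runs ... */

    /* after VM-exit: save the guest's value, restore the host's (0) */
    if (vmx->spec_ctrl_passthrough) {
            rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
            wrmsrl(MSR_IA32_SPEC_CTRL, 0);
    }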




Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


NOTE

2018-01-29 Thread Ahmed Zama
Attn

I was able to trace a huge sum of money in my department that belongs
to our deceased customer, according to my findings. I want to present
you as the beneficiary of this huge sum of money. I will give you the
full explanation as soon as you respond to this email.

Ahmed Zama


Re: [PATCH v2 2/4] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-29 Thread KarimAllah Ahmed

On 01/29/2018 11:44 AM, Paolo Bonzini wrote:

On 29/01/2018 01:58, KarimAllah Ahmed wrote:

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring the
MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only
add_atomic_switch_msr when a non-zero is written to it.


You are not storing the guest's MSR value on vmexit though, are you?


I originally thought that atomic_switch was also saving the guest MSR on 
VM-exit. Now I know it is not.



Also, there's an obvious typo here:

+   add_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, msr_info->data, 
0);
+
+   msr_bitmap = vmx->vmcs01.msr_bitmap;
+   vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, 
MSR_TYPE_RW);
+


oops! copy & paste error :)



Finally, apparently add_atomic_switch_msr is slower than just rdmsr/wrmsr
on vmexit.  Can you reuse the patches I had posted mid January instead?  They
are also assuming no IBRS usage on the host, so the changes shouldn't be large,
and limited mostly to using actual X86_FEATURE_* bits instead of cpuid_count().

They lack the code to only read/write SPEC_CTRL if the direct access is enabled,
but that's small too...  Enabling the direct access on the first write, as in
these patches, is okay.

Thanks,

Paolo


Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 

---
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
   when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
  arch/x86/kvm/cpuid.c |  4 +++-
  arch/x86/kvm/vmx.c   | 65 
  arch/x86/kvm/x86.c   |  1 +
  3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..32c0c14 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
  /* These are scattered features in cpufeatures.h. */
  #define KVM_CPUID_BIT_AVX512_4VNNIW 2
  #define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KVM_CPUID_BIT_IBRS  26
  #define KF(x) bit(KVM_CPUID_BIT_##x)
  
  int kvm_update_cpuid(struct kvm_vcpu *vcpu)

@@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
  
  	/* cpuid 7.0.edx*/

const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
+   (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0);
  
  	/* all calls to cpuid_count() should be made on the same cpu */

get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aa8638a..dac564d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -920,6 +920,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);
  
  static DEFINE_PER_CPU(struct vmcs *, vmxarea);

  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2007,6 +2009,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, 
unsigned msr,
m->host[i].value = host_val;
  }
  
+/* do not touch guest_val and host_val if the msr is not found */

+static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
+ u64 *guest_val, u64 *host_val)
+{
+   unsigned i;
+   struct msr_autoload *m = &vmx->msr_autoload;
+
+   for (i = 0; i < m->nr; ++i)
+   if (m->guest[i].index == msr)
+   break;
+
+   if (i == m->nr)
+   return 1;
+
+   if (guest_val)
+   *guest_val = m->guest[i].value;
+   if (host_val)
+   *host_val = m->host[i].value;
+
+   return 0;
+}
+
  static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
  {
u64 guest_efer = vmx->vcpu.arch.efer;
@@ -3203,7 +3227,9 @@ static inline bool vmx_feature_control_msr_valid(struct 
kvm_vcpu *vcpu,
   */
  static int vmx_g

Re: [PATCH v2 4/4] x86: vmx: Allow direct access to MSR_IA32_ARCH_CAPABILITIES

2018-01-29 Thread KarimAllah Ahmed

On 01/29/2018 07:55 PM, Jim Mattson wrote:

Why should this MSR be pass-through? I doubt that it would be accessed
frequently.


True. Will update it to be emulated and allow user-space to set the 
value exposed.




On Sun, Jan 28, 2018 at 4:58 PM, KarimAllah Ahmed  wrote:

Add direct access to MSR_IA32_ARCH_CAPABILITIES for guests. Future Intel processors
will use this MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1).

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
---
  arch/x86/kvm/cpuid.c | 4 +++-
  arch/x86/kvm/vmx.c   | 2 ++
  2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 32c0c14..2339b1a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -71,6 +71,7 @@ u64 kvm_supported_xcr0(void)
  #define KVM_CPUID_BIT_AVX512_4VNNIW 2
  #define KVM_CPUID_BIT_AVX512_4FMAPS 3
  #define KVM_CPUID_BIT_IBRS  26
+#define KVM_CPUID_BIT_ARCH_CAPABILITIES 29
  #define KF(x) bit(KVM_CPUID_BIT_##x)

  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -394,7 +395,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 /* cpuid 7.0.edx*/
 const u32 kvm_cpuid_7_0_edx_x86_features =
 KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
-   (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0);
+   (boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0) | \
+   (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) ? 
KF(ARCH_CAPABILITIES) : 0);

 /* all calls to cpuid_count() should be made on the same cpu */
 get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f82a44c..99cb761 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9617,6 +9617,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)

 if (boot_cpu_has(X86_FEATURE_IBPB))
 vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PRED_CMD, 
MSR_TYPE_RW);
+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   vmx_disable_intercept_for_msr(msr_bitmap, 
MSR_IA32_ARCH_CAPABILITIES, MSR_TYPE_R);
 vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
 vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
 vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, 
MSR_TYPE_RW);
--
2.7.4




Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-29 Thread KarimAllah Ahmed

On 01/29/2018 08:04 PM, Jim Mattson wrote:

Can I assume you'll send out a new version with the fixes?


Yes, I am currently doing some tests and once I am done I will send a 
new round.


... and the typo is already fixed in 'ibpb-wip' :)



On Mon, Jan 29, 2018 at 11:01 AM, David Woodhouse  wrote:


(Top-posting; sorry.)

Much of that is already fixed during our day, in
http://git.infradead.org/linux-retpoline.git/shortlog/refs/heads/ibpb

I forgot to fix up the wrong-MSR typo though, and we do still need to address 
reset.

On Mon, 2018-01-29 at 10:43 -0800, Jim Mattson wrote:

On Sun, Jan 28, 2018 at 11:29 AM, KarimAllah Ahmed  wrote:


Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests
that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a
retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL
for guests that do not actually use the MSR, only add_atomic_switch_msr when a
non-zero is written to it.

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: Ashok Raj 
---
  arch/x86/kvm/cpuid.c |  4 +++-
  arch/x86/kvm/cpuid.h |  1 +
  arch/x86/kvm/vmx.c   | 63 
  3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..dc78095 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
  /* These are scattered features in cpufeatures.h. */
  #define KVM_CPUID_BIT_AVX512_4VNNIW 2
  #define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KVM_CPUID_BIT_SPEC_CTRL 26
  #define KF(x) bit(KVM_CPUID_BIT_##x)

  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,

 /* cpuid 7.0.edx*/
 const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
+   (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0);

Isn't 'boot_cpu_has()' superfluous here? And aren't there two bits to
pass through for existing CPUs (26 and 27)?




 /* all calls to cpuid_count() should be made on the same cpu */
 get_cpu();
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index cdc70a3..dcfe227 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 [CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX},
 [CPUID_7_ECX] = { 7, 0, CPUID_ECX},
 [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
  };

  static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned 
x86_feature)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aa8638a..1b743a0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
 u16 error_code);
  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);
+

  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, 
unsigned msr,
 m->host[i].value = host_val;
  }

+/* do not touch guest_val and host_val if the msr is not found */
+static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
+ u64 *guest_val, u64 *host_val)
+{
+   unsigned i;
+   struct msr_autoload *m = &vmx->msr_autoload;
+
+   for (i = 0; i < m->nr; ++i)
+   if (m->guest[i].index == msr)
+   break;
+
+   if (i == m->nr)
+   return 1;
+
+   if (guest_val)
+   *guest_val = m->guest[i].value;
+   if (host_val)
+   *host_val = m->host[i].value;
+
+   return 0;
+}
+
  static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
  {
 u64 guest_efer = vmx->vcpu.arch.efer;
@@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct 
kvm_vcpu *vcpu,
   */
  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
  {
+   u64 spec_ctrl = 0;
 struct shared_msr_entry *

[PATCH v3 1/4] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX

2018-01-29 Thread KarimAllah Ahmed
[dwmw2: Stop using KF() for bits in it, too]
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c | 8 +++-
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW 2
-#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+   cpuid_mask(&entry->edx, CPUID_7_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index cdc70a3..dcfe227 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x800a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
-- 
2.7.4



[PATCH v3 2/4] KVM: x86: Add IBPB support

2018-01-29 Thread KarimAllah Ahmed
From: Ashok Raj 

Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
barriers on switching between VMs to avoid inter VM Spectre-v2 attacks.

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
   - vmx: expose PRED_CMD whenever it is available
   - svm: only pass through IBPB if it is available
   - vmx: support !cpu_has_vmx_msr_bitmap()]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
---
 arch/x86/kvm/cpuid.c | 11 ++-
 arch/x86/kvm/svm.c   | 14 ++
 arch/x86/kvm/vmx.c   | 12 
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+   /* cpuid 0x8008.ebx */
+   const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+   F(IBPB);
+
/* cpuid 0xC001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
-   entry->ebx = entry->edx = 0;
+   entry->edx = 0;
+   /* IBPB isn't necessarily present in hardware cpuid */
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   entry->ebx |= F(IBPB);
+   entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+   cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
}
case 0x8019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2744b973..c886e46 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;
 
struct page *save_area;
+   struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 
set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
}
+
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
 }
 
 static void add_msr_offset(u32 offset)
@@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+   /*
+* The vmcb page can be recycled, causing a false negative in
+* svm_vcpu_load(). So do a full IBPB now.
+*/
+   indirect_branch_prediction_barrier();
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;
 
if (unlikely(cpu != vcpu->cpu)) {
@@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+   if (sd->current_vmcb != svm->vmcb) {
+   sd->current_vmcb = svm->vmcb;
+   indirect_branch_prediction_barrier();
+   }
avic_vcpu_load(vcpu, cpu);
 }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aa8638a..ea278ce 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2272,6 +2272,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
vmcs_load(vmx->loaded_vmcs->vmcs);
+   indirect_branch_prediction_barrier();
}
 
if (!already_loaded) {
@@ -3330,6 +3331,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
+   case MSR_IA32_PRED_CMD:
+   if (!msr_info

[PATCH v3 3/4] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

2018-01-29 Thread KarimAllah Ahmed
Future intel processors will use MSR_IA32_ARCH_CAPABILITIES MSR to indicate
RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default
the contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
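For orientation, the two bits named above would be consumed on the host roughly like this
(a sketch under the stated bit layout: bit 0 = RDCL_NO, i.e. not vulnerable to rogue data
cache load, bit 1 = IBRS_ALL, i.e. enhanced IBRS; the function name is illustrative):

    static void report_arch_capabilities(void)
    {
            u64 arch_cap = 0;

            if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
                    rdmsrl(MSR_IA32_ARCH_CAPABILITIES, arch_cap);

            pr_info("ARCH_CAPABILITIES: RDCL_NO=%d IBRS_ALL=%d\n",
                    !!(arch_cap & BIT(0)), !!(arch_cap & BIT(1)));
    }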

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ea278ce..798a00b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -581,6 +581,8 @@ struct vcpu_vmx {
u64   msr_host_kernel_gs_base;
u64   msr_guest_kernel_gs_base;
 #endif
+   u64   arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3224,6 +3226,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+   return 1;
+   msr_info->data = to_vmx(vcpu)->arch_capabilities;
+   break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3339,6 +3347,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
if (data & PRED_CMD_IBPB)
wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated)
+   return 1;
+   vmx->arch_capabilities = data;
+   break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5599,6 +5612,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
 
+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb..8e889dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1006,6 +1006,7 @@ static u32 msrs_to_save[] = {
 #endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+   MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;
-- 
2.7.4



[PATCH v3 4/4] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-01-29 Thread KarimAllah Ahmed
[ Based on a patch from Ashok Raj  ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of atomically saving and restoring the
MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only
call add_atomic_switch_msr() when a non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.
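Should such filtering ever be wanted, the hypothetical shape would be to keep the MSR
intercepted and strip the bit on trapped writes (purely illustrative, not part of this
series; SPEC_CTRL_STIBP is SPEC_CTRL bit 1):

    case MSR_IA32_SPEC_CTRL:
            /* hypothetical: hide STIBP from the guest */
            data &= ~SPEC_CTRL_STIBP;
            vmx->spec_ctrl = data;
            break;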

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Linus Torvalds 
Cc: Tim Chen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: David Woodhouse 
Cc: Greg KH 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
  when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
  disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
---
 arch/x86/kvm/cpuid.c |  7 +--
 arch/x86/kvm/vmx.c   | 59 
 arch/x86/kvm/x86.c   |  2 +-
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..662d0c0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+   F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->edx = 0;
-   /* IBPB isn't necessarily present in hardware cpuid */
+   /* IBRS and IBPB aren't necessarily present in hardware cpuid */
if (boot_cpu_has(X86_FEATURE_IBPB))
entry->ebx |= F(IBPB);
+   if (boot_cpu_has(X86_FEATURE_IBRS))
+   entry->ebx |= F(IBRS);
entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 798a00b..9ac9747 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -582,6 +582,8 @@ struct vcpu_vmx {
u64   msr_guest_kernel_gs_base;
 #endif
u64   arch_capabilities;
+   u64   spec_ctrl;
+   bool  save_spec_ctrl_on_exit;
 
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -922,6 +924,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool 
masked);
 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long 
*msr_bitmap,
+ u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3226,6 +3230,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   msr_info->data = to_vmx(vcpu)->spec_ctrl;
+   break;
case MSR_IA32_ARCH_CAPABILITIES:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -3339,6 +3350,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);

[PATCH v3 0/4] KVM: Expose speculation control feature to guests

2018-01-29 Thread KarimAllah Ahmed
Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future
Intel processors to indicate RDCL_NO and IBRS_ALL.

Ashok Raj (1):
  KVM: x86: Add IBPB support

KarimAllah Ahmed (3):
  KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
  KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

 arch/x86/kvm/cpuid.c | 22 ++
 arch/x86/kvm/cpuid.h |  1 +
 arch/x86/kvm/svm.c   | 14 +
 arch/x86/kvm/vmx.c   | 86 
 arch/x86/kvm/x86.c   |  1 +
 5 files changed, 118 insertions(+), 6 deletions(-)

Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Ashok Raj 
Cc: Asit Mallick 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Janakarajan Natarajan 
Cc: Joerg Roedel 
Cc: Jun Nakajima 
Cc: Laura Abbott 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org

-- 
2.7.4



Re: [PATCH v3 3/4] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

2018-01-29 Thread KarimAllah Ahmed

On 01/30/2018 01:22 AM, Raj, Ashok wrote:

On Tue, Jan 30, 2018 at 01:10:27AM +0100, KarimAllah Ahmed wrote:

Future intel processors will use MSR_IA32_ARCH_CAPABILITIES MSR to indicate
RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default
the contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Cc: Asit Mallick 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Tim Chen 
Cc: Linus Torvalds 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Jun Nakajima 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Ashok Raj 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
---
  arch/x86/kvm/cpuid.c |  2 +-
  arch/x86/kvm/vmx.c   | 15 +++
  arch/x86/kvm/x86.c   |  1 +
  3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
  
  	/* cpuid 7.0.edx*/

const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
  
  	/* all calls to cpuid_count() should be made on the same cpu */

get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ea278ce..798a00b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -581,6 +581,8 @@ struct vcpu_vmx {
u64   msr_host_kernel_gs_base;
u64   msr_guest_kernel_gs_base;
  #endif
+   u64   arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3224,6 +3226,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+   return 1;
+   msr_info->data = to_vmx(vcpu)->arch_capabilities;
+   break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3339,6 +3347,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
if (data & PRED_CMD_IBPB)
wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated)
+   return 1;
+   vmx->arch_capabilities = data;
+   break;


arch capabilities is read only. You don't need the set_msr handling for this.


This is only for host driven writes. This would allow QEMU/whatever to
override the default value (i.e. the value from the hardware).




case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5599,6 +5612,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
  
+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))

+   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
  
  	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
  
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c

index 03869eb..8e889dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1006,6 +1006,7 @@ static u32 msrs_to_save[] = {
  #endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+   MSR_IA32_ARCH_CAPABILITIES


Same here.. no need to save/restore this.


  };
  
  static unsigned num_msrs_to_save;

--
2.7.4




Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


[PATCH] kvm: Map PFN-type memory regions as writable (if possible)

2018-01-17 Thread KarimAllah Ahmed
For EPT-violations that are triggered by a read, the pages are also mapped with
write permissions (if their memory region is also writable). That would avoid
getting yet another fault on the same page when a write occurs.

This optimization only happens when you have a "struct page" backing the memory
region. So also enable it for memory regions that do not have a "struct page".
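The essence of the change is a single hint to the fault path: for a remapped (no
"struct page") mapping, report the PFN as writable even when the fault was a read, so the
resulting mapping can be made writable up front (condensed from the diff below):

    /* remapped (no struct page) case: allow the mapping to be writable so a
     * later guest write does not take a second EPT violation
     */
    if (writable)
            *writable = true;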

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 virt/kvm/kvm_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 97da45e..0efb089 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1534,6 +1534,8 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool 
atomic, bool *async,
goto retry;
if (r < 0)
pfn = KVM_PFN_ERR_FAULT;
+   if (writable)
+   *writable = true;
} else {
if (async && vma_is_valid(vma, write_fault))
*async = true;
-- 
2.7.4



[PATCH] pci: Store more data about VFs into the SRIOV struct

2018-01-17 Thread KarimAllah Ahmed
... to avoid reading them from the config space of all the PCI VFs. This is
an especially useful optimization when bringing up thousands of VFs.
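The idea is to read the identity fields once (from VF 0) into struct pci_sriov and let
every subsequent VF take them from the cache instead of issuing config cycles. A sketch of
the consuming side, assuming the probe path applies the cached values along these lines
(the struct pci_dev field names are real; the exact placement is an assumption):

    if (dev->is_virtfn) {
            struct pci_sriov *iov = dev->physfn->sriov;

            /* assumed usage: populate the VF from the cached PF copy */
            dev->hdr_type = iov->vf_hdr_type & 0x7f;        /* PCI_HEADER_TYPE */
            dev->class = iov->vf_class >> 8;                /* PCI_CLASS_REVISION >> 8 */
            dev->subsystem_vendor = iov->vf_subsystem_vendor;
            dev->subsystem_device = iov->vf_subsystem_device;
    }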

Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 drivers/pci/iov.c   | 20 ++--
 drivers/pci/pci.h   |  6 +-
 drivers/pci/probe.c | 42 --
 3 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 168328a..78e9595 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -129,7 +129,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, 
int resno)
if (!dev->is_physfn)
return 0;
 
-   return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
+   return dev->sriov->vf_barsz[resno - PCI_IOV_RESOURCES];
 }
 
 int batch_pci_iov_add_virtfn(struct pci_dev *dev, struct pci_bus **bus,
@@ -325,6 +325,20 @@ static void pci_iov_wq_fn(struct work_struct *work)
kfree(req);
 }
 
+static void pci_read_vf_config_common(struct pci_bus *bus,
+ struct pci_dev *dev)
+{
+   int devfn = pci_iov_virtfn_devfn(dev, 0);
+
+   pci_bus_read_config_dword(bus, devfn, PCI_CLASS_REVISION,
+ &dev->sriov->vf_class);
+   pci_bus_read_config_word(bus, devfn, PCI_SUBSYSTEM_ID,
+&dev->sriov->vf_subsystem_device);
+   pci_bus_read_config_word(bus, devfn, PCI_SUBSYSTEM_VENDOR_ID,
+&dev->sriov->vf_subsystem_vendor);
+   pci_bus_read_config_byte(bus, devfn, PCI_HEADER_TYPE, 
&dev->sriov->vf_hdr_type);
+}
+
 static struct workqueue_struct *pci_iov_wq;
 
 static int __init init_pci_iov_wq(void)
@@ -361,6 +375,8 @@ static int enable_vfs(struct pci_dev *dev, int nr_vfs)
goto add_bus_fail;
}
 
+   pci_read_vf_config_common(bus[0], dev);
+
while (remaining_vfs > 0) {
bool ret;
struct pci_iov_wq_item *req;
@@ -617,7 +633,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
rc = -EIO;
goto failed;
}
-   iov->barsz[i] = resource_size(res);
+   iov->vf_barsz[i] = resource_size(res);
res->end = res->start + resource_size(res) * total - 1;
dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
	 i, res, i, total);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index f6b58b3..3264c9e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -271,7 +271,11 @@ struct pci_sriov {
u16 driver_max_VFs; /* max num VFs driver supports */
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
-   resource_size_t barsz[PCI_SRIOV_NUM_BARS];  /* VF BAR size */
+   u8 vf_hdr_type; /* VF header type */
+   u32 vf_class;   /* VF device class */
+   u16 vf_subsystem_vendor;/* VF subsystem vendor */
+   u16 vf_subsystem_device;/* VF subsystem device */
+   resource_size_t vf_barsz[PCI_SRIOV_NUM_BARS];   /* VF BAR size */
bool drivers_autoprobe; /* auto probing of VFs by driver */
 };
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 14e0ea1..65099d0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -175,6 +175,7 @@ static inline unsigned long decode_bar(struct pci_dev *dev, 
u32 bar)
 int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
struct resource *res, unsigned int pos)
 {
+   int bar = res - dev->resource;
u32 l = 0, sz = 0, mask;
u64 l64, sz64, mask64;
u16 orig_cmd;
@@ -194,9 +195,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
res->name = pci_name(dev);
 
pci_read_config_dword(dev, pos, &l);
-   pci_write_config_dword(dev, pos, l | mask);
-   pci_read_config_dword(dev, pos, &sz);
-   pci_write_config_dword(dev, pos, l);
+   if (dev->is_virtfn) {
+   sz = dev->physfn->sriov->vf_barsz[bar] & 0xffffffff;
+   } else {
+   pci_write_config_dword(dev, pos, l | mask);
+   pci_read_config_dword(dev, pos, &sz);
+   pci_write_config_dword(dev, pos, l);
+   }
 
/*
 * All bits set in sz means the device isn't working properly.
@@ -236,9 +241,14 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
 
if (res->flags & IORESOURCE_MEM_64) {
pci_read_config_dword(dev, pos + 4, &l);
-   pci_write_config_dword(dev, pos + 4, ~0);
-   pci_read_config_dword(dev, pos + 4, &sz);
-   pci_write
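
As an illustration (not part of the patch), a hedged sketch of how VF setup
code could consume the cached values instead of issuing per-VF config reads;
pci_setup_vf_from_cache() is a hypothetical helper, while the vf_* fields are
the ones added to struct pci_sriov above and filled once from VF 0 by
pci_read_vf_config_common():

#include <linux/pci.h>
#include "pci.h"        /* struct pci_sriov lives in drivers/pci/pci.h */

/*
 * Hypothetical helper, for illustration only: the patch assumes these
 * registers are identical for all VFs of a PF, so a VF being probed can
 * copy them from the cache instead of reading its own config space.
 */
static void pci_setup_vf_from_cache(struct pci_dev *virtfn)
{
        struct pci_sriov *iov = virtfn->physfn->sriov;

        virtfn->revision = iov->vf_class & 0xff;        /* low byte: revision */
        virtfn->class = iov->vf_class >> 8;             /* base/sub/prog-if */
        virtfn->hdr_type = iov->vf_hdr_type & 0x7f;     /* strip multifunction bit */
        virtfn->subsystem_vendor = iov->vf_subsystem_vendor;
        virtfn->subsystem_device = iov->vf_subsystem_device;
}

For thousands of VFs this turns several config-space accesses per VF into plain
memory reads from the PF's cached copy.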

[PATCH v2] kvm: Map PFN-type memory regions as writable (if possible)

2018-01-17 Thread KarimAllah Ahmed
For EPT violations that are triggered by a read, the pages are also mapped with
write permissions (if their memory region is also writable). That avoids
getting yet another fault on the same page when a write occurs.

Currently, this optimization is only applied when a "struct page" backs the
memory region; enable it also for memory regions that do not have a
"struct page".

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 

---
v2:
- Move setting writable to hva_to_pfn_remapped
- Extend hva_to_pfn_remapped interface to accept writable as a parameter
---
 virt/kvm/kvm_main.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 97da45e..88702d5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1438,7 +1438,8 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool 
write_fault)
 
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
   unsigned long addr, bool *async,
-  bool write_fault, kvm_pfn_t *p_pfn)
+  bool write_fault, bool *writable,
+  kvm_pfn_t *p_pfn)
 {
unsigned long pfn;
int r;
@@ -1464,6 +1465,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 
}
 
+   if (writable)
+   *writable = true;
 
/*
 * Get a reference here because callers of *hva_to_pfn* and
@@ -1529,7 +1532,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool 
atomic, bool *async,
if (vma == NULL)
pfn = KVM_PFN_ERR_FAULT;
else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
-   r = hva_to_pfn_remapped(vma, addr, async, write_fault, &pfn);
+   r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
if (r == -EAGAIN)
goto retry;
if (r < 0)
-- 
2.7.4



[PATCH] pci: Do not read INTx PIN and LINE registers for virtual functions

2018-01-17 Thread KarimAllah Ahmed
... since INTx is not supported for virtual functions per the SR-IOV specification.

Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: Jan H. Schönherr 
---
 drivers/pci/probe.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 65099d0..61002fb 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1232,6 +1232,13 @@ static void pci_read_irq(struct pci_dev *dev)
 {
unsigned char irq;
 
+   /* Virtual functions do not have INTx support */
+   if (dev->is_virtfn) {
+   dev->pin = 0;
+   dev->irq = 0;
+   return;
+   }
+
pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
dev->pin = irq;
if (irq)
-- 
2.7.4



Re: [PATCH] pci: Do not read INTx PIN and LINE registers for virtual functions

2018-01-17 Thread KarimAllah Ahmed



On 01/17/2018 07:49 PM, Alex Williamson wrote:

On Wed, 17 Jan 2018 19:30:29 +0100
KarimAllah Ahmed  wrote:


... since INTx is not supported for virtual functions per the SR-IOV specification.

But the spec also states that VFs must implement the interrupt pin
register as read-only zero, so either this is redundant or it's a
workaround for VFs that aren't quite compliant?  Thanks,


The end goal for me is simply to avoid doing the read across the PCI bus for no
good reason. We have devices with thousands of virtual functions, and this read
is not useful in that case, so it can be optimized away as done here. From a
functionality point of view the patch probably does not add any value, as you
mentioned, but it is genuinely useful as a micro-optimization.




Alex








[nf-next 1/3] netfilter: export SRH processing functions from seg6local

2018-01-15 Thread Ahmed Abdelsalam
Some functions of seg6local are very useful for processing SRv6-encapsulated
packets.

This patch exports those functions of seg6local so that they can be re-used
in other parts of the kernel, including netfilter. (A netfilter usage sketch
follows the patch below.)

The exported functions are:
(1) seg6_get_srh()
(2) seg6_advance_nextseg()
(3) seg6_lookup_nexthop()

Signed-off-by: Ahmed Abdelsalam 
---
 include/net/seg6.h|  5 +
 net/ipv6/seg6_local.c | 37 -
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 099bad5..b637778 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -63,5 +63,10 @@ extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int 
len);
 extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
 int proto);
 extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
+extern struct ipv6_sr_hdr *seg6_get_srh(struct sk_buff *skb);
+extern void seg6_advance_nextseg(struct ipv6_sr_hdr *srh,
+   struct in6_addr *daddr);
+extern void seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+   u32 tbl_id);
 
 #endif
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index ba3767e..1f1eaa3 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -59,7 +59,7 @@ static struct seg6_local_lwt *seg6_local_lwtunnel(struct 
lwtunnel_state *lwt)
return (struct seg6_local_lwt *)lwt->data;
 }
 
-static struct ipv6_sr_hdr *get_srh(struct sk_buff *skb)
+struct ipv6_sr_hdr *seg6_get_srh(struct sk_buff *skb)
 {
struct ipv6_sr_hdr *srh;
int len, srhoff = 0;
@@ -82,12 +82,13 @@ static struct ipv6_sr_hdr *get_srh(struct sk_buff *skb)
 
return srh;
 }
+EXPORT_SYMBOL_GPL(seg6_get_srh);
 
 static struct ipv6_sr_hdr *get_and_validate_srh(struct sk_buff *skb)
 {
struct ipv6_sr_hdr *srh;
 
-   srh = get_srh(skb);
+   srh = seg6_get_srh(skb);
if (!srh)
return NULL;
 
@@ -107,7 +108,7 @@ static bool decap_and_validate(struct sk_buff *skb, int 
proto)
struct ipv6_sr_hdr *srh;
unsigned int off = 0;
 
-   srh = get_srh(skb);
+   srh = seg6_get_srh(skb);
if (srh && srh->segments_left > 0)
return false;
 
@@ -131,7 +132,7 @@ static bool decap_and_validate(struct sk_buff *skb, int 
proto)
return true;
 }
 
-static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
+void seg6_advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
 {
struct in6_addr *addr;
 
@@ -139,9 +140,10 @@ static void advance_nextseg(struct ipv6_sr_hdr *srh, 
struct in6_addr *daddr)
addr = srh->segments + srh->segments_left;
*daddr = *addr;
 }
+EXPORT_SYMBOL_GPL(seg6_advance_nextseg);
 
-static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
-  u32 tbl_id)
+void seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+u32 tbl_id)
 {
struct net *net = dev_net(skb->dev);
struct ipv6hdr *hdr = ipv6_hdr(skb);
@@ -188,6 +190,7 @@ static void lookup_nexthop(struct sk_buff *skb, struct 
in6_addr *nhaddr,
skb_dst_drop(skb);
skb_dst_set(skb, dst);
 }
+EXPORT_SYMBOL_GPL(seg6_lookup_nexthop);
 
 /* regular endpoint function */
 static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
@@ -198,9 +201,9 @@ static int input_action_end(struct sk_buff *skb, struct 
seg6_local_lwt *slwt)
if (!srh)
goto drop;
 
-   advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
+   seg6_advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, NULL, 0);
+   seg6_lookup_nexthop(skb, NULL, 0);
 
return dst_input(skb);
 
@@ -218,9 +221,9 @@ static int input_action_end_x(struct sk_buff *skb, struct 
seg6_local_lwt *slwt)
if (!srh)
goto drop;
 
-   advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
+   seg6_advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, &slwt->nh6, 0);
+   seg6_lookup_nexthop(skb, &slwt->nh6, 0);
 
return dst_input(skb);
 
@@ -237,9 +240,9 @@ static int input_action_end_t(struct sk_buff *skb, struct 
seg6_local_lwt *slwt)
if (!srh)
goto drop;
 
-   advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
+   seg6_advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, NULL, slwt->table);
+   seg6_lookup_nexthop(skb, NULL, slwt->table);
 
return dst_input(skb);
 
@@ -331,7 +334,7 @@ static int input_action_end_dx6(struct sk_buff *skb,
if (!ipv6_addr_any(&slwt->nh6))
nhaddr = &slwt->nh6;
 
-   lookup_nexthop(skb, nhaddr, 0);
+   seg6_lookup_nexthop(skb, n
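
As an illustration (not part of this series), a hedged sketch of how a
netfilter module could use the newly exported helpers; nf_srv6_inspect() and
its registration are hypothetical, while seg6_get_srh() has the prototype
exported above:

#include <linux/skbuff.h>
#include <linux/printk.h>
#include <linux/netfilter.h>
#include <net/seg6.h>

/*
 * Hypothetical netfilter hook, for illustration only: it uses the newly
 * exported seg6_get_srh() to look at the SRH of SRv6 packets. A real policy
 * module could go on to call seg6_advance_nextseg()/seg6_lookup_nexthop(),
 * as seg6local does; this sketch only inspects and accepts.
 */
static unsigned int nf_srv6_inspect(void *priv, struct sk_buff *skb,
                                    const struct nf_hook_state *state)
{
        struct ipv6_sr_hdr *srh = seg6_get_srh(skb);

        if (!srh)
                return NF_ACCEPT;       /* no SRH, nothing to do */

        pr_debug("SRv6 packet with %u segments left\n", srh->segments_left);
        return NF_ACCEPT;
}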
