On 26/09/19 11:35, Li Qiang wrote:
> So without unrestrict guest the mainline is this: KVM set guest's
> rflag bit X86_EFLAGS_VM, so when the guest enter guest mode, it is in
> vm86 mode. In this mode, the CPU will access the address like in
> real mode(seg*4+offset), this address is linear address. And in fact,
> the vm86 is still in protected, so the linear address will be
> translated to gpa by the identity mapping table. Then goes to EPT
> table?

Yes.

> ... as soon as the guest tries to enter protected mode, it will get into
> a situation which is not real mode but doesn't have the segment
> registers properly loaded with selectors.
>
> Therefore, it will either hack things together (enter_pmode) or
> emulate instructions until the state is accepted even without
> unrestricted guest support.
>
> Could you please explain this situation more detailed? Why this happen?

Protected mode entry looks like this:

        mov     %cr0, %eax
        or      $1, %al
        mov     %eax, %cr0      # [1] now in 16-bit protected mode
        lgdtl   gdt32
        ljmpl   $8, 2f          # [2] now in 32-bit protected mode
    2:
        .code32
        mov     $16, %ax
        mov     %ax, %ds
        mov     %ax, %es
        mov     %ax, %fs
        mov     %ax, %gs
        mov     %ax, %ss        # [3] now everything is okay

Between [1] and [3] the vmentry could fail if not in unrestricted mode.
For example (see checks on guest segment registers in the SDM):

- "CS. Type must be 9, 11, 13, or 15 (accessed code segment)."  CS in
  real-mode is a RW data segment, not a code segment.  This applies
  between [1] and [2].

- "SS. If the guest will not be virtual-8086 and the “unrestricted
  guest” VM-execution control is 0, the RPL (bits 1:0) must equal the
  RPL of the selector field for CS."  This may not be the case if the
  segment register still holds real-mode values (which are not
  selectors, just base >> 4).  This applies between [1] and [3].

- "DS, ES, FS, GS. The DPL cannot be less than the RPL in the selector
  field"  Again, the real-mode DPL is zero but the RPL makes no sense if
  the segment registers hold a real-mode value.

You can find more about these checks in guest_state_valid(); look at the
"else" branch of that function, the "then" branch is for pmode->rmode
transitions.

When any of the checks fail, KVM emulates instructions instead of using
VMX non-root mode (usually it's just a handful of them, as in the case
above).

Paolo
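
For illustration only, the three checks quoted above can be sketched in C
roughly as follows. This is not the code in guest_state_valid(): struct
seg and the helper names are hypothetical, and the sketch ignores the
extra conditions the SDM attaches to these checks (for example whether
the segment is usable, or its type in the DS/ES/FS/GS case).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified segment model (not KVM's struct kvm_segment). */
struct seg {
    uint16_t selector;  /* raw value held in the segment register */
    uint8_t  type;      /* 4-bit type field from the access rights */
    uint8_t  dpl;       /* descriptor privilege level */
};

/* RPL is bits 1:0 of the selector. */
static inline unsigned rpl(uint16_t selector)
{
    return selector & 3;
}

/* "CS. Type must be 9, 11, 13, or 15 (accessed code segment)." */
static inline bool cs_type_ok(const struct seg *cs)
{
    return cs->type == 9 || cs->type == 11 ||
           cs->type == 13 || cs->type == 15;
}

/* SS: the RPL must equal the RPL of the selector field for CS. */
static inline bool ss_rpl_ok(const struct seg *ss, const struct seg *cs)
{
    return rpl(ss->selector) == rpl(cs->selector);
}

/* DS, ES, FS, GS: the DPL cannot be less than the RPL in the selector. */
static inline bool data_dpl_ok(const struct seg *s)
{
    return s->dpl >= rpl(s->selector);
}
```

With these helpers, a real-mode CS (type 3, a RW data segment) fails
cs_type_ok(), and a stale real-mode value such as 0x1002 left in DS has
"RPL" bits of 2 against a DPL of 0, so data_dpl_ok() fails as well.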