On 23.11.2012, at 22:42, Paul Mackerras wrote:
> On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote:
>>
>> On 22.11.2012, at 10:25, Paul Mackerras wrote:
>>
>>> + /* Do they have an SLB shadow buffer registered? */
>>> + slb = vcpu->arch.slb_shadow.pinned_addr;
>>> + if (!slb)
>>> + return;
>>
>> Mind to explain this case? What happens here? Do we leave the guest with an
>> empty SLB? Why would this ever happen? What happens next as soon as we go
>> back into the guest?
>
> Yes, we leave the guest with an empty SLB, the access gets retried and
> this time the guest gets an SLB miss interrupt, which it can hopefully
> handle using an SLB miss handler that runs entirely in real mode.
> This could happen for instance while the guest is in SLOF or yaboot or
> some other code that runs basically in real mode but occasionally
> turns the MMU on for some accesses, and happens to have a bug where it
> creates a duplicate SLB entry.
Is this what pHyp does? Also, is this what we want? Why don't we populate an
#MC into the guest so it knows it did something wrong?
Alex
>
>>> + /* Sanity check */
>>> + n = slb->persistent;
>>> + if (n > SLB_MIN_SIZE)
>>> + n = SLB_MIN_SIZE;
>>
>> Please use a max() macro here.
>
> OK.
>
>>> + rb = 0x800; /* IS field = 0b10, flush congruence class */
>>> + for (i = 0; i < 128; ++i) {
>>
>> Please put up a #define for this. POWER7_TLB_SIZE or so. Is there any way to
>> fetch that number from an SPR? I don't really want to have a p7+ and a p8
>> function in here too ;).
>>
>>> + asm volatile("tlbiel %0" : : "r" (rb));
>>> + rb += 0x1000;
>>
>> I assume this also means (1 << TLBIE_ENTRY_SHIFT)? Would be nice to keep the
>> code readable without guessing :).
>
> The 0x800 and 0x1000 are taken from the architecture - it defines
> fields in the RB value for the flush type and TLB index. The 128 is
> POWER7-specific and isn't in any SPR that I know of. Eventually we'll
> probably have to put it (the number of TLB congruence classes) in the
> cputable, but for now I'll just do a define.
>
>> So I take it that p7 does not implement tlbia?
>
> Correct.
>
>> So we never return 0? How about ECC errors and the likes? Wouldn't those
>> also be #MCs that the host needs to handle?
>
> Yes, true. In fact the OPAL firmware gets to see the machine checks
> before we do (see the opal_register_exception_handler() calls in
> arch/powerpc/platforms/powernv/opal.c), so it should have already
> handled recoverable things like L1 cache parity errors.
>
> I'll make the function return 0 if there's an error that it doesn't
> know about.
>
>>> ld r8, HSTATE_VMHANDLER(r13)
>>> ld r7, HSTATE_HOST_MSR(r13)
>>>
>>> cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
>>> +BEGIN_FTR_SECTION
>>> beq 11f
>>> - cmpwi r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
>>
>> Mind to explain the logic that happens here? Do we get external interrupts
>> on 970? If not, the cmpwi should also be inside the FTR section. Also, if we
>> do a beq here, why do the beqctr below again?
>
> I was making it not call the host kernel machine check handler if it
> was a machine check that pulled us out of the guest. In fact we
> probably do still want to call the handler, but we don't want to jump
> to 0x200, since that has been patched by OPAL, and we don't want to
> make OPAL think we just got another machine check. Instead we would
> need to jump to machine_check_pSeries.
>
> The feature section is because POWER7 sets HSRR0/1 on external
> interrupts, whereas PPC970 sets SRR0/1.
>
> Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html