Re: Found the commit for: 5.3.7 64-bits kernel doesn't boot on G5 Quad [regression]

2019-12-10 Thread Aneesh Kumar K.V
John Paul Adrian Glaubitz writes:

> Hi!
>
> On 12/10/19 9:35 AM, Romain Dolbeau wrote:
>> On Sat, 16 Nov 2019 at 17:34, Romain Dolbeau wrote:
>>> So it seems to me that 0034d395f89d9c092bb15adbabdca5283e258b41
>>> introduced the bug that crashes the PowerMac G5
>> 
>> There have been some commits in that subsystem, so I tried again; as of
>> 6794862a16ef41f753abd75c03a152836e4c8028, the kernel still crashes
>> when trying to boot my PowerMac G5.
>
> If Aneesh is currently unable to look at the problem, I would suggest
> reverting the commit in question, since I don't think it's acceptable
> that users are unable to boot their machines anymore after a kernel
> upgrade.
>

We were not able to reproduce this on the PowerMac system we have
internally, so we have not been able to make progress on it.

At this point, I am not sure what would cause the machine check with
that patch series, because we did not change the VA bits in it.

-aneesh



Re: PPC64: G5 & 4k/64k page size (was: Re: Call for report - G5/PPC970 status)

2019-12-20 Thread Aneesh Kumar K.V
Romain Dolbeau writes:

> On Thu, 12 Dec 2019 at 22:40, Andreas Schwab wrote:
>> I'm using 4K pages, in case that matters
>
> Yes it does matter, as it seems to be the difference between "working"
> and "not working" :-)
> Thank you for the config & pointing out the culprit!
>
> With your config, my machine boots (though it's missing some features
> as the config seems quite tuned).
>
> Moving from 64k pages to 4k pages on 'my' config (essentially,
> Debian's 5.3 with default values for changes since), my machine boots
> as well & everything seems to work fine.
>
> So question to Aneesh - did you try 64k pages on your G5, or only 4k?
> In the second case, could you try with 64k to see if you can reproduce
> the crash?

I don't have direct access to this system. I have asked whether we can
get a run with 64K pages.

Meanwhile, is there a way to find out what caused the machine check, or
to get more details on it? I checked the manual and I don't see any
restrictions w.r.t. the effective address. We now have very high EAs
with the 64K page size.

-aneesh



Re: PPC64: G5 & 4k/64k page size (was: Re: Call for report - G5/PPC970 status)

2020-01-06 Thread Aneesh Kumar K.V
Romain Dolbeau writes:

> On Sat, 21 Dec 2019 at 05:31, Aneesh Kumar K.V wrote:
>> I don't have direct access to this system. I have asked whether we can
>> get a run with 64K pages.
>
> OK, thanks! Do you know which model it is? It seems to be working on
> some systems, but we don't have enough samples to figure out why at
> this time, I think.
>
>> Meanwhile, is there a way to find out what caused the machine check, or
>> to get more details on it? I checked the manual and I don't see any
>> restrictions w.r.t. the effective address. We now have very high EAs
>> with the 64K page size.
>
> Sorry, no idea, completely out of my depth here. I can try some kernel
> (build, runtime) options and/or patch, but someone will have to tell
> me what to try, as I have no ideas.


Can you try this change?

modified   arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -580,7 +580,7 @@ extern void slb_set_size(u16 size);
 #if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
 #define MAX_KERNEL_CTX_CNT (1UL << (MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT))
 #else
-#define MAX_KERNEL_CTX_CNT 1
+#define MAX_KERNEL_CTX_CNT 4
 #endif
 
 #define MAX_VMALLOC_CTX_CNT 1
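
As a rough illustration of what the hunk above changes, the macro can be
rewritten as a plain C function. This is only a sketch, not kernel code:
max_kernel_ctx_cnt(), kernel_ea_span() and the fallback_cnt parameter are
hypothetical names, and the bit widths in the note below are assumed values
chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the #if/#else in the hunk as a function.
 * fallback_cnt stands for the value of the #else branch:
 * 1 before the proposed change, 4 after it.
 */
static inline uint64_t max_kernel_ctx_cnt(unsigned int physmem_bits,
                                          unsigned int ea_bits_per_ctx,
                                          uint64_t fallback_cnt)
{
    assert(physmem_bits < 64 && ea_bits_per_ctx < 64);
    if (physmem_bits > ea_bits_per_ctx)
        return UINT64_C(1) << (physmem_bits - ea_bits_per_ctx);
    return fallback_cnt;
}

/* Effective-address span, in bytes, covered by ctx_cnt contexts. */
static inline uint64_t kernel_ea_span(uint64_t ctx_cnt,
                                      unsigned int ea_bits_per_ctx)
{
    assert(ea_bits_per_ctx < 64);
    return ctx_cnt << ea_bits_per_ctx;
}
```

For example, assuming 49 EA bits per context and 46 physical memory bits,
the #else branch is taken, so the count goes from 1 to 4 with the change and
the covered span grows from 2^49 to 2^51 bytes.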


-aneesh



Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)

2020-05-20 Thread Aneesh Kumar K.V
Christophe Leroy writes:

> On 18/05/2020 at 17:19, Rui Salvaterra wrote:
>> Hi again, Christophe,
>> 
>> On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:
>>>
>>> Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a ? It may
>>> have broken swap.
>> 
>> Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top
>> survives the beating. I'll be happy to test a definitive patch!
>> 
>
> Yeah I discovered recently that the way swap is implemented on powerpc
> expects RW and other important bits not to be among the 3 least
> significant bits (see __pte_to_swp_entry() )

The last 3 bits are there to track _PAGE_PRESENT, right? What is the
RW dependency there? Are you suggesting a read/write migration entry?
A swap entry should not retain the pte RW bits, right?

A swap entry is built from the swap type + offset, and it should not
have a dependency on the pte RW bits. Along with the type and offset,
we should also be able to mark it as a pte entry and set the
not-present bits. With that understanding, what am I missing here?

>
> I guess the easiest for the time being is to revert the commit with a 
> proper explanation of the issue, then one day we'll modify the way 
> powerpc manages swap.
>

-aneesh



Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)

2020-05-20 Thread Aneesh Kumar K.V

On 5/20/20 7:23 PM, Christophe Leroy wrote:
>
> On 20/05/2020 at 15:43, Aneesh Kumar K.V wrote:
>> Christophe Leroy writes:
>>
>>> On 18/05/2020 at 17:19, Rui Salvaterra wrote:
>>>> Hi again, Christophe,
>>>>
>>>> On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:
>>>>>
>>>>> Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a ? It may
>>>>> have broken swap.
>>>>
>>>> Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top
>>>> survives the beating. I'll be happy to test a definitive patch!
>>>>
>>>
>>> Yeah I discovered recently that the way swap is implemented on powerpc
>>> expects RW and other important bits not to be among the 3 least
>>> significant bits (see __pte_to_swp_entry() )
>>
>> The last 3 bits are there to track _PAGE_PRESENT, right? What is the
>> RW dependency there? Are you suggesting a read/write migration entry?
>> A swap entry should not retain the pte RW bits, right?
>>
>> A swap entry is built from the swap type + offset, and it should not
>> have a dependency on the pte RW bits. Along with the type and offset,
>> we should also be able to mark it as a pte entry and set the
>> not-present bits. With that understanding, what am I missing here?
>
> That's probably me who is missing something. I have not dug into how
> swap functions yet, so that was only my first impression.
>
> By the way, the problem is definitely due to the order changes in the
> PTE bits. Whether that's because _PAGE_RW was moved into the last 3 bits
> or because _PAGE_PRESENT was moved out of the last 3 bits, I don't know
> yet.
>
> My (bad) understanding comes from the fact that __pte_to_swp_entry() is
> a right shift by 3 bits, so it loses the last 3 bits, and therefore
> __swp_entry_to_pte(__pte_to_swp_entry(pte)) loses the last 3 bits of a
> PTE.
>
> Is there a description somewhere of how swap works exactly?

Looking at __set_pte_at(), I am wondering whether this was due to
_PAGE_HASHPTE. That would mean we end up wrongly updating some swap
entry details. We call set_pte_at() on swap pte entries.
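
To make the discussion concrete, here is a small user-space sketch of the
encoding being talked about. It is not the kernel's code: the 3-bit shift
comes from the quoted message, while the 5-bit type field, the 32-bit
stand-in types and STRAY_FLAG (a made-up flag placed just above the low
3 bits) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t val; } pte_t;        /* stand-in for a 32-bit PTE */
typedef struct { uint32_t val; } swp_entry_t;  /* stand-in for a swap entry */

#define SWP_SHIFT      3u       /* shift quoted in the thread */
#define SWP_TYPE_BITS  5u       /* assumption: width of the type field */
#define SWP_TYPE_MASK  ((1u << SWP_TYPE_BITS) - 1u)
#define STRAY_FLAG     0x008u   /* hypothetical PTE flag above the low 3 bits */

/* Build a swap entry from type + offset. */
static inline swp_entry_t swp_entry(uint32_t type, uint32_t offset)
{
    assert(type <= SWP_TYPE_MASK);
    return (swp_entry_t){ type | (offset << SWP_TYPE_BITS) };
}

static inline uint32_t swp_type(swp_entry_t e)   { return e.val & SWP_TYPE_MASK; }
static inline uint32_t swp_offset(swp_entry_t e) { return e.val >> SWP_TYPE_BITS; }

/* PTE -> swap entry: the low 3 PTE bits are dropped. */
static inline swp_entry_t pte_to_swp_entry(pte_t pte)
{
    return (swp_entry_t){ pte.val >> SWP_SHIFT };
}

/* Swap entry -> PTE: the low 3 PTE bits come back as zero. */
static inline pte_t swp_entry_to_pte(swp_entry_t e)
{
    return (pte_t){ e.val << SWP_SHIFT };
}
```

Under these assumptions, type and offset survive the round trip, any flag
in the low 3 bits of the PTE is simply dropped, and a flag at bit 3 or above
that gets set on a swap PTE lands inside the type/offset fields and changes
what is decoded. Whether that is what happens with _PAGE_HASHPTE is exactly
the open question here.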


-aneesh