I enabled more debug flags, which increased the execution time considerably, but gave me some new information:
 
Addresses : var = 39b765b0, start = 198325b0, phys = 198325b0 (output in meltdown "reliability.c" code, after line 39)
 
O3CPU: Ticking main, O3CPU.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0x7ffe39b765b0.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0x7ffe39b76000
15059411234500: system.repeat_switch_cpus1.mmu.dtb: Handling a TLB miss for address 0x7ffe39b765b0 at pc 0x401b34.                    <--- First a TLB miss
15059411234500: system.repeat_switch_cpus1: Scheduling next tick!
[...]
O3CPU: Ticking main, O3CPU.
15059411262000: system.repeat_switch_cpus1: Scheduling next tick!
15059411262500: system.repeat_switch_cpus1.mmu.dtb.walker: Got long mode PTE entry 0x00000019832067.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0x7ffe39b765b0.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0x7ffe39b76000
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Entry found with paddr 0x19832000, doing protection checks.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Translated 0x7ffe39b765b0 -> 0x198325b0.                                                 <--- Translated virt to phys
[...]
O3CPU: Ticking main, O3CPU.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0xffff8800198325b0.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0xffff880019832000
15059514670500: system.repeat_switch_cpus1.mmu.dtb: Handling a TLB miss for address 0xffff8800198325b0 at pc 0x402e09.
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e09=>0x402e10).(1=>2) [sn:251369]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e10=>0x402e13).(0=>1) [sn:251370]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e13=>0x402e15).(0=>1) [sn:251371]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(0=>1) [sn:251372]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(1=>2) [sn:251373]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(2=>3) [sn:251374]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e17=>0x402e1e).(0=>1) [sn:251375]
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251369] PC (0x402e09=>0x402e10).(1=>2)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251370] PC (0x402e10=>0x402e13).(0=>1)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251371] PC (0x402e13=>0x402e15).(0=>1)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251372] PC (0x402e15=>0x402e17).(0=>1)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251373] PC (0x402e15=>0x402e17).(1=>2)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251374] PC (0x402e15=>0x402e17).(2=>3)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251375] PC (0x402e17=>0x402e1e).(0=>1)
15059514670500: system.repeat_switch_cpus1: Scheduling next tick!
[...]
O3CPU: Ticking main, O3CPU.
15059514683000: system.repeat_switch_cpus1: Scheduling next tick!
15059514683500: system.repeat_switch_cpus1.mmu.dtb.walker: Got long mode PML4 entry 0x00000000000000.
15059514683500: system.repeat_switch_cpus1.mmu.dtb.walker: Raising page fault.
[...]
O3CPU: Ticking main, O3CPU.
15059514688500: Page-Fault: RIP 0x402e1e: vector 14: #PF(0x4) at 0xffff8800198325b0
15059514688500: system.repeat_switch_cpus1: Scheduling next tick!
 
>>> This is a snippet of the debugging output.
 
 
For more context: https://github.com/IAIK/meltdown/blob/master/reliability.c  (kaslr disabled in gem5 full-system simulation kernel command line)
- First, the address is translated from virtual to physical without a problem (line 30).
- Next, the code tries to read the translated kernel address (line 49). This is where the problem seems to be: the access gets a TLB miss, but then the page table walker reads a PML4 entry of 0x00000000000000 and raises a page fault.
- My expectation (and goal) is that, when reading the kernel address, the page table walk succeeds all the way down to the page table entry.
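
The address arithmetic visible in the log can be reproduced with a short sketch. It only uses values from the debug output above; the direct-map base 0xffff880000000000 is an assumption inferred from the kernel address in the log (0xffff8800198325b0 minus the physical address 0x198325b0), and the helper names are made up for illustration.

```python
# Values taken from the debug output; helper names are hypothetical.
PAGE_MASK = 0xFFF                      # low 12 bits: offset inside a 4 KiB page
FRAME_MASK = 0x000FFFFFFFFFF000        # bits 12..51 of a PTE: physical frame
DIRECT_MAP_BASE = 0xFFFF880000000000   # assumption: inferred from the log

def phys_from_pte(pte, vaddr):
    """Combine the frame bits of a PTE with the page offset of vaddr."""
    return (pte & FRAME_MASK) | (vaddr & PAGE_MASK)

def direct_map_vaddr(phys):
    """Kernel virtual address of phys under the assumed direct mapping."""
    return DIRECT_MAP_BASE + phys

def pml4_index(vaddr):
    """Bits 39..47 of the virtual address select the PML4 entry."""
    return (vaddr >> 39) & 0x1FF

user_vaddr = 0x7FFE39B765B0
pte = 0x19832067                       # "Got long mode PTE entry" in the log

phys = phys_from_pte(pte, user_vaddr)
kernel_vaddr = direct_map_vaddr(phys)

print(hex(phys))                 # 0x198325b0
print(hex(kernel_vaddr))         # 0xffff8800198325b0
print(pml4_index(kernel_vaddr))  # 272
```

Under these assumptions, the entry the walker reports as 0x0 would be PML4 slot 272 of the page table that is active at the time of the access.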
 
Now I have a few questions:
 
1. After the TLB miss at tick 15059514670500, the CPU removes many committed instructions at the PC where the miss occurred. Why are these instructions committed even though a page fault is raised?
2. Does anyone have an idea why the page fault already occurs at the PML4 level, and why this entry is 0x0?
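
For reference, the error code in the "#PF(0x4)" line can be decoded with the standard x86 page-fault error code bits. This is a minimal sketch; the function name and dictionary keys are chosen here for illustration only.

```python
def decode_pf_error_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),            # 0: page not present, 1: protection violation
        "write": bool(code & 0x2),              # 0: read access, 1: write access
        "user": bool(code & 0x4),               # 1: access made in user mode
        "reserved_bit": bool(code & 0x8),       # 1: reserved bit set in a paging entry
        "instruction_fetch": bool(code & 0x10), # 1: fault on an instruction fetch
    }

print(decode_pf_error_code(0x4))
```

For 0x4 only the user bit is set, so the fault is reported as a user-mode read of a non-present page, which matches a walk that stops at an empty PML4 entry rather than a failed user/supervisor check.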
 
 
Thank you again in advance. I would be very grateful if someone could help or clarify this.
 
Kind regards,
Robin
 
 
Sent: Monday, 9 October 2023, 14:29
From: "Yuan Yao via gem5-users" <gem5-users@gem5.org>
To: "The gem5 Users mailing list" <gem5-users@gem5.org>
Cc: "Yuan Yao" <yuan....@it.uu.se>
Subject: [gem5-users] Re: Squashing Instructions after Page Table Fault

Hi Robin,

    The "Page-Fault" message is printed in the constructor of the fault, so setting a gdb breakpoint on that line and moving up the stack frames can help.

    By the way, a page fault can also be generated during page walks (see here). The faulty PTE is not inserted into the TLB. The debug flag PageTableWalker traces all of these.
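
    To illustrate a fault raised during the walk itself, here is a toy model of a four-level walk. It is not gem5's walker: the names are hypothetical, memory is a plain dictionary, and large pages and permission bits are ignored. It only shows that a non-present entry at any level ends the walk and that nothing is inserted into the TLB in that case.

```python
PRESENT = 0x1
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51: address of the next table or frame

class ToyPageFault(Exception):
    def __init__(self, vaddr, level):
        super().__init__(f"page fault at {vaddr:#x} (level {level})")
        self.vaddr = vaddr
        self.level = level

def walk(memory, cr3, vaddr, tlb):
    """Toy 4-level walk; memory maps the physical address of an entry to its value."""
    table = cr3 & ADDR_MASK
    for level, shift in (("PML4", 39), ("PDPT", 30), ("PD", 21), ("PT", 12)):
        index = (vaddr >> shift) & 0x1FF
        entry = memory.get(table + index * 8, 0)  # missing entries read as 0
        if not entry & PRESENT:
            raise ToyPageFault(vaddr, level)      # the TLB is left untouched
        table = entry & ADDR_MASK
    tlb[vaddr & ~0xFFF] = table                   # only successful walks fill the TLB
    return table | (vaddr & 0xFFF)

tlb = {}
try:
    walk({}, 0x1000, 0xFFFF8800198325B0, tlb)
except ToyPageFault as fault:
    print(fault.level, tlb)
```

    With an empty top-level table the walk stops at the PML4 level, similar to the "Got long mode PML4 entry 0x00000000000000" line followed by "Raising page fault" in the log.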

    Hope this helps.

Br,

Y.

On 10/9/23 13:37, reverent.green--- via gem5-users wrote:
Hey Eliot,
 
thank you for your help. I experimented with the checks and was a bit surprised that the page fault does not seem to be raised after an unsuccessful user/supervisor check. After enabling the necessary debug flags and adding more debug statements to the code, I observed that the page fault is not raised after entering the if-statement, but before it. Here is a short snippet of my output:
 
14442496349500: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0            (Line 470)
14442496349500: system.repeat_switch_cpus5.mmu.dtb: Checks done!                                                      (Line 485)
14442496350000: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0
14442496350000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
14442496361000: Page-Fault: RIP 0x402da9: vector 14: #PF(0x4) at 0xffff880019688110
14442496387000: system.repeat_switch_cpus5.mmu.itb: inUser = 1 | entry_user = 0 | badWrite = 1
14442496387000: system.repeat_switch_cpus5.mmu.itb: ***************************** If [Line 471]. *****************************************
14442496424000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1
14442496424000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
14442496464000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1
14442496464000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
 
I expected the page fault to be raised at line 476, but that does not seem to be the case.
 
For further context, my goal is to get this code (https://github.com/IAIK/meltdown/blob/master/reliability.c) working in gem5. Currently, "libkdump_read" (https://github.com/IAIK/meltdown/blob/master/libkdump/libkdump.c#L528) only returns 0 in gem5.
 
My guess is that I need to change much more than I initially thought. With reference to Yuan's answer, I guess that I also need to change code in the function chain that handles a fault. Can anyone confirm this?
 
Best regards,
Robin
 
 
Sent: Wednesday, 4 October 2023, 17:00
From: "Eliot Moss via gem5-users" <gem5-users@gem5.org>
To: "The gem5 Users mailing list" <gem5-users@gem5.org>, yuan....@it.uu.se
Cc: reverent.gr...@web.de, "Eliot Moss" <m...@cs.umass.edu>
Subject: [gem5-users] Re: Squashing Instructions after Page Table Fault
On 10/4/2023 10:03 AM, reverent.green--- via gem5-users wrote:
> Hi Yuan,

> thank you very much for your detailed response. My understanding of the
> fault handling in gem5 is getting better and better. Using debug flags, I
> can trace the control flow during the execution of my code.

> I am currently inspecting tlb.cc in further detail, but I am still searching
> for the exact check for my problem. To further specify my question:

> During the attempt to access kernel memory, the “user/supervisor” (U/S)
> page table attribute is used to check whether the page belongs to
> kernel memory or not. If I try to access that memory, a page fault
> should be raised. I am looking for this specific check. My goal is to
> experiment with gem5 and customize it. Currently, the instruction is not
> executed when a page fault is raised. As a first step, I want to change
> the check so that the instruction is executed even though it accesses
> kernel memory. So I am explicitly searching for this check inside the
> call chain of the page fault handling.

> Thank you very much in advance.

> Best regards

> Robin

Assuming we're talking about the x86 architecture, line 471 in tlb.cc is where
the check in question happens:

https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/tlb.cc#L471

Note that the raw bits of the PTE have been abstracted out in the gem5 TLB
entry data structure, hence properties such as entry->user.
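
The check can be modelled in a few lines to see how it matches the flag values printed in the debug output. This is a simplified sketch based on my reading of the condition in tlb.cc, not the gem5 code itself; the function name and the is_write parameter are hypothetical, and whether an access is a write is not shown in the log, so that argument is an assumption in the examples.

```python
def protection_fault(in_user, entry_user, bad_write, is_write):
    """Simplified model of the TLB protection check (assumed from tlb.cc)."""
    # A user-mode access to a supervisor-only page faults, as does a
    # write to a page that the current mode is not allowed to write.
    return bool((in_user and not entry_user) or (is_write and bad_write))

# inUser = 1 | entry_user = 1 | badWrite = 0  -> "Checks done!"
print(protection_fault(in_user=1, entry_user=1, bad_write=0, is_write=False))  # False

# inUser = 1 | entry_user = 0 | badWrite = 1  -> enters the if-statement
print(protection_fault(in_user=1, entry_user=0, bad_write=1, is_write=False))  # True

# inUser = 0 | entry_user = 0 | badWrite = 1  -> "Checks done!" (assuming a read)
print(protection_fault(in_user=0, entry_user=0, bad_write=1, is_write=False))  # False
```

Note that this check only runs once a TLB entry exists; an address whose page table walk fails never reaches it.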

HTH

Eliot Moss
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

 
