Hi,

On 19.03.2015 at 09:34, Robert Kaiser wrote:
> Hi Alex,
>
> On 18.03.2015 at 23:20, Alexander Kroh wrote:
>> Hi Robert,
>>
>> Yes, the async abort is caused by access to a physical address which is not
>> backed by memory or registers, regardless of virtual address translation.
> OK, so: iff the page table contains a mapping for user space address
> 0x13294, but (due to a bug in the page table initialization) that page
> is mapped to a page frame which is not backed by RAM (or ROM), then an
> attempt to execute user code at that address would cause an async abort.
> Is that correct?
>
> If so, it would be great if someone could point me to the code that sets
> up the page table entries for the first user space thread. (I already
> did an unsuccessful search for this in the board-specific initialization
> code, but I cannot say that I fully understand that code, so I may well
> have overlooked something.)
>
>> You could try masking IRQs to further isolate the interrupt as the trigger.
I tried this. Result: no interrupt occurs before the start of user code, and
the async fault still occurs in the same way as before. I guess this shows
that the interrupt has nothing to do with it.

>> Another option is to mask the async abort. You might find additional
>> symptoms which will help to identify the issue.

Now, that was interesting: after disabling the async abort in user mode (it
is always disabled in kernel mode), the board starts executing the test
suite! It runs a few tests successfully, but then crashes with a *kernel*
data abort when running the test "Run threads in domains()". There goes my
theory about a memory mapping issue, I guess. But how can there be a
kernel-mode data abort when it is disabled?

Any ideas?

Cheers

Robert

>> - Alex
>>
>> ________________________________________
>> From: Robert Kaiser [[email protected]]
>> Sent: Wednesday, 18 March 2015 19:27
>> To: Alexander Kroh
>> Cc: [email protected]
>> Subject: Re: [seL4] Wandboard Port
>>
>> Hi Alex
>>
>> On 16.03.2015 at 02:52, Alexander Kroh wrote:
>>> On Sun, 2015-03-15 at 15:33 +0100, Robert Kaiser wrote:
>>>> On 15.03.2015 at 11:23, Alexander Kroh wrote:
>>>>> Hi Robert,
>>>>>
>>>>> The FSR value of 0x1c06 represents an asynchronous abort. In this case,
>>>>> the address reported cannot be trusted!
>>>> [...]
>>>>> The abort occurs when a physical address is accessed that has no valid
>>>>> backing RAM or device register.
>>>> So, could it also happen when accessing a virtual address that is mapped
>>>> to an invalid physical address (that might explain what I'm seeing)?
>>> The virtual to physical address translation has been completed
>>> successfully, else you would get a synchronous abort. The key here is
>>> that there was a problem with the underlying physical address.
>> That's what I meant to suggest: if the virtual address is correctly
>> translated to a physical address by the MMU, but that physical address
>> is not backed by memory or registers, could that also generate this kind
>> of exception?
>>
>>>>> We have had lots of fun with this feature on the SabreLite. Common
>>>>> causes are:
>>>>> * Accessing device registers that do not exist (some devices have voids
>>>>>   in the middle of their address map).
>>>>> * If you (for some reason) map a device with the cacheable attribute, all
>>>>>   addresses which would be used to fill the cache line must be valid
>>>>>   (again, watch out for voids).
>>>>> * Some UART registers are unavailable when the appropriate enable bits
>>>>>   are not set.
>>>>>
>>>>> My advice to you is to check that you are using the correct physical
>>>>> address for your device mappings (including the kernel IRQ controller and
>>>>> timer).
>>>>>
>>>>> Also, the first printf at userspace may trigger the initialisation of the
>>>>> default UART (which will be incorrect in your case).
>>>>> https://github.com/seL4/libplatsupport/blob/master/plat_include/imx6/platsupport/plat/serial.h#L40
>>>> Thanks for this hint! That would have been the next thing for me to
>>>> stumble over. However, quickly fixing it had no effect on my current
>>>> problem.
>>>>
>>>>> There may also be slight differences in the availability of device
>>>>> registers between the 2 SoCs.
>>>> Is that really a possibility, given that U-Boot reports the same chip
>>>> revision on both boards?
>>> It is unlikely, but it is still a possibility. Is it only the ARM chip
>>> revisions that match or also the i.MX6 chip revisions?
>> Hmm, I'm sure I saw exactly the same outputs from both boards at some
>> point. However, in the meantime I have re-flashed U-Boot on both of
>> them. The situation now is that on the Sabre, U-Boot reports
>>
>> "CPU: Freescale i.MX6 family TO1.2 at 792 MHz"
>>
>> while on the Wand it says:
>>
>> "CPU: Freescale i.MX6Q rev1.2 at 792 MHz"
>>
>> No idea whether that "1.2" refers to the core or the SoC.
>>
>>>> [...]
>>>> Wish I had a JTAG-debugger....
>>>>
>>>> What I am still uncertain about is whether a fault upon entering user
>>>> code is to be expected, i.e. do those pages get mapped in by a page
>>>> fault handler or are they pre-mapped before the code is invoked?
>>> The fault is unexpected. The pages are pre-mapped by the kernel, but
>>> again, this is not a virtual memory mapping issue.
>>> However, one thing that is typical is the occurrence of an IRQ exception
>>> as soon as the mode switch to user space occurs.
>> Indeed, that happens! I'm consistently seeing a timer interrupt at this
>> point. Probably it has been pending for a while and fires as soon as the
>> interrupt mask is dropped. Apart from its housekeeping work, this timer
>> ISR does a few hardware accesses to the "private timer" and the
>> interrupt controller (both components, as I understand, are part of the
>> A9 core).
>>
>> I tried putting isb/dmb and dsb instructions right after these hardware
>> accesses, hoping this might change the behaviour in some way, thus
>> indicating which of them triggered the async fault. Alas, no effect at
>> all :-(.
>>
>>> One thing to try is to insert an "isb" instruction just before switching
>>> to user space. This will ensure that all memory accesses are completed
>>> before continuing and it will force the asynchronous abort to occur at
>>> this instruction rather than some future instruction, when the
>>> load/store buffer finally drains.
>>> You should also add an isb here in case you are returning from an IRQ:
>>> https://github.com/seL4/seL4/blob/master/src/arch/arm/traps.S#L49
>> I also tried this. And I tried sequences of dmb, dsb and isb
>> instructions. All of this had no visible effect.
>> The behaviour stays the same all the time: upon leaving privileged mode,
>> the interrupt fires, gets serviced, then the async fault happens. I know
>> the fault address cannot be trusted, but it never changed during these
>> experiments. No matter where in the ISR or elsewhere I placed those isb
>> instructions, it always pointed to the entry point of the user code.
>>
>> Any suggestions how to further systematically pinpoint this problem?
>>
>> Thanks in advance for any help.
>>
>> Robert
>>
>>> - Alex
>>>
>>>> Again, thanks for any help
>>>>
>>>> Cheers
>>>>
>>>> Robert
>>>>
>>>>> - Alex
>>>>>
>>>>> ________________________________________
>>>>> From: Devel [[email protected]] on behalf of Robert Kaiser
>>>>> [[email protected]]
>>>>> Sent: Sunday, 15 March 2015 19:03
>>>>> To: [email protected]
>>>>> Subject: [seL4] Wandboard Port
>>>>>
>>>>> Hello,
>>>>>
>>>>> in an attempt to familiarize myself with the seL4 code, I am trying to
>>>>> "port" it to the Wandboard (see www.wandboard.org). This should be an
>>>>> easy task for a beginner (thought I) since the board is very similar to
>>>>> the SabreLite, and seL4 is already running well on that board. I have
>>>>> access to a SabreLite and a Wandboard Quad; both (according to U-Boot)
>>>>> have the same revision of the iMX6 SoC installed.
>>>>>
>>>>> Differences between the Sabre and the Wand I have noticed so far are:
>>>>>
>>>>> - 2GB of RAM (from 0x10000000 to 0x90000000) on the Wand (SabreLite has
>>>>>   1GB)
>>>>> - Wand uses UART1 for debug output, SabreLite: UART2
>>>>>
>>>>> I compiled an sel4test project where I adapted the UART port in
>>>>> kernel/include/plat/imx6/plat/machine/devices.h and
>>>>> elfloader/src/arch-arm/plat-imx6/platform.h and the RAM size in kernel
>>>>> src/plat/imx6/machine/hardware.c. When I boot this system, I get:
>>>>>
>>>>> Jumping to kernel-image entry point...
>>>>> Bootstrapping kernel
>>>>> Caught cap fault in send phase at address 0x0
>>>>> while trying to handle:
>>>>> vm fault on data at address 0x9f11c2e0 with status 0x1c06
>>>>> in thread 0xffdfad00 at address 0x13294
>>>>>
>>>>> (Needless to say, "all is well in the universe" on the SabreLite...)
>>>>> What is not shown here are a ton of other debug messages which I have
>>>>> added to convince myself that kernel initialization completes as
>>>>> expected. The crash seems to happen upon entry into user code. The
>>>>> address 0x13294 is the virtual address of the entry point:
>>>>>
>>>>> $ nm build/arm/imx6/sel4test-driver/sel4test-driver.bin | grep 13294
>>>>> 00013294 T _sel4_start
>>>>>
>>>>> I suspect that this fault happens on opcode fetch, because the user code
>>>>> is not properly mapped when invoked. Does "status 0x1c06" confirm this?
>>>>>
>>>>> If so, *should* the code be mapped at this point or are these mappings
>>>>> expected to be installed "on demand", i.e. through page fault handling?
>>>>>
>>>>> Thanks for any help...
>>>>>
>>>>> Robert
>>>>>
>>>>> --
>>>>> Robert Kaiser
>>>>> Computer Engineering
>>>>> RheinMain University of Applied Sciences
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> [email protected]
>>>>> https://sel4.systems/lists/listinfo/devel
>>>>>
>>>>> ________________________________
>>>>>
>>>>> The information in this e-mail may be confidential and subject to legal
>>>>> professional privilege and/or copyright. National ICT Australia Limited
>>>>> accepts no liability for any damage caused by this email or its
>>>>> attachments.
>> --
>> Prof. Dr. Robert Kaiser
>>
>> Technische Informatik
>> Hochschule RheinMain
>> Wiesbaden Rüsselsheim
>>
>> Computer Engineering
>> RheinMain University of Applied Sciences
>>
>> [email protected]
>> http://www.cs.hs-rm.de/~kaiser
>>
>> tel:(+49)611-9495-1292
>> fax:(+49)611-9495-1210
>>
>> Postanschrift/Postal Address:
>> Robert Kaiser, Hochschule RheinMain, FB DCSM/Informatik
>> Unter den Eichen 5, 65195 Wiesbaden, Germany
>>

--
Robert Kaiser
Computer Engineering
RheinMain University of Applied Sciences
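
For reference, the status value 0x1c06 from the boot log in this thread can be
decoded by hand. The following is only a rough sketch in Python, assuming the
value is an ARMv7-A DFSR in the short-descriptor format (bit 9 clear). The
names decode_dfsr and FAULT_STATUS are made up for illustration and are not
part of seL4, and the table lists only a few of the defined fault status
codes.

```python
# Sketch: decode an ARMv7-A DFSR value, assuming the short-descriptor format.
# FAULT_STATUS and decode_dfsr are hypothetical names, not seL4 code.

# Partial table of FS[4:0] encodings; codes not listed fall through to "other".
FAULT_STATUS = {
    0b00001: "alignment fault",
    0b00101: "translation fault (section)",
    0b00111: "translation fault (page)",
    0b01000: "synchronous external abort",
    0b01101: "permission fault (section)",
    0b01111: "permission fault (page)",
    0b10110: "asynchronous external abort",
    0b11000: "asynchronous parity error on memory access",
}


def decode_dfsr(dfsr):
    """Split a DFSR value into its fields (short-descriptor layout)."""
    # FS[4] lives in bit 10, FS[3:0] in bits 3..0.
    fs = (((dfsr >> 10) & 1) << 4) | (dfsr & 0xF)
    return {
        "fs": fs,
        "description": FAULT_STATUS.get(fs, "other (not listed in this sketch)"),
        "domain": (dfsr >> 4) & 0xF,       # bits 7..4
        # WnR (bit 11); may not be meaningful for asynchronous aborts.
        "write": bool((dfsr >> 11) & 1),
        "ext": (dfsr >> 12) & 1,           # ExT (bit 12), implementation defined
    }


if __name__ == "__main__":
    # Status value taken from the boot log quoted in this thread.
    print(decode_dfsr(0x1C06))
```

Under these assumptions the fault status field comes out as 0b10110, which
matches Alex's reading of the value as an asynchronous abort.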
