Re: [seL4] Wandboard Port

Robert Kaiser Fri, 20 Mar 2015 12:23:51 -0700

Hi,

On 19.03.2015 at 09:34, Robert Kaiser wrote:
> Hi Alex,
>
> On 18.03.2015 at 23:20, Alexander Kroh wrote:
>> Hi Robert,
>>
>> Yes, the async abort is caused by access to a physical address which is not 
>> backed by memory or registers, regardless of virtual address translation.
> OK, so: if the page table contains a mapping for user space address
> 0x13294, but (due to a bug in the page table initialization) that page
> is mapped to a page frame which is not backed by RAM (or ROM), then
> an attempt to execute user code at that address would cause an async abort.
> Is that correct?
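One way to test this hypothesis is to walk the translation table by hand and check which physical frame VA 0x13294 actually lands on. The following is a minimal sketch of a software walk of an ARMv7 short-descriptor table, assuming no LPAE, TTBCR.N == 0 (so TTBR0 covers the whole address space) and ignoring supersection extended address bits. The `phys_image` struct, `read_phys` and `short_desc_translate` are hypothetical names, not seL4 code; on real hardware the table contents would have to come from a debugger or a kernel memory dump.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of physical memory: words[i] is the word at base + 4*i. */
struct phys_image {
    const uint32_t *words;
    size_t nwords;
    uint32_t base;
};

/* Read one word from the snapshot; fails if pa lies outside it. */
static bool read_phys(const struct phys_image *m, uint32_t pa, uint32_t *out)
{
    if (pa < m->base || ((pa - m->base) >> 2) >= m->nwords)
        return false;
    *out = m->words[(pa - m->base) >> 2];
    return true;
}

/* Software walk of an ARMv7 short-descriptor table (assumes no LPAE,
 * TTBCR.N == 0, supersection extended address bits ignored).
 * Returns true and sets *pa if va is mapped. */
static bool short_desc_translate(const struct phys_image *m, uint32_t ttbr0,
                                 uint32_t va, uint32_t *pa)
{
    uint32_t l1desc, l2desc;
    uint32_t l1base = ttbr0 & 0xFFFFC000u;          /* 16 KiB aligned L1 table */

    /* VA[31:20] selects one of 4096 first-level descriptors. */
    if (!read_phys(m, l1base | ((va >> 20) << 2), &l1desc))
        return false;

    /* Bit 1 set: section/supersection (bit 0 is PXN where implemented). */
    if (l1desc & 0x2u) {
        if (l1desc & (1u << 18))                    /* supersection: 16 MiB */
            *pa = (l1desc & 0xFF000000u) | (va & 0x00FFFFFFu);
        else                                        /* section: 1 MiB */
            *pa = (l1desc & 0xFFF00000u) | (va & 0x000FFFFFu);
        return true;
    }
    if ((l1desc & 0x3u) != 0x1u)                    /* fault entry */
        return false;

    /* VA[19:12] selects one of 256 second-level descriptors. */
    if (!read_phys(m, (l1desc & 0xFFFFFC00u) | (((va >> 12) & 0xFFu) << 2), &l2desc))
        return false;

    if (l2desc & 0x2u) {                            /* small page: 4 KiB */
        *pa = (l2desc & 0xFFFFF000u) | (va & 0xFFFu);
        return true;
    }
    if ((l2desc & 0x3u) == 0x1u) {                  /* large page: 64 KiB */
        *pa = (l2desc & 0xFFFF0000u) | (va & 0xFFFFu);
        return true;
    }
    return false;                                   /* fault entry */
}
```

The resulting physical address can then be compared against the board's RAM range to see whether the frame is backed by memory at all.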
>
> If so, it would be great if someone could point me to the code that sets
> up the page table entries for the first user space thread. (I already
> searched the board-specific initialization code for this without success,
> but I cannot say that I fully understand that code, so I may well have
> overlooked something.)
>
>> You could try masking IRQs to further isolate the interrupt as the trigger.


I tried this. Result: no interrupt occurs before the start of user code,
but the async fault still occurs exactly as before. I guess this shows
that the interrupt has nothing to do with it.

>> Another option is to mask the async abort. You might find additional 
>> symptoms which will help to identify the issue.

Now, that was interesting: After disabling the async abort in user mode
(it is always disabled in kernel mode), the board starts executing the
test suite! It runs a few tests successfully, but then crashes with a
*kernel* data abort when running the test "Run threads in domains()". There
goes my theory about a memory mapping issue, I guess. But how can there be
a kernel-mode data abort when the async abort is disabled in kernel mode?
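Masking asynchronous aborts for user threads, as in the experiment above, amounts to setting the A bit (bit 8) of the SPSR that is loaded for the exception return to user mode. The sketch below only illustrates the bit layout; the macro names and `user_spsr` are hypothetical and not taken from seL4. Note that, per the ARMv7-A architecture, the A bit masks asynchronous aborts only: synchronous data aborts (for example translation or permission faults) are taken regardless of it, so a kernel-mode data abort can still occur while A is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7 CPSR/SPSR fields. */
#define PSR_MODE_MASK 0x1Fu
#define PSR_MODE_USR  0x10u
#define PSR_F_BIT     (1u << 6)   /* FIQs masked */
#define PSR_I_BIT     (1u << 7)   /* IRQs masked */
#define PSR_A_BIT     (1u << 8)   /* asynchronous aborts masked */

/* Hypothetical helper: SPSR to load before returning to user mode,
 * optionally masking asynchronous aborts for debugging. */
static uint32_t user_spsr(bool mask_async_aborts)
{
    uint32_t spsr = PSR_MODE_USR;   /* user mode, IRQ/FIQ unmasked, ARM state */
    if (mask_async_aborts)
        spsr |= PSR_A_BIT;
    return spsr;
}

/* Only asynchronous aborts are affected by this bit. */
static bool psr_async_aborts_masked(uint32_t psr)
{
    return (psr & PSR_A_BIT) != 0;
}
```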

Any ideas?

Cheers

Robert


>>   - Alex
>>
>> ________________________________________
>> From: Robert Kaiser [[email protected]]
>> Sent: Wednesday, 18 March 2015 19:27
>> To: Alexander Kroh
>> Cc: [email protected]
>> Subject: Re: [seL4] Wandboard Port
>>
>> Hi Alex
>>
>> On 16.03.2015 at 02:52, Alexander Kroh wrote:
>>> On Sun, 2015-03-15 at 15:33 +0100, Robert Kaiser wrote:
>>>> On 15.03.2015 at 11:23, Alexander Kroh wrote:
>>>>> Hi Robert,
>>>>>
>>>>> The FSR value of 0x1c06 represents an asynchronous abort. In this case, 
>>>>> the address reported cannot be trusted!
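Why 0x1c06 means an asynchronous abort can be seen by decoding it as an ARMv7 short-descriptor DFSR, where the 5-bit fault status is split into FS[4] (bit 10) and FS[3:0] (bits 3:0). The sketch below assumes the "status" in the seL4 fault message is the raw DFSR value; the function names are hypothetical and only the most common encodings are listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7 short-descriptor DFSR: FS[4] is bit 10, FS[3:0] are bits 3:0. */
static unsigned dfsr_fault_status(uint32_t dfsr)
{
    return ((dfsr >> 6) & 0x10u) | (dfsr & 0xFu);
}

/* Human-readable name for the most common encodings. */
static const char *dfsr_describe(uint32_t dfsr)
{
    switch (dfsr_fault_status(dfsr)) {
    case 0x01: return "alignment fault";
    case 0x03: return "access flag fault (section)";
    case 0x05: return "translation fault (section)";
    case 0x06: return "access flag fault (page)";
    case 0x07: return "translation fault (page)";
    case 0x08: return "synchronous external abort";
    case 0x09: return "domain fault (section)";
    case 0x0B: return "domain fault (page)";
    case 0x0D: return "permission fault (section)";
    case 0x0F: return "permission fault (page)";
    case 0x16: return "asynchronous external abort";
    case 0x18: return "asynchronous parity error";
    default:   return "other (see the ARMv7-A ARM)";
    }
}

/* For asynchronous aborts the reported fault address is not meaningful. */
static bool dfsr_is_async(uint32_t dfsr)
{
    unsigned fs = dfsr_fault_status(dfsr);
    return fs == 0x16 || fs == 0x18;
}
```

For 0x1c06, bit 10 is set, so FS = 0x16 (asynchronous external abort). Looking only at the low nibble would give 0x6, which would wrongly suggest an access flag fault on a page.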
>>>> [...]
>>>>> The abort occurs when a physical address is accessed that has no valid 
>>>>> backing RAM or device register.
>>>> So, could it also happen when accessing a virtual address that is mapped
>>>> to an invalid physical address (that might explain what I'm seeing)?
>>> The virtual to physical address translation has been completed
>>> successfully, else you would get a synchronous abort. The key here is
>>> that there was a problem with the underlying physical address.
>> That's what I meant to suggest: If the virtual address is correctly
>> translated to a physical address by the MMU, but that physical address
>> is not backed by memory or registers, could that also generate this kind
>> of exception?
>>
>>>>>  We have had lots of fun with this feature on the SabreLite. Common 
>>>>> causes are:
>>>>> * Accessing device registers that do not exist (some devices have voids in 
>>>>> the middle of their address map).
>>>>> * If you (for some reason) map a device with the cacheable attribute, all 
>>>>> addresses which would be used to fill the cache line must be valid 
>>>>> (again, watch out for voids).
>>>>> * Some UART registers are unavailable when the appropriate enable bits 
>>>>> are not set.
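The second point above (a cacheable device mapping makes the core fetch a whole cache line, including any void) can be checked mechanically against a register map. A minimal sketch, assuming the 32-byte L1 data cache line of the Cortex-A9 and a hypothetical register map given as a list of implemented windows; `reg_window`, `covered` and `access_is_backed` are illustrative names only, and the check requires the range to fit inside a single window.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* Cortex-A9 L1 data cache line length in bytes */

/* One contiguous range of implemented registers, end exclusive. */
struct reg_window { uint32_t start, end; };

/* True if [lo, hi) lies entirely inside one implemented window. */
static bool covered(const struct reg_window *w, size_t n, uint32_t lo, uint32_t hi)
{
    for (size_t i = 0; i < n; i++)
        if (lo >= w[i].start && hi <= w[i].end)
            return true;
    return false;
}

/* An uncached access only touches the bytes requested; a cacheable one
 * makes the core fetch the whole aligned line that contains them. */
static bool access_is_backed(const struct reg_window *w, size_t n,
                             uint32_t addr, uint32_t len, bool cacheable)
{
    uint32_t lo = addr, hi = addr + len;
    if (cacheable) {
        lo &= ~(CACHE_LINE - 1u);
        hi = (hi + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u);
    }
    return covered(w, n, lo, hi);
}
```

With a hypothetical device that implements registers at 0x1000-0x100F and 0x1040-0x107F, an uncached word read at 0x1004 is fine, but the same read through a cacheable mapping pulls in 0x1000-0x101F and runs into the void.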
>>>>>
>>>>> My advice to you is to check that you are using the correct physical 
>>>>> address for your device mappings (Including the kernel IRQ controller and 
>>>>> timer).
>>>>>
>>>>> Also, the first printf at userspace may trigger the initialisation of the 
>>>>> default UART (which will be incorrect in your case).
>>>>> https://github.com/seL4/libplatsupport/blob/master/plat_include/imx6/platsupport/plat/serial.h#L40
>>>> Thanks for this hint! That would have been the next thing for me to
>>>> stumble over. However, quickly fixing it had no effect on my current
>>>> problem.
>>>>
>>>>> There may also be slight differences in the availability of device 
>>>>> registers between the 2 SoCs.
>>>> Is that really a possibility, given that U-boot reports the same chip
>>>> revision on both boards?
>>> It is unlikely, but it is still a possibility. Is it only the ARM chip
>>> revisions that match or also the i.MX6 chip revisions?
>> Hmm, I'm sure I saw exactly the same outputs from both boards at some
>> point; however, in the meantime I have re-flashed U-Boot on both of
>> them. The situation now is that on the Sabre, U-Boot reports
>>
>> "CPU: Freescale i.MX6 family TO1.2 at 792 MHz"
>>
>> while on the wand it says:
>>
>> "CPU:   Freescale i.MX6Q rev1.2 at 792 MHz"
>>
>> No idea whether that "1.2" refers to the core or the SoC.
>>
>>
>>
>>>> [...]
>>>> Wish I had a JTAG debugger...
>>>>
>>>> What I am still uncertain about is whether a fault upon entering user
>>>> code is to be expected, i.e. do those pages get mapped in by a page
>>>> fault handler or are they pre-mapped before the code is invoked?
>>> The fault is unexpected. The pages are pre-mapped by the kernel, but
>>> again, this is not a virtual memory mapping issue.
>>> However, one thing that is typical is the occurrence of an IRQ exception
>>> as soon as the mode switch to user space occurs.
>> Indeed, that happens! I'm consistently seeing a timer interrupt at this
>> point. Probably it has been pending for a while and fires as soon as the
>> interrupt mask is dropped. Apart from its housekeeping work, this timer
>> ISR does a few hardware accesses to the "private timer" and the
>> interrupt controller (both components, as I understand, are part of the
>> A9 core).
>>
>> I tried putting isb/dmb and dsb instructions right after these hardware
>> accesses, hoping this might change the behaviour in some way, thus
>> indicating which of them triggered the async fault. Alas, no effect at
>> all :-(.
>>
>>> One thing to try is to insert an "isb" instruction just before switching
>>> to user space. This will ensure that all memory accesses are completed
>>> before continuing and it will force the asynchronous abort to occur at
>>> this instruction rather than some future instruction, when the
>>> load/store buffer finally drains.
>>> You should also add an isb here in case you are returning from an IRQ:
>>> https://github.com/seL4/seL4/blob/master/src/arch/arm/traps.S#L49
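A minimal sketch of such a barrier, to be placed just before the exception return. It assumes GCC/Clang inline assembly on ARMv7; the function name is hypothetical and this is not the code in traps.S. On ARMv7 it is DSB that waits for prior explicit memory accesses to complete, while ISB flushes the pipeline, so the sketch pairs DSB with the suggested isb.

```c
/* Drain outstanding memory accesses, then resynchronise the pipeline.
 * DSB waits for prior explicit memory accesses to complete; ISB makes
 * the core refetch the instructions that follow. */
static inline void drain_memory_accesses(void)
{
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
    __asm__ volatile("dsb\n\tisb" ::: "memory");
#else
    /* Host build (e.g. unit tests): a full fence only, NOT equivalent
     * to the ARM barrier sequence above. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```

Keep in mind that, per the ARMv7-A architecture, a pending asynchronous abort is not taken while CPSR.A is set. If the kernel runs with async aborts masked (as described at the top of this thread), an abort that becomes pending at the barrier is still only taken once A is cleared, for example after the return to user mode.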
>> I also tried this, as well as sequences of dmb, dsb and isb
>> instructions. None of this had any visible effect. The behaviour stays
>> the same every time: upon leaving privileged mode, the interrupt fires,
>> gets serviced, then the async fault happens. I know the fault address
>> cannot be trusted, but it never changed during these experiments. No
>> matter where in the ISR or elsewhere I placed those isb instructions, it
>> always pointed to the entry point of the user code.
>>
>> Any suggestions how to further systematically pinpoint this problem?
>>
>> Thanks in advance for any help.
>>
>> Robert
>>
>>>  - Alex
>>>
>>>
>>>> Again, thanks for any help
>>>>
>>>> Cheers
>>>>
>>>> Robert
>>>>
>>>>
>>>>
>>>>>  - Alex
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Devel [[email protected]] on behalf of Robert Kaiser 
>>>>> [[email protected]]
>>>>> Sent: Sunday, 15 March 2015 19:03
>>>>> To: [email protected]
>>>>> Subject: [seL4] Wandboard Port
>>>>>
>>>>> Hello,
>>>>>
>>>>> in an attempt to familiarize myself with the seL4 code, I am trying to
>>>>> "port" it to the Wandboard (see www.wandboard.org). This should be an
>>>>> easy task for a beginner (thought I) since the board is very similar to
>>>>> the SabreLite, and seL4 is already running well on that board. I have
>>>>> access to a SabreLite and a Wandboard Quad, both (according to U-boot)
>>>>> have the same revision of the iMX6 SoC installed.
>>>>>
>>>>> Differences between the Sabre and the Wand I have noticed so far are:
>>>>>
>>>>> - 2GB of RAM (from 0x10000000 to 0x90000000) on the Wand (SabreLite has 
>>>>> 1GB)
>>>>> - Wand uses UART1 for debug output, SabreLite uses UART2
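The RAM window above is what the kernel's RAM description has to match after the change in hardware.c. A small sketch that encodes both boards' ranges and checks addresses against them; the Wandboard range is taken from this thread, while the SabreLite start address of 0x10000000 is an assumption (only its 1GB size is stated here). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A physical RAM window, end exclusive. */
struct ram_region { uint32_t start, end; };

/* Wandboard Quad: 2 GiB from 0x10000000 to 0x90000000 (from this thread). */
static const struct ram_region wandboard_ram = { 0x10000000u, 0x90000000u };
/* SabreLite: 1 GiB, ASSUMED to start at the same 0x10000000 base. */
static const struct ram_region sabrelite_ram = { 0x10000000u, 0x50000000u };

static uint32_t ram_size(const struct ram_region *r)
{
    return r->end - r->start;
}

static bool ram_contains(const struct ram_region *r, uint32_t pa)
{
    return pa >= r->start && pa < r->end;
}
```

As a side observation, the data address 0x9f11c2e0 in the fault message below lies above the end of the Wandboard's RAM; since that message reports an asynchronous abort, the address may not be meaningful, though.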
>>>>>
>>>>> I compiled an sel4test project where I adapted the UART port in
>>>>> kernel/include/plat/imx6/plat/machine/devices.h and
>>>>> elfloader/src/arch-arm/plat-imx6/platform.h and the RAM size in kernel
>>>>> src/plat/imx6/machine/hardware.c. When I boot this system, I get:
>>>>>
>>>>> Jumping to kernel-image entry point...
>>>>> Bootstrapping kernel
>>>>> Caught cap fault in send phase at address 0x0
>>>>> while trying to handle:
>>>>> vm fault on data at address 0x9f11c2e0 with status 0x1c06
>>>>> in thread 0xffdfad00 at address 0x13294
>>>>>
>>>>> (Needless to say, "all is well in the universe" on the SabreLite...)
>>>>> What is not shown here are a ton of other debug messages which I have
>>>>> added to convince myself that kernel initialization completes as
>>>>> expected. The crash seems to happen upon entry into user code. The
>>>>> address 0x13294 is the virtual address of the entry point:
>>>>>
>>>>> $ nm build/arm/imx6/sel4test-driver/sel4test-driver.bin | grep 13294
>>>>> 00013294 T _sel4_start
>>>>>
>>>>> I suspect that this fault happens on opcode fetch, because the user code
>>>>> is not properly mapped when invoked. Does "status 0x1c06" confirm this?
>>>>>
>>>>> If so, *should* the code be mapped at this point or are these mappings
>>>>> expected to be installed "on demand", i.e. through page fault handling?
>>>>>
>>>>> Thanks for any help...
>>>>>
>>>>> Robert
>>>>>
>>>>>
>>>>> --
>>>>> Robert Kaiser
>>>>> Computer Engineering
>>>>> RheinMain University of Applied Sciences
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> [email protected]
>>>>> https://sel4.systems/lists/listinfo/devel
>>>>>
>>>>> ________________________________
>>>>>
>>>>> The information in this e-mail may be confidential and subject to legal 
>>>>> professional privilege and/or copyright. National ICT Australia Limited 
>>>>> accepts no liability for any damage caused by this email or its 
>>>>> attachments.
>> --
>> Prof. Dr. Robert Kaiser
>>
>> Technische Informatik
>> Hochschule RheinMain
>> Wiesbaden Rüsselsheim
>>
>> Computer Engineering
>> RheinMain University of Applied Sciences
>>
>> [email protected]
>> http://www.cs.hs-rm.de/~kaiser
>>
>> tel:(+49)611-9495-1292
>> fax:(+49)611-9495-1210
>>
>> Postanschrift/Postal Address:
>> Robert Kaiser, Hochschule RheinMain, FB DCSM/Informatik
>> Unter den Eichen 5, 65195 Wiesbaden, Germany
>>
>>

-- 
Robert Kaiser

Computer Engineering
RheinMain University of Applied Sciences



