On 7/10/2018 11:11 AM, Marek Vasut wrote: > On 07/10/2018 02:10 PM, Jason Rush wrote: >> On 7/9/2018 3:08 AM, Marek Vasut wrote: >>> On 07/07/2018 12:56 AM, Jason Rush wrote: >>>> On 7/5/2018 6:10 PM, Marek Vasut wrote: >>>>> On 07/06/2018 01:11 AM, Jason Rush wrote: >>>>>> On 7/4/2018 2:23 AM, Marek Vasut wrote: >>>>>>> On 07/04/2018 01:45 AM, Jason Rush wrote: >>>>>>>> On 7/3/2018 9:08 AM, Marek Vasut wrote: >>>>>>>>> On 07/03/2018 03:58 PM, Jason Rush wrote: >>>>>>>>>> On 6/29/2018 10:17 AM, Marek Vasut wrote: >>>>>>>>>>> On 06/29/2018 05:06 PM, Jason Rush wrote: >>>>>>>>>>>> On 6/29/2018 9:52 AM, Marek Vasut wrote: >>>>>>>>>>>>> On 06/29/2018 04:44 PM, Jason Rush wrote: >>>>>>>>>>>>>> On 6/29/2018 9:34 AM, Marek Vasut wrote: >>>>>>>>>>>>>>> On 06/29/2018 04:31 PM, Jason Rush wrote: >>>>>>>>>>>>>>>> Dinh, >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> A while ago, you posted the following patchset for SoCFPGA to >>>>>>>>>>>>>>>> add the PL330 >>>>>>>>>>>>>>>> DMA driver, and updated the SoCFPGA SDRAM init to write zeros >>>>>>>>>>>>>>>> to SDRAM to >>>>>>>>>>>>>>>> initialize the ECC bits if ECC was enabled: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> https://lists.denx.de/pipermail/u-boot/2016-October/269643.html >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I know it's been a long time, so I'll summarize some of the >>>>>>>>>>>>>>>> conversation... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> At the time, you had a problem with the patchset causing the >>>>>>>>>>>>>>>> SPL to fail to >>>>>>>>>>>>>>>> find the MMC. You had tracked it down to an issue with the >>>>>>>>>>>>>>>> following commit >>>>>>>>>>>>>>>> "a78cd8613204 ARM: Rework and correct barrier definitions". >>>>>>>>>>>>>>>> You and Marek >>>>>>>>>>>>>>>> discussed it a bit, but I don't think there was a real >>>>>>>>>>>>>>>> conclusion. 
You >>>>>>>>>>>>>>>> submitted a second version of the patchset asking for advice >>>>>>>>>>>>>>>> on debugging >>>>>>>>>>>>>>>> the issue: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> https://lists.denx.de/pipermail/u-boot/2016-December/275822.html >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> No real conversation came from the second patchset, and that >>>>>>>>>>>>>>>> was the end of >>>>>>>>>>>>>>>> the patch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I was hoping we could revisit adding your patchset again. I am >>>>>>>>>>>>>>>> working on a >>>>>>>>>>>>>>>> custom SoCFPGA board with a Cyclone V and ECC SDRAM. I rebased >>>>>>>>>>>>>>>> your patchset >>>>>>>>>>>>>>>> against v2018.05 and it is working on my custom board >>>>>>>>>>>>>>>> (although I don't have >>>>>>>>>>>>>>>> an MMC). I also tested it on a SoCKit booting from an MMC (I >>>>>>>>>>>>>>>> forced it to >>>>>>>>>>>>>>>> scrub the SDRAM on the SoCKit, because it doesn't have ECC >>>>>>>>>>>>>>>> RAM), and the >>>>>>>>>>>>>>>> SoCKit finds the MMC and boots. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I don't have any suggestions on why it is working now on my >>>>>>>>>>>>>>>> board and not >>>>>>>>>>>>>>>> back when you first submitted the patchset. Maybe something >>>>>>>>>>>>>>>> else was fixed >>>>>>>>>>>>>>>> in the MMC? I was hoping you and Marek could test this patch >>>>>>>>>>>>>>>> again on some >>>>>>>>>>>>>>>> different SoCFPGA boards to see if you get the same results. >>>>>>>>>>>>>>> Look at this patch >>>>>>>>>>>>>>> http://git.denx.de/?p=u-boot/u-boot-socfpga.git;a=commit;h=9bb8a249b292d26f152c20e3641600b3d7b3924b >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You likely want similar approach, it's faster then the DMA and >>>>>>>>>>>>>>> much simpler. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks Marek. I'll give it a try. Would you be interested in a >>>>>>>>>>>>>> similar patch for the Gen 5? >>>>>>>>>>>>> I don't have any Gen5 board which uses ECC, do you ? >>>>>>>>>>>>> If so, yes, prepare a patch, it should be very similar. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Make sure to measure how long it takes to scrub the memory and >>>>>>>>>>>>> how much >>>>>>>>>>>>> memory you have, I'd be interested in the numbers. >>>>>>>>>>>>> >>>>>>>>>>>> Looking at the master branch, it doesn't look like that code is >>>>>>>>>>>> ever being called? >>>>>>>>>>>> The sdram_init_ecc_bits() function is called from the >>>>>>>>>>>> ddr_calibration_sequence function(), >>>>>>>>>>>> but I can't find where ddr_calibration_sequence is called(). >>>>>>>>>>> git grep for it, it's called from somewhere in the >>>>>>>>>>> arch/arm/mach-socfpga/ >>>>>>>>>>> >>>>>>>>>>>> Either way, I can test it. I have a custom Cyclone V board with >>>>>>>>>>>> ECC, and the Intel Arria V SoC >>>>>>>>>>>> Dev Kit I can test it on too which I think has ECC. >>>>>>>>>>> Please do. >>>>>>>>>>> >>>>>>>>>> I implemented a similar memset approach for the gen 5 socfpga. It's >>>>>>>>>> basically the same >>>>>>>>>> code as in that patch; however, when I performed a single memset the >>>>>>>>>> processor would >>>>>>>>>> reset for some reason. I changed it to loop over calling memset >>>>>>>>>> with a size of 32MB over >>>>>>>>>> the entire address the address, and that worked as opposed to doing >>>>>>>>>> a single memset on >>>>>>>>>> the RAM. >>>>>>>>> Can you do grep MEMSET .config in your U-Boot build dir ? The arch >>>>>>>>> memset is implemented in assembler and doesn't trigger WDT , so if it >>>>>>>>> takes too long, it could be that the WDT resets the platform. >>>>>>>> Both CONFIG_USE_ARCH_MEMSET and CONFIG_SPL_USE_ARCH_MEMSET >>>>>>>> are set in my .config, so it must be the WDT triggering as you suspect. >>>>>>>> >>>>>>>>>> I started on a SoCKit because it was handy, I know it doesn't have >>>>>>>>>> ECC >>>>>>>>> It doesn't by default. >>>>>>>>> >>>>>>>>>> , but I forced it to >>>>>>>>>> initialize the RAM as a quick test. It seems much slower than the >>>>>>>>>> DMA approach. 
It >>>>>>>>>> should be noted, I didn't implement any code to time the scrubbing, >>>>>>>>>> but rather just >>>>>>>>>> roughly monitored the time to get a rough idea of how long it took. >>>>>>>>>> >>>>>>>>>> On the SoCKit, which has 1GB of RAM, the memset takes around 8 >>>>>>>>>> seconds to complete, >>>>>>>>>> and the DMA takes under 2 seconds. >>>>>>>>> Did you enable i/d cache in the SPL ? It's mandatory, otherwise it's >>>>>>>>> slow. >>>>>>>> I have calls to icache_enable() and dcache_enable() just as you do in >>>>>>>> the Arria 10 sdram_init_ecc_bits() function. >>>>>>>> >>>>>>>> I did double check that both these enable functions call the versions >>>>>>>> of the functions in the ./arch/arm/lib/cache-cp15.c file that are >>>>>>>> implemented in the SPL. So I believe that both icache and dcache is >>>>>>>> enabled. >>>>>>> Are you sure it's not just the stubs that are called ? Or that the code >>>>>>> doesn't skip the dcache enabling due to some funny stuff, like MMU being >>>>>>> already enabled ? >>>>>> I added prints to ensure it is calling the real >>>>>> icache_enable()/dcache_enable() >>>>>> functions, and not the stubs. >>>>>> >>>>>>>> I probably should have added a print of icache_status() and >>>>>>>> dcache_status() to verify the caches are enabled. I'll add that >>>>>>>> tomorrow. >>>>>>> Yes, you really should verify that the dcache was enabled. >>>>>>> >>>>>>>>> Just be careful about the MMU tables placement, they are big and >>>>>>>>> if you place them in RAM, make sure you don't overwrite them with the >>>>>>>>> memset. The trick might be to memset the first 1 MiB of RAM, then put >>>>>>>>> MMU tables at some offset therein (since 0x0 can be used for ARM >>>>>>>>> vectors) and then turn on i/d cache and memset the rest. 
>>>>>>>> That is essentially what I am doing I believe, with the exception that >>>>>>>> I >>>>>>>> am only clearing the first 32KiB before initializing the MMU table >>>>>>>> (which >>>>>>>> is what you did in the Arria 10 version). >>>>>>>> >>>>>>>> I modeled my code almost identically to yours with the exception that >>>>>>>> I loop over the memset calls 32MiB at a time. Here's the order of >>>>>>>> operations I perform: >>>>>>>> >>>>>>>> 1. icache_enable() >>>>>>>> 2. memset the first 0x8000 bytes to zero >>>>>>>> 3. setup gd->arch.tlb_arch and gd->arch.tlb_size >>>>>>>> 4. dcache_enable() >>>>>>>> 5. loop over remaining memory, memsetting 32MiB at a time to zero >>>>>>>> 6. flush_dcache_all() >>>>>>>> 7. dcache_disable() >>>>>>>> >>>>>>>> It looks like the call to dcache_enable is what sets up the MMU tables. >>>>>>>> I suspect that's why you did a memset of the first 32KiB before >>>>>>>> enabling >>>>>>>> the dcache on the Arria 10. I think the MMU is initialized okay since >>>>>>>> the >>>>>>>> SPL keeps executing, u-boot loads, and Linux boots after running the >>>>>>>> above (maybe that's not a fair assumption). >>>>>>> I had to write zeroes to the first 32kiB to init the ECC counters before >>>>>>> putting MMU tables there. >>>>>>> >>>>>>> You really should double check if the MMU and dcache are enabled, 8 >>>>>>> seconds to scrub the memory is too long I think. >>>>>> I added checks to verify that the MMU, icache, and dcache are all setup >>>>>> and >>>>>> enabled. >>>>>> >>>>>> Calling icache_enable() set the CR_I bit (Icache enable) in the CR >>>>>> (control >>>>>> register). Then calling dcache_enable() called the mmu_setup() function, >>>>>> which setup the MMU and set the CR_M bit (MMU enable) in the CR, and >>>>>> finally dcache_enable() set the CR_C bit (Dcache enable) bit in the CR. >>>>>> >>>>>> I also printed out the control register before the memset calls, and it >>>>>> indicated that the mmu, icache, and dcache were enabled. 
>>>>> Is the DRAM area set as cacheable in the MMU tables ?
>>>>>
>>>> Good news bad news... The MMU tables weren't being set up because the
>>>> bd->bi_dram[bank].start and bd->bi_dram[bank].size weren't set up. As a
>>>> quick test, I hardcoded start to 0 and size to 1GiB. After that, the
>>>> memset was really quick, U-Boot loads, Linux loads, and everything seems
>>>> to work great.
>>> Good.
>>>
>>>> However, if I press the HPS_RST push button on the SoCKit (which is
>>>> connected to power on reset), occasionally U-Boot will lock up while
>>>> booting. It always boots and operates correctly from the initial power
>>>> on, but it almost always fails to boot after pressing the HPS_RST button.
>>>>
>>>> Usually after pressing the HPS_RST button, U-Boot makes it past the SPL,
>>>> and hangs somewhere after the call to setup_reloc() in
>>>> ./common/board_f.c. Once it hangs there, pressing the HPS_RST button
>>>> again usually causes the SPL to hang while setting up the MMU (before my
>>>> call to memset). Eventually the WDT kicks in, and it just keeps hanging
>>>> up in the same place. Once it gets in this mode, the only way to recover
>>>> it is by toggling power on the board.
>>>>
>>>> I spent a bunch of time today trying to track down where it was hanging,
>>>> but I couldn't pinpoint anything. The MMU tables looked correct. The MMU
>>>> registers looked good. I'm not sure the best way to debug what's going on.
>>> Try triggering warm reset and cold reset via the reset register:
>>>
>>> mw 0xffd05004 1
>>> mw 0xffd05004 2
>>>
>>> Does it hang in one case and not in the other ?
>>>
>> It hangs in both cases.
>>
>> I did find that if I do not memset the last 1MiB of DRAM with the cache on,
>> both warm and cold resets work.
>>
>> I changed the ecc scrubbing to zero out the first 0x8000 bytes and the last
>> 0x10000 bytes before the MMU is set up and I enable dcache. Then with
>> the dcache enabled, I zero out the rest of memory. The resets work in this
>> case as well. So there seems to be some side effect of clearing out the
>> relocate address space with the cache on.
> Can you investigate ?
>

I'd be happy to investigate more, but I'm not really sure what my next step should be.
Something appears to behave differently when U-Boot relocates with the
dcache on, but I don't know how to track it down. I was thinking I might
dump the DRAM region that U-Boot relocates to, once with the dcache on and
once with it off, and see if there are any differences. I'm not sure what a
difference would tell me, though.

Any suggestions?

Regards,
Jason
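
For reference, the dump comparison proposed above could be sketched as follows. This is only an illustration and not code from U-Boot: the function names are hypothetical, and the 32-byte granularity assumes the L1 cache line size of the Cortex-A9 in the Cyclone V / Arria V HPS. It reports the first differing offset between two dumps of the relocation region and counts how many cache-line-sized blocks differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte L1 cache lines (Cortex-A9). */
#define CACHE_LINE_SIZE 32u

/*
 * Return the offset of the first byte that differs between two dumps of
 * the same region, or len if the dumps are identical.
 */
size_t dump_first_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (a[i] != b[i])
			return i;
	}
	return len;
}

/*
 * Count the cache-line-sized blocks that contain at least one differing
 * byte. The final block may be shorter than CACHE_LINE_SIZE.
 */
size_t dump_count_diff_lines(const uint8_t *a, const uint8_t *b, size_t len)
{
	size_t off, count = 0;

	for (off = 0; off < len; off += CACHE_LINE_SIZE) {
		size_t n = len - off < CACHE_LINE_SIZE ? len - off
						       : CACHE_LINE_SIZE;

		if (dump_first_diff(a + off, b + off, n) != n)
			count++;
	}
	return count;
}
```

If the differences turn out to cover whole 32-byte blocks, that might point at cache lines that were never written back to DRAM before the reset, rather than at a problem in the relocation code itself. That interpretation is a guess, not something established in this thread.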