Hi Simon, On Wed, Dec 7, 2011 at 2:25 PM, Simon Glass <s...@chromium.org> wrote: > Hi Graeme, > > On Tue, Dec 6, 2011 at 6:56 PM, Graeme Russ <graeme.r...@gmail.com> wrote: >> Hi Simon >> >> Sorry for the late review on this - I've only just now had a good >> chance to look at it... >> > > Thanks for the comments. > >> On Wed, Dec 7, 2011 at 12:56 PM, Simon Glass <s...@chromium.org> wrote: >>> Hi Tom, >>> >>> On Mon, Nov 28, 2011 at 3:45 PM, Tom Rini <tom.r...@gmail.com> wrote: >>>> On Mon, Nov 21, 2011 at 4:57 PM, Simon Glass <s...@chromium.org> wrote: >>>>> This is the second patch series aiming to unify the various board.c files >>>>> in each architecture into a single one. This series creates a libboard >>>>> library and implements relocation in it. It then moves ARM over to use >>>>> this framework, as an example. >>>>> >>>>> On ARM the relocation code is duplicated for each CPU yet it >>>>> is the same. We can bring this up to the arch level. But since (I believe) >>>>> Elf relocation is basically the same process for all archs, there is no >>>>> reason not to bring it up to the generic level. >>>>> >>>>> This series establishes a new libboard library in the board/ subdir and >>>>> puts some relocation code in it. Each architecture which uses this >>>>> framework needs to provide a function called arch_elf_relocate_entry() >>>>> which processes a single relocation entry. If there is concern about >>>>> calling a function for all 2000-odd relocations then I can change this. >> >> Ugh, that overhead is a killer - The relocation can be done entirely in a >> tight loop with a switch statement in your reloc_elf() function. But I >> think reloc_elf() should be weak - x86 relocation can be done in a few >> lines of assembler which is smaller and faster than a C implementation >> will ever be > > I will benchmark it tomorrow. If we move the for loop into the > architecture-specific code then it goes away. Just seemed to me that > the less duplicated code we have the better. We could use an inline > function but I'm not keen on that.
This may be one case were performance outweighs the desire to be generic. The nice thing is we can provide a generic function which will work for all arches (albiet a little big and slow) - This will help when switching an arch over as there is, at least, a default implementation much like all the string functions (off topic, but why aren't they weak instead of #defined) >>>>> For ARM, a new arch/arm/lib/proc.S file is created, which holds generic >>>>> ARM assembler code (things that cannot be written in C and are common >>>>> functions used by all ARM CPUs). This helps reduce duplication. Interrupt >>>>> handling code and perhaps even some startup code can move there later. >>>>> >>>>> It may be useful for other architectures to have a similar file. >>>>> >>>>> This series moves ARM over to use this framework. Overall this means that >>>>> two new files are required 'early' in boot: board/reloc.c and >>>>> arch/arm/lib/proc.S. This is tricky mainly due to SPL. I believe that >>>>> we may need to adjust link scripts to put these two files early in the >>>>> link scripts also. But I am not sure about this and can't actually find >>>>> a problem as yet. I would much prefer to solve this with a new section >>>>> name like .text.early if we can. >>>>> >>>>> (I should really cc all arch maintainers but I think in that case I get >>>>> an error from the list server. Not sure what the limit is.) >>>>> >>>>> Comments please... >> >> The whole sequence of the series seems a little bit 'random' to me - I >> think patches 2 and 3 can be squahed and the dead code removal in patch 7 >> should be done in the same patch(es) that create the dead code. >> > Yes I can clean this up. > >> It looks like reloc_make_copy(), reloc_clear_bss() and reloc_elf() have >> been poached from x86 - The copy and clear loops are quite inefficient. I >> have patches at home which change these to use memcpy and memset > > Yes they have - in fact I have a series here which unifies board.c > completely for ARM and x86. But I need to take this in baby steps as I > said. > >> >>>> >>>> Seems like a good idea. Got some size's before and after? Maybe some >>>> boot times too (as you mention 2000-odd function calls during >>>> relocation) ? >>>> >>>> -- >>>> Tom >>> >>> Yes I have this info - will package it up tomorrow along with a new >>> series to deal with review comments. >> >> As far as I understand it, the long-term plan is to unify the board init >> sequence across all architectures. I can see three approaches: >> >> 1) Tweak each arch until they are all the same and then 'throw the switch' >> 2) Pick one or two arches and 'throw the switch' on them and migrate the >> others later >> 3) Pull out common bits and pieced and deal with the left-overs later >> >> It looks like you've opted for number 3. The problem I see in this approach >> is that you are relying on the other arches to follow your lead which is, at >> best, a 50/50 bet. > > Sort of. Actually I have gone for option 2. But rather than send a > series which switches over ARM and then x86 (about 40 patches) I have > decided to do it in steps. Relocation is the first step. OK, sounds like a plan >> Now I look at board_init_f() and board_init_r() in ARM and x86 and think >> 'why on earth isn't most of than in the init loops?'. The boot sequence >> for ARM and x86 should be reducible to (all arches are generally pretty >> close to this, so it should not introduce too much pain when they decide >> to switch) > > Yes my generic board.c consists of a couple of init loops. I can email > the patch if you like but it is not ready for the mailing list. Hmm, I don't think I have the time right now >> - Low level init (reset vector etc) >> - Setup temporary stack and gd storage (typically in CPU cache) >> + init_f loop <-- SDRAM initialised >> - Move gd to RAM >> - Setup new stack in SDRAM >> - Turn on caching (cannot be done while gd and stack in cache!!!) >> + Calculate relocation address >> + Copy to RAM >> + Do relocation fixups >> + Clear BSS >> - Jump to RAM >> + init_r loop >> + main_loop() >> >> + = common code >> - = arch, SoC, or board specific >> >> Note: Typically the boot sequence currently has 'Turn on caching' after jump >> to RAM - I have patches in the works which change this for x86 >> >> So my vote would be #2 - Unify the ARM and x86 and bring any code that can >> be inside the init loops in and then switch both over to a generic init >> implementation. Other arches can be switched later > > Agreed, but I can't send a series that big, so I am doing it in stages. Excellent - I don't have a huge amoun of spare time, but if you let me know what you need, I can probably give x86 a little prod here and there to align it with what you are doing. I already have a few patches, and Gabe has been doing a bit as well in order to make the coreboot port integrate more seamlessly. Regards, Graeme _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot