--snip-- On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> > Thanks for working on this, Daniel. > > This is clearly stage1 material, so we've got time for a couple of > iterations to sort things out. I appreciate your feedback. I had been hoping that with no regressions this might still be eligible for stage2. Christophe never indicated either way. but the fact that he was looking at it seemed positive. I thought I would be a couple of weeks faster with this last iteration, but holidays got in the way. I actually think your comments below could all be addressable within a couple of days. But, I'm not accounting for the review process. > Firstly, the patch is very large, but contains a large number of > distinct changes, so it would really benefit from being broken down into > a number of distinct patches. This will make reviewing the individual > changes much more straight-forward. I have no context for "large" or "small" with respect to gcc. This patch comprises about 30% of a previously-monolithic library that's been shipping since ~2016 (the rest is libm material). Other than (1) the aforementioned change to div0(), (2) a nascent adaptation for __truncdfsf2() (not enabled), and (3) the gratuitous addition of the bitwise functions, the library remains pretty much as it was originally released. The driving force in the development of this library was small size, which of course was never possible with the softfp routines. It's not half-slow, either, for the limitations of the M0 architecture. And, it's IEEE compliant. But, that means that most of the functions are highly interconnected. So, some of it can be broken up as you outline below, but that last patch is still worth more than half of the total. I also have ~70k lines of test vectors that seem mostly redundant, but not completely. I haven't decided what to do here. For example, I have coverage for __aeabi_u/ldivmod, while GCC does not. If I do anything with this code it will be in a separate thread. > I'd suggest: > > 1) Some basic makefile cleanups to ease initial integration - in > particular where we have things like > > LIB1FUNCS += <long list of functions> > > that this be rewritten with one function per line (and sorted > alphabetically) - then we can see which functions are being changed in > subsequent patches. It makes the Makefile fragments longer, but the > improvement in clarity for makes this worthwhile. I know next to nothing about Makefiles, particularly ones as complex as GCC's. I was just trying to work with the existing style to avoid breaking something. However, I can certainly adopt this suggestion. > 2) The changes for the existing integer functions - preferably one > function per patch. > > 3) The new integer functions that you're adding These wouldn't be too hard to do, but what are the expectations for testing? A clean build of GCC takes about 6 hours in my VM, and regression testing takes about 4 hours per architecture. You would want a full regression report for each incremental patch? I have no idea how to target regression tests that apply to particular runtime functions without the risk of missing something. > 4) The floating-point support. > > Some more general observations: > > - where functions are already in lib1funcs.asm, please leave them there. I guess I have a different vision here. I have had a really hard time following all of the nested #ifdefs in lib1funcs, so I thought it would be helpful to begin breaking it up into logical units. The functions removed were all functions for which I had THUMB1 sequences faster/smaller than lib1funcs: __clzsi2, __clzdi2, __ctzsi2, __ashrdi3, __lshrdi3, __ashldi3. In fact, the new THUMB1 of __clzsi2 is the same number of instructions as the previous ARM/THUMB2 version. You will find all of the previous ARM versions of these functions merged into the new files (with attribution) and the same preprocessor selection path. So no architecture variant should be any worse off than before this patch, and some beyond v6m should benefit. In the future, I think that my versions of __divsi3 and __divdi3 will prove faster/smaller than the existing THUMB2 versions. I know that my float routines are less than half the compiled size of THUMB2 versions in 'ieee754-sf.S'. However, I haven't profiled the exact performance differences so I have left all this work for future patches. (It's also quite likely that my version can be further-refined with a few judicious uses of THUMB2 alternatives.) My long-term vision would be use lib1funcs as an architectural wrapper distinct from the implementation code. > - lets avoid having the cm0 subdirectory - in particular we should do > this when there is existing code for other targets in the same source > files. It's OK to have any new files in the main 'arm' directory of the > source tree - just name the files appropriately if really needed. Fair point on the name. In v1 of this patch, all these files were all preprocessor-selected for v6m only. However, as I've stumbled through the finer points of integration, that line has blurred. Name aside, the subdirectory does still represent a standalone library. I think I've managed to add enough integration hooks that it works well in a libgcc context, but it still has a very distinct implementation style. I don't have a strong opinion on this, just preference. But, keeping the subdirectory with a neutral name will probably make maintenance easier in the short term. I would suggest "lib0" (since it caters to the lowest common denominator) or "eabi" (since that was the original target). There are precedents in other architectures (e.g. avr). > - let's avoid the CM0 prefix - this is 'thumb1' code, for want of a > better term, and that is used widely elsewhere in the compiler. So if > you really need a term just use THUMB1, or even T1. Maybe. The Cortex M0 includes a subset of THUMB2 instructions. Most of this is probably THUMB1 clean, but it wasn't a design requirement. The CM0_FUNC_START exists so that I can specify subsections of ".text" for each function. This was a fairly fundamental design decision that allowed me to make a number of branch optimizations between functions. The other macros are just duplicates for naming symmetry. The existing FUNC_START macro inserts extra conflicting ".text" directives that would break the build. Of course, the prefix was arbitrary; I just took CM0 from the library name. But, there's nothing architecturally significant about this macro at all, so THUMB1 and T1 seems just about as wrong. Maybe define a FUNC_START_SECTION macro with two parameters? For example: FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2 Instead of: .section .text.sorted.libgcc.clz2.clzdi2,"x" CM0_FUNC_START clzdi2 > - For the 64-bit shift functions, I expect the existing code to be > preferable whenever possible - I think it can even be tweaked to build > for thumb2 by inserting a suitable IT instruction. So your code should > only be used when > > #if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 That is the definition of NOT_ISA_TARGET_32BIT, which I am respecting. (The name doesn't seem quite right for Cortex M0, since it does support some 32 bit instructions, but that's beside the point.) The current lib1funcs ARM code path still exists, as described above. My THUMB1 implementations were 1 - 3 instructions shorter than the current versions, which is why I took the time to merge the files. Unfortunately, the Cortex M0 THUMB2 subset does not provide IT. I don't see an advantage to eliminating the branch unless these functions were written with cryptographic side channel attacks in mind. > - most, if not all, of your LSYM symbols should not be needed after > assembly, so should start with a captial 'L' (and no leading > underscores), the assembler will then automatically discard any that are > not needed for relocations. You don't want debugging symbols for libgcc internals :) ? I sort of understand that, but those symbols have been useful to me in the past. The "." by itself seems to keep visibility local, so the extra symbols won't cause linker issuess. Would you object to a macro variant (e.g. LLSYM) that prepends the "L" but is easier to disable? > - you'll need to write suitable commit messages for each patch, which > also contain a suitable ChangeLog style entry. OK. > - finally, your popcount implementations have data in the code segment. > That's going to cause problems when we have compilation options such as > -mpure-code. I am just following the precedent of existing lib1funcs (e.g. __clz2si). If this matters, you'll need to point in the right direction for the fix. I'm not sure it does matter, since these functions are PIC anyway. > I strongly suggest that, rather than using gcc snapshots (I'm assuming > this based on the diff style and directory naming in your patches), you > switch to using a git tree, then you'll be able to use tools such as > rebasing and the git posting tools to send the patch series for > subsequent review. Your assumption is correct. I didn't think that I would have to get so deep into the gcc development process for this contribution. I used this library as a bare metal alternative for libgcc/libm in the product for years, so I thought it would just drop in. But, the libgcc compile mechanics have proved much more 'interesting'. I'm assuming this architecture was created years before the introduction of -ffunction- sections... > > Richard. > Thanks again, Daniel