Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0

Daniel Engel Wed, 06 Jan 2021 16:58:34 -0800

--snip--

On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:


> 
> Thanks for working on this, Daniel.
> 
> This is clearly stage1 material, so we've got time for a couple of
> iterations to sort things out.

I appreciate your feedback.  I had been hoping that with no regressions
this might still be eligible for stage2.  Christophe never indicated
either way. but the fact that he was looking at it seemed positive.
I thought I would be a couple of weeks faster with this last
iteration, but holidays got in the way.

I actually think your comments below could all be addressable within a
couple of days.  But, I'm not accounting for the review process.
 
> Firstly, the patch is very large, but contains a large number of
> distinct changes, so it would really benefit from being broken down into
> a number of distinct patches.  This will make reviewing the individual
> changes much more straight-forward.  

I have no context for "large" or "small" with respect to gcc.  This
patch comprises about 30% of a previously-monolithic library that's
been shipping since ~2016 (the rest is libm material).  Other than
(1) the aforementioned change to div0(), (2) a nascent adaptation
for __truncdfsf2() (not enabled), and (3) the gratuitous addition of
the bitwise functions, the library remains pretty much as it was
originally released.

The driving force in the development of this library was small size,
which of course was never possible with the softfp routines.  It's not
half-slow, either, for the limitations of the M0 architecture.   And,
it's IEEE compliant.  But, that means that most of the functions are
highly interconnected.  So, some of it can be broken up as you outline
below, but that last patch is still worth more than half of the total.

I also have ~70k lines of test vectors that seem mostly redundant, but
not completely.  I haven't decided what to do here.  For example, I have
coverage for __aeabi_u/ldivmod, while GCC does not.  If I do anything
with this code it will be in a separate thread.

> I'd suggest:
> 
> 1) Some basic makefile cleanups to ease initial integration - in
> particular where we have things like
> 
> LIB1FUNCS += <long list of functions>
> 
> that this be rewritten with one function per line (and sorted
> alphabetically) - then we can see which functions are being changed in
> subsequent patches.  It makes the Makefile fragments longer, but the
> improvement in clarity for makes this worthwhile.

I know next to nothing about Makefiles, particularly ones as complex as
GCC's.  I was just trying to work with the existing style to avoid
breaking something.  However, I can certainly adopt this suggestion.
 
> 2) The changes for the existing integer functions - preferably one
> function per patch.
> 
> 3) The new integer functions that you're adding

These wouldn't be too hard to do, but what are the expectations for
testing?  A clean build of GCC takes about 6 hours in my VM, and
regression testing takes about 4 hours per architecture.  You would want
a full regression report for each incremental patch?  I have no idea how
to target regression tests that apply to particular runtime functions
without the risk of missing something.

> 4) The floating-point support.
> 
> Some more general observations:
> 
> - where functions are already in lib1funcs.asm, please leave them there.

I guess I have a different vision here.  I have had a really hard time
following all of the nested #ifdefs in lib1funcs, so I thought it would
be helpful to begin breaking it up into logical units.

The functions removed were all functions for which I had THUMB1
sequences faster/smaller than lib1funcs:  __clzsi2, __clzdi2, __ctzsi2,
__ashrdi3, __lshrdi3, __ashldi3.  In fact, the new THUMB1 of __clzsi2 is
the same number of instructions as the previous ARM/THUMB2 version.

You will find all of the previous ARM versions of these functions merged
into the new files (with attribution) and the same preprocessor
selection path.  So no architecture variant should be any worse off than
before this patch, and some beyond v6m should benefit.

In the future, I think that my versions of __divsi3 and __divdi3 will
prove faster/smaller than the existing THUMB2 versions.  I know that my
float routines are less than half the compiled size of THUMB2 versions
in 'ieee754-sf.S'.  However, I haven't profiled the exact performance
differences so I have left all this work for future patches. (It's also
quite likely that my version can be further-refined with a few judicious
uses of THUMB2 alternatives.)

My long-term vision would be use lib1funcs as an architectural wrapper
distinct from the implementation code.

> - lets avoid having the cm0 subdirectory - in particular we should do
> this when there is existing code for other targets in the same source
> files.  It's OK to have any new files in the main 'arm' directory of the
> source tree - just name the files appropriately if really needed.

Fair point on the name.  In v1 of this patch, all these files were all
preprocessor-selected for v6m only.  However, as I've stumbled through
the finer points of integration, that line has blurred.  Name aside,
the subdirectory does still represent a standalone library.   I think
I've managed to add enough integration hooks that it works well in
a libgcc context, but it still has a very distinct implementation style.

I don't have a strong opinion on this, just preference.  But, keeping
the subdirectory with a neutral name will probably make maintenance
easier in the short term.  I would suggest "lib0" (since it caters to
the lowest common denominator) or "eabi" (since that was the original
target).  There are precedents in other architectures (e.g. avr).

> - let's avoid the CM0 prefix - this is 'thumb1' code, for want of a
> better term, and that is used widely elsewhere in the compiler.  So if
> you really need a term just use THUMB1, or even T1.

Maybe.  The Cortex M0 includes a subset of THUMB2 instructions.  Most
of this is probably THUMB1 clean, but it wasn't a design requirement.

The CM0_FUNC_START exists so that I can specify subsections of ".text"
for each function.  This was a fairly fundamental design decision that
allowed me to make a number of branch optimizations between functions.
The other macros are just duplicates for naming symmetry.

The existing  FUNC_START macro inserts extra conflicting ".text"
directives that would break the build.  Of course, the prefix was
arbitrary; I just took CM0 from the library name.  But, there's nothing
architecturally significant about this macro at all, so THUMB1 and T1
seems just about as wrong.  Maybe define a FUNC_START_SECTION macro with
two parameters? For example:

    FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2

Instead of: 

    .section .text.sorted.libgcc.clz2.clzdi2,"x"
    CM0_FUNC_START clzdi2

> - For the 64-bit shift functions, I expect the existing code to be
> preferable whenever possible - I think it can even be tweaked to build
> for thumb2 by inserting a suitable IT instruction.  So your code should
> only be used when
> 
>  #if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1

That is the definition of NOT_ISA_TARGET_32BIT, which I am respecting.
(The name doesn't seem quite right for Cortex M0, since it does support
some 32 bit instructions, but that's beside the point.)

The current lib1funcs ARM code path still exists, as described above. My
THUMB1 implementations were 1 - 3 instructions shorter than the current
versions, which is why I took the time to merge the files.

Unfortunately, the Cortex M0 THUMB2 subset does not provide IT.  I don't
see an advantage to eliminating the branch unless these functions were
written with cryptographic side channel attacks in mind.

> - most, if not all, of your LSYM symbols should not be needed after
> assembly, so should start with a captial 'L' (and no leading
> underscores), the assembler will then automatically discard any that are
> not needed for relocations.

You don't want debugging symbols for libgcc internals :) ?  I sort of
understand that, but those symbols have been useful to me in the past.
The "." by itself seems to keep visibility local, so the extra symbols
won't cause linker issuess. Would you object to a macro variant (e.g.
LLSYM) that prepends the "L" but is easier to disable?

> - you'll need to write suitable commit messages for each patch, which
> also contain a suitable ChangeLog style entry.

OK.

> - finally, your popcount implementations have data in the code segment.
>  That's going to cause problems when we have compilation options such as
> -mpure-code.

I am just following the precedent of existing lib1funcs (e.g. __clz2si).
If this matters, you'll need to point in the right direction for the
fix.  I'm not sure it does matter, since these functions are PIC anyway.

> I strongly suggest that, rather than using gcc snapshots (I'm assuming
> this based on the diff style and directory naming in your patches), you
> switch to using a git tree, then you'll be able to use tools such as
> rebasing and the git posting tools to send the patch series for
> subsequent review.

Your assumption is correct.  I didn't think that I would have to get so
deep into the gcc development process for this contribution.  I used
this library as a bare metal alternative for libgcc/libm in the product
for years, so I thought it would just drop in.  But, the libgcc compile
mechanics have proved much more 'interesting'. I'm assuming this
architecture was created years before the introduction of -ffunction-
sections...

>
> Richard.
>

Thanks again,
Daniel

Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0

Reply via email to