Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"

Marcin Godlewski Thu, 09 Feb 2017 14:17:43 -0800

David,

Thanks for pointing out the right place to submit the bug report. I have 
submitted one here:
https://savannah.nongnu.org/bugs/index.php?50270


Best regards,
Marcin Godlewski

W dniu 2017-02-09 15:14:11 użytkownik David Brown <da...@westcontrol.com> 
napisał:
> You could file this as a bug on the website:
> <https://savannah.nongnu.org/bugs/?group=avr-libc>
> 
> As far as I understand it, the documentation (both on the website and
> the Atmel documentation) is generated directly from the library code and
> comments - so this would be a change to the library source.
> 
> Feel free to quote any parts of my mails on the subject when filing the bug.
> 
> mvh.,
> 
> David
> 
> 
> On 09/02/17 13:11, Marcin Godlewski wrote:
> > Dear All,
> > 
> > The site
> > http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html
> > still contains buggy description of memory barriers in avr-gcc. As
> > this site is popular among avr users I think it's really worth
> > fixing. What is more the same inaccurate article is available on
> > Atmel doc site:
> > http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
> > . Is there anybody subscribed to this mailing list who can contact
> > the authors/maintainers of the site in order to discuss correction of
> > the content?
> > 
> > Marcin Godlewski
> > 
> > W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski
> > <marcin.godlew...@onet.pl> napisał:
> >> W dniu 2016-12-09 10:11:55 użytkownik David Brown
> >> <da...@westcontrol.com> napisał:
> >>> On 08/12/16 21:46, Georg-Johann Lay wrote:
> >>>> Marcin Godlewski schrieb:
> >>>>> Dear all,
> >>>>> 
> >>>>> Thanks for the reply to David. However I'm not trying to find
> >>>>> a solution for the described issue. What I'm trying to say in
> >>>>> this e-mail is that this part of Atmel documentation: 
> >>>>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
> >>>>>
> >>>>> 
> is innacurate and should be corrected. The conclusion says:
> >>>>> 
> >>>>> memory barriers ensure proper ordering of volatile accesses
> >>>>> 
> >>>>> memory barriers don't ensure statements with no volatile
> >>>>> accesses to be reordered across the barrier while it should
> >>>>> say:
> >>>>> 
> >>>>> memory barriers ensure proper ordering of global variables
> >>>>> accesses
> >>>>> 
> >>>>> memory barriers don't ensure local variables accesses to be 
> >>>>> reordered across the barrier
> >>>> 
> >>>> At least the "local" vs. "global" is not completely correct.
> >>>> After all it's about memory accesses, and it doesn't matter if
> >>>> the memory is local (e.g. local static) or if you are
> >>>> dereferencing a pointer (which might point to a local auto or
> >>>> to an object on heap).
> >>>> 
> >>>> The code example you quoted above is actually due to a subtle 
> >>>> implementation detail of division, modulo and some other
> >>>> arithmetic of GCC's avr backend (the division is _not_ a call
> >>>> actually).
> >>>> 
> >>>> IIRC the solution for me back then was -fno-tree-ter as any
> >>>> messing with inline asm doesn't hit the point.
> >>> 
> >>> Yes, that is the solution you proposed when we discussed it a
> >>> good while back (on the avrlibc list, I think).  I disagree with
> >>> you somewhat here (as I did then, though I can't remember if we
> >>> discussed the details).
> >>> 
> >>> Changing an optimisation option like this means that code that
> >>> looks right, will run as expected - and that is a good thing.
> >>> But it also means that the code will /only/ be correct if
> >>> particular optimisation flags are set in particular ways.  That
> >>> is a very fragile situation, and should always be avoided.  To be
> >>> safe, this optimisation would have to be completely disabled in
> >>> the avr-gcc port.  I don't know how useful this particular
> >>> optimisation is in terms of generating more efficient code,
> >>> though from the gcc manual it appears very useful and is enabled 
> >>> at -O1.  Clearly that determines the "cost" of this solution to
> >>> the re-ordering problem.
> >>> 
> >>> 
> >>> The use of the assembly dependency (or a nicely named macro with
> >>> the same effect) fixes the problem in this situation.  It does so
> >>> regardless of optimisation options - the compiler is required to
> >>> have calculated the result of the division before disabling
> >>> interrupts, and cannot re-order the operations.  It does so
> >>> without adding any extra assembly code or hindering any
> >>> optimisations - it merely forces an order on operations that are
> >>> to be done anyway.
> >>> 
> >>> It has the clear disadvantage of needing extra code in the
> >>> user's source.  Like memory barriers, it is a way of giving the
> >>> compiler extra information that cannot be expressed in normal C,
> >>> and which the compiler cannot (at the moment) figure out for
> >>> itself.
> >>> 
> >>> You say that the assembly dependency does not "hit the point".  I
> >>> think you are correct there - it is treating the symptom, not the
> >>> disease.  It is not telling the compiler that an operation should
> >>> not be re-ordered, or that division is a costly operation.  It
> >>> simply tells the compiler that we need the results of that
> >>> computation /here/.  But it is a very effective and efficient
> >>> cure for this sort of problem.  Unless and until there is some
> >>> /safe/ fix in the compiler to avoid this (and I don't count "put
> >>> this compiler option in your command line" as safe), I really do
> >>> think it is the best we have.
> >>> 
> >>> 
> >>> Note, however, that the "forceDependency" macro only solves half
> >>> the problem.  Consider :
> >>> 
> >>> unsigned int test2b(void) { unsigned int val;
> >>> 
> >>> cli(); val = ivar; sei(); val = 65535 / val; return val; }
> >>> 
> >>> In this case, the compiler could move the division backwards
> >>> above the sei(), giving a similar problem.  (It did not make the
> >>> move in my brief tests - but it /could/ do.)  I don't know if the
> >>> -fno-tree-ter flag stops that too, but the forceDependency()
> >>> macro is not enough.  The forgetCompilerKnowledge macro is the
> >>> answer:
> >>> 
> >>> unsigned int test2b(void) { unsigned int val;
> >>> 
> >>> cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val =
> >>> 65535 / val; return val; }
> >>> 
> >>> This tells the compiler that it needs to stabilise the value of
> >>> "val", and it can't assume anything about "val" after this point
> >>> in the code, because it /might/ be read and /might/ change in the
> >>> assembly code. Again, nothing is actually generated in the
> >>> assembly and we are only forcing an ordering on the code.
> >>> 
> >>> 
> >>> Nothing would please me better here here than to have the
> >>> compiler understand that users would not want such re-ordering
> >>> around cli() and sei(), so that the problem simply goes away.
> >>> But it should not require particular choices of compiler flags,
> >>> nor should it require disabling useful optimisations and thus
> >>> generating poorer code elsewhere.
> >>> 
> >>> It is also worth noting that though this situation occurs
> >>> because division does not work like a normal function call,
> >>> despite it using a library call for implementation, there is
> >>> nothing fundamental to stop the compiler moving a call to foo()
> >>> back or forth across a cli() or sei() as long as the compiler is
> >>> sure that no memory is accessed, no volatiles are accessed, and
> >>> there are no other externally visible effects in foo().  If the
> >>> definition of foo() is available when compiling the code, then
> >>> the compiler could well know this to be the case.  If we replace
> >>> "val = 65535U / val;" with "val = foo(val);", where we have :
> >>> 
> >>> unsigned int foo(unsigned int v) { return (v * v) - v; }
> >>> 
> >>> in the same compilation unit, some or all of the calculation from
> >>> foo() will be inlined and mixed with the cli().  Again,
> >>> -fno-tree-ter fixes this - at the cost of killing such mixing and
> >>> optimisation in cases where it is useful.  And again, the inline
> >>> assembly fixes it at the cost of knowing that you have to add
> >>> this line of source code.
> >>> 
> >>> As gcc gets ever smarter with its inlining, function cloning,
> >>> link-time optimisations, etc., then this will become more and
> >>> more of an issue.
> >>> 
> >>> 
> >>> 
> >>> Maybe the answer is that gcc needs an "execution barrier" that
> >>> is stronger than a memory barrier, because it will also block
> >>> calculations - it would act as a barrier to all local variables.
> >>> I cannot think of any way to make such a barrier with inline
> >>> assembly or the C11 fences - I think it would take a new
> >>> __builtin for gcc.  Such a feature would have use on all embedded
> >>> targets, not just AVR.
> >>> 
> >>> mvh.,
> >>> 
> >>> David
> >>> 
> >>> 
> >> 
> >> I totally agree with you - a feature like "execution barrier" would
> >> be very useful. C11 made good job standardizing multi-threading
> >> features but unfortunately the features not always fits firmware
> >> development. Controlling what exactly goes into critical section is
> >> a fundamental problem, so I would even go further - why don't you
> >> propose the "execution barrier" as a new feature for the future C
> >> language standard?
> >> 
> >>> 
> >>>> 
> >>>> Johann
> >>>> 
> >>>>> I don't know whether this group is the right place to post it
> >>>>> however I do not know any better place. Hope someone here can
> >>>>> trigger the change of the documentation and I also hope to be
> >>>>> corrected if I am wrong.
> >>>>> 
> >>>>> Thanks and regards, Marcin
> >>> 
> >>> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
> > 
> 
> 




_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"

Reply via email to