Are the functions like macros in atomic.h correct? They attempt to deal properly with critical sections/code motion etc, in what this thread is discussing.
http://www.nongnu.org/avr-libc/user-manual/group__util__atomic.html On Thu, Feb 9, 2017 at 9:14 AM, David Brown <da...@westcontrol.com> wrote: > You could file this as a bug on the website: > <https://savannah.nongnu.org/bugs/?group=avr-libc> > > As far as I understand it, the documentation (both on the website and > the Atmel documentation) is generated directly from the library code and > comments - so this would be a change to the library source. > > Feel free to quote any parts of my mails on the subject when filing the bug. > > mvh., > > David > > > On 09/02/17 13:11, Marcin Godlewski wrote: >> Dear All, >> >> The site >> http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html >> still contains buggy description of memory barriers in avr-gcc. As >> this site is popular among avr users I think it's really worth >> fixing. What is more the same inaccurate article is available on >> Atmel doc site: >> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html >> . Is there anybody subscribed to this mailing list who can contact >> the authors/maintainers of the site in order to discuss correction of >> the content? >> >> Marcin Godlewski >> >> W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski >> <marcin.godlew...@onet.pl> napisał: >>> W dniu 2016-12-09 10:11:55 użytkownik David Brown >>> <da...@westcontrol.com> napisał: >>>> On 08/12/16 21:46, Georg-Johann Lay wrote: >>>>> Marcin Godlewski schrieb: >>>>>> Dear all, >>>>>> >>>>>> Thanks for the reply to David. However I'm not trying to find >>>>>> a solution for the described issue. What I'm trying to say in >>>>>> this e-mail is that this part of Atmel documentation: >>>>>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html >>>>>> >>>>>> > is innacurate and should be corrected. The conclusion says: >>>>>> >>>>>> memory barriers ensure proper ordering of volatile accesses >>>>>> >>>>>> memory barriers don't ensure statements with no volatile >>>>>> accesses to be reordered across the barrier while it should >>>>>> say: >>>>>> >>>>>> memory barriers ensure proper ordering of global variables >>>>>> accesses >>>>>> >>>>>> memory barriers don't ensure local variables accesses to be >>>>>> reordered across the barrier >>>>> >>>>> At least the "local" vs. "global" is not completely correct. >>>>> After all it's about memory accesses, and it doesn't matter if >>>>> the memory is local (e.g. local static) or if you are >>>>> dereferencing a pointer (which might point to a local auto or >>>>> to an object on heap). >>>>> >>>>> The code example you quoted above is actually due to a subtle >>>>> implementation detail of division, modulo and some other >>>>> arithmetic of GCC's avr backend (the division is _not_ a call >>>>> actually). >>>>> >>>>> IIRC the solution for me back then was -fno-tree-ter as any >>>>> messing with inline asm doesn't hit the point. >>>> >>>> Yes, that is the solution you proposed when we discussed it a >>>> good while back (on the avrlibc list, I think). I disagree with >>>> you somewhat here (as I did then, though I can't remember if we >>>> discussed the details). >>>> >>>> Changing an optimisation option like this means that code that >>>> looks right, will run as expected - and that is a good thing. >>>> But it also means that the code will /only/ be correct if >>>> particular optimisation flags are set in particular ways. That >>>> is a very fragile situation, and should always be avoided. To be >>>> safe, this optimisation would have to be completely disabled in >>>> the avr-gcc port. I don't know how useful this particular >>>> optimisation is in terms of generating more efficient code, >>>> though from the gcc manual it appears very useful and is enabled >>>> at -O1. Clearly that determines the "cost" of this solution to >>>> the re-ordering problem. >>>> >>>> >>>> The use of the assembly dependency (or a nicely named macro with >>>> the same effect) fixes the problem in this situation. It does so >>>> regardless of optimisation options - the compiler is required to >>>> have calculated the result of the division before disabling >>>> interrupts, and cannot re-order the operations. It does so >>>> without adding any extra assembly code or hindering any >>>> optimisations - it merely forces an order on operations that are >>>> to be done anyway. >>>> >>>> It has the clear disadvantage of needing extra code in the >>>> user's source. Like memory barriers, it is a way of giving the >>>> compiler extra information that cannot be expressed in normal C, >>>> and which the compiler cannot (at the moment) figure out for >>>> itself. >>>> >>>> You say that the assembly dependency does not "hit the point". I >>>> think you are correct there - it is treating the symptom, not the >>>> disease. It is not telling the compiler that an operation should >>>> not be re-ordered, or that division is a costly operation. It >>>> simply tells the compiler that we need the results of that >>>> computation /here/. But it is a very effective and efficient >>>> cure for this sort of problem. Unless and until there is some >>>> /safe/ fix in the compiler to avoid this (and I don't count "put >>>> this compiler option in your command line" as safe), I really do >>>> think it is the best we have. >>>> >>>> >>>> Note, however, that the "forceDependency" macro only solves half >>>> the problem. Consider : >>>> >>>> unsigned int test2b(void) { unsigned int val; >>>> >>>> cli(); val = ivar; sei(); val = 65535 / val; return val; } >>>> >>>> In this case, the compiler could move the division backwards >>>> above the sei(), giving a similar problem. (It did not make the >>>> move in my brief tests - but it /could/ do.) I don't know if the >>>> -fno-tree-ter flag stops that too, but the forceDependency() >>>> macro is not enough. The forgetCompilerKnowledge macro is the >>>> answer: >>>> >>>> unsigned int test2b(void) { unsigned int val; >>>> >>>> cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val = >>>> 65535 / val; return val; } >>>> >>>> This tells the compiler that it needs to stabilise the value of >>>> "val", and it can't assume anything about "val" after this point >>>> in the code, because it /might/ be read and /might/ change in the >>>> assembly code. Again, nothing is actually generated in the >>>> assembly and we are only forcing an ordering on the code. >>>> >>>> >>>> Nothing would please me better here here than to have the >>>> compiler understand that users would not want such re-ordering >>>> around cli() and sei(), so that the problem simply goes away. >>>> But it should not require particular choices of compiler flags, >>>> nor should it require disabling useful optimisations and thus >>>> generating poorer code elsewhere. >>>> >>>> It is also worth noting that though this situation occurs >>>> because division does not work like a normal function call, >>>> despite it using a library call for implementation, there is >>>> nothing fundamental to stop the compiler moving a call to foo() >>>> back or forth across a cli() or sei() as long as the compiler is >>>> sure that no memory is accessed, no volatiles are accessed, and >>>> there are no other externally visible effects in foo(). If the >>>> definition of foo() is available when compiling the code, then >>>> the compiler could well know this to be the case. If we replace >>>> "val = 65535U / val;" with "val = foo(val);", where we have : >>>> >>>> unsigned int foo(unsigned int v) { return (v * v) - v; } >>>> >>>> in the same compilation unit, some or all of the calculation from >>>> foo() will be inlined and mixed with the cli(). Again, >>>> -fno-tree-ter fixes this - at the cost of killing such mixing and >>>> optimisation in cases where it is useful. And again, the inline >>>> assembly fixes it at the cost of knowing that you have to add >>>> this line of source code. >>>> >>>> As gcc gets ever smarter with its inlining, function cloning, >>>> link-time optimisations, etc., then this will become more and >>>> more of an issue. >>>> >>>> >>>> >>>> Maybe the answer is that gcc needs an "execution barrier" that >>>> is stronger than a memory barrier, because it will also block >>>> calculations - it would act as a barrier to all local variables. >>>> I cannot think of any way to make such a barrier with inline >>>> assembly or the C11 fences - I think it would take a new >>>> __builtin for gcc. Such a feature would have use on all embedded >>>> targets, not just AVR. >>>> >>>> mvh., >>>> >>>> David >>>> >>>> >>> >>> I totally agree with you - a feature like "execution barrier" would >>> be very useful. C11 made good job standardizing multi-threading >>> features but unfortunately the features not always fits firmware >>> development. Controlling what exactly goes into critical section is >>> a fundamental problem, so I would even go further - why don't you >>> propose the "execution barrier" as a new feature for the future C >>> language standard? >>> >>>> >>>>> >>>>> Johann >>>>> >>>>>> I don't know whether this group is the right place to post it >>>>>> however I do not know any better place. Hope someone here can >>>>>> trigger the change of the documentation and I also hope to be >>>>>> corrected if I am wrong. >>>>>> >>>>>> Thanks and regards, Marcin >>>> >>>> >>> >>> >>> >>> >> >> >> > > > _______________________________________________ > AVR-GCC-list mailing list > AVR-GCC-list@nongnu.org > https://lists.nongnu.org/mailman/listinfo/avr-gcc-list _______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list