David, Thanks for pointing out the right place to submit the bug report. I have submitted one here: https://savannah.nongnu.org/bugs/index.php?50270
Best regards, Marcin Godlewski W dniu 2017-02-09 15:14:11 użytkownik David Brown <da...@westcontrol.com> napisał: > You could file this as a bug on the website: > <https://savannah.nongnu.org/bugs/?group=avr-libc> > > As far as I understand it, the documentation (both on the website and > the Atmel documentation) is generated directly from the library code and > comments - so this would be a change to the library source. > > Feel free to quote any parts of my mails on the subject when filing the bug. > > mvh., > > David > > > On 09/02/17 13:11, Marcin Godlewski wrote: > > Dear All, > > > > The site > > http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html > > still contains buggy description of memory barriers in avr-gcc. As > > this site is popular among avr users I think it's really worth > > fixing. What is more the same inaccurate article is available on > > Atmel doc site: > > http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html > > . Is there anybody subscribed to this mailing list who can contact > > the authors/maintainers of the site in order to discuss correction of > > the content? > > > > Marcin Godlewski > > > > W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski > > <marcin.godlew...@onet.pl> napisał: > >> W dniu 2016-12-09 10:11:55 użytkownik David Brown > >> <da...@westcontrol.com> napisał: > >>> On 08/12/16 21:46, Georg-Johann Lay wrote: > >>>> Marcin Godlewski schrieb: > >>>>> Dear all, > >>>>> > >>>>> Thanks for the reply to David. However I'm not trying to find > >>>>> a solution for the described issue. What I'm trying to say in > >>>>> this e-mail is that this part of Atmel documentation: > >>>>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html > >>>>> > >>>>> > is innacurate and should be corrected. The conclusion says: > >>>>> > >>>>> memory barriers ensure proper ordering of volatile accesses > >>>>> > >>>>> memory barriers don't ensure statements with no volatile > >>>>> accesses to be reordered across the barrier while it should > >>>>> say: > >>>>> > >>>>> memory barriers ensure proper ordering of global variables > >>>>> accesses > >>>>> > >>>>> memory barriers don't ensure local variables accesses to be > >>>>> reordered across the barrier > >>>> > >>>> At least the "local" vs. "global" is not completely correct. > >>>> After all it's about memory accesses, and it doesn't matter if > >>>> the memory is local (e.g. local static) or if you are > >>>> dereferencing a pointer (which might point to a local auto or > >>>> to an object on heap). > >>>> > >>>> The code example you quoted above is actually due to a subtle > >>>> implementation detail of division, modulo and some other > >>>> arithmetic of GCC's avr backend (the division is _not_ a call > >>>> actually). > >>>> > >>>> IIRC the solution for me back then was -fno-tree-ter as any > >>>> messing with inline asm doesn't hit the point. > >>> > >>> Yes, that is the solution you proposed when we discussed it a > >>> good while back (on the avrlibc list, I think). I disagree with > >>> you somewhat here (as I did then, though I can't remember if we > >>> discussed the details). > >>> > >>> Changing an optimisation option like this means that code that > >>> looks right, will run as expected - and that is a good thing. > >>> But it also means that the code will /only/ be correct if > >>> particular optimisation flags are set in particular ways. That > >>> is a very fragile situation, and should always be avoided. To be > >>> safe, this optimisation would have to be completely disabled in > >>> the avr-gcc port. I don't know how useful this particular > >>> optimisation is in terms of generating more efficient code, > >>> though from the gcc manual it appears very useful and is enabled > >>> at -O1. Clearly that determines the "cost" of this solution to > >>> the re-ordering problem. > >>> > >>> > >>> The use of the assembly dependency (or a nicely named macro with > >>> the same effect) fixes the problem in this situation. It does so > >>> regardless of optimisation options - the compiler is required to > >>> have calculated the result of the division before disabling > >>> interrupts, and cannot re-order the operations. It does so > >>> without adding any extra assembly code or hindering any > >>> optimisations - it merely forces an order on operations that are > >>> to be done anyway. > >>> > >>> It has the clear disadvantage of needing extra code in the > >>> user's source. Like memory barriers, it is a way of giving the > >>> compiler extra information that cannot be expressed in normal C, > >>> and which the compiler cannot (at the moment) figure out for > >>> itself. > >>> > >>> You say that the assembly dependency does not "hit the point". I > >>> think you are correct there - it is treating the symptom, not the > >>> disease. It is not telling the compiler that an operation should > >>> not be re-ordered, or that division is a costly operation. It > >>> simply tells the compiler that we need the results of that > >>> computation /here/. But it is a very effective and efficient > >>> cure for this sort of problem. Unless and until there is some > >>> /safe/ fix in the compiler to avoid this (and I don't count "put > >>> this compiler option in your command line" as safe), I really do > >>> think it is the best we have. > >>> > >>> > >>> Note, however, that the "forceDependency" macro only solves half > >>> the problem. Consider : > >>> > >>> unsigned int test2b(void) { unsigned int val; > >>> > >>> cli(); val = ivar; sei(); val = 65535 / val; return val; } > >>> > >>> In this case, the compiler could move the division backwards > >>> above the sei(), giving a similar problem. (It did not make the > >>> move in my brief tests - but it /could/ do.) I don't know if the > >>> -fno-tree-ter flag stops that too, but the forceDependency() > >>> macro is not enough. The forgetCompilerKnowledge macro is the > >>> answer: > >>> > >>> unsigned int test2b(void) { unsigned int val; > >>> > >>> cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val = > >>> 65535 / val; return val; } > >>> > >>> This tells the compiler that it needs to stabilise the value of > >>> "val", and it can't assume anything about "val" after this point > >>> in the code, because it /might/ be read and /might/ change in the > >>> assembly code. Again, nothing is actually generated in the > >>> assembly and we are only forcing an ordering on the code. > >>> > >>> > >>> Nothing would please me better here here than to have the > >>> compiler understand that users would not want such re-ordering > >>> around cli() and sei(), so that the problem simply goes away. > >>> But it should not require particular choices of compiler flags, > >>> nor should it require disabling useful optimisations and thus > >>> generating poorer code elsewhere. > >>> > >>> It is also worth noting that though this situation occurs > >>> because division does not work like a normal function call, > >>> despite it using a library call for implementation, there is > >>> nothing fundamental to stop the compiler moving a call to foo() > >>> back or forth across a cli() or sei() as long as the compiler is > >>> sure that no memory is accessed, no volatiles are accessed, and > >>> there are no other externally visible effects in foo(). If the > >>> definition of foo() is available when compiling the code, then > >>> the compiler could well know this to be the case. If we replace > >>> "val = 65535U / val;" with "val = foo(val);", where we have : > >>> > >>> unsigned int foo(unsigned int v) { return (v * v) - v; } > >>> > >>> in the same compilation unit, some or all of the calculation from > >>> foo() will be inlined and mixed with the cli(). Again, > >>> -fno-tree-ter fixes this - at the cost of killing such mixing and > >>> optimisation in cases where it is useful. And again, the inline > >>> assembly fixes it at the cost of knowing that you have to add > >>> this line of source code. > >>> > >>> As gcc gets ever smarter with its inlining, function cloning, > >>> link-time optimisations, etc., then this will become more and > >>> more of an issue. > >>> > >>> > >>> > >>> Maybe the answer is that gcc needs an "execution barrier" that > >>> is stronger than a memory barrier, because it will also block > >>> calculations - it would act as a barrier to all local variables. > >>> I cannot think of any way to make such a barrier with inline > >>> assembly or the C11 fences - I think it would take a new > >>> __builtin for gcc. Such a feature would have use on all embedded > >>> targets, not just AVR. > >>> > >>> mvh., > >>> > >>> David > >>> > >>> > >> > >> I totally agree with you - a feature like "execution barrier" would > >> be very useful. C11 made good job standardizing multi-threading > >> features but unfortunately the features not always fits firmware > >> development. Controlling what exactly goes into critical section is > >> a fundamental problem, so I would even go further - why don't you > >> propose the "execution barrier" as a new feature for the future C > >> language standard? > >> > >>> > >>>> > >>>> Johann > >>>> > >>>>> I don't know whether this group is the right place to post it > >>>>> however I do not know any better place. Hope someone here can > >>>>> trigger the change of the documentation and I also hope to be > >>>>> corrected if I am wrong. > >>>>> > >>>>> Thanks and regards, Marcin > >>> > >>> > >> > >> > >> > >> > > > > > > > > _______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list