Re: [RFC] Meta-description for tree and gimple folding
On Thu, 2014-02-27 at 15:34 +0100, Richard Biener wrote:
> I've been hacking on a prototype that generates matching and
> simplification code from a meta-description.

For what it is worth, MELT has a similar feature. http://gcc-melt.org/

Regards

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
Re: Asm volatile causing performance regressions on ARM
> I think part of the problem is that some parts of GCC (like the one you
> noted) are far more conservative than others. E.g. take:
>
> void foo (int x, int *y)
> {
>   y[0] = x + 1;
>   asm volatile ("# asm");
>   y[1] = x + 1;
> }
>
> The extra-paranoid check you pointed out means that we assume that
> x + 1 is no longer available after the asm for rtx-level CSE, but take
> the opposite view for tree-level CSE, which happily optimises away the
> second +.

Right, I think that it should be possible to (partially) achieve this for RTL CSE without totally getting rid of the invalidation.

> One of the big grey areas is what should happen for floating-point ops
> that depend on the current rounding mode. That isn't really modelled
> properly yet though. Again, it affects calls as well as volatile asms.

There is an explicit comment about this in the scheduler:

    case ASM_OPERANDS:
    case ASM_INPUT:
      {
        /* Traditional and volatile asm instructions must be considered to use
           and clobber all hard registers, all pseudo-registers and all of
           memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.

           Consider for instance a volatile asm that changes the fpu rounding
           mode.  An insn should not be moved across this even if it only uses
           pseudo-regs because it might give an incorrectly rounded result.  */
        if (code != ASM_OPERANDS || MEM_VOLATILE_P (x))
          reg_pending_barrier = TRUE_BARRIER;

--
Eric Botcazou
Re: Asm volatile causing performance regressions on ARM
> So, the main question is not about the triggering condition, but about the
> behavior itself. Is it correct to flush and reload all constants? They are
> constants after all; they are not even stored in the .data section but
> inlined in the code, and thus cannot be modified.

I'd think that everyone more or less agrees that this behavior of RTL CSE is overly conservative and ought to be relaxed. The question is whether this can be done in a sensible way without totally getting rid of the barrier.

--
Eric Botcazou
Re: Asm volatile causing performance regressions on ARM
On Fri, Feb 28, 2014 at 9:51 AM, Eric Botcazou wrote:
>> So, the main question is not about triggering condition, but about the
>> behavior itself. Is it correct to flush and reload all constants? They are
>> constants after all, they are even not stored in .data section but inlined
>> in the code, and thus cannot be modified.
>
> I'd think that everyone more or less agrees that this behavior of RTL CSE is
> overly conservative and ought to be relaxed. The question is whether this can
> be done in a sensible way without totally getting rid of the barrier.

Of course if the GIMPLE level doesn't care about the barrier then it doesn't make sense to be overly conservative at the RTL CSE level. Thus I think we can just remove this barrier completely.

[GIMPLE of course respects the volatile in that it doesn't remove the asm, and it respects the "memory" clobber as a barrier for memory optimizations (but it does _not_ consider it a memory barrier for automatic variables that do not have their address taken!)]

Richard.
Re: Asm volatile causing performance regressions on ARM
On 02/27/2014 06:03 PM, Richard Sandiford wrote:

Yury Gribov writes:

Richard Biener wrote:

If this behavior is not intended, what would be the best way to fix performance? I could teach GCC to not remove constant RTXs in flush_hash_table() but this is probably very naive and won't cover some corner cases.

That could be a good starting point though. Though with modifying "machine state" you can modify constants as well, no?

Valid point, but this would mean relying on the compiler to always load all constants from memory (instead of, say, generating them via movhi/movlo) for a piece of code, which looks extremely unstable.

Right. And constant rtx codes have mode-independent semantics. (const_int 1) is always 1, whatever a volatile asm does. Same for const_double, symbol_ref, label_ref, etc. If a constant load is implemented using some mode-dependent operation then it would need to be represented as something like an unspec instead. But even then, the result would usually be annotated with a REG_EQUAL note giving the value of the final register result. It should be perfectly OK to reuse that register after a volatile asm if the value in the REG_EQUAL note is needed again.

What is the general attitude towards volatile asm? Are people interested in making it more defined/performant or should we just leave this can of worms as is? I can try to improve generated code but my patches will be doomed if there is no consensus on what volatile asm actually means...

I think part of the problem is that some parts of GCC (like the one you noted) are far more conservative than others. E.g. take:

void foo (int x, int *y)
{
  y[0] = x + 1;
  asm volatile ("# asm");
  y[1] = x + 1;
}

The extra-paranoid check you pointed out means that we assume that x + 1 is no longer available after the asm for rtx-level CSE, but take the opposite view for tree-level CSE, which happily optimises away the second +.
Some places were (maybe still are) worried that volatile asms could clobber any register they like. But the register allocator assumes that registers are preserved across volatile asms unless explicitly clobbered. And AFAIK it always has. So in the above example we get:

    addl    $1, %edi
    movl    %edi, (%rsi)
#APP
# 4 "/tmp/foo.c" 1
    # asm
# 0 "" 2
#NO_APP
    movl    %edi, 4(%rsi)
    ret

with %edi being live across the asm. We do nothing this draconian for a normal function call, which could easily use a volatile asm internally. IMO anything that isn't flushed for a call shouldn't be flushed for a volatile asm either.

Disagree here. Just take the example I gave in my other mail, worked out a bit here:

extern double __attribute__((const)) costly_func (double);
extern void func (void);
extern double X;

void code1_func (double x)
{
  func ();
  X = costly_func (x);
}

void code2_func (double x)
{
  double temp.1 = costly_func (x);
  func ();
  X = temp.1;
}

void code1_asm (double x)
{
  asm volatile ("disable interrupts" ::: "memory");
  // some atomic operations not involving x or X or costly_func
  asm volatile ("re-enable interrupts" ::: "memory");
  X = costly_func (x);
}

void code2_asm (double x)
{
  asm volatile ("disable interrupts" ::: "memory");
  double temp.1 = costly_func (x);
  // some atomic operations not involving x or X or costly_func
  asm volatile ("re-enable interrupts" ::: "memory");
  X = temp.1;
}

costly_func can be moved across func as usual, i.e. the transformation from code1_func to code2_func is all right. But this is no longer the case if we move costly_func across an asm barrier instead of across an ordinary function. Now observe the impact on interrupt response times. This is a side effect not covered by the standards' notion of "side effects", but it is part of "any weird side effect" that an asm barrier can have. Whereas code1_asm has no big impact on interrupt response times (provided the code in the comments is small), code2_asm is disastrous w.r.t. interrupt response times.
Notice that in code1, func might contain such asm pairs to implement atomic operations, but moving costly_func across func does *not* affect the interrupt response times in such a disastrous way.

Thus you must be *very* careful w.r.t. optimizing against asm volatile + memory clobber. It's too easy to miss some side effects of *real* code. And this is just one example that proves that moving const across asm is not sane. Setting the floating-point rounding mode might be another, but there are surely many other cases you don't imagine and I don't imagine. Thus always keep in mind that GCC is supposed to be a compiler that shall generate code for *real* machines with real *weird* hardware or requirements. Optimizing code to scrap and pointing to some GCC-internal reasoning or some standard's wording does not help with real code.

Johann
Re: Asm volatile causing performance regressions on ARM
> Of course if the GIMPLE level doesn't care about the barrier then it doesn't > make sense to be overly conservative at the RTL CSE level. Thus I think we > can just remove this barrier completely. Not clear to me, what happens e.g. for register variables? -- Eric Botcazou
Re: Asm volatile causing performance regressions on ARM
On Fri, Feb 28, 2014 at 10:23 AM, Eric Botcazou wrote:
>> Of course if the GIMPLE level doesn't care about the barrier then it doesn't
>> make sense to be overly conservative at the RTL CSE level. Thus I think we
>> can just remove this barrier completely.
>
> Not clear to me, what happens e.g. for register variables?

Not sure; on GIMPLE some passes explicitly avoid doing some transforms for DECL_HARD_REGISTER variables, some don't. A volatile asm is certainly not considered a barrier here (DECL_HARD_REGISTER vars are not written into SSA form, but they do not have their address taken and thus the volatile asm memory clobber isn't a barrier for them).

Richard.
Handling error conditions in libgomp
Hi!

Currently when libgomp hits an error (that is, an internal error, possibly due to the user doing "stupid" things), after printing a message libgomp/error.c:gomp_fatal terminates the executing process:

void
gomp_fatal (const char *fmt, ...)
{
  va_list list;

  va_start (list, fmt);
  gomp_verror (fmt, list);
  va_end (list);

  exit (EXIT_FAILURE);
}

The process cannot recover from this by trying to continue despite the error. (It is of course questionable what exactly to do in this case, as libgomp's internal state may now be corrupt.) So far, such errors may have been rare (aside from real bugs, only/primarily dynamic resource exhaustion?), but with the advent of libgomp using external modules (plugin interface, for acceleration devices) I expect them to become more frequent. That aside, it is also a hurdle for ourselves if in libgomp's testsuite we want to test that the library does the right thing (abort) for non-sensible requests.

Does it make sense to add the option for the user to install a handler for this? Three quick ideas, all untested: generally use abort in libgomp, which can then be caught with a SIGABRT handler that the user has set up (difficult to communicate information from libgomp to the handler); adding a weak function GOMP_error that the user can provide and that libgomp will call in presence of an error; or provide some GOMP_init function for registering an error handler. The actual interface might be something like: an enum to indicate the class (severity?) of the error, a const char * for an error message that libgomp has generated (possibly forwarded from a plugin), and a boolean return value to tell libgomp to either continue or terminate the process. Then, also libgomp's internal initialization could be made more explicit, so that it can (be) reinitialize(d) after an error occurred. It makes sense that the default remains to terminate the process in presence of an error.
I have not yet researched what other OpenACC or OpenMP implementations are doing, or other compiler support/runtime libraries.

Thoughts?

Regards,
Thomas
Re: Asm volatile causing performance regressions on ARM
Eric Botcazou writes:
>> One of the big grey areas is what should happen for floating-point ops
>> that depend on the current rounding mode. That isn't really modelled
>> properly yet though. Again, it affects calls as well as volatile asms.
>
> There is an explicit comment about this in the scheduler:
>
>     case ASM_OPERANDS:
>     case ASM_INPUT:
>       {
>         /* Traditional and volatile asm instructions must be considered to use
>            and clobber all hard registers, all pseudo-registers and all of
>            memory.  So must TRAP_IF and UNSPEC_VOLATILE operations.
>
>            Consider for instance a volatile asm that changes the fpu rounding
>            mode.  An insn should not be moved across this even if it only uses
>            pseudo-regs because it might give an incorrectly rounded result.  */
>         if (code != ASM_OPERANDS || MEM_VOLATILE_P (x))
>           reg_pending_barrier = TRUE_BARRIER;

But here too the point is that we don't assume the same thing at the tree level or during register allocation. It seems a bit silly for the scheduler to assume that all hard registers are clobbered when the register allocator itself doesn't assume that. And most rtl passes assume that changes to pseudo registers are explicitly modelled via SETs or CLOBBERs. E.g. to slightly adjust my previous example:

void foo (float *dest, float x, float y)
{
  dest[0] = x + y;
  asm volatile ("# foo");
  dest[1] = x + y;
}

gives:

    addss   %xmm1, %xmm0
    movss   %xmm0, (%rdi)
#APP
# 4 "/tmp/foo.c" 1
    # foo
# 0 "" 2
#NO_APP
    movss   %xmm0, 4(%rdi)
    ret

And we certainly don't consider volatile asms to clobber memory in other places. E.g.:

void foo (float *dest, float x, float y)
{
  dest[0] = x + y;
  asm volatile ("# foo");
  dest[1] = dest[0];
}

gives the same code as above. In contrast:

void foo (float *dest, float x, float y)
{
  dest[0] = x + y;
  asm volatile ("# foo" ::: "memory");
  dest[1] = dest[0];
}

_does_ force dest[0] to be reloaded.
As far as it being a scheduling barrier, consider something like:

void foo (float *dest, float x, float y)
{
  int i;
  for (i = 0; i < 100; i++)
    {
      asm volatile ("# a");
      dest[i] = x + y;
      asm volatile ("# b");
    }
}

At -O2 the addition is hoisted out of the loop. Obviously the scheduler is free to be extra conservative, but I don't think the comment describes the semantics of volatile asms.

Thanks,
Richard
Re: Asm volatile causing performance regressions on ARM
Georg-Johann Lay writes:
> Notice that in code1, func might contain such asm pairs to implement
> atomic operations, but moving costly_func across func does *not*
> affect the interrupt response times in such a disastrous way.
>
> Thus you must be *very* careful w.r.t. optimizing against asm volatile
> + memory clobber. It's too easy to miss some side effects of *real*
> code.

I understand the example, but I don't think volatile asms guarantee what you want here.

> Optimizing code to scrap and pointing to some GCC internal reasoning or some
> standard's wording does not help with real code.

But how else can a compiler work? It doesn't just regurgitate canned code, so it can't rely on human intuition as to what "makes sense". We have to have specific rules and guarantees and say that anything outside those rules and guarantees is undefined.

It sounds like you want an asm with an extra-strong ordering guarantee. I think that would need to be an extension, since it would need to consider cases where the asm is used in a function. (Shades of carries_dependence or whatever in the huge atomic thread.) I think anything where:

void foo (void) { X; }

void bar (void)
{
  Y1;
  foo ();
  Y2;
}

has different semantics from:

void bar (void)
{
  Y1;
  X;
  Y2;
}

is very dangerous. And assuming that any function call could enable or disable interrupts, and therefore that nothing can be moved across a non-const function call, would limit things a bit too much.

Thanks,
Richard
Re: Asm volatile causing performance regressions on ARM
> But here too the point is that we don't assume the same thing at the > tree level or during register allocation. It seems a bit silly for > the scheduler to assume that all hard registers are clobbered when the > register allocator itself doesn't assume that. And most rtl passes > assume that changes to pseudo registers are explicitly modelled via SETs > or CLOBBERs. OK, let's declare volatile asms dead as full optimization barriers then, which means that they will need to be taken out of volatile_insn_p eventually and the comments in the RTL middle-end adjusted. In the short term, i.e. for 4.8.x/4.9.x, we cannot simply revert the patch because it has been written to catch UNSPEC_VOLATILE. So we could devise another, temporary predicate in rtlanal.c that would be volatile_insn_p minus volatile asms and invoke it from the places in cse.c, cselib.c and dse.c which have been changed by the patch. -- Eric Botcazou
Re: [RFC] Meta-description for tree and gimple folding
On Thu, 27 Feb 2014, Richard Biener wrote:

I've been hacking on a prototype that generates matching and simplification code from a meta-description. The goal is to provide a single source of transforms currently spread over the compiler, mostly fold-const.c, gimple-fold.c and tree-ssa-forwprop.c. Another goal is to make these transforms (which are mostly of the form of generating a simpler form of a pattern-matched IL piece) more readily available to passes like value-numbering so they can be used on-the-fly, using information provided by the pass lattice. The ultimate goal is to generate (most of) fold-const.c, gimple-fold.c and tree-ssa-forwprop.c from a single meta description.

[...]

Comments or suggestions?

It is hard to judge from such simple examples.

- What if I want to do the transformation only if CST+CST' doesn't overflow?
- Can I make it handle commutative operations cleverly?
- How do I restrict some subexpression to have a single use?
- If I want to check some flag_*, do I have to create a predicate that tests for that as well as whatever other test the specific operand needed?
- Won't valueize create a lot of not-so-necessary trees when used on gimple?
- If I write a COND_EXPR matcher, could it generate code for phiopt as well?
- What is the point of gimple_match (without _and_substitute)? Will it clean up the new dead statements after itself?
- Can I write (PLUS_EXPR@0 @1 @2) so I can still refer to this subexpression later?
- How do you handle a transformation that currently tries to recursively fold something else and does the main transformation only if that simplified?

I guess most of the answers will become obvious once you have converted a few existing transformations and I can look at those... Since this is going to require a rewrite of many transformations, I wonder if there are things that should be improved before starting, so we can write the right tests directly.
For A+B-B, we can do the transformation if the type is an integer for which overflow either wraps, is undefined, or traps-but-it-is-ok-to-remove-those-traps (but not if it traps and we promise to preserve traps, or if the type saturates); or the type is a float and we have the unsafe math flag; or the type is a vector/complex and the transformation is valid for the element type. Ok, maybe not the best example, but we have for instance -ftrapping-math for which a split was discussed (PR 53805#c4).

--
Marc Glisse
Re: [RFC] Meta-description for tree and gimple folding
On Thu, Feb 27, 2014 at 9:34 AM, Richard Biener wrote:
> Comments or suggestions?

On the surface it looks like a nice idea. However, I would like to understand the scope of this. Are you thinking of a pattern matcher with peephole-like actions? Or would you like to evolve a DSL capable of writing compiler passes (much like MELT)? I prefer keeping it simple and limiting the scope to pattern matching and replacement. There will be other things to handle, however (overflow, trapping arithmetic, etc). The language will grow over time.

In terms of the mini-language, I don't like the Lisp syntax, but I appreciate that it is a familiar language for GCC since the backend uses it extensively. Please consider testing facilities as well. We should be able to write unit tests in this language to test its syntax validation, semantic actions, and transformations.

Diego.
Changing the MIPS ISA for the Loongson 3A from MIPS64 to MIPS64r2
Hi,

I have noticed that a patch was placed in bugzilla to do this change, but it does not appear to have been pushed. I was wondering if anyone could comment on why this is the case? The bugzilla URL is the following:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57754

Many thanks,

Andrew

--
Andrew Bennett
Software Design Engineer, MIPS Processor IP
Imagination Technologies Limited
www.imgtec.com
Re: [RFC] Meta-description for tree and gimple folding
Hmm, this all reminds me of the approach Andrew Pinski and I came up with two years ago. All in all I think it might be worthwhile to express folding patterns in a more abstract way, so the md-like Lisp syntax for this seems fitting; we already make use of such a scripting language for the machine description. Nevertheless, I doubt that we really want to have the same facility for fold-const and gimple. Instead I have the impression that we would prefer to have all folding optimizations in the middle end (GIMPLE). We need folding in the front end (AST) mostly for the detection of constant expressions. Otherwise we want to keep as much of the original AST as possible, to get the best results for debugging (and even to allow thinking about alternative front ends on our middle end/back end) and for code analyzers. So I doubt that we want to keep fold-const patterns similar to the gimple (forward-prop) ones. Wouldn't it make more sense to move fold-const patterns completely into gimple, and have a facility in the front end to ask gimple to *pre*-fold a given tree to see if a constant expression can be achieved?

Regards,
Kai
Re: Handling error conditions in libgomp
On 02/28/2014 03:37 AM, Thomas Schwinge wrote:
> The process cannot recover from this, trying to continue despite the
> error. (It is of course questionable what exactly to do in this case, as
> libgomp's internal state may now be corrupt.) So far, such errors may
> have been rare (aside from real bugs, only/primarily dynamic resource
> exhaustion?), but in the advent of libgomp using external modules (plugin
> interface, for acceleration devices) I expect them to become more
> frequent.

I could see that, yes. However...

> Does it make sense to add the option for the user to install a handler
> for this? Three quick ideas, all untested: generally use abort in
> libgomp, which can then be caught with a SIGABRT handler that the user
> has set up (difficult to communicate information from libgomp to the
> handler); adding a weak function GOMP_error that the user can provide and
> that libgomp will call in presence of an error; or provide some GOMP_init
> function for registering an error handler. The actual interface might be
> something like: an enum to indicate the class (severity?) of the error, a
> const char* for an error message that libgomp has generated (possibly
> forwarded from a plugin), and a boolean return value to tell libgomp to
> either continue or terminate the process. Then, also libgomp's internal
> initialization could be made more explicit, so that it can (be)
> reinitialize(d) after an error occured. It makes sense that the default
> remains to terminate the process in presence of an error.

I've never been keen on weak symbol interposition. That works for an application which is the sole user of libgomp, but does not work for libraries using libgomp, and worse for multiple such libraries.

I'd be ok with some kind of registration interface, like

  old = omp_set_error_handler (new);

so that a library can set and restore the handler around its own omp usage. As for the interface of the handler itself... I dunno.
I'd hate to make it too complicated to actually use. Perhaps severity could be restricted to a single bit, meaning "cannot possibly continue" or "transient error", maybe together with the error string, since who knows what kind of weirdness is going on in the plugin.

A significant question is what thread this error would be reported on, and the state of the other threads in the team when it happens. This is the primary reason that OMP disallows EH to propagate from inside the parallel region to outside -- the exception would have nowhere to go if it was raised from other than the primary thread. In order to be usable at all, one would have to arrange for the error to be reported on a thread that could do something about it. Does it do any good to report the error on thread 3, if thread 3 can't communicate with the user to actually report the error?

I suppose there are a few types of "continuable" errors that might want some sort of user interaction. E.g. if the error is from a plugin, before we've committed to performing any computation with the plugin, we could well be in a state where we could fall back to the host for execution. But we could want to let the user know that the plugin failed and this is why the run is going to be slow this time.

> I have not yet researched what other OpenACC or OpenMP implementations
> are doing, or other compiler support/runtime libraries.

That would be good to know.

r~
Re: Handling error conditions in libgomp
On 02/28/14 16:05, Richard Henderson wrote:
> I'd be ok with some kind of registration interface, like
>
>   old = omp_set_error_handler (new);
>
> so that a library can set and restore the handler around its own omp usage.

I agree. An explicit API should be exposed to do this.

> As for the interface of the handler itself... I dunno. I'd hate to make it
> too complicated to actually use. If severity is restricted to a single bit
> meaning "cannot possibly continue" or "transient error". Maybe with the
> error string, since who knows what kind of weirdness is going on in the
> plugin.

I wonder whether there should be 2 separate handlers -- one for the diagnostic and one for fatality? Something like:

typedef void omp_diag_t (void *usr_data, unsigned flags, const char *, va_list);

void *OMP_diag_data;
omp_diag_t *OMP_diag_fn = OMP_diag_default;
void (*OMP_fatal_fn) (void) = abort;

then in gomp:

void
OMP_error (unsigned flags, const char *fmt, ...)
{
  va_list args;
  ...
  OMP_diag_fn (OMP_diag_data, flags, fmt, args);
}

void __attribute__ ((noreturn))
OMP_fatal (const char *fmt, ...)
{
  ...
  OMP_diag_fn (OMP_diag_data, SOMETHING, fmt, args);
  OMP_fatal_fn ();
}

That way the non-returning nature of OMP_fatal can be exposed to the compiler (I'm sure that'll kill a lot of false positive compiler warnings).

nathan

--
Nathan Sidwell
Re: Changing the MIPS ISA for the Loongson 3A from MIPS64 to MIPS64r2
Andrew Bennett writes: > Hi, > > I have noticed that a patch was placed in bugzilla to do this change, but it > does not appear to have been pushed. I was wondering if anyone could comment > why this is the case? > > The bugzilla URL is the following: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57754 Looks OK if it passes testing. We'll need a name and email address for the commit though. Thanks, Richard
Re: Handling error conditions in libgomp
On Fri, 2014-02-28 at 12:37 +0100, Thomas Schwinge wrote:
> Does it make sense to add the option for the user to install a handler
> for this?

The problem with calling a handler is that it's often hard to define which state the program is in during the handler. If libgomp had to terminate because of a programming error, there's often little one can do anyway to resolve the problem; even proper cleanup could be hard because the program may still hold locks (or violate other invariants) that are necessary for meaningful cleanup. Another problem is that if you specify that a handler is called, then you typically have to always call the handler in an erroneous situation.

Thus, I think it would be good to differentiate between different classes of problems. For example, if it's a plugin initialization that's implicit in some lambda construct, then a handler might work, but offering a separate function that triggers such initialization and returns an error might be even better; with such a function, the programmer can do error handling up front. In contrast, for programming errors, simply terminating would be fine for me. At least for C/C++ there's plenty of undefined behavior, so trying to do something better for OpenMP might not be as helpful.

> I have not yet researched what other OpenACC or OpenMP implementations
> are doing, or other compiler support/runtime libraries.

I suggest to also have a look at what proposed C++ parallelism abstractions do. They essentially return exceptions from user-supplied code as a list of exceptions, which is returned (or thrown) by the construct that spawned the parallel execution; translating this to handlers, the thread that spawned the construct should execute any handler. However, part of the problem is certainly OpenMP's goal to provide unified abstractions that work the same in all 3 base languages, which isn't too helpful considering that those languages have different mechanisms for error handling (e.g., C++ exceptions vs. error codes in C).
Re: Handling error conditions in libgomp
On Fri, 28 Feb 2014, Thomas Schwinge wrote: > That aside, it is also an hurdle for ourselves if in libgomp's testsuite > we want to test that the library does the right thing (abort) for > non-sensible requests. If you want to verify that a program exits with an error (so that exit with an error means PASS, exit successfully means FAIL), that's what dg-shouldfail is for. -- Joseph S. Myers jos...@codesourcery.com
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney wrote:
> > >
> > > 3. The comparison was against another RCU-protected pointer,
> > >    where that other pointer was properly fetched using one
> > >    of the RCU primitives. Here it doesn't matter which pointer
> > >    you use. At least as long as the rcu_assign_pointer() for
> > >    that other pointer happened after the last update to the
> > >    pointed-to structure.
> > >
> > > I am a bit nervous about #3. Any thoughts on it?
> >
> > I think that it might be worth pointing out as an example, and saying
> > that code like
> >
> >    p = atomic_read(consume);
> >    X;
> >    q = atomic_read(consume);
> >    Y;
> >    if (p == q)
> >        data = p->val;
> >
> > then the access of "p->val" is constrained to be data-dependent on
> > *either* p or q, but you can't really tell which, since the compiler
> > can decide that the values are interchangeable.
> >
> > I cannot for the life of me come up with a situation where this would
> > matter, though. If "X" contains a fence, then that fence will be a
> > stronger ordering than anything the consume through "p" would
> > guarantee anyway. And if "X" does *not* contain a fence, then the
> > atomic reads of p and q are unordered *anyway*, so then whether the
> > ordering to the access through "p" is through p or q is kind of
> > irrelevant. No?
>
> I can make a contrived litmus test for it, but you are right, the only
> time you can see it happen is when X has no barriers, in which case
> you don't have any ordering anyway -- both the compiler and the CPU can
> reorder the loads into p and q, and the read from p->val can, as you say,
> come from either pointer.
>
> For whatever it is worth, here is the litmus test:
>
> T1:	p = kmalloc(...);
> 	if (p == NULL)
> 		deal_with_it();
> 	p->a = 42;  /* Each field in its own cache line. */
> 	p->b = 43;
> 	p->c = 44;
> 	atomic_store_explicit(&gp1, p, memory_order_release);
> 	p->b = 143;
> 	p->c = 144;
> 	atomic_store_explicit(&gp2, p, memory_order_release);
>
> T2:	p = atomic_load_explicit(&gp2, memory_order_consume);
> 	r1 = p->b;  /* Guaranteed to get 143. */
> 	q = atomic_load_explicit(&gp1, memory_order_consume);
> 	if (p == q) {
> 		/* The compiler decides that q->c is same as p->c. */
> 		r2 = p->c;  /* Could get 44 on weakly ordered system. */
> 	}
>
> The loads from gp1 and gp2 are, as you say, unordered, so you get what
> you get.
>
> And publishing a structure via one RCU-protected pointer, updating it,
> then publishing it via another pointer seems to me to be asking for
> trouble anyway. If you really want to do something like that and still
> see consistency across all the fields in the structure, please put a lock
> in the structure and use it to guard updates and accesses to those fields.

And here is a patch documenting the restrictions for the current Linux kernel. The rules change a bit due to rcu_dereference() acting a bit differently than atomic_load_explicit(&p, memory_order_consume). Thoughts?

Thanx, Paul

documentation: Record rcu_dereference() value mishandling

Recent LKML discussions (see http://lwn.net/Articles/586838/ and http://lwn.net/Articles/588300/ for the LWN writeups) brought out some ways of misusing the return value from rcu_dereference() that are not necessarily completely intuitive. This commit therefore documents what can and cannot safely be done with these values.

Signed-off-by: Paul E. McKenney

diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX
index fa57139f50bf..f773a264ae02 100644
--- a/Documentation/RCU/00-INDEX
+++ b/Documentation/RCU/00-INDEX
@@ -12,6 +12,8 @@ lockdep-splat.txt
 	- RCU Lockdep splats explained.
 NMI-RCU.txt
 	- Using RCU to Protect Dynamic NMI Handlers
+rcu_dereference.txt
+	- Proper care and feeding of return values from rcu_dereference()
 rcubarrier.txt
 	- RCU and Unloadable Modules
 rculist_nulls.txt
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 9d10d1db16a5..877947130ebe 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -114,12 +114,16 @@ over a rather long period of time, but improvements are always welcome!
 	http://www.openvms.compaq.com/wizard/wiz_2637.html
 
 	The rcu_dereference() primitive is also an excellent
-	documentation aid, letting the person reading the code
-	know exactly which pointers are protected by RCU.
+	documentation aid, letting the person reading the