i did not use the term "pure" properly. please read instead "posix_memalign is a function that does not modify any user variable" that assumption is correct when there is no wrapper, and incorrect in our case.
On Saturday, June 6, 2015, George Bosilca <bosi...@icl.utk.edu> wrote: > Based on the definition of a pure function, two calls to the same pure > function with the same arguments should return the same value. This is > obviously never true for posix_memalign. > > I suggest we take a look at the generated code, and based on that we > decide how to move forward on this. > > George. > > > > On Jun 4, 2015, at 22:47 , Gilles Gouaillardet <gil...@rist.or.jp > <javascript:;>> wrote: > > > > Jeff, > > > > imho, this is a grey area ... > > > > 99.999% of the time, posix_memalign is a "pure" function. > > "pure" means it has no side effects. > > > > unfortunatly, this part of the code is the 0.001% case in which we > explicitly rely on a side effect > > (e.g. posix_memalign calls an Open MPI wrapper that updates a global > variable) > > > > i am not surprised (nor shocked) the compiler assumes posix_memalign is > side effect free when (aggressive) optimisation is on, since it is a valid > asumption most of the time. > > > > Of course, the is a chance that might be a bug > > (e.g. GCC folks forgot a wrapper can be added to posix_memalign, so it > is not 100% safe to assume posix_memalign is side effect free) > > but as far as i am concerned, using "volatile" is correct. > > > > Cheers, > > > > Gilles > > > > > > > > > > On 6/5/2015 10:26 AM, Jeff Squyres (jsquyres) wrote: > >> On Jun 4, 2015, at 5:48 AM, René Oertel <rene.oer...@cs.tu-chemnitz.de > <javascript:;>> wrote: > >>> Problem description: > >>> =================== > >>> > >>> The critical code in question is in > >>> opal/mca/memory/linux/memory_linux_ptmalloc2.c: > >>> ##### > >>> 92 #if HAVE_POSIX_MEMALIGN > >>> 93 /* Double check for posix_memalign, too */ > >>> 94 if (mca_memory_linux_component.memalign_invoked) { > >>> 95 mca_memory_linux_component.memalign_invoked = false; > >>> 96 if (0 != posix_memalign(&p, sizeof(void*), 1024 * 1024)) { > >>> 97 return OPAL_ERR_IN_ERRNO; > >>> 98 } > >>> 99 free(p); > >>> 100 } > >>> 101 #endif > >>> 102 > >>> 103 if (mca_memory_linux_component.malloc_invoked && > >>> 104 mca_memory_linux_component.realloc_invoked && > >>> 105 mca_memory_linux_component.memalign_invoked && > >>> 106 mca_memory_linux_component.free_invoked) { > >>> 107 /* Happiness; our functions were invoked */ > >>> 108 val |= OPAL_MEMORY_FREE_SUPPORT | > OPAL_MEMORY_CHUNK_SUPPORT; > >>> 109 } > >>> [...] > >>> 121 /* All done */ > >>> 122 if (val > 0) { > >>> 123 opal_mem_hooks_set_support(val); > >>> 124 return OPAL_SUCCESS; > >>> 125 } > >>> ##### > >>> > >>> The code at lines 103-109 is legally optimized away with >=GCC-4.9 and > >>> optimizations turned on, > >> I'm not sure what you mean by "legally optimized away" -- shouldn't gcc > know that the call to posix_memalign() can change global variables (such as > mca_memory_linux_component.<foo>)? > >> > >>> because line 105 could never become true with > >>> the local knowledge of the compiler/optimizer. > >> My compiler knowledge may be a bit rusty, but if this optimization is > being made solely with local knowledge, this sounds like a buggy > optimization...? > >> > >>> If mca_memory_linux_component.memalign_invoked == true at line 92, it > >>> would be set to false at line 95. > >> ...but potentially set to true again in the body of posix_memalign(). > ...which you describe below. > >> > >>> If mca_memory_linux_component.memalign_invoked == false at line 92, it > >>> would be false at line 103, too. > >>> In both cases, the if at line 103-106 could never be evaluated to true > >>> and opal_mem_hooks_set_support is never called with > >>> OPAL_MEMORY_FREE_SUPPORT | OPAL_MEMORY_CHUNK_SUPPORT resulting in > >>> (silently) disabled mpi_leaved_pinned. > >>> > >>> In the global view this local assumption is not true, because > >>> posix_memalign() in line 96 will call the hook public_mEMALIGn() in > >>> opal/mca/memory/linux/malloc.c which in turn sets > >>> mca_memory_linux_component.memalign_invoked = true. > >>> As a result, the OPAL_MEMORY_*_SUPPORT would get configured correctly > in > >>> line 123 and so the opal_mem_hooks_support_level() used by > >>> ompi/mca/btl/openib/btl_openib_component.c and indirectly by the > >>> ompi/mca/mpool/grdma/mpool_grdma* module enables the usage of pinned > memory. > >> How can a compiler not take the global view? Taking a local-only view > is unsafe -- for exactly these kinds of reasons. > >> > >>> The optimization could be disabled by adding -fno-tree-partial-pre to > >>> the CFLAGS in opal/mca/memory/linux/Makefile, but this is only a > >>> temporary workaround. > >>> > >>> Patch: > >>> ===== > >>> > >>> The proposed patch is as follows, which alters the point-of-view of the > >>> compiler's optimizer on the *_invoked variables used by different code > >>> paths (memory_linux_ptmalloc2.c vs. malloc.c): > >>> > >>> ##### > >>> diff -rupN openmpi-1.8.5.org/opal/mca/memory/linux/memory_linux.h > >>> openmpi-1.8.5/opal/mca/memory/linux/memory_linux.h > >>> --- openmpi-1.8.5.org/opal/mca/memory/linux/memory_linux.h > >>> 2014-10-03 22:32:23.000000000 +0200 > >>> +++ openmpi-1.8.5/opal/mca/memory/linux/memory_linux.h 2015-06-04 > >>> 10:01:44.941544282 +0200 > >>> @@ -33,11 +33,11 @@ typedef struct opal_memory_linux_compone > >>> > >>> #if MEMORY_LINUX_PTMALLOC2 > >>> /* Ptmalloc2-specific data */ > >>> - bool free_invoked; > >>> - bool malloc_invoked; > >>> - bool realloc_invoked; > >>> - bool memalign_invoked; > >>> - bool munmap_invoked; > >>> + volatile bool free_invoked; > >>> + volatile bool malloc_invoked; > >>> + volatile bool realloc_invoked; > >>> + volatile bool memalign_invoked; > >>> + volatile bool munmap_invoked; > >>> #endif > >>> } opal_memory_linux_component_t; > >> I don't really want to apply this patch (thanks for filing the PR, BTW) > without understanding why the compiler is not automatically taking the > global view. It seems unsafe. > >> > >>> ##### > >>> > >>> Additionally, a further patch should be applied generating a warning in > >>> the GPUDirect module if leave_pinned is not available for some reason. > >>> In this case, GPUDirect support should be disabled, because it runs > >>> faster without (see Case 2 below). > >> I'll let Rolf / NVIDIA comment on the GPU implications. > >> > >> [snip] > >> > >>> Best regards, > >>> > >>> René "oere" Oertel > >> Many thanks for the detailed report! > >> > > > > _______________________________________________ > > users mailing list > > us...@open-mpi.org <javascript:;> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/06/27042.php > > _______________________________________________ > users mailing list > us...@open-mpi.org <javascript:;> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/06/27043.php