It is always a good idea to have your application's sizeof(INTEGER) match the MPI's sizeof(INTEGER). Having them mismatch is a recipe for trouble.

Meaning: if you're compiling your app with -make-integer-be-8-bytes, then you should configure/build Open MPI with that same flag. I'm thinking that this should *only* affect the back-end behavior of MPI_INTEGER; the size of address pointers and whatnot should not be affected (unless -make-integer-be-8-bytes also changes the sizes of some other types).
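For gfortran, the two matching builds might look something like this (a sketch only: FFLAGS/FCFLAGS are the configure variables Open MPI reads for the Fortran 77/90 compilers, -fdefault-integer-8 is gfortran's spelling of that flag, and dragon.f stands in for the real source list):

    # build Open MPI itself with 8-byte default INTEGERs
    ./configure --prefix=/home/toueg/openmpi F77=gfortran FC=gfortran \
        FFLAGS="-fdefault-integer-8" FCFLAGS="-fdefault-integer-8"
    make all install

    # build the application through the wrapper compiler with the same flag
    mpif77 -fdefault-integer-8 -o myexec dragon.f

Either way, the point is that the flag appears in both builds or in neither.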
On Dec 5, 2010, at 9:01 PM, Gustavo Correa wrote:

> Hi Benjamin
>
> I guess you could compile Open MPI with standard integer and real sizes.
> Then compile your application (DRAGON) with the flags that change to
> 8-byte integers and 8-byte reals.
> We have some programs here that use real8 and are compiled this way,
> and they run without a problem.
> I guess this is what Tim Prince was also telling you in his comments.
>
> You can pass those flags to the MPI compiler wrappers (mpif77 etc.),
> which will relay them to gfortran when you compile DRAGON.
>
> I am not even sure whether those flags would be accepted or ignored by
> Open MPI when you build it.
> I guess they will be ignored.
> You could check this by looking at the MPI type sizes in the header
> files in the include directory and its subdirectories.
>
> Maybe an Open MPI developer could shed some light here.
>
> Moreover, if I remember right,
> the MPI address type complies with the machine architecture,
> i.e., 32 bits if your machine is 32-bit, 64 bits if the machine is 64-bit,
> and you don't need to force it to 8 bytes with compilation flags.
>
> Unfortunately, mixing pointers ("Cray pointers", I suppose)
> with integers is a common source of headaches, if DRAGON does this.
> It is yet another situation where negative integers could creep in
> and lead to a segmentation fault.
> At least one ocean circulation model we run here had
> many problems because of this mix of integers and (Cray) pointers
> spread all across the code.
>
> Gus Correa
>
> On Dec 5, 2010, at 7:17 PM, Benjamin Toueg wrote:
>
>> Unfortunately DRAGON is old FORTRAN77. Integers have been used instead
>> of pointers. If I compile it in 64 bits without -fdefault-integer-8, the
>> so-called pointers will remain 32 bits. Problems could also arise from
>> its data structure handlers.
>>
>> Therefore -fdefault-integer-8 is absolutely necessary.
>>
>> Furthermore, MPI_SEND and MPI_RECEIVE are called a dozen times in only
>> one source file (used for passing a data structure from one node to
>> another), and it has proved to work in every situation.
>>
>> Not knowing which line is causing my segfault is annoying.
>>
>> Regards,
>> Benjamin
>>
>> 2010/12/6 Gustavo Correa <g...@ldeo.columbia.edu>
>>
>> Hi Benjamin
>>
>> I would just rebuild Open MPI withOUT the compiler flags that change the
>> standard sizes of "int" and "float" (do a "make distclean" first!), then
>> recompile your program, and see how it goes.
>> I don't think you are gaining anything by trying to change the standard
>> "int/integer" and "real/float" sizes; most likely they are inviting
>> trouble and making things more confusing.
>> Worst case, you will at least be sure that the bug is somewhere else,
>> not in a mismatch of basic type sizes.
>>
>> If you need to pass 8-byte real buffers, use MPI_DOUBLE_PRECISION or
>> MPI_REAL8 in your (Fortran) MPI calls, and declare them in the Fortran
>> code accordingly (double precision or real(kind=8)).
>>
>> If I remember right, there is no 8-byte integer support in the Fortran
>> MPI bindings, only in the C bindings, but some Open MPI expert could
>> clarify this.
>> Hence, if you are passing 8-byte integers in your MPI calls, this may
>> also be problematic.
>>
>> My two cents,
>> Gus Correa
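Gus's suggestion in code form, as a minimal sketch (the program and variable names are illustrative, not from DRAGON; it assumes the classic mpif.h Fortran bindings):

    program realbuf
       implicit none
       include 'mpif.h'
       integer :: ierr, rank, stat(MPI_STATUS_SIZE)
       ! 8-byte reals declared as double precision...
       double precision :: buf(1000)
       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
       if (rank == 0) then
          buf = 1.0d0
          ! ...and described as MPI_DOUBLE_PRECISION (MPI_REAL8 would also match)
          call MPI_Send(buf, 1000, MPI_DOUBLE_PRECISION, 1, 99, MPI_COMM_WORLD, ierr)
       else if (rank == 1) then
          call MPI_Recv(buf, 1000, MPI_DOUBLE_PRECISION, 0, 99, MPI_COMM_WORLD, stat, ierr)
       end if
       call MPI_Finalize(ierr)
    end program realbuf

Run with two ranks (e.g., mpirun -np 2 ./realbuf); the point is simply that the declared kind and the MPI type constant describe the same 8-byte quantity.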
>> On Dec 5, 2010, at 3:04 PM, Benjamin Toueg wrote:
>>
>>> Hi,
>>>
>>> First of all, thanks for your insight!
>>>
>>> "Do you get a corefile?"
>>> I don't get a core file, but I get a file called _FIL001. It doesn't
>>> contain any debugging symbols. It's most likely a digested version of
>>> the input file given to the executable: ./myexec < inputfile.
>>>
>>> "there's no line numbers printed in the stack trace"
>>> I would love to see those, but even when I compile Open MPI with
>>> --enable-debug --enable-mem-debug --enable-mem-profile, they don't show
>>> up. I recompiled my sources to be sure to properly link them against
>>> the newly debug-enabled version of Open MPI. I assumed I didn't need to
>>> compile my own sources with the -g option, since it crashes in Open MPI
>>> itself? I didn't try to run mpiexec via gdb either; I guess it won't
>>> help, since I already get the trace.
>>>
>>> "the -fdefault-integer-8 option ought to be highly dangerous"
>>> Thanks for noting this. Indeed I had some issues with this option. For
>>> instance, I have to declare some arguments as INTEGER*4, like RANK,
>>> SIZE, IERR in:
>>>
>>> CALL MPI_COMM_RANK(MPI_COMM_WORLD,RANK,IERR)
>>> CALL MPI_COMM_SIZE(MPI_COMM_WORLD,SIZE,IERR)
>>>
>>> In your example "call MPI_Send(buf, count, MPI_INTEGER, dest, tag,
>>> MPI_COMM_WORLD, mpierr)" I checked that count is never bigger than 2000
>>> (as you mentioned, it could flip to negative). However, I haven't
>>> declared it as INTEGER*4, and I think I should.
>>> When I said "I had to raise the number of data structures to be sent",
>>> I meant that I had to call MPI_SEND many more times, not that buffers
>>> were bigger than before.
>>>
>>> I'll get back to you with more info once I'm able to fix my connection
>>> problem to the cluster...
>>>
>>> Thanks,
>>> Benjamin
>>>
>>> 2010/12/3 Martin Siegert <sieg...@sfu.ca>
>>>
>>> Hi All,
>>>
>>> just to expand on this guess ...
>>>
>>> On Thu, Dec 02, 2010 at 05:40:53PM -0500, Gus Correa wrote:
>>>> Hi All
>>>>
>>>> I wonder if configuring Open MPI while
>>>> forcing the default types to non-default values
>>>> (-fdefault-integer-8 -fdefault-real-8) might have
>>>> something to do with the segmentation fault.
>>>> Would this be effective, i.e., actually make
>>>> the sizes of MPI_INTEGER/MPI_INT and MPI_REAL/MPI_FLOAT bigger,
>>>> or just elusive?
>>>
>>> I believe what happens is that this mostly affects the Fortran
>>> wrapper routines and the way Fortran variables are mapped to C:
>>>
>>> MPI_INTEGER          -> MPI_LONG
>>> MPI_REAL             -> MPI_DOUBLE
>>> MPI_DOUBLE_PRECISION -> MPI_DOUBLE
>>>
>>> In that respect I believe that the -fdefault-real-8 option is harmless,
>>> i.e., it does the expected thing.
>>> But the -fdefault-integer-8 option ought to be highly dangerous:
>>> it works for integer variables that are used as "buffer" arguments
>>> in MPI statements, but I would assume that this does not work for
>>> "count" and similar arguments. Example:
>>>
>>> integer, allocatable :: buf(:,:)
>>> integer i, i2, count, dest, tag, mpierr
>>>
>>> i = 32768
>>> i2 = 2*i
>>> allocate(buf(i,i2))
>>> count = i*i2
>>> buf = 1
>>> call MPI_Send(buf, count, MPI_INTEGER, dest, tag, MPI_COMM_WORLD, mpierr)
>>>
>>> Now count is 2^31, which overflows a 32-bit integer.
>>> The MPI standard requires that count is a 32-bit integer, correct?
>>> Thus while buf gets the type MPI_LONG, count remains an int.
>>> Is this interpretation correct? If it is, then you are calling
>>> MPI_Send with a count argument of -2147483648,
>>> which could result in a segmentation fault.
>>>
>>> Cheers,
>>> Martin
>>>
>>> --
>>> Martin Siegert
>>> Head, Research Computing
>>> WestGrid/ComputeCanada Site Lead
>>> IT Services                          phone: 778 782-4691
>>> Simon Fraser University              fax:   778 782-4242
>>> Burnaby, British Columbia            email: sieg...@sfu.ca
>>> Canada  V5A 1S6
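In code, the guard Martin's analysis suggests could look like this (a hypothetical sketch, not from DRAGON; it assumes his reading that the count crossing into the C layer is a 32-bit int, and integer(kind=8) assumes gfortran's kind numbering):

    program countguard
       implicit none
       include 'mpif.h'
       integer :: i, i2, ierr
       integer(kind=8) :: count8
       call MPI_Init(ierr)
       i  = 32768
       i2 = 2*i
       ! form the element count in 64 bits first, so the product cannot wrap
       count8 = int(i,8) * int(i2,8)
       if (count8 > 2147483647_8) then
          ! a 32-bit count would wrap to -2147483648 here; split the message instead
          write(*,*) 'count overflows a 32-bit int:', count8
          call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
       end if
       call MPI_Finalize(ierr)
    end program countguard

Splitting one oversized send into several smaller ones (or describing the buffer with a derived datatype) keeps each individual count below that limit.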
>>>> There were some recent discussions here about MPI
>>>> limiting counts to MPI_INTEGER.
>>>> Since Benjamin said he "had to raise the number of data structures",
>>>> which eventually led to the error,
>>>> I wonder if he is inadvertently flipping to the negative-integer
>>>> side of the 32-bit universe (i.e., >= 2**31), as was reported here by
>>>> other list subscribers a few times.
>>>>
>>>> Anyway, a segmentation fault can come from many different places;
>>>> this is just a guess.
>>>>
>>>> Gus Correa
>>>>
>>>> Jeff Squyres wrote:
>>>>> Do you get a corefile?
>>>>>
>>>>> It looks like you're calling MPI_RECV in Fortran and then it segv's.
>>>>> This is *likely* because you're either passing a bad parameter or
>>>>> your buffer isn't big enough. Can you double-check all your
>>>>> parameters?
>>>>>
>>>>> Unfortunately, there are no line numbers printed in the stack trace,
>>>>> so it's not possible to tell exactly where in the ob1 PML it's dying
>>>>> (i.e., we can't see exactly what it's doing to cause the segv).
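Two standard ways to make a crash like the one below more explicit (a sketch, not from this thread: dragon.f and the rank count are stand-ins, and the xterm route needs a working X display):

    # rebuild the application itself with debugging symbols
    mpif77 -g -fdefault-integer-8 -o myexec dragon.f

    # option 1: run each rank under gdb in its own xterm,
    # type "run < inputfile" at the gdb prompt, wait for the segv
    mpirun -np 2 xterm -e gdb ./myexec

    # option 2: allow corefiles, reproduce the crash, inspect post-mortem
    ulimit -c unlimited
    mpirun -np 2 ./myexec < inputfile
    gdb ./myexec core    # corefile name varies by system

With -g in the application (and --enable-debug in Open MPI), a gdb backtrace then shows file and line information instead of bare addresses.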
>>>>> On Dec 2, 2010, at 9:36 AM, Benjamin Toueg wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am using DRAGON, a neutronics simulation code in FORTRAN77 that
>>>>>> has its own data structures. I added a module to send these data
>>>>>> structures via MPI_SEND / MPI_RECEIVE, and everything worked
>>>>>> perfectly for a while.
>>>>>>
>>>>>> Then I had to raise the number of data structures to be sent up to
>>>>>> a point where my cluster hits this bug:
>>>>>>
>>>>>> *** Process received signal ***
>>>>>> Signal: Segmentation fault (11)
>>>>>> Signal code: Address not mapped (1)
>>>>>> Failing at address: 0x2c2579fc0
>>>>>> [ 0] /lib/libpthread.so.0 [0x7f52d2930410]
>>>>>> [ 1] /home/toueg/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f52d153fe03]
>>>>>> [ 2] /home/toueg/openmpi/lib/libmpi.so.0(PMPI_Recv+0x2d2) [0x7f52d3504a1e]
>>>>>> [ 3] /home/toueg/openmpi/lib/libmpi_f77.so.0(pmpi_recv_+0x10e) [0x7f52d36cf9c6]
>>>>>>
>>>>>> How can I make this error more explicit?
>>>>>>
>>>>>> I use the following configuration of openmpi-1.4.3:
>>>>>>
>>>>>> ./configure --enable-debug --prefix=/home/toueg/openmpi CXX=g++ CC=gcc
>>>>>> F77=gfortran FC=gfortran FLAGS="-m64 -fdefault-integer-8
>>>>>> -fdefault-real-8 -fdefault-double-8" FCFLAGS="-m64 -fdefault-integer-8
>>>>>> -fdefault-real-8 -fdefault-double-8" --disable-mpi-f90
>>>>>>
>>>>>> Here is the output of mpif77 -v:
>>>>>>
>>>>>> mpif77 for 1.2.7 (release) of: 2005/11/04 11:54:51
>>>>>> Driving: f77 -L/usr/lib/mpich-mpd/lib -v -lmpich-p4mpd -lpthread -lrt
>>>>>> -lfrtbegin -lg2c -lm -shared-libgcc
>>>>>> Reading specs from /usr/lib/gcc/x86_64-linux-gnu/3.4.6/specs
>>>>>> Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal
>>>>>> --prefix=/usr --libexecdir=/usr/lib
>>>>>> --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared
>>>>>> --with-system-zlib --enable-nls --without-included-gettext
>>>>>> --program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu
>>>>>> --enable-libstdcxx-debug x86_64-linux-gnu
>>>>>> Thread model: posix
>>>>>> gcc version 3.4.6 (Debian 3.4.6-5)
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/collect2 --eh-frame-hdr -m
>>>>>> elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crt1.o
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crti.o
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtbegin.o -L/usr/lib/mpich-mpd/lib
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../.. -L/lib/../lib
>>>>>> -L/usr/lib/../lib -lmpich-p4mpd -lpthread -lrt -lfrtbegin -lg2c -lm
>>>>>> -lgcc_s -lgcc -lc -lgcc_s -lgcc
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtend.o
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crtn.o
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/libfrtbegin.a(frtbegin.o):
>>>>>> in function `main':
>>>>>> (.text+0x1e): undefined reference to `MAIN__'
>>>>>> collect2: ld returned 1 exit status
>>>>>>
>>>>>> Thanks,
>>>>>> Benjamin

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/