George, I have done some modifications to the code; anyway, here is the first part of my zmp_inp:

! ZEUSMP2 CONFIGURATION FILE
 &GEOMCONF LGEOM = 2, LDIMEN = 2 /
 &PHYSCONF LRAD = 0,
  XHYDRO = .TRUE., XFORCE = .TRUE., XMHD = .false.,
  XTOTNRG = .false., XGRAV = .false., XGRVFFT = .false.,
  XPTMASS = .false., XISO = .false., XSUBAV = .false., XVGRID = .false.,
!- - - - - - - - - - - - - - - - - - -
  XFIXFORCE = .TRUE., XFIXFORCE2 = .TRUE.,
!- - - - - - - - - - - - - - - - - - -
  XSOURCEENERGY = .TRUE., XSOURCEMASS = .TRUE.,
!- - - - - - - - - - - - - - - - - - -
  XRADCOOL = .TRUE., XA_RGB_WINDS = .TRUE., XSNIa = .TRUE./
!=====================================
 &IOCONF XASCII = .false., XA_MULT = .false., XHDF = .TRUE.,
  XHST = .TRUE., XRESTART = .TRUE., XTSL = .false.,
  XDPRCHDF = .TRUE., XTTY = .TRUE., XAGRID = .false. /
 &PRECONF SMALL_NO = 1.0D-307, LARGE_NO = 1.0D+307 /
 &ARRAYCONF IZONES = 100, JZONES = 125, KZONES = 1, MAXIJK = 125/
 &mpitop ntiles(1)=5,ntiles(2)=2,ntiles(3)=1,periodic=2*.false.,.true. /
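Incidentally, the &mpitop namelist above requests ntiles(1)*ntiles(2)*ntiles(3) = 5*2*1 = 10 tiles, so the run presumably has to be started with exactly 10 MPI ranks, which is consistent with the 10-process run described next. Just to illustrate that constraint, a stand-alone C sketch (this is not ZEUS-MP code; the tile counts are simply copied from the namelist):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* tile counts taken from the &mpitop namelist: 5 x 2 x 1 */
    int ntiles[3] = {5, 2, 1};
    int needed = ntiles[0] * ntiles[1] * ntiles[2];
    int size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != needed) {
        fprintf(stderr, "expected %d ranks, got %d\n", needed, size);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Finalize();
    return 0;
}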
I have done some tests, and currently I'm able to perform a run with 10 processes on 10 nodes, ie I use only 1 of two CPUs in a node. It crashes after 6 hours, and not after 20 minutes! 2012/9/6 <users-requ...@open-mpi.org>: > Send users mailing list submissions to > us...@open-mpi.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://www.open-mpi.org/mailman/listinfo.cgi/users > or, via email, send a message with subject or body 'help' to > users-requ...@open-mpi.org > > You can reach the person managing the list at > users-ow...@open-mpi.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of users digest..." > > > Today's Topics: > > 1. Re: error compiling openmpi-1.6.1 on Windows 7 (Siegmar Gross) > 2. Re: OMPI 1.6.x Hang on khugepaged 100% CPU time (Yong Qin) > 3. Regarding the Pthreads (seshendra seshu) > 4. Re: some mpi processes "disappear" on a cluster of servers > (George Bosilca) > 5. SIGSEGV in OMPI 1.6.x (Yong Qin) > 6. Re: error compiling openmpi-1.6.1 on Windows 7 (Siegmar Gross) > 7. Re: Infiniband performance Problem and stalling > (Yevgeny Kliteynik) > 8. Re: SIGSEGV in OMPI 1.6.x (Jeff Squyres) > 9. Re: Regarding the Pthreads (Jeff Squyres) > 10. Re: python-mrmpi() failed (Jeff Squyres) > 11. Re: MPI_Cart_sub periods (Jeff Squyres) > 12. Re: error compiling openmpi-1.6.1 on Windows 7 (Shiqing Fan) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 5 Sep 2012 17:43:50 +0200 (CEST) > From: Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> > Subject: Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7 > To: f...@hlrs.de > Cc: us...@open-mpi.org > Message-ID: <201209051543.q85fhoba021...@tyr.informatik.hs-fulda.de> > Content-Type: TEXT/plain; charset=ISO-8859-1 > > Hi Shiqing, > >> Could you try set OPENMPI_HOME env var to the root of the Open MPI dir? >> This env is a backup option for the registry. > > It solves one problem but there is a new problem now :-(( > > > Without OPENMPI_HOME: Wrong pathname to help files. > > D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe > -------------------------------------------------------------------------- > Sorry! You were supposed to get help about: > invalid if_inexclude > But I couldn't open the help file: > D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt: > No such file or directory. Sorry! > -------------------------------------------------------------------------- > ... > > > > With OPENMPI_HOME: It nearly uses the correct directory. Unfortunately > the pathname contains the character " in the wrong place so that it > couldn't find the available help file. > > set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1" > > D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe > -------------------------------------------------------------------------- > Sorry! You were supposed to get help about: > no-hostfile > But I couldn't open the help file: > "c:\Program Files (x86)\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: > Invalid argument. Sorry > ! 
> -------------------------------------------------------------------------- > [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file > ..\..\openmpi-1.6.1\orte\mca\ras\base > \ras_base_allocate.c at line 200 > [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file > ..\..\openmpi-1.6.1\orte\mca\plm\base > \plm_base_launch_support.c at line 99 > [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file > ..\..\openmpi-1.6.1\orte\mca\plm\proc > ess\plm_process_module.c at line 996 > > > > It looks like that the environment variable can also solve my > problem in the 64-bit environment. > > D:\g...\prog\mpi\small_prog>mpicc init_finalize.c > > Microsoft (R) C/C++-Optimierungscompiler Version 16.00.40219.01 f?r x64 > ... > > > The process hangs without OPENMPI_HOME. > > D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe > ^C > > > With OPENMPI_HOME: > > set OPENMPI_HOME="c:\Program Files\openmpi-1.6.1" > > D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe > -------------------------------------------------------------------------- > Sorry! You were supposed to get help about: > no-hostfile > But I couldn't open the help file: > "c:\Program Files\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: Invalid > argument. S > orry! > -------------------------------------------------------------------------- > [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file > ..\..\openmpi-1.6.1\orte\mc > a\ras\base\ras_base_allocate.c at line 200 > [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file > ..\..\openmpi-1.6.1\orte\mc > a\plm\base\plm_base_launch_support.c at line 99 > [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file > ..\..\openmpi-1.6.1\orte\mc > a\plm\process\plm_process_module.c at line 996 > > > At least the program doesn't block any longer. Do you have any ideas > how this new problem can be solved? > > > Kind regards > > Siegmar > > > >> On 2012-09-05 1:02 PM, Siegmar Gross wrote: >> > Hi Shiqing, >> > >> >>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >> >>>> --------------------------------------------------------------------- >> >>>> Sorry! You were supposed to get help about: >> >>>> invalid if_inexclude >> >>>> But I couldn't open the help file: >> >>>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt: >> >>>> No such file or directory. Sorry! >> >>>> --------------------------------------------------------------------- >> >>> ... >> >>>> Why does "mpiexec" look for the help file relativ to my current >> >>>> program and not relative to itself? The file is part of the >> >>>> package. >> >>> Do you know how I can solve this problem? >> >> I have similar issue with message from tcp, but it's not finding the >> >> file, it's something else, which doesn't affect the execution of the >> >> application. Could you make sure the help-mpi-btl-tcp.txt is actually in >> >> the path D:\...\prog\mpi\small_prog\..\share\openmpi\? >> > That wouldn't be a good idea because I have MPI programs in different >> > directories so that I would have to install all help files in several >> > places (<my_directory>/../share/openmpi/help*.txt). All help files are >> > available in the installation directory of Open MPI. >> > >> > dir "c:\Program Files (x86)\openmpi-1.6.1\bin\mpiexec.exe" >> > ... >> > 29.08.2012 10:59 38.912 mpiexec.exe >> > ... >> > dir "c:\Program Files >> > (x86)\openmpi-1.6.1\bin\..\share\openmpi\help-mpi-btl-tcp.txt" >> > ... >> > 03.04.2012 16:30 631 help-mpi-btl-tcp.txt >> > ... 
>> > >> > I don't know if "mpiexec" or my program "init_finilize" is responsible >> > for the error message but whoever is responsible shouldn't use the path >> > to my program but the prefix_dir from MPI to find the help files. Perhaps >> > you can change the behaviour in the Open MPI source code. >> > >> > >> >>>> I can also compile in 64-bit mode but the program hangs. >> >>> Do you have any ideas why the program hangs? Thank you very much for any >> >>> help in advance. >> >> To be honest I don't know. I couldn't reproduce it. Did you try >> >> installing the binary installer, will it also behave the same? >> > I like to have different versions of Open MPI which I activate via >> > a batch file so that I can still run my program in an old version if >> > something goes wrong in a new one. I have no entries in the system >> > environment or registry so that I can even run different versions in >> > different command windows without problems (everything is only known >> > within the command window in which a have run my batch file). It seems >> > that you put something in the registry when I use your installer. >> > Perhaps you remember an earlier email where I had to uninstall an old >> > version because the environment in my own installation was wrong >> > as long as your installation was active. Nevertheless I can give it >> > a try. Perhaps I find out if you set more than just the path to your >> > binaries. Do you know if there is something similar to "truss" or >> > "strace" in the UNIX world so that I can see where the program hangs? >> > Thank you very much for your help in advance. >> > >> > >> > Kind regards >> > >> > Siegmar >> > >> >> >> -- >> --------------------------------------------------------------- >> Shiqing Fan >> High Performance Computing Center Stuttgart (HLRS) >> Tel: ++49(0)711-685-87234 Nobelstrasse 19 >> Fax: ++49(0)711-685-65832 70569 Stuttgart >> http://www.hlrs.de/organization/people/shiqing-fan/ >> email: f...@hlrs.de >> > > > > > ------------------------------ > > Message: 2 > Date: Wed, 5 Sep 2012 09:07:35 -0700 > From: Yong Qin <yong....@gmail.com> > Subject: Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time > To: klit...@dev.mellanox.co.il > Cc: Open MPI Users <us...@open-mpi.org> > Message-ID: > <CADEJBEWq0Rzfi_uKx8U4Uz4tjz=vJzn1=rdtphpyul04cv9...@mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Yes, so far this has only been observed in VASP and a specific dataset. > > Thanks, > > On Wed, Sep 5, 2012 at 4:52 AM, Yevgeny Kliteynik > <klit...@dev.mellanox.co.il> wrote: >> On 9/4/2012 7:21 PM, Yong Qin wrote: >>> On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik >>> <klit...@dev.mellanox.co.il> wrote: >>>> On 8/30/2012 10:28 PM, Yong Qin wrote: >>>>> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres<jsquy...@cisco.com> wrote: >>>>>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote: >>>>>> >>>>>>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but >>>>>>> not on 1.4.5 (tcp btl is always fine). The application is VASP and >>>>>>> only one specific dataset is identified during the testing, and the OS >>>>>>> is SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. The issue is that >>>>>>> when a certain type of load is put on OMPI 1.6.x, khugepaged thread >>>>>>> always runs with 100% CPU load, and it looks to me like that OMPI is >>>>>>> waiting for some memory to be available thus appears to be hung. >>>>>>> Reducing the per node processes would sometimes ease the problem a bit >>>>>>> but not always. 
So I did some further testing by playing around with >>>>>>> the kernel transparent hugepage support. >>>>>>> >>>>>>> 1. Disable transparent hugepage support completely (echo never >>>>>>>> /sys/kernel/mm/redhat_transparent_hugepage/enabled). This would allow >>>>>>> the program to progress as normal (as in 1.4.5). Total run time for an >>>>>>> iteration is 3036.03 s. >>>>>> >>>>>> I'll admit that we have not tested using transparent hugepages. I >>>>>> wonder if there's some kind of bad interaction going on here... >>>>> >>>>> The transparent hugepage is "transparent", which means it is >>>>> automatically applied to all applications unless it is explicitly told >>>>> otherwise. I highly suspect that it is not working properly in this >>>>> case. >>>> >>>> Like Jeff said - I don't think we've ever tested OMPI with transparent >>>> huge pages. >>>> >>> >>> Thanks. But have you tested OMPI under RHEL 6 or its variants (CentOS >>> 6, SL 6)? THP is on by default in RHEL 6 so no matter you want it or >>> not it's there. >> >> Interesting. Indeed, THP is on be default in RHEL 6.x. >> I run OMPI 1.6.x constantly on RHEL 6.2, and I've never seen this problem. >> >> I'm checking it with OFED folks, but I doubt that there are some dedicated >> tests for THP. >> >> So do you see it only with a specific application and only on a specific >> data set? Wonder if I can somehow reproduce it in-house... >> >> -- YK > > > ------------------------------ > > Message: 3 > Date: Wed, 5 Sep 2012 20:23:05 +0200 > From: seshendra seshu <seshu...@gmail.com> > Subject: [OMPI users] Regarding the Pthreads > To: Open MPI Users <us...@open-mpi.org> > Message-ID: > <CAJ_xm3AYtMt22NgjtY67TuwOpZxev0ZYSW4fEYGxKA=2yvd...@mail.gmail.com> > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > I am learning pthreads and trying to implement the pthreads in my > quicksortprogram. > My problem is iam unable to understand how to implement the pthreads at > data received at a node from the master (In detail: In my program Master > will divide the data and send to the slaves and each slave will do the > sorting independently of The received data and send back to master after > sorting is done. Now Iam having a problem in Implementing the pthreads at > the slaves,i.e how to implement the pthreads in order to share data among > the cores in each slave and sort the data and send it back to master. > So could anyone help in solving this problem by providing some suggestions > and clues. > > Thanking you very much. > > -- > WITH REGARDS > M.L.N.Seshendra > -------------- next part -------------- > HTML attachment scrubbed and removed > > ------------------------------ > > Message: 4 > Date: Thu, 6 Sep 2012 02:40:19 +0200 > From: George Bosilca <bosi...@eecs.utk.edu> > Subject: Re: [OMPI users] some mpi processes "disappear" on a cluster > of servers > To: Open MPI Users <us...@open-mpi.org> > Message-ID: <f6f521b2-df90-4827-8abf-abe0f3599...@eecs.utk.edu> > Content-Type: text/plain; charset=us-ascii > > Andrea, > > As suggested by the previous answers I guess the size of your problem is too > large for the memory available on the nodes. I can runs ZeusMP without any > issues up to 64 processes, both over Ethernet and Infiniband. I tried the 1.6 > and the current trunk, and both perform as expected. > > What is the content of your zmp_inp file? > > george. > > On Sep 1, 2012, at 16:01 , Andrea Negri <negri.an...@gmail.com> wrote: > >> I have tried to run with a single process (i.e. 
the entire grid is >> contained by one process) and the the command free -m on the compute >> node returns >> >> total used free shared buffers cached >> Mem: 3913 1540 2372 0 49 1234 >> -/+ buffers/cache: 257 3656 >> Swap: 1983 0 1983 >> >> >> while top returns >> top - 16:01:09 up 4 days, 5:56, 1 user, load average: 0.53, 0.16, 0.10 >> Tasks: 63 total, 3 running, 60 sleeping, 0 stopped, 0 zombie >> Cpu(s): 49.4% us, 0.7% sy, 0.0% ni, 49.9% id, 0.0% wa, 0.0% hi, 0.0% si >> Mem: 4007720k total, 1577968k used, 2429752k free, 50664k buffers >> Swap: 2031608k total, 0k used, 2031608k free, 1263844k cached >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > ------------------------------ > > Message: 5 > Date: Wed, 5 Sep 2012 21:06:12 -0700 > From: Yong Qin <yong....@gmail.com> > Subject: [OMPI users] SIGSEGV in OMPI 1.6.x > To: Open MPI Users <us...@open-mpi.org> > Message-ID: > <CADEJBEVFcsyh5WnK=3yj6w7b2aasrf7yc4uimcvaqia-j6c...@mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi, > > While debugging a mysterious crash of a code, I was able to trace down > to a SIGSEGV in OMPI 1.6 and 1.6.1. The offending code is in > opal/mca/memory/linux/malloc.c. Please see the following gdb log. > > (gdb) c > Continuing. > > Program received signal SIGSEGV, Segmentation fault. > opal_memory_ptmalloc2_int_free (av=0x2fd0637, mem=0x203a746f74512000) > at malloc.c:4385 > 4385 nextsize = chunksize(nextchunk); > (gdb) l > 4380 Consolidate other non-mmapped chunks as they arrive. > 4381 */ > 4382 > 4383 else if (!chunk_is_mmapped(p)) { > 4384 nextchunk = chunk_at_offset(p, size); > 4385 nextsize = chunksize(nextchunk); > 4386 assert(nextsize > 0); > 4387 > 4388 /* consolidate backward */ > 4389 if (!prev_inuse(p)) { > (gdb) bt > #0 opal_memory_ptmalloc2_int_free (av=0x2fd0637, > mem=0x203a746f74512000) at malloc.c:4385 > #1 0x00002ae6b18ea0c0 in opal_memory_ptmalloc2_free (mem=0x2fd0637) > at malloc.c:3511 > #2 0x00002ae6b18ea736 in opal_memory_linux_free_hook > (__ptr=0x2fd0637, caller=0x203a746f74512000) at hooks.c:705 > #3 0x0000000001412fcc in for_dealloc_allocatable () > #4 0x00000000007767b1 in ALLOC::dealloc_d2 (array=@0x2fd0647, > name=@0x6f6e6f69006f6e78, routine=Cannot access memory at address 0x0 > ) at alloc.F90:1357 > #5 0x000000000082628c in M_LDAU::hubbard_term (scell=..., nua=@0xd5, > na=@0xd5, isa=..., xa=..., indxua=..., maxnh=@0xcf4ff, maxnd=@0xcf4ff, > lasto=..., iphorb=..., > numd=..., listdptr=..., listd=..., numh=..., listhptr=..., > listh=..., nspin=@0xcf4ff00000002, dscf=..., eldau=@0x0, deldau=@0x0, > fa=..., stress=..., h=..., > first=@0x0, last=@0x0) at ldau.F:752 > #6 0x00000000006cd532 in M_SETUP_HAMILTONIAN::setup_hamiltonian > (first=@0x0, last=@0x0, iscf=@0x2) at setup_hamiltonian.F:199 > #7 0x000000000070e257 in M_SIESTA_FORCES::siesta_forces > (istep=@0xf9a4d07000000000) at siesta_forces.F:90 > #8 0x000000000070e475 in siesta () at siesta.F:23 > #9 0x000000000045e47c in main () > > Can anybody shed some light here on what could be wrong? 
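(Jeff's reply further down in this digest points to heap corruption and suggests a memory-checking debugger; for reference, a typical way to run an Open MPI job under valgrind, assuming the suppression file that Open MPI installs under $PREFIX/share/openmpi is available, would be something like

  mpirun -np 2 valgrind --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp ./siesta

where the binary name is only taken from the backtrace above.)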
> > Thanks, > > Yong Qin > > > ------------------------------ > > Message: 6 > Date: Thu, 6 Sep 2012 07:48:34 +0200 (CEST) > From: Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> > Subject: Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7 > To: f...@hlrs.de > Cc: us...@open-mpi.org > Message-ID: <201209060548.q865myke023...@tyr.informatik.hs-fulda.de> > Content-Type: TEXT/plain; charset=ISO-8859-1 > > Hi Shiqing, > > I have solved the problem with the double quotes in OPENMPI_HOME but > there is still something wrong. > > set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1" > > mpicc init_finalize.c > Cannot open configuration file "c:\Program Files > (x86)\openmpi-1.6.1"/share/openmpi\mpicc-wrapper-data.txt > Error parsing data file mpicc: Not found > > > Everything is OK if you remove the double quotes which Windows > automatically adds. > > set OPENMPI_HOME=c:\Program Files (x86)\openmpi-1.6.1 > > mpicc init_finalize.c > Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 16.00.40219.01 f?r > 80x86 > ... > > mpiexec init_finalize.exe > -------------------------------------------------------------------------- > WARNING: An invalid value was given for btl_tcp_if_exclude. This > value will be ignored. > > Local host: hermes > Value: 127.0.0.1/8 > Message: Did not find interface matching this subnet > -------------------------------------------------------------------------- > > Hello! > > > I get the output from my program but also a warning from Open MPI. > The new value for the loopback device was introduced a short time > ago when I have had problems with the loopback device on Solaris > (it used "lo0" instead of your default "lo"). How can I avoid this > message? The 64-bit version of my program still hangs. > > > Kind regards > > Siegmar > > >> > Could you try set OPENMPI_HOME env var to the root of the Open MPI dir? >> > This env is a backup option for the registry. >> >> It solves one problem but there is a new problem now :-(( >> >> >> Without OPENMPI_HOME: Wrong pathname to help files. >> >> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >> -------------------------------------------------------------------------- >> Sorry! You were supposed to get help about: >> invalid if_inexclude >> But I couldn't open the help file: >> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt: >> No such file or directory. Sorry! >> -------------------------------------------------------------------------- >> ... >> >> >> >> With OPENMPI_HOME: It nearly uses the correct directory. Unfortunately >> the pathname contains the character " in the wrong place so that it >> couldn't find the available help file. >> >> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1" >> >> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >> -------------------------------------------------------------------------- >> Sorry! You were supposed to get help about: >> no-hostfile >> But I couldn't open the help file: >> "c:\Program Files (x86)\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: >> Invalid argument. Sorry >> ! 
>> -------------------------------------------------------------------------- >> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file >> ..\..\openmpi-1.6.1\orte\mca\ras\base >> \ras_base_allocate.c at line 200 >> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file >> ..\..\openmpi-1.6.1\orte\mca\plm\base >> \plm_base_launch_support.c at line 99 >> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file >> ..\..\openmpi-1.6.1\orte\mca\plm\proc >> ess\plm_process_module.c at line 996 >> >> >> >> It looks like that the environment variable can also solve my >> problem in the 64-bit environment. >> >> D:\g...\prog\mpi\small_prog>mpicc init_finalize.c >> >> Microsoft (R) C/C++-Optimierungscompiler Version 16.00.40219.01 f?r x64 >> ... >> >> >> The process hangs without OPENMPI_HOME. >> >> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >> ^C >> >> >> With OPENMPI_HOME: >> >> set OPENMPI_HOME="c:\Program Files\openmpi-1.6.1" >> >> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >> -------------------------------------------------------------------------- >> Sorry! You were supposed to get help about: >> no-hostfile >> But I couldn't open the help file: >> "c:\Program Files\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: >> Invalid argument. S >> orry! >> -------------------------------------------------------------------------- >> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file >> ..\..\openmpi-1.6.1\orte\mc >> a\ras\base\ras_base_allocate.c at line 200 >> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file >> ..\..\openmpi-1.6.1\orte\mc >> a\plm\base\plm_base_launch_support.c at line 99 >> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file >> ..\..\openmpi-1.6.1\orte\mc >> a\plm\process\plm_process_module.c at line 996 >> >> >> At least the program doesn't block any longer. Do you have any ideas >> how this new problem can be solved? >> >> >> Kind regards >> >> Siegmar >> >> >> >> > On 2012-09-05 1:02 PM, Siegmar Gross wrote: >> > > Hi Shiqing, >> > > >> > >>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >> > >>>> --------------------------------------------------------------------- >> > >>>> Sorry! You were supposed to get help about: >> > >>>> invalid if_inexclude >> > >>>> But I couldn't open the help file: >> > >>>> >> > >>>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt: >> > >>>> No such file or directory. Sorry! >> > >>>> --------------------------------------------------------------------- >> > >>> ... >> > >>>> Why does "mpiexec" look for the help file relativ to my current >> > >>>> program and not relative to itself? The file is part of the >> > >>>> package. >> > >>> Do you know how I can solve this problem? >> > >> I have similar issue with message from tcp, but it's not finding the >> > >> file, it's something else, which doesn't affect the execution of the >> > >> application. Could you make sure the help-mpi-btl-tcp.txt is actually in >> > >> the path D:\...\prog\mpi\small_prog\..\share\openmpi\? >> > > That wouldn't be a good idea because I have MPI programs in different >> > > directories so that I would have to install all help files in several >> > > places (<my_directory>/../share/openmpi/help*.txt). All help files are >> > > available in the installation directory of Open MPI. >> > > >> > > dir "c:\Program Files (x86)\openmpi-1.6.1\bin\mpiexec.exe" >> > > ... >> > > 29.08.2012 10:59 38.912 mpiexec.exe >> > > ... 
>> > > dir "c:\Program Files >> > > (x86)\openmpi-1.6.1\bin\..\share\openmpi\help-mpi-btl-tcp.txt" >> > > ... >> > > 03.04.2012 16:30 631 help-mpi-btl-tcp.txt >> > > ... >> > > >> > > I don't know if "mpiexec" or my program "init_finilize" is responsible >> > > for the error message but whoever is responsible shouldn't use the path >> > > to my program but the prefix_dir from MPI to find the help files. Perhaps >> > > you can change the behaviour in the Open MPI source code. >> > > >> > > >> > >>>> I can also compile in 64-bit mode but the program hangs. >> > >>> Do you have any ideas why the program hangs? Thank you very much for >> > >>> any >> > >>> help in advance. >> > >> To be honest I don't know. I couldn't reproduce it. Did you try >> > >> installing the binary installer, will it also behave the same? >> > > I like to have different versions of Open MPI which I activate via >> > > a batch file so that I can still run my program in an old version if >> > > something goes wrong in a new one. I have no entries in the system >> > > environment or registry so that I can even run different versions in >> > > different command windows without problems (everything is only known >> > > within the command window in which a have run my batch file). It seems >> > > that you put something in the registry when I use your installer. >> > > Perhaps you remember an earlier email where I had to uninstall an old >> > > version because the environment in my own installation was wrong >> > > as long as your installation was active. Nevertheless I can give it >> > > a try. Perhaps I find out if you set more than just the path to your >> > > binaries. Do you know if there is something similar to "truss" or >> > > "strace" in the UNIX world so that I can see where the program hangs? >> > > Thank you very much for your help in advance. >> > > >> > > >> > > Kind regards >> > > >> > > Siegmar >> > > >> > >> > >> > -- >> > --------------------------------------------------------------- >> > Shiqing Fan >> > High Performance Computing Center Stuttgart (HLRS) >> > Tel: ++49(0)711-685-87234 Nobelstrasse 19 >> > Fax: ++49(0)711-685-65832 70569 Stuttgart >> > http://www.hlrs.de/organization/people/shiqing-fan/ >> > email: f...@hlrs.de >> > >> >> > > > > > ------------------------------ > > Message: 7 > Date: Thu, 06 Sep 2012 11:03:04 +0300 > From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il> > Subject: Re: [OMPI users] Infiniband performance Problem and stalling > To: Randolph Pullen <randolph_pul...@yahoo.com.au>, OpenMPI Users > <us...@open-mpi.org> > Message-ID: <504858b8.3050...@dev.mellanox.co.il> > Content-Type: text/plain; charset=UTF-8 > > On 9/3/2012 4:14 AM, Randolph Pullen wrote: >> No RoCE, Just native IB with TCP over the top. > > Sorry, I'm confused - still not clear what is "Melanox III HCA 10G card". > Could you run "ibstat" and post the results? > > What is the expected BW on your cards? > Could you run "ib_write_bw" between two machines? > > Also, please see below. > >> No I haven't used 1.6 I was trying to stick with the standards on the >> mellanox disk. >> Is there a known problem with 1.4.3 ? 
>> >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------! > --- >> *From:* Yevgeny Kliteynik <klit...@dev.mellanox.co.il> >> *To:* Randolph Pullen <randolph_pul...@yahoo.com.au>; Open MPI Users >> <us...@open-mpi.org> >> *Sent:* Sunday, 2 September 2012 10:54 PM >> *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling >> >> Randolph, >> >> Some clarification on the setup: >> >> "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to >> Ethernet? >> That is, when you're using openib BTL, you mean RoCE, right? >> >> Also, have you had a chance to try some newer OMPI release? >> Any 1.6.x would do. >> >> >> -- YK >> >> On 8/31/2012 10:53 AM, Randolph Pullen wrote: >> > (reposted with consolidatedinformation) >> > I have a test rig comprising 2 i7 systems 8GB RAM with Melanox III HCA >> 10G cards >> > running Centos 5.7 Kernel 2.6.18-274 >> > Open MPI 1.4.3 >> > MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2): >> > On a Cisco 24 pt switch >> > Normal performance is: >> > $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong >> > results in: >> > Max rate = 958.388867 MB/sec Min latency = 4.529953 usec >> > and: >> > $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts PingPong >> > Max rate = 653.547293 MB/sec Min latency = 19.550323 usec >> > NetPipeMPI results show a max of 7.4 Gb/s at 8388605 bytes which seems >> fine. >> > log_num_mtt =20 and log_mtts_per_seg params =2 >> > My application exchanges about a gig of data between the processes with 2 >> sender and 2 consumer processes on each node with 1 additional controller >> process on the starting node. >> > The program splits the data into 64K blocks and uses non blocking sends >> and receives with busy/sleep loops to monitor progress until completion. >> > Each process owns a single buffer for these 64K blocks. >> > My problem is I see better performance under IPoIB then I do on native IB >> (RDMA_CM). >> > My understanding is that IPoIB is limited to about 1G/s so I am at a loss >> to know why it is faster. >> > These 2 configurations are equivelant (about 8-10 seconds per cycle) >> > mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl >> tcp,self -H vh2,vh1 -np 9 --bycore prog >> > mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl >> tcp,self -H vh2,vh1 -np 9 --bycore prog > > When you say "--mca btl tcp,self", it means that openib btl is not enabled. > Hence "--mca btl_openib_flags" is irrelevant. 
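(In other words, the btl list passed via --mca btl is exclusive, so openib-specific parameters only take effect when openib is actually listed. A hedged example of a run that does exercise openib, reusing the host names and program name from the commands above and adding the shared-memory btl suggested just below:

  mpirun --mca btl openib,sm,self -H vh2,vh1 -np 9 --bycore prog )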
> >> > And this one produces similar run times but seems to degrade with >> repeated cycles: >> > mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl >> openib,self -H vh2,vh1 -np 9 --bycore prog > > You're running 9 ranks on two machines, but you're using IB for intra-node > communication. > Is it intentional? If not, you can add "sm" btl and have performance improved. > > -- YK > >> > Other btl_openib_flags settings result in much lower performance. >> > Changing the first of the above configs to use openIB results in a 21 >> second run time at best. Sometimes it takes up to 5 minutes. >> > In all cases, OpenIB runs in twice the time it takes TCP,except if I push >> the small message max to 64K and force short messages. Then the openib times >> are the same as TCP and no faster. >> > With openib: >> > - Repeated cycles during a single run seem to slow down with each cycle >> > (usually by about 10 seconds). >> > - On occasions it seems to stall indefinitely, waiting on a single >> receive. >> > I'm still at a loss as to why. I can?t find any errors logged during the >> runs. >> > Any ideas appreciated. >> > Thanks in advance, >> > Randolph >> > >> > >> > _______________________________________________ >> > users mailing list >> > us...@open-mpi.org <mailto:us...@open-mpi.org> >> > http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> > > > > ------------------------------ > > Message: 8 > Date: Thu, 6 Sep 2012 08:01:01 -0400 > From: Jeff Squyres <jsquy...@cisco.com> > Subject: Re: [OMPI users] SIGSEGV in OMPI 1.6.x > To: Open MPI Users <us...@open-mpi.org> > Message-ID: <256da22f-f9ac-4746-acd9-501f8208e...@cisco.com> > Content-Type: text/plain; charset=us-ascii > > If you run into a segv in this code, it almost certainly means that you have > heap corruption somewhere. FWIW, that has *always* been what it meant when > I've run into segv's in any code under in opal/mca/memory/linux/. Meaning: > my user code did something wrong, it created heap corruption, and then later > some malloc() or free() caused a segv in this area of the code. > > This code is the same ptmalloc memory allocator that has shipped in glibc for > years. I'll be hard-pressed to say that any code is 100% bug free :-), but > I'd be surprised if there is a bug in this particular chunk of code. > > I'd run your code through valgrind or some other memory-checking debugger and > see if that can shed any light on what's going on. > > > On Sep 6, 2012, at 12:06 AM, Yong Qin wrote: > >> Hi, >> >> While debugging a mysterious crash of a code, I was able to trace down >> to a SIGSEGV in OMPI 1.6 and 1.6.1. The offending code is in >> opal/mca/memory/linux/malloc.c. Please see the following gdb log. >> >> (gdb) c >> Continuing. >> >> Program received signal SIGSEGV, Segmentation fault. >> opal_memory_ptmalloc2_int_free (av=0x2fd0637, mem=0x203a746f74512000) >> at malloc.c:4385 >> 4385 nextsize = chunksize(nextchunk); >> (gdb) l >> 4380 Consolidate other non-mmapped chunks as they arrive. 
>> 4381 */ >> 4382 >> 4383 else if (!chunk_is_mmapped(p)) { >> 4384 nextchunk = chunk_at_offset(p, size); >> 4385 nextsize = chunksize(nextchunk); >> 4386 assert(nextsize > 0); >> 4387 >> 4388 /* consolidate backward */ >> 4389 if (!prev_inuse(p)) { >> (gdb) bt >> #0 opal_memory_ptmalloc2_int_free (av=0x2fd0637, >> mem=0x203a746f74512000) at malloc.c:4385 >> #1 0x00002ae6b18ea0c0 in opal_memory_ptmalloc2_free (mem=0x2fd0637) >> at malloc.c:3511 >> #2 0x00002ae6b18ea736 in opal_memory_linux_free_hook >> (__ptr=0x2fd0637, caller=0x203a746f74512000) at hooks.c:705 >> #3 0x0000000001412fcc in for_dealloc_allocatable () >> #4 0x00000000007767b1 in ALLOC::dealloc_d2 (array=@0x2fd0647, >> name=@0x6f6e6f69006f6e78, routine=Cannot access memory at address 0x0 >> ) at alloc.F90:1357 >> #5 0x000000000082628c in M_LDAU::hubbard_term (scell=..., nua=@0xd5, >> na=@0xd5, isa=..., xa=..., indxua=..., maxnh=@0xcf4ff, maxnd=@0xcf4ff, >> lasto=..., iphorb=..., >> numd=..., listdptr=..., listd=..., numh=..., listhptr=..., >> listh=..., nspin=@0xcf4ff00000002, dscf=..., eldau=@0x0, deldau=@0x0, >> fa=..., stress=..., h=..., >> first=@0x0, last=@0x0) at ldau.F:752 >> #6 0x00000000006cd532 in M_SETUP_HAMILTONIAN::setup_hamiltonian >> (first=@0x0, last=@0x0, iscf=@0x2) at setup_hamiltonian.F:199 >> #7 0x000000000070e257 in M_SIESTA_FORCES::siesta_forces >> (istep=@0xf9a4d07000000000) at siesta_forces.F:90 >> #8 0x000000000070e475 in siesta () at siesta.F:23 >> #9 0x000000000045e47c in main () >> >> Can anybody shed some light here on what could be wrong? >> >> Thanks, >> >> Yong Qin >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > ------------------------------ > > Message: 9 > Date: Thu, 6 Sep 2012 08:03:06 -0400 > From: Jeff Squyres <jsquy...@cisco.com> > Subject: Re: [OMPI users] Regarding the Pthreads > To: Open MPI Users <us...@open-mpi.org> > Message-ID: <7fd0702a-4a29-4ff6-a80a-170d2002f...@cisco.com> > Content-Type: text/plain; charset=iso-8859-1 > > Your question is somewhat outside the scope of this list. Perhaps people may > chime in with some suggestions, but that's more of a threading question than > an MPI question. > > Be warned that you need to call MPI_Init_thread (not MPI_Init) with > MPI_THREAD_MULTIPLE in order to get true multi-threaded support in Open MPI. > And we only support that on the TCP and shared memory transports if you built > Open MPI with threading support enabled. > > > On Sep 5, 2012, at 2:23 PM, seshendra seshu wrote: > >> Hi, >> I am learning pthreads and trying to implement the pthreads in my quicksort >> program. >> My problem is iam unable to understand how to implement the pthreads at data >> received at a node from the master (In detail: In my program Master will >> divide the data and send to the slaves and each slave will do the sorting >> independently of The received data and send back to master after sorting is >> done. Now Iam having a problem in Implementing the pthreads at the >> slaves,i.e how to implement the pthreads in order to share data among the >> cores in each slave and sort the data and send it back to master. >> So could anyone help in solving this problem by providing some suggestions >> and clues. >> >> Thanking you very much. 
>> >> -- >> WITH REGARDS >> M.L.N.Seshendra >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > ------------------------------ > > Message: 10 > Date: Thu, 6 Sep 2012 08:05:30 -0400 > From: Jeff Squyres <jsquy...@cisco.com> > Subject: Re: [OMPI users] python-mrmpi() failed > To: Open MPI Users <us...@open-mpi.org> > Message-ID: <e8aefb84-8702-432c-9fb0-0c34451b0...@cisco.com> > Content-Type: text/plain; charset=us-ascii > > On Sep 4, 2012, at 3:09 PM, mariana Vargas wrote: > >> I 'am new in this, I have some codes that use mpi for python and I >> just installed (openmpi, mrmpi, mpi4py) in my home (from a cluster >> account) without apparent errors and I tried to perform this simple >> test in python and I get the following error related with openmpi, >> could you help to figure out what is going on? I attach as many >> informations as possible... > > I think I know what's happening here. > > It's a complicated linker issue that we've discussed before -- I'm not sure > whether it was on this users list or the OMPI developers list. > > The short version is that you should remove your prior Open MPI installation, > and then rebuild Open MPI with the --disable-dlopen configure switch. See if > that fixes the problem. > >> Thanks. >> >> Mariana >> >> >> From a python console >> >>> from mrmpi import mrmpi >> >>> mr=mrmpi() >> [ferrari:23417] mca: base: component_find: unable to open /home/ >> mvargas/lib/openmpi/mca_paffinity_hwloc: /home/mvargas/lib/openmpi/ >> mca_paffinity_hwloc.so: undefined symbol: opal_hwloc_topology (ignored) >> [ferrari:23417] mca: base: component_find: unable to open /home/ >> mvargas/lib/openmpi/mca_carto_auto_detect: /home/mvargas/lib/openmpi/ >> mca_carto_auto_detect.so: undefined symbol: >> opal_carto_base_graph_get_host_graph_fn (ignored) >> [ferrari:23417] mca: base: component_find: unable to open /home/ >> mvargas/lib/openmpi/mca_carto_file: /home/mvargas/lib/openmpi/ >> mca_carto_file.so: undefined symbol: >> opal_carto_base_graph_get_host_graph_fn (ignored) >> [ferrari:23417] mca: base: component_find: unable to open /home/ >> mvargas/lib/openmpi/mca_shmem_mmap: /home/mvargas/lib/openmpi/ >> mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored) >> [ferrari:23417] mca: base: component_find: unable to open /home/ >> mvargas/lib/openmpi/mca_shmem_posix: /home/mvargas/lib/openmpi/ >> mca_shmem_posix.so: undefined symbol: opal_show_help (ignored) >> [ferrari:23417] mca: base: component_find: unable to open /home/ >> mvargas/lib/openmpi/mca_shmem_sysv: /home/mvargas/lib/openmpi/ >> mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored) >> -------------------------------------------------------------------------- >> It looks like opal_init failed for some reason; your parallel process is >> likely to abort. There are many reasons that a parallel process can >> fail during opal_init; some of which are due to configuration or >> environment problems. 
This failure appears to be an internal failure; >> here's some additional information (which may only be relevant to an >> Open MPI developer): >> >> opal_shmem_base_select failed >> --> Returned value -1 instead of OPAL_SUCCESS >> -------------------------------------------------------------------------- >> [ferrari:23417] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file >> runtime/orte_init.c at line 79 >> -------------------------------------------------------------------------- >> It looks like MPI_INIT failed for some reason; your parallel process is >> likely to abort. There are many reasons that a parallel process can >> fail during MPI_INIT; some of which are due to configuration or >> environment >> problems. This failure appears to be an internal failure; here's some >> additional information (which may only be relevant to an Open MPI >> developer): >> >> ompi_mpi_init: orte_init failed >> --> Returned "Error" (-1) instead of "Success" (0) >> -------------------------------------------------------------------------- >> *** An error occurred in MPI_Init >> *** on a NULL communicator >> *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort >> [ferrari:23417] Local abort before MPI_INIT completed successfully; >> not able to aggregate error messages, and not able to guarantee that >> all other processes were killed! >> >> >> >> echo $PATH >> >> /home/mvargas/idl/pro/LibsSDSSS/idlutilsv5_4_15/bin:/usr/local/itt/ >> idl70/bin:/opt/local/bin:/home/mvargas/bin:/home/mvargas/lib:/home/ >> mvargas/lib/openmpi/:/home/mvargas:/home/vargas/bin/:/home/mvargas/idl/ >> pro/LibsSDSSS/idlutilsv5_4_15/bin:/usr/local/itt/idl70/bin:/opt/local/ >> bin:/home/mvargas/bin:/home/mvargas/lib:/home/mvargas/lib/openmpi/:/ >> home/mvargas:/home/vargas/bin/:/usr/lib64/qt3.3/bin:/usr/kerberos/bin:/ >> usr/local/bin:/bin:/usr/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/ >> envswitcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX64:/opt/pvm3/bin/ >> LINUX64:/opt/c3-4/ >> >> echo $LD_LIBRARY_PATH >> /usr/local/mpich2/lib:/home/mvargas/lib:/home/mvargas/:/home/mvargas/ >> lib64:/home/mvargas/lib/openmpi/:/usr/lib64/openmpi/1.4-gcc/lib/:/user/ >> local/:/usr/local/mpich2/lib:/home/mvargas/lib:/home/mvargas/:/home/ >> mvargas/lib64:/home/mvargas/lib/openmpi/:/usr/lib64/openmpi/1.4-gcc/ >> lib/:/user/local/: >> >> Version: openmpi-1.6 >> >> >> >> mpirun --bynode --tag-output ompi_info -v ompi full --parsable >> [1,0]<stdout>:package:Open MPI mvargas@ferrari Distribution >> [1,0]<stdout>:ompi:version:full:1.6 >> [1,0]<stdout>:ompi:version:svn:r26429 >> [1,0]<stdout>:ompi:version:release_date:May 10, 2012 >> [1,0]<stdout>:orte:version:full:1.6 >> [1,0]<stdout>:orte:version:svn:r26429 >> [1,0]<stdout>:orte:version:release_date:May 10, 2012 >> [1,0]<stdout>:opal:version:full:1.6 >> [1,0]<stdout>:opal:version:svn:r26429 >> [1,0]<stdout>:opal:version:release_date:May 10, 2012 >> [1,0]<stdout>:mpi-api:version:full:2.1 >> [1,0]<stdout>:ident:1.6 >> >> >> eth0 Link encap:Ethernet HWaddr 00:30:48:95:99:CC >> inet addr:192.168.2.1 Bcast:192.168.2.255 Mask:255.255.255.0 >> inet6 addr: fe80::230:48ff:fe95:99cc/64 Scope:Link >> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >> RX packets:4739875255 errors:0 dropped:1636 overruns:0 frame:0 >> TX packets:5196871012 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:4959384349297 (4.5 TiB) TX bytes:3933641883577 (3.5 >> TiB) >> Memory:ef300000-ef320000 >> >> eth1 Link encap:Ethernet HWaddr 00:30:48:95:99:CD >> inet addr:128.2.116.104 Bcast:128.2.119.255 
Mask: >> 255.255.248.0 >> inet6 addr: fe80::230:48ff:fe95:99cd/64 Scope:Link >> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >> RX packets:2645952109 errors:0 dropped:13353 overruns:0 frame:0 >> TX packets:2974763570 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:2024044043824 (1.8 TiB) TX bytes:3390935387820 (3.0 >> TiB) >> Memory:ef400000-ef420000 >> >> lo Link encap:Local Loopback >> inet addr:127.0.0.1 Mask:255.0.0.0 >> inet6 addr: ::1/128 Scope:Host >> UP LOOPBACK RUNNING MTU:16436 Metric:1 >> RX packets:143359307 errors:0 dropped:0 overruns:0 frame:0 >> TX packets:143359307 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:0 >> RX bytes:80413513464 (74.8 GiB) TX bytes:80413513464 (74.8 >> GiB) >> >> >> >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> <files.tar.gz> > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > ------------------------------ > > Message: 11 > Date: Thu, 6 Sep 2012 10:23:04 -0400 > From: Jeff Squyres <jsquy...@cisco.com> > Subject: Re: [OMPI users] MPI_Cart_sub periods > To: Open MPI Users <us...@open-mpi.org> > Message-ID: <346c2878-a5a6-4043-b890-09dab6880...@cisco.com> > Content-Type: text/plain; charset=iso-8859-1 > > John -- > > This cartesian stuff always makes my head hurt. :-) > > You seem to have hit on a bona-fide bug. I have fixed the issue in our SVN > trunk and will get the fixed moved over to the v1.6 and v1.7 branches. > > Thanks for the report! > > > On Aug 29, 2012, at 5:32 AM, Craske, John wrote: > >> Hello, >> >> We are partitioning a two-dimensional Cartesian communicator into >> two one-dimensional subgroups. In this situation we have found >> that both one-dimensional communicators inherit the period >> logical of the first dimension of the original two-dimensional >> communicator when using Open MPI. Using MPICH each >> one-dimensional communicator inherits the period corresponding to >> the dimensions specified in REMAIN_DIMS, as expected. Could this >> be a bug, or are we making a mistake? The relevant calls we make in a >> Fortran code are >> >> CALL MPI_CART_CREATE(MPI_COMM_WORLD, 2, (/ NDIMX, NDIMY /), (/ .True., >> .False. /), .TRUE., >> COMM_CART_2D, IERROR) >> >> CALL MPI_CART_SUB(COMM_CART_2D, (/ .True., .False. /), COMM_CART_X, IERROR) >> CALL MPI_CART_SUB(COMM_CART_2D, (/ .False., .True. /), COMM_CART_Y, IERROR) >> >> Following these requests, >> >> CALL MPI_CART_GET(COMM_CART_X, MAXDIM_X, DIMS_X, PERIODS_X, COORDS_X, IERROR) >> CALL MPI_CART_GET(COMM_CART_Y, MAXDIM_Y, DIMS_Y, PERIODS_Y, COORDS_Y, IERROR) >> >> will result in >> >> PERIODS_X = T >> PERIODS_Y = T >> >> If, on the other hand we define the two-dimensional communicator >> using PERIODS = (/ .False., .True. /), we find >> >> PERIODS_X = F >> PERIODS_Y = F >> >> Your advice on the matter would be greatly appreciated. >> >> Regards, >> >> John. 
>> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > ------------------------------ > > Message: 12 > Date: Thu, 06 Sep 2012 16:58:03 +0200 > From: Shiqing Fan <f...@hlrs.de> > Subject: Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7 > To: Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> > Cc: us...@open-mpi.org > Message-ID: <5048b9fb.3070...@hlrs.de> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi Siegmar, > > Glad to hear that it's working for you. > > The warning message is because the loopback adapter is excluded by > default, but this adapter is actually not installed on Windows. > > One solution might be installing the loopback adapter on Windows. It > very easy, only a few minutes. > > Or it may be possible to avoid this message from internal Open MPI. But > I'm not sure about how this can be done. > > > Regards, > Shiqing > > > On 2012-09-06 7:48 AM, Siegmar Gross wrote: >> Hi Shiqing, >> >> I have solved the problem with the double quotes in OPENMPI_HOME but >> there is still something wrong. >> >> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1" >> >> mpicc init_finalize.c >> Cannot open configuration file "c:\Program Files >> (x86)\openmpi-1.6.1"/share/openmpi\mpicc-wrapper-data.txt >> Error parsing data file mpicc: Not found >> >> >> Everything is OK if you remove the double quotes which Windows >> automatically adds. >> >> set OPENMPI_HOME=c:\Program Files (x86)\openmpi-1.6.1 >> >> mpicc init_finalize.c >> Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 16.00.40219.01 f?r >> 80x86 >> ... >> >> mpiexec init_finalize.exe >> -------------------------------------------------------------------------- >> WARNING: An invalid value was given for btl_tcp_if_exclude. This >> value will be ignored. >> >> Local host: hermes >> Value: 127.0.0.1/8 >> Message: Did not find interface matching this subnet >> -------------------------------------------------------------------------- >> >> Hello! >> >> >> I get the output from my program but also a warning from Open MPI. >> The new value for the loopback device was introduced a short time >> ago when I have had problems with the loopback device on Solaris >> (it used "lo0" instead of your default "lo"). How can I avoid this >> message? The 64-bit version of my program still hangs. >> >> >> Kind regards >> >> Siegmar >> >> >>>> Could you try set OPENMPI_HOME env var to the root of the Open MPI dir? >>>> This env is a backup option for the registry. >>> It solves one problem but there is a new problem now :-(( >>> >>> >>> Without OPENMPI_HOME: Wrong pathname to help files. >>> >>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >>> -------------------------------------------------------------------------- >>> Sorry! You were supposed to get help about: >>> invalid if_inexclude >>> But I couldn't open the help file: >>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt: >>> No such file or directory. Sorry! >>> -------------------------------------------------------------------------- >>> ... >>> >>> >>> >>> With OPENMPI_HOME: It nearly uses the correct directory. Unfortunately >>> the pathname contains the character " in the wrong place so that it >>> couldn't find the available help file. 
>>> >>> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1" >>> >>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >>> -------------------------------------------------------------------------- >>> Sorry! You were supposed to get help about: >>> no-hostfile >>> But I couldn't open the help file: >>> "c:\Program Files >>> (x86)\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: Invalid argument. >>> Sorry >>> ! >>> -------------------------------------------------------------------------- >>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file >>> ..\..\openmpi-1.6.1\orte\mca\ras\base >>> \ras_base_allocate.c at line 200 >>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file >>> ..\..\openmpi-1.6.1\orte\mca\plm\base >>> \plm_base_launch_support.c at line 99 >>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file >>> ..\..\openmpi-1.6.1\orte\mca\plm\proc >>> ess\plm_process_module.c at line 996 >>> >>> >>> >>> It looks like that the environment variable can also solve my >>> problem in the 64-bit environment. >>> >>> D:\g...\prog\mpi\small_prog>mpicc init_finalize.c >>> >>> Microsoft (R) C/C++-Optimierungscompiler Version 16.00.40219.01 f?r x64 >>> ... >>> >>> >>> The process hangs without OPENMPI_HOME. >>> >>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >>> ^C >>> >>> >>> With OPENMPI_HOME: >>> >>> set OPENMPI_HOME="c:\Program Files\openmpi-1.6.1" >>> >>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >>> -------------------------------------------------------------------------- >>> Sorry! You were supposed to get help about: >>> no-hostfile >>> But I couldn't open the help file: >>> "c:\Program Files\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: >>> Invalid argument. S >>> orry! >>> -------------------------------------------------------------------------- >>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file >>> ..\..\openmpi-1.6.1\orte\mc >>> a\ras\base\ras_base_allocate.c at line 200 >>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file >>> ..\..\openmpi-1.6.1\orte\mc >>> a\plm\base\plm_base_launch_support.c at line 99 >>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file >>> ..\..\openmpi-1.6.1\orte\mc >>> a\plm\process\plm_process_module.c at line 996 >>> >>> >>> At least the program doesn't block any longer. Do you have any ideas >>> how this new problem can be solved? >>> >>> >>> Kind regards >>> >>> Siegmar >>> >>> >>> >>>> On 2012-09-05 1:02 PM, Siegmar Gross wrote: >>>>> Hi Shiqing, >>>>> >>>>>>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe >>>>>>>> --------------------------------------------------------------------- >>>>>>>> Sorry! You were supposed to get help about: >>>>>>>> invalid if_inexclude >>>>>>>> But I couldn't open the help file: >>>>>>>> >>>>>>>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt: >>>>>>>> No such file or directory. Sorry! >>>>>>>> --------------------------------------------------------------------- >>>>>>> ... >>>>>>>> Why does "mpiexec" look for the help file relativ to my current >>>>>>>> program and not relative to itself? The file is part of the >>>>>>>> package. >>>>>>> Do you know how I can solve this problem? >>>>>> I have similar issue with message from tcp, but it's not finding the >>>>>> file, it's something else, which doesn't affect the execution of the >>>>>> application. Could you make sure the help-mpi-btl-tcp.txt is actually in >>>>>> the path D:\...\prog\mpi\small_prog\..\share\openmpi\? 
>>>>> That wouldn't be a good idea because I have MPI programs in different >>>>> directories so that I would have to install all help files in several >>>>> places (<my_directory>/../share/openmpi/help*.txt). All help files are >>>>> available in the installation directory of Open MPI. >>>>> >>>>> dir "c:\Program Files (x86)\openmpi-1.6.1\bin\mpiexec.exe" >>>>> ... >>>>> 29.08.2012 10:59 38.912 mpiexec.exe >>>>> ... >>>>> dir "c:\Program Files >>>>> (x86)\openmpi-1.6.1\bin\..\share\openmpi\help-mpi-btl-tcp.txt" >>>>> ... >>>>> 03.04.2012 16:30 631 help-mpi-btl-tcp.txt >>>>> ... >>>>> >>>>> I don't know if "mpiexec" or my program "init_finilize" is responsible >>>>> for the error message but whoever is responsible shouldn't use the path >>>>> to my program but the prefix_dir from MPI to find the help files. Perhaps >>>>> you can change the behaviour in the Open MPI source code. >>>>> >>>>> >>>>>>>> I can also compile in 64-bit mode but the program hangs. >>>>>>> Do you have any ideas why the program hangs? Thank you very much for any >>>>>>> help in advance. >>>>>> To be honest I don't know. I couldn't reproduce it. Did you try >>>>>> installing the binary installer, will it also behave the same? >>>>> I like to have different versions of Open MPI which I activate via >>>>> a batch file so that I can still run my program in an old version if >>>>> something goes wrong in a new one. I have no entries in the system >>>>> environment or registry so that I can even run different versions in >>>>> different command windows without problems (everything is only known >>>>> within the command window in which a have run my batch file). It seems >>>>> that you put something in the registry when I use your installer. >>>>> Perhaps you remember an earlier email where I had to uninstall an old >>>>> version because the environment in my own installation was wrong >>>>> as long as your installation was active. Nevertheless I can give it >>>>> a try. Perhaps I find out if you set more than just the path to your >>>>> binaries. Do you know if there is something similar to "truss" or >>>>> "strace" in the UNIX world so that I can see where the program hangs? >>>>> Thank you very much for your help in advance. >>>>> >>>>> >>>>> Kind regards >>>>> >>>>> Siegmar >>>>> >>>> >>>> -- >>>> --------------------------------------------------------------- >>>> Shiqing Fan >>>> High Performance Computing Center Stuttgart (HLRS) >>>> Tel: ++49(0)711-685-87234 Nobelstrasse 19 >>>> Fax: ++49(0)711-685-65832 70569 Stuttgart >>>> http://www.hlrs.de/organization/people/shiqing-fan/ >>>> email: f...@hlrs.de >>>> >>> >> > > > -- > --------------------------------------------------------------- > Shiqing Fan > High Performance Computing Center Stuttgart (HLRS) > Tel: ++49(0)711-685-87234 Nobelstrasse 19 > Fax: ++49(0)711-685-65832 70569 Stuttgart > http://www.hlrs.de/organization/people/shiqing-fan/ > email: f...@hlrs.de > > > > ------------------------------ > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > End of users Digest, Vol 2345, Issue 1 > **************************************