George,
I have made some modifications to the code; in any case, here is the first
part of my zmp_inp:
! ZEUSMP2 CONFIGURATION FILE
&GEOMCONF LGEOM = 2,
LDIMEN = 2 /
&PHYSCONF LRAD = 0,
XHYDRO = .TRUE.,
XFORCE = .TRUE.,
XMHD = .false.,
XTOTNRG = .false.,
XGRAV = .false.,
XGRVFFT = .false.,
XPTMASS = .false.,
XISO = .false.,
XSUBAV = .false.,
XVGRID = .false.,
!- - - - - - - - - - - - - - - - - - -
XFIXFORCE = .TRUE.,
XFIXFORCE2 = .TRUE.,
!- - - - - - - - - - - - - - - - - - -
XSOURCEENERGY = .TRUE.,
XSOURCEMASS = .TRUE.,
!- - - - - - - - - - - - - - - - - - -
XRADCOOL = .TRUE.,
XA_RGB_WINDS = .TRUE.,
XSNIa = .TRUE./
!=====================================
&IOCONF XASCII = .false.,
XA_MULT = .false.,
XHDF = .TRUE.,
XHST = .TRUE.,
XRESTART = .TRUE.,
XTSL = .false.,
XDPRCHDF = .TRUE.,
XTTY = .TRUE. ,
XAGRID = .false. /
&PRECONF SMALL_NO = 1.0D-307,
LARGE_NO = 1.0D+307 /
&ARRAYCONF IZONES = 100,
JZONES = 125,
KZONES = 1,
MAXIJK = 125/
&mpitop ntiles(1)=5,ntiles(2)=2,ntiles(3)=1,periodic=2*.false.,.true. /
I have run some tests, and currently I am able to perform a run with
10 processes on 10 nodes, i.e. I use only one of the two CPUs on each node.
It now crashes after 6 hours instead of after 20 minutes!
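
For reference, one way to obtain this one-process-per-node layout with
mpirun is sketched below (hostnames and the executable name are
placeholders, not my actual setup):

  # hostfile: one slot per node, so only one of the two CPUs per node is used
  node01 slots=1
  node02 slots=1
  # ... and so on up to node10 ...

  mpirun -np 10 --hostfile my_hostfile ./zeusmp.x
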
2012/9/6 <[email protected]>:
> Send users mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
> 1. Re: error compiling openmpi-1.6.1 on Windows 7 (Siegmar Gross)
> 2. Re: OMPI 1.6.x Hang on khugepaged 100% CPU time (Yong Qin)
> 3. Regarding the Pthreads (seshendra seshu)
> 4. Re: some mpi processes "disappear" on a cluster of servers
> (George Bosilca)
> 5. SIGSEGV in OMPI 1.6.x (Yong Qin)
> 6. Re: error compiling openmpi-1.6.1 on Windows 7 (Siegmar Gross)
> 7. Re: Infiniband performance Problem and stalling
> (Yevgeny Kliteynik)
> 8. Re: SIGSEGV in OMPI 1.6.x (Jeff Squyres)
> 9. Re: Regarding the Pthreads (Jeff Squyres)
> 10. Re: python-mrmpi() failed (Jeff Squyres)
> 11. Re: MPI_Cart_sub periods (Jeff Squyres)
> 12. Re: error compiling openmpi-1.6.1 on Windows 7 (Shiqing Fan)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 5 Sep 2012 17:43:50 +0200 (CEST)
> From: Siegmar Gross <[email protected]>
> Subject: Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7
> To: [email protected]
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: TEXT/plain; charset=ISO-8859-1
>
> Hi Shiqing,
>
>> Could you try set OPENMPI_HOME env var to the root of the Open MPI dir?
>> This env is a backup option for the registry.
>
> It solves one problem but there is a new problem now :-((
>
>
> Without OPENMPI_HOME: Wrong pathname to help files.
>
> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
> --------------------------------------------------------------------------
> Sorry! You were supposed to get help about:
> invalid if_inexclude
> But I couldn't open the help file:
> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt:
> No such file or directory. Sorry!
> --------------------------------------------------------------------------
> ...
>
>
>
> With OPENMPI_HOME: It nearly uses the correct directory. Unfortunately
> the pathname contains the character " in the wrong place so that it
> couldn't find the available help file.
>
> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1"
>
> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
> --------------------------------------------------------------------------
> Sorry! You were supposed to get help about:
> no-hostfile
> But I couldn't open the help file:
> "c:\Program Files (x86)\openmpi-1.6.1"\share\openmpi\help-hostfile.txt:
> Invalid argument. Sorry
> !
> --------------------------------------------------------------------------
> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
> ..\..\openmpi-1.6.1\orte\mca\ras\base
> \ras_base_allocate.c at line 200
> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
> ..\..\openmpi-1.6.1\orte\mca\plm\base
> \plm_base_launch_support.c at line 99
> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
> ..\..\openmpi-1.6.1\orte\mca\plm\proc
> ess\plm_process_module.c at line 996
>
>
>
> It looks like the environment variable can also solve my
> problem in the 64-bit environment.
>
> D:\g...\prog\mpi\small_prog>mpicc init_finalize.c
>
> Microsoft (R) C/C++-Optimierungscompiler Version 16.00.40219.01 für x64
> ...
>
>
> The process hangs without OPENMPI_HOME.
>
> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
> ^C
>
>
> With OPENMPI_HOME:
>
> set OPENMPI_HOME="c:\Program Files\openmpi-1.6.1"
>
> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
> --------------------------------------------------------------------------
> Sorry! You were supposed to get help about:
> no-hostfile
> But I couldn't open the help file:
> "c:\Program Files\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: Invalid
> argument. S
> orry!
> --------------------------------------------------------------------------
> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
> ..\..\openmpi-1.6.1\orte\mc
> a\ras\base\ras_base_allocate.c at line 200
> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
> ..\..\openmpi-1.6.1\orte\mc
> a\plm\base\plm_base_launch_support.c at line 99
> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
> ..\..\openmpi-1.6.1\orte\mc
> a\plm\process\plm_process_module.c at line 996
>
>
> At least the program doesn't block any longer. Do you have any ideas
> how this new problem can be solved?
>
>
> Kind regards
>
> Siegmar
>
>
>
>> On 2012-09-05 1:02 PM, Siegmar Gross wrote:
>> > Hi Shiqing,
>> >
>> >>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>> >>>> ---------------------------------------------------------------------
>> >>>> Sorry! You were supposed to get help about:
>> >>>> invalid if_inexclude
>> >>>> But I couldn't open the help file:
>> >>>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt:
>> >>>> No such file or directory. Sorry!
>> >>>> ---------------------------------------------------------------------
>> >>> ...
>> >>>> Why does "mpiexec" look for the help file relative to my current
>> >>>> program and not relative to itself? The file is part of the
>> >>>> package.
>> >>> Do you know how I can solve this problem?
>> >> I have similar issue with message from tcp, but it's not finding the
>> >> file, it's something else, which doesn't affect the execution of the
>> >> application. Could you make sure the help-mpi-btl-tcp.txt is actually in
>> >> the path D:\...\prog\mpi\small_prog\..\share\openmpi\?
>> > That wouldn't be a good idea because I have MPI programs in different
>> > directories so that I would have to install all help files in several
>> > places (<my_directory>/../share/openmpi/help*.txt). All help files are
>> > available in the installation directory of Open MPI.
>> >
>> > dir "c:\Program Files (x86)\openmpi-1.6.1\bin\mpiexec.exe"
>> > ...
>> > 29.08.2012 10:59 38.912 mpiexec.exe
>> > ...
>> > dir "c:\Program Files
>> > (x86)\openmpi-1.6.1\bin\..\share\openmpi\help-mpi-btl-tcp.txt"
>> > ...
>> > 03.04.2012 16:30 631 help-mpi-btl-tcp.txt
>> > ...
>> >
>> > I don't know if "mpiexec" or my program "init_finalize" is responsible
>> > for the error message but whoever is responsible shouldn't use the path
>> > to my program but the prefix_dir from MPI to find the help files. Perhaps
>> > you can change the behaviour in the Open MPI source code.
>> >
>> >
>> >>>> I can also compile in 64-bit mode but the program hangs.
>> >>> Do you have any ideas why the program hangs? Thank you very much for any
>> >>> help in advance.
>> >> To be honest I don't know. I couldn't reproduce it. Did you try
>> >> installing the binary installer, will it also behave the same?
>> > I like to have different versions of Open MPI which I activate via
>> > a batch file so that I can still run my program in an old version if
>> > something goes wrong in a new one. I have no entries in the system
>> > environment or registry so that I can even run different versions in
>> > different command windows without problems (everything is only known
>> > within the command window in which I have run my batch file). It seems
>> > that you put something in the registry when I use your installer.
>> > Perhaps you remember an earlier email where I had to uninstall an old
>> > version because the environment in my own installation was wrong
>> > as long as your installation was active. Nevertheless I can give it
>> > a try. Perhaps I find out if you set more than just the path to your
>> > binaries. Do you know if there is something similar to "truss" or
>> > "strace" in the UNIX world so that I can see where the program hangs?
>> > Thank you very much for your help in advance.
>> >
>> >
>> > Kind regards
>> >
>> > Siegmar
>> >
>>
>>
>> --
>> ---------------------------------------------------------------
>> Shiqing Fan
>> High Performance Computing Center Stuttgart (HLRS)
>> Tel: ++49(0)711-685-87234 Nobelstrasse 19
>> Fax: ++49(0)711-685-65832 70569 Stuttgart
>> http://www.hlrs.de/organization/people/shiqing-fan/
>> email: [email protected]
>>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 5 Sep 2012 09:07:35 -0700
> From: Yong Qin <[email protected]>
> Subject: Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time
> To: [email protected]
> Cc: Open MPI Users <[email protected]>
> Message-ID:
> <CADEJBEWq0Rzfi_uKx8U4Uz4tjz=vJzn1=rdtphpyul04cv9...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Yes, so far this has only been observed in VASP and a specific dataset.
>
> Thanks,
>
> On Wed, Sep 5, 2012 at 4:52 AM, Yevgeny Kliteynik
> <[email protected]> wrote:
>> On 9/4/2012 7:21 PM, Yong Qin wrote:
>>> On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik
>>> <[email protected]> wrote:
>>>> On 8/30/2012 10:28 PM, Yong Qin wrote:
>>>>> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres<[email protected]> wrote:
>>>>>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>>>>>
>>>>>>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but
>>>>>>> not on 1.4.5 (tcp btl is always fine). The application is VASP and
>>>>>>> only one specific dataset is identified during the testing, and the OS
>>>>>>> is SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. The issue is that
>>>>>>> when a certain type of load is put on OMPI 1.6.x, khugepaged thread
>>>>>>> always runs with 100% CPU load, and it looks to me like that OMPI is
>>>>>>> waiting for some memory to be available thus appears to be hung.
>>>>>>> Reducing the per node processes would sometimes ease the problem a bit
>>>>>>> but not always. So I did some further testing by playing around with
>>>>>>> the kernel transparent hugepage support.
>>>>>>>
>>>>>>> 1. Disable transparent hugepage support completely (echo never
>>>>>>>> /sys/kernel/mm/redhat_transparent_hugepage/enabled). This would allow
>>>>>>> the program to progress as normal (as in 1.4.5). Total run time for an
>>>>>>> iteration is 3036.03 s.
>>>>>>
>>>>>> I'll admit that we have not tested using transparent hugepages. I
>>>>>> wonder if there's some kind of bad interaction going on here...
>>>>>
>>>>> The transparent hugepage is "transparent", which means it is
>>>>> automatically applied to all applications unless it is explicitly told
>>>>> otherwise. I highly suspect that it is not working properly in this
>>>>> case.
>>>>
>>>> Like Jeff said - I don't think we've ever tested OMPI with transparent
>>>> huge pages.
>>>>
>>>
>>> Thanks. But have you tested OMPI under RHEL 6 or its variants (CentOS
>>> 6, SL 6)? THP is on by default in RHEL 6, so whether you want it or
>>> not, it's there.
>>
>> Interesting. Indeed, THP is on by default in RHEL 6.x.
>> I run OMPI 1.6.x constantly on RHEL 6.2, and I've never seen this problem.
>>
>> I'm checking it with OFED folks, but I doubt that there are some dedicated
>> tests for THP.
>>
>> So do you see it only with a specific application and only on a specific
>> data set? Wonder if I can somehow reproduce it in-house...
>>
>> -- YK
>
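For anyone who wants to check or toggle THP on such a system, the sysfs knob
used in the tests above can be read and written directly (RHEL 6 path, as
quoted in the thread):

  # show the current setting; the active value is shown in brackets
  cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
  # disable THP completely, as in test 1 above
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled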
>
> ------------------------------
>
> Message: 3
> Date: Wed, 5 Sep 2012 20:23:05 +0200
> From: seshendra seshu <[email protected]>
> Subject: [OMPI users] Regarding the Pthreads
> To: Open MPI Users <[email protected]>
> Message-ID:
> <CAJ_xm3AYtMt22NgjtY67TuwOpZxev0ZYSW4fEYGxKA=2yvd...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
> I am learning pthreads and trying to use them in my quicksort program.
> My problem is that I am unable to understand how to apply pthreads to the
> data received at a node from the master. (In detail: in my program the
> master divides the data and sends it to the slaves; each slave sorts its
> portion of the received data independently and sends it back to the master
> once the sorting is done. I am now having trouble implementing the pthreads
> at the slaves, i.e. how to use pthreads to share the data among the cores
> of each slave, sort the data, and send it back to the master.)
> So could anyone help with this problem by providing some suggestions
> and clues?
>
> Thanking you very much.
>
> --
> WITH REGARDS
> M.L.N.Seshendra
> -------------- next part --------------
> HTML attachment scrubbed and removed
>
> ------------------------------
>
> Message: 4
> Date: Thu, 6 Sep 2012 02:40:19 +0200
> From: George Bosilca <[email protected]>
> Subject: Re: [OMPI users] some mpi processes "disappear" on a cluster
> of servers
> To: Open MPI Users <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> Andrea,
>
> As suggested by the previous answers I guess the size of your problem is too
> large for the memory available on the nodes. I can run ZeusMP without any
> issues up to 64 processes, both over Ethernet and Infiniband. I tried the 1.6
> and the current trunk, and both perform as expected.
>
> What is the content of your zmp_inp file?
>
> george.
>
> On Sep 1, 2012, at 16:01 , Andrea Negri <[email protected]> wrote:
>
>> I have tried to run with a single process (i.e. the entire grid is
>> contained by one process) and the the command free -m on the compute
>> node returns
>>
>> total used free shared buffers cached
>> Mem: 3913 1540 2372 0 49 1234
>> -/+ buffers/cache: 257 3656
>> Swap: 1983 0 1983
>>
>>
>> while top returns
>> top - 16:01:09 up 4 days, 5:56, 1 user, load average: 0.53, 0.16, 0.10
>> Tasks: 63 total, 3 running, 60 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 49.4% us, 0.7% sy, 0.0% ni, 49.9% id, 0.0% wa, 0.0% hi, 0.0% si
>> Mem: 4007720k total, 1577968k used, 2429752k free, 50664k buffers
>> Swap: 2031608k total, 0k used, 2031608k free, 1263844k cached
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Wed, 5 Sep 2012 21:06:12 -0700
> From: Yong Qin <[email protected]>
> Subject: [OMPI users] SIGSEGV in OMPI 1.6.x
> To: Open MPI Users <[email protected]>
> Message-ID:
> <CADEJBEVFcsyh5WnK=3yj6w7b2aasrf7yc4uimcvaqia-j6c...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> While debugging a mysterious crash of a code, I was able to trace down
> to a SIGSEGV in OMPI 1.6 and 1.6.1. The offending code is in
> opal/mca/memory/linux/malloc.c. Please see the following gdb log.
>
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> opal_memory_ptmalloc2_int_free (av=0x2fd0637, mem=0x203a746f74512000)
> at malloc.c:4385
> 4385 nextsize = chunksize(nextchunk);
> (gdb) l
> 4380 Consolidate other non-mmapped chunks as they arrive.
> 4381 */
> 4382
> 4383 else if (!chunk_is_mmapped(p)) {
> 4384 nextchunk = chunk_at_offset(p, size);
> 4385 nextsize = chunksize(nextchunk);
> 4386 assert(nextsize > 0);
> 4387
> 4388 /* consolidate backward */
> 4389 if (!prev_inuse(p)) {
> (gdb) bt
> #0 opal_memory_ptmalloc2_int_free (av=0x2fd0637,
> mem=0x203a746f74512000) at malloc.c:4385
> #1 0x00002ae6b18ea0c0 in opal_memory_ptmalloc2_free (mem=0x2fd0637)
> at malloc.c:3511
> #2 0x00002ae6b18ea736 in opal_memory_linux_free_hook
> (__ptr=0x2fd0637, caller=0x203a746f74512000) at hooks.c:705
> #3 0x0000000001412fcc in for_dealloc_allocatable ()
> #4 0x00000000007767b1 in ALLOC::dealloc_d2 (array=@0x2fd0647,
> name=@0x6f6e6f69006f6e78, routine=Cannot access memory at address 0x0
> ) at alloc.F90:1357
> #5 0x000000000082628c in M_LDAU::hubbard_term (scell=..., nua=@0xd5,
> na=@0xd5, isa=..., xa=..., indxua=..., maxnh=@0xcf4ff, maxnd=@0xcf4ff,
> lasto=..., iphorb=...,
> numd=..., listdptr=..., listd=..., numh=..., listhptr=...,
> listh=..., nspin=@0xcf4ff00000002, dscf=..., eldau=@0x0, deldau=@0x0,
> fa=..., stress=..., h=...,
> first=@0x0, last=@0x0) at ldau.F:752
> #6 0x00000000006cd532 in M_SETUP_HAMILTONIAN::setup_hamiltonian
> (first=@0x0, last=@0x0, iscf=@0x2) at setup_hamiltonian.F:199
> #7 0x000000000070e257 in M_SIESTA_FORCES::siesta_forces
> (istep=@0xf9a4d07000000000) at siesta_forces.F:90
> #8 0x000000000070e475 in siesta () at siesta.F:23
> #9 0x000000000045e47c in main ()
>
> Can anybody shed some light here on what could be wrong?
>
> Thanks,
>
> Yong Qin
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 6 Sep 2012 07:48:34 +0200 (CEST)
> From: Siegmar Gross <[email protected]>
> Subject: Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7
> To: [email protected]
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: TEXT/plain; charset=ISO-8859-1
>
> Hi Shiqing,
>
> I have solved the problem with the double quotes in OPENMPI_HOME but
> there is still something wrong.
>
> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1"
>
> mpicc init_finalize.c
> Cannot open configuration file "c:\Program Files
> (x86)\openmpi-1.6.1"/share/openmpi\mpicc-wrapper-data.txt
> Error parsing data file mpicc: Not found
>
>
> Everything is OK if you remove the double quotes which Windows
> automatically adds.
>
> set OPENMPI_HOME=c:\Program Files (x86)\openmpi-1.6.1
>
> mpicc init_finalize.c
> Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 16.00.40219.01 für
> 80x86
> ...
>
> mpiexec init_finalize.exe
> --------------------------------------------------------------------------
> WARNING: An invalid value was given for btl_tcp_if_exclude. This
> value will be ignored.
>
> Local host: hermes
> Value: 127.0.0.1/8
> Message: Did not find interface matching this subnet
> --------------------------------------------------------------------------
>
> Hello!
>
>
> I get the output from my program but also a warning from Open MPI.
> The new value for the loopback device was introduced a short time
> ago when I have had problems with the loopback device on Solaris
> (it used "lo0" instead of your default "lo"). How can I avoid this
> message? The 64-bit version of my program still hangs.
>
>
> Kind regards
>
> Siegmar
>
>
>> > Could you try set OPENMPI_HOME env var to the root of the Open MPI dir?
>> > This env is a backup option for the registry.
>>
>> It solves one problem but there is a new problem now :-((
>>
>>
>> Without OPENMPI_HOME: Wrong pathname to help files.
>>
>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>> --------------------------------------------------------------------------
>> Sorry! You were supposed to get help about:
>> invalid if_inexclude
>> But I couldn't open the help file:
>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt:
>> No such file or directory. Sorry!
>> --------------------------------------------------------------------------
>> ...
>>
>>
>>
>> With OPENMPI_HOME: It nearly uses the correct directory. Unfortunately
>> the pathname contains the character " in the wrong place so that it
>> couldn't find the available help file.
>>
>> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1"
>>
>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>> --------------------------------------------------------------------------
>> Sorry! You were supposed to get help about:
>> no-hostfile
>> But I couldn't open the help file:
>> "c:\Program Files (x86)\openmpi-1.6.1"\share\openmpi\help-hostfile.txt:
>> Invalid argument. Sorry
>> !
>> --------------------------------------------------------------------------
>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
>> ..\..\openmpi-1.6.1\orte\mca\ras\base
>> \ras_base_allocate.c at line 200
>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
>> ..\..\openmpi-1.6.1\orte\mca\plm\base
>> \plm_base_launch_support.c at line 99
>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
>> ..\..\openmpi-1.6.1\orte\mca\plm\proc
>> ess\plm_process_module.c at line 996
>>
>>
>>
>> It looks like the environment variable can also solve my
>> problem in the 64-bit environment.
>>
>> D:\g...\prog\mpi\small_prog>mpicc init_finalize.c
>>
>> Microsoft (R) C/C++-Optimierungscompiler Version 16.00.40219.01 für x64
>> ...
>>
>>
>> The process hangs without OPENMPI_HOME.
>>
>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>> ^C
>>
>>
>> With OPENMPI_HOME:
>>
>> set OPENMPI_HOME="c:\Program Files\openmpi-1.6.1"
>>
>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>> --------------------------------------------------------------------------
>> Sorry! You were supposed to get help about:
>> no-hostfile
>> But I couldn't open the help file:
>> "c:\Program Files\openmpi-1.6.1"\share\openmpi\help-hostfile.txt:
>> Invalid argument. S
>> orry!
>> --------------------------------------------------------------------------
>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
>> ..\..\openmpi-1.6.1\orte\mc
>> a\ras\base\ras_base_allocate.c at line 200
>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
>> ..\..\openmpi-1.6.1\orte\mc
>> a\plm\base\plm_base_launch_support.c at line 99
>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
>> ..\..\openmpi-1.6.1\orte\mc
>> a\plm\process\plm_process_module.c at line 996
>>
>>
>> At least the program doesn't block any longer. Do you have any ideas
>> how this new problem can be solved?
>>
>>
>> Kind regards
>>
>> Siegmar
>>
>>
>>
>> > On 2012-09-05 1:02 PM, Siegmar Gross wrote:
>> > > Hi Shiqing,
>> > >
>> > >>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>> > >>>> ---------------------------------------------------------------------
>> > >>>> Sorry! You were supposed to get help about:
>> > >>>> invalid if_inexclude
>> > >>>> But I couldn't open the help file:
>> > >>>>
>> > >>>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt:
>> > >>>> No such file or directory. Sorry!
>> > >>>> ---------------------------------------------------------------------
>> > >>> ...
>> > >>>> Why does "mpiexec" look for the help file relative to my current
>> > >>>> program and not relative to itself? The file is part of the
>> > >>>> package.
>> > >>> Do you know how I can solve this problem?
>> > >> I have similar issue with message from tcp, but it's not finding the
>> > >> file, it's something else, which doesn't affect the execution of the
>> > >> application. Could you make sure the help-mpi-btl-tcp.txt is actually in
>> > >> the path D:\...\prog\mpi\small_prog\..\share\openmpi\?
>> > > That wouldn't be a good idea because I have MPI programs in different
>> > > directories so that I would have to install all help files in several
>> > > places (<my_directory>/../share/openmpi/help*.txt). All help files are
>> > > available in the installation directory of Open MPI.
>> > >
>> > > dir "c:\Program Files (x86)\openmpi-1.6.1\bin\mpiexec.exe"
>> > > ...
>> > > 29.08.2012 10:59 38.912 mpiexec.exe
>> > > ...
>> > > dir "c:\Program Files
>> > > (x86)\openmpi-1.6.1\bin\..\share\openmpi\help-mpi-btl-tcp.txt"
>> > > ...
>> > > 03.04.2012 16:30 631 help-mpi-btl-tcp.txt
>> > > ...
>> > >
>> > > I don't know if "mpiexec" or my program "init_finalize" is responsible
>> > > for the error message but whoever is responsible shouldn't use the path
>> > > to my program but the prefix_dir from MPI to find the help files. Perhaps
>> > > you can change the behaviour in the Open MPI source code.
>> > >
>> > >
>> > >>>> I can also compile in 64-bit mode but the program hangs.
>> > >>> Do you have any ideas why the program hangs? Thank you very much for
>> > >>> any
>> > >>> help in advance.
>> > >> To be honest I don't know. I couldn't reproduce it. Did you try
>> > >> installing the binary installer, will it also behave the same?
>> > > I like to have different versions of Open MPI which I activate via
>> > > a batch file so that I can still run my program in an old version if
>> > > something goes wrong in a new one. I have no entries in the system
>> > > environment or registry so that I can even run different versions in
>> > > different command windows without problems (everything is only known
>> > > within the command window in which I have run my batch file). It seems
>> > > that you put something in the registry when I use your installer.
>> > > Perhaps you remember an earlier email where I had to uninstall an old
>> > > version because the environment in my own installation was wrong
>> > > as long as your installation was active. Nevertheless I can give it
>> > > a try. Perhaps I find out if you set more than just the path to your
>> > > binaries. Do you know if there is something similar to "truss" or
>> > > "strace" in the UNIX world so that I can see where the program hangs?
>> > > Thank you very much for your help in advance.
>> > >
>> > >
>> > > Kind regards
>> > >
>> > > Siegmar
>> > >
>> >
>> >
>> > --
>> > ---------------------------------------------------------------
>> > Shiqing Fan
>> > High Performance Computing Center Stuttgart (HLRS)
>> > Tel: ++49(0)711-685-87234 Nobelstrasse 19
>> > Fax: ++49(0)711-685-65832 70569 Stuttgart
>> > http://www.hlrs.de/organization/people/shiqing-fan/
>> > email: [email protected]
>> >
>>
>>
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Thu, 06 Sep 2012 11:03:04 +0300
> From: Yevgeny Kliteynik <[email protected]>
> Subject: Re: [OMPI users] Infiniband performance Problem and stalling
> To: Randolph Pullen <[email protected]>, OpenMPI Users
> <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=UTF-8
>
> On 9/3/2012 4:14 AM, Randolph Pullen wrote:
>> No RoCE, Just native IB with TCP over the top.
>
> Sorry, I'm confused - still not clear what is "Melanox III HCA 10G card".
> Could you run "ibstat" and post the results?
>
> What is the expected BW on your cards?
> Could you run "ib_write_bw" between two machines?
>
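For reference, the raw RDMA bandwidth test from the perftest package is
usually run roughly like this (hostname is a placeholder):

  server$ ib_write_bw                # start the server side on one node
  client$ ib_write_bw server-node    # run the client against the server host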
> Also, please see below.
>
>> No I haven't used 1.6 I was trying to stick with the standards on the
>> mellanox disk.
>> Is there a known problem with 1.4.3 ?
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
> ---
>> *From:* Yevgeny Kliteynik <[email protected]>
>> *To:* Randolph Pullen <[email protected]>; Open MPI Users
>> <[email protected]>
>> *Sent:* Sunday, 2 September 2012 10:54 PM
>> *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling
>>
>> Randolph,
>>
>> Some clarification on the setup:
>>
>> "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to
>> Ethernet?
>> That is, when you're using openib BTL, you mean RoCE, right?
>>
>> Also, have you had a chance to try some newer OMPI release?
>> Any 1.6.x would do.
>>
>>
>> -- YK
>>
>> On 8/31/2012 10:53 AM, Randolph Pullen wrote:
>> > (reposted with consolidated information)
>> > I have a test rig comprising 2 i7 systems 8GB RAM with Melanox III HCA
>> 10G cards
>> > running Centos 5.7 Kernel 2.6.18-274
>> > Open MPI 1.4.3
>> > MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2):
>> > On a Cisco 24 pt switch
>> > Normal performance is:
>> > $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong
>> > results in:
>> > Max rate = 958.388867 MB/sec Min latency = 4.529953 usec
>> > and:
>> > $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts PingPong
>> > Max rate = 653.547293 MB/sec Min latency = 19.550323 usec
>> > NetPipeMPI results show a max of 7.4 Gb/s at 8388605 bytes which seems
>> fine.
>> > log_num_mtt =20 and log_mtts_per_seg params =2
>> > My application exchanges about a gig of data between the processes with 2
>> sender and 2 consumer processes on each node with 1 additional controller
>> process on the starting node.
>> > The program splits the data into 64K blocks and uses non blocking sends
>> and receives with busy/sleep loops to monitor progress until completion.
>> > Each process owns a single buffer for these 64K blocks.
>> > My problem is I see better performance under IPoIB than I do on native IB
>> (RDMA_CM).
>> > My understanding is that IPoIB is limited to about 1G/s so I am at a loss
>> to know why it is faster.
>> > These 2 configurations are equivalent (about 8-10 seconds per cycle)
>> > mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl
>> tcp,self -H vh2,vh1 -np 9 --bycore prog
>> > mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl
>> tcp,self -H vh2,vh1 -np 9 --bycore prog
>
> When you say "--mca btl tcp,self", it means that openib btl is not enabled.
> Hence "--mca btl_openib_flags" is irrelevant.
>
>> > And this one produces similar run times but seems to degrade with
>> repeated cycles:
>> > mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl
>> openib,self -H vh2,vh1 -np 9 --bycore prog
>
> You're running 9 ranks on two machines, but you're using IB for intra-node
> communication.
> Is it intentional? If not, you can add the "sm" btl and improve performance.
>
> -- YK
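
A hedged sketch of that launch line with the "sm" btl added for intra-node
traffic; everything else is unchanged from the command quoted above:

  mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 \
         --mca btl openib,sm,self -H vh2,vh1 -np 9 --bycore prog
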
>
>> > Other btl_openib_flags settings result in much lower performance.
>> > Changing the first of the above configs to use openIB results in a 21
>> second run time at best. Sometimes it takes up to 5 minutes.
>> > In all cases, OpenIB runs in twice the time it takes TCP,except if I push
>> the small message max to 64K and force short messages. Then the openib times
>> are the same as TCP and no faster.
>> > With openib:
>> > - Repeated cycles during a single run seem to slow down with each cycle
>> > (usually by about 10 seconds).
>> > - On occasions it seems to stall indefinitely, waiting on a single
>> receive.
>> > I'm still at a loss as to why. I can't find any errors logged during the
>> runs.
>> > Any ideas appreciated.
>> > Thanks in advance,
>> > Randolph
>> >
>> >
>> > _______________________________________________
>> > users mailing list
>> > [email protected] <mailto:[email protected]>
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>
>
>
> ------------------------------
>
> Message: 8
> Date: Thu, 6 Sep 2012 08:01:01 -0400
> From: Jeff Squyres <[email protected]>
> Subject: Re: [OMPI users] SIGSEGV in OMPI 1.6.x
> To: Open MPI Users <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> If you run into a segv in this code, it almost certainly means that you have
> heap corruption somewhere. FWIW, that has *always* been what it meant when
> I've run into segv's in any code under opal/mca/memory/linux/. Meaning:
> my user code did something wrong, it created heap corruption, and then later
> some malloc() or free() caused a segv in this area of the code.
>
> This code is the same ptmalloc memory allocator that has shipped in glibc for
> years. I'll be hard-pressed to say that any code is 100% bug free :-), but
> I'd be surprised if there is a bug in this particular chunk of code.
>
> I'd run your code through valgrind or some other memory-checking debugger and
> see if that can shed any light on what's going on.
>
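One way to do that for an MPI job is to interpose valgrind between mpirun and
the application; a minimal sketch (rank count and binary name are placeholders):

  mpirun -np 2 valgrind --leak-check=full --track-origins=yes ./my_app
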
>
> On Sep 6, 2012, at 12:06 AM, Yong Qin wrote:
>
>> Hi,
>>
>> While debugging a mysterious crash of a code, I was able to trace down
>> to a SIGSEGV in OMPI 1.6 and 1.6.1. The offending code is in
>> opal/mca/memory/linux/malloc.c. Please see the following gdb log.
>>
>> (gdb) c
>> Continuing.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> opal_memory_ptmalloc2_int_free (av=0x2fd0637, mem=0x203a746f74512000)
>> at malloc.c:4385
>> 4385 nextsize = chunksize(nextchunk);
>> (gdb) l
>> 4380 Consolidate other non-mmapped chunks as they arrive.
>> 4381 */
>> 4382
>> 4383 else if (!chunk_is_mmapped(p)) {
>> 4384 nextchunk = chunk_at_offset(p, size);
>> 4385 nextsize = chunksize(nextchunk);
>> 4386 assert(nextsize > 0);
>> 4387
>> 4388 /* consolidate backward */
>> 4389 if (!prev_inuse(p)) {
>> (gdb) bt
>> #0 opal_memory_ptmalloc2_int_free (av=0x2fd0637,
>> mem=0x203a746f74512000) at malloc.c:4385
>> #1 0x00002ae6b18ea0c0 in opal_memory_ptmalloc2_free (mem=0x2fd0637)
>> at malloc.c:3511
>> #2 0x00002ae6b18ea736 in opal_memory_linux_free_hook
>> (__ptr=0x2fd0637, caller=0x203a746f74512000) at hooks.c:705
>> #3 0x0000000001412fcc in for_dealloc_allocatable ()
>> #4 0x00000000007767b1 in ALLOC::dealloc_d2 (array=@0x2fd0647,
>> name=@0x6f6e6f69006f6e78, routine=Cannot access memory at address 0x0
>> ) at alloc.F90:1357
>> #5 0x000000000082628c in M_LDAU::hubbard_term (scell=..., nua=@0xd5,
>> na=@0xd5, isa=..., xa=..., indxua=..., maxnh=@0xcf4ff, maxnd=@0xcf4ff,
>> lasto=..., iphorb=...,
>> numd=..., listdptr=..., listd=..., numh=..., listhptr=...,
>> listh=..., nspin=@0xcf4ff00000002, dscf=..., eldau=@0x0, deldau=@0x0,
>> fa=..., stress=..., h=...,
>> first=@0x0, last=@0x0) at ldau.F:752
>> #6 0x00000000006cd532 in M_SETUP_HAMILTONIAN::setup_hamiltonian
>> (first=@0x0, last=@0x0, iscf=@0x2) at setup_hamiltonian.F:199
>> #7 0x000000000070e257 in M_SIESTA_FORCES::siesta_forces
>> (istep=@0xf9a4d07000000000) at siesta_forces.F:90
>> #8 0x000000000070e475 in siesta () at siesta.F:23
>> #9 0x000000000045e47c in main ()
>>
>> Can anybody shed some light here on what could be wrong?
>>
>> Thanks,
>>
>> Yong Qin
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> [email protected]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 9
> Date: Thu, 6 Sep 2012 08:03:06 -0400
> From: Jeff Squyres <[email protected]>
> Subject: Re: [OMPI users] Regarding the Pthreads
> To: Open MPI Users <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=iso-8859-1
>
> Your question is somewhat outside the scope of this list. Perhaps people may
> chime in with some suggestions, but that's more of a threading question than
> an MPI question.
>
> Be warned that you need to call MPI_Init_thread (not MPI_Init) with
> MPI_THREAD_MULTIPLE in order to get true multi-threaded support in Open MPI.
> And we only support that on the TCP and shared memory transports if you built
> Open MPI with threading support enabled.
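
A minimal sketch of the initialization described above (error handling
trimmed, program body omitted):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int provided;
      /* Request full multi-threading support instead of calling MPI_Init() */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      if (provided < MPI_THREAD_MULTIPLE)
          printf("MPI provides only thread level %d\n", provided);
      /* ... spawn pthreads that make MPI calls here ... */
      MPI_Finalize();
      return 0;
  }
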
>
>
> On Sep 5, 2012, at 2:23 PM, seshendra seshu wrote:
>
>> Hi,
>> I am learning pthreads and trying to use them in my quicksort program.
>> My problem is that I am unable to understand how to apply pthreads to the
>> data received at a node from the master. (In detail: in my program the
>> master divides the data and sends it to the slaves; each slave sorts its
>> portion of the received data independently and sends it back to the master
>> once the sorting is done. I am now having trouble implementing the pthreads
>> at the slaves, i.e. how to use pthreads to share the data among the cores
>> of each slave, sort the data, and send it back to the master.)
>> So could anyone help with this problem by providing some suggestions
>> and clues?
>>
>> Thanking you very much.
>>
>> --
>> WITH REGARDS
>> M.L.N.Seshendra
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> [email protected]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 10
> Date: Thu, 6 Sep 2012 08:05:30 -0400
> From: Jeff Squyres <[email protected]>
> Subject: Re: [OMPI users] python-mrmpi() failed
> To: Open MPI Users <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> On Sep 4, 2012, at 3:09 PM, mariana Vargas wrote:
>
>> I'm new to this. I have some codes that use MPI for Python, and I
>> just installed openmpi, mrmpi, and mpi4py in my home directory (on a
>> cluster account) without apparent errors. I tried to perform this
>> simple test in Python and I get the following error related to Open MPI;
>> could you help me figure out what is going on? I attach as much
>> information as possible...
>
> I think I know what's happening here.
>
> It's a complicated linker issue that we've discussed before -- I'm not sure
> whether it was on this users list or the OMPI developers list.
>
> The short version is that you should remove your prior Open MPI installation,
> and then rebuild Open MPI with the --disable-dlopen configure switch. See if
> that fixes the problem.
>
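A sketch of such a rebuild (the install prefix is a placeholder):

  shell$ ./configure --prefix=$HOME/openmpi-1.6 --disable-dlopen
  shell$ make all install
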
>> Thanks.
>>
>> Mariana
>>
>>
>> From a python console
>> >>> from mrmpi import mrmpi
>> >>> mr=mrmpi()
>> [ferrari:23417] mca: base: component_find: unable to open /home/
>> mvargas/lib/openmpi/mca_paffinity_hwloc: /home/mvargas/lib/openmpi/
>> mca_paffinity_hwloc.so: undefined symbol: opal_hwloc_topology (ignored)
>> [ferrari:23417] mca: base: component_find: unable to open /home/
>> mvargas/lib/openmpi/mca_carto_auto_detect: /home/mvargas/lib/openmpi/
>> mca_carto_auto_detect.so: undefined symbol:
>> opal_carto_base_graph_get_host_graph_fn (ignored)
>> [ferrari:23417] mca: base: component_find: unable to open /home/
>> mvargas/lib/openmpi/mca_carto_file: /home/mvargas/lib/openmpi/
>> mca_carto_file.so: undefined symbol:
>> opal_carto_base_graph_get_host_graph_fn (ignored)
>> [ferrari:23417] mca: base: component_find: unable to open /home/
>> mvargas/lib/openmpi/mca_shmem_mmap: /home/mvargas/lib/openmpi/
>> mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
>> [ferrari:23417] mca: base: component_find: unable to open /home/
>> mvargas/lib/openmpi/mca_shmem_posix: /home/mvargas/lib/openmpi/
>> mca_shmem_posix.so: undefined symbol: opal_show_help (ignored)
>> [ferrari:23417] mca: base: component_find: unable to open /home/
>> mvargas/lib/openmpi/mca_shmem_sysv: /home/mvargas/lib/openmpi/
>> mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
>> --------------------------------------------------------------------------
>> It looks like opal_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during opal_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> opal_shmem_base_select failed
>> --> Returned value -1 instead of OPAL_SUCCESS
>> --------------------------------------------------------------------------
>> [ferrari:23417] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
>> runtime/orte_init.c at line 79
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> ompi_mpi_init: orte_init failed
>> --> Returned "Error" (-1) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> [ferrari:23417] Local abort before MPI_INIT completed successfully;
>> not able to aggregate error messages, and not able to guarantee that
>> all other processes were killed!
>>
>>
>>
>> echo $PATH
>>
>> /home/mvargas/idl/pro/LibsSDSSS/idlutilsv5_4_15/bin:/usr/local/itt/
>> idl70/bin:/opt/local/bin:/home/mvargas/bin:/home/mvargas/lib:/home/
>> mvargas/lib/openmpi/:/home/mvargas:/home/vargas/bin/:/home/mvargas/idl/
>> pro/LibsSDSSS/idlutilsv5_4_15/bin:/usr/local/itt/idl70/bin:/opt/local/
>> bin:/home/mvargas/bin:/home/mvargas/lib:/home/mvargas/lib/openmpi/:/
>> home/mvargas:/home/vargas/bin/:/usr/lib64/qt3.3/bin:/usr/kerberos/bin:/
>> usr/local/bin:/bin:/usr/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/
>> envswitcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX64:/opt/pvm3/bin/
>> LINUX64:/opt/c3-4/
>>
>> echo $LD_LIBRARY_PATH
>> /usr/local/mpich2/lib:/home/mvargas/lib:/home/mvargas/:/home/mvargas/
>> lib64:/home/mvargas/lib/openmpi/:/usr/lib64/openmpi/1.4-gcc/lib/:/user/
>> local/:/usr/local/mpich2/lib:/home/mvargas/lib:/home/mvargas/:/home/
>> mvargas/lib64:/home/mvargas/lib/openmpi/:/usr/lib64/openmpi/1.4-gcc/
>> lib/:/user/local/:
>>
>> Version: openmpi-1.6
>>
>>
>>
>> mpirun --bynode --tag-output ompi_info -v ompi full --parsable
>> [1,0]<stdout>:package:Open MPI mvargas@ferrari Distribution
>> [1,0]<stdout>:ompi:version:full:1.6
>> [1,0]<stdout>:ompi:version:svn:r26429
>> [1,0]<stdout>:ompi:version:release_date:May 10, 2012
>> [1,0]<stdout>:orte:version:full:1.6
>> [1,0]<stdout>:orte:version:svn:r26429
>> [1,0]<stdout>:orte:version:release_date:May 10, 2012
>> [1,0]<stdout>:opal:version:full:1.6
>> [1,0]<stdout>:opal:version:svn:r26429
>> [1,0]<stdout>:opal:version:release_date:May 10, 2012
>> [1,0]<stdout>:mpi-api:version:full:2.1
>> [1,0]<stdout>:ident:1.6
>>
>>
>> eth0 Link encap:Ethernet HWaddr 00:30:48:95:99:CC
>> inet addr:192.168.2.1 Bcast:192.168.2.255 Mask:255.255.255.0
>> inet6 addr: fe80::230:48ff:fe95:99cc/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:4739875255 errors:0 dropped:1636 overruns:0 frame:0
>> TX packets:5196871012 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:4959384349297 (4.5 TiB) TX bytes:3933641883577 (3.5
>> TiB)
>> Memory:ef300000-ef320000
>>
>> eth1 Link encap:Ethernet HWaddr 00:30:48:95:99:CD
>> inet addr:128.2.116.104 Bcast:128.2.119.255 Mask:
>> 255.255.248.0
>> inet6 addr: fe80::230:48ff:fe95:99cd/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:2645952109 errors:0 dropped:13353 overruns:0 frame:0
>> TX packets:2974763570 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:2024044043824 (1.8 TiB) TX bytes:3390935387820 (3.0
>> TiB)
>> Memory:ef400000-ef420000
>>
>> lo Link encap:Local Loopback
>> inet addr:127.0.0.1 Mask:255.0.0.0
>> inet6 addr: ::1/128 Scope:Host
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:143359307 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:143359307 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:80413513464 (74.8 GiB) TX bytes:80413513464 (74.8
>> GiB)
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> <files.tar.gz>
>
>
> --
> Jeff Squyres
> [email protected]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 11
> Date: Thu, 6 Sep 2012 10:23:04 -0400
> From: Jeff Squyres <[email protected]>
> Subject: Re: [OMPI users] MPI_Cart_sub periods
> To: Open MPI Users <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=iso-8859-1
>
> John --
>
> This cartesian stuff always makes my head hurt. :-)
>
> You seem to have hit on a bona-fide bug. I have fixed the issue in our SVN
> trunk and will get the fix moved over to the v1.6 and v1.7 branches.
>
> Thanks for the report!
>
>
> On Aug 29, 2012, at 5:32 AM, Craske, John wrote:
>
>> Hello,
>>
>> We are partitioning a two-dimensional Cartesian communicator into
>> two one-dimensional subgroups. In this situation we have found
>> that both one-dimensional communicators inherit the period
>> logical of the first dimension of the original two-dimensional
>> communicator when using Open MPI. Using MPICH each
>> one-dimensional communicator inherits the period corresponding to
>> the dimensions specified in REMAIN_DIMS, as expected. Could this
>> be a bug, or are we making a mistake? The relevant calls we make in a
>> Fortran code are
>>
>> CALL MPI_CART_CREATE(MPI_COMM_WORLD, 2, (/ NDIMX, NDIMY /), (/ .True.,
>> .False. /), .TRUE.,
>> COMM_CART_2D, IERROR)
>>
>> CALL MPI_CART_SUB(COMM_CART_2D, (/ .True., .False. /), COMM_CART_X, IERROR)
>> CALL MPI_CART_SUB(COMM_CART_2D, (/ .False., .True. /), COMM_CART_Y, IERROR)
>>
>> Following these requests,
>>
>> CALL MPI_CART_GET(COMM_CART_X, MAXDIM_X, DIMS_X, PERIODS_X, COORDS_X, IERROR)
>> CALL MPI_CART_GET(COMM_CART_Y, MAXDIM_Y, DIMS_Y, PERIODS_Y, COORDS_Y, IERROR)
>>
>> will result in
>>
>> PERIODS_X = T
>> PERIODS_Y = T
>>
>> If, on the other hand we define the two-dimensional communicator
>> using PERIODS = (/ .False., .True. /), we find
>>
>> PERIODS_X = F
>> PERIODS_Y = F
>>
>> Your advice on the matter would be greatly appreciated.
>>
>> Regards,
>>
>> John.
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> [email protected]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 12
> Date: Thu, 06 Sep 2012 16:58:03 +0200
> From: Shiqing Fan <[email protected]>
> Subject: Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7
> To: Siegmar Gross <[email protected]>
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Siegmar,
>
> Glad to hear that it's working for you.
>
> The warning message is because the loopback adapter is excluded by
> default, but this adapter is actually not installed on Windows.
>
> One solution might be installing the loopback adapter on Windows. It is
> very easy and only takes a few minutes.
>
> Or it may be possible to suppress this message from within Open MPI, but
> I'm not sure how that can be done.
>
>
> Regards,
> Shiqing
>
>
> On 2012-09-06 7:48 AM, Siegmar Gross wrote:
>> Hi Shiqing,
>>
>> I have solved the problem with the double quotes in OPENMPI_HOME but
>> there is still something wrong.
>>
>> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1"
>>
>> mpicc init_finalize.c
>> Cannot open configuration file "c:\Program Files
>> (x86)\openmpi-1.6.1"/share/openmpi\mpicc-wrapper-data.txt
>> Error parsing data file mpicc: Not found
>>
>>
>> Everything is OK if you remove the double quotes which Windows
>> automatically adds.
>>
>> set OPENMPI_HOME=c:\Program Files (x86)\openmpi-1.6.1
>>
>> mpicc init_finalize.c
>> Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 16.00.40219.01 für
>> 80x86
>> ...
>>
>> mpiexec init_finalize.exe
>> --------------------------------------------------------------------------
>> WARNING: An invalid value was given for btl_tcp_if_exclude. This
>> value will be ignored.
>>
>> Local host: hermes
>> Value: 127.0.0.1/8
>> Message: Did not find interface matching this subnet
>> --------------------------------------------------------------------------
>>
>> Hello!
>>
>>
>> I get the output from my program but also a warning from Open MPI.
>> The new value for the loopback device was introduced a short time
>> ago when I have had problems with the loopback device on Solaris
>> (it used "lo0" instead of your default "lo"). How can I avoid this
>> message? The 64-bit version of my program still hangs.
>>
>>
>> Kind regards
>>
>> Siegmar
>>
>>
>>>> Could you try set OPENMPI_HOME env var to the root of the Open MPI dir?
>>>> This env is a backup option for the registry.
>>> It solves one problem but there is a new problem now :-((
>>>
>>>
>>> Without OPENMPI_HOME: Wrong pathname to help files.
>>>
>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>>> --------------------------------------------------------------------------
>>> Sorry! You were supposed to get help about:
>>> invalid if_inexclude
>>> But I couldn't open the help file:
>>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt:
>>> No such file or directory. Sorry!
>>> --------------------------------------------------------------------------
>>> ...
>>>
>>>
>>>
>>> With OPENMPI_HOME: It nearly uses the correct directory. Unfortunately
>>> the pathname contains the character " in the wrong place so that it
>>> couldn't find the available help file.
>>>
>>> set OPENMPI_HOME="c:\Program Files (x86)\openmpi-1.6.1"
>>>
>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>>> --------------------------------------------------------------------------
>>> Sorry! You were supposed to get help about:
>>> no-hostfile
>>> But I couldn't open the help file:
>>> "c:\Program Files
>>> (x86)\openmpi-1.6.1"\share\openmpi\help-hostfile.txt: Invalid argument.
>>> Sorry
>>> !
>>> --------------------------------------------------------------------------
>>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
>>> ..\..\openmpi-1.6.1\orte\mca\ras\base
>>> \ras_base_allocate.c at line 200
>>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
>>> ..\..\openmpi-1.6.1\orte\mca\plm\base
>>> \plm_base_launch_support.c at line 99
>>> [hermes:04964] [[12187,0],0] ORTE_ERROR_LOG: Not found in file
>>> ..\..\openmpi-1.6.1\orte\mca\plm\proc
>>> ess\plm_process_module.c at line 996
>>>
>>>
>>>
>>> It looks like the environment variable can also solve my
>>> problem in the 64-bit environment.
>>>
>>> D:\g...\prog\mpi\small_prog>mpicc init_finalize.c
>>>
>>> Microsoft (R) C/C++-Optimierungscompiler Version 16.00.40219.01 für x64
>>> ...
>>>
>>>
>>> The process hangs without OPENMPI_HOME.
>>>
>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>>> ^C
>>>
>>>
>>> With OPENMPI_HOME:
>>>
>>> set OPENMPI_HOME="c:\Program Files\openmpi-1.6.1"
>>>
>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>>> --------------------------------------------------------------------------
>>> Sorry! You were supposed to get help about:
>>> no-hostfile
>>> But I couldn't open the help file:
>>> "c:\Program Files\openmpi-1.6.1"\share\openmpi\help-hostfile.txt:
>>> Invalid argument. S
>>> orry!
>>> --------------------------------------------------------------------------
>>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
>>> ..\..\openmpi-1.6.1\orte\mc
>>> a\ras\base\ras_base_allocate.c at line 200
>>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
>>> ..\..\openmpi-1.6.1\orte\mc
>>> a\plm\base\plm_base_launch_support.c at line 99
>>> [hermes:05248] [[10367,0],0] ORTE_ERROR_LOG: Not found in file
>>> ..\..\openmpi-1.6.1\orte\mc
>>> a\plm\process\plm_process_module.c at line 996
>>>
>>>
>>> At least the program doesn't block any longer. Do you have any ideas
>>> how this new problem can be solved?
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>
>>>
>>>> On 2012-09-05 1:02 PM, Siegmar Gross wrote:
>>>>> Hi Shiqing,
>>>>>
>>>>>>>> D:\...\prog\mpi\small_prog>mpiexec init_finalize.exe
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> Sorry! You were supposed to get help about:
>>>>>>>> invalid if_inexclude
>>>>>>>> But I couldn't open the help file:
>>>>>>>>
>>>>>>>> D:\...\prog\mpi\small_prog\..\share\openmpi\help-mpi-btl-tcp.txt:
>>>>>>>> No such file or directory. Sorry!
>>>>>>>> ---------------------------------------------------------------------
>>>>>>> ...
>>>>>>>> Why does "mpiexec" look for the help file relative to my current
>>>>>>>> program and not relative to itself? The file is part of the
>>>>>>>> package.
>>>>>>> Do you know how I can solve this problem?
>>>>>> I have similar issue with message from tcp, but it's not finding the
>>>>>> file, it's something else, which doesn't affect the execution of the
>>>>>> application. Could you make sure the help-mpi-btl-tcp.txt is actually in
>>>>>> the path D:\...\prog\mpi\small_prog\..\share\openmpi\?
>>>>> That wouldn't be a good idea because I have MPI programs in different
>>>>> directories so that I would have to install all help files in several
>>>>> places (<my_directory>/../share/openmpi/help*.txt). All help files are
>>>>> available in the installation directory of Open MPI.
>>>>>
>>>>> dir "c:\Program Files (x86)\openmpi-1.6.1\bin\mpiexec.exe"
>>>>> ...
>>>>> 29.08.2012 10:59 38.912 mpiexec.exe
>>>>> ...
>>>>> dir "c:\Program Files
>>>>> (x86)\openmpi-1.6.1\bin\..\share\openmpi\help-mpi-btl-tcp.txt"
>>>>> ...
>>>>> 03.04.2012 16:30 631 help-mpi-btl-tcp.txt
>>>>> ...
>>>>>
>>>>> I don't know if "mpiexec" or my program "init_finalize" is responsible
>>>>> for the error message but whoever is responsible shouldn't use the path
>>>>> to my program but the prefix_dir from MPI to find the help files. Perhaps
>>>>> you can change the behaviour in the Open MPI source code.
>>>>>
>>>>>
>>>>>>>> I can also compile in 64-bit mode but the program hangs.
>>>>>>> Do you have any ideas why the program hangs? Thank you very much for any
>>>>>>> help in advance.
>>>>>> To be honest I don't know. I couldn't reproduce it. Did you try
>>>>>> installing the binary installer, will it also behave the same?
>>>>> I like to have different versions of Open MPI which I activate via
>>>>> a batch file so that I can still run my program in an old version if
>>>>> something goes wrong in a new one. I have no entries in the system
>>>>> environment or registry so that I can even run different versions in
>>>>> different command windows without problems (everything is only known
>>>>> within the command window in which I have run my batch file). It seems
>>>>> that you put something in the registry when I use your installer.
>>>>> Perhaps you remember an earlier email where I had to uninstall an old
>>>>> version because the environment in my own installation was wrong
>>>>> as long as your installation was active. Nevertheless I can give it
>>>>> a try. Perhaps I find out if you set more than just the path to your
>>>>> binaries. Do you know if there is something similar to "truss" or
>>>>> "strace" in the UNIX world so that I can see where the program hangs?
>>>>> Thank you very much for your help in advance.
>>>>>
>>>>>
>>>>> Kind regards
>>>>>
>>>>> Siegmar
>>>>>
>>>>
>>>> --
>>>> ---------------------------------------------------------------
>>>> Shiqing Fan
>>>> High Performance Computing Center Stuttgart (HLRS)
>>>> Tel: ++49(0)711-685-87234 Nobelstrasse 19
>>>> Fax: ++49(0)711-685-65832 70569 Stuttgart
>>>> http://www.hlrs.de/organization/people/shiqing-fan/
>>>> email: [email protected]
>>>>
>>>
>>
>
>
> --
> ---------------------------------------------------------------
> Shiqing Fan
> High Performance Computing Center Stuttgart (HLRS)
> Tel: ++49(0)711-685-87234 Nobelstrasse 19
> Fax: ++49(0)711-685-65832 70569 Stuttgart
> http://www.hlrs.de/organization/people/shiqing-fan/
> email: [email protected]
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 2345, Issue 1
> **************************************