Re: [OMPI users] OpenMPI installation issue or mpi4py compatibility problem

2017-05-21 Thread Reuti
_LIBRARY_PATH differently I don't think that Ubuntu will do anything different than any other Linux. Did you compile Open MPI on your own, or did you install it from a repository? Are the CUDA applications written by yourself, or are they freely available applications? -- Reuti > and instead add

Re: [OMPI users] OpenMPI installation issue or mpi4py compatibility problem

2017-05-22 Thread Reuti
s. > Regarding the final part of the email, is it a problem that 'undefined > reference' is appearing? Yes, it tried to resolve missing symbols and didn't succeed. -- Reuti > > Thanks and regards, > Tim > > On 22 May 2017 at 06:54, Reuti wrote: > >&

Re: [OMPI users] OpenMPI installation issue or mpi4py compatibility problem

2017-05-23 Thread Reuti
Hi, On 23.05.2017 at 05:03 Tim Jim wrote: > Dear Reuti, > > Thanks for the reply. What options do I have to test whether it has > successfully built? Like before: can you compile and run mpihello.c this time – all as an ordinary user in case you installed the Open MPI into so
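For reference, a minimal sketch of such a check, assuming the Open MPI wrapper compiler is in the PATH and mpihello.c is a plain MPI hello-world source:

$ mpicc mpihello.c -o mpihello
$ mpirun -np 2 ./mpihello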

Re: [OMPI users] No components were able to be opened in the pml framework

2017-05-30 Thread Reuti
chemistry > program. Did you compile Open MPI on your own? Did you move it after the installation? -- Reuti ___ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] No components were able to be opened in the pml framework

2017-05-30 Thread Reuti
64? 2) do you source .bashrc also for interactive logins? Otherwise it should go to ~/.bash_profile or ~/.profile > > > On Tue, 30.5.17, Reuti wrote: > > Subject: Re: [OMPI users] No components were able to be opened in the pml &
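A common sketch for this, assuming bash as the login shell: let the login-shell startup file pull in ~/.bashrc, so interactive logins and remote shells see the same environment:

# ~/.bash_profile
test -f ~/.bashrc && . ~/.bashrc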

Re: [OMPI users] No components were able to be opened in the pml framework

2017-05-30 Thread Reuti
is an additional point: which one? It might be that you have to put the two exports of PATH and LD_LIBRARY_PATH in your jobscript instead (see the sketch below), if you never want to run the application from the command line in parallel. -- Reuti > > On Tue,
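A minimal jobscript sketch along these lines; the PE name orte and the installation prefix /opt/openmpi are placeholders:

#!/bin/sh
#$ -pe orte 8
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
mpirun -np $NSLOTS ./a.out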

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti
t; > qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out m_mem_free is part of Univa SGE (but not the various free ones of SGE AFAIK). Also: this syntax is for SGE, in LSF it's different. To have this independent from the actual queuing system, one could look into DR

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti
e their headers installed on it. Then configure OMPI > --with-xxx pointing to each of the RM’s headers so all the components get > built. When the binary hits your customer’s machine, only those components > that have active libraries present will execute. Just note, th

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti
a string in the environment variable, you may want to use the plain value in bytes there. -- Reuti ___ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti
rsions. So I can't comment on this for sure, but it seems to set the memory also in cgroups. -- Reuti > mpirun just uses the nodes that SGE provides. > > What your cmd line does is restrict the entire operation on each node (daemon > + 8 procs) to 40GB of memory. OMPI

Re: [OMPI users] Questions about integration with resource distribution systems

2017-08-01 Thread Reuti
arently > never propagated through remote startup, Isn't it a setting inside SGE which the sge_execd is aware of? I never exported any environment variable for this purpose. -- Reuti > so killing those orphans after > VASP crashes may fail, though resource reporting works. (I ne

Re: [OMPI users] Setting LD_LIBRARY_PATH for orted

2017-08-21 Thread Reuti
> How do I get around this cleanly? This works just fine when I set > LD_LIBRARY_PATH in my .bashrc, but I’d rather not pollute that if I can avoid > it. Do you set or extend the LD_LIBRARY_PATH in your .bashrc? -- Reuti ___ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Setting LD_LIBRARY_PATH for orted

2017-08-22 Thread Reuti
IBRARY_PATH > > this is the easiest option, but cannot be used if you plan to relocate the > Open MPI installation directory. There is the tool `chrpath` to change the rpath and runpath inside a binary/library. This has to match the relocated directory then. -- Reuti > an other
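A sketch of the two chrpath calls (paths are placeholders); note that the replacement path cannot be longer than the one already stored in the binary:

$ chrpath -l /new/prefix/bin/mpirun
$ chrpath -r /new/prefix/lib /new/prefix/bin/mpirun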

[OMPI users] Honor host_aliases file for tight SGE integration

2017-09-13 Thread Reuti
obscript to feed an "adjusted" $PE_HOSTFILE to Open MPI and then it's working as intended: Open MPI creates forks. Does anyone else need such a patch in Open MPI and is it suitable to be included? -- Reuti PS: Only the headnodes have more than one network interface in our cas

[OMPI users] SGE integration when getting slots from different queues on one and the same host mismatch

2010-08-10 Thread Reuti
list. -- Reuti

Re: [OMPI users] compilation error using pgf90 ver 9.0

2010-09-01 Thread Reuti
n package with pgcc/pgCC too (then it succeeds for me)? Although I think it's a bug, I tend to stay with one and the same compiler vendor for the whole build anyway. -- Reuti NB: Latest version is 1.4.2. > I appreciate if anyone can help me. > > Best Regard >

Re: [OMPI users] compiler upgrades require openmpi rebuild?

2010-09-03 Thread Reuti
> as simple as changing your LD_LIBRARY_PATH, but it might not. What's inside the binaries (like mpirun) can be checked with: $ readelf -a mpirun ... 0x000000000000000f (RPATH) Library rpath: [/opt/pgi/linux86-64/9.0-4/lib] 0x000000000000001d (RUNPATH) Library run

Re: [OMPI users] Thread as MPI process

2010-09-21 Thread Reuti
ot; or "linux threads" then no, you cannot have different > threads on different nodes under any programming paradigm. There are some efforts like http://www.kerrighed.org/wiki/index.php/Main_Page, but for the current release the thread migration is indeed disabled. -- Reuti > Ho

Re: [OMPI users] Thread as MPI process

2010-09-21 Thread Reuti
w you to transfer information between nodes and start worker processes: http://www.netlib.org/pvm3/book/node17.html Looks like PVM is no longer included in Pelican_HPC by default, but you can compile it on your own. -- Reuti > This is what i've tried to explain in the last msg. A dream for

Re: [OMPI users] Shared memory

2010-09-24 Thread Reuti
index.php/Main_Page). Wasn't this also one idea behind "High Performance Fortran" - running in parallel across nodes even without knowing that it's across nodes at all while programming, and accessing all data as if it were local. -- Reuti > ___

Re: [OMPI users] how to tell if opempi is using rsh or ssh

2010-09-30 Thread Reuti
ed passphraseless keys, which I would suggest replacing with hostbased authentication anyway). -- Reuti > Does this setting fail over to ssh if rsh is not available, or should it just > use rsh only??? Also is there any command > (this is a linux cluster) to see if ssh is being used. I
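One way to avoid guessing is to set the agent explicitly; a sketch, assuming an Open MPI of that era (the parameter also appears as orte_rsh_agent in some releases):

$ mpirun --mca plm_rsh_agent rsh -np 4 ./a.out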

Re: [OMPI users] STDIN

2010-10-03 Thread Reuti
t file, not the name of the input file (maybe the web page you mention describes MPICH(1)). -- Reuti > Is there a solution for our problem? > > Regards, > Andrei > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
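A sketch of the distinction, assuming ./a.out reads its input from stdin; the file's content is redirected, its name is never passed:

$ mpirun -np 4 ./a.out < input.dat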

Re: [OMPI users] memory limits on remote nodes

2010-10-07 Thread Reuti
and it's per slot then; and with a tight integration all Open MPI processes will be children of sge_execd and the limit will be enforced. -- Reuti > The nodes of our cluster each have 24GB of physical memory, of > which 4GB is taken up by the kernel and the root file system. > Not
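A hedged example of such a per-slot request (PE name and sizes are placeholders); with 8 slots granted on a host, the enforced total there would be 8 x 2G:

$ qsub -pe orte 8 -l h_vmem=2G job.sh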

Re: [OMPI users] memory limits on remote nodes

2010-10-08 Thread Reuti
On 08.10.2010 at 00:40 Ralph Castain wrote: > > On Oct 7, 2010, at 2:55 AM, Reuti wrote: > >> On 07.10.2010 at 01:55 David Turner wrote: >> >>> Hi, >>> >>> We would like to set process memory limits (vmemoryuse, in csh >>> ter

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-12 Thread Reuti
e the fact that all 4 forks are bound to one core. Should it really be four? -- Reuti > __ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-14 Thread Reuti
Hi, On 14.10.2010 at 13:23 Dave Love wrote: > Reuti writes: > >> With the binding_instance set to "set" (the default) the >> shepherd should bind the processes to cores already. With other types >> of binding_instance these selected cores must be fo

Re: [OMPI users] dinamic spawn process on remote node

2010-10-22 Thread Reuti
just to monitor the physical and virtual machines by an application running under MPI? It sounds like it could be done by Ganglia or Nagios then. -- Reuti > Please, Can You help me with architecture of this system (is my thoughts > right) ? > And one more qeustion - that is the bes

Re: [OMPI users] dinamic spawn process on remote node

2010-10-22 Thread Reuti
On 22.10.2010 at 14:09 Vasiliy G Tolstov wrote: > On Fri, 2010-10-22 at 14:07 +0200, Reuti wrote: >> Hi, >> >> On 22.10.2010 at 10:58 Vasiliy G Tolstov wrote: >> >>> Hello. Maybe this question was already answered, but I can't see it in the list >>>

Re: [OMPI users] Help passing filename arguments with spaces through mpirun on windows

2010-10-26 Thread Reuti
gument list from the program, it has split the > argument into two (or more). > > I have also enclosed the arguments in quotes, but that doesn't seem to > help. I don't know for Windows, but sometimes it can help to use single and double quotes in combination: '"C:\

Re: [OMPI users] Using hostfile with default hostfile

2010-10-27 Thread Reuti
hosts > ./hello > -- > There are no allocated resources for the application > ./hello > that match the requested mapping: > ../Cluster.hosts what's in the above file? Do you run by hand in a cluster or directed by a queuing system? -- Reuti > Verify tha
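For comparison, a plain Open MPI hostfile would look like this (hostnames are placeholders):

node01 slots=4
node02 slots=4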

Re: [OMPI users] mixed versions of openmpi ? (1.4.1 and 1.4.3)

2010-10-29 Thread Reuti
you mention below. So they should work together by design. -- Reuti > Open MPI makes an ABI promise (that started with version 1.3.2) that all the > releases in a given feature series and its corresponding super-stable series > (i.e., x.y.* and x.(y+1).*, where y is odd) are ABI compat

Re: [OMPI users] mixed versions of openmpi ? (1.4.1 and 1.4.3)

2010-10-29 Thread Reuti
On 29.10.2010 at 18:47 Jeff Squyres wrote: > On Oct 29, 2010, at 12:40 PM, Reuti wrote: > >>> I'd have to go check 1.4.3 and 1.4.1 to be sure, but I would generally >>> *NOT* assume that different versions like this are compatible. >> >> I'm g

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-14 Thread Reuti
pe_hostfile. Here the logical core and socket numbers are printed (they start at 0 and have no holes) in colon-separated pairs (i.e. 0,0:1,0, which means core 0 on socket 0 and core 0 on socket 1). For more information about the $pe_hostfile check ge_pe(5)
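A sketch of one such $PE_HOSTFILE line with a binding column (hostname and queue are placeholders); the last field is the socket,core pair list described above:

node01 2 all.q@node01 0,0:1,0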

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Reuti
binding_instance" "pe" and reformat the information in the $PE_HOSTFILE to a "rankfile", it should work to get the desired allocation. Maybe you can share the script with this list once you got it working. -- Reuti
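A minimal rankfile sketch in the syntax of that era (hostnames and the socket:core slots are placeholders), to be passed to mpirun with -rf:

rank 0=node01 slot=0:0
rank 1=node01 slot=0:1
rank 2=node02 slot=0:0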

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Reuti
he -binding pe linear:1, each > execution node binds processes for the job_id to one core. If I have > -binding pe linear:2, I get: > > exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2 So the cores 1 and 2 on socket 0 aren't free? -- Reuti > exec1.

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Reuti
ocesses on each slave node are threads and don't invoke an additional `qrsh -inherit ...`. If you have only one MPI process per node it's working fine? -- Reuti > Cheers, > > Chris > > > -- > Dr Chris Jewell > Department of Statistics > University of Wa

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Reuti
er.stats.local > 0,1:0,2 > exec3.cluster.stats.local 1 > batch.q@exec3.cluster.stats.local > 0,1:0,2 > exec2.cluster.stats.local 1 > batch.q@exec2.cluster.stats.local > 0,1:0,2 > exec5.cluster.stats.local 1 > batch.q@exec5.cluster.stats.local > 0,1:0,2 > > Is

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Reuti
machine if possible. - possibly only 3, when only 3 slots are granted on a machine - you will never ever get more than 4 slots per machine, i.e. it's an upper limit for slots per machine for this particular job -- Reuti > > --td > >> -- Reuti >> >> >>

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Reuti
Correction: On 15.11.2010 at 20:23 Terry Dontje wrote: > On 11/15/2010 02:11 PM, Reuti wrote: >> Just to give my understanding of the problem: >> >> On 15.11.2010 at 19:57 Terry Dontje wrote: >> >> >>> On 11/15/2010 11:08 AM, Chris Jewell wrote: &

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Reuti
On 16.11.2010 at 10:26 Chris Jewell wrote: > Hi all, > >> On 11/15/2010 02:11 PM, Reuti wrote: >>> Just to give my understanding of the problem: >>>> >>>>>> Sorry, I am still trying to grok all your email as what the problem you >>&g

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Reuti
rd SGE option today (at least, I know > it used to be). I don't believe any patch or devel work is required (to > either SGE or OMPI). When you use a fixed allocation_rule and a matching -binding request it will work today. But any other case won't be distributed in the correct w

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Reuti
n we get a full output of such a run with -report-bindings turned on. I > think we should find out that things actually are happening correctly except > for the fact that the 6 of the nodes have 2 cores allocated but only one is > being bound to by a process. You mean, to accept the current behavior as the intended one, since with only one job running on these machines we finally get what we asked for - despite the fact that cores are lost for other processes? -- Reuti

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Reuti
On 16.11.2010 at 17:35 Terry Dontje wrote: > On 11/16/2010 10:59 AM, Reuti wrote: >> On 16.11.2010 at 15:26 Terry Dontje wrote: >> >> >>>>> >>>>> 1. allocate a specified number of cores on each node to your job >>>>> >&g

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Reuti
lf unconstrained. > If I understand the thread correctly, you're saying that this is what happens > today - true? Exactly. It won't apply any binding at all, and orted would consider itself unlimited, i.e. limited only by the number of slots it should use thereon. -- Reuti

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-18 Thread Reuti
And a second one "limit_cores_by_slot_count true/false" instead of new allocation_rules. To choose $fillup, $round_robin or others is independent from limiting it IMO. -- Reuti > In summary I don't believe there is any OMPI bugs related to what we've seen > and the OGE i

Re: [OMPI users] glut display 'occasionally' opens

2010-12-08 Thread Reuti
on a Rocks cluster. Any >> ideas on how to debug this will be greatly appreciated. Is `xterm` working when you start it with `mpirun` on the nodes? Besides defining -X all the time, you can also put it in ~/.ssh/config in a "Host *" rule. SSH authentication is hostbased (or
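A sketch of the mentioned rule in ~/.ssh/config, so that -X need not be given on every call:

Host *
  ForwardX11 yes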

Re: [OMPI users] Using (or not using) Torque/Moab under PBS Pro as the OpenMPI launcher

2010-12-18 Thread Reuti
ostfile $PBS_NODEFILE \ >./testmom Often it's sufficient to unset some of the environment variables during execution to switch off the automatic integration, i.e. unsetting $PBS_JOBID and $PBS_ENVIRONMENT in the jobscript could do it already. -- Reuti > Brock Palen > www.umic
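A sketch of this at the top of the jobscript (testmom as in the quoted command):

unset PBS_JOBID
unset PBS_ENVIRONMENT
mpirun --hostfile $PBS_NODEFILE ./testmom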

Re: [OMPI users] bash: orted: command not found despite env vars being set

2011-01-24 Thread Reuti
PATH=/usr/lib/openmpi/bin${PATH:+:$PATH} export LD_LIBRARY_PATH=/usr/lib/openmpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} -- Reuti > Currently, mpirun executes successfully on either node individually. > However, when trying to run over the network, I get: > > [mpiuser@c-199 ~]$ mpirun

Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Reuti
the nodes. When some of the parallel tasks are started on the nodes, they will get most of the computing time (this means: oversubscription by intention). The background queue can be used for less important jobs. Such a setup is useful when your parallel application isn't running in parallel all the

Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Reuti
Am 25.01.2011 um 20:10 schrieb Will Glover: > Thanks for your response, Reuti. Actually I had seen you mention the SGE > mailing list in response to a similar question but I can't for the life of me > find that list :( The list was removed with the shutdown of the open source

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Reuti
queuing system this happens sometimes). What you can try is: $ mpirun -np 2 --debug ./a.out a b "'c d'" $ mpirun -np 2 --debug ./a.out a b "\"c d\"" -- Reuti > and not > > a b "c d" > > I think there is an issue in parsing the

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Reuti
ame happens with single quotes inside double quotes? -- Reuti > and not: > > a > b > "c d" > > 2011/1/27 Reuti > Hi, > > Am 27.01.2011 um 09:48 schrieb Gabriele Fatigati: > > > Dear OpenMPI users and developers, > > > > i'm usi

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
a failed rank and act appropriately on its own? Having a true ability to survive a dying process (i.e. rank) which might already have been computing for hours would mean having some kind of "rank RAID" or "rank Parchive". E.g. start 12 ranks when you need 10 - whatever 2 ranks

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
On 27.01.2011 at 16:10 Joshua Hursey wrote: > > On Jan 27, 2011, at 9:47 AM, Reuti wrote: > >> On 27.01.2011 at 15:23 Joshua Hursey wrote: >> >>> The current version of Open MPI does not support continued operation of an >>> MPI application after pro

Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?

2011-02-01 Thread Reuti
Open MPI, compile it to build into e.g. ~/local/openmpi-1.4.3, adjust your PATHs and you are done. Unless you build it with static libraries, it might in addition be necessary to adjust LD_LIBRARY_PATH at runtime. Most often I use my own version on the clusters I have access to and disregard any i

Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)

2011-02-10 Thread Reuti
will avoid setting any known_hosts file or passphraseless ssh-keys for each user. -- Reuti > HostName domU-12-31-39-07-35-21 > BatchMode yes > IdentityFile /home/tsakai/.ssh/tsakai > ChallengeResponseAuthentication no > IdentitiesOnly yes > > # machine B

Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)

2011-02-10 Thread Reuti
hostkeys (private and public), this way it works for all users. Just for reference: http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html You could look into it later. == - Can you try to use a command when connecting from A to B? E.g. ssh domU-12-31-39-06-74-E2 ls. Is this working too? - Wha

Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)

2011-02-10 Thread Reuti
Hi, On 10.02.2011 at 22:03 Tena Sakai wrote: > Hi Reuti, > > Thanks for suggesting "LogLevel DEBUG3." I did so and complete > session is captured in the attached file. > > What I did is much similar to what I have done before: verify > that ssh works and th

Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)

2011-02-14 Thread Reuti
2011 from vixen.egcrc.org > [tsakai@dasher ~]$ > [tsakai@dasher ~]$ cd Notes/R/parallel/Rmpi/ > [tsakai@dasher Rmpi]$ > [tsakai@dasher Rmpi]$ mpirun -app app.ac3 > mpirun: killing job... a) can you ssh from dasher to vixen? b) firewall on vixen? -- Reuti >

Re: [OMPI users] Shared Memory Problem.

2011-03-26 Thread Reuti
of your source and execution. -- Reuti > 2011/3/26 Ralph Castain > Can you update to a more recent version? That version is several years > out-of-date - we don't even really support it any more. > > > On Mar 26, 2011, at 1:04 PM, Michele Marena wrote: > >> Yes

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
east under Torque. > > Somebody is playing an April Fools joke on you. The majority of > supercomputers use ssh as their sole launch mechanism, and I have seen no > indication that anyone intends to change that situation. That said, Torque is > certainly popular and a good environment.

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
up, each daemon monitors mpirun's existence. So Torque > only knows about mpirun, and Torque kills mpirun when (e.g.) walltime is > reached. OMPI's daemons see that mpirun has died and terminate their local > processes prior to terminating themselves. I thought Open MPI has a

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
r each call Problems: 1) control `ssh` under Torque 2) provide a partial hostlist to `mpirun`, maybe by disabling the default tight integration -- Reuti > A simple example: > > vayu1:~/MPI > qsub -lncpus=24,vmem=24gb,walltime=10:00 -wd -I > qsub: waiting for job 574900.vu-pbs t

Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Reuti
On 05.04.2011 at 11:11 SLIM H.A. wrote: > After an upgrade of our system I receive the following error message > (openmpi 1.4.2 with gridengine): Did you move openmpi 1.4.2 to a new (i.e. different) location? -- Reuti &g

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Reuti
each process on each cluster node. > > I cannot use ssh to access each node. What about submitting another job with `mpirun ... ps -e f` or alike - in case you can request the same nodes? Can you `qrsh` to a node by the queuing system? -- Reuti > The program takes 8 hours to fi

Re: [OMPI users] Over committing?

2011-04-13 Thread Reuti
e 8 is the maximum IBM offers from their datasheet, and still you can request 16 per node? Can it be a memory problem? -- Reuti > We have also tried test jobs on 8+7 (or 7+8) with inconclusive results. > Some of the live jobs run for a month or more and cut down versions do > not model

Re: [OMPI users] Try to submit OMPI job to SGE gives ERRORS (orte_plm_base_select failed & orte_ess_set_name failed)

2011-04-15 Thread Reuti
could run and > completed successfully > > qsub -q dev.q ./ompi_job.sh Then you are bypassing SGE’s slot allocation and will have wrong accounting and no job control of the slave tasks. -- Reuti > 4) Although I don't think PATH and LD_LIBRARY_PATH would cause issues in > ub

Re: [OMPI users] Condor and MPI

2011-04-15 Thread Reuti
nfiguration > of MPI with Condor. > If you know any person who uses Condor for running MPI jobs then please let > me know. Is the use of Open MPI supported by Condor? In former times they had a special universe for MPICH(1) and only for an older version to run parallel jobs under Condor. Di

Re: [OMPI users] Try to submit OMPI job to SGE gives ERRORS (orte_plm_base_select failed & orte_ess_set_name failed) (Reuti)

2011-04-15 Thread Reuti
libraries in your jobscript? Does the output of: #!/bin/sh which mpiexec echo $LD_LIBRARY_PATH ldd ompi_job show the expected ones (ompi_job is the binary and ompi_job.sh the script) when submitted with a PE request? -- Reuti > jsv_url none > jsv_allowed_mod

Re: [OMPI users] Try to submit OMPI job to SGE gives ERRORS (orte_plm_base_select failed & orte_ess_set_name failed) (Reuti)

2011-04-16 Thread Reuti
All stuff below looks fine. You can even try to start "from scratch" with a private copy of Open MPI which you install for example in $HOME/local/openmpi-1.4.3 and set the paths accordingly. -- Reuti > #!/bin/sh > which mpiexec > echo $LD_LIBRARY_PATH > ldd ompi_job >

Re: [OMPI users] Try to submit OMPI job to SGE gives ERRORS (orte_plm_base_select failed & orte_ess_set_name failed) (Reuti)

2011-04-18 Thread Reuti
t works. > > I want to confirm one more thing: does SGE's master host need to have OpenMPI > installed? Is it relevant? In principle: no. But often it's installed too, as you will compile on either the master machine or a dedicated login server. -- Reuti > Many thanks Re

Re: [OMPI users] mpirun unsuccessful when run across multiple nodes

2011-04-18 Thread Reuti
.ac.uk/SGE/howto/hostbased-ssh.html , or enable `rsh` on the machines and tell Open MPI to use it. Is: mpiexec hostname giving you a list of the involved machines? -- Reuti > Thanks a lot, > Regards, > ArchyGU > Nanyang Technological University > _

Re: [OMPI users] mpirun unsuccessful when run across multiple nodes

2011-04-19 Thread Reuti
Good, then please supply a hostfile with the names of the machines you want to use for a particular run and give it as an option to `mpiexec`. See the options -np and -machinefile. -- Reuti On 19.04.2011 at 06:38 mohd naseem wrote: > sir > when i give mpiexec hostname command. > it only
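A sketch of the suggested invocation; myhosts is a placeholder file with one machine name per line:

$ cat myhosts
node01
node02
$ mpiexec -np 4 -machinefile myhosts ./a.out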

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Reuti
run bash -c "$SCRIPT" >& "$out" &, and with "mpi" it will do the same > with 'mpirun -np 1' prepended. The output I get is: what about: ( trap "" sigint; exec mpiexec ...) & i.e. replace the subshell with changed interrupt

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Reuti
o I stop it > doing so? I get: $ cat /proc/31619/status ... SigCgt: 4b813efb ... $ trap '' int $ cat /proc/31619/status ... SigCgt: 4b813ef9 ... $ trap '' hup $ cat /proc/31619/status ... SigCgt: 4b813ef8 Looks like SIGINT(2) is bit 1 and likewise SIGH

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Reuti
t- will cause ctrl-c to be sent to -both- > processes. What about setsid and pushing it into a new session instead of using & in the script? -- Reuti > > At least when I test it, even non-mpirun processes will abort. > >> it's not unreasonable to expect to be able

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Reuti
B, the working script looks like: >> >> setsid bash -c "mpirun command>& out"& >> tail -f out >> > > Yes - but now you can't kill mpirun when something goes wrong You can still send a sigint from the command line to the mpirun process o

Re: [OMPI users] Error using hostfile

2011-07-07 Thread Reuti
tication: http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html You have the same users on all machines with the same UID and GID? -- Reuti > mpirun --prefix /usr/local/openmpi1.4.3 -np 4 --hostfile hostfile hello > > > > Copied below is the output. How d

Re: [OMPI users] Does Oracle Cluster Tools aka Sun's MPI work with LDAP?

2011-07-15 Thread Reuti
S. in both parts of the cluster, me (login marked as x here) can login >> to any node by ssh without need to type the password. From the headnode of the cluster to a node, or also between nodes? -- Reuti >> >> >> >> --

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Reuti
ient? Does the -bynode > switch imply only one slot is used on each node before it moves on to the > next? Do I get it right: inside the slots granted by SGE you want the allocation inside Open MPI to follow a specific pattern, i.e.: which rank is where?
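For illustration, a hedged example with the option in question: given two nodes with four slots each, this places ranks round-robin across the nodes instead of filling the first node completely:

$ mpirun -np 8 -bynode ./a.out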

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Reuti
On 26.07.2011 at 21:51 Ralph Castain wrote: > > On Jul 26, 2011, at 1:39 PM, Reuti wrote: > >> Hi, >> >> On 26.07.2011 at 21:19 Lane, William wrote: >> >>> I can successfully run the MPI testcode via OpenMPI 1.3.3 on less than 87 &

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-27 Thread Reuti
On 27.07.2011 at 19:43 Lane, William wrote: > Thank you for your help Ralph and Reuti, > > The problem turned out to be the number of file descriptors was insufficient. > > The reason given by a sys admin was that since SGE isn't a user it wasn't > initially us

Re: [OMPI users] Open MPI via SSH noob issue

2011-08-09 Thread Reuti
). > > ring_c was compiled separately on each computer, however both have the same > version of openmpi and OSX. I've gone through the FAQ and searched the user > forum, but I can't quite seems to get this problem unstuck. do you have any firewall on the machines? -- Reuti

Re: [OMPI users] How to add nodes while running job

2011-08-29 Thread Reuti
that code > has hit a distribution yet. Can you please point me to these projects? I was always wondering how to phrase it in a submission request. It would need to include specifying: I need 2 hrs 2 cores, then 30 minutes 1 core and finally 6 hrs 4 cores, which targets already features of a real-ti

Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Reuti
d of the user before you submit the job? In GridEngine you can specify whether the actual group id should be used for the job, or the default login id. Having a tight integration, also the slave processes will run with the same group id. -- Reuti > Ed > > From: Ralph Castain [mailto:

Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-08 Thread Reuti
Torque I can't make any definite statement for it. Are you resetting some variables inside the job script to let it run outside Torque, i.e. without tight integration? -- Reuti > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On >

Re: [OMPI users] Question on using rsh

2011-09-13 Thread Reuti
15 >> >> bin/mpirun --machinefile mpihosts.dat -np 16 -mca orte_rsh_agent >> /usr/bin/rsh./test_setup Is it a typo or copy/paste error: there is a blank missing between /usr/bin/rsh and ./test_setup - Reuti >> bloscel@f8312's password: &

Re: [OMPI users] mpiexec option for node failure

2011-09-13 Thread Reuti
i-forum.org/mpi3-ft/ which covers fault tolerance. I was pointed to it here http://www.open-mpi.org/community/lists/users/2011/01/15440.php -- Reuti > On Sep 12, 2011, at 5:52 PM, Rob Stewart wrote: > >> Hi, >> >> I have implemented a simple fault tolerant ping pong

Re: [OMPI users] Problem running under SGE

2011-09-13 Thread Reuti
d/or do local configurations exist overriding this (qconf -sconfl)? -- Reuti > > Thanks for any guidance, > > Ed > > > error: executing task of job 139362 failed: execution daemon on host "f8312" > didn't accept task > -

Re: [OMPI users] Problem running under SGE

2011-09-13 Thread Reuti
failed: execution daemon on host "f8312" > didn't accept task Did you supply a machinefile on your own? In a proper SGE integration it's running in a parallel environment. Did you define and request one? The error looks li
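A sketch of a proper submission, assuming a parallel environment named orte has been defined (qconf -sp orte shows its definition) and attached to the queue:

$ qsub -pe orte 16 job.sh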

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
but not the launcher). Did this change? I thought you need --with-sge to get SGE support, as it's no longer the default since 1.3? -- Reuti > > Try setting the following: > > -mca plm_rsh_disable_qrsh 1 > > on your cmd line. That should force it to avoid qrsh, and use rsh ins

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
ll is fine, the job script is already the first task and TRUE should work. -- Reuti > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Reuti > Sent: Tuesday, September 13, 2011 4:27 PM > To: Open MPI Users > Sub

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
On 14.09.2011 at 00:25 Blosch, Edwin L wrote: > Your comment guided me in the right direction, Reuti. And overlapped with > your guidance, Ralph. > > It works: if I add this flag then it runs > --mca plm_rsh_disable_qrsh > > Thank you both for the explanations. >

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
On 14.09.2011 at 00:29 Ralph Castain wrote: > > On Sep 13, 2011, at 4:25 PM, Reuti wrote: > >> On 13.09.2011 at 23:54 Blosch, Edwin L wrote: >> >>> This version of OpenMPI I am running was built without any guidance >>> regarding SGE in the configur

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Reuti
Hi, On 14.09.2011 at 17:39 Blosch, Edwin L wrote: > Thanks, Ralph, > > I get the failure messages, unfortunately: > > setgid FAILED > setgid FAILED > setgid FAILED N.B.: This would be a side effect of the tight integration to do it automatically for all slave t

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-14 Thread Reuti
On 14.09.2011 at 19:02 Blosch, Edwin L wrote: > Thanks for trying. > > Do you feel that this is an impossible request without the assistance of some > process running as root, for example, as Reuti mentioned, the daemons of a > job scheduler? Or are you saying it will

Re: [OMPI users] EXTERNAL: Re: Can you set the gid of the processes created by mpirun?

2011-09-15 Thread Reuti
rent uid/gid on the machines, but with the new feature they must be the same. Okay, in a cluster they are most likely unique across all machines anyway. But just to note as a side effect. -- Reuti > Thank you again for the support > > > > From: users-boun...@open-mpi.org [ma

Re: [OMPI users] sg riddle (was: EXTERNAL: Re: Can you set the gid of the processes created by mpirun?)

2011-09-15 Thread Reuti
oo many arguments. It works again if I use * in place of @, but this will change how bash will assemble the arguments: $ set "11 12" "21 22" $ echo $1 11 12 $ echo $2 21 22 $ sg 24000 "./tt.sh ${*}" 12 11 What I expect is: $ ./tt.sh "${@}" 21 22 11 1

Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-15 Thread Reuti
on other nodes run with group 1040, and the files they create have >> group ownership 1040. What about setting the set-gid flag for the directory? Created files therein will then inherit the group from the folder (which has to be set to the group in question, of course). -- Reuti
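A sketch with a placeholder path and the group from the thread; new files below the directory then inherit group 1040:

$ chgrp 1040 /shared/results
$ chmod g+s /shared/results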

Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-20 Thread Reuti
software.intel.com/file/6335 (page 260) You could try the mentioned switches to see whether you get more consistent output. If there were an MPI ABI, and you could just drop in any MPI library, it would be quite easy to spot the real point where the discrepancy occurred. -- Reuti > Tha
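The concrete switches are only referenced above; as an assumption about what is meant, the Intel compilers of that era offered flags along these lines, trading some optimization for reproducible floating-point results:

$ ifort -fp-model precise -c code.f90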
