Hi Brock, Angel, Reuti,
You might want to look at a tool we developed:
http://radical-cybertools.github.io/radical-pilot/index.html
This was actually one of the drivers for isolating the persistent ORTE DVM
that's being discussed in this thread.
With RADICAL-Pilot you can use a Python API to l
> On 05 Apr 2016, at 16:46 , Aurélien Bouteiller wrote:
> Open MPI uses clock_gettime when it is available, and defaults to
> gettimeofday only when this better option can't be found. Check that your
> system has clock_gettime and the resolution of this timer.
Depending on what you mean, I do
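
A minimal sketch of the "check the resolution of this timer" step, assuming a Python interpreter is available on the node. It reports what the operating system tells Python about its clocks. It does not query Open MPI itself, so treat the numbers as a hint about the platform rather than about what Open MPI ends up using. The function name clock_resolutions is made up for this illustration.

```python
import time

def clock_resolutions():
    """Return {clock name: reported resolution in seconds} as seen by Python."""
    results = {}
    for name in ("monotonic", "perf_counter", "time"):
        # get_clock_info reports the underlying OS implementation and resolution
        results[name] = time.get_clock_info(name).resolution
    return results

for name, resolution in clock_resolutions().items():
    print(f"{name}: {resolution:.3e} s")

# On POSIX systems Python also exposes clock_getres directly.
if hasattr(time, "clock_getres"):
    print("CLOCK_MONOTONIC:", time.clock_getres(time.CLOCK_MONOTONIC), "s")
```

If the monotonic clock reports a resolution near a nanosecond, clock_gettime is most likely present and usable on that system.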
> On 03 Mar 2016, at 23:22 , Davide Vanzo wrote:
> I have built OpenMPI 1.10.2 with RoCE network support on our test cluster. On
> the cluster we use lmod to manage paths to different versions of software.
> The problem I have is that I receive the "orted: command not found" message
> given t
Another canonical benchmarking suite can be found at
http://www.nas.nasa.gov/publications/npb.html
> On 24 Jan 2016, at 20:51 , Ibrahim Ikhlawi wrote:
>
> Thanks for reply.
>
> But I want to get an idea of the behaviour of my server. Therefore
> I need code which I can run on
> On 23 Sep 2015, at 13:49 , Kumar, Sudhir wrote:
> I have a version of OpenMPI 1.8.5 installed. Is there any way of knowing
> which version of gcc it was compiled with?
ompi_info | grep -i compiler
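
The same check can be scripted. The following is a sketch only, assuming ompi_info is on the PATH. The helper names compiler_lines and ompi_compiler_info are invented for this example, and the filter is a plain case-insensitive match on the word "compiler", mirroring grep -i.

```python
import shutil
import subprocess

def compiler_lines(text):
    """Keep only the lines that mention 'compiler', ignoring case (like grep -i)."""
    return [line.strip() for line in text.splitlines() if "compiler" in line.lower()]

def ompi_compiler_info():
    """Run ompi_info and return its compiler-related lines, or [] if it is not installed."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return []
    result = subprocess.run([exe], capture_output=True, text=True, check=True)
    return compiler_lines(result.stdout)

for line in ompi_compiler_info():
    print(line)
```

Returning an empty list when ompi_info is missing keeps the script usable on login nodes where the Open MPI module has not been loaded yet.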
Hi Erik,
> On 29 Jul 2015, at 3:35 , Erik Schnetter wrote:
> I was able to build openmpi-v2.x-dev-96-g918650a without problems on Edison,
> and also on other systems.
And does it also work as expected after you have built it? :-)
Thanks
Mark
ch as
>
> [warn] select: Bad file descriptor
>
> Are these important? If not, how can I suppress them?
>
> -erik
>
>
> On Sat, Jul 25, 2015 at 7:49 AM, Mark Santcroos
> wrote:
> Hi Erik,
>
> Do you really want 1.8.7, otherwise you might want to give la
Hi Erik,
Do you really want 1.8.7? Otherwise you might want to give the latest master a try.
Others, including myself, had more luck with that on Crays, including Edison.
Mark
> On 25 Jul 2015, at 1:35 , Erik Schnetter wrote:
>
> I want to build OpenMPI 1.8.7 on a Cray XC30 (Edison at NERSC). I've
> On 26 Mar 2015, at 16:01 , Ralph Castain wrote:
>
>>
>> On Mar 26, 2015, at 1:33 AM, Mark Santcroos
>> wrote:
>>
>> Hi guys,
>>
>> Thanks for the follow-up.
>>
>> It appears that you are ruling out that Munge is required becau
Hi Ralph,
> On 25 Mar 2015, at 21:59 , Mark Santcroos wrote:
>> Anyway, see if this fixes the problem.
>>
>> https://github.com/open-mpi/ompi/pull/497
Can confirm the fallback works now without setting explicitly to basic (with
the merged changes).
Thanks!
Mark
Hi guys,
Thanks for the follow-up.
It appears that you are ruling out that Munge is required because the system
runs TORQUE, but as far as I can see Munge is/can be used by both SLURM and
TORQUE.
(http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/1-installConfig/serverConfig.htm#usi
Hi Ralph,
> On 25 Mar 2015, at 21:25 , Ralph Castain wrote:
> I think I have this resolved,
> though I still suspect there is something wrong on that system. You
> shouldn’t have some nodes running munge and others not running it.
For completeness, it's not "some" nodes, it's the MOM (servi
> On 25 Mar 2015, at 17:39 , Ralph Castain wrote:
> Not surprising - I’m surprised to find munge on the mom’s node anyway given
> that you are using Torque.
>
> I have to finish something else first, and it sounds like you aren’t blocked
> at the moment. I’ll provide a patch for you to try lat
tch for the very reason you are hitting - it becomes
> difficult to resolve authentications.
>
> Let me ponder a bit. We can resolve it easily enough, but I want to ensure we
> don’t do it by creating a security hole.
>
>> On Mar 25, 2015, at 9:25 AM, Mark Santcroos
>
> On 25 Mar 2015, at 17:06 , Ralph Castain wrote:
>
> OHO! You have munge running on the head node, but not on the backends!
Ok, so I now know that munge is ... :)
It's running on the MOM node (not on the head node):
daemon 18800 0.0 0.0 118476 3212 ? Sl 01:27 0:00
/usr/sbin/
> On 25 Mar 2015, at 17:06 , Ralph Castain wrote:
> OHO! You have munge running on the head node, but not on the backends!
I'm all for munching, but what does that mean? ;-)
Is that something actively running, or do you mean a library being available or such?
> Okay, all you have to do is set the MCA pa
> On 25 Mar 2015, at 16:52 , Ralph Castain wrote:
>
> Hmmm…okay, sorry to keep drilling down here, but let’s try adding “-mca
> sec_base_verbose 100” now
> /u/sciteam/marksant/openmpi/installation/bin/mpirun -mca oob_base_verbose 100
> -mca sec_base_verbose 100 ./a.out
[nid25257:09727] mca:
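
When toggling several verbosity levels during this kind of debugging, it can help to build the command line from a dictionary. This is a sketch under stated assumptions: the helper name mpirun_command is hypothetical, it only uses the "-mca key value" form shown in the quoted command, and it takes a plain "mpirun" on the PATH rather than the full installation path used above.

```python
import shlex

def mpirun_command(program, mca=None, mpirun="mpirun"):
    """Build an argv list of the form: mpirun -mca KEY VALUE ... PROGRAM."""
    argv = [mpirun]
    for key, value in (mca or {}).items():
        argv += ["-mca", key, str(value)]
    argv.append(program)
    return argv

cmd = mpirun_command("./a.out", {"oob_base_verbose": 100, "sec_base_verbose": 100})
print(shlex.join(cmd))
# prints: mpirun -mca oob_base_verbose 100 -mca sec_base_verbose 100 ./a.out
```

The list form can be passed straight to subprocess.run without going through a shell.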
25 Mar 2015, at 16:49 , Ralph Castain wrote:
>
> Hmmm…well, it will generate some output, so keep the system down to two nodes
> if you can just to minimize the chatter. Add “-mca oob_base_verbose 100” to
> your cmd line
>
>> On Mar 25, 2015, at 8:45 AM, Mark Santcroos
25, 2015, at 7:46 AM, Howard Pritchard wrote:
>>
>> turn off the disable getpwuid.
>>
>> On Mar 25, 2015 8:14 AM, "Mark Santcroos" wrote:
>> Hi Howard,
>>
>> > On 25 Mar 2015, at 14:58 , Howard Pritchard wrote:
>> > How are
> On 25 Mar 2015, at 15:46 , Howard Pritchard wrote:
> turn off the disable getpwuid.
That doesn't seem to make a difference.
Have there been changes in this area? Last time I checked this a couple of
months ago on Edison I needed this flag to avoid getting spammed.
e-submit and friends, so I "explicitly" don't want to use
aprun.
> you definitely don't need to use ccm.
> and shouldn't.
Depends on the use-case, but happy to leave that out of scope for now :-)
Thanks!
Mark
>
> On Mar 25, 2015 6:00 AM, "Mark Santcroos" wrote:
>
Hi,
Any users of Open MPI on Blue Waters here?
And then I specifically mean in "native" mode, not inside CCM.
After configuring and building as I do on other Crays, mpirun gives me the
following:
[nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in file
../../../../../orte/m
> needing work, and we can kick around off-list about who does what.
>
> Great to hear this is working with your tool so quickly!!
> Ralph
>
>
> On Tue, Feb 3, 2015 at 3:49 PM, Mark Santcroos
> wrote:
> Hi Ralph,
>
> Besides the items in the other mail, I have th
lot/commit/2d36e886081bf8531097edfc95ada1826257e460)
> On 03 Feb 2015, at 20:38 , Mark Santcroos wrote:
>
> Hi Ralph,
>
>> On 03 Feb 2015, at 16:28 , Ralph Castain wrote:
>> I think I fixed some of the handshake issues - please give it another try.
>> You should see orte-submit proper
Hi Ralph,
> On 03 Feb 2015, at 16:28 , Ralph Castain wrote:
> I think I fixed some of the handshake issues - please give it another try.
> You should see orte-submit properly shutdown upon completion,
Indeed, it works on my laptop now! Great!
It feels quite fast too, for short tasks :-)
> and or
On 03 Feb 2015, at 0:20 , Ralph Castain wrote:
> Okay, thanks - I'll get on it tonight. Looks like a fairly simple bug, so
> hopefully I'll have it ironed out tonight.
Sorry, I was not completely accurate. Let me be more specific:
* The orte-submit does not return though, so that symptom is sim
FWIW: I see similar behaviour on my laptop (OS X Yosemite 10.10.2).
> On 02 Feb 2015, at 21:26 , Mark Santcroos wrote:
>
> Ok, let me check on some other systems too though, it might be Cray specific.
>
>
>> On 02 Feb 2015, at 19:07 , Ralph Castain wrote:
>>
travel this week,
> but I'll try to dig into this a bit and spot the issue.
>
> Thanks!
> Ralph
>
>
> On Mon, Feb 2, 2015 at 3:50 AM, Mark Santcroos
> wrote:
> Hi Ralph,
>
> Great, the semantics look exactly like what I need!
>
> (To aid in debugging
ing options supported just yet as I
> first wanted to see if this meets your basic needs before worrying about the
> detail.
>
> Let me know what you think
> Ralph
>
>
>> On Jan 21, 2015, at 4:07 PM, Mark Santcroos
>> wrote:
>>
>> Hi Ralph,
will allow
> you to reuse the existing DVM, making each independent job start a great deal
> faster. You’ll need to either manually terminate the DVM, or the RM will do
> so when the allocation expires.
>
> HTH
> Ralph
>
>
>> On Jan 21, 2015, at 12:52 PM,
Hi Ralph,
> On 21 Jan 2015, at 21:20 , Ralph Castain wrote:
>
> Hi Mark
>
>> On Jan 21, 2015, at 11:21 AM, Mark Santcroos
>> wrote:
>>
>> Hi Ralph, all,
>>
>> To give some background, I'm part of the RADICAL-Pilot [1] development team.
Hi Ralph, all,
To give some background, I'm part of the RADICAL-Pilot [1] development team.
RADICAL-Pilot is a Pilot System, an implementation of the Pilot (job) concept,
which in its most minimal form takes care of decoupling resource
acquisition from workload management.
So instead of
ully implemented yet, but the
>> fundamental idea is valid and supports some range of capability.
>>
>> We used to have a cmd line option in ORTE for what you propose - it wouldn’t
>> be too hard to restore. Is there some reason to do so?
>>
>>
>>>
Hi,
Would it be possible to initially run "idle" orteds on my resources and then
use orterun to launch my applications on these already running orteds?
Thanks!
Mark