Hi,
I'm trying to set up a new small cluster. It's based on Sun's X4100
servers running Solaris 10 x86. I have the Open MPI that comes with
Clustertools 7. In addition, I have an InfiniBand network between
the nodes. I can run parallel jobs fine if processes remain on one
node (each node has
Further to my email below regarding problems with uDAPL across IB, I
found a bug report lodged with Sun (also reported on OpenSolaris at:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6545187).
I will lodge a support call with Sun first thing Monday, though it
might not get me very far.
me to reply at the weekend! Much appreciated.
Glenn
Glenn,
Are you running with Solaris 10 Update 3 (11/06) and with patch
125793-01? It is required for running with the udapl BTL.
http://www.sun.com/products-n-solutions/hardware/docs/html/819-7478-11/body.html#93180
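For what it's worth, a quick way to check whether that patch is already
on a node (assuming the standard Solaris 10 patch tools, and that
125793-01 is indeed the revision you need) is something like:

    showrev -p | grep 125793

If nothing comes back, the patch has not been applied.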
Glenn Carver
Hi,
I'd be grateful if someone could explain the meaning of this error
message to me and whether it indicates a hardware problem or
application software issue:
[node2:11881] OOB: Connection to HNP lost
[node1:09876] OOB: Connection to HNP lost
I have a small cluster which until last week was
On Jul 10, 2007, at 11:32 AM, Ralph H Castain wrote:
On 7/10/07 11:08 AM, "Glenn Carver" wrote:
Hi,
I'd be grateful if someone could explain the meaning of this error
message to me and whether it indicates a hardware problem or
application software issue:
[node2:11881] OOB: Connection to HNP lost
Hopefully an easy question to answer... is it possible to get at the
values of MCA parameters whilst a program is running? What I had in
mind was either an Open MPI function to call which would print the
current values of MCA parameters, or a function to call for specific
MCA parameters. I don
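I don't know of a call you can make from inside a running program, but
for checking which values the runtime will actually use, ompi_info can
dump them from the shell. The exact option spelling may differ between
releases, so treat this as a sketch:

    ompi_info --param btl udapl
    ompi_info --param all all

Values set through environment variables (OMPI_MCA_<name>) or an
mca-params.conf file should be reflected in that output.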
e. Of course you do take the hit of wireup time
for all connections at MPI_Init.
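The preconnect behaviour is normally requested with an MCA parameter;
the exact name has changed between releases (check ompi_info for your
version), but something along these lines should set up all connections
during MPI_Init. Here -np 16 and ./myapp are just placeholders:

    mpirun --mca mpi_preconnect_all 1 -np 16 ./myapp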
That's a useful tip and may apply in our case as the code
configuration giving us trouble writes a lot of data to process 0 for
disk output.
Thanks,
Glenn
-DON
Brian Barrett wrote:
On Aug
I'd appreciate some advice and help on this one. We're having
serious problems running parallel applications on our cluster. After
each batch job finishes, we lose a certain amount of available
memory. Additional jobs cause free memory to gradually go down until
the machine starts swapping an
-DON
p.s. orte-clean does not exist in the ompi v1.2 branch; it is in the
trunk, but I think there is an issue with it currently.
Ralph H Castain wrote:
On 8/5/07 6:35 PM, "Glenn Carver" wrote:
I'd appreciate some advice and help on this one. We
"--mca btl self,tcp".
If this is successful, i.e. it frees memory as expected, the next step
would be to run including shared memory, "--mca btl self,sm,tcp". If
that is successful, the last step would be to add in udapl, "--mca btl
self,sm,udapl".
-DON
Glenn Carver wrote:
Number of MPI
jobs running simultaneously? Size of the job(s)? Is your code something
you can share? Reproducing what you are seeing is my intent.
-DON
p.s. I will not be checking email or working on this again until the
week of August 27 as I am taking a little vacation.
Glenn Carver wrote:
Don,
Fol