Re: [OMPI users] mpiexec hangs - new install

2010-07-25 Thread James

On Sat, 24 Jul 2010 16:31:12 -0700, Ralph Castain  wrote:


PS: Hate to kvetch, but wouldn't it save a lot of wasted time if basic
problems like this were addressed in the FAQ?


Yes, it probably should be. However, a simple search for "firewall" on  
the user mailing list provides lots of info on how to deal with this  
issue.


I have to disagree.  I did search for "firewall", and what came back was
numerous variations on the theme of "get your sysadmin to do xxx to the
firewall".  No doubt that's a very useful solution if you're working on
a corporate/university network where you actually HAVE a sysadmin person
to do these sorts of chores for you, but not much help at all for people
working on their own systems.

James


Re: [OMPI users] mpiexec hangs - new install

2010-07-25 Thread James

On Sat, 24 Jul 2010 16:31:12 -0700, Ralph Castain  wrote:


On Jul 24, 2010, at 4:40 PM, James wrote:


OK, that's the problem.  I turned the firewall off on both machines, and
it works.

Now the question: how do I fix it?  I searched through the archives, and
found that it seems to be a pretty common problem.  Unfortunately, I didn't
see a solution that I could understand.  (I'm not a sysadmin, just a person
trying to do some programming.)

I have a couple of machines on a local net, with IP addresses in the
192.168.10.1xx range.  There's a router at 192.168.10.1, which is connected
to the internet via a cable modem.  So how do I set up my system so my
local machines can do whatever talking between themselves that's needed by
OpenMPI, while still having a firewall between my system and the outside
world?


Here's what seems to be a solution that works for SuSE.  May be something
similar for other systems:

  1) Edit the file /etc/sysconfig/SuSEfirewall2
  2) Look for the keyword FW_TRUSTED_NETS
  3) Add the IP addresses of your internal machines there.  The format
     for multiple machines is weird: "192.168.10.0/8" means all machines
     in 192.168.10.x.  There doesn't seem to be any way to specify a
     numeric range, like .100 to .110.
  4) Add the lines FW_SERVICES_TRUSTED_TCP="1:65535" and
     FW_SERVICES_TRUSTED_UDP="1:65535"
  5) Save the file.  Bring up Yast2, and use it to stop and restart the
     firewall.


Hope this is useful, as it took about 10-15 hours of my time, spread over a
week or so, to figure it out.

Most routers provide their own internal-to-external firewall - you might  
check its setup and see. If it does, then you don't need to also have  
one on your individual machines.


Seems to be the same problem as with the firewalls on the machines.  That
is, there appears to be a firewall, but the little information in the
manual or online help assumes that you already have an expert sysadmin
level of knowledge.

James


[OMPI users] OpenMPI Segmentation fault (11)

2010-07-25 Thread Jack Bryan

Dear All,
I run 6 parallel processes on OpenMPI.
When the run-time of the program is short, it works well.
But if the run-time is long, I get errors:
[n124:45521] *** Process received signal ***
[n124:45521] Signal: Segmentation fault (11)
[n124:45521] Signal code: Address not mapped (1)
[n124:45521] Failing at address: 0x44
[n124:45521] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
[n124:45521] [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c50278d60]
[n124:45521] [ 2] /lib64/libc.so.6(_IO_vfprintf+0x4479) [0x3c50246b19]
[n124:45521] [ 3] /lib64/libc.so.6(_IO_printf+0x9a) [0x3c5024d3aa]
[n124:45521] [ 4] /home/path/exec [0x40ec9a]
[n124:45521] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
[n124:45521] [ 6] /home/path/exec [0x401139]
[n124:45521] *** End of error message ***
It seems that there may be a memory management problem,
but I cannot find the cause.
My program needs to write results to some files.
If I open too many files without closing them, I may get the above errors.
But I have removed the file writing from my program,
and the problem still appears when the program runs for a longer time.
Any help is appreciated.
Jack
July 25  2010

Re: [OMPI users] mpiexec hangs - new install

2010-07-25 Thread Kevin . Buckley

> Here's what seems to be a solution that works for SuSE.  May be
> something similar for other systems:
>
>1) Edit the file /etc/sysconfig/SuSEfirewall2
>2) Look for the keyword FW_TRUSTED_NETS
>3) Add the IP addresses of your internal machines there.  The format
>   for multiple machines is weird: "192.168.10.0/8" means all machines
>   in 192.168.10.x.  There doesn't seem to be any way to specify a
>   numeric range, like .100 to .110.

Not a SUSE man and won't go into a full treatise on subnets
and netmasks but ...

192.168.10.0/8 actually means anything that has 192. at the start,
so you have opened things up slightly more widely than you may have
thought.

I recall you said you had machines numbered 192.168.10.1xx ?

If so, then 192.168.10.0/24 ("slash 24") would be slightly better
for you than "slash 8" as that at least narrows things down to all
numeric addresses starting with:

192.168.10.

If you just wanted to "trust" a single machine then this:

192.168.10.100/32

represents, in the syntax you have already seen in use, the single
machine, 192.168.10.100.
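
To make the "slash" notation above concrete, here is a minimal sketch using
Python's standard-library ipaddress module. It has nothing to do with SuSE's
firewall scripts, and the helper name describe() is made up for this example;
it only shows which addresses each prefix length covers.

```python
import ipaddress

def describe(cidr):
    """Return (network, first address, last address, size) for a CIDR string."""
    # strict=False accepts an address with host bits set, such as
    # "192.168.10.0/8", and masks those bits off instead of raising ValueError.
    net = ipaddress.ip_network(cidr, strict=False)
    return str(net), str(net[0]), str(net[-1]), net.num_addresses

for cidr in ("192.168.10.0/8", "192.168.10.0/24", "192.168.10.100/32"):
    print(describe(cidr))
```

The /8 entry comes back as the network 192.0.0.0/8, i.e. everything from
192.0.0.0 to 192.255.255.255, while /24 covers only the 256 addresses in
192.168.10.x and /32 covers exactly one machine.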

Without wishing to make too many guesses as to what FW_TRUSTED_NETS
is doing but assuming that you can assign more than one "netmask" in
there and armed with the info above, you could add all of your own
machines individually by making:

FW_TRUSTED_NETS

take the values (three machine range, 100 -> 102 here)

192.168.10.100/32 192.168.10.101/32 192.168.10.102/32

and so on: basically, treating each machine as a trusted "network"
of one machine.

Again, the way one assigns multiple "netmasks" to FW_TRUSTED_NETS is
left to you to discover but I'm sure you will be able to do that.
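
For a numeric range such as .100 to .110, listing every machine as a /32 is
not the only option: the range can be collapsed into a handful of CIDR blocks.
The sketch below does this with Python's standard-library ipaddress module.
The function name range_to_cidrs() is made up for this example, and joining
the entries with spaces is only an assumption about what FW_TRUSTED_NETS
accepts, so check the comments in your own firewall config file.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the smallest list of CIDR blocks covering first..last inclusive."""
    first_ip = ipaddress.ip_address(first)
    last_ip = ipaddress.ip_address(last)
    # summarize_address_range yields networks that together cover exactly
    # the requested range, with no addresses outside it.
    return [str(net) for net in ipaddress.summarize_address_range(first_ip, last_ip)]

cidrs = range_to_cidrs("192.168.10.100", "192.168.10.110")
# Space-separated output is an assumption about the config syntax.
print(" ".join(cidrs))
```

For .100 to .110 this produces four entries (two /30 blocks, one /31 and one
/32) instead of eleven separate /32 entries.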

It might be a better, without being the best, way to do what you
want or, rather, to avoid doing what you didn't want to do.

-- 
Kevin M. Buckley                  Room:  CO327
School of Engineering and         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand