Re: [OMPI users] mpiexec hangs - new install
On Sat, 24 Jul 2010 16:31:12 -0700, Ralph Castain wrote:

>> PS: Hate to kvetch, but wouldn't it save a lot of wasted time if
>> basic problems like this were addressed in the FAQ?
>
> Yes, it probably should be. However, a simple search for "firewall"
> on the user mailing list provides lots of info on how to deal with
> this issue.

I have to disagree. I did search for "firewall", and what came back was numerous variations on the theme of "get your sysadmin to do xxx to the firewall". No doubt that's a very useful solution if you're working on a corporate/university network where you actually HAVE a sysadmin person to do these sorts of chores for you, but not much help at all for people working on their own systems.

James
Re: [OMPI users] mpiexec hangs - new install
On Sat, 24 Jul 2010 16:31:12 -0700, Ralph Castain wrote:

> On Jul 24, 2010, at 4:40 PM, James wrote:
>
>> OK, that's the problem. I turned the firewall off on both machines,
>> and it works. Now the question: how do I fix it? I searched through
>> the archives, and found that it seems to be a pretty common problem.
>> Unfortunately, I didn't see a solution that I could understand. (I'm
>> not a sysadmin, just a person trying to do some programming.) I have
>> a couple of machines on a local net, with IP addresses in the
>> 192.168.10.1xx range. There's a router at 192.168.10.1, which is
>> connected to the internet via a cable modem. So how do I set up my
>> system so my local machines can do whatever talking between
>> themselves that's needed by OpenMPI, while still having a firewall
>> between my system and the outside world?

Here's what seems to be a solution that works for SuSE. There may be something similar for other systems:

1) Edit the file /etc/sysconfig/SuseFirewall2
2) Look for the keyword FW_TRUSTED_NETS
3) Add the IP addresses of your internal machines there. The format for multiple machines is weird: "192.168.10.0/8" means all machines in 192.168.10.x. There doesn't seem to be any way to specify a numeric range, like .100 to .110.
4) Add the lines FW_SERVICES_TRUSTED_TCP="1:65535" and FW_SERVICES_TRUSTED_UDP="1:65535"
5) Save the file. Bring up Yast2, and use it to stop and restart the firewall.

Hope this is useful, as it took about 10-15 hours of my time, spread over a week or so, to figure it out.

> Most routers provide their own internal-to-external firewall - you
> might check its setup and see. If it does, then you don't need to
> also have one on your individual machines.

Seems to be the same problem as with the firewalls on the machines. That is, there appears to be a firewall, but the little information in the manual or online help assumes that you already have an expert sysadmin level of knowledge.

James
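
Step 3 above notes that there seems to be no way to give a numeric range such as .100 to .110. CIDR notation cannot express an arbitrary range as one entry, but any range can be written as a short list of CIDR blocks. The sketch below uses Python's standard `ipaddress` module to compute that list. The helper name `cidr_blocks_for_range` is made up for this illustration. Whether FW_TRUSTED_NETS accepts several space-separated entries is an assumption here and should be checked against the comments in the firewall config file itself (on SuSE installs the file is usually spelled /etc/sysconfig/SuSEfirewall2).

```python
import ipaddress
from typing import List


def cidr_blocks_for_range(first: str, last: str) -> List[str]:
    """Return the shortest list of CIDR blocks covering first..last inclusive.

    Hypothetical helper for illustration; not part of any firewall tool.
    """
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    # summarize_address_range yields the minimal set of networks for the range.
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


if __name__ == "__main__":
    blocks = cidr_blocks_for_range("192.168.10.100", "192.168.10.110")
    # Assumption: the firewall variable takes a space-separated list of nets.
    print('FW_TRUSTED_NETS="' + " ".join(blocks) + '"')
```

For the .100 to .110 range this produces four blocks (two /30, one /31, one /32), which trusts exactly those eleven addresses and nothing else.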
[OMPI users] OpenMPI Segmentation fault (11)
Dear All,

I run 6 parallel processes on OpenMPI. When the run-time of the program is short, it works well. But if the run-time is long, I get errors:

[n124:45521] *** Process received signal ***
[n124:45521] Signal: Segmentation fault (11)
[n124:45521] Signal code: Address not mapped (1)
[n124:45521] Failing at address: 0x44
[n124:45521] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
[n124:45521] [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c50278d60]
[n124:45521] [ 2] /lib64/libc.so.6(_IO_vfprintf+0x4479) [0x3c50246b19]
[n124:45521] [ 3] /lib64/libc.so.6(_IO_printf+0x9a) [0x3c5024d3aa]
[n124:45521] [ 4] /home/path/exec [0x40ec9a]
[n124:45521] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
[n124:45521] [ 6] /home/path/exec [0x401139]
[n124:45521] *** End of error message ***

It seems that there may be a memory management problem, but I cannot find the cause. My program needs to write results to some files. If I open too many files without closing them, I may get the above errors. However, I have removed the file writing from my program, and the problem still appears when the program runs for a longer time.

Any help is appreciated.

Jack
July 25 2010
Re: [OMPI users] mpiexec hangs - new install
> Here's what seems to be a solution that works for SuSE. May be
> something similar for other systems:
>
>1) Edit the file /etc/sysconfig/SuseFirewall2
>2) Look for the keyword FW_TRUSTED_NETS
>3) Add the IP addresses of your internal machines there. The format
> for multiple machines is wierd: "192.168.10.0/8" means all machines
> in 192.168.10.x. There doesn't seem to be any way to specify a
> numeric range, like .100 to .110.

Not a SUSE man and won't go into a full treatise on subnets and netmasks but ...

192.168.10.0/8

actually means anything that has 192. at the start, so you have opened things up slightly more widely than you may have thought.

I recall you said you had machines numbered 192.168.10.1xx ? If so, then

192.168.10.0/24

("slash 24") would be slightly better for you than "slash 8" as that at least narrows things down to all numeric addresses starting with: 192.168.10.

If you just wanted to "trust" a single machine then this:

192.168.10.100/32

represents, in the syntax you have already seen in use, the single machine, 192.168.10.100.

Without wishing to make too many guesses as to what FW_TRUSTED_NETS is doing, but assuming that you can assign more than one "netmask" in there and armed with the info above, you could add all of your own machines individually by making FW_TRUSTED_NETS take the values (three machine range, 100 -> 102 here)

192.168.10.100/32
192.168.10.101/32
192.168.10.102/32

and so on: basically, treating each machine as a trusted "network" of one machine.

Again, the way one assigns multiple "netmasks" to FW_TRUSTED_NETS is left to you to discover but I'm sure you will be able to do that.

It might be a better, without being the best, way to do what you want, or rather, to not do what you didn't want to do.

--
Kevin M. Buckley                  Room:  CO327
School of Engineering and         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand
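
The difference between "slash 8", "slash 24" and "slash 32" described above can be checked directly. This is a minimal sketch using Python's standard `ipaddress` module. The helper names `describe` and `covers` are made up for this illustration, and the test addresses 192.0.2.1 and 192.168.11.5 are arbitrary examples chosen to fall outside 192.168.10.x.

```python
import ipaddress
from typing import Tuple


def describe(cidr: str) -> Tuple[str, int]:
    """Return the normalised network and how many addresses it contains."""
    # strict=False lets host bits be set, as in "192.168.10.0/8".
    net = ipaddress.ip_network(cidr, strict=False)
    return str(net), net.num_addresses


def covers(cidr: str, address: str) -> bool:
    """True if the given address falls inside the CIDR block."""
    net = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(address) in net


if __name__ == "__main__":
    for cidr in ("192.168.10.0/8", "192.168.10.0/24", "192.168.10.100/32"):
        network, count = describe(cidr)
        print(f"{cidr:>20} -> {network:<20} {count} addresses")

    # A /8 trusts hosts well outside the local 192.168.10.x net.
    print(covers("192.168.10.0/8", "192.0.2.1"))
    # A /24 stops at the 192.168.10.x boundary.
    print(covers("192.168.10.0/24", "192.168.11.5"))
```

The /8 entry normalises to 192.0.0.0/8 and spans 2^24 addresses, the /24 spans 256, and the /32 is exactly one machine.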