Hm, this actually looks correct. The question now is basically why the
intermediate handshake by the rank-0 processes on the
inter-communicator is not finishing.
I am wondering whether this could be related to a problem reported in
another thread (Processes stuck after MPI_Waitall() in 1.4.1
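(For orientation: the handshake being discussed happens between the rank-0 process of each side of the freshly created inter-communicator. Below is a minimal, purely illustrative sketch of such a rank-0 exchange, assuming an inter-communicator "inter" obtained from a preceding MPI_Comm_accept/MPI_Comm_connect pair; it is not code from this thread, but a hang in an explicit exchange like this would point at the same wireup problem.)

#include <mpi.h>
#include <stdio.h>

/* Illustrative only: explicit handshake between the two rank-0 processes
 * of an inter-communicator "inter" created by MPI_Comm_accept() on one
 * side and MPI_Comm_connect() on the other. */
static void probe_rank0_handshake(MPI_Comm inter)
{
    int lrank, token = 42, remote = -1;

    MPI_Comm_rank(inter, &lrank);          /* rank within the local group */
    if (lrank == 0) {
        /* Destination/source rank 0 refers to the remote group here.
         * If this exchange never completes, the two sides cannot see
         * each other and the merge cannot finish either. */
        MPI_Sendrecv(&token, 1, MPI_INT, 0, 99,
                     &remote, 1, MPI_INT, 0, 99,
                     inter, MPI_STATUS_IGNORE);
        printf("rank-0 handshake completed, received %d\n", remote);
    }
}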
I've attached gdb to the client which has just connected to the grid.
Its bt is almost exactly the same as the server's one:
#0 0x428066d7 in sched_yield () from /lib/libc.so.6
#1 0x00933cbf in opal_progress () at ../../opal/runtime/opal_progress.c:220
#2 0x00d460b8 in opal_condition_wait (c=0xd
based on your output shown here, there is absolutely nothing wrong
(yet). Both processes are in the same function and do what they are
supposed to do.
However, I am fairly sure that the client process whose bt you show is
already part of current_intracomm. Could you try to create a bt of the
proces
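(For readers joining here: current_intracomm refers to the intracommunicator that grows as clients join one by one. A rough sketch of that grow-by-one loop on the accepting side follows; apart from the name current_intracomm, the function name, port handling, and client count are assumptions, not the actual code from this thread.)

#include <mpi.h>

/* Sketch of the grow-by-one pattern under discussion; not the poster's
 * actual code.  Every process already inside current_intracomm must take
 * part in each accept and merge, because both calls are collective. */
MPI_Comm accept_clients(const char *port, MPI_Comm current_intracomm,
                        int nclients)
{
    for (int i = 0; i < nclients; i++) {
        MPI_Comm inter, merged;

        /* Collective over current_intracomm; rank 0 listens on the port. */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, current_intracomm, &inter);

        /* Fold the newcomer in; existing members use high = 0. */
        MPI_Intercomm_merge(inter, 0, &merged);
        MPI_Comm_free(&inter);

        if (current_intracomm != MPI_COMM_SELF)
            MPI_Comm_free(&current_intracomm);
        current_intracomm = merged;
    }
    return current_intracomm;
}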
This slides outside of my purview - I would suggest you post this question with
a different subject line specifically mentioning failure of intercomm_merge to
work so it attracts the attention of those with knowledge of that area.
On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:
> So now I hav
So now I have a new question.
When I run my server and a lot of clients on the same machine,
everything looks fine.
But when I try to run the clients on several machines the most
frequent scenario is:
* server is started on machine A
* X (= 1, 4, 10, ..) clients are started on machine B and they co
No problem at all - glad it works!
On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote:
> Hi,
> I'm very sorry, but the problem was on my side. My installation
> process was not always taking the newest sources of openmpi. In this
> case it hasn't installed the version with the latest patch. Now I
>
Hi,
I'm very sorry, but the problem was on my side. My installation
process was not always picking up the newest Open MPI sources, so in
this case it had not installed the version with the latest patch. Now I
think everything works fine - I could run over 130 processes with no
problems.
I'm sorry again t
We're having some trouble replicating this once my patches are applied. Can you
send us your configure cmd? Just the output from "head config.log" will do for
now.
Thanks!
On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote:
> My start script looks almost exactly the same as the one published by
>
My start script looks almost exactly the same as the one published by
Edgar, i.e. the processes are starting one by one with no delay.
2010/7/20 Ralph Castain :
> Grzegorz: something occurred to me. When you start all these processes, how
> are you staggering their wireup? Are they flooding us, or
Grzegorz: something occurred to me. When you start all these processes, how are
you staggering their wireup? Are they flooding us, or are you time-shifting
them a little?
On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote:
> Hm, so I am not sure how to approach this. First of all, the test case
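(An aside on "time-shifting": one simple way to stagger the wireup is to make each client sleep for a slot proportional to its index before calling MPI_Comm_connect. The index and the 200 ms slot below are invented parameters, shown only to make the idea concrete; this is not from the thread.)

#include <mpi.h>
#include <unistd.h>

/* Hypothetical stagger on the client side, so that N clients do not all
 * hit the server/mpirun at the same instant. */
void staggered_connect(const char *port, int client_index, MPI_Comm *inter)
{
    usleep((useconds_t)client_index * 200000);   /* 200 ms per slot */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, inter);
}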
Hm, so I am not sure how to approach this. First of all, the test case
works for me. I used up to 80 clients, with both optimized and
non-optimized compilation. I ran the tests with trunk (not with 1.4
series, but the communicator code is identical in both cases). Clearly,
the patch from Ralph i
As far as I can tell, it appears the problem is somewhere in our communicator
setup. The people knowledgeable in that area are going to look into it later
this week.
I'm creating a ticket to track the problem and will copy you on it.
On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote:
>
> On J
On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote:
> Bad news..
> I've tried the latest patch with and without the prior one, but it
> hasn't changed anything. I've also tried using the old code but with
> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also didn't
> help.
> While lookin
Bad news..
I've tried the latest patch with and without the prior one, but it
hasn't changed anything. I've also tried using the old code but with
the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also didn't
help.
While looking through the sources of openmpi-1.4.2 I couldn't find any
call
Just so you don't have to wait for the 1.4.3 release, here is the patch (doesn't
include the prior patch).
(attached: dpm.diff)
On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:
> 2010/7/12 Ralph Castain :
>> Dug around a bit and found the problem!!
>>
>> I have no idea who or why
2010/7/12 Ralph Castain :
> Dug around a bit and found the problem!!
>
> I have no idea who or why this was done, but somebody set a limit of 64
> separate jobids in the dynamic init called by ompi_comm_set, which builds the
> intercommunicator. Unfortunately, they hard-wired the array size, but
Dug around a bit and found the problem!!
I have no idea who did this or why, but somebody set a limit of 64
separate jobids in the dynamic init called by ompi_comm_set, which builds the
intercommunicator. Unfortunately, they hard-wired the array size but never
checked that size before addin
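(To make the bug class concrete, here is a paraphrase in code - emphatically not the actual OMPI source: a fixed-size jobid table filled without a bounds check runs off the end as soon as the 65th distinct jobid appears, which fits a segfault inside ompi_comm_set.)

/* Simplified illustration of the reported bug shape; not OMPI code. */
#define MAX_JOBIDS 64

static int jobids[MAX_JOBIDS];
static int num_jobids = 0;

/* Buggy: no check of the hard-wired size before adding. */
static void add_jobid_buggy(int jobid)
{
    jobids[num_jobids++] = jobid;     /* writes past the array after 64 */
}

/* Fixed shape: check (or grow) before adding. */
static int add_jobid_checked(int jobid)
{
    if (num_jobids >= MAX_JOBIDS)
        return -1;                    /* caller must handle or resize */
    jobids[num_jobids++] = jobid;
    return 0;
}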
1024 is not the problem: changing it to 2048 hasn't changed anything.
Following your advice I've run my process using gdb. Unfortunately I
didn't get anything more than:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf7e4c6c0 (LWP 20246)]
0xf7f39905 in ompi_comm_set ()
I would guess the #files limit of 1024. However, if it behaves the same way
when spread across multiple machines, I would suspect it is somewhere in your
program itself. Given that the segfault is in your process, can you use gdb to
look at the core file and see where and why it fails?
On Jul 7
2010/7/7 Ralph Castain :
>
> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>
>> Hi Ralph,
>> sorry for the late response, but I couldn't find free time to play
>> with this. Finally I've applied the patch you prepared. I've launched
>> my processes in the way you've described and I think it's wor
On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
> Hi Ralph,
> sorry for the late response, but I couldn't find free time to play
> with this. Finally I've applied the patch you prepared. I've launched
> my processes in the way you've described and I think it's working as
> you expected. None of m
Hi Ralph,
sorry for the late response, but I couldn't find free time to play
with this. Finally I've applied the patch you prepared. I've launched
my processes in the way you've described and I think it's working as
you expected. None of my processes runs the orted daemon and they can
perform MPI o
Actually, OMPI is distributed with a daemon that does pretty much what you want. Check out "man ompi-server". I originally wrote that code to support cross-application MPI publish/subscribe operations, but we can utilize it here too. Have to blame me for not making it more publicly known. The attache
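(For readers who have not met ompi-server before: with ompi-server acting as the rendezvous point, the standard MPI-2 name-publishing calls are enough to wire independent jobs together. The sketch below is generic MPI with an invented service name, not code from this thread; see "man ompi-server" for how both sides are pointed at the same server.)

#include <mpi.h>

/* Generic MPI-2 publish/lookup rendezvous; "my-grid" is an invented
 * service name.  Both sides must be configured to reach the same
 * ompi-server instance. */
void server_side(MPI_Comm *inter)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("my-grid", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, inter);
}

void client_side(MPI_Comm *inter)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Lookup_name("my-grid", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, inter);
}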
Hi Ralph,
I'm Krzysztof and I'm working with Grzegorz Maj on our small
project/experiment.
We definitely would like to give your patch a try. But could you please
explain your solution a little more?
You would still like to start one mpirun per MPI grid, and then have
processes started by us
In thinking about this, my proposed solution won't entirely fix the problem -
you'll still wind up with all those daemons. I believe I can resolve that one
as well, but it would require a patch.
Would you like me to send you something you could try? Might take a couple of
iterations to get it r
Hmmm... I -think- this will work, but I cannot guarantee it:
1. launch one process (can just be a spinner) using mpirun that includes the
following option:
mpirun -report-uri file
where file is some filename that mpirun can create and insert its contact info
into. This can be a relative or
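(In case "spinner" is unclear: it can be any trivial MPI program whose only job is to stay alive, so that the mpirun which launched it keeps running and keeps serving as the contact point. Below is a guess at what such a spinner might look like; the sleep interval is arbitrary and the program name is whatever you choose to pass to mpirun.)

#include <mpi.h>
#include <unistd.h>

/* A do-nothing "spinner" kept alive under mpirun -report-uri <file>.
 * It exists only so that mpirun itself stays up; kill it (or the mpirun)
 * once the grid is no longer needed. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    for (;;)
        sleep(60);              /* loops forever; terminate externally */
    /* MPI_Finalize() is never reached. */
}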
To be more precise: by 'server process' I mean some process that I
could run once on my system and that would help in creating those
groups.
My typical scenario is:
1. run N separate processes, each without mpirun
2. connect them into MPI group
3. do some job
4. exit all N processes
5. goto 1
2010/4
Thank you, Ralph, for your explanation.
And, apart from that descriptor issue, is there any other way to
solve my problem, i.e. to run a number of processes separately,
without mpirun, and then to collect them into an MPI intracomm group?
If, for example, I needed to run some 'server process' (eve
Okay, but here is the problem. If you don't use mpirun, and are not operating
in an environment we support for "direct" launch (i.e., starting processes
outside of mpirun), then every one of those processes thinks it is a singleton
- yes?
What you may not realize is that each singleton immediat
Yes, I know. The problem is that I need to use a special launch
mechanism provided by the environment in which I'm working, and
unfortunately I can't use mpirun.
2010/4/18 Ralph Castain :
> Guess I don't understand why you can't use mpirun - all it does is start
> things, provide a
Guess I don't understand why you can't use mpirun - all it does is start
things, provide a means to forward I/O, etc. It mainly sits there quietly
without using any CPU unless required to support the job.
Sounds like it would solve your problem. Otherwise, I know of no way to get all
these proce
Hi,
I'd like to dynamically create a group of processes communicating via
MPI. Those processes need to be run without mpirun and create an
intracommunicator after startup. Any ideas on how to do this
efficiently?
I came up with a solution in which the processes are connecting one by
one using MPI_Com
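(To complete the picture for later readers: "connecting one by one" usually means each new process connects to the existing group's port, both sides merge, and the newcomer then takes part in accepting everyone who comes after it. The sketch below shows the joining side only; names and port distribution are illustrative, not from this thread.)

#include <mpi.h>

/* Rough client-side counterpart of the grow-by-one scheme; names and
 * port handling are illustrative. */
MPI_Comm join_grid(const char *port)
{
    MPI_Comm inter, intra;

    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 1, &intra);   /* newcomer ranks go "high" */
    MPI_Comm_free(&inter);

    /* From now on this process must also take part in every subsequent
     * MPI_Comm_accept()/MPI_Intercomm_merge(), since both calls are
     * collective over the growing intracommunicator. */
    return intra;
}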