Re: [OMPI users] delay in launch?

2009-01-15 Thread Reuti
On 15.01.2009 at 16:20, Jeff Dusenberry wrote: I'm trying to launch multiple xterms under OpenMPI 1.2.8 and the SGE job scheduler for purposes of running a serial debugger. I'm experiencing file-locking problems on the .Xauthority file. I tried to fix this by asking for a delay between su

Re: [OMPI users] mpirun (signal 15 Termination)

2009-01-15 Thread Hana Milani
Dear all, 1. I have not run it with a debugger; could you tell me how to do it? 2. How can I make sure that it is or is not killing my job? Sorry if my questions seem weird, but I have to solve the problem immediately. Thanks for helping me

Re: [OMPI users] Problem with openmpi and infiniband

2009-01-15 Thread Biagio Lucini
Jeff Squyres wrote: On Jan 7, 2009, at 6:28 PM, Biagio Lucini wrote: [[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to: node11 error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0 Ah! If we're dealing a

[OMPI users] delay in launch?

2009-01-15 Thread Jeff Dusenberry
I'm trying to launch multiple xterms under OpenMPI 1.2.8 and the SGE job scheduler for purposes of running a serial debugger. I'm experiencing file-locking problems on the .Xauthority file. I tried to fix this by asking for a delay between successive launches, to reduce the chances of content

[OMPI users] Timeout problem

2009-01-15 Thread Gabriele Fatigati
Dear OpenMPI developers, I'm running my MPI application over an InfiniBand network on 128 processors. During the execution of my application, I get a strange timeout error: checkPAMRESActionTab: action 63 connecting to RES on host timed out after 200 seconds Is it a net problem or an applicatio

Re: [OMPI users] mpirun (signal 15 Termination)

2009-01-15 Thread Jeff Squyres
Have you checked to ensure that the job manager is not killing your job? As I mentioned yesterday, SIGTERM usually means that some external agent killed your job. On Jan 15, 2009, at 3:39 AM, Hana Milani wrote: please tell me how to get rid of the message and how to run the parallel job? I

Re: [OMPI users] mpirun (signal 15 Termination)

2009-01-15 Thread jody
Without any details it's difficult to make a diagnosis, but it looks like one of your processes crashes, perhaps from a segmentation fault. Have you run it with a debugger? Jody On Thu, Jan 15, 2009 at 9:39 AM, Hana Milani wrote: > please tell me how to get rid of the message and how to run th

Re: [OMPI users] mpirun (signal 15 Termination)

2009-01-15 Thread Hana Milani
Please tell me how to get rid of the message and how to run the parallel job. I have another code that runs directly under mpirun without a problem, but this one, which needs BLACS and ScaLAPACK, is playing with me. Please, if there is any solution, let me have it. Regards, Hana

Re: [OMPI users] mpirun (signal 15 Termination) urgent

2009-01-15 Thread Hana Milani
Hello Simon, To run the program in parallel, I write: mpirun -np 4 ~/program output After about a second I receive the message: mpirun noticed that job rank 0 with PID 9477 on node linux-4pel exited on signal 15 (Terminated). and at the end of the output file, I receive: "3 additiona