I gather you have access to bjs? Could you use bjs to get a node allocation, and then send me a printout of the environment? All I need to see is what your environment looks like - how does the system tell you what nodes you have been allocated?
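For instance, running something like this little sketch (or simply "env") from inside the allocation would show me what I need. It is only a sketch - it just dumps any environment entries that mention NODE, since the exact variable name bjs sets is precisely what I want to confirm:

    #include <stdio.h>
    #include <string.h>

    extern char **environ;

    int main(void)
    {
        /* Print every environment entry whose name or value mentions NODE.
           Under a bjs allocation this should include whatever the scheduler
           sets (e.g. NODES), which is what we need to see. */
        for (char **e = environ; *e != NULL; ++e) {
            if (strstr(*e, "NODE") != NULL)
                printf("%s\n", *e);
        }
        return 0;
    }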
Then we can make something that will solve your problem.

Ralph


On 11/2/06 1:10 AM, "hpe...@infonie.fr" <hpe...@infonie.fr> wrote:

> Thank you for your support, Ralph, I really appreciate it.
>
> I now have a better understanding of your very first answer asking whether I had a NODES environment variable. It was related to the fact that your platform is configured with LSF. I have read some tutorials about LSF, and it seems that LSF provides an "llogin" command that creates an environment where the NODES variable is permanently defined.
>
> Then, under this "llogin" environment, all jobs are automatically allocated to the nodes defined in NODES.
>
> This is why, I think, the spawning works fine in that situation.
>
> Unfortunately, LSF is commercial, so I am not able to install it on my platform. I am afraid I cannot do anything more on my side for now.
>
> You proposed to concoct something over the next few days. I look forward to hearing from you.
>
> Regards,
>
> Herve
>
>
> Date: Tue, 31 Oct 2006 06:53:53 -0700
> From: Ralph H Castain <r...@lanl.gov>
> Subject: Re: [OMPI users] MPI_Comm_spawn multiple bproc support
>
> Aha! Thanks for your detailed information - that helps identify the problem.
>
> See some thoughts below.
> Ralph
>
>
> On 10/31/06 3:49 AM, "hpe...@infonie.fr" <hpe...@infonie.fr> wrote:
>
>> Thank you for your quick reply, Ralph.
>>
>> As far as I know, the NODES environment variable is created when a job is submitted to the bjs scheduler. The only way I know of to do that (but I am a bproc newbie) is to use the bjssub command.
>
> That is correct. However, Open MPI requires that ALL of the nodes you are going to use must be allocated in advance. In other words, you have to get an allocation large enough to run your entire job - both the initial application and anything you comm_spawn.
>
> I wish I could help you with the proper bjs commands to get an allocation, but I am not familiar with bjs and (even after multiple Google searches) cannot find any documentation on that code. Try doing a "bjs --help" and see what it says.
>
>> Then, I retried my test with the following command: "bjssub -i mpirun -np 1 main_exe".
>
> <snip>
>
>> I guess this problem comes from the way I set the parameters for the spawned program. Instead of giving instructions to spawn the program on a specific host, I should set parameters to spawn the program on a specific node. But I do not know how to do that.
>
> What you did was fine. "host" is the correct field to set. I suspect two possible issues:
>
> 1. The specified host may not be in the allocation. In the case you showed here, I would expect it to be, since you specified the same host we are already on. However, you might try running mpirun with the "--nolocal" option - this will force mpirun to launch the processes on a machine other than the one you are on (typically you are on the head node; in many bproc machines, this node is not included in an allocation because the system admins don't want you running MPI jobs on it).
>
> 2. We may have something wrong in our code for this case. I'm not sure how well that has been tested, especially in the 1.1 code branch.
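Just to make the "host" discussion above concrete: an (unverified) sketch of the kind of call involved is below. The node name "n1" and the executable "./worker" are only placeholders - whatever host you name there has to be one of the nodes in the allocation you got from bjs, per point 1 above.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        MPI_Info info;
        int errcode;

        MPI_Init(&argc, &argv);

        /* Ask for the child to be placed on a specific node of the allocation. */
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", "n1");           /* placeholder node name */

        /* "./worker" is a placeholder for the executable to be spawned. */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info,
                       0, MPI_COMM_SELF, &intercomm, &errcode);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }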
>> Then, I have a bunch of questions:
>> - When MPI is used together with bproc, is it necessary to use bjssub or bjs in general?
>
> You have to use some kind of resource manager to obtain a node allocation for your use. At our site, we use LSF - other people use bjs. Anything that sets the NODES variable is fine.
>
>> - I was wondering if I have to submit the spawned program to bjs as well? i.e., do I have to add 'bjssub' to the commands parameter of the MPI_Comm_spawn_multiple call?
>
> You shouldn't have to do so. I suspect, however, that bjssub is not getting a large enough allocation for your combined mpirun + spawned job. I'm not familiar enough with bjs to know for certain.
>
>> As you can see, I am still not able to spawn a program and need some more help. Do you have some examples describing how to do it?
>
> Unfortunately, not in the 1.1 branch, nor do I have one for comm_spawn_multiple that uses the "host" field. I can try to concoct something over the next few days, though, and verify that our code is working correctly.
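In the meantime, purely as an unverified sketch of what a comm_spawn_multiple call using the "host" field could look like - the node names "n1"/"n2" and the executables "./worker_a"/"./worker_b" are placeholders, and note that the commands are the plain executables (no bjssub wrapper), per the answer above:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        char *cmds[2]   = { "./worker_a", "./worker_b" };   /* placeholder executables */
        int   nprocs[2] = { 1, 1 };
        MPI_Info infos[2];
        int errcodes[2];

        MPI_Init(&argc, &argv);

        /* One info object per command, each naming a node from the allocation. */
        MPI_Info_create(&infos[0]);
        MPI_Info_set(infos[0], "host", "n1");               /* placeholder node names */
        MPI_Info_create(&infos[1]);
        MPI_Info_set(infos[1], "host", "n2");

        MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, nprocs, infos,
                                0, MPI_COMM_SELF, &intercomm, errcodes);

        MPI_Info_free(&infos[0]);
        MPI_Info_free(&infos[1]);
        MPI_Finalize();
        return 0;
    }

As with the single-command case, both nodes named there would have to be inside the allocation obtained before mpirun was started.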