Hi Ralph,
I confirmed that it works quite well for my purposes.
Thank you very much.
I would like to point out just one small thing. Since the debug
information in the rank-file block is useful even
when a host is initially detected, the OPAL_OUTPUT_VERBOSE
at line 302 should be moved out of the else-clause as
On Jan 19, 2014, at 1:36 AM, tmish...@jcity.maeda.co.jp wrote:
Thank you for your fix. I will try it tomorrow.
Before that, although I could not understand everything,
let me ask some questions about the new hostfile.c.
1. Lines 244-248 are inside the else-clause, which (it seems to me) might
cause a memory leak. Should they be moved out of the clause?
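To illustrate the general leak pattern behind this question, here is a minimal, self-contained C sketch. It does not reproduce hostfile.c. The struct `host_entry_t` and the functions `copy_string`, `find_host` and `add_or_update_host` are hypothetical stand-ins, and the assumption is that a temporary string is allocated before an if/else that either updates a known host or adds a new one. If the `free()` lived only in the else-branch, every already-known host would leak that string. Placing the cleanup (and the debug print) after the if/else releases it on both paths.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified host record; NOT ORTE's real node type. */
typedef struct {
    char name[64];
    int slots;
} host_entry_t;

/* Portable stand-in for strdup(), which is POSIX rather than ISO C. */
static char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL) {
        memcpy(p, s, len);
    }
    return p;
}

/* Returns the index of the entry named `name`, or -1 if none exists. */
static int find_host(const host_entry_t *hosts, int count, const char *name) {
    for (int i = 0; i < count; i++) {
        if (strcmp(hosts[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Adds a new host or updates an existing one. The temporary copy of the
 * token is freed after the if/else, so it is released on both paths. */
static int add_or_update_host(host_entry_t *hosts, int cap, int *count,
                              const char *token, int slots) {
    assert(hosts != NULL && count != NULL && token != NULL);
    char *name = copy_string(token);   /* temporary heap string */
    if (name == NULL) {
        return -1;
    }
    int idx = find_host(hosts, *count, name);
    if (idx >= 0) {
        hosts[idx].slots = slots;      /* host seen before: update it */
    } else if (*count < cap) {
        idx = (*count)++;              /* new host: append it */
        snprintf(hosts[idx].name, sizeof hosts[idx].name, "%s", name);
        hosts[idx].slots = slots;
    }
    /* Debug output and cleanup sit outside the else-clause, so they run
     * whether the host was new or already known. */
    if (idx >= 0) {
        fprintf(stderr, "host %s: %d slots\n", hosts[idx].name, hosts[idx].slots);
    }
    free(name);
    return idx;  /* -1 if the table was full or allocation failed */
}
```

Calling `add_or_update_host` twice with the same host name updates the existing entry instead of adding a second one, and the temporary copy is freed on both calls; running the caller under a leak checker such as Valgrind would show the difference if the `free()` were moved back into the else-branch.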
I believe I now have this working correctly on the trunk and set up for 1.7.4.
If you get a chance, please give it a try and confirm that it solves the problem.
Thanks
Ralph
On Jan 17, 2014, at 2:16 PM, Ralph Castain wrote:
Sorry for the delay; I understood and was just occupied with something else for a
while. Thanks for the follow-up. I'm looking at the issue and trying to
work out the right solution.
On Jan 17, 2014, at 2:00 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
I'm sorry that my explanation was not sufficient.
Here is a summary of my situation:
1. I manually create a hostfile as shown below.
2. I use mpirun to start the job without Torque, which means I'm running in
an un-managed environment.
3. Firstly, ORTE detects 8 slots on each host(
No, I didn't use Torque this time.
This issue occurs only outside a managed
environment, namely when orte_managed_allocation is false
(and orte_set_slots is NULL).
Under Torque management, it works fine.
I hope you can understand the situation.
Tetsuya Mishima
I'm sorry, but I'm really confused, so let me try to understand the situation.
You use Torque to get an allocation, so you are running in a managed
environment.
You then use mpirun to start the job, but pass it a hostfile as shown below.
Somehow, ORTE believes that there is only one slot on eac