Your patch looks fine to me, so I'll apply it. As for this second issue - good
catch. Yes, if the binding directive was provided in the default MCA param
file, then the proc would attempt to bind itself on startup. The best fix
is actually to tell the procs not to do so. We already have that mec
Hi Ralph, I misunderstood the point of the problem.
The problem is that BIND_TO_OBJ is retried and performed in
orte_ess_base_proc_binding @ ess_base_fns.c, even though you try to
set BIND_TO_NONE in rmaps_rr_mapper.c when the node is oversubscribed.
Furthermore, binding in orte_ess_base_proc_binding does not sup
Hi Ralph, I have tested your fix, 30895. I'm afraid to say
I found a mistake.
You should include "SETTING BIND_TO_NONE" in the above if-clause
at lines 74, 256, 511, and 656. Otherwise, only the warning message
disappears, and binding to core is still overwritten by binding
to none. Please see attach
Hi Ralph, I understand what you meant.
I often use float in our application.
float c = (float)(unsigned int a - unsigned int b) could
be a very huge number if a < b. So I always carefully cast
from unsigned int to int when I subtract them. I didn't know/mind
int d = (unsigned int a - unsigned in
Yes, indeed. In the future, when we have many, many cores
in a machine, we will have to take care of overrunning
num_procs.
Tetsuya
Cool - easily modified. Thanks!
Of course, you understand (I'm sure) that the cast does nothing to protect the
code from blowing up if we overrun the var. In other words, if the unsigned var
has wrapped, then casting it to int won't help - you'll still get a negative
integer, and the code will
Hi Ralph, I'm a little bit late for your release.
I found a minor mistake in byobj_span: an integer casting problem.
--- rmaps_rr_mappers.30892.c 2014-03-01 08:31:50 +0900
+++ rmaps_rr_mappers.c 2014-03-01 08:33:22 +0900
@@ -689,7 +689,7 @@
}
/* compute how many objs need an extra pro
Please take a look at https://svn.open-mpi.org/trac/ompi/ticket/4317
On Feb 27, 2014, at 8:13 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph, I can't operate our cluster for a few days, sorry.
But now, I'm narrowing down the cause by browsing the source code.
My best guess is line 529. opal_hwloc_base_get_obj_by_type will
reset the object pointer to the first object when you move on to the next
node.
529
I'm having trouble seeing why it is failing, so I added some more debug output.
Could you run the failure case again with -mca rmaps_base_verbose 10?
Thanks
Ralph
On Feb 27, 2014, at 6:11 PM, tmish...@jcity.maeda.co.jp wrote:
Just checking the difference; it's not very meaningful...
Anyway, I guess it's due to the behavior when the slot count is missing
(treated as slots=1), so the job is oversubscribed unintentionally.
I'm going out now, so I can't verify it quickly. If I provide the
correct slot counts, it will work, I g
"restore" in what sense?
On Feb 27, 2014, at 4:10 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph, this is just for your information.
I tried to restore the previous orte_rmaps_rr_byobj. Then I get the result
below with this command line:
mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
Data
Each node has 4 cores/socket and 2 sockets, i.e. 4 x 2 = 8 cores in total.
Here is the output of lstopo.
[mishima@manage round_robin]$ rsh node05
Last login: Tue Feb 18 15:10:15 from manage
[mishima@node05 ~]$ lstopo
Machine (32GB)
NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB)
L2 L#0
Hmmm...what does your node look like again (sockets and cores)?
On Feb 27, 2014, at 3:19 PM, tmish...@jcity.maeda.co.jp wrote:
>
> Hi Ralph, I'm afraid your new "map-by obj" causes another problem.
>
> I get an overload message with this command line, as shown below:
>
> mpirun -np 8 -host