Re: [OMPI users] new map-by-obj has a problem

2014-03-03 Thread Ralph Castain
Your patch looks fine to me, so I'll apply it. As for this second issue - good catch. Yes, if the binding directive was provided in the default MCA param file, then the proc would attempt to bind itself on startup. The best fix actually is to just tell them not to do so. We already have that mec

Re: [OMPI users] new map-by-obj has a problem

2014-03-03 Thread tmishima
Hi Ralph, I misunderstood the point of the problem. The problem is that BIND_TO_OBJ is re-tried and done in orte_ess_base_proc_binding @ ess_base_fns.c, although you try to BIND_TO_NONE in rmaps_rr_mapper.c when it's oversubscribed. Furthermore, binding in orte_ess_base_proc_binding does not sup

Re: [OMPI users] new map-by-obj has a problem

2014-03-02 Thread tmishima
Hi Ralph, I have tested your fix - 30895. I'm afraid to say I found a mistake. You should include "SETTING BIND_TO_NONE" in the above if-clause at lines 74, 256, 511, and 656. Otherwise, only the warning message disappears, but binding to core is still overwritten by binding to none. Please see attach

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread tmishima
Hi Ralph, I understood what you meant. I often use float for our application. float c = (float)(unsigned int a - unsigned int b) could be a very large number if a < b. So I always carefully cast to int from unsigned int when I subtract them. I didn't know/mind int d = (unsigned int a - unsigned in

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread tmishima
Yes, indeed. In the future, when machines have many more cores, we will have to guard against overrun of num_procs. Tetsuya > Cool - easily modified. Thanks! > > Of course, you understand (I'm sure) that the cast does nothing to protect the code from blowing up if we overrun the var. I

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread Ralph Castain
Cool - easily modified. Thanks! Of course, you understand (I'm sure) that the cast does nothing to protect the code from blowing up if we overrun the var. In other words, if the unsigned var has wrapped, then casting it to int won't help - you'll still get a negative integer, and the code will

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread tmishima
Hi Ralph, I'm a little bit late for your release. I found a minor mistake in byobj_span: an integer casting problem. --- rmaps_rr_mappers.30892.c 2014-03-01 08:31:50 +0900 +++ rmaps_rr_mappers.c 2014-03-01 08:33:22 +0900 @@ -689,7 +689,7 @@ } /* compute how many objs need an extra pro

Re: [OMPI users] new map-by-obj has a problem

2014-02-28 Thread Ralph Castain
Please take a look at https://svn.open-mpi.org/trac/ompi/ticket/4317 On Feb 27, 2014, at 8:13 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, I can't operate our cluster for a few days, sorry. > > But now, I'm narrowing down the cause by browsing the source code. > > My best guess is

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
Hi Ralph, I can't operate our cluster for a few days, sorry. But now, I'm narrowing down the cause by browsing the source code. My best guess is line 529. opal_hwloc_base_get_obj_by_type will reset the object pointer to the first one when you move on to the next node. 529

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread Ralph Castain
I'm having trouble seeing why it is failing, so I added some more debug output. Could you run the failure case again with -mca rmaps_base_verbose 10? Thanks Ralph On Feb 27, 2014, at 6:11 PM, tmish...@jcity.maeda.co.jp wrote: > > > Just checking the difference, not so significant meaning... >

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
Just checking the difference; it has no great significance... Anyway, I guess it's due to the behavior when the slot count is missing (regarded as slots=1) and the node is unintentionally oversubscribed. I'm going out now, so I can't verify it quickly. If I provide the correct slot counts, it will work, I g

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread Ralph Castain
"restore" in what sense? On Feb 27, 2014, at 4:10 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, this is just for your information. > > I tried to restore previous orte_rmaps_rr_byobj. Then I gets the result > below with this command line: > > mpirun -np 8 -host node05,node06 -report-b

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
Hi Ralph, this is just for your information. I tried to restore the previous orte_rmaps_rr_byobj. Then I get the result below with this command line: mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog Data

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread tmishima
They each have 2 sockets with 4 cores per socket, 4 x 2 = 8 cores in total. Here is the output of lstopo. [mishima@manage round_robin]$ rsh node05 Last login: Tue Feb 18 15:10:15 from manage [mishima@node05 ~]$ lstopo Machine (32GB) NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB) L2 L#0

Re: [OMPI users] new map-by-obj has a problem

2014-02-27 Thread Ralph Castain
Hmmm... what does your node look like again (sockets and cores)? On Feb 27, 2014, at 3:19 PM, tmish...@jcity.maeda.co.jp wrote: > > Hi Ralph, I'm afraid to say your new "map-by obj" causes another problem. > > I get an overload message with this command line, as shown below: > > mpirun -np 8 -host