I believe I now have this working correctly on the trunk and set up for 1.7.4.
If you get a chance, please give it a try and confirm it solves the problem.
Thanks
Ralph
On Jan 17, 2014, at 2:16 PM, Ralph Castain wrote:
> Sorry for the delay - I understood and was just occupied with something else
> for a while. Thanks for the follow-up. I'm looking at the issue and trying to
> decipher the right solution.
>
>
> On Jan 17, 2014, at 2:00 PM, tmish...@jcity.maeda.co.jp wrote:
>
>>
>>
>> Hi Ralph,
>>
>> I'm sorry that my explanation was not clear enough.
>> This is the summary of my situation:
>>
>> 1. I manually create a hostfile as shown below.
>>
>> 2. I use mpirun to start the job without Torque, which means I'm running in
>> an un-managed environment.
>>
>> 3. First, ORTE detects 8 slots on each host (maybe in
>> "orte_ras_base_allocate"); a toy sketch of this counting follows this list.
>> node05: slots=8 max_slots=0 slots_inuse=0
>> node06: slots=8 max_slots=0 slots_inuse=0
>>
>> 4. Then, the code I identified is resetting the slot counts.
>> node05: slots=1 max_slots=0 slots_inuse=0
>> node06: slots=1 max_slots=0 slots_inuse=0
>>
>> 5. Therefore, ORTE believes that there is only one slot on each host.
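>>
>> Below is a toy program (my own simplified sketch, not the actual
>> hostfile.c code) illustrating the counting behaviour I mean in step 3:
>> every repetition of a hostname adds one slot, while slots_given stays
>> false because no "slots=" keyword appears in the file.
>>
>>     #include <stdio.h>
>>     #include <string.h>
>>
>>     struct node { char name[64]; int slots; int slots_given; };
>>
>>     int main(void)
>>     {
>>         /* stand-in hostfile: with node05 x8 and node06 x8 this
>>          * would print slots=8 for each node */
>>         const char *lines[] = { "node05", "node05", "node06", "node06" };
>>         struct node nodes[16];
>>         int nnodes = 0;
>>
>>         for (int i = 0; i < (int)(sizeof(lines) / sizeof(lines[0])); i++) {
>>             int j;
>>             for (j = 0; j < nnodes; j++) {
>>                 if (0 == strcmp(nodes[j].name, lines[i])) {
>>                     break;
>>                 }
>>             }
>>             if (j == nnodes) {              /* first occurrence: new node */
>>                 strncpy(nodes[j].name, lines[i], sizeof(nodes[j].name) - 1);
>>                 nodes[j].name[sizeof(nodes[j].name) - 1] = '\0';
>>                 nodes[j].slots = 0;
>>                 nodes[j].slots_given = 0;   /* no "slots=" keyword seen */
>>                 nnodes++;
>>             }
>>             nodes[j].slots++;               /* each repetition adds a slot */
>>         }
>>         for (int j = 0; j < nnodes; j++) {
>>             printf("%s: slots=%d slots_given=%d\n",
>>                    nodes[j].name, nodes[j].slots, nodes[j].slots_given);
>>         }
>>         return 0;
>>     }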
>>
>> Regards,
>> Tetsuya Mishima
>>
>>> No, I didn't use Torque this time.
>>>
>>> This issue occurs only when running in an un-managed
>>> environment - namely, when orte_managed_allocation is false
>>> (and orte_set_slots is NULL).
>>>
>>> Under Torque management, it works fine.
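>>>
>>> To make the condition concrete, here is a tiny stand-alone sketch of
>>> the gating I believe is involved (the variable names follow the MCA
>>> parameters above; the real ORTE code is of course structured
>>> differently):
>>>
>>>     #include <stdbool.h>
>>>     #include <stdio.h>
>>>
>>>     /* hypothetical stand-ins for the two controls named above */
>>>     static bool  orte_managed_allocation = false; /* no RM (no Torque) */
>>>     static char *orte_set_slots          = NULL;  /* no slot policy    */
>>>
>>>     int main(void)
>>>     {
>>>         /* the problematic "default slots to 1" path is only reached
>>>          * in this combination */
>>>         if (!orte_managed_allocation && NULL == orte_set_slots) {
>>>             printf("unmanaged run, no slot policy: slots default to 1\n");
>>>         } else {
>>>             printf("managed run or explicit policy: slots are kept\n");
>>>         }
>>>         return 0;
>>>     }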
>>>
>>> I hope you can understand the situation.
>>>
>>> Tetsuya Mishima
>>>
>>> I'm sorry, but I'm really confused, so let me try to understand the
>>> situation.
>>> You use Torque to get an allocation, so you are running in a managed
>>> environment.
>>> You then use mpirun to start the job, but pass it a hostfile as shown
>>> below.
>>> Somehow, ORTE believes that there is only one slot on each host, and
>>> you believe the code you've identified is resetting the slot counts.
>>> Is that a correct summary of the situation?
>>> Thanks
>>> Ralph
>>> On Jan 16, 2014, at 4:00 PM, tmish...@jcity.maeda.co.jp wrote:
>
> Hi Ralph,
>
> I encountered the hostfile issue again, where slots are counted by
> listing the node multiple times. This should have been fixed by r29765
> - "Fix hostfile parsing for the case where RMs count slots".
>
> The difference is whether an RM is used or not. At that time, I executed
> mpirun through the Torque manager. This time I executed it directly from the
> command line as shown at the bottom, where node05 and node06 each have 8
> cores.
>
> Then I checked the source files around it and found that lines 151-160 in
> plm_base_launch_support.c caused this issue. As node->slots is already
> counted in hostfile.c @ r29765 even when node->slots_given is false, I think
> this part of plm_base_launch_support.c is unnecessary.
>
> orte/mca/plm/base/plm_base_launch_support.c @ 30189:
> 151    } else {
> 152        /* set any non-specified slot counts to 1 */
> 153        for (i=0; i < orte_node_pool->size; i++) {
> 154            if (NULL == (node = (orte_node_t*)
>                        opal_pointer_array_get_item(orte_node_pool, i))) {
> 155                continue;
> 156            }
> 157            if (!node->slots_given) {
> 158                node->slots = 1;
> 159            }
> 160        }
> 161    }
>
> Removing this part, it works very well, and the function of
> orte_set_default_slots is still alive. I think this would be better for
> compatibility with openmpi-1.7.3.
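>
> (Just as an illustration of the interaction, not a proposal for the
> actual fix - my suggestion above is simply to remove the block - a
> hypothetical guard in place of lines 157-159 would also avoid
> clobbering the hostfile counts, by defaulting only nodes whose slot
> count is still zero:
>
>     if (!node->slots_given && 0 == node->slots) {
>         node->slots = 1;    /* default only when nothing was counted */
>     }
>
> )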
>
> Regards,
> Tetsuya Mishima
>
> [mishima@manage work]$ cat pbs_hosts
> node05
> node05
> node05
> node05
> node05
> node05
> node05
> node05
> node06
> node06
> node06
> node06
> node06
> node06
> node06
> node06
> [mishima@manage work]$ mpirun -np 4 -hostfile pbs_hosts -cpus-per-proc 4 -report-bindings myprog
> [node05.cluster:22287] MCW rank 2 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
> [node05.cluster:22287] MCW rank 3 is not bound (or bound to all available processors)
> [node05.cluster:22287] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
> [node05.cluster:22287] MCW rank 1 is not bound (or bound to all available processors)
> Hello world from process 0 of 4
> Hello world from process 1 of 4
> Hello world from process 3 of 4
> Hello world from process 2 of 4
>