Cristian, one more thing... two containers on the same host cannot communicate via the sm btl. You might want to mpirun with --mca btl tcp,self on one physical machine, without containers, in order to assess the performance degradation due to using the tcp btl with no containerization effect mixed in.
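A minimal sketch of what that run could look like (assuming the same mg.C.8 binary and 8 ranks you used in your other tests):

mpirun -np 8 --mca btl tcp,self --allow-run-as-root mg.C.8

Comparing that against your 8-core sm run on the same machine isolates the cost of the tcp btl from any container overhead.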
Cheers,
Gilles

On Friday, July 24, 2015, Harald Servat <harald.ser...@bsc.es> wrote:
> Dear Cristian,
>
> according to your configuration:
>
> a) - 8 Linux containers on the same machine configured with 2 cores
> b) - 8 physical machines
> c) - 1 physical machine
>
> a) and c) have exactly the same physical computational resources, despite
> the fact that a) is virtualized and the processors are oversubscribed
> (2 virtual cores per physical core). I'm not an expert on virtualization,
> but since a) and c) are exactly the same hardware (in terms of cores and
> memory hierarchy), and your application seems memory bound, I'd expect to
> see what you tabulated, and b) is faster because you have 8 times the
> memory cache.
>
> Regards
> PS Your name in the mail is different, maybe you'd like to fix it.
>
> On 22/07/15 10:42, Crisitan RUIZ wrote:
>
>> Thank you for your answer Harald
>>
>> Actually I was already using TAU before, but it seems that it is no
>> longer maintained and there are problems when instrumenting
>> applications with version 1.8.5 of OpenMPI.
>>
>> I was using OpenMPI 1.6.5 before to test the execution of HPC
>> applications on Linux containers. I tested the performance of the NAS
>> benchmarks in three different configurations:
>>
>> - 8 Linux containers on the same machine configured with 2 cores
>> - 8 physical machines
>> - 1 physical machine
>>
>> So, as I already described, each machine has 2 processors (8 cores
>> each). I instrumented and ran all the NAS benchmarks in these three
>> configurations and I got the results that I attached in this email.
>> In the table, "native" corresponds to using 8 physical machines and
>> "SM" corresponds to 1 physical machine using the sm module; time is
>> given in milliseconds.
>>
>> What surprises me in the results is that using containers gives, in the
>> worst case, the same communication time as just using plain MPI
>> processes, even though the containers use virtual network interfaces to
>> communicate between them. Probably this behaviour is due to process
>> binding and locality. I wanted to redo the test using OpenMPI version
>> 1.8.5 but unfortunately I couldn't successfully instrument the
>> applications. I was looking for another MPI profiler but I couldn't
>> find any. HPCToolkit looks like it is no longer maintained, and Vampir
>> no longer maintains the tool that instruments the application. I will
>> probably give Paraver a try.
>>
>> Best regards,
>>
>> Cristian Ruiz
>>
>> On 07/22/2015 09:44 AM, Harald Servat wrote:
>>
>>> Cristian,
>>>
>>> you might observe super-speedup here because on 8 nodes you have 8
>>> times the cache you have in only 1 node. You can also validate that
>>> by checking for cache miss activity using the tools that I mentioned
>>> in my other email.
>>>
>>> Best regards.
>>>
>>> On 22/07/15 09:42, Crisitan RUIZ wrote:
>>>
>>>> Sorry, I've just discovered that I was using the wrong command to
>>>> run on 8 machines. I have to get rid of the "-np 8".
>>>>
>>>> So, I corrected the command and used:
>>>>
>>>> mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp
>>>> --allow-run-as-root mg.C.8
>>>>
>>>> And got these results:
>>>>
>>>> 8 cores:
>>>>
>>>> Mop/s total = 19368.43
>>>>
>>>> 8 machines:
>>>>
>>>> Mop/s total = 96094.35
>>>>
>>>> Why is the performance of multi-node almost 4 times better than
>>>> multi-core? Is this normal behavior?
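Regarding the guess above about process binding and locality: a quick way to see where the ranks actually land (assuming Open MPI 1.8.x, where this option is available) is to add --report-bindings to the mpirun line, for example:

mpirun --report-bindings --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp --allow-run-as-root mg.C.8

Each rank then reports at startup which cores it is bound to, which makes it easy to spot oversubscription or all ranks being packed onto a single socket.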
>>>>
>>>> On 07/22/2015 09:19 AM, Crisitan RUIZ wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm running OpenMPI 1.8.5 on a cluster with the following
>>>>> characteristics:
>>>>>
>>>>> Each node is equipped with two Intel Xeon E5-2630v3 processors
>>>>> (with 8 cores each), 128 GB of RAM and a 10 Gigabit Ethernet
>>>>> adapter.
>>>>>
>>>>> When I run the NAS benchmarks using 8 cores on the same machine,
>>>>> I'm getting almost the same performance as using 8 machines running
>>>>> one MPI process per machine.
>>>>>
>>>>> I used the following commands:
>>>>>
>>>>> for running multi-node:
>>>>>
>>>>> mpirun -np 8 --machinefile machine_file.txt --mca btl self,sm,tcp
>>>>> --allow-run-as-root mg.C.8
>>>>>
>>>>> for running with 8 cores:
>>>>>
>>>>> mpirun -np 8 --mca btl self,sm,tcp --allow-run-as-root mg.C.8
>>>>>
>>>>> I got the following results:
>>>>>
>>>>> 8 cores:
>>>>>
>>>>> Mop/s total = 19368.43
>>>>>
>>>>> 8 machines:
>>>>>
>>>>> Mop/s total = 19326.60
>>>>>
>>>>> The results are similar for other benchmarks. Is this behavior
>>>>> normal? I was expecting to see better performance using 8 cores,
>>>>> given that they communicate directly through memory.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Cristian
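One more thing worth checking for the single-node run (an assumption on my part, not something reported in the thread): if all 8 ranks end up on one socket, the run uses only one of the two memory controllers, which hurts a memory-bound code like mg.C. Spreading the ranks explicitly, for instance

mpirun -np 8 --map-by socket --bind-to core --mca btl self,sm --allow-run-as-root mg.C.8

places them round-robin across the two sockets and pins each rank to a core, so the single-node and 8-machine numbers compare like for like.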