Issue #8290 reported.
Thanks all for your help and the workaround provided.
Patrick
On 14/12/2020 at 17:40, Jeff Squyres (jsquyres) wrote:
> Yes, opening an issue would be great -- thanks!
>
>
>> On Dec 14, 2020, at 11:32 AM, Patrick Bégou via users
>> <users@lists.open-mpi.org> wrote:
und.
> At first glance, I did not spot any issue in the current code.
> It turned out that the memory leak disappeared when doing things
> differently.
>
> Cheers,
>
> Gilles
>
> On Mon, Dec 14, 2020 at 7:11 PM Patrick Bégou via users
> <users@lists.open-mpi.org> wrote:
a try?
>
>
> Cheers,
>
>
> Gilles
>
>
>
> On 12/7/2020 6:15 PM, Patrick Bégou via users wrote:
>> Hi,
>>
>> I've written a small piece of code to show the problem. It is based on my
>> application, but in 2D and using integer arrays for testing.
>
n used, and can therefore be used to convert back
>>> into a valid datatype pointer, until OMPI completely releases the
>>> datatype. Look into the ompi_datatype_f_to_c_table table to see the
>>> datatypes that exist and get their pointers, and then use these
>>> pointers as
ion problem but deeper in the code I think.
Patrick
On 04/12/2020 at 19:20, George Bosilca wrote:
> On Fri, Dec 4, 2020 at 2:33 AM Patrick Bégou via users
> <users@lists.open-mpi.org> wrote:
>
> Hi George and Gilles,
>
> Thanks George for your suggestion.
and can therefore be used to convert back into a
>> valid datatype pointer, until OMPI completely releases the datatype.
>> Look into the ompi_datatype_f_to_c_table table to see the datatypes
>> that exist and get their pointers, and then use these pointers as
>> argume
)
> are used?
>
>
> mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 --mca
> btl_base_verbose 10 ...
>
> will point you to the component(s) used.
>
> The output is pretty verbose, so feel free to compress and post it if
> you cannot decipher it.
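If it helps, the component-selection lines can be filtered out of the log,
for example (a hypothetical invocation; ./a.out stands for your binary, and
the exact wording of the selection messages varies between versions):

    mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 \
        --mca btl_base_verbose 10 -np 4 ./a.out 2>&1 | grep -i select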
>
>
> Cheers,
>
nter, until OMPI completely releases the datatype.
>> Look into the ompi_datatype_f_to_c_table table to see the datatypes
>> that exist and get their pointers, and then use these pointers as
>> arguments to ompi_datatype_dump() to see if any of these existing
>> datatypes are the one
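For anyone wanting to try this, a rough sketch of walking that table. The
table and dump function names come from the advice above; the pointer-array
accessors are my assumption about the internals, and the whole thing must be
compiled within the Open MPI source tree (header paths may vary between
versions):

    /* Rough sketch only: relies on Open MPI internals as described above. */
    #include "ompi/datatype/ompi_datatype.h"
    #include "opal/class/opal_pointer_array.h"

    static void dump_known_datatypes(void)
    {
        int n = opal_pointer_array_get_size(&ompi_datatype_f_to_c_table);
        for (int i = 0; i < n; i++) {
            ompi_datatype_t *dt = (ompi_datatype_t *)
                opal_pointer_array_get_item(&ompi_datatype_f_to_c_table, i);
            if (NULL != dt) {
                ompi_datatype_dump(dt);  /* print this datatype's description */
            }
        }
    }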
Hi,
I'm trying to track down a memory leak that appeared with my new
implementation of communications based on MPI_Alltoallw and
MPI_Type_create_subarray calls. Arrays of subarray types are created and
destroyed at each time step and used for communications.
On my laptop the code runs fine (running for 15000 temporal iterations).
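For context, the pattern under discussion looks roughly like this (a minimal
C sketch, not the actual reproducer; the function name, array shapes, and
counts are all made up):

    #include <stdlib.h>
    #include <mpi.h>

    /* One subarray datatype per peer is built for MPI_Alltoallw and must be
     * freed again after the call; skipping the MPI_Type_free() loop leaks
     * 2*nranks datatypes per time step. */
    static void exchange_step(MPI_Comm comm, int nranks,
                              int *sendbuf, int *recvbuf,
                              const int *counts, const int *displs)
    {
        MPI_Datatype *stypes = malloc(nranks * sizeof(MPI_Datatype));
        MPI_Datatype *rtypes = malloc(nranks * sizeof(MPI_Datatype));
        int sizes[2] = {128, 128};           /* made-up global array shape */
        int subs[2]  = {128, 128 / nranks};  /* made-up slab for one peer  */

        for (int p = 0; p < nranks; p++) {
            int starts[2] = {0, p * subs[1]};
            MPI_Type_create_subarray(2, sizes, subs, starts,
                                     MPI_ORDER_FORTRAN, MPI_INT, &stypes[p]);
            MPI_Type_commit(&stypes[p]);
            MPI_Type_create_subarray(2, sizes, subs, starts,
                                     MPI_ORDER_FORTRAN, MPI_INT, &rtypes[p]);
            MPI_Type_commit(&rtypes[p]);
        }

        MPI_Alltoallw(sendbuf, counts, displs, stypes,
                      recvbuf, counts, displs, rtypes, comm);

        for (int p = 0; p < nranks; p++) {  /* without this, memory grows */
            MPI_Type_free(&stypes[p]);
            MPI_Type_free(&rtypes[p]);
        }
        free(stypes);
        free(rtypes);
    }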
>
> Sharing a reproducer will be very much appreciated in order to improve ompio.
>
> Cheers,
>
> Gilles
>
> On Thu, Dec 3, 2020 at 6:05 PM Patrick Bégou via users
> wrote:
>> Thanks Gilles,
>>
>> this is the solution.
>> I will set OMPI_MCA_io=^ompio.
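For reference, the one-line equivalent outside a module file (bash syntax;
my_hdf5_app is a placeholder name):

    export OMPI_MCA_io=^ompio
    mpirun -np 16 ./my_hdf5_app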
.
>
>
> You can force romio with
>
> mpirun --mca io ^ompio ...
>
>
> Cheers,
>
>
> Gilles
>
> On 12/3/2020 4:20 PM, Patrick Bégou via users wrote:
>> Hi,
>>
>> I'm using an old (but required by the codes) version of hdf5 (1.8.12) in
>
Hi,
I'm using an old (but required by the codes) version of hdf5 (1.8.12) in
parallel mode in 2 fortran applications. It relies on MPI/IO. The
storage is NFS mounted on the nodes of a small cluster.
With OpenMPI 1.7 it runs fine, but with recent OpenMPI (3.1 or 4.0.5) the
I/Os are 10x to 100x slower.
; mpirun --version on all 3 nodes is identical (Open MPI 2.1.1), and
> it is found in the same place when testing with "whereis mpirun".
> So is there any problem with mpirun causing it not to launch on other
> nodes?
>
> Regards
> HaChi
>
> On Thu, 4 Jun 2020 at 14:35, Patric
Hi Ha Chi
do you use a batch scheduler with Rocks Cluster, or do you log in to the
nodes with ssh?
If ssh, can you check that you can ssh from one node to the other
without a password?
Ping just says the network is alive, not that you can connect.
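For example (node02 is a placeholder hostname):

    ssh node02 hostname   # should print "node02" without any password prompt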
Patrick
On 04/06/2020 at 09:06, Hà Chi Nguyễn Nhật via users wrote:
Hi all,
I've built OpenMPI 4.0.3 with GCC 9.3.0, but on the server UCX was not
available when I set --with-ucx. I removed this option and it compiles
fine without UCX. However, I see a strange behavior: when using mpirun
I must explicitly remove ucx to avoid an error. In my module file I
have to
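Presumably that means excluding the ucx components through an MCA variable,
along these lines (an assumption on my part, since the exact setting is cut
off above; bash syntax):

    export OMPI_MCA_pml=^ucx   # keep the ucx PML from being selected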
twork-and-i-o/fabric-products/OFED_Host_Software_UserGuide_G91902_06.pdf#page72>
>
> TS is the same hardware as the old QLogic QDR HCAs, so the manual might
> be helpful to you in the future.
>
> Sent from my iPad
>
>> On May 9, 2020, at 9:52 AM, Patrick Bégou via users
>>
On 08/05/2020 at 21:56, Prentice Bisbal via users wrote:
>
> We often get the following errors when more than one job runs on the
> same compute node. We are using Slurm with OpenMPI. The IB cards are
> QLogic using PSM:
>
> 10698ipath_userinit: assign_context command failed: Network is down
> no
(OAR), so the problem was not critical for my future use.
Patrick
>
> On Sun, 26 Apr 2020 at 18:09, Patrick Bégou via users
> <users@lists.open-mpi.org> wrote:
>
> I also have this problem on servers I'm benchmarking at Dell's lab with
> OpenMPI
I also have this problem on servers I'm benchmarking at Dell's lab with
OpenMPI-4.0.3. I've tried a new build of OpenMPI with "--with-pmi2"; no
change.
Finally my workaround in the Slurm script was to launch my code with
mpirun. As mpirun was only finding one slot per node, I used
"--oversubscribe", as in the sketch below.
long way of saying: make sure that you have no other Open
> MPI installation findable in your PATH / LD_LIBRARY_PATH and then try
> running `make check` again.
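For example:

    which -a mpirun        # list every mpirun findable in PATH
    echo $LD_LIBRARY_PATH  # check for stray Open MPI library directories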
>
>
>> On Apr 21, 2020, at 2:37 PM, Patrick Bégou via users
>> mailto:users@lists.open-mpi.org>> wrote:
>>
Hi OpenMPI maintainers,
I have temporary access to servers with AMD Epyc processors running RHEL7.
I'm trying to deploy OpenMPI with several setups, but each time "make
check" fails on *opal_path_nfs*. This test freezes forever, consuming no
CPU resources.
After nearly one hour I killed the process.
Hi David,
could you specify which version of OpenMPI you are using?
I also have some parallel I/O trouble with one code, but I have not
investigated it yet.
Thanks
Patrick
On 13/04/2020 at 17:11, Dong-In Kang via users wrote:
>
> Thank you for your suggestion.
> I am more concerned about the poor
Patrick,
>>
>> The root cause is that we do not include the localhost interface by
>> default for OOB communications.
>>
>>
>> You should be able to run with
>>
>> mpirun --mca oob_tcp_if_include lo -np 4 hostname
>>
>>
>> Cheers,
>>
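For what it's worth, that setting can also be made permanent through Open
MPI's per-user MCA parameter file (a standard mechanism; the path shown is
for a default installation):

    # $HOME/.openmpi/mca-params.conf
    oob_tcp_if_include = lo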
Does “no network is available” mean the lo interface (localhost
> 127.0.0.1) is not even available?
>
> Cheers,
>
> Gilles
>
> On Monday, January 28, 2019, Patrick Bégou
> <patrick.be...@legi.grenoble-inp.fr> wrote:
>
> Hi,
>
> I fal
Hi,
I've run into a strange problem with OpenMPI 3.1 installed on a CentOS7
laptop. If no network is available I cannot launch a local MPI job on
the laptop:
bash-4.2$ mpirun -np 4 hostname
--
No network interfaces were found fo
I have downloaded the nightly snapshot tarball of October 10th, 2018 for
the 3.1 version and it solves the memory problem.
I ran my test case on 1, 2, 4, 10, 16, 20, 32, 40, and 64 cores
successfully.
This version also allows me to compile my prerequisite libraries, so we
can use it out of the box to
eds more memory he has to request 2
cores, even if he uses a sequential code. This avoids crashing the jobs of
other users on the same node with memory requirements. But this is not
configured on your node.
Duke Nguyen wrote:
On 3/30/13 3:13 PM, Patrick Bégou wrote:
I do not know about your co
I do not know about your code, but:
1) did you check stack limitations? Typically Intel Fortran codes need a
large amount of stack when the problem size increases.
Check ulimit -a (see the example below).
2) does your node use cpusets and memory limitation (like fake NUMA) to set
the maximum amount of memory available for
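For item 1, for example (bash syntax):

    ulimit -a             # show all current limits, including the stack size
    ulimit -s unlimited   # raise the stack limit for this shell and its children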