Re: [deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-07 Thread Nicholas Yue
That is great news. I will create and mount a new partition with the settings suggested in the article and try again. Cheers On Sat, Sep 4, 2021, 12:55 Wolfgang Bangerth wrote: > On 9/3/21 7:49 PM, Nicholas Yue wrote: > > The file system I am mounting via NFS is an ordinary Linux file system,

Re: [deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-05 Thread Nicholas Yue
@Matthias Yes, it wrote out all 200 *.pvtu and *.vtu file pairs, and they are non-empty. It seems only the checkpointing is having problems via MPI-IO. FYI, I have now also created and mounted a different storage volume, followed the `noac` NFS setting, and have been able to run the full simulati
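The thread's resolution was to mount the shared NFS storage with the `noac` option, after which MPI-IO checkpointing worked. The sketch below is a minimal C++ pre-flight check, assuming a Linux node whose mount table is readable in `/proc/mounts` format and whose kernel lists `noac` among the NFS mount options. It reports whether a given mount point is an NFS mount carrying `noac`. The function names are hypothetical, and the check covers only the one setting named in this thread, not every setting the referenced article may recommend.

```cpp
#include <istream>
#include <sstream>
#include <string>

// True if the comma-separated option list `opts` (4th field of a
// /proc/mounts line, e.g. "rw,hard,noac,proto=tcp") contains `opt`
// as a whole entry, so "noacl" does not count as "noac".
bool has_mount_option(const std::string &opts, const std::string &opt)
{
  std::istringstream ss(opts);
  std::string item;
  while (std::getline(ss, item, ','))
    if (item == opt)
      return true;
  return false;
}

// Reads mount-table text in /proc/mounts format
// ("device mount_point fs_type options dump pass", one mount per line)
// and reports whether `mount_point` is an NFS mount with `noac` set.
// Mount points containing spaces appear octal-escaped (\040) in
// /proc/mounts; this sketch compares the raw field and ignores that.
bool nfs_mount_has_noac(std::istream &mounts, const std::string &mount_point)
{
  bool result = false; // also returned when the mount point is not listed
  std::string line;
  while (std::getline(mounts, line))
    {
      std::istringstream fields(line);
      std::string device, dir, fstype, opts;
      if (!(fields >> device >> dir >> fstype >> opts))
        continue; // skip malformed lines
      if (dir != mount_point)
        continue;
      // A later entry for the same directory shadows earlier ones.
      result = (fstype == "nfs" || fstype == "nfs4") &&
               has_mount_option(opts, "noac");
    }
  return result;
}
```

On a compute node, one could pass a `std::ifstream` opened on `/proc/mounts` together with the shared directory's mount point, and abort before launching with `mpirun` if the function returns false.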

Re: [deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-05 Thread Matthias Maier
> I tried commenting out the call to check-pointing as you suggested and was > able to run the code on 4 nodes (each with 4 cores) and it finished very > quickly. Did it write out vtu files during this run? (They are written using MPI-IO.) Best, Matthias -- The deal.II project is located at h

Re: [deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-04 Thread Wolfgang Bangerth
On 9/3/21 7:49 PM, Nicholas Yue wrote: The file system I am mounting via NFS is an ordinary Linux file system; it is not an HPC parallel file system like Lustre or anything like that. I tried commenting out the call to check-pointing as you suggested and was able to run the code on 4 nodes (each

Re: [deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-03 Thread Nicholas Yue
The file system I am mounting via NFS is an ordinary Linux file system; it is not an HPC parallel file system like Lustre or anything like that. I tried commenting out the call to check-pointing as you suggested and was able to run the code on 4 nodes (each with 4 cores) and it finished very quic

Re: [deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-03 Thread Matthias Maier
Hi Nicholas, On Fri, Sep 3, 2021, at 12:49 CDT, Nicholas Yue wrote: > Hi > > It seems to be consistently failing when writing the checkpoint file(s). > > Are there special flags I need to set up for some form of parallel IO > that may be happening? > [...] > Additional information: >

Re: [deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-03 Thread Wolfgang Bangerth
On 9/3/21 11:49 AM, Nicholas Yue wrote: An error occurred in line <1412> of file in function void dealii::parallel::DistributedTriangulationBase<dim, spacedim>::DataTransfer::save(unsigned int, unsigned int, const string&) const [with int dim = 2; int spacedim = 2; std::__cxx11::string = std::

[deal.II] Step-69 : Parallel execution on 4 nodes

2021-09-03 Thread Nicholas Yue
Hi, I have built deal.II on the Nvidia Jetson Nano cluster from Picocluster. It has 5 nodes (pc[0-4]); I am using pc0 as the head/login node to launch the application. The executable is on an NFS file system shared between all the nodes. The following works: mpirun --host pc1 --mca btl_tcp_if_incl