Brock,
The only thing that came to mind is that on the second dump, the I/O
might be heavy enough to overload the OSSs (I/O servers), resulting in
a process or task hang. Can you tell whether your Lustre environment is
getting overwhelmed when the Open MPI / FLASH combination checkpoints
the second time? I know you write files > 2 GB all the time, but this
particular combination may be delivering that amount of data in a much
shorter period of time.
Just a thought :-\
Jeff F. Pummill
University of Arkansas
Brock Palen wrote:
I started a new run with some changes.
Shortening the run won't work well; it takes 3 days just to get
through the AMR.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Jan 25, 2008, at 3:01 PM, Daniel Pfenniger wrote:
Hi,
Brock Palen wrote:
Is anyone using FLASH with Open MPI? We are, but whenever it tries
to write its second checkpoint file, it segfaults once the file
reaches 2.2 GB, always in the same location.
Debugging is a pain, as it takes 3 days to get to that point. Just
wondering if anyone else has seen the same behavior.
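The 2.2 GB figure is suggestive, though nothing in this thread confirms a cause: it sits just past 2^31 - 1 bytes, the largest offset a signed 32-bit integer can hold. If some layer in the write path (an assumption, not something identified here) kept a file offset in a 32-bit int, it would overflow around that size. The sketch below only does that arithmetic; the helper names are hypothetical.

```python
# Largest byte offset a signed 32-bit integer can represent.
INT32_MAX = 2**31 - 1  # 2,147,483,647 bytes, about 2.15 GB (2.0 GiB)


def exceeds_int32_offset(size_bytes: int) -> bool:
    """Return True if size_bytes cannot be stored in a signed 32-bit offset."""
    return size_bytes > INT32_MAX


def as_int32(n: int) -> int:
    """Wrap n the way a two's-complement signed 32-bit integer would."""
    return ((n + 2**31) % 2**32) - 2**31


# Hypothetical checkpoint sizes in bytes, for illustration only.
for label, size in [("2.0 GB", 2_000_000_000), ("2.2 GB", 2_200_000_000)]:
    print(f"{label}: exceeds 32-bit offset = {exceeds_int32_offset(size)}, "
          f"wrapped value = {as_int32(size)}")
```

A 2.2 GB offset wraps to a large negative number, which is the kind of value that could plausibly lead to a segfault or failed seek if it were truncated somewhere; reproducing with smaller files is still the practical way to test that idea.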
To make testing faster, you might consider reducing the file output
interval (the trstrt or nrstrt parameters in flash.par) and decreasing
the resolution (lrefine_max) to produce smaller files, to see whether
the problem is related to the file size.
Dan
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users