Dear Jeff,

Of course, I was thinking of running memtester on each node at the same time and gathering the outputs. However, running memtester on a node with 48GB of memory takes a lot of time (more than 1-2 hours; I don't remember exactly, and it may be even longer because I cancelled the run), and I have to spend resources just on testing. So I was curious whether you know of a tool or procedure that works much faster. Filling the memory with an application also works, of course, but I don't know how reliable that is.
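As a rough illustration of the "fill the memory and check it" idea (not a replacement for memtester, which runs many more test patterns), here is a minimal Python sketch. It writes a position-dependent byte pattern and then its bitwise complement over a buffer, reads everything back, and counts mismatched bytes. The function names, the 1 MiB chunk size, the pattern constants, and the command-line size argument in MiB are all hypothetical choices for this example. It only exercises the physical pages the kernel happens to hand the process, and Python adds overhead, so a clean result is weak evidence rather than proof that a node's memory is healthy.

```python
import socket
import sys

TILE = 256  # each pattern tile covers every byte value exactly once


def make_chunk(pass_no: int, chunk_size: int = 1 << 20) -> bytes:
    """Build one chunk of the (hypothetical) test pattern for a given pass.

    Even passes use a permutation of 0..255 shifted per pass pair; odd passes
    use the bitwise complement, so every bit is checked holding both 0 and 1.
    """
    if chunk_size % TILE:
        raise ValueError("chunk_size must be a multiple of 256")
    invert = 0xFF if pass_no % 2 else 0x00
    shift = (pass_no // 2) * 53
    # i * 167 mod 256 is a permutation because 167 is odd.
    tile = bytes((((i * 167) + shift) & 0xFF) ^ invert for i in range(TILE))
    return tile * (chunk_size // TILE)


def pattern_test(size_bytes: int, passes: int = 4, chunk_size: int = 1 << 20) -> int:
    """Write and verify `passes` patterns over a buffer; return mismatched bytes."""
    buf = bytearray(size_bytes)
    errors = 0
    with memoryview(buf) as mv:
        for p in range(passes):
            chunk = make_chunk(p, chunk_size)
            # Write phase: fill the whole buffer before reading anything back.
            for off in range(0, size_bytes, chunk_size):
                n = min(chunk_size, size_bytes - off)
                mv[off:off + n] = chunk[:n]
            # Verify phase: compare chunk by chunk, count bytes only on mismatch.
            for off in range(0, size_bytes, chunk_size):
                n = min(chunk_size, size_bytes - off)
                if mv[off:off + n] != chunk[:n]:
                    errors += sum(1 for a, b in zip(mv[off:off + n], chunk[:n]) if a != b)
    return errors


def main() -> None:
    # Hypothetical CLI: buffer size in MiB as the first argument (default 64 MiB).
    mib = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    errors = pattern_test(mib * 1024 * 1024)
    status = "OK" if errors == 0 else "FAIL"
    # One line per host so output collected from many nodes is easy to scan.
    print(f"{socket.gethostname()} {status} errors={errors} tested_mib={mib}")


if __name__ == "__main__":
    main()
```

Because it prints one line per host, a copy could be started on every node at once with mpirun (as suggested in the reply below), and the collected output scanned for FAIL lines. The option for asking mpirun for exactly one process per node differs between Open MPI versions, so check the mpirun man page for your installation.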

Best regards,
George Markomanolis

On 11/26/2012 06:09 PM, Jeff Squyres wrote:
On Nov 26, 2012, at 4:02 AM, George Markomanolis wrote:

Another, more generic question is about discovering nodes with faulty memory.
Is there any way to identify them? I found by accident that a node with
exactly the same hardware as another could not run an MPI application when it
used more than 12GB of RAM, while the other node could use all 48GB of memory.
With 500+ nodes it is difficult to check all of them, and I am not aware of any
efficient solution. Initially I thought about memtester, but it takes a lot of
time. I know this question does not strictly belong on this mailing list, but I
thought that maybe an Open MPI user knows something about it.

You really do want something like a memory tester.  MPI applications *might*
beat on your memory hard enough to expose errors, but that's really just a side
effect of HPC access patterns.  You really want a dedicated memory tester.

If such a memory tester takes a long time, you might want to use mpirun to 
launch it on multiple nodes simultaneously to save some time...?