Just some thoughts offhand:

* what version of OMPI are you using?

* are you saying that after the warm reboot, all 48 procs are running on a subset of cores?

* it sounds like some of the cores have been marked as “offline” for some reason. Make sure you have hwloc installed on the machine, then run “lstopo” and see if that is the case.

Ralph

> On Mar 17, 2016, at 2:00 AM, Rainer Koenig <rainer.koe...@ts.fujitsu.com> wrote:
>
> Hi,
>
> I'm experiencing a strange problem with running LIGGGHTS on a 48 core
> workstation running Ubuntu 14.04.4 LTS.
>
> If I cold boot the workstation and start one of the examples from
> LIGGGHTS then everything looks fine:
>
> $ mpirun -np 48 liggghts < in.chute_wear
>
> launches the example on all 48 cores, htop in a second window shows that
> all cores are occupied and run at nearly 100% workload.
>
> So far so good. Now I just reboot the workstation and do the exact same
> steps as above.
>
> This time the job just runs on a few cores (16 to 20) and the cores
> don't even run at 100% load.
>
> So now I'm trying to find out what is wrong. Bad luck is that I can't
> just ask the vendor of the workstation since I'm working for that vendor
> and trying to solve this issue. :-)
>
> I guess that something that OpenMPI needs is initialized differently when
> I do a cold boot or a warm boot. But how can I find out what is wrong?
>
> Already tried to look for differences in the Ubuntu boot logs, but there
> is nothing different.
>
> ompi_info --all or even the parsable format doesn't show any difference
> between cold boot and warm boot.
>
> Any ideas what could be wrong after the reboot that causes such a behaviour?
>
> Thanks,
> Rainer
> --
> Dipl.-Inf. (FH) Rainer Koenig
> Project Manager Linux Clients
> Dept. PDG WPS R&D SW OSE
>
> Fujitsu Technology Solutions
> Bürgermeister-Ullrich-Str. 100
> 86199 Augsburg
> Germany
>
> Telephone: +49-821-804-3321
> Telefax: +49-821-804-2131
> Mail: mailto:rainer.koe...@ts.fujitsu.com
>
> Internet ts.fujtsu.com
> Company Details ts.fujitsu.com/imprint.html
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28722.php
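
In case it helps with the "offline" cores question, here is a minimal sketch of how the check could be scripted alongside lstopo. It assumes a Linux system that exposes the kernel's CPU lists under /sys/devices/system/cpu ("present" and "online"); the function names are made up for illustration and are not part of hwloc or Open MPI.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8,10-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (no CPUs in this state) yields no ids
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpu_set(name, base="/sys/devices/system/cpu"):
    """Read one kernel CPU list file ("present" or "online"); empty set if missing."""
    path = Path(base) / name
    return parse_cpu_list(path.read_text()) if path.exists() else set()


def report():
    """Print which present CPUs are not online, plus this process's allowed CPUs."""
    present = read_cpu_set("present")
    online = read_cpu_set("online")
    print(f"present: {len(present)}  online: {len(online)}")
    print("present but not online:", sorted(present - online))
    # CPUs this process is allowed to be scheduled on (Linux-only call).
    print("allowed for this process:", len(os.sched_getaffinity(0)))


if __name__ == "__main__":
    report()
```

Running this once after a cold boot and once after a warm boot, and comparing the two outputs, should show whether cores are going missing at the kernel level or whether the difference lies elsewhere.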