OK, the summary of errors begins here. First, the errors with MPI in double precision:
1 Simple Test:
bham: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd

7 Complex Tests:
acetonitrilRF: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd
aminoacids: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd
argon: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd
sw: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd
tip4p: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd
urea: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd
water: ns type Simple is not supported with domain decomposition, use particle decomposition: mdrun -pd

16 Kernel Tests: 0 computation time. Something has gone REALLY wrong with those... :(

Except for the kernel tests, it seems I'm getting that same error message in all of them (still looking into it). Are those expected to appear? And what about the kernel ones? Am I wrong, or does that mean compilation problems (especially since they appear in all tests: single and double precision, with and without MPI)?

I'm also getting errors in serial, in single precision, in 4 complex tests. Those seem to have run, but yielded wrong results? Does anybody have any clue, please? Shall I go straight to recompilation, even though there is no obvious reason for failure here?

Thanks a lot!

Jones
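P.S.: If I read the error right, those inputs use ns_type = simple, and the domain-decomposition code in 4.0.x only supports grid neighbour searching, so mdrun has to be told to fall back to particle decomposition. A minimal sketch of what I'll try by hand in one of the failing test directories (untested; the grompp.mdp / topol.tpr names assume the usual test layout, and "mdrun" stands for whatever the MPI-enabled binary is called here):

*****************
# Option 1: keep ns_type = simple and force particle decomposition,
# exactly as the error message itself suggests:
mpirun -np 4 mdrun -pd -s topol.tpr

# Option 2: switch the input to grid neighbour searching so domain
# decomposition can be used; in grompp.mdp set
#   ns_type = grid
# and then regenerate the run input:
grompp -maxwarn 10
mpirun -np 4 mdrun -s topol.tpr
*****************

Though I suppose changing the .mdp means the test no longer matches the reference data, so option 1 is probably the right one for the testbed.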
On Mon, May 11, 2009 at 10:42 PM, Jones de Andrade <johanne...@gmail.com> wrote:

> Hi Justin.
>
> Well, bothering you again. Good news and bad news.
>
> The good news: I found a strange "workaround" for my problems here. For
> some reason, the perl script resets the path, the environment and
> everything else when it runs, so the variables I set in the script I was
> using were simply lost. The workaround was to put those variables in the
> .tcshrc file and log in again.
>
> The problem is that this is not practical. I'm trying a lot of different
> MPI and library builds, and having to edit that file and log out/log in
> or source it every time is not practical at all. Is there any other way,
> so that the perl script keeps the variables it has when it is called,
> instead of reinitializing them all?
>
> Second, here comes the real bad news: lots of errors.
>
> Without MPI, in single precision, 4 complex and 16 kernel tests fail.
>
> Without MPI, but in double precision, "just" the 16 kernel tests fail.
>
> With MPI, in single precision, it fails on 1 simple, 9 complex and 16
> kernel tests!
>
> And with MPI and double precision, 1 simple, 7 complex and 16 kernel
> tests fail. :P
>
> Edit: Just received your message. Well, it seems that I made a mistake in
> my script, but since at least part of the tests worked, it means that at
> least the MPI is not misconfigured.
>
> I will look deeper into the errors above and tell you later.
>
> Thanks a lot,
>
> Jones
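(A quick sketch of what I mean with the environment question above; untested, and /opt/openmpi-1.3 is just a placeholder for wherever the MPI build under test is installed:)

*****************
# tcsh: set the variables in the calling shell itself, rather than
# inside a child script, so gmxtest.pl inherits them when invoked:
setenv PATH /opt/openmpi-1.3/bin:${PATH}
setenv LD_LIBRARY_PATH /opt/openmpi-1.3/lib
./gmxtest.pl -np 4 all

# or set them for a single invocation only, via env(1):
env PATH=/opt/openmpi-1.3/bin:${PATH} LD_LIBRARY_PATH=/opt/openmpi-1.3/lib ./gmxtest.pl -np 4 all
*****************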
> On Mon, May 11, 2009 at 9:41 PM, Jones de Andrade <johanne...@gmail.com> wrote:

>> Hi Justin.
>>
>> Thanks a lot for that. It helped, but not enough yet. :( It just made the
>> 4.0.4 tests reach the same "range of errors" that I'm getting with
>> 3.3.3. :P
>>
>> Using OpenMPI, it just complains that it can't find orted. That would
>> mean that the paths are not in there, BUT they are. :P If I just try to
>> run orted from the command line without any arguments:
>>
>> *****************
>> gmxtest404 196% orted
>> [palpatine:28366] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> runtime/orte_init.c at line 125
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_ess_base_select failed
>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> [palpatine:28366] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> orted/orted_main.c at line 323
>> ******************
>>
>> So, the shell IS finding the file. But when I run the tests outside my
>> script (I was already suspecting something in its if-else-end stack),
>> all MPI tests fail with the following message in the mdrun.out file:
>>
>> **********************
>> orted: Command not found.
>> --------------------------------------------------------------------------
>> A daemon (pid 27972) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>> ***********************
>>
>> What is going on? The next thing I'm thinking of is executing a full
>> command line from one of the tests directly, to see whether it works...
>> :( :P
>>
>> Now I'm absolutely lost. Any ideas, please?
>>
>> Thanks a lot,
>>
>> Jones
>>
>> On Mon, May 11, 2009 at 9:07 PM, Justin A. Lemkul <jalem...@vt.edu> wrote:
>>
>>> Justin A. Lemkul wrote:
>>>>
>>>> Jones de Andrade wrote:
>>>>>
>>>>> Hi Justin
>>>>>
>>>>> This has been discussed several times on the list. The -np flag is
>>>>> no longer necessary with grompp. You don't get an mdrun.out because
>>>>> the .tpr file is likely never created, since grompp fails.
>>>>>
>>>>> Yes, I know that, and that is what I would have expected. But what I'm
>>>>> running is the gmxtest.pl script. Even in the 4.0.4 version, it
>>>>> explicitly states that I must use "-np N" on its own command line to
>>>>> make parallel runs work:
>>>>>
>>>>> ************
>>>>> gmxtest.pl
>>>>> Usage: ./gmxtest.pl [ -np N ] [ -verbose ] [ -double ] [ simple |
>>>>> complex | kernel | pdb2gmx | all ]
>>>>> or: ./gmxtest.pl clean | refclean | dist
>>>>> ************
>>>>>
>>>>> I would expect the script to use it only for mdrun and not for grompp,
>>>>> but it seems to try to use it on both. What is really strange is that
>>>>> the testbed really works. So, does gmxtest.pl have a bug in 4.0.4? Or
>>>>> how should I tell gmxtest.pl to run the tests on a growing number of
>>>>> cores?
>>>>
>>>> Ah, sorry for the mis-read :) There is a simple fix that you can apply
>>>> to the gmxtest.pl script:
>>>>
>>>> % diff gmxtest.pl gmxtest_orig.pl
>>>> 161c161
>>>> < system("$grompp -maxwarn 10 $ndx > grompp.out 2>&1");
>>>> ---
>>>> > system("$grompp -maxwarn 10 $ndx $par > grompp.out 2>&1");
>>>>
>>>> -Justin
>>>>
>>>>> Version 3.3.3, on the other hand, already failed in so many different
>>>>> places that I'm really wondering IF I'll make it available on the new
>>>>> cluster. :P
>>>>>
>>>>> What messages are you getting from 3.3.3? I thought you said the
>>>>> 3.3.x series worked fine.
>>>>>
>>>>> I'll log in, run those, and try to get a reproducible error here. ;)
>>>>> As soon as I have it, I'll post back in this thread.
>>>>>
>>>>> Thanks a lot again,
>>>>>
>>>>> Jones
>>>
>>> --
>>> ========================================
>>> Justin A. Lemkul
>>> Ph.D. Candidate
>>> ICTAS Doctoral Scholar
>>> Department of Biochemistry
>>> Virginia Tech
>>> Blacksburg, VA
>>> jalemkul[at]vt.edu | (540) 231-9080
>>> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
>>> ========================================
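P.S. on the orted "Command not found" failure quoted above: if I understand Open MPI correctly, mpirun launches orted through a non-interactive shell, so a PATH that only exists in the interactive login environment is never seen by the daemon. Two things worth trying (the /opt/openmpi-1.3 prefix is again a placeholder for the actual install location):

*****************
# Tell mpirun explicitly where Open MPI is installed, so it can find
# orted without relying on the remote shell's PATH:
mpirun --prefix /opt/openmpi-1.3 -np 4 mdrun -s topol.tpr

# Calling mpirun by its absolute path has the same effect as --prefix:
/opt/openmpi-1.3/bin/mpirun -np 4 mdrun -s topol.tpr
*****************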
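P.S. on Justin's gmxtest.pl fix: after applying that one-line change (dropping $par from the grompp call, so -np only reaches mdrun), I suppose testing on a growing number of cores just means calling the script repeatedly; something like this in tcsh, with the core counts as arbitrary examples:

*****************
# run the complex tests on 2, 4 and 8 cores in turn:
foreach n ( 2 4 8 )
    ./gmxtest.pl -np $n complex
end
*****************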