Thank you for the tips. I changed to MKL 10.0.3 and the threading problem disappeared; running with 8 threads is no longer a problem (a rough sketch of the check I ran is just below). Regarding the QTL-B errors, I tried running with MPI parallelization but without the iterative diagonalization, and this seems to work: no QTL-B errors. Another (smaller) case I tried ran without any problem using MPI + iterative diagonalization. From a code design point of view, should there be any difference between lapwX and lapwX_mpi due to parameters such as NMATMAX that could give rise to these QTL-B errors? I should mention that NMATMAX is not limiting in my case. You mention that the iterative scheme uses a subset of eigenvectors from a previous iteration. Is this subset of old eigenvectors smaller or different when using MPI compared to k-point parallelization?
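For reference, the threading check was nothing more than the standard serial benchmark run at different thread counts, roughly like this (bash syntax; the loop is of course only illustrative):

    # serial lapw1 benchmark at increasing MKL/OpenMP thread counts
    for n in 1 2 4 8; do
        export OMP_NUM_THREADS=$n
        echo "threads: $n"
        time x lapw1 -c     # complex serial benchmark; stable again with MKL 10.0.3
    done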
To be on the safe side I will run using MPI but without the '-it' switch in the future.

/Johan

Laurence Marks wrote:
> This is one of many issues.
>
> 1) For mkl 10, make sure that you are using version 10.0.3; the earlier
> versions of 10.X had some bugs.
>
> 2) Make sure that you do not have a problem in your network software.
> I have a new cluster on which the "official" version of mvapich was
> installed, and this had a scalapack bug. Their current version (via
> their equivalent of cvs) works well. In your case, check the openmpi
> webpage.
>
> 3) For mkl 10 there are some issues with the size of buffer arrays; in
> essence, unless one uses sizes at least as large as those that the
> Intel code "likes" (via a workspace query call), problems can occur.
> I think this is an Intel bug; they probably call it a "feature". While
> this is probably not a problem for real cases (because of some code
> changes) and non-iterative calculations, it may still be present in the
> current version on the web for complex iterative cases.
>
> 4) In the iterative versions only a subset of the eigenvectors from a
> previous iteration is used. If the space of these old eigenvectors
> does not include a good approximation to a new eigenvalue, you may get
> ghost bands (QTL-B errors). One workaround is to use more old
> eigenvectors, i.e. increase nband at the bottom of case.in1 or
> case.in1c.
>
> 5) If 4) does not work (it does not always), consider using LAPW for
> some of the states. For instance, with relatively large RMTs (2.0)
> for d-electron transition elements (e.g. Ni), switching to LAPW rather
> than APW+lo for the d states stabilized the iterative mode for some
> calculations.
>
> On Fri, Jun 13, 2008 at 2:47 AM, Johan Eriksson <joher at ifm.liu.se> wrote:
>
>> Dear Wien community,
>> I'm running the latest Wien2k release on a linux cluster (IFORT 10.1,
>> cmkl 9.1, openmpi 1.2.5).
>> The cases are running fine with k-point parallelization + MPI lapw0.
>> However, since there are many more CPUs than k-points, and infiniband
>> interconnects are available, I want to use full MPI parallelization.
>> First I ran my case with k-point parallelization for a few cycles,
>> stopped, ran clean_lapw and then switched to MPI. After a few
>> iterations I started getting QTL-B warnings and it crashed. If I
>> switch back to k-point parallelization it runs just fine again.
>> What am I doing wrong here? Could it be that I'm using the iterative
>> diagonalization scheme (-it switch)? Should I try some other mkl or
>> MPI implementation?
>>
>> Also, why is it that the serial benchmark 'x lapw1 -c' is so unstable
>> with mkl 10 when using OMP_NUM_THREADS>=4? With cmkl 9.1 it works fine
>> with 1, 2, 4 and 8 threads. When mkl 10 works, it is however faster
>> than cmkl 9.1.
>>
>> /Johan Eriksson
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
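P.S. For the record, this is roughly how I will launch the runs from now on; the convergence criterion shown is just my usual value and incidental to the point:

    # full MPI parallelization, without iterative diagonalization (no -it switch)
    run_lapw -p -ec 0.0001

    # what I ran before, which eventually gave QTL-B warnings for the larger case:
    # run_lapw -p -it -ec 0.0001

If I go back to '-it' later, I will first increase nband at the bottom of case.in1c as suggested in point 4).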