Wolfgang, thanks again for your response. When I started trying to implement PETSc/MPI I didn't realize the normal SolverCG<> was already running on all CPUs. As of now I have no need for PETSc/MPI. I did a little more testing and found that on a single processor the PETSc/MPI version was no faster than the non-PETSc/MPI one. For the 3D elasticity problem I was working on I also found, by trial and error, that SSOR with a relaxation parameter of 1.2 was fastest, even though it showed only 25% CPU usage. Jacobi, which showed 80-90% CPU usage across the threads, was actually slower. Both were faster than the PETSc/MPI version. I guess PETSc/MPI is more for large problems on multiple processors. At least I got PETSc/MPI working, which may be useful in the future.
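For reference, the comparison I describe amounts to the following minimal sketch (the names system_matrix, solution, system_rhs and the SolverControl settings are just the usual step-8 conventions, not my actual program):

#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/precondition.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

void solve_with_ssor(const SparseMatrix<double> &system_matrix,
                     Vector<double>             &solution,
                     const Vector<double>       &system_rhs)
{
  SolverControl solver_control(1000, 1e-12);
  SolverCG<>    cg(solver_control);

  // SSOR with relaxation parameter 1.2: fewer iterations, but the sweep
  // is inherently sequential, hence the low per-thread CPU usage.
  PreconditionSSOR<SparseMatrix<double>> preconditioner;
  preconditioner.initialize(system_matrix, 1.2);

  cg.solve(system_matrix, solution, system_rhs, preconditioner);
}

void solve_with_jacobi(const SparseMatrix<double> &system_matrix,
                       Vector<double>             &solution,
                       const Vector<double>       &system_rhs)
{
  SolverControl solver_control(1000, 1e-12);
  SolverCG<>    cg(solver_control);

  // Jacobi parallelizes well (hence the 80-90% thread load), but in my
  // runs it was slower overall for the 3D elasticity problem.
  PreconditionJacobi<SparseMatrix<double>> preconditioner;
  preconditioner.initialize(system_matrix);

  cg.solve(system_matrix, solution, system_rhs, preconditioner);
}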
Pete Griffin

On Friday, August 5, 2016 at 12:35:23 PM UTC-4, bangerth wrote:
>
> Pete,
>
> > Bruno, I assumed that the thread with 100% CPU usage was somehow feeding
> > the others in step-8,
>
> It's more like for some functions, we split operations onto as many threads
> as there are CPUs. But then, the next function you call may not be
> parallelized, and so everything only works on one thread. On average, that
> one thread has a load of 100% whereas the others have a lesser load.
>
> > I just tested the step-8 program with PreconditionIdentity(), which showed
> > 100% CPU usage on all 8 CPUs. The results follow. Assuming having no
> > preconditioner only slows things down, maybe getting 3 times the CPU power
> > will make it up. I haven't checked solve times yet. The preconditioner for
> > step-8 was PreconditionSSOR<> with relaxation parameter = 1.2. Is there an
> > optimum preconditioner/relaxation parameter for 3d elasticity problems
> > that you know of? Is their determination only by trial and error?
>
> 1.2 seems to be what a lot of people use.
>
> As for thread use: if you use PreconditionIdentity, *all* major operations
> that CG calls are parallelized. On the other hand, using PreconditionSSOR,
> you will spend at least 50% of your time in the preconditioner, but SSOR is
> a sequential method where you need to compute the update for one vector
> element before you can move to the next. So it cannot be parallelized, and
> consequently your average thread load will be less than 100%.
>
> Neither of these are good preconditioners in the big scheme of things, if
> you envision going to large problems. For those, you ought to use variations
> of the multigrid method.
>
> > Wolfgang, what I meant by efficiency was the CPU usage in the threads for
> > Step-17 NEW and OLD decreased with the larger #DOFs or cycle #'s.
>
> If the load decreased for both codes, I would attribute this to memory
> traffic. If the problem is small enough, much of it will fit into the caches
> of the processor/cores, and so you get high throughput. If the problem
> becomes bigger, processors wait for data for longer. Waiting is, IIRC, still
> counted as processor load, but it may make some operations that are not
> parallelized take longer than those that are parallelized, and so overall
> lead to a lower average thread load.
>
> But that's only a theory that would require a lot more digging to verify.
>
> Best
>  W.
>
> --
> ------------------------------------------------------------------------
> Wolfgang Bangerth          email: bang...@colostate.edu
>                            www:   http://www.math.tamu.edu/~bangerth/
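For completeness, the PreconditionIdentity variant discussed above is simply an unpreconditioned CG solve; a minimal sketch (again assuming step-8 style names for system_matrix, solution, and system_rhs) would look like this:

#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/precondition.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

void solve_unpreconditioned(const SparseMatrix<double> &system_matrix,
                            Vector<double>             &solution,
                            const Vector<double>       &system_rhs)
{
  SolverControl solver_control(1000, 1e-12);
  SolverCG<>    cg(solver_control);

  // With PreconditionIdentity every operation CG performs (matrix-vector
  // products, vector updates, dot products) can run on all threads, so the
  // thread load is high -- at the price of more iterations than SSOR.
  // For genuinely large problems, a multigrid preconditioner (see step-16)
  // is the better long-term option, as suggested above.
  cg.solve(system_matrix, solution, system_rhs, PreconditionIdentity());
}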