> Secondly, I'd like to add a multilevel "simple" partitioning in DMPlex to
> optimize communication. I am thinking that I can create a mesh with
> 'nnodes' cells and distribute that to 'nnodes*procs_node' processes with a
> "spread" distribution (the default seems to be "compact"). Then refine
> that enough to get 'procs_node' times as many cells, and then use a simple
> partitioner again to put one cell on each process, in such a way that the
> locality is preserved (not sure how that would work). Then refine from
> there on each proc for a scaling study.
>
> Mark
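The staged placement described above, generalized to an array of level factors as suggested further down the thread, can be sketched with mixed-radix rank arithmetic. This is a minimal Python sketch, not PETSc code. It assumes "compact" rank placement (ranks 0..procs_node-1 on node 0, and so on), and all function names are hypothetical. At stage s only every prod(factors[s:])-th rank holds cells, so stage 1 with factors = [nnodes, procs_node] gives one rank per node, which is the "spread" layout.

```python
from math import prod

# Hypothetical helpers; assumes "compact" rank placement, i.e. consecutive
# ranks share a node (ranks 0..procs_node-1 on node 0, and so on).


def rank_digits(rank, factors):
    """Split a rank into one digit per level, coarsest level first.
    With factors = [nnodes, procs_node] this returns (node, local_rank)."""
    digits = []
    for f in reversed(factors):
        digits.append(rank % f)
        rank //= f
    return tuple(reversed(digits))


def stage_owners(factors, stage):
    """Ranks holding cells after `stage` levels have been partitioned.
    Stage 0 is rank 0 only; the last stage (len(factors)) is every rank."""
    stride = prod(factors[stage:])  # ranks per group below this level
    return list(range(0, prod(factors), stride))


def parent_owner(rank, factors, stage):
    """Rank that holds at stage-1 the cells `rank` receives at `stage` (stage >= 1)."""
    stride = prod(factors[stage - 1:])
    return (rank // stride) * stride
```

For example, with factors = [4, 8] (4 nodes, 8 processes each), `stage_owners([4, 8], 1)` gives `[0, 8, 16, 24]`, and rank 13 (node 1, local rank 5) receives its cells from rank 8 at stage 2. Appending more factors adds more memory levels without changing the code.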
For multilevel partitioning you need custom code, since what kills performance with one-to-all patterns in DMPlex is the actual communication of the mesh data. However, you can always generate a mesh with one cell per process and then refine from there.

I have coded a multilevel partitioner that works quite well for general meshes; we have it in a private repo with Lisandro. In my experience, the benefits of using the multilevel scheme start at around 4K processes. If you plan very large runs (say, more than 32K cores), then you definitely want a multistage scheme. We never contributed the code since it requires some boilerplate to run through the stages of the partitioning and move the data.

If you are using hexas, you can always define your own "shell" partitioner producing box decompositions. Another option is to generate the meshes upfront, sequentially, and then use the parallel HDF5 reader that Vaclav and Matt put together.

> The point here is to get communication patterns that look like an
> (idealized) well-partitioned application. (I suppose I could take an array
> of factors, the product of which is the number of processors, and
> generalize this in a loop for any number of memory levels, or make an
> oct-tree.)
>
> Any thoughts?
> Thanks,
> Mark

--
Stefano
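For the hexahedral case, the box decomposition that a "shell" partitioner could be fed might look like the following Python sketch. It assumes a structured nx × ny × nz grid whose cells are numbered 0..nx*ny*nz-1 lexicographically as i + nx*(j + ny*k), which may not match the numbering of an actual DMPlex mesh, and a px × py × pz process grid; `owner_1d` and `box_partition` are hypothetical names. It returns the per-rank cell counts (`sizes`) and the cells ordered by rank (`points`), which is the layout `PetscPartitionerShellSetPartition()` takes in recent PETSc versions (check the signature against the version you use).

```python
# Hypothetical box decomposition for a structured hex grid. Assumes cells are
# numbered lexicographically as i + nx*(j + ny*k); verify this against the
# numbering of your actual mesh before relying on it.


def owner_1d(n, p, c):
    """Block index of coordinate c when n cells are split into p near-equal
    blocks; the first n % p blocks get one extra cell."""
    base, extra = divmod(n, p)
    cut = extra * (base + 1)  # cells covered by the larger blocks
    if c < cut:
        return c // (base + 1)
    return extra + (c - cut) // base


def box_partition(shape, procs):
    """shape = (nx, ny, nz), procs = (px, py, pz).
    Returns (sizes, points): cell count per rank, and all cells ordered by rank."""
    nx, ny, nz = shape
    px, py, pz = procs
    owned = [[] for _ in range(px * py * pz)]
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                cell = i + nx * (j + ny * k)
                # Rank of the box containing (i, j, k), numbered like the cells.
                rank = (owner_1d(nx, px, i)
                        + px * (owner_1d(ny, py, j) + py * owner_1d(nz, pz, k)))
                owned[rank].append(cell)
    sizes = [len(cells_r) for cells_r in owned]
    points = [cell for cells_r in owned for cell in cells_r]
    return sizes, points
```

For a 4 × 4 × 1 grid on a 2 × 2 × 1 process grid, each rank gets a 2 × 2 box, e.g. rank 0 owns cells 0, 1, 4 and 5. A multistage variant would apply the same splitting once per level of the factor array rather than once globally.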
