I would like to put a non-overlapping ASM solve on the GPU. It's not clear that we have a model for this.
PCApply_ASM currently pipelines the scatter with the subdomain solves. I think we would want to change this to four separate loops over the subdomains:

1) a scatter-begin loop,
2) a scatter-end and non-blocking solve loop,
3) a solve-wait and scatter-begin loop, and
4) a scatter-end loop.

I'm not sure how to go about doing this.

* Should we make a new PCApply_ASM_PARALLEL, or dump the current pipelining algorithm and rewrite PCApply_ASM?
* Should we add a solver-wait method to KSP?

Thoughts?
Mark
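
P.S. To make the proposed ordering concrete, here is a rough sketch of the four loops in C. Everything in it is an assumption for illustration: the helper names (restrict_begin, solve_start, solve_wait, and so on) and NSUB are made up and are not PETSc functions, and the stubs only log the order of calls instead of performing any scatter or solve.

```c
#include <stdio.h>
#include <string.h>

#define NSUB 2 /* hypothetical number of local subdomains */

/* Event log so the order of the phases can be inspected. */
static char trace[256];

static void record(const char *phase, int i)
{
  char buf[32];
  snprintf(buf, sizeof buf, "%s%d ", phase, i);
  strncat(trace, buf, sizeof trace - strlen(trace) - 1);
}

/* Hypothetical stand-ins, NOT PETSc API. In a real PCApply_ASM these would
   be the restriction/prolongation scatters and the subdomain KSP solves. */
static void restrict_begin(int i) { record("rb", i); } /* start scatter into subdomain i     */
static void restrict_end(int i)   { record("re", i); } /* finish scatter into subdomain i    */
static void solve_start(int i)    { record("ss", i); } /* launch non-blocking solve          */
static void solve_wait(int i)     { record("sw", i); } /* the proposed solver-wait           */
static void prolong_begin(int i)  { record("pb", i); } /* start scatter back to global vector */
static void prolong_end(int i)    { record("pe", i); } /* finish scatter back                */

/* Runs the four loops and returns the recorded call order. */
const char *apply_asm_four_phase(void)
{
  int i;
  trace[0] = '\0';
  for (i = 0; i < NSUB; i++) restrict_begin(i);                      /* 1) scatter begin             */
  for (i = 0; i < NSUB; i++) { restrict_end(i); solve_start(i); }    /* 2) scatter end + async solve */
  for (i = 0; i < NSUB; i++) { solve_wait(i); prolong_begin(i); }    /* 3) solve wait + scatter begin */
  for (i = 0; i < NSUB; i++) prolong_end(i);                         /* 4) scatter end               */
  return trace;
}
```

The point of the split is that every subdomain solve is launched in loop 2 before any of them is waited on in loop 3, so a GPU could work on all of them concurrently. Whether the wait belongs in KSP or somewhere else is exactly the open question above.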
