Dear Denis,
> (1) FEEvaluation has read_dof_values(const std::vector<VectorType> &src,
> const unsigned int first_index=0), but I don't see a function to get to
> the values of each DoF vector at quadrature points.

If you need to get the vector values at quadrature points, you would
typically call something along these lines:

  phi.reinit(cell_index);
  phi.read_dof_values(src, 0);
  phi.evaluate(true, false); // only want values in quadrature points
  for (unsigned int q=0; q<phi.n_q_points; ++q)
    {
      Tensor<1,n_components,VectorizedArray<double> > val = phi.get_value(q);
    }

(There is an overload in case n_components is one, which gives you simply a
VectorizedArray.)

If you instead want to access the solution values directly (for a nodal
basis, this corresponds to the values of the solution in the nodes), you can
also do that:

  phi.get_dof_value(i);

where i is the index of the degree of freedom (the 'node'). Note that the
numbering of the degrees of freedom inside continuous elements is different
from the ordering in FE_Q (or FEValues) because FEEvaluation needs them in
lexicographic order (just as FE_DGQ). Re-indexing would be too expensive
because the access is on the critical path in the tensorial evaluation
parts. Furthermore, vector-valued elements list their "number of cell dofs"
(this is a bit of a misnomer I regret) as the number of dofs in a scalar
component and concatenate the components in a
Tensor<1,n_components,VectorizedArray>.

> (2) For non-linear problems solved in a staggered way one has several
> matrix-free operators which need to exchange some data on quadrature
> points. Each of those operators has a separate MatrixFree object assigned
> as they generally may require different update flags (values, gradients,
> etc). In order to loop over cells during the update of quadrature point
> data one needs to use the MatrixFree object together with FEEvaluation.
> So the question is whether this loop (cells/quadrature points) is
> guaranteed to be the same for slightly different MatrixFree data objects.
>
> Supposedly if
>
>   MPI_Comm mpi_communicator
>   TasksParallelScheme tasks_parallel_scheme
>   unsigned int tasks_block_size
>
> as well as DoFHandler, Constraints and Quadrature are the same among all
> the MatrixFree objects, then the loop over blocks of local cells and
> quadrature points is guaranteed to be the same, right?

Even though the layout is going to be the same in serial, there is no
guarantee that the ordering is the same for different DoFHandler/Constraints
combinations in the MPI case. The reason is that the algorithm detects cells
that need data exchange with MPI, and those can change for different
elements - FE_Q with hanging node constraints connects to more neighbors
than an FE_DGQ element, for instance, and cells which need data exchange are
put in different positions inside the cell loop. But of course, if you feed
in the exact same DoFHandler, ConstraintMatrix, and option settings, then
the result is going to be the same because the algorithm is deterministic.

I wonder what exactly your requirements are that call for different
MatrixFree objects, though. Everything that is evaluated in a common cell
loop (e.g., different solution vectors that come from different time steps)
should use the same MatrixFree object. You would create a different
FEEvaluation object for each of them but let them access the data from the
same MatrixFree object. For the update flags, you would take the largest set
of flags that you need.
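In code, the pattern looks roughly like the following sketch (names such as
local_operation, phi_val, phi_grad and the quadrature point operation are
placeholders of mine, not taken from an actual application); the MatrixFree
object is set up once with the union of the mapping update flags and is then
shared by the FEEvaluation objects:

  #include <deal.II/base/vectorization.h>
  #include <deal.II/lac/la_parallel_vector.h>
  #include <deal.II/matrix_free/matrix_free.h>
  #include <deal.II/matrix_free/fe_evaluation.h>

  using namespace dealii;

  // MatrixFree would be set up once with the union of the required flags:
  //   typename MatrixFree<dim,double>::AdditionalData additional_data;
  //   additional_data.mapping_update_flags =
  //     update_gradients | update_JxW_values | update_quadrature_points;
  //   matrix_free.reinit(dof_handler, constraints, QGauss<1>(fe_degree+1),
  //                      additional_data);
  // and the function below would be handed to MatrixFree::cell_loop().
  template <int dim, int fe_degree>
  void local_operation(const MatrixFree<dim,double> &data,
                       LinearAlgebra::distributed::Vector<double> &dst,
                       const std::vector<LinearAlgebra::distributed::Vector<double> > &src,
                       const std::pair<unsigned int,unsigned int> &cell_range)
  {
    FEEvaluation<dim,fe_degree> phi_val(data);  // field needing values only
    FEEvaluation<dim,fe_degree> phi_grad(data); // field needing gradients only

    for (unsigned int cell=cell_range.first; cell<cell_range.second; ++cell)
      {
        phi_val.reinit(cell);
        phi_val.read_dof_values(src[0]);
        phi_val.evaluate(true, false);   // values only

        phi_grad.reinit(cell);
        phi_grad.read_dof_values(src[1]);
        phi_grad.evaluate(false, true);  // gradients only

        for (unsigned int q=0; q<phi_val.n_q_points; ++q)
          {
            const VectorizedArray<double> u = phi_val.get_value(q);
            const Tensor<1,dim,VectorizedArray<double> > grad_w =
              phi_grad.get_gradient(q);
            // placeholder quadrature point operation combining both fields
            phi_val.submit_value(u * (grad_w * grad_w), q);
          }
        phi_val.integrate(true, false);
        phi_val.distribute_local_to_global(dst);
      }
  }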
Note that MatrixFree lets you collect several DoFHandler and constraint
combinations in one object to allow for combined evaluation. You should
really think of the MatrixFree object as a container that stores the data
necessary for the evaluation of the weak forms. It is the one or several
'evaluator' objects attached to MatrixFree that then define the differential
operator.

In case you are interested in how to combine different things, I have a
fairly large application (two-phase flow with various solvers) based on
MatrixFree in a project on github, https://github.com/kronbichler/adaflo
The interesting file to look at for how we build the MatrixFree object on a
system where you have velocity and pressure for Navier-Stokes plus a few
DoFHandler objects for scalars is the method 'initialize_data_structures' in
source/two_phase_base.cc (lines 222-250). An example of how this is used is
in source/level_set_okz.cc, method local_compute_force, lines 315-392, where
a force is computed that combines evaluators from four different
DoFHandler/constraint combinations (representing a scalar level set
function, a scalar curvature field, the fluid velocity, and the pressure
space). For an example of vectors coming from different time steps, check
the file source/navier_stokes_matrix.cc, method 'local_operation', lines
568-828. The path with LocalOps == NavierStokesOps::residual computes the
Navier-Stokes residual using two old solution values in a BDF-2 time
integration scheme.

We wrote a manuscript about those algorithms and their performance which is
in the final stage of typesetting. I will put a link on the github page when
the paper is available. In case you are interested, I can send you a
preprint.

> As an example, say one operator needs shape values only, another --
> only gradients; finally, in order to initialize quadrature point data I
> need the locations of quadrature points in real space. I suppose in this
> case I can keep and use an auxiliary MatrixFree object to set up
> quadrature point data, granted that all three objects use the same
> mpi_comm, task parallel scheme, block size, dof_handler, constraints and
> quadrature.

You can do all of this through one MatrixFree object that holds the data
only once. The main idea is that you need the geometry information only
once, even when you evaluate several fields - in a departure from FEValues,
which sets up the internal mapping data for each field. The different
FEEvaluation objects then call 'evaluate(true,false)' for the field that
only needs values and 'evaluate(false,true)' for the field that only needs
gradients. In the loop over quadrature points, you can ask either of those
two FEEvaluation objects (it doesn't really matter which one because they
only keep pointers to the quadrature point data) for
'FEEvaluation::quadrature_point(q)', which gets combined with the values or
gradients from evaluate().

> (3) It appears that vectors which are used with matrix-free need to be
> initialized with MatrixFree<dim,number>::initialize_dof_vector().
> I suppose if the only difference between two different MatrixFree objects
> is the UpdateFlags, it does not matter which one to use for
> initialization? Supposedly only mpi_communicator and DoFHandler influence
> the result.

You are right, only mpi_communicator and DoFHandler matter for this (and the
constraints, in case they give rise to some additional vector entries at the
interface between different processors).
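For the mechanics, here is a minimal sketch of collecting two
DoFHandler/constraint pairs in one MatrixFree object and initializing a
vector for each of them (all names are placeholders of mine, and the exact
set of reinit() overloads depends on the deal.II version):

  #include <deal.II/base/quadrature_lib.h>
  #include <deal.II/dofs/dof_handler.h>
  #include <deal.II/fe/fe_update_flags.h>
  #include <deal.II/fe/mapping.h>
  #include <deal.II/lac/constraint_matrix.h>
  #include <deal.II/lac/la_parallel_vector.h>
  #include <deal.II/matrix_free/matrix_free.h>

  using namespace dealii;

  template <int dim>
  void setup_matrix_free(const Mapping<dim>     &mapping,
                         const DoFHandler<dim>  &dof_handler_u,
                         const DoFHandler<dim>  &dof_handler_p,
                         const ConstraintMatrix &constraints_u,
                         const ConstraintMatrix &constraints_p,
                         MatrixFree<dim,double> &matrix_free,
                         LinearAlgebra::distributed::Vector<double> &vec_u,
                         LinearAlgebra::distributed::Vector<double> &vec_p)
  {
    std::vector<const DoFHandler<dim>*>  dof_handlers;
    std::vector<const ConstraintMatrix*> constraints;
    dof_handlers.push_back(&dof_handler_u);
    dof_handlers.push_back(&dof_handler_p);
    constraints.push_back(&constraints_u);
    constraints.push_back(&constraints_p);

    std::vector<Quadrature<1> > quadratures;
    quadratures.push_back(QGauss<1>(3)); // shared by both fields here

    // union of the mapping update flags needed by all evaluators
    typename MatrixFree<dim,double>::AdditionalData additional_data;
    additional_data.mapping_update_flags =
      update_gradients | update_JxW_values | update_quadrature_points;

    matrix_free.reinit(mapping, dof_handlers, constraints, quadratures,
                       additional_data);

    // each vector gets the parallel layout (owned plus ghost entries)
    // belonging to the DoFHandler selected by the index into 'dof_handlers'
    matrix_free.initialize_dof_vector(vec_u, 0);
    matrix_free.initialize_dof_vector(vec_p, 1);
  }

Inside the cell loop you would then select a field by the same index, e.g.
FEEvaluation<dim,degree_u>(matrix_free, 0) for the first DoFHandler and
FEEvaluation<dim,degree_p>(matrix_free, 1) for the second.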
> (4) LinearAlgebra::distributed::Vector<double> seems to be different from
> those in Trilinos and PETSc in that it can be used both during the
> solution of SLAE (writing into) and for the evaluation of FE fields
> (i.e. error estimator). That is, it behaves both as a fully distributed
> vector and as one with ghost entries.
>
> My understanding is the following should work:
>
>   // solve SLAE
>   constraints.distribute (solution);
>   solution.update_ghost_values();
>   // do something like evaluate solution at quadrature points
>   // solve SLAE again with the same vector
>   constraints.distribute (solution);

Yes, this works. The design principle of the vector is that you can switch
between a one-to-one vector that is used for solving linear systems and one
with access to ghost entries. This is mostly for performance reasons: I have
applications where the whole integration part is only 2-3 times as expensive
as copying the whole vector content into the right format at the start and
end of the integration loop, and I don't want to waste 20-30% of the run
time on that ;-) The way to switch between the two states is:

  // update ghost entries on remote processors (read access)
  vector1.update_ghost_values();
  // ... evaluation on quadrature points ...

  // accumulate data from ghost entries to the owner after computing weak
  // forms (or VectorOperation::insert to overwrite instead of adding)
  vector2.compress(VectorOperation::add);

  // remove ghost entries, disallow read-only access to them
  vector1.zero_out_ghosts();

We keep track of whether a vector has updated its ghost values, and you will
get an exception in case the algorithm detects that this has not happened.
It does not catch every case, but I find it quite helpful. Given the
possibility for ambiguity, I have plans to integrate this into the
ReadWriteVector / VectorSpaceVector concept, see #2535, but I have not
really gotten there yet because it requires some fundamental re-design of
the data structures.

> (5) I guess if the evaluation of a function with VectorizedArray has some
> branches (i.e. if x > 0 ... else ...), then this part has to be evaluated
> manually without SIMD by looping over
> VectorizedArray<double>::n_array_elements. At least that's what I would
> expect.

Yes indeed, you have to use this manual loop in case your comparison does
not fit into something simple such as std::abs(), std::min(), or std::max(),
for which VectorizedArray provides overloads. (x86-64 SIMD keeps adding some
useful things with each iteration - like masks - but we have not included
those in the wrapper because there is no clear application yet.)

Best,
Martin
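P.S. Regarding (5), a minimal sketch of such a manual loop over the SIMD
lanes (the function name and the particular branch are made-up placeholders):

  #include <deal.II/base/vectorization.h>
  #include <cmath>

  using namespace dealii;

  // evaluate a lane-wise branch for which VectorizedArray has no overload
  VectorizedArray<double> branchy_function(const VectorizedArray<double> &x)
  {
    VectorizedArray<double> result;
    for (unsigned int v=0; v<VectorizedArray<double>::n_array_elements; ++v)
      result[v] = (x[v] > 0.) ? std::exp(-x[v]) : 1.; // plain scalar code
    return result;
  }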