102. Consistent and Distributed Vectors

C. C. Douglas, G. Haase, U. Langer: A Tutorial on Elliptic PDE Solvers and Their Parallelization, SIAM (2003)

from ipyparallel import Cluster
c = await Cluster(engines="mpi").start_and_connect(n=4, activate=True)
c.ids
Starting 4 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
[0, 1, 2, 3]
%%px
from mpi4py import MPI
comm = MPI.COMM_WORLD

from ngsolve import *

mesh = Mesh(unit_square.GenerateMesh(maxh=0.2, comm=comm))

102.1. Consistent vectors:

All local coefficient vectors of a GridFunction store the same values: for a dof shared by several processors, each of them holds the same, global value. We call this the consistent or cumulated representation.

%%px
fes = H1(mesh, order=1)
gfu = GridFunction(fes)
gfu.Set(1)
print ("parallel vector:", gfu.vec)
[stdout:2] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:0] parallel vector: CUMULATED
[stdout:1] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:3] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1

102.2. Distributed vectors and matrices

Recall finite element assembly of the matrix and the right hand side vector. One computes element vectors \(f_T\) and element matrices \(A_T\) as the contributions of the individual elements to the integrals, and assembles them using connectivity matrices \(C_T\):

\[ f = \sum_T C_T f_T \qquad A = \sum_T C_T A_T C_T^T \]

The matrices \(C_T\) are tall rectangular matrices of shape \(\operatorname{dim}(V_h) \times \operatorname{dim}(V_T)\), and look like

\[\begin{split} \left( \begin{array}{ccc} . & . & . \\ 1 & . & . \\ . & . & . \\ . & 1 & . \\ . & . & . \\ . & . & 1 \end{array} \right) \end{split}\]

They are represented by the dof numbers of the element, and are never formed as dense matrices.
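
As a purely illustrative sketch (plain NumPy rather than NGSolve, with made-up dof numbers and element data), the action of \(C_T\) amounts to a scatter-add over the element's dof numbers:

import numpy as np

ndof = 5                          # made-up global number of dofs
dofs_T = [1, 3, 4]                # made-up dof numbers of one element, encoding C_T
f_T = np.array([0.1, 0.2, 0.3])   # element vector
A_T = np.full((3, 3), 1/3)        # element matrix

f = np.zeros(ndof)
A = np.zeros((ndof, ndof))

# f += C_T f_T  and  A += C_T A_T C_T^T, realized by index arithmetic
f[dofs_T] += f_T
A[np.ix_(dofs_T, dofs_T)] += A_T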

Very similarly, in the distributed parallel framework, the right hand side vector and the matrix are formed by sub-domain wise sub-assembled vectors and matrices:

\[ f = \sum_{\Omega_i} C_{\Omega_i} f_{\Omega_i} \qquad A = \sum_{\Omega_i} C_{\Omega_i} A_{\Omega_i} C_{\Omega_i}^T \]

But now, only the sub-domain local vectors \(f_{\Omega_i}\) and local matrices \(A_{\Omega_i}\) are assembled from the element vectors and matrices. The connectivity between sub-domains is represented by the ParallelDofs class.

The values of the global vector \(f\) are understood as the sum of all processor-wise local entries of identified dofs. We call this type of vector storage distributed.

%%px
u,v = fes.TnT()
f = LinearForm(v*dx).Assemble()
print (f.vec)
[stdout:0] DISTRIBUTED
[stdout:2] DISTRIBUTED
 0.00929141
 0.0115905
 0.0143129
 0.0157114
 0.012363
 0.0183463
 0.0327422
 0.0232373
 0.0382184
 0.0151684
 0.0134339
 0.0339567
 0.0187142
 0.0148526
 0.0133493
 0.0228366
 0.0344815
[stdout:1] DISTRIBUTED
 0.00666667
 0.00436698
 0.0243206
 0.0233311
 0.00641114
 0.0134663
 0.0174519
 0.0231253
 0.0241589
 0.0596323
 0.0233791
 0.00976904
 0.0123374
 0.030805
 0.0421362
 0.0232821
 0.0185708
[stdout:3] DISTRIBUTED
 0.00666667
 0.00444576
 0.00532115
 0.0159095
 0.019749
 0.0263138
 0.0185867
 0.019659
 0.0139698
 0.0156347
 0.0344221
 0.0278865
 0.0320692
 0.0231229
 0.00838466
 0.0152942
 0.00674705

If \(u\) is the global vector of a GridFunction, then the sub-domain wise local vectors are

\[ u_{\Omega_i} = C_{\Omega_i}^T u \]

These \(u_{\Omega_i}\) are stored as consistent vectors.
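
To make this concrete, one can compare local and global dof counts (a small check using the space's ndof and ndofglobal attributes; comm.allreduce sums the per-rank values):

%%px
# interface dofs are counted on every rank that shares them, so the
# sum of the local ndofs exceeds the global number of dofs
print ("rank", comm.rank, ": local ndof =", fes.ndof,
       ", sum of local ndofs =", comm.allreduce(fes.ndof),
       ", global ndof =", fes.ndofglobal)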

102.3. Inner products:

It is natural to form the Euclidean inner product of a distributed vector and a consistent vector:

\[\begin{eqnarray*} \left< f, u \right> & = & \left< \sum C_{\Omega_i} f_i, u \right> \\ & = & \sum \left< f_i, C_{\Omega_i}^T u \right> \\ & = & \sum \left< f_i, u_i \right> \end{eqnarray*}\]

The global inner product is the sum of the inner products of the local vectors. It is not as simple to compute the inner product of two consistent vectors, or of two distributed vectors.

The NGSolve built-in InnerProduct returns the same value as a sequential computation, and it returns the same value on every processor:

%%px
print (InnerProduct(f.vec,gfu.vec))
[stdout:1] 1.0
[stdout:3] 1.0
[stdout:2] 1.0
[stdout:0] 1.0

This inner product can easily be formed from the local inner products by summing up the local contributions. The only required communication is the reduction of a single double value.

%%px
localip = InnerProduct(f.vec.local_vec, gfu.vec.local_vec)
print ("local=", localip, "global=", comm.allreduce(localip))
[stdout:1] local= 0.36321061217922673 global= 1.0
[stdout:3] local= 0.2941827144102711 global= 1.0
[stdout:0] local= 0.0 global= 1.0
[stdout:2] local= 0.3426066734105022 global= 1.0

102.4. Matrix-vector multiplication:

Assume \(A\) is stored as a distributed matrix, and \(u\) is a consistent vector. Then

\[ r = A u = \sum C_{\Omega_i} A_i C_{\Omega_i}^T u = \sum C_{\Omega_i} A_i u_i = \sum C_{\Omega_i} r_i \]

We perform local matrix-vector products \(r_i = A_i u_i\), and interpret the results \(r_i\) as local vectors of the global result in distributed storage. Here, no communication is needed at all.
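
As a small sketch (the mass-matrix bilinear form below is introduced here only for illustration): multiplying such a distributed matrix with the consistent vector of ones gives a distributed result, and its inner product with the consistent vector reproduces \(\int_\Omega 1 \, dx = 1\):

%%px
a = BilinearForm(u*v*dx).Assemble()   # sub-domain wise assembled (distributed) matrix
r = gfu.vec.CreateVector()
r.data = a.mat * gfu.vec              # purely local products, result is distributed
print (InnerProduct(r, gfu.vec))      # same value as a sequential computation: the area 1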

102.5. Vector operations:

Adding vectors of the same type is a purely local operation: one can simply add the local vectors, provided both are stored consistently or both are distributed.
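
For instance, adding the distributed load vector to itself requires no communication (a small sketch reusing the vectors from above):

%%px
w = f.vec.CreateVector()
w.data = f.vec + f.vec              # both distributed: purely local addition
print (InnerProduct(w, gfu.vec))    # twice the value computed above, i.e. 2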

Distributed to Consistent conversion:

This operation requires global communication. All local values identified as the same global dof are summed up, and all processors get the same value.

Consistent to Distributed conversion:

This operation can be performed locally. It is not unique: One can divide a value by the number of processes sharing this dof, or just give the value to one process, e.g. the one with the smallest rank, and give 0 to all other processes.

%%px
gfu.vec.Distribute()
print (gfu.vec)
gfu.vec.Cumulate()
print (gfu.vec)
[stdout:0] DISTRIBUTED


CUMULATED
[stdout:1] DISTRIBUTED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:3] DISTRIBUTED
       1
       0
       0
       1
       1
       1
       1
       1
       1
       0
       1
       1
       0
       0
       0
       0
       0


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:2] DISTRIBUTED
       1
       0
       1
       1
       1
       0
       1
       1
       1
       1
       1
       1
       0
       1
       0
       1
       1


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
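
With these conversions one can also compute the inner product of two consistent vectors, which was noted above as not directly possible: first convert one of them to distributed storage (a sketch; gfu2 is a second, equal GridFunction introduced only for this example):

%%px
gfu2 = GridFunction(fes)
gfu2.Set(1)                     # a second consistent vector, equal to gfu.vec
gfu2.vec.Distribute()           # convert it to distributed storage
print (InnerProduct(gfu2.vec, gfu.vec))   # distributed x consistent: the global inner product
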
c.shutdown(hub=True)