102. Consistent and Distributed Vectors#

C. C. Douglas, G. Haase, U. Langer: A Tutorial on Elliptic PDE Solvers and their Parallelization, SIAM (2003)

from ipyparallel import Cluster
c = await Cluster(engines="mpi").start_and_connect(n=4, activate=True)
c.ids
Starting 4 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
[0, 1, 2, 3]
%%px
from mpi4py import MPI
comm = MPI.COMM_WORLD

from ngsolve import *

mesh = Mesh(unit_square.GenerateMesh(maxh=0.2, comm=comm))

102.1. Consistent vectors:#

All local coefficient vectors of a GridFunction store the same values, i.e. every process sharing a degree of freedom holds the same entry. We call this the consistent or cumulated representation.

%%px
fes = H1(mesh, order=1)
gfu = GridFunction(fes)
gfu.Set(1)
print ("parallel vector:", gfu.vec)
[stdout:2] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:0] parallel vector: CUMULATED
[stdout:1] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:3] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1

102.2. Distributed vectors and matrices#

Recall the finite element assembly of the matrix and the right-hand side vector. One computes element vectors $f_T$ and element matrices $A_T$ as the contribution of each single element to the integral, and assembles them together using connectivity matrices $C_T$:

$$
f = \sum_T C_T f_T \qquad A = \sum_T C_T A_T C_T^T
$$

The matrices $C_T$ are tall rectangular matrices of shape $\dim(V_h) \times \dim(V_T)$, and look like

$$
C_T = \begin{pmatrix}
\cdot & \cdot & \cdot \\
1 & \cdot & \cdot \\
\cdot & \cdot & \cdot \\
\cdot & 1 & \cdot \\
\cdot & \cdot & \cdot \\
\cdot & \cdot & 1
\end{pmatrix}
$$

They are represented by the dof numbers of the element, and are never formed as dense matrices.
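
To illustrate, here is a minimal sketch of how applying $C_T$ amounts to a scatter-add indexed by the element dof numbers. The element vectors in this sketch are mere placeholders, and the numpy array only stands in for the (sub-domain local) coefficient vector; the real element contributions are computed inside Assemble().

%%px
# illustrative sketch only: apply C_T as a scatter-add by element dof numbers
import numpy as np
fvec = np.zeros(fes.ndof)      # stands in for the sub-domain local vector
for el in fes.Elements(VOL):
    dofs = el.dofs             # the dof numbers of the element represent C_T
    fT = np.ones(len(dofs))    # placeholder for the element vector f_T
    fvec[dofs] += fT           # f += C_T f_T, without forming C_T as a dense matrix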

Very similarly, in the distributed parallel framework, the right-hand side vector and the matrix are formed from sub-domain wise sub-assembled vectors and matrices:

$$
f = \sum_{\Omega_i} C_{\Omega_i} f_{\Omega_i} \qquad A = \sum_{\Omega_i} C_{\Omega_i} A_{\Omega_i} C_{\Omega_i}^T
$$

But now, only the sub-domain local vectors $f_{\Omega_i}$ and local matrices $A_{\Omega_i}$ are assembled from the element vectors and matrices. The connectivity between sub-domains is represented by the ParallelDofs class.

The values of the global vector f are understood as the sum of all processor-wise local entries of identified dofs. We call this type of vector storage distributed.

%%px
u,v = fes.TnT()
f = LinearForm(v*dx).Assemble()
print (f.vec)
[stdout:0] DISTRIBUTED
[stdout:2] DISTRIBUTED
 0.00929141
 0.0115905
 0.0143129
 0.0157114
 0.012363
 0.0183463
 0.0327422
 0.0232373
 0.0382184
 0.0151684
 0.0134339
 0.0339567
 0.0187142
 0.0148526
 0.0133493
 0.0228366
 0.0344815
[stdout:1] DISTRIBUTED
 0.00666667
 0.00436698
 0.0243206
 0.0233311
 0.00641114
 0.0134663
 0.0174519
 0.0231253
 0.0241589
 0.0596323
 0.0233791
 0.00976904
 0.0123374
 0.030805
 0.0421362
 0.0232821
 0.0185708
[stdout:3] DISTRIBUTED
 0.00666667
 0.00444576
 0.00532115
 0.0159095
 0.019749
 0.0263138
 0.0185867
 0.019659
 0.0139698
 0.0156347
 0.0344221
 0.0278865
 0.0320692
 0.0231229
 0.00838466
 0.0152942
 0.00674705

If $u$ is the global vector of a GridFunction, then the sub-domain wise local vectors are

$$
u_{\Omega_i} = C_{\Omega_i}^T u
$$

These $u_{\Omega_i}$ are stored as consistent vectors.

102.3. Inner products:#

It is natural to form the Euclidean inner product of a distributed vector and a consistent vector:

$$
\left< f, u \right> = \left< \sum_i C_{\Omega_i} f_i, u \right> = \sum_i \left< f_i, C_{\Omega_i}^T u \right> = \sum_i \left< f_i, u_i \right>
$$

The global inner product is the sum of the inner products of the local vectors. It is not as simple to compute the inner product of two consistent vectors, or of two distributed vectors.

The NGSolve built-in inner product returns the same value as a sequential computation would, and it returns that value on every processor:

%%px
print (InnerProduct(f.vec,gfu.vec))
[stdout:1] 1.0
[stdout:3] 1.0
[stdout:2] 1.0
[stdout:0] 1.0

This inner product can easily be formed from the local inner products by summing up their contributions. The only required communication is the reduction of a single double value.

%%px
localip = InnerProduct(f.vec.local_vec, gfu.vec.local_vec)
print ("local=", localip, "global=", comm.allreduce(localip))
[stdout:1] local= 0.36321061217922673 global= 1.0
[stdout:3] local= 0.2941827144102711 global= 1.0
[stdout:0] local= 0.0 global= 1.0
[stdout:2] local= 0.3426066734105022 global= 1.0
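
For two consistent vectors, one workaround is to first convert a copy of one of them to distributed storage, using the conversion operations discussed in the vector operations section below. A sketch, assuming that assigning a parallel vector also copies its parallel status:

%%px
# sketch: inner product of two consistent vectors
tmp = gfu.vec.CreateVector()
tmp.data = gfu.vec                    # copy of the consistent vector
tmp.Distribute()                      # convert the copy to distributed storage
print (InnerProduct(tmp, gfu.vec))    # again a distributed / consistent pairing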

102.4. Matrix vector multiplication:#

Assume $A$ is stored as a distributed matrix, and $u$ is a consistent vector. Then

$$
r = A u = \sum_i C_{\Omega_i} A_i C_{\Omega_i}^T u = \sum_i C_{\Omega_i} A_i u_i = \sum_i C_{\Omega_i} r_i
$$

We perform local matrix-vector products $r_i = A_i u_i$, and interpret the results $r_i$ as the local vectors of the global result in distributed storage. Here, no communication is needed at all.
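
As a small illustration, the following sketch assembles a mass matrix (chosen here only for the demonstration) and applies it to the consistent vector gfu.vec; every rank performs its local product independently:

%%px
# sketch: distributed matrix times consistent vector
a = BilinearForm(u*v*dx).Assemble()   # a.mat keeps the sub-domain local matrices A_i
r = gfu.vec.CreateVector()
r.data = a.mat * gfu.vec              # local products r_i = A_i u_i, no communication
print (r)                             # the result is stored distributed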

102.5. Vector operations:#

Adding vectors of the same type is a purely local operation. One can simply add the local vectors if both are stored consistently, or if both are stored distributed.

Distributed to Consistent conversion:

This operation requires global communication. All local values identified as the same global dof are summed up, and all processors get the same value.

Consistent to Distributed conversion:

This operation can be performed locally. It is not unique: One can divide a value by the number of processes sharing this dof, or just give the value to one process, e.g. the one with the smallest rank, and give 0 to all other processes.

%%px
gfu.vec.Distribute()
print (gfu.vec)
gfu.vec.Cumulate()
print (gfu.vec)
[stdout:0] DISTRIBUTED


CUMULATED
[stdout:1] DISTRIBUTED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:3] DISTRIBUTED
       1
       0
       0
       1
       1
       1
       1
       1
       1
       0
       1
       1
       0
       0
       0
       0
       0


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:2] DISTRIBUTED
       1
       0
       1
       1
       1
       0
       1
       1
       1
       1
       1
       1
       0
       1
       0
       1
       1


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
c.shutdown(hub=True)