102. Consistent and Distributed Vectors#

C. C. Douglas, G. Haase, U. Langer: A Tutorial on Elliptic PDE Solvers and their Parallelization, SIAM (2003)

from ipyparallel import Cluster
c = await Cluster(engines="mpi").start_and_connect(n=4, activate=True)
c.ids
Starting 4 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
[0, 1, 2, 3]
%%px
from mpi4py import MPI
comm = MPI.COMM_WORLD

from ngsolve import *

mesh = Mesh(unit_square.GenerateMesh(maxh=0.2, comm=comm))

102.1. Consistent vectors:#

All local coefficient vectors of a GridFunction store the same values, i.e. every process sharing a degree of freedom holds the same entry. We call this the consistent or cumulated representation.

%%px
fes = H1(mesh, order=1)
gfu = GridFunction(fes)
gfu.Set(1)
print ("parallel vector:", gfu.vec)
[stdout:2] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:0] parallel vector: CUMULATED
[stdout:1] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:3] parallel vector: CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1

102.2. Distributed vectors and matrices#

Recall the finite element assembly of the matrix and the right-hand side vector. One computes element vectors $f_T$ and element matrices $A_T$ as the contribution of each single element to the integral, and assembles them together using connectivity matrices $C_T$:

$$
f = \sum_T C_T f_T \qquad A = \sum_T C_T A_T C_T^T
$$

The matrices $C_T$ are tall rectangular matrices of shape $\dim(V_h) \times \dim(V_T)$, and look like

$$
C_T = \begin{pmatrix}
\cdot & \cdot & \cdot \\
1 & \cdot & \cdot \\
\cdot & \cdot & \cdot \\
\cdot & 1 & \cdot \\
\cdot & \cdot & \cdot \\
\cdot & \cdot & 1
\end{pmatrix}
$$

They are represented by the dof numbers of the element, and are never formed as dense matrices.
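
To illustrate, here is a minimal sketch of how applying $C_T$ amounts to a scatter-add indexed by the element dof numbers. The element vectors in this sketch are mere placeholders, and the numpy array only stands in for the (sub-domain local) coefficient vector; the real element contributions are computed inside Assemble().

%%px
# illustrative sketch only: apply C_T as a scatter-add by element dof numbers
import numpy as np
fvec = np.zeros(fes.ndof)      # stands in for the sub-domain local vector
for el in fes.Elements(VOL):
    dofs = el.dofs             # the dof numbers of the element represent C_T
    fT = np.ones(len(dofs))    # placeholder for the element vector f_T
    fvec[dofs] += fT           # f += C_T f_T, without forming C_T as a dense matrix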

Very similarly, in the distributed parallel framework, the right-hand side vector and the matrix are formed from sub-domain wise sub-assembled vectors and matrices:

$$
f = \sum_{\Omega_i} C_{\Omega_i} f_{\Omega_i} \qquad A = \sum_{\Omega_i} C_{\Omega_i} A_{\Omega_i} C_{\Omega_i}^T
$$

But now, only the sub-domain local vectors $f_{\Omega_i}$ and local matrices $A_{\Omega_i}$ are assembled from the element vectors and matrices. The connectivity between sub-domains is represented by the ParallelDofs class.

The values of the global vector f are understood as the sum of all processor-wise local entries of identified dofs. We call this type of vector storage distributed.

%%px
u,v = fes.TnT()
f = LinearForm(v*dx).Assemble()
print (f.vec)
[stdout:0] DISTRIBUTED
[stdout:2] DISTRIBUTED
 0.00929141
 0.0115905
 0.0143129
 0.0157114
 0.012363
 0.0183463
 0.0327422
 0.0232373
 0.0382184
 0.0151684
 0.0134339
 0.0339567
 0.0187142
 0.0148526
 0.0133493
 0.0228366
 0.0344815
[stdout:1] DISTRIBUTED
 0.00666667
 0.00436698
 0.0243206
 0.0233311
 0.00641114
 0.0134663
 0.0174519
 0.0231253
 0.0241589
 0.0596323
 0.0233791
 0.00976904
 0.0123374
 0.030805
 0.0421362
 0.0232821
 0.0185708
[stdout:3] DISTRIBUTED
 0.00666667
 0.00444576
 0.00532115
 0.0159095
 0.019749
 0.0263138
 0.0185867
 0.019659
 0.0139698
 0.0156347
 0.0344221
 0.0278865
 0.0320692
 0.0231229
 0.00838466
 0.0152942
 0.00674705

If $u$ is the global vector of a GridFunction, then the sub-domain wise local vectors are

$$
u_{\Omega_i} = C_{\Omega_i}^T u
$$

These $u_{\Omega_i}$ are stored as consistent vectors.

102.3. Inner products:#

It is natural to form the Euclidean inner product of a distributed vector and a consistent vector:

$$
\left< f, u \right> = \left< \sum_i C_{\Omega_i} f_i, u \right> = \sum_i \left< f_i, C_{\Omega_i}^T u \right> = \sum_i \left< f_i, u_i \right>
$$

The global inner product is the sum of the inner products of the local vectors. It is not as simple to compute the inner product of two consistent vectors, or of two distributed vectors.

The NGSolve built-in inner product returns the same value as a sequential computation would, and it returns that value on every processor:

%%px
print (InnerProduct(f.vec,gfu.vec))
[stdout:1] 1.0
[stdout:3] 1.0
[stdout:2] 1.0
[stdout:0] 1.0

This inner product can easily be formed from the local inner products by summing up their contributions. The only required communication is the reduction of a single double value.

%%px
localip = InnerProduct(f.vec.local_vec, gfu.vec.local_vec)
print ("local=", localip, "global=", comm.allreduce(localip))
[stdout:1] local= 0.36321061217922673 global= 1.0
[stdout:3] local= 0.2941827144102711 global= 1.0
[stdout:0] local= 0.0 global= 1.0
[stdout:2] local= 0.3426066734105022 global= 1.0
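
For two consistent vectors, one workaround is to first convert a copy of one of them to distributed storage, using the conversion operations discussed in the vector operations section below. A sketch, assuming that assigning a parallel vector also copies its parallel status:

%%px
# sketch: inner product of two consistent vectors
tmp = gfu.vec.CreateVector()
tmp.data = gfu.vec                    # copy of the consistent vector
tmp.Distribute()                      # convert the copy to distributed storage
print (InnerProduct(tmp, gfu.vec))    # again a distributed / consistent pairing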

102.4. Matrix vector multiplication:#

Assume $A$ is stored as a distributed matrix, and $u$ is a consistent vector. Then

$$
r = A u = \sum_i C_{\Omega_i} A_i C_{\Omega_i}^T u = \sum_i C_{\Omega_i} A_i u_i = \sum_i C_{\Omega_i} r_i
$$

We perform local matrix-vector products $r_i = A_i u_i$, and interpret the results $r_i$ as the local vectors of the global result in distributed storage. Here, no communication is needed at all.
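
As a small illustration, the following sketch assembles a mass matrix (chosen here only for the demonstration) and applies it to the consistent vector gfu.vec; every rank performs its local product independently:

%%px
# sketch: distributed matrix times consistent vector
a = BilinearForm(u*v*dx).Assemble()   # a.mat keeps the sub-domain local matrices A_i
r = gfu.vec.CreateVector()
r.data = a.mat * gfu.vec              # local products r_i = A_i u_i, no communication
print (r)                             # the result is stored distributed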

102.5. Vector operations:#

Adding vectors of the same type is a purely local operation. One can simply add the local vectors if both are stored consistently, or if both are stored distributed.

Distributed to Consistent conversion:

This operation requires global communication. All local values identified as the same global dof are summed up, and all processors get the same value.

Consistent to Distributed conversion:

This operation can be performed locally. It is not unique: One can divide a value by the number of processes sharing this dof, or just give the value to one process, e.g. the one with the smallest rank, and give 0 to all other processes.

%%px
gfu.vec.Distribute()
print (gfu.vec)
gfu.vec.Cumulate()
print (gfu.vec)
[stdout:0] DISTRIBUTED


CUMULATED
[stdout:1] DISTRIBUTED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:3] DISTRIBUTED
       1
       0
       0
       1
       1
       1
       1
       1
       1
       0
       1
       1
       0
       0
       0
       0
       0


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
[stdout:2] DISTRIBUTED
       1
       0
       1
       1
       1
       0
       1
       1
       1
       1
       1
       1
       0
       1
       0
       1
       1


CUMULATED
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
c.shutdown(hub=True)