102. Consistent and Distributed Vectors#
C. C. Douglas, G. Haase, U. Langer: A Tutorial on Elliptic PDE Solvers and their Parallelization, SIAM (2003)
from ipyparallel import Cluster
c = await Cluster(engines="mpi").start_and_connect(n=4, activate=True)
c.ids
Starting 4 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
[0, 1, 2, 3]
%%px
from mpi4py import MPI
comm = MPI.COMM_WORLD
from ngsolve import *
mesh = Mesh(unit_square.GenerateMesh(maxh=0.2, comm=comm))
102.1. Consistent vectors:#
All local coefficient vectors of a GridFunction store the same values: a dof shared by several MPI processes carries the identical value on each of them. We call this the consistent or cumulated representation.
%%px
fes = H1(mesh, order=1)
gfu = GridFunction(fes)
gfu.Set(1)
print ("parallel vector:", gfu.vec)
[stdout:2] parallel vector: CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
[stdout:0] parallel vector: CUMULATED
[stdout:1] parallel vector: CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
[stdout:3] parallel vector: CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
102.2. Distributed vectors and matrices#
Recall the finite element assembling of the matrix and the right hand side vector. One computes element-vectors $f_T$ and element-matrices $A_T$, and assembles them by means of element connectivity matrices $C_T$:

$$ f = \sum_T C_T f_T, \qquad A = \sum_T C_T A_T C_T^T. $$

The matrices $C_T$ are sparse 0/1 connectivity matrices. They are represented by the dof nrs of the element, and are never formed as dense matrices.

Very similarly, in the distributed parallel framework, the right hand side vector and the matrix are formed by sub-domain wise sub-assembled vectors and matrices:

$$ f = \sum_i C_i f^{(i)}, \qquad A = \sum_i C_i A^{(i)} C_i^T. $$

But now, only the sub-domain local vectors $f^{(i)}$ and matrices $A^{(i)}$ are stored. The connectivity matrices $C_i$ are represented by the ParallelDofs class.

The values of the global vector $f$ are understood as the sum of the sub-domain contributions of identified dofs. A vector that is stored only via its local vectors $f^{(i)}$ is called distributed.
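The ParallelDofs object mentioned above can be queried directly to see how many dofs live on each process and with which ranks a dof is shared. The following is a minimal sketch; the names ParallelDofs(), ndoflocal, ndofglobal and Dof2Proc follow common NGSolve usage and are assumptions here rather than part of the text above.
%%px
# minimal sketch: inspect the dof <-> process connectivity
# (assumed NGSolve API: fes.ParallelDofs(), ndoflocal, ndofglobal, Dof2Proc)
pardofs = fes.ParallelDofs()
print ("rank", comm.rank, "stores", pardofs.ndoflocal, "of", pardofs.ndofglobal, "global dofs")
for k in range(min(5, pardofs.ndoflocal)):
    print ("rank", comm.rank, ": dof", k, "is shared with ranks", pardofs.Dof2Proc(k))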
%%px
u,v = fes.TnT()
f = LinearForm(v*dx).Assemble()
print (f.vec)
[stdout:0] DISTRIBUTED
[stdout:2] DISTRIBUTED
0.00929141
0.0115905
0.0143129
0.0157114
0.012363
0.0183463
0.0327422
0.0232373
0.0382184
0.0151684
0.0134339
0.0339567
0.0187142
0.0148526
0.0133493
0.0228366
0.0344815
[stdout:1] DISTRIBUTED
0.00666667
0.00436698
0.0243206
0.0233311
0.00641114
0.0134663
0.0174519
0.0231253
0.0241589
0.0596323
0.0233791
0.00976904
0.0123374
0.030805
0.0421362
0.0232821
0.0185708
[stdout:3] DISTRIBUTED
0.00666667
0.00444576
0.00532115
0.0159095
0.019749
0.0263138
0.0185867
0.019659
0.0139698
0.0156347
0.0344221
0.0278865
0.0320692
0.0231229
0.00838466
0.0152942
0.00674705
If we print the assembled right hand side, each process shows its own sub-domain local vector $f^{(i)}$, marked with the status flag DISTRIBUTED. These local contributions are kept separately; the global right hand side is understood as $f = \sum_i C_i f^{(i)}$.
102.3. Inner products:#
It is natural to form the Euclidean inner product of a distributed vector and a consistent vector: with $u^{(i)} = C_i^T u$ denoting the local vectors of the consistent vector $u$,

$$ \left< f, u \right> = \Big< \sum_i C_i f^{(i)}, u \Big> = \sum_i \left< f^{(i)}, C_i^T u \right> = \sum_i \left< f^{(i)}, u^{(i)} \right>. $$

The global inner product is the sum of the inner products of the local vectors. Computing the inner product of two consistent vectors, or of two distributed vectors, is not as simple.
The NGSolve built-in inner product returns the same value as a sequential computation, and it returns the same value on every processor:
%%px
print (InnerProduct(f.vec,gfu.vec))
[stdout:1] 1.0
[stdout:3] 1.0
[stdout:2] 1.0
[stdout:0] 1.0
This inner product can easily be formed from the local inner products by summing up the local contributions. The only required communication is the summation of a single double value:
%%px
localip = InnerProduct(f.vec.local_vec, gfu.vec.local_vec)
print ("local=", localip, "global=", comm.allreduce(localip))
[stdout:1] local= 0.36321061217922673 global= 1.0
[stdout:3] local= 0.2941827144102711 global= 1.0
[stdout:0] local= 0.0 global= 1.0
[stdout:2] local= 0.3426066734105022 global= 1.0
102.4. Matrix vector multiplication:#
Assume the matrix $A$ is stored distributed, and the vector $u$ is consistent. Then

$$ r = A u = \sum_i C_i A^{(i)} C_i^T u = \sum_i C_i A^{(i)} u^{(i)} = \sum_i C_i r^{(i)}. $$

We perform local matrix-vector products $r^{(i)} = A^{(i)} u^{(i)}$ and interpret the results $r^{(i)}$ as the sub-domain local vectors of the result $r$ in distributed storage. Thus a distributed matrix times a consistent vector gives a distributed result, without any communication.
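As a small illustration under the assumptions of the cells above (fes, u, v and gfu are reused, and the mass bilinear form u*v*dx serves only as an example matrix), the local products produce a result carrying the DISTRIBUTED status flag:
%%px
# minimal sketch: distributed matrix times consistent vector
# reuses u, v, gfu from the cells above; u*v*dx is just an example matrix
a = BilinearForm(u*v*dx).Assemble()   # sub-domain wise assembled (distributed) matrix
r = a.mat.CreateColVector()
r.data = a.mat * gfu.vec              # local products r^(i) = A^(i) u^(i), no communication
print (r)                             # expected: DISTRIBUTED, followed by the local results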
102.5. Vector operations:#
Adding vectors of the same type is a purely local operation: one can simply add the local vectors, provided that both are stored consistently, or both are stored distributed.
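For example, two distributed vectors can be added without any communication. A minimal sketch, assuming that copying the assembled right hand side keeps its DISTRIBUTED status:
%%px
# minimal sketch: adding two vectors of the same storage type is purely local
# (assumption: the copy keeps the DISTRIBUTED status of f.vec)
f2 = f.vec.CreateVector()
f2.data = f.vec       # copy the distributed right hand side
f2.data += f.vec      # purely local addition of the two distributed vectors
print (f2)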
Distributed to Consistent conversion:
This operation requires global communication. All local values identified as the same global dof are summed up, and all processors get the same value.
Consistent to Distributed conversion:
This operation can be performed locally. It is not unique: One can divide a value by the number of processes sharing this dof, or just give the value to one process, e.g. the one with the smallest rank, and give 0 to all other processes.
%%px
gfu.vec.Distribute()
print (gfu.vec)
gfu.vec.Cumulate()
print (gfu.vec)
[stdout:0] DISTRIBUTED
CUMULATED
[stdout:1] DISTRIBUTED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
[stdout:3] DISTRIBUTED
1
0
0
1
1
1
1
1
1
0
1
1
0
0
0
0
0
CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
[stdout:2] DISTRIBUTED
1
0
1
1
1
0
1
1
1
1
1
1
0
1
0
1
1
CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
c.shutdown(hub=True)