# 102. Consistent and Distributed Vectors#

CC Douglas, G Haase, U Langer: *A tutorial on elliptic PDE solvers and their parallelization*, SIAM (2003)

```
from ipyparallel import Cluster
c = await Cluster(engines="mpi").start_and_connect(n=4, activate=True)
c.ids
```

```
Starting 4 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
```

```
[0, 1, 2, 3]
```

```
%%px
from mpi4py import MPI
comm = MPI.COMM_WORLD
from ngsolve import *
mesh = Mesh(unit_square.GenerateMesh(maxh=0.2, comm=comm))
```

## 102.1. Consistent vectors:#

All local coefficient vectors of a `GridFunction`

store the same value. We call this *consistent* or *cumulated* representation.

```
%%px
fes = H1(mesh, order=1)
gfu = GridFunction(fes)
gfu.Set(1)
print ("parallel vector:", gfu.vec)
```

```
[stdout:2] parallel vector: CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
```

```
[stdout:0] parallel vector: CUMULATED
```

```
[stdout:1] parallel vector: CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
```

```
[stdout:3] parallel vector: CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
```

## 102.2. Distributed vectors and matrices#

Recall the finite element assembling of the matrix and the right hand side vector. One computes element-vectors \(f_T\) and element-matrices \(A_T\) as the contribution from each single element to the integral, and assembles them together using connectivity matrices \(C_T\):

The matrices \(C_T\) are high rectangular matrices of shape \(\operatorname{dim}(V_h) \times \operatorname{dim}(V_T)\), and look like

They are represented by the dof nrs of the element, and are never formed as dense matrices.

Very similarly, in the distributed parallel framework, the right hand side vector and the matrix are formed by sub-domain wise sub-assembled vectors and matrices:

But now, only the sub-domain local vectors \(f_{\Omega_i}\) and local matrices \(A_{\Omega_i}\) are assembled from the element vectors and matrices. The connectivity between sub-domains is represented by the `ParallelDofs`

- class.

The values of the global vector \(f\) are understood as the sum of all processor-wise local entries of identified dofs. We call this type of vector storage *distributed*.

```
%%px
u,v = fes.TnT()
f = LinearForm(v*dx).Assemble()
print (f.vec)
```

```
[stdout:0] DISTRIBUTED
```

```
[stdout:2] DISTRIBUTED
0.00929141
0.0115905
0.0143129
0.0157114
0.012363
0.0183463
0.0327422
0.0232373
0.0382184
0.0151684
0.0134339
0.0339567
0.0187142
0.0148526
0.0133493
0.0228366
0.0344815
```

```
[stdout:1] DISTRIBUTED
0.00666667
0.00436698
0.0243206
0.0233311
0.00641114
0.0134663
0.0174519
0.0231253
0.0241589
0.0596323
0.0233791
0.00976904
0.0123374
0.030805
0.0421362
0.0232821
0.0185708
```

```
[stdout:3] DISTRIBUTED
0.00666667
0.00444576
0.00532115
0.0159095
0.019749
0.0263138
0.0185867
0.019659
0.0139698
0.0156347
0.0344221
0.0278865
0.0320692
0.0231229
0.00838466
0.0152942
0.00674705
```

If \(u\) is a global vector of a gridfunction, then the sub-domain wise local vectors are

These \(u_{\Omega_i}\) are stored as consistent vectors.

## 102.3. Inner products:#

It is natural to form the Euclidean inner product of a distributed vector and a consistent vector:

The global inner product is the sum of inner products of the local vectors. It does not work as simple to compute the inner product of two consistent, or of two distributed vectors.

The ngsolve built-in inner product returns the same value as a sequential computation, and it returns the same value for every processor:

```
%%px
print (InnerProduct(f.vec,gfu.vec))
```

```
[stdout:1] 1.0
```

```
[stdout:3] 1.0
```

```
[stdout:2] 1.0
```

```
[stdout:0] 1.0
```

This inner product can be easily formed from local ips by summing up the local contributions. The required communication is summing up one double value.

```
%%px
localip = InnerProduct(f.vec.local_vec, gfu.vec.local_vec)
print ("local=", localip, "global=", comm.allreduce(localip))
```

```
[stdout:1] local= 0.36321061217922673 global= 1.0
```

```
[stdout:3] local= 0.2941827144102711 global= 1.0
```

```
[stdout:0] local= 0.0 global= 1.0
```

```
[stdout:2] local= 0.3426066734105022 global= 1.0
```

## 102.4. Matrix vector multiplication:#

Assume \(A\) is stored as distributed matrix, and \(u\) is a consistent vector. Then

We perform local matrix-vector products \(r_i = A_i u_i\), and interpret the results \(r_i\) as local vectors of the global result in distributed storage. Here, no communication is needed at all.

## 102.5. Vector operations:#

**Adding** vectors of the same type is a purely local operation. One can simple add the local vectors, if both are either stored consistently, or distributed.

**Distributed to Consistent** conversion:

This operation requires global communication. All local values identified as the same global dof are summed up, and all processors get the same value.

**Consistent to Distributed** conversion:

This operation can be performed locally. It is not unique: One can divide a value by the number of processes sharing this dof, or just give the value to one process, e.g. the one with the smallest rank, and give 0 to all other processes.

```
%%px
gfu.vec.Distribute()
print (gfu.vec)
gfu.vec.Cumulate()
print (gfu.vec)
```

```
[stdout:0] DISTRIBUTED
CUMULATED
```

```
[stdout:1] DISTRIBUTED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
```

```
[stdout:3] DISTRIBUTED
1
0
0
1
1
1
1
1
1
0
1
1
0
0
0
0
0
CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
```

```
[stdout:2] DISTRIBUTED
1
0
1
1
1
0
1
1
1
1
1
1
0
1
0
1
1
CUMULATED
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
```

```
c.shutdown(hub=True)
```