74. Preconditioning#
We call \(C\) a preconditioner to the matrix \(A\) if
1. \(C^{-1} A \approx I\),
2. the matrix-vector multiplication \(w = C^{-1} r\) is cheap.
One extreme case is \(C = A\), where the first goal is optimally satisfied, but (in general) not the second. The opposite extreme is \(C = I\), where the second goal is trivially satisfied, but in general not the first.
If \(A\) is an SPD matrix, we would like to have an SPD preconditioner \(C\). In this case, the quality of the approximation can be measured by the spectral bounds \(\gamma_1, \gamma_2\) such that
\(\gamma_1 \, x^T C x \leq x^T A x \leq \gamma_2 \, x^T C x \qquad \forall \, x.\)
The Rayleigh quotient \(\frac{x^T A x}{x^T C x}\) is bounded by \(\gamma_1\) and \(\gamma_2\) from below and from above. These spectral bounds are bounds for the eigenvalues \(\lambda\) of the generalized eigenvalue problem
\(A x = \lambda C x.\)
If \(\lambda_i\) is an eigenvalue with eigenvector \(x_i\), then \(x_i^T A x_i = \lambda_i x_i^T C x_i\), and thus \(\lambda_i \in [\gamma_1, \gamma_2]\).
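To make this concrete, here is a small plain numpy/scipy sketch (independent of the NGSolve code below, using an arbitrarily chosen toy matrix): the generalized symmetric eigensolver gives the eigenvalues of \(A x = \lambda C x\), whose minimum and maximum are the tightest possible \(\gamma_1\) and \(\gamma_2\).

import numpy as np
from scipy.linalg import eigh

n = 50
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy SPD matrix (1D Laplacian)
C = np.diag(np.diag(A))                              # Jacobi preconditioner
lam = eigh(A, C, eigvals_only=True)                  # generalized EVP  A x = lam C x
print ("gamma1 =", lam.min(), ", gamma2 =", lam.max())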
74.1. The preconditioned Richardson iteration#
We use the preconditioner to obtain the correction from the residuum:
\(x^{k+1} = x^k + \alpha C^{-1} (b - A x^k).\)
The error is now propagated as
\(x^{k+1} - x^* = \underbrace{(I - \alpha C^{-1} A)}_{=: M} \, (x^k - x^*).\)
If we could use the ideal preconditioner \(C = A\), and set \(\alpha = 1\), then \(M = 0\), and we converge in one iteration.
The error-propagation matrix \(M\) is self-adjoint in the energy inner product
\((u,v)_A := u^T A v,\)
as well as in the inner products
\((u,v)_C := u^T C v \qquad \text{and} \qquad (u,v)_{A C^{-1} A} := u^T A C^{-1} A v.\)
The error is monotonically decreased in the corresponding norms. In particular the last one is of practical interest since it is computationally available: with the residuum \(r\) and the preconditioned residuum \(w\), i.e.
\(r = b - A x^k, \qquad w = C^{-1} r,\)
the error becomes
\(\| x^k - x^* \|_{A C^{-1} A}^2 = r^T C^{-1} r = r^T w.\)
from ngsolve import *
mesh = Mesh(unit_square.GenerateMesh(maxh=0.1))
fes = H1(mesh, order=1)
u,v = fes.TnT()
a = BilinearForm(grad(u)*grad(v)*dx+10*u*v*dx).Assemble()   # SPD stiffness + mass matrix
f = LinearForm(x*y*v*dx).Assemble()                         # right-hand side
gfu = GridFunction(fes)
A very simple preconditioner is the Jacobi preconditioner
\(C = \text{diag} A.\)
If \(A\) is SPD, then all diagonal entries are positive, and \(C\) is SPD as well.
In NGSolve, we can obtain a Jacobi preconditioner as follows:
pre = a.mat.CreateSmoother()
The result is a linear operator pre, providing the linear operation
\(r \mapsto w = C_{Jac}^{-1} r = (\text{diag} A)^{-1} r.\)
Performing the vector iteration to obtain the largest eigenvalue of \(C^{-1} A\):
hv = gfu.vec.CreateVector()
hv2 = gfu.vec.CreateVector()
hv3 = gfu.vec.CreateVector()
hv.SetRandom()
hv.data /= Norm(hv)
for k in range(20):
    hv2.data = a.mat * hv      # hv2 = A hv
    hv3.data = pre * hv2       # hv3 = C^{-1} A hv
    rho = Norm(hv3)            # estimate for the largest eigenvalue of C^{-1} A
    print (rho)
    hv.data = 1/rho * hv3      # normalize for the next step
0.5628463859901822
1.3004844587339315
1.3642450612653385
1.4017278282832695
1.428927912054942
1.451197927296262
1.4709697187344926
1.489456390475471
1.5072557909082451
1.5246072026789994
1.5415208146951107
1.5578609944298918
1.5734139487420609
1.587947333438726
1.6012589050513686
1.613209024712011
1.6237346368041081
1.632846787976177
1.6406169867162068
1.6471585575973842
Run the preconditioned Richardson iteration with diagonal preconditioning:
alpha = 1 / rho
r = f.vec.CreateVector()
w = f.vec.CreateVector()
gfu.vec[:] = 0
w.data = pre * f.vec
err0 = sqrt(InnerProduct(f.vec, w))
its = 0
errhist = []
while True:
    r.data = f.vec - a.mat * gfu.vec   # residuum r = b - A x
    w.data = pre * r                   # preconditioned residuum w = C^{-1} r
    err = sqrt(InnerProduct(r,w))      # error measured in the A C^{-1} A - norm
    # print ("iteration", its, "res=", err)
    errhist.append (err)
    gfu.vec.data += alpha * w          # update x = x + alpha w
    if err < 1e-8 * err0 or its > 10000: break
    its = its+1
print ("needed", its, "iterations")
needed 1234 iterations
import matplotlib.pyplot as plt
plt.yscale('log')
plt.plot (errhist);
The situation is not considerably improved by the diagonal preconditioner. However, if we have a bilinear form with a variable coefficient, or with large coefficients in a Robin boundary condition, the Jacobi preconditioner captures these parameters (experiment in exercise, theory soon).
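For example, such a bilinear form could be set up as follows (a sketch only; the coefficient lam, the Robin parameter 1e4, and the names aR, preR are chosen here for illustration and are not from the text):

lam = 1 + 1000*x                  # example of a strongly varying diffusion coefficient (assumed values)
aR = BilinearForm(lam*grad(u)*grad(v)*dx + 1e4*u*v*ds).Assemble()   # large Robin parameter on the boundary
preR = aR.mat.CreateSmoother()    # the Jacobi preconditioner sees these scalings on the diagonal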
74.2. The preconditioned gradient method#
To introduce the preconditioner into the gradient method we proceed as follows. Since \(C\) is SPD, we are allowed to form its square root \(C^{1/2}\), as well as its inverse \(C^{-1/2}\). The linear system \(A x = b\) is equivalent to
\(C^{-1/2} A C^{-1/2} \, C^{1/2} x = C^{-1/2} b.\)
With the definition of transformed quantities
\(\tilde A = C^{-1/2} A C^{-1/2}, \qquad \tilde x = C^{1/2} x, \qquad \tilde b = C^{-1/2} b,\)
we have the linear system
\(\tilde A \tilde x = \tilde b.\)
The transformed matrix \(\tilde A\) is SPD as well.
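The transformation can be illustrated with small dense matrices (a plain numpy sketch with a toy SPD pair, not the NGSolve objects above): \(C^{1/2}\) is formed from the eigendecomposition of \(C\), and \(\tilde A\) has the same eigenvalues as \(C^{-1} A\).

import numpy as np

n = 20
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy SPD matrix
C = np.diag(np.diag(A))                              # SPD preconditioner
lamC, V = np.linalg.eigh(C)
Cinvhalf = V @ np.diag(lamC**-0.5) @ V.T             # C^{-1/2}
Atilde = Cinvhalf @ A @ Cinvhalf                     # transformed matrix
print (np.allclose(Atilde, Atilde.T))                # Atilde is symmetric
print (np.allclose(np.sort(np.linalg.eigvalsh(Atilde)),
                   np.sort(np.linalg.eigvals(np.linalg.inv(C) @ A).real)))   # same spectrum as C^{-1} A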
We apply the gradient method for the transformed system:
Given \(\tilde x^0\)
\(\tilde r^0 = \tilde b - \tilde A \tilde x^0\)
for \(k = 0, 1, 2, \ldots\)
\(\qquad \tilde p = \tilde A \tilde r^k\)
\(\qquad \alpha = \frac{{\tilde r^k}^T \tilde r^k }{ {\tilde r^k}^T \tilde p}\)
\(\qquad \tilde x^{k+1} = \tilde x^k + \alpha \tilde r^k\)
\(\qquad \tilde r^{k+1} = \tilde r^k - \alpha \tilde p\)
The error analysis of the gradient method provides
\(\| \tilde x^k - \tilde x^* \|_{\tilde A} \leq \left( \frac{\kappa(\tilde A) - 1}{\kappa(\tilde A) + 1} \right)^k \, \| \tilde x^0 - \tilde x^* \|_{\tilde A},\)
and substituting back, using \(\| \tilde x \|_{\tilde A} = \| x \|_A\) and \(\kappa(\tilde A) = \kappa(C^{-1} A) \leq \gamma_2 / \gamma_1\),
\(\| x^k - x^* \|_{A} \leq \left( \frac{\kappa(C^{-1} A) - 1}{\kappa(C^{-1} A) + 1} \right)^k \, \| x^0 - x^* \|_{A}.\)
Since we cannot compute with the transformed system, we transform back via
\(\tilde x^k = C^{1/2} x^k, \qquad \tilde r^k = C^{-1/2} r^k, \qquad \tilde p = C^{-1/2} p, \qquad \tilde b = C^{-1/2} b,\)
and obtain:
Given \(x^0\)
\(C^{-1/2} r^0 = C^{-1/2} b - C^{-1/2} A C^{-1/2} C^{1/2} x^0\)
for \(k = 0, 1, 2, \ldots\)
\(\qquad C^{-1/2} p = C^{-1/2} A C^{-1/2} C^{-1/2} r^k\)
\(\qquad \alpha = \frac{\{C^{-1/2} r^k \}^T C^{-1/2} r^k \, }{ \, \{ C^{-1/2} r^k\}^T C^{-1/2} p}\)
\(\qquad C^{1/2} x^{k+1} = C^{1/2} x^k + \alpha C^{-1/2} r^k\)
\(\qquad C^{-1/2} r^{k+1} = C^{-1/2} r^k - \alpha C^{-1/2} p\)
Now we simplify and introduce \(w = C^{-1} r\):
Given \(x^0\)
\(r^0 = b - A x^0\)
for \(k = 0, 1, 2, \ldots\)
\(\qquad w = C^{-1} r^k\)
\(\qquad p = A w\)
\(\qquad \alpha = \frac{w^T r^k}{w^T p}\)
\(\qquad x^{k+1} = x^k + \alpha w\)
\(\qquad r^{k+1} = r^k - \alpha p\)
r = f.vec.CreateVector()
w = f.vec.CreateVector()
p = f.vec.CreateVector()
gfu.vec[:] = 0
r.data = f.vec
w.data = pre*r
err0 = sqrt(InnerProduct(r,w))
its = 0
errhist = []
while True:
    w.data = pre*r                      # w = C^{-1} r
    p.data = a.mat * w                  # p = A w
    err2 = InnerProduct(w,r)
    alpha = err2 / InnerProduct(w,p)    # optimal step length
    # print ("iteration", its, "res=", sqrt(err2))
    errhist.append (sqrt(err2))
    gfu.vec.data += alpha * w
    r.data -= alpha * p
    if sqrt(err2) < 1e-8 * err0 or its > 10000: break
    its = its+1
print ("needed", its, "iterations")
needed 602 iterations
plt.yscale('log')
plt.plot(errhist);
74.3. Jacobi and Gauss Seidel Preconditioners#
For a given residual \(r\), the Jacobi preconditioner computes
\(w = D^{-1} r\)
with \(D = \text{diag} A\). If \(r_i\) is the \(i^{th}\) component of the residual, the correction step of the \(i^{th}\) variable is \(w_i = A_{ii}^{-1} r_i\). All these correction steps are computed for the same residual.
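As a plain numpy sketch (toy dense matrix, not the NGSolve objects above), one Jacobi-preconditioned correction step reads:

import numpy as np

def jacobi_step(A, b, x, alpha=1.0):
    # all components are corrected from the same residual r = b - A x
    r = b - A @ x
    w = r / np.diag(A)         # w = D^{-1} r, componentwise
    return x + alpha * w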
In contrast, the Gauss-Seidel iteration updates the \(i^{th}\) variable using the residual computed from all up-to-date variables:
for \(i = 1, \ldots, n\):
\(\qquad \hat x_i = x_i + A_{ii}^{-1} \big( b_i - \sum_{j=1}^{i-1} A_{ij} \hat x_j - \sum_{j=i}^n A_{ij} x_j \big)\)
The method depends on the enumeration of the variables. E.g., one could loop backwards, and thus obtain the so-called backward Gauss-Seidel iteration.
There holds for every \(1 \leq i \leq n\):
\(\sum_{j=1}^i A_{ij} \hat x_j = b_i - \sum_{j=i+1}^n A_{ij} x_j.\)
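For comparison with the Jacobi sketch above, a numpy sketch of one forward Gauss-Seidel sweep: each component is updated using the already updated preceding components.

def gauss_seidel_sweep(A, b, x):
    xh = x.copy()
    for i in range(len(b)):
        # component i sees the updated xh[0..i-1] and the old xh[i..n-1]
        xh[i] += (b[i] - A[i, :] @ xh) / A[i, i]
    return xh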
If we split the matrix as
\(A = L + D + R\)
with a strictly lower triangular matrix \(L\), a diagonal matrix \(D\), and a strictly upper triangular matrix \(R\), we can write
\((L + D) \hat x = b - R x,\)
or
\(\hat x = (L+D)^{-1} (b - R x),\)
or
\(\hat x = x + (L+D)^{-1} (b - A x).\)
This means we can interpret the Gauss-Seidel iteration as a preconditioned Richardson iteration, with preconditioner \(C = L+D\). In general, this is not a symmetric preconditioner.
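This interpretation is easy to check numerically (continuing the numpy sketches above; np.tril of \(A\) gives \(L+D\)):

n = 20
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy SPD matrix
b = np.ones(n)
x = np.zeros(n)
LD = np.tril(A)                                      # L + D
print (np.allclose(x + np.linalg.solve(LD, b - A @ x),
                   gauss_seidel_sweep(A, b, x)))     # one GS sweep = Richardson step with C = L+D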
If we run the backward Gauss-Seidel iteration, the preconditioner becomes \(C = D + R\). Now, we combine one forward with one backward Gauss-Seidel step:
\(x^{k+1/2} = x^k + (L+D)^{-1} (b - A x^k),\)
\(x^{k+1} = x^{k+1/2} + (D+R)^{-1} (b - A x^{k+1/2}).\)
We can rewrite it as
\(x^{k+1} = x^k + \big[ (L+D)^{-1} + (D+R)^{-1} - (D+R)^{-1} A (L+D)^{-1} \big] (b - A x^k) = x^k + (D+R)^{-1} D (L+D)^{-1} (b - A x^k),\)
and observe that the combined forward-backward Gauss-Seidel iteration is a Richardson method with preconditioner
\(C_{FBGS} = (L+D) \, D^{-1} \, (D+R).\)
If \(A\) is symmetric, then \((L+D)^T = D+R\), and \(C_{FBGS}\) is symmetric as well. If \(A\) is SPD, then we obtain
\(x^T A x \leq x^T C_{FBGS} x \qquad \forall \, x,\)
i.e. the spectral constant \(\gamma_2 = 1\), and the Richardson iteration with damping parameter \(\alpha = 1\) is guaranteed to converge.
We prove this by calculation:
\(C_{FBGS} = (L+D) D^{-1} (D+R) = L + D + R + L D^{-1} R = A + L D^{-1} R.\)
Since \(x^T L D^{-1} R x = (R x)^T D^{-1} (R x) = \| R x \|_{D^{-1}}^2 \geq 0\), we have proven that
\(x^T C_{FBGS} x = x^T A x + x^T L D^{-1} R x \geq x^T A x.\)
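A quick numerical sanity check of \(\gamma_2 = 1\) (continuing the numpy sketch above, with scipy's generalized symmetric eigensolver as before):

from scipy.linalg import eigh

D  = np.diag(np.diag(A))
LD = np.tril(A)                         # L + D
DR = np.triu(A)                         # D + R
C_fbgs = LD @ np.linalg.inv(D) @ DR     # (L+D) D^{-1} (D+R)
lam = eigh(A, C_fbgs, eigvals_only=True)
print (lam.max())                       # <= 1, up to round-off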
The error propagation matrix of the FB-GS is
\(M_{FBGS} = M_{BGS} \, M_{FGS} = \big( I - (D+R)^{-1} A \big) \big( I - (L+D)^{-1} A \big),\)
and has norm \(\| M_{FBGS} \|_A < 1\). The error propagation matrices of the forward GS and the backward GS are \(A\)-adjoint to each other:
\((M_{FGS} u, v)_A = (u, M_{BGS} v)_A.\)
Thus, a single forward (or a single backward) Gauss-Seidel step is also convergent:
\(\| M_{FGS} \|_A^2 = \| M_{BGS} M_{FGS} \|_A = \| M_{FBGS} \|_A < 1.\)
Exercise:
Repeat the experiments with the forward-backward Gauss-Seidel preconditioner. You get it via pre = a.mat.CreateSmoother(GS=True). What do you observe for the spectral radius of \(C^{-1} A\)?