75. The Chebyshev Method#
Let \(A\) be SPD, and \(C\) an SPD preconditioner. If we perform \(n\) steps of the Richardson iteration with damping parameter \(\tau\), the error \(e^n\) follows from the initial error \(e^0\) via

$$
e^n = (I - \tau C^{-1} A)^n \, e^0 .
$$
Now we allow a different damping parameter \(\tau_k\) in every iteration. Then

$$
e^n = \prod_{k=1}^{n} (I - \tau_k C^{-1} A) \, e^0 .
$$
With the polynomial

$$
p(\lambda) = \prod_{k=1}^{n} (1 - \tau_k \lambda)
$$
we can write

$$
e^n = p(C^{-1} A) \, e^0 .
$$
We observe that \(p(\cdot)\) is a polynomial of degree \(n\) such that \(p(0) = 1\). Let \((\lambda_i, z^i)\) be the eigen-system of \(C^{-1} A\), and expand errors with respect to the \(C\)-orthonormal eigen-basis:

$$
e^0 = \sum_i e_i^0 \, z^i \qquad \Rightarrow \qquad e^n = \sum_i p(\lambda_i) \, e_i^0 \, z^i , \qquad \| e^n \|_C^2 = \sum_i p(\lambda_i)^2 \, (e_i^0)^2 .
$$
The goal is now to find damping parameters \(\tau_1, \ldots, \tau_n\) such that

$$
\max_i | p(\lambda_i) |
$$

is minimal. It is rarely feasible to work with the precise spectrum. But often we have bounds such that \(\sigma(C^{-1} A) \subset [\gamma_1, \gamma_2]\) with \(0 < \gamma_1 < \gamma_2\). Then, we can simplify the problem to optimize the polynomial such that

$$
\max_{\lambda \in [\gamma_1, \gamma_2]} | p(\lambda) | \; \to \; \min_{p \in P^n, \; p(0) = 1} .
$$
The solution to this min-max problem is given by Chebyshev polynomials.
75.1. Chebyshev polynomials#
Chebyshev polynomials (of the first kind) are defined via the three-term recurrence relation

$$
T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{n+1}(x) = 2 x \, T_n(x) - T_{n-1}(x) .
$$
Using induction and trigonometric addition formulas one easily shows that

$$
T_n(\cos \theta) = \cos (n \theta),
$$

and thus

$$
| T_n(x) | \le 1 \qquad \text{for } x \in [-1, 1] .
$$
def Cheby(n,x):
    T,Told = x, 1+0*x # for vectorization
    for i in range(n):
        T,Told = 2*x*T-Told, T
    return Told
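As a quick check, we can compare the recurrence against the trigonometric representation \(T_n(x) = \cos(n \arccos x)\) on \([-1,1]\) (the chosen sample degrees are arbitrary):

import numpy as np
xs = np.linspace(-1, 1, 101)
for n in [0, 1, 3, 7]:
    # recurrence vs. closed form cos(n*arccos(x)) on [-1,1]
    print(n, np.allclose(Cheby(n, xs), np.cos(n*np.arccos(xs))))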
%matplotlib widget
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(figsize=(6, 4))
fig.canvas.toolbar_visible = True
fig.canvas.header_visible = False
ax.set_ylim([-3, 3])
ax.grid(True)
x = np.linspace(-1.2, 1.2, 500)
for k in [1, 2, 5, 20]:
    ax.plot(x, Cheby(k,x), label='T'+str(k))
ax.legend();
We rescale the argument such that \(\lambda = \gamma_1\) is mapped to \(-1\) and \(\lambda = \gamma_2\) is mapped to \(+1\), and scale the range such that \(\widetilde T_n(0) = 1\):

$$
\widetilde T_n(\lambda) = \frac{T_n\Big( \frac{2 \lambda - \gamma_1 - \gamma_2}{\gamma_2 - \gamma_1} \Big)}{T_n\Big( \frac{-\gamma_1 - \gamma_2}{\gamma_2 - \gamma_1} \Big)}
$$
def ScaledCheby(n,lam,gamma1, gamma2):
    x = (-gamma2-gamma1)/(gamma2-gamma1)+2/(gamma2-gamma1)*lam
    fac = 1/Cheby(n, (-gamma2-gamma1)/(gamma2-gamma1))
    return fac*Cheby(n,x)
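A short sanity check of the scaling (the values of \(n\), \(\gamma_1\), \(\gamma_2\) are arbitrary examples):

print(ScaledCheby(7, 0, 0.1, 1.0))   # should be 1 by construction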
We compare the scaled Chebyshev polynomial to the error reduction of \(n\) steps of the Richardson iteration for the error component in the eigen-space corresponding to an eigen-value \(\lambda\):

$$
(1 - \tau_{\text{opt}} \lambda)^n \qquad \text{with} \qquad \tau_\text{opt} = \frac{2}{\gamma_1 + \gamma_2}
$$
%matplotlib widget
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(figsize=(6, 4))
fig.canvas.toolbar_visible = True
fig.canvas.header_visible = False
ax.set_ylim([-1.5, 1.5])
ax.grid(True)
# ax.set_title("")
gamma2 = 1
@widgets.interact(n=(0, 50, 1), gamma1=(0.01, 1, 0.01))
def update(n = 5, gamma1=0.1):
    s1 = np.linspace(0, gamma1, 500)
    s2 = np.linspace(gamma1, gamma2, 500)
    for l in list(ax.lines): l.remove()
    ax.plot(s1, ScaledCheby(n,s1,gamma1, gamma2), color='r', linestyle='dashed')
    ax.plot(s2, ScaledCheby(n,s2,gamma1, gamma2), color='r', label='Cheby')
    tauopt = 2/(gamma1+gamma2)
    ax.plot(s1, (1-tauopt*s1)**n, color='b', linestyle='dashed')
    ax.plot(s2, (1-tauopt*s2)**n, color='b', label='Richardson')
    ax.legend();
The maximum of \(\widetilde T_n\) on the interval \([\gamma_1, \gamma_2]\) is

$$
\rho_n := \max_{\lambda \in [\gamma_1, \gamma_2]} | \widetilde T_n(\lambda) | = \frac{1}{\big| T_n\big( \tfrac{\gamma_1+\gamma_2}{\gamma_2-\gamma_1} \big) \big|} .
$$

A representation of Chebyshev polynomials for \(x \not\in [-1,1]\) is

$$
T_n(x) = \frac{1}{2} \Big[ \big( x + \sqrt{x^2-1} \big)^n + \big( x - \sqrt{x^2-1} \big)^n \Big],
$$

which is also easily proven by induction. From this we get

$$
T_n(x) \ge \frac{1}{2} \big( x + \sqrt{x^2-1} \big)^n \qquad \text{for } x \ge 1 .
$$

With the condition number \(\kappa = \gamma_2 / \gamma_1\) we have \(\tfrac{\gamma_1+\gamma_2}{\gamma_2-\gamma_1} = \tfrac{\kappa+1}{\kappa-1}\) and \(\tfrac{\kappa+1}{\kappa-1} + \sqrt{\big(\tfrac{\kappa+1}{\kappa-1}\big)^2 - 1} = \tfrac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\), and thus

$$
\rho_n \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^n .
$$

To achieve an error reduction by \(\rho_n = \varepsilon\) one needs

$$
n \approx \frac{\sqrt{\kappa}}{2} \, \ln \frac{2}{\varepsilon}
$$

iterations.
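We can verify the bound numerically with the ScaledCheby function from above (the values of \(n\), \(\gamma_1\), \(\gamma_2\) are arbitrary examples):

n, g1, g2 = 20, 0.1, 1.0
kappa = g2/g1
lam = np.linspace(g1, g2, 1000)
rho_n = np.max(np.abs(ScaledCheby(n, lam, g1, g2)))       # observed maximum on [g1,g2]
bound = 2*((np.sqrt(kappa)-1)/(np.sqrt(kappa)+1))**n      # estimate from the condition number
print(rho_n, bound)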
75.2. The Chebyshev iteration#
If we choose the optimal damping parameters \(\tau_i\) for \(n\) steps we obtain an error

$$
e^n = \widetilde T_n (C^{-1} A) \, e^0 ,
$$

or substituting back the definition of \(\widetilde T_n\)

$$
e^n = \frac{1}{T_n\big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} \big)} \, T_n \Big( \tfrac{2 \, C^{-1} A - (\gamma_1+\gamma_2) I}{\gamma_2-\gamma_1} \Big) \, e^0 .
$$

Defining \(t_n = T_{n} \big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} \big)\) and shifting the index we have

$$
t_{n+1} \, e^{n+1} = T_{n+1} \Big( \tfrac{2 \, C^{-1} A - (\gamma_1+\gamma_2) I}{\gamma_2-\gamma_1} \Big) \, e^0 .
$$

Using the three-term recurrence we obtain

$$
t_{n+1} \, e^{n+1} = \frac{4 t_n}{\gamma_2-\gamma_1} \, C^{-1} A \, e^n - \frac{2 (\gamma_1+\gamma_2)}{\gamma_2-\gamma_1} \, t_n \, e^n - t_{n-1} \, e^{n-1} ,
$$

and inserting also the recurrence for the \(t_n\), i.e. \(t_{n+1} + t_{n-1} = \tfrac{-2(\gamma_1+\gamma_2)}{\gamma_2-\gamma_1} t_n\), we have

$$
t_{n+1} \, e^{n+1} = t_{n+1} \, e^n + t_{n-1} \, (e^n - e^{n-1}) + \frac{4 t_n}{\gamma_2-\gamma_1} \, C^{-1} A \, e^n .
$$

Now, we use that the error is \(e^n = x^n - x^\ast\) and that \(A x^\ast = b\):

$$
t_{n+1} \, (x^{n+1} - x^\ast) = t_{n+1} \, (x^n - x^\ast) + t_{n-1} \, (x^n - x^{n-1}) - \frac{4 t_n}{\gamma_2-\gamma_1} \, C^{-1} (b - A x^n) .
$$

The \(x^\ast\) cancel out, and thus

$$
t_{n+1} \, x^{n+1} = t_{n+1} \, x^n + t_{n-1} \, (x^n - x^{n-1}) - \frac{4 t_n}{\gamma_2-\gamma_1} \, C^{-1} (b - A x^n) .
$$

This is an algorithm to compute the iterates \(x^n\): the new iterate follows from the previous two, and from the (preconditioned) residual \(C^{-1} (b - A x^n)\).

Instead of working with three iterates \(x^k\), we can introduce the increment

$$
d^n := x^{n+1} - x^n .
$$

Using \(x^{n+1} = x^n + d^n\) and \(x^{n-1} = x^n - d^{n-1}\) we obtain

$$
t_{n+1} \, (x^n + d^n) = t_{n+1} \, x^n + t_{n-1} \, d^{n-1} - \frac{4 t_n}{\gamma_2-\gamma_1} \, C^{-1} (b - A x^n) .
$$

The \(x^n\) cancel out and we have an update formula for \(d\):

$$
t_{n+1} \, d^n = t_{n-1} \, d^{n-1} - \frac{4 t_n}{\gamma_2-\gamma_1} \, C^{-1} (b - A x^n) .
$$

Finally, we define \(\rho_n = \frac{t_n}{t_{n+1}}\), which by the three-term recurrence \(t_{n+1} = 2 \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} t_n - t_{n-1}\) satisfies \(\rho_n = \big( 2 \tfrac{-\gamma_1-\gamma_2}{\gamma_2-\gamma_1} - \rho_{n-1}\big)^{-1}\). Dividing by \(t_{n+1}\) gives

$$
d^n = \rho_{n-1} \, \rho_n \, d^{n-1} - \frac{4 \rho_n}{\gamma_2-\gamma_1} \, C^{-1} (b - A x^n) .
$$
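As a small check (with arbitrarily chosen bounds), the recurrence for \(\rho_n\) can be compared against the ratio \(t_n / t_{n+1}\) computed with the Cheby function from above:

g1, g2 = 0.1, 1.0
mu0 = (-g2-g1)/(g2-g1)
rho = 1/mu0                      # rho_0 = t_0/t_1 = 1/mu0
for n in range(1, 6):
    rho = 1/(2*mu0 - rho)        # recurrence for rho_n
    print(rho, Cheby(n, mu0)/Cheby(n+1, mu0))   # both columns should agree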
The final algorithm is taken from Yousef Saad: “Iterative Methods for Sparse Linear Systems”
Algorithm 12.1 on page 399:
def ChebyIteration(A, b, pre, gamma1, gamma2, tol=1e-8, maxit=200,
                   callback=None):
    x = b.CreateVector()
    r = b.CreateVector()
    w = b.CreateVector()
    d = b.CreateVector()
    x[:] = 0
    r.data = b - A*x
    theta = (gamma1+gamma2)/2     # center of the interval [gamma1, gamma2]
    delta = (gamma2-gamma1)/2     # half-length of the interval
    sigma1 = theta/delta
    rho = 1/sigma1
    w.data = pre*r                # preconditioned residual
    d.data = 1/theta * w
    err0 = sqrt(InnerProduct(r,w))
    for it in range(maxit):
        x += d
        r -= A * d
        w.data = pre * r
        err = sqrt(InnerProduct(w,r))
        # print ("Iteration", it, "err=", err)
        if callback: callback(it,err)
        # update the increment d via the rho-recurrence
        d.data *= rho
        d.data += 2/delta * pre * r
        rho = 1/(2*sigma1-rho)
        d.data *= rho
        if err < tol*err0: break
    return x
from ngsolve import *
from netgen.geom2d import unit_square
mesh = Mesh(unit_square.GenerateMesh(maxh=0.1))
fes = H1(mesh, order=1)
u,v = fes.TnT()
a = BilinearForm(grad(u)*grad(v)*dx+10*u*v*dx).Assemble()
f = LinearForm(x*y*v*dx).Assemble()
gfu = GridFunction(fes)
pre = a.mat.CreateSmoother() # Jacobi preconditioner
# pre = IdentityMatrix(a.mat.height)
We use the NGSolve function EigenValues_Preconditioner
to compute a few eigenvalues of \(C^{-1} A\) using the Lanczos method. This gives good approximations for the largest and smallest eigenvalues.
from ngsolve.la import EigenValues_Preconditioner
lams = list(EigenValues_Preconditioner(mat=a.mat, pre=pre))
gamma1, gamma2 = lams[0], lams[-1]
print (gamma1,gamma2)
0.023921286926102014 1.6757907507089311
errhist = []
gfu.vec.data = ChebyIteration(a.mat, f.vec, pre, gamma1, gamma2, maxit=500,
# callback=lambda it,err: print("it=",it,"err=",err)
callback = lambda it,err: errhist.append(err)
)
fig, ax = plt.subplots(figsize=(4, 3))
fig.canvas.header_visible = False
ax.set_yscale('log')
ax.plot(errhist)
plt.show();
from ngsolve.webgui import Draw
Draw (gfu);
Experiments:

- modify the mesh-size

- how sensitive is the method when under/over-estimating \(\gamma_1\)? and \(\gamma_2\)? (a sketch for this follows below)
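A minimal sketch for the sensitivity experiment, reusing the system assembled above; the scaling factors for \(\gamma_1\) and \(\gamma_2\) are arbitrary choices for illustration:

for fac1, fac2 in [(1, 1), (0.5, 1), (2, 1), (1, 0.5), (1, 2)]:
    hist = []
    # rerun the Chebyshev iteration with perturbed spectral bounds
    ChebyIteration(a.mat, f.vec, pre, fac1*gamma1, fac2*gamma2, maxit=1000,
                   callback=lambda it,err: hist.append(err))
    print("scale gamma1 by", fac1, ", gamma2 by", fac2, "-> iterations:", len(hist))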