61. The Chebyshev Method#

Let \(A\) be SPD, and \(C\) an SPD preconditioner. If we perform \(n\) steps of the Richardson iteration with damping parameter \(\tau\), the error \(e^n\) follows from the initial error \(e^0\) via

\[ e^n = (I - \tau C^{-1} A)^n e^0. \]

Now we allow a different damping parameter \(\tau_k\) in every iteration. Then

\[ e^n = (I - \tau_n C^{-1} A) \cdots (I - \tau_2 C^{-1} A )(I - \tau_1 C^{-1} A) e^0. \]

With the polynomial

\[ p(\lambda) = \prod_{i=1}^n (1 - \tau_i \lambda) \]

we can write

\[ e^n = p(C^{-1} A) \, e^0 \]

We observe that \(p(\cdot)\) is a polynomial of degree \(n\) with \(p(0) = 1\). Let \((\lambda_i, z^i)\) be the eigen-system of \(C^{-1} A\), and expand the errors with respect to the \(C\)-orthonormal eigen-basis:

\[ \| e^n \|_C^2 = \sum_{i=1}^N (e^n_i)^2 = \sum_{i=1}^N \big( p(\lambda_i) {e^0_i} \big)^2 \leq \max_{\lambda_i \in \sigma(C^{-1} A)} p(\lambda_i)^2 \; \| e^0 \|_C^2 \]
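As a quick numerical illustration (a minimal numpy sketch with \(C = I\) and a small random SPD matrix; all values here are hypothetical), \(n\) Richardson steps with varying damping parameters indeed act on the error like \(p(C^{-1} A)\), and the norm bound above holds:

import numpy as np
rng = np.random.default_rng(0)

N, n = 6, 4
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lams = rng.uniform(0.1, 2.0, N)                 # eigenvalues of A
A = Q @ np.diag(lams) @ Q.T                     # SPD matrix, C = I
taus = rng.uniform(0.3, 0.8, n)                 # damping parameters tau_1..tau_n

e = rng.standard_normal(N)                      # initial error e^0
e_iter = e.copy()
for tau in taus:                                # n damped Richardson steps
    e_iter = e_iter - tau * (A @ e_iter)

p = lambda lam: np.prod([1 - tau*lam for tau in taus])
e_poly = Q @ (np.array([p(l) for l in lams]) * (Q.T @ e))    # p(A) e^0

print(np.allclose(e_iter, e_poly))                           # True
print(np.linalg.norm(e_iter) <= max(abs(p(l)) for l in lams) * np.linalg.norm(e))   # True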

The goal is now to find damping parameters \(\tau_1, \ldots, \tau_n\) such that

\[ \max_{\lambda_i \in \sigma(C^{-1} A)} | p(\lambda_i) | \]

is minimal. It is rarely feasible to work with the precise spectrum, but often we have bounds such that \(\sigma(C^{-1} A) \subset [\gamma_1, \gamma_2]\) with \(0 < \gamma_1 < \gamma_2\). Then we can simplify the problem and optimize the polynomial such that

\[ \min_{\substack{p \in \operatorname{Pol}(n) \\ p(0) = 1}} \max_{\lambda \in [\gamma_1, \gamma_2]} | p(\lambda) | \]

The solution to this min-max problem is given by Chebyshev polynomials.

61.1. Chebyshev polynomials#

Chebyshev polynomials (of the first kind) are defined via the three-term recurrence relation

\[\begin{eqnarray*} T_0(x) & = & 1 \\ T_1(x) & = & x \\ T_{n+1}(x) & = & 2 x T_n(x) - T_{n-1}(x) \qquad n \geq 1 \end{eqnarray*}\]

Using induction and trigonometric addition formulas one easily shows that

\[ T_n(x) = \cos( n \arccos (x)) \qquad \text{for} \; x \in [-1,1] \]

and thus

\[ \sup_{x \in [-1,1]} | T_n(x) | = 1 \]
def Cheby(n,x):
    # three-term recurrence: Told holds T_k, T holds T_{k+1}
    T,Told = x, 1+0*x     # T_1 = x, T_0 = 1   (written for vectorization)
    for i in range(n):
        T,Told = 2*x*T-Told, T
    return Told           # after n updates, Told = T_n(x)
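A quick check of the recurrence against the trigonometric representation and the sup-bound on \([-1,1]\) (purely illustrative, using the function above):

import numpy as np
xs = np.linspace(-1, 1, 1001)
for n in [2, 5, 10]:
    print(n, np.allclose(Cheby(n, xs), np.cos(n*np.arccos(xs))),   # recurrence = cos(n arccos x)
          np.max(np.abs(Cheby(n, xs))))                            # sup of |T_n| on [-1,1] is 1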
%matplotlib widget

import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4))
fig.canvas.toolbar_visible = True
fig.canvas.header_visible = False
ax.set_ylim([-3, 3])
ax.grid(True)

x = np.linspace(-1.2, 1.2, 500)
for k in [1, 2, 5, 20]:
    ax.plot (x, Cheby(k,x), label='T'+str(k))
ax.legend();

We rescale

  • the argument such that \(\lambda = \gamma_1\) is mapped to \(-1\) and \(\lambda = \gamma_2\) is mapped to \(+1\),

  • and scale the range such that \(\widetilde T_n(0) = 1\):

\[ \widetilde T_n(\lambda) = \frac{1}{T_n \left(\tfrac{-\gamma_1-\gamma_2}{\gamma_2-\gamma_1}\right)} T_n \left(\tfrac{2 \lambda -\gamma_1-\gamma_2}{\gamma_2-\gamma_1}\right) \]
def ScaledCheby(n,lam,gamma1, gamma2):
    x = (-gamma2-gamma1)/(gamma2-gamma1)+2/(gamma2-gamma1)*lam
    fac = 1/Cheby(n, (-gamma2-gamma1)/(gamma2-gamma1))
    return fac*Cheby(n,x)
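A quick sanity check of the scaling (with hypothetical values \(\gamma_1 = 0.1\), \(\gamma_2 = 1\)): \(\widetilde T_n(0) = 1\), and on \([\gamma_1, \gamma_2]\) the polynomial stays within \(\pm 1/\big|T_n\big(\tfrac{-\gamma_1-\gamma_2}{\gamma_2-\gamma_1}\big)\big|\):

import numpy as np
g1, g2, n = 0.1, 1.0, 8
lam = np.linspace(g1, g2, 1000)
print(ScaledCheby(n, 0.0, g1, g2))                          # 1.0
bound = 1/abs(Cheby(n, (-g1-g2)/(g2-g1)))
print(np.max(np.abs(ScaledCheby(n, lam, g1, g2))), bound)   # both agree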

We compare the scaled Chebyshev polynomial to the error reduction of \(n\) steps of the Richardson iteration for the error component in the eigen-space corresponding to an eigenvalue \(\lambda\):

\[ (1 - \tau_{\text{opt}} \lambda)^n \qquad \text{with} \qquad \tau_\text{opt} = \frac{2}{\gamma_1 + \gamma_2} \]

%matplotlib widget

import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4))
fig.canvas.toolbar_visible = True
fig.canvas.header_visible = False

ax.set_ylim([-1.5, 1.5])
ax.grid(True)
# ax.set_title("")
 
gamma2 = 1

@widgets.interact(n=(0, 50, 1), gamma1=(0.01, 1, 0.01))
def update(n = 5, gamma1=0.1):
    s1 = np.linspace(0, gamma1, 500)
    s2 = np.linspace(gamma1, gamma2, 500)
    for l in list(ax.lines): l.remove()
    ax.plot(s1, ScaledCheby(n,s1,gamma1, gamma2), color='r', linestyle='dashed')
    ax.plot(s2, ScaledCheby(n,s2,gamma1, gamma2), color='r', label='Cheby')
    tauopt = 2/(gamma1+gamma2)
    ax.plot(s1, (1-tauopt*s1)**n, color='b', linestyle='dashed')
    ax.plot(s2, (1-tauopt*s2)**n, color='b', label='Richardson')
    
ax.legend();

The maximum of \(|\widetilde T_n|\) on the interval \([\gamma_1, \gamma_2]\) is

\[ \rho_n := \frac{1}{\big| T_n \big(\tfrac{-\gamma_1-\gamma_2}{\gamma_2-\gamma_1}\big) \big|} \]

A representation of Chebyshev polynomials for \(x \not\in [-1,1]\) is

\[ T_n(x) = \frac{1}{2} \, \left[ \big(x+\sqrt{x^2-1}\big)^n + \big(x+\sqrt{x^2-1}\big)^{-n} \right] \]

which is also easily proven by induction. From this we get

\[ \rho_n = \frac{2 c^n}{1 + c^{2n}} \qquad \text{with} \qquad c = \frac{\sqrt{\gamma_2} - \sqrt{\gamma_1}} { \sqrt{\gamma_2}+\sqrt{\gamma_1}} \]
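A numerical cross-check of this closed form against the maximum of the scaled polynomial on \([\gamma_1, \gamma_2]\) (hypothetical values \(\gamma_1 = 0.1\), \(\gamma_2 = 1\)):

import numpy as np
g1, g2 = 0.1, 1.0
c = (np.sqrt(g2) - np.sqrt(g1)) / (np.sqrt(g2) + np.sqrt(g1))
lam = np.linspace(g1, g2, 2000)
for n in [1, 5, 10, 20]:
    rho_formula = 2*c**n / (1 + c**(2*n))
    rho_numeric = np.max(np.abs(ScaledCheby(n, lam, g1, g2)))
    print(n, rho_formula, rho_numeric)     # the two columns agree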

With the condition number \(\kappa = \gamma_2 / \gamma_1\) we have

\[ c \approx 1 - \frac{2}{\sqrt{\kappa}} \qquad \text{and} \qquad \rho_n \approx \big( 1 - \tfrac{2}{\sqrt{\kappa}} \big) ^n \]

To achieve an error reduction by \(\rho_n = \varepsilon\) one needs

\[ n \approx \tfrac{1}{2} \sqrt{\kappa} \, \log \varepsilon^{-1} \]

iterations.
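For instance, with hypothetical values \(\kappa = 100\) and \(\varepsilon = 10^{-6}\), the smallest \(n\) with \(\rho_n \leq \varepsilon\) can be compared to this estimate:

import numpy as np
kappa, eps = 100, 1e-6
c = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)    # c expressed via kappa = gamma2/gamma1
n = 1
while 2*c**n / (1 + c**(2*n)) > eps:               # rho_n from the closed form
    n += 1
print(n, 0.5*np.sqrt(kappa)*np.log(1/eps))         # exact count vs. the estimate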

61.2. The Chebyshev iteration#

If we choose the optimal damping parameters \(\tau_i\) for \(n\) steps we obtain an error

\[ e^n = \widetilde T_n(C^{-1} A) e^0 \]

or substituting back

\[ e^n = \frac{1}{T_n\big(\tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1}\big)} T_n\big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} I + \tfrac{2}{\gamma_2-\gamma_1} C^{-1} A \big) e^0 \]

Defining \(t_n = T_{n} \big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} \big)\) and shifting the index, we have

\[ t_{n+1} e^{n+1} = T_{n+1} \big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} I + \tfrac{2}{\gamma_2-\gamma_1} C^{-1} A \big) e^0 \]

Using the three-term recurrence we obtain

\[\begin{eqnarray*} t_{n+1} e^{n+1} & = & 2 \big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} I + \tfrac{2}{\gamma_2-\gamma_1} C^{-1} A \big) T_n \big( \ldots \big) e^0 - T_{n-1} \big( \ldots \big) e^0 \\ & = & 2 \big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} I + \tfrac{2}{\gamma_2-\gamma_1} C^{-1} A \big) t_n e^n - t_{n-1} e^{n-1} \end{eqnarray*}\]

Inserting also the recurrence for the \(t_n\), we have

\[ \big( 2 \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} t_n - t_{n-1} \big) e^{n+1} = 2 \big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} I + \tfrac{2}{\gamma_2-\gamma_1} C^{-1} A \big) t_n e^n - t_{n-1} e^{n-1} \]

Now, we use that the error is \(e^n = x^n - x^\ast\):

\[\begin{align*} \big( 2 \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} t_n - t_{n-1} \big) (x^{n+1} - x^\ast) =& 2 \big( \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} I + \tfrac{2}{\gamma_2-\gamma_1} C^{-1} A \big) t_n (x^n-x^\ast) \\ & - t_{n-1} (x^{n-1} - x^\ast) \end{align*}\]

Most of the \(x^\ast\) terms cancel, and thus

\[ t_{n+1} x^{n+1} = 2 \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} t_n x^n - t_{n-1} x^{n-1} + \tfrac{4 t_n}{\gamma_2 - \gamma_1} C^{-1} (A x^n - \underbrace{A x^\ast}_b) \]

This is an algorithm to compute the iterates \(x^n\): the new iterate follows from the previous two and from the preconditioned residual \(C^{-1} (b - A x^n)\).
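The following is a minimal numpy sketch of this three-iterate recurrence, with \(C = I\) and a small random SPD matrix (all names and values are hypothetical; this is not the NGSolve version given below). The iterate \(x^1\) is obtained by one Richardson step with \(\tau_{\text{opt}} = 2/(\gamma_1+\gamma_2)\), which corresponds to \(\widetilde T_1\); afterwards the three-term formula is applied:

import numpy as np
rng = np.random.default_rng(1)

N = 50
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lams = rng.uniform(0.1, 1.0, N)
A = Q @ np.diag(lams) @ Q.T              # SPD matrix with spectrum in [0.1, 1], C = I
b = rng.standard_normal(N)
xstar = np.linalg.solve(A, b)            # exact solution, for measuring the error

g1, g2 = 0.1, 1.0
a = (-g1 - g2) / (g2 - g1)               # argument of T_n in the t_n recurrence
xold = np.zeros(N)                       # x^0
xcur = xold + 2/(g1+g2) * (b - A @ xold) # x^1: one optimally damped Richardson step
told, tcur = 1.0, a                      # t_0 = T_0(a), t_1 = T_1(a)

for n in range(1, 30):
    tnew = 2*a*tcur - told               # t_{n+1}
    xnew = (2*a*tcur*xcur - told*xold + 4*tcur/(g2-g1) * (A @ xcur - b)) / tnew
    xold, xcur, told, tcur = xcur, xnew, tcur, tnew
    print(n+1, np.linalg.norm(xcur - xstar))   # the error decreases like rho_n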

Instead of working with three iterates \(x^k\), we can introduce the increment

\[ d^n = x^{n+1}-x^n \]

Using \(x^{n+1} = x^n + d^n\) and \(x^{n-1} = x^n - d^{n-1}\) we obtain

\[ t_{n+1} (x^{n}+d^n) = 2 \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} t_n x^n - t_{n-1} (x^{n}-d^{n-1}) + \tfrac{4 t_n}{\gamma_2 - \gamma_1} C^{-1} (A x^n - b). \]

By the three-term recurrence for \(t_n\) we have \(t_{n+1} = 2 \tfrac{-\gamma_2-\gamma_1}{\gamma_2-\gamma_1} t_n - t_{n-1}\). Thus the \(x^n\) terms cancel, and we obtain an update formula for \(d\):

\[ t_{n+1} d^n = t_{n-1} d^{n-1} + \tfrac{4 t_n}{\gamma_2 - \gamma_1} C^{-1} (A x^n - b) \]

Finally, we define \(\rho_n = \frac{t_n}{t_{n+1}}\) (not to be confused with the reduction factor \(\rho_n\) above), which satisfies the recurrence \(\rho_n = \big( 2 \tfrac{-\gamma_1-\gamma_2}{\gamma_2-\gamma_1} - \rho_{n-1}\big)^{-1}\).
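A quick check of this recurrence against the values \(t_n\) computed directly from the Chebyshev recurrence (hypothetical values \(\gamma_1 = 0.1\), \(\gamma_2 = 1\)):

g1, g2 = 0.1, 1.0
a = (-g1 - g2)/(g2 - g1)
rho = 1/a                                      # rho_0 = t_0 / t_1 = 1 / T_1(a)
for n in range(1, 8):
    rho = 1/(2*a - rho)                        # the recurrence for rho_n
    print(n, rho, Cheby(n, a)/Cheby(n+1, a))   # agrees with t_n / t_{n+1}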

The final algorithm is taken from Yousef Saad: “Iterative Methods for Sparse Linear Systems”, Algorithm 12.1 on page 399:

def ChebyIteration(A, b, pre, gamma1, gamma2, tol=1e-8, maxit=200,
                    callback=None):
    x = b.CreateVector()
    r = b.CreateVector()
    w = b.CreateVector()
    d = b.CreateVector()

    x[:] = 0
    r.data = b - A*x

    theta = (gamma1+gamma2)/2      # center of the spectral interval
    delta = (gamma2-gamma1)/2      # half-width of the spectral interval
    sigma1 = theta/delta
    rho = 1/sigma1
    w.data = pre*r                 # preconditioned residual w = C^{-1} r
    d.data = 1/theta * w           # first increment d^0

    err0 = sqrt(InnerProduct(r,w))
    for it in range(maxit):
        x += d                     # x^{n+1} = x^n + d^n
        r -= A * d                 # update residual r = b - A x
        w.data = pre * r
        err = sqrt(InnerProduct(w,r))
        # print ("Iteration", it, "err=", err)
        if callback: callback(it,err)
        d.data *= rho              # rho_n d^n
        d.data += 2/delta * w      # ... + 2/delta * C^{-1} r  (reuse w)
        rho = 1/(2*sigma1-rho)     # recurrence for rho_{n+1}
        d.data *= rho              # d^{n+1} = rho_{n+1} (rho_n d^n + 2/delta w)
        if err < tol*err0: break

    return x
from ngsolve import *
from netgen.geom2d import unit_square
mesh = Mesh(unit_square.GenerateMesh(maxh=0.1))
fes = H1(mesh, order=1)
u,v = fes.TnT()
a = BilinearForm(grad(u)*grad(v)*dx+10*u*v*dx).Assemble()
f = LinearForm(x*y*v*dx).Assemble()
gfu = GridFunction(fes)
pre = a.mat.CreateSmoother()   # Jacobi preconditioner
# pre = IdentityMatrix(a.mat.height)

We use the NGSolve function EigenValues_Preconditioner to compute a few eigenvalues of \(C^{-1} A\) using the Lanczos method. This gives good approximations to the largest and smallest eigenvalues.

from ngsolve.la import EigenValues_Preconditioner
lams = list(EigenValues_Preconditioner(mat=a.mat, pre=pre))
gamma1, gamma2 = lams[0], lams[-1]
print (gamma1,gamma2)
0.023921286926102014 1.6757907507089311
errhist = []
gfu.vec.data = ChebyIteration(a.mat, f.vec, pre, gamma1, gamma2, maxit=500,
                               # callback=lambda it,err: print("it=",it,"err=",err) 
                               callback = lambda it,err: errhist.append(err)
                             )
fig, ax = plt.subplots(figsize=(4, 3))
fig.canvas.header_visible = False
ax.set_yscale('log')
ax.plot(errhist)
plt.show();
from ngsolve.webgui import Draw
Draw (gfu);

Experiments:

  • modify the mesh size

  • how sensitive is the method when under- or over-estimating \(\gamma_1\)?

  • and \(\gamma_2\)?