
Linear backward error analysis


This post assumes you know the basic idea behind backward error analysis. If you are not familiar with it, you should first read my previous blog post.

As is often the case, a good preliminary simplification is the linear case. Let’s consider here a one-dimensional linear ODE, which is, for $\lambda \in \mathbb{R}$ or in $\mathbb{C}$, $$\dot{u} = \lambda u.$$ In this case, the flow is the linear map $(t, u_0) \mapsto \varphi_t(u_0) = e^{t\lambda}u_0$, which also extends to higher dimensions when $\lambda$ is a matrix.[^1]


Explicit Euler #

The explicit Euler scheme can be summarily written $$\Phi_h = \operatorname{id} + h\lambda.$$ For this simple case, the modified vector field $\widetilde{\lambda}_h$ and the corrected vector field $\mu_h$ are found easily.
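If you want to play with this, here is a minimal sketch in Python (assuming NumPy; the names `lam`, `h`, `u0` are mine) showing that $n$ steps of the scheme amount to multiplying by $(1 + h\lambda)^n$:

```python
import numpy as np

lam = -0.7   # lambda, real or complex
h = 0.1      # time-step
u0 = 1.0     # initial condition
n = 50       # number of steps

# Explicit Euler: u_{k+1} = u_k + h*lam*u_k = (1 + h*lam) * u_k
u = u0
for _ in range(n):
    u = u + h * lam * u

# n steps of the scheme are exactly a multiplication by (1 + h*lam)**n ...
assert np.isclose(u, (1 + h * lam) ** n * u0)

# ... which only approximates the exact flow exp(n*h*lam) * u0
print(u, np.exp(n * h * lam) * u0)
```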

Modified equations #

For modified equations, we look for $\widetilde{\lambda}_h$ such that $$\widetilde{\varphi}_h = \Phi_h, \qquad\text{i.e.}\qquad \exp\bigl( h\widetilde{\lambda}_h \bigr) = \operatorname{id} + h\lambda.$$ Extending the logarithm to complex numbers,[^2] we find $$\widetilde{\lambda}_h = \frac{1}{h} \log(1 + h\lambda).$$ Therefore, the modified equation is $$\frac{\mathrm{d}}{\mathrm{d}t} \widetilde{u}_h = \frac{1}{h}\log(1 + h\lambda)\, \widetilde{u}_h.$$ Note that $\widetilde{\lambda}_h = \lambda + \mathcal{O}(h)$, which corresponds to the first order of convergence of the scheme: in finite time, $\widetilde{u}_h(t) = u(t) + \mathcal{O}(h)$.
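As a sanity check (a sketch, assuming NumPy and $1 + h\lambda > 0$ so that the real logarithm applies), the exact flow of the modified equation reproduces the numerical iterates exactly:

```python
import numpy as np

lam, h, u0 = -0.7, 0.1, 1.0
lam_tilde = np.log(1 + h * lam) / h     # modified vector field (here 1 + h*lam > 0)

for n in range(1, 6):
    u_num = (1 + h * lam) ** n * u0          # n steps of explicit Euler
    u_mod = np.exp(n * h * lam_tilde) * u0   # exact flow of the modified equation at t = n*h
    assert np.isclose(u_num, u_mod)

# The modified field is an O(h) perturbation of the original one
print(lam_tilde - lam)   # roughly -h*lam**2/2 for small h
```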

Link with stability
In the real case, if $h\lambda < -1$ then $\widetilde{\lambda}_h$ is complex, and we write $h\widetilde{\lambda}_h = \log|1 + h\lambda| + i\pi$. If additionally $h\lambda < -2$, then $\Re(\widetilde{\lambda}_h) > 0$, so the method is unstable. Of course, in the case $h\lambda > 0$ the method is also “unstable” in the standard sense, but no more than the original problem. In fact it is more stable in some sense, since $\widetilde{\lambda}_h < \lambda$.

Corrected equations / Modifying integrators #

This time, we want the numerical solution of a corrected problem $\dot v_h = \mu_h v_h$ to coincide with the exact solution of $\dot u = \lambda u$. In other words, we want $$1 + h\mu_h = e^{h\lambda}, \quad\text{i.e.}\quad \mu_h = \frac{1}{h} (e^{h\lambda} - 1).$$ Then applying the Euler method with time-step $h$ to the problem in $v_h$ produces the exact solution, $v_n = u(nh)$.
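Again a quick sketch (assuming NumPy) of this exactness property:

```python
import numpy as np

lam, h, u0 = -0.7, 0.1, 1.0
mu = (np.exp(h * lam) - 1) / h   # corrected vector field

u = u0
for n in range(1, 6):
    u = u + h * mu * u                                # explicit Euler on the corrected problem
    assert np.isclose(u, np.exp(n * h * lam) * u0)    # ... reproduces the exact solution u(n*h)
```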

Note that once the time-step is chosen for the corrected equation, it should not be changed! Look at the error of the method when correcting for $h = 0.1$ and then simulating with a time-step $\tau = 0.01$. Common belief would be that reducing the time-step increases accuracy. It does, but it increases accuracy for a different problem.

[Figure: error as a function of the time-step $\tau$ at time $T = 1$ with $\lambda = 1$ (code)]
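Here is a sketch of the experiment behind this figure (assuming NumPy; the values $h = 0.1$, $T = 1$, $\lambda = 1$ are those of the caption): the correction is computed once for $h = 0.1$, and the corrected problem is then integrated with various time-steps $\tau$.

```python
import numpy as np

lam, h, T, u0 = 1.0, 0.1, 1.0, 1.0
mu = (np.exp(h * lam) - 1) / h            # correction computed for the time-step h = 0.1

for tau in [0.1, 0.05, 0.02, 0.01, 0.001]:
    n = round(T / tau)
    u = (1 + tau * mu) ** n * u0          # explicit Euler with time-step tau on the corrected problem
    err = abs(u - np.exp(lam * T) * u0)   # error w.r.t. the exact solution of the original problem
    print(f"tau = {tau:6.3f}   error = {err:.2e}")

# The error vanishes (up to round-off) only at tau = h: refining tau converges
# to the solution of the corrected problem, not of the original one.
```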

Application to the harmonic oscillator #

Here is a comparison of these two concepts applied to a harmonic oscillator, $$\dot{u} = \Lambda u, \qquad \Lambda = J := \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

Figure for the harmonic oscillator with both the modified and corrected equations, $h = 0.1$ (CODE) #
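A sketch of how such a comparison can be produced (assuming SciPy for the matrix exponential and logarithm; the rest follows the formulas above):

```python
import numpy as np
from scipy.linalg import expm, logm

J = np.array([[0.0, -1.0], [1.0, 0.0]])
h, n_steps = 0.1, 200
u0 = np.array([1.0, 0.0])

Lam_tilde = logm(np.eye(2) + h * J) / h    # modified vector field (matrix version)
Mu = (expm(h * J) - np.eye(2)) / h         # corrected vector field

u_euler, u_mod, u_corr = [u0], [u0], [u0]
for n in range(1, n_steps + 1):
    u_euler.append((np.eye(2) + h * J) @ u_euler[-1])   # explicit Euler on the original problem
    u_mod.append(expm(n * h * Lam_tilde) @ u0)          # exact flow of the modified equation
    u_corr.append((np.eye(2) + h * Mu) @ u_corr[-1])    # explicit Euler on the corrected problem

# The modified flow coincides with the numerical iterates, and the corrected
# scheme coincides with the exact solution exp(n*h*J) u0.
print(np.allclose(u_euler, u_mod), np.allclose(u_corr[-1], expm(n_steps * h * J) @ u0))
```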

Application to other schemes #

This result may be extended to any standard Runge-Kutta method in a fairly straightforward manner using the stability function. For a method of Butcher tableau $$\begin{array}{c|c} c & A \\ \hline & b^{\sf T} \end{array}$$ the stability function is given by the formula[^3] $$r(z) = 1 + z\, b^{\sf T} (\mathrm{id} - zA)^{-1} \mathbb{1}$$ where $\mathbb{1} = (1, \dots, 1)^{\sf T}$.
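A small helper (a sketch, assuming NumPy) that evaluates this formula directly from the tableau, checked against two cases whose stability functions are known in closed form:

```python
import numpy as np

def stability_function(A, b, z):
    """r(z) = 1 + z * b^T (I - z*A)^{-1} 1 for a Butcher tableau (A, b)."""
    s = len(b)
    one = np.ones(s)
    return 1 + z * b @ np.linalg.solve(np.eye(s) - z * A, one)

z = 0.3 - 0.2j

# Explicit Euler (A = [[0]], b = [1]): r(z) = 1 + z
assert np.isclose(stability_function(np.array([[0.0]]), np.array([1.0]), z), 1 + z)

# Implicit midpoint (A = [[1/2]], b = [1]): r(z) = (1 + z/2)/(1 - z/2)
assert np.isclose(stability_function(np.array([[0.5]]), np.array([1.0]), z),
                  (1 + z / 2) / (1 - z / 2))
```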

Modified and corrected equations #

For a Runge-Kutta method of stability function $z \mapsto r(z)$, the numerical scheme is exactly $$\Phi_h = r(h\lambda),$$ and $\widetilde{\lambda}_h$ is such that for all $n \in \mathbb{N}$, $e^{nh\,\widetilde{\lambda}_h} = (\Phi_h)^n = r(h\lambda)^n$. Evidently, $$\widetilde{\lambda}_h = \frac{1}{h} \log\bigl( r(h\lambda) \bigr), \quad\text{i.e.}\quad \frac{\mathrm{d}}{\mathrm{d}t} \widetilde{u}_{h}(t) = \frac{1}{h} \log\bigl( r(h\lambda) \bigr)\, \widetilde{u}_{h}(t).$$ As for corrected equations, $\mu_h$ may be obtained through the implicit relation $$\frac{1}{h} \ln\bigl(r(h\mu_h)\bigr) = \lambda, \quad\text{i.e.}\quad r(h\mu_h) = e^{h\lambda}.$$ Here are two specific methods:

  • Implicit Euler: $\widetilde{\lambda}_h = -\frac{1}{h}\ln(1 - h\lambda)$, $\mu_h = \frac{1}{h}(1 - e^{-h\lambda})$.
  • Midpoint or Crank-Nicolson: $\widetilde{\lambda}_h = \frac{1}{h}\ln\left(\frac{1 + \frac{h}{2} \lambda}{1-\frac{h}{2}\lambda} \right)$, $\mu_h = \frac{2}{h} \frac{e^{h\lambda} - 1}{e^{h\lambda} + 1}$.
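As a quick sanity check of these two sets of formulas (a sketch, assuming NumPy, and using the closed-form stability functions $r(z) = \frac{1}{1-z}$ for implicit Euler and $r(z) = \frac{1+z/2}{1-z/2}$ for the midpoint rule):

```python
import numpy as np

lam, h = -0.7 + 0.3j, 0.05

# Implicit Euler: r(z) = 1/(1 - z)
lt_ie = -np.log(1 - h * lam) / h
mu_ie = (1 - np.exp(-h * lam)) / h
assert np.isclose(np.exp(h * lt_ie), 1 / (1 - h * lam))    # exp(h*lambda_tilde) = r(h*lambda)
assert np.isclose(1 / (1 - h * mu_ie), np.exp(h * lam))    # r(h*mu) = exp(h*lambda)

# Midpoint / Crank-Nicolson: r(z) = (1 + z/2)/(1 - z/2)
lt_mp = np.log((1 + h * lam / 2) / (1 - h * lam / 2)) / h
mu_mp = (2 / h) * (np.exp(h * lam) - 1) / (np.exp(h * lam) + 1)
assert np.isclose(np.exp(h * lt_mp), (1 + h * lam / 2) / (1 - h * lam / 2))
assert np.isclose((1 + h * mu_mp / 2) / (1 - h * mu_mp / 2), np.exp(h * lam))
```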

If the ODE on $\widetilde{u}_h$ is well-posed, the following statements are equivalent:

  • The method is of order $q$ for linear problems;
  • The modified vector field matches the initial one up to order $q$, i.e. $\frac{1}{h} \ln\bigl(r(h\lambda)\bigr) = \lambda + \mathcal{O}(h^q)$;
  • The first $q$ derivatives of $\widetilde{u}_h$ at $t = 0$ match those of the solution of the original equation.

Investigate

  1. Check that these three properties are indeed satisfied for the above examples.
  2. For a two-stage explicit method of order 2, find the order conditions $b_1 + b_2 = 1$ and $b_2 a_{21} = 1/2$.
  3. Try to prove this proposition!
Proof of the second property
For the second property, a direct calculation yields $$r(z) = 1 + z\begin{pmatrix} b_{1} & b_{2} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ z a_{21} & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 1 + z(b_1 + b_2) + z^2 b_2 a_{21},$$ from which we obtain the truncated series expansion $$\frac{1}{h}\ln\bigl(r(h\lambda)\bigr) = \lambda(b_{1}+b_{2}) + h\lambda^2 b_{2}a_{21} - \frac{h\lambda^2}{2}(b_{1}+b_{2})^2 + \mathcal{O}(h^2).$$ For the method to be of order 2, we need, for all $\lambda$, $$\begin{cases} \lambda(b_1 + b_2) = \lambda, \\ \lambda^2 b_2 a_{21} - \frac{\lambda^2}{2}(b_1 + b_2)^2 = 0, \end{cases}$$ i.e. $b_1 + b_2 = 1$ and $b_2 a_{21} = 1/2$.
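To see the proposition at work numerically (a sketch, assuming NumPy), take Heun’s method, $b_1 = b_2 = 1/2$ and $a_{21} = 1$, which satisfies both conditions; the defect $\frac{1}{h}\ln\bigl(r(h\lambda)\bigr) - \lambda$ then behaves like $\mathcal{O}(h^2)$:

```python
import numpy as np

lam = 1.3
b1, b2, a21 = 0.5, 0.5, 1.0   # Heun's method: b1 + b2 = 1, b2*a21 = 1/2

def defect(h):
    r = 1 + h * lam * (b1 + b2) + (h * lam) ** 2 * b2 * a21   # r(h*lambda), 2-stage explicit method
    return abs(np.log(r) / h - lam)                           # |lambda_tilde - lambda|

for h in [0.1, 0.05, 0.025]:
    print(h, defect(h), defect(h) / h**2)   # last column stabilises: the defect is O(h^2)
```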

The harmonic oscillator #

In the case of real data, i.e. $u \in \mathbb{R}^2$, the previous multi-dimensional example involving $J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ may be identified with the complex problem $\dot z = iz$ with $z = u_1 + {\rm i} u_2$. Applying the previous identities with $\lambda = {\rm i}$, we find

  • Explicit Euler: $\widetilde{\lambda}_h = \frac{1}{2h} \log(1 + h^2) + \frac{\rm i}{h}\arctan(h)$,
  • Implicit Euler: $\widetilde{\lambda}_h = \frac{-1}{2h} \log(1 + h^2) + \frac{\rm i}{h}\arctan(h)$,
  • Midpoint: $\widetilde{\lambda}_h = \frac{2{\rm i}}{h} \arctan(h/2)$.

For explicit Euler, $\Re(\widetilde{\lambda}_h) > 0$, therefore the method gains energy, meaning in this case that the modulus increases over time. For implicit Euler, $\Re(\widetilde{\lambda}_h) < 0$, so the method dissipates energy. For the midpoint method, $\Re(\widetilde{\lambda}_h) = 0$, which indicates that energy is preserved.[^4] The method is said to be symplectic.

We can also find the corrected coefficients, respectively $\mu_h = \frac{1}{h}(e^{{\rm i}h}-1)$, $\mu_h = \frac{1}{h}(1 - e^{-{\rm i}h})$ and $\mu_h = \frac{2{\rm i}}{h} \tan(h/2)$. Again, only the midpoint method generates a purely imaginary coefficient.
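These closed forms are easy to check numerically (a sketch, assuming NumPy):

```python
import numpy as np

h = 0.1
i = 1j

# Modified vector fields for lambda = i
lt_exp = np.log(1 + i * h) / h                            # explicit Euler
lt_imp = -np.log(1 - i * h) / h                           # implicit Euler
lt_mid = np.log((1 + i * h / 2) / (1 - i * h / 2)) / h    # midpoint

assert np.isclose(lt_exp, np.log(1 + h**2) / (2 * h) + 1j * np.arctan(h) / h)
assert np.isclose(lt_imp, -np.log(1 + h**2) / (2 * h) + 1j * np.arctan(h) / h)
assert np.isclose(lt_mid, 2j * np.arctan(h / 2) / h)
print(lt_exp.real > 0, lt_imp.real < 0, np.isclose(lt_mid.real, 0))   # gains / dissipates / preserves

# Corrected vector fields
mu_exp = (np.exp(i * h) - 1) / h
mu_imp = (1 - np.exp(-i * h)) / h
mu_mid = 2j * np.tan(h / 2) / h
print(mu_exp.real, mu_imp.real, mu_mid.real)   # only the midpoint real part vanishes
```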

The multi-dimensional case #

If $Z := h\Lambda$ is an operator on a $d$-dimensional space, a correct way to write the stability function would be $$r(Z) = I_d + (I_d \otimes b^{\sf T}) \left( I_d \otimes I_s - Z \otimes A \right)^{-1} (Z \otimes \mathbb{1}_s)$$ where $\otimes$ is the Kronecker product, a useful tensor product for matrices.

Why the tensor product?
Given a Butcher tableau, the associated Runge-Kutta method is $$\begin{cases} k_1 = h \Lambda \left( u_n + a_{11} k_1 + \dots + a_{1s} k_s \right) \\ \quad\vdots \\ k_s = h \Lambda \left( u_n + a_{s1} k_1 + \dots + a_{ss} k_s \right) \end{cases}$$ along with $u_{n+1} = u_n + b_1 k_1 + \dots + b_s k_s$. The vectors $k_i \in \mathbb{R}^d$ may be collected in a tensor $\mathbf{k} \in \mathbb{R}^d \otimes \mathbb{R}^s$, and the fixed point may then be written $(I_d \otimes I_s - h\Lambda \otimes A)\,\mathbf{k} = (h\Lambda u_n) \otimes \mathbb{1}_s$. The second part contracts the $\mathbb{R}^s$ part with $u_{n+1} = u_n + (I_d \otimes b^{\sf T})\, \mathbf{k}$, where $(I_d \otimes b^{\sf T})\mathbf{k}$ is identified from $\mathbb{R}^d \otimes \mathbb{R}$ to $\mathbb{R}^d$.

This is cumbersome, so you’ll hopefully understand why I stick to the simple 1d notations. Just in case, here are the expressions for some common methods. $$\begin{aligned} &\text{Exp. midpoint} && r(Z) = \operatorname{id} + Z \left( \operatorname{id} + {\textstyle\frac{1}{2}}Z \right). \\ &\text{Imp. midpoint} && r(Z) = \left( \operatorname{id} - {\textstyle\frac{1}{2}} Z \right)^{-1} \left(\operatorname{id} + {\textstyle\frac{1}{2}} Z\right) \\ & \text{RK4} && r(Z) = \operatorname{id} + {\textstyle\frac{1}{6}}(K_1 + 2K_2 + 2K_3 + K_4), \\ &&& K_1 = Z,\ K_2 = Z \left( \operatorname{id} + {\textstyle\frac{1}{2}} K_1 \right),\ K_3 = Z \left( \operatorname{id} + {\textstyle\frac{1}{2}} K_2 \right),\ K_4 = Z \left( \operatorname{id} + K_3 \right) \end{aligned}$$
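Here is a sketch (assuming NumPy) of the Kronecker-product formula, checked against the closed-form expression for the explicit midpoint rule, whose tableau is $A = \begin{pmatrix} 0 & 0 \\ 1/2 & 0 \end{pmatrix}$, $b = (0, 1)^{\sf T}$:

```python
import numpy as np

def stability_matrix(A, b, Z):
    """r(Z) = I_d + (I_d ⊗ b^T)(I_d ⊗ I_s - Z ⊗ A)^{-1}(Z ⊗ 1_s)."""
    d, s = Z.shape[0], len(b)
    B = np.kron(np.eye(d), b.reshape(1, s))              # I_d ⊗ b^T,          shape (d, d*s)
    M = np.kron(np.eye(d), np.eye(s)) - np.kron(Z, A)    # I_d ⊗ I_s - Z ⊗ A,  shape (d*s, d*s)
    K = np.kron(Z, np.ones((s, 1)))                      # Z ⊗ 1_s,            shape (d*s, d)
    return np.eye(d) + B @ np.linalg.solve(M, K)

# Explicit midpoint: r(Z) = id + Z(id + Z/2)
A = np.array([[0.0, 0.0], [0.5, 0.0]])
b = np.array([0.0, 1.0])
Z = 0.1 * np.array([[0.0, -1.0], [1.0, 0.0]])            # Z = h*J with h = 0.1

assert np.allclose(stability_matrix(A, b, Z), np.eye(2) + Z @ (np.eye(2) + Z / 2))
```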

Reality check #

The non-linear case #

For a non-linear vector field, we must replace the exponential by the flow (morally, $\varphi_h = e^{hf}$), and there are now additional terms in the power series, $$\begin{aligned} \varphi_h &= \operatorname{id} + hf + \frac{h^2}{2}f'f + \frac{h^3}{6}\bigl[ f''(f,f) + f'f'f \bigr] \\ &\quad+ \frac{h^4}{4!} \left[ f^{(3)}(f, f, f) + 3f''(f, f'f) + f'f''(f,f) + f'f'f'f \right] + \mathcal{O}(h^5). \end{aligned}$$ The computations become much more complex, and in general these series are purely formal, meaning that they diverge even for very small $h$. Therefore the modified vector field $\widetilde{f}_h$ and the corrected vector field $g_h$ can only be computed up to some power of $h$, and the methods may be inaccurate for large time-steps.
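For instance, for the explicit Euler method $\Phi_h = \operatorname{id} + hf$, writing the modified field as $\widetilde{f}_h = f + h f_1 + \mathcal{O}(h^2)$ and matching its flow $\operatorname{id} + h\widetilde{f}_h + \frac{h^2}{2}\widetilde{f}_h'\widetilde{f}_h + \mathcal{O}(h^3)$ against $\Phi_h$ term by term (a standard computation from the backward error analysis literature) gives the classical first correction $$\widetilde{f}_h = f - \frac{h}{2} f'f + \mathcal{O}(h^2),$$ since the $h^2$ coefficient $f_1 + \frac{1}{2}f'f$ must vanish.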

We’re not supposed to know the flow #

Of course these are only examples to explain the reasoning behind the method and why it has a chance to work. In practice, we do not know $e^{h\lambda}$ (or its equivalent, the flow $\varphi_h$ or $e^{h\Lambda}$), because if we did there would be no need for numerical methods. In the linear case, there is actually a lot of literature (which I’m not familiar with) on computing accurate high-order approximations of matrix exponentials. I don’t know if symplecticity is important for these applications.


[^1]: I won’t discuss infinite dimension, i.e. PDEs, because everything is always more complicated for PDEs.

[^2]: In the multi-dimensional case $\dot u = \Lambda u$, we may use the notation $\ln(\mathrm{id} + h\Lambda) = \sum_{n \geq 1} \frac{(-1)^{n+1}}{n}(h \Lambda)^n$, valid for $h |\Lambda| < 1$. For other cases, see for instance the Wikipedia page.

[^3]: See e.g. E. Hairer, G. Wanner, *Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems* (1996), Sec. IV.3.

[^4]: In the generic non-linear case, the energy would be preserved up to a $\mathcal{O}(h^2)$ term.