
Linear backward error analysis


This post assumes you know the basic idea behind backward error analysis. If you are not familiar with it, you should first read my previous blog post.

As is often the case, a good preliminary simplification is the linear case. Let’s consider here a one-dimensional linear ODE, which is, for $\lambda \in \mathbb{R}$ or in $\mathbb{C}$, $$\dot{u} = \lambda u.$$ In this case, the flow is the linear map $(t, u_0) \mapsto \varphi_t(u_0) = e^{t\lambda}u_0$, which also extends to higher dimensions when $\lambda$ is a matrix.[^1]


Explicit Euler #

The explicit Euler scheme can be summarily written $$\Phi_h = \operatorname{id} + h\lambda.$$ For this simple case, the modified vector field $\widetilde{\lambda}_h$ and the corrected vector field $\mu_h$ are found easily.
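If you want to play with this, here is a minimal sketch in Python (assuming NumPy; the names `lam`, `h`, `u0` are mine) showing that $n$ steps of the scheme amount to multiplying by $(1 + h\lambda)^n$:

```python
import numpy as np

lam = -0.7   # lambda, real or complex
h = 0.1      # time-step
u0 = 1.0     # initial condition
n = 50       # number of steps

# Explicit Euler: u_{k+1} = u_k + h*lam*u_k = (1 + h*lam) * u_k
u = u0
for _ in range(n):
    u = u + h * lam * u

# n steps of the scheme are exactly a multiplication by (1 + h*lam)**n ...
assert np.isclose(u, (1 + h * lam) ** n * u0)

# ... which only approximates the exact flow exp(n*h*lam) * u0
print(u, np.exp(n * h * lam) * u0)
```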

Modified equations #

For modified equations, we look for $\widetilde{\lambda}_h$ such that $$\widetilde{\varphi}_h = \Phi_h, \qquad\text{i.e.}\qquad \exp\bigl( h\widetilde{\lambda}_h \bigr) = \operatorname{id} + h\lambda.$$ Extending the logarithm to complex numbers,[^2] we find $$\widetilde{\lambda}_h = \frac{1}{h} \log(1 + h\lambda).$$ Therefore, the modified equation is $$\frac{\mathrm{d}}{\mathrm{d}t} \widetilde{u}_h = \frac{1}{h}\log(1 + h\lambda)\, \widetilde{u}_h.$$ Note that $\widetilde{\lambda}_h = \lambda + \mathcal{O}(h)$, which corresponds to the first order of convergence of the scheme: in finite time, $\widetilde{u}_h(t) = u(t) + \mathcal{O}(h)$.
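As a sanity check (a sketch, assuming NumPy and $1 + h\lambda > 0$ so that the real logarithm applies), the exact flow of the modified equation reproduces the numerical iterates exactly:

```python
import numpy as np

lam, h, u0 = -0.7, 0.1, 1.0
lam_tilde = np.log(1 + h * lam) / h     # modified vector field (here 1 + h*lam > 0)

for n in range(1, 6):
    u_num = (1 + h * lam) ** n * u0          # n steps of explicit Euler
    u_mod = np.exp(n * h * lam_tilde) * u0   # exact flow of the modified equation at t = n*h
    assert np.isclose(u_num, u_mod)

# The modified field is an O(h) perturbation of the original one
print(lam_tilde - lam)   # roughly -h*lam**2/2 for small h
```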

Link with stability
In the real case, if $h\lambda < -1$ then $\widetilde{\lambda}_h$ is complex, and we write $h\widetilde{\lambda}_h = \log|1 + h\lambda| + i\pi$. If additionally $h\lambda < -2$, then $\Re(\widetilde{\lambda}_h) > 0$, so the method is unstable. Of course, in the case $h\lambda > 0$ the method is also “unstable” in the standard sense, but no more than the original problem. In fact it is more stable in some sense, since $\widetilde{\lambda}_h < \lambda$.

Corrected equations / Modifying integrators #

This time, we want the numerical solution of a corrected problem $\dot v_h = \mu_h v_h$ to coincide with the exact solution of $\dot u = \lambda u$. In other words, we want $$1 + h\mu_h = e^{h\lambda}, \quad\text{i.e.}\quad \mu_h = \frac{1}{h} (e^{h\lambda} - 1).$$ Then applying the Euler method with time-step $h$ to the problem in $v_h$ produces the exact solution, $v_n = u(nh)$.
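Again a quick sketch (assuming NumPy) of this exactness property:

```python
import numpy as np

lam, h, u0 = -0.7, 0.1, 1.0
mu = (np.exp(h * lam) - 1) / h   # corrected vector field

u = u0
for n in range(1, 6):
    u = u + h * mu * u                                # explicit Euler on the corrected problem
    assert np.isclose(u, np.exp(n * h * lam) * u0)    # ... reproduces the exact solution u(n*h)
```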

Note that once the time-step is chosen for the corrected equation, it should not be changed! Look at the error of the method when correcting for $h = 0.1$ and then simulating with a time-step $\tau = 0.01$. Common belief would be that reducing the time-step increases accuracy. It does, but it increases accuracy for a different problem.

[Figure: error as a function of the time-step $\tau$ at time $T = 1$ with $\lambda = 1$ (code)]
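Here is a sketch of the experiment behind this figure (assuming NumPy; the values $h = 0.1$, $T = 1$, $\lambda = 1$ are those of the caption): the correction is computed once for $h = 0.1$, and the corrected problem is then integrated with various time-steps $\tau$.

```python
import numpy as np

lam, h, T, u0 = 1.0, 0.1, 1.0, 1.0
mu = (np.exp(h * lam) - 1) / h            # correction computed for the time-step h = 0.1

for tau in [0.1, 0.05, 0.02, 0.01, 0.001]:
    n = round(T / tau)
    u = (1 + tau * mu) ** n * u0          # explicit Euler with time-step tau on the corrected problem
    err = abs(u - np.exp(lam * T) * u0)   # error w.r.t. the exact solution of the original problem
    print(f"tau = {tau:6.3f}   error = {err:.2e}")

# The error vanishes (up to round-off) only at tau = h: refining tau converges
# to the solution of the corrected problem, not of the original one.
```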

Application to the harmonic oscillator #

Here is a comparison of these two concepts applied to a harmonic oscillator, $$\dot{u} = \Lambda u, \qquad \Lambda = J := \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

Figure for the harmonic oscillator with both the modified and corrected equations, $h = 0.1$ (CODE) #
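A sketch of how such a comparison can be produced (assuming SciPy for the matrix exponential and logarithm; the rest follows the formulas above):

```python
import numpy as np
from scipy.linalg import expm, logm

J = np.array([[0.0, -1.0], [1.0, 0.0]])
h, n_steps = 0.1, 200
u0 = np.array([1.0, 0.0])

Lam_tilde = logm(np.eye(2) + h * J) / h    # modified vector field (matrix version)
Mu = (expm(h * J) - np.eye(2)) / h         # corrected vector field

u_euler, u_mod, u_corr = [u0], [u0], [u0]
for n in range(1, n_steps + 1):
    u_euler.append((np.eye(2) + h * J) @ u_euler[-1])   # explicit Euler on the original problem
    u_mod.append(expm(n * h * Lam_tilde) @ u0)          # exact flow of the modified equation
    u_corr.append((np.eye(2) + h * Mu) @ u_corr[-1])    # explicit Euler on the corrected problem

# The modified flow coincides with the numerical iterates, and the corrected
# scheme coincides with the exact solution exp(n*h*J) u0.
print(np.allclose(u_euler, u_mod), np.allclose(u_corr[-1], expm(n_steps * h * J) @ u0))
```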

Application to other schemes #

This result may be extended to any standard Runge-Kutta method in a fairly straightforward manner using the stability function. For a method of Butcher tableau $$\begin{array}{c|c} c & A \\ \hline & b^{\sf T} \end{array}$$ the stability function is given by the formula[^3] $$r(z) = 1 + z\, b^{\sf T} (\mathrm{id} - zA)^{-1} \mathbb{1}$$ where $\mathbb{1} = (1, \dots, 1)^{\sf T}$.
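A small helper (a sketch, assuming NumPy) that evaluates this formula directly from the tableau, checked against two cases whose stability functions are known in closed form:

```python
import numpy as np

def stability_function(A, b, z):
    """r(z) = 1 + z * b^T (I - z*A)^{-1} 1 for a Butcher tableau (A, b)."""
    s = len(b)
    one = np.ones(s)
    return 1 + z * b @ np.linalg.solve(np.eye(s) - z * A, one)

z = 0.3 - 0.2j

# Explicit Euler (A = [[0]], b = [1]): r(z) = 1 + z
assert np.isclose(stability_function(np.array([[0.0]]), np.array([1.0]), z), 1 + z)

# Implicit midpoint (A = [[1/2]], b = [1]): r(z) = (1 + z/2)/(1 - z/2)
assert np.isclose(stability_function(np.array([[0.5]]), np.array([1.0]), z),
                  (1 + z / 2) / (1 - z / 2))
```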

Modified and corrected equations #

For a Runge-Kutta method of stability function $z \mapsto r(z)$, the numerical scheme is exactly $$\Phi_h = r(h\lambda),$$ and $\widetilde{\lambda}_h$ is such that for all $n \in \mathbb{N}$, $e^{nh\,\widetilde{\lambda}_h} = (\Phi_h)^n = r(h\lambda)^n$. Evidently, $$\widetilde{\lambda}_h = \frac{1}{h} \log\bigl( r(h\lambda) \bigr), \quad\text{i.e.}\quad \frac{\mathrm{d}}{\mathrm{d}t} \widetilde{u}_{h}(t) = \frac{1}{h} \log\bigl( r(h\lambda) \bigr)\, \widetilde{u}_{h}(t).$$ As for corrected equations, $\mu_h$ may be obtained through the implicit relation $$\frac{1}{h} \ln\bigl(r(h\mu_h)\bigr) = \lambda, \quad\text{i.e.}\quad r(h\mu_h) = e^{h\lambda}.$$ Here are two specific methods:

  • Implicit Euler: $\widetilde{\lambda}_h = -\frac{1}{h}\ln(1 - h\lambda)$, $\mu_h = \frac{1}{h}(1 - e^{-h\lambda})$.
  • Midpoint or Crank-Nicolson: $\widetilde{\lambda}_h = \frac{1}{h}\ln\left(\frac{1 + \frac{h}{2} \lambda}{1-\frac{h}{2}\lambda} \right)$, $\mu_h = \frac{2}{h} \frac{e^{h\lambda} - 1}{e^{h\lambda} + 1}$.
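As a quick sanity check of these two sets of formulas (a sketch, assuming NumPy, and using the closed-form stability functions $r(z) = \frac{1}{1-z}$ for implicit Euler and $r(z) = \frac{1+z/2}{1-z/2}$ for the midpoint rule):

```python
import numpy as np

lam, h = -0.7 + 0.3j, 0.05

# Implicit Euler: r(z) = 1/(1 - z)
lt_ie = -np.log(1 - h * lam) / h
mu_ie = (1 - np.exp(-h * lam)) / h
assert np.isclose(np.exp(h * lt_ie), 1 / (1 - h * lam))    # exp(h*lambda_tilde) = r(h*lambda)
assert np.isclose(1 / (1 - h * mu_ie), np.exp(h * lam))    # r(h*mu) = exp(h*lambda)

# Midpoint / Crank-Nicolson: r(z) = (1 + z/2)/(1 - z/2)
lt_mp = np.log((1 + h * lam / 2) / (1 - h * lam / 2)) / h
mu_mp = (2 / h) * (np.exp(h * lam) - 1) / (np.exp(h * lam) + 1)
assert np.isclose(np.exp(h * lt_mp), (1 + h * lam / 2) / (1 - h * lam / 2))
assert np.isclose((1 + h * mu_mp / 2) / (1 - h * mu_mp / 2), np.exp(h * lam))
```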

If the ODE on $\widetilde{u}_h$ is well-posed, the following statements are equivalent:

  • The method is of order $q$ for linear problems;
  • The modified vector field matches the initial one up to order $q$, i.e. $\frac{1}{h} \ln\bigl(r(h\lambda)\bigr) = \lambda + \mathcal{O}(h^q)$;
  • The first $q$ derivatives of $\widetilde{u}_h$ at $t = 0$ match those of the solution of the original equation.

Investigate

  1. Check that these three properties are indeed satisfied for the above examples.
  2. For a two-stage explicit method of order 2, find the order conditions $b_1 + b_2 = 1$ and $b_2 a_{21} = 1/2$.
  3. Try to prove this proposition!
Proof of the second property
For the second property, a direct calculation yields $$r(z) = 1 + z\begin{pmatrix} b_{1} & b_{2} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ z a_{21} & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 1 + z(b_1 + b_2) + z^2 b_2 a_{21},$$ from which we obtain the truncated series expansion $$\frac{1}{h}\ln\bigl(r(h\lambda)\bigr) = \lambda(b_{1}+b_{2}) + h\lambda^2 b_{2}a_{21} - \frac{h\lambda^2}{2}(b_{1}+b_{2})^2 + \mathcal{O}(h^2).$$ For the method to be of order 2, we need, for all $\lambda$, $$\begin{cases} \lambda(b_1 + b_2) = \lambda, \\ \lambda^2 b_2 a_{21} - \frac{\lambda^2}{2}(b_1 + b_2)^2 = 0, \end{cases}$$ i.e. $b_1 + b_2 = 1$ and $b_2 a_{21} = 1/2$.
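To see the proposition at work numerically (a sketch, assuming NumPy), take Heun’s method, $b_1 = b_2 = 1/2$ and $a_{21} = 1$, which satisfies both conditions; the defect $\frac{1}{h}\ln\bigl(r(h\lambda)\bigr) - \lambda$ then behaves like $\mathcal{O}(h^2)$:

```python
import numpy as np

lam = 1.3
b1, b2, a21 = 0.5, 0.5, 1.0   # Heun's method: b1 + b2 = 1, b2*a21 = 1/2

def defect(h):
    r = 1 + h * lam * (b1 + b2) + (h * lam) ** 2 * b2 * a21   # r(h*lambda), 2-stage explicit method
    return abs(np.log(r) / h - lam)                           # |lambda_tilde - lambda|

for h in [0.1, 0.05, 0.025]:
    print(h, defect(h), defect(h) / h**2)   # last column stabilises: the defect is O(h^2)
```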

The harmonic oscillator #

In the case of real data, i.e. $u \in \mathbb{R}^2$, the previous multi-dimensional example involving $J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ may be identified with the complex problem $\dot z = iz$ with $z = u_1 + {\rm i} u_2$. Applying the previous identities with $\lambda = {\rm i}$, we find

  • Explicit Euler: $\widetilde{\lambda}_h = \frac{1}{2h} \log(1 + h^2) + \frac{\rm i}{h}\arctan(h)$,
  • Implicit Euler: $\widetilde{\lambda}_h = \frac{-1}{2h} \log(1 + h^2) + \frac{\rm i}{h}\arctan(h)$,
  • Midpoint: $\widetilde{\lambda}_h = \frac{2{\rm i}}{h} \arctan(h/2)$.

For explicit Euler, $\Re(\widetilde{\lambda}_h) > 0$, therefore the method gains energy, meaning in this case that the modulus increases over time. For implicit Euler, $\Re(\widetilde{\lambda}_h) < 0$, so the method dissipates energy. For the midpoint method, $\Re(\widetilde{\lambda}_h) = 0$, which indicates that energy is preserved.[^4] The method is said to be symplectic.

We can also find the corrected coefficients, respectively $\mu_h = \frac{1}{h}(e^{{\rm i}h}-1)$, $\mu_h = \frac{1}{h}(1 - e^{-{\rm i}h})$ and $\mu_h = \frac{2{\rm i}}{h} \tan(h/2)$. Again, only the midpoint method generates a purely imaginary coefficient.
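These closed forms are easy to check numerically (a sketch, assuming NumPy):

```python
import numpy as np

h = 0.1
i = 1j

# Modified vector fields for lambda = i
lt_exp = np.log(1 + i * h) / h                            # explicit Euler
lt_imp = -np.log(1 - i * h) / h                           # implicit Euler
lt_mid = np.log((1 + i * h / 2) / (1 - i * h / 2)) / h    # midpoint

assert np.isclose(lt_exp, np.log(1 + h**2) / (2 * h) + 1j * np.arctan(h) / h)
assert np.isclose(lt_imp, -np.log(1 + h**2) / (2 * h) + 1j * np.arctan(h) / h)
assert np.isclose(lt_mid, 2j * np.arctan(h / 2) / h)
print(lt_exp.real > 0, lt_imp.real < 0, np.isclose(lt_mid.real, 0))   # gains / dissipates / preserves

# Corrected vector fields
mu_exp = (np.exp(i * h) - 1) / h
mu_imp = (1 - np.exp(-i * h)) / h
mu_mid = 2j * np.tan(h / 2) / h
print(mu_exp.real, mu_imp.real, mu_mid.real)   # only the midpoint real part vanishes
```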

The multi-dimensional case #

If $Z := h\Lambda$ is an operator on a $d$-dimensional space, a correct way to write the stability function would be $$r(Z) = I_d + (I_d \otimes b^{\sf T}) \left( I_d \otimes I_s - Z \otimes A \right)^{-1} (Z \otimes \mathbb{1}_s)$$ where $\otimes$ is the Kronecker product, a useful tensor product for matrices.

Why the tensor product?
Given a Butcher tableau, the associated Runge-Kutta method is $$\begin{cases} k_1 = h \Lambda \left( u_n + a_{11} k_1 + \dots + a_{1s} k_s \right) \\ \quad\vdots \\ k_s = h \Lambda \left( u_n + a_{s1} k_1 + \dots + a_{ss} k_s \right) \end{cases}$$ along with $u_{n+1} = u_n + b_1 k_1 + \dots + b_s k_s$. The vectors $k_i \in \mathbb{R}^d$ may be collected in a tensor $\mathbf{k} \in \mathbb{R}^d \otimes \mathbb{R}^s$, and the fixed point may then be written $(I_d \otimes I_s - h\Lambda \otimes A)\,\mathbf{k} = (h\Lambda u_n) \otimes \mathbb{1}_s$. The second part contracts the $\mathbb{R}^s$ part with $u_{n+1} = u_n + (I_d \otimes b^{\sf T})\, \mathbf{k}$, where $(I_d \otimes b^{\sf T})\mathbf{k}$ is identified from $\mathbb{R}^d \otimes \mathbb{R}$ to $\mathbb{R}^d$.

This is cumbersome, so you’ll hopefully understand why I stick to the simple 1d notations. Just in case, here are the expressions for some common methods. $$\begin{aligned} &\text{Exp. midpoint} && r(Z) = \operatorname{id} + Z \left( \operatorname{id} + {\textstyle\frac{1}{2}}Z \right). \\ &\text{Imp. midpoint} && r(Z) = \left( \operatorname{id} - {\textstyle\frac{1}{2}} Z \right)^{-1} \left(\operatorname{id} + {\textstyle\frac{1}{2}} Z\right) \\ & \text{RK4} && r(Z) = \operatorname{id} + {\textstyle\frac{1}{6}}(K_1 + 2K_2 + 2K_3 + K_4), \\ &&& K_1 = Z,\ K_2 = Z \left( \operatorname{id} + {\textstyle\frac{1}{2}} K_1 \right),\ K_3 = Z \left( \operatorname{id} + {\textstyle\frac{1}{2}} K_2 \right),\ K_4 = Z \left( \operatorname{id} + K_3 \right) \end{aligned}$$
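Here is a sketch (assuming NumPy) of the Kronecker-product formula, checked against the closed-form expression for the explicit midpoint rule, whose tableau is $A = \begin{pmatrix} 0 & 0 \\ 1/2 & 0 \end{pmatrix}$, $b = (0, 1)^{\sf T}$:

```python
import numpy as np

def stability_matrix(A, b, Z):
    """r(Z) = I_d + (I_d ⊗ b^T)(I_d ⊗ I_s - Z ⊗ A)^{-1}(Z ⊗ 1_s)."""
    d, s = Z.shape[0], len(b)
    B = np.kron(np.eye(d), b.reshape(1, s))              # I_d ⊗ b^T,          shape (d, d*s)
    M = np.kron(np.eye(d), np.eye(s)) - np.kron(Z, A)    # I_d ⊗ I_s - Z ⊗ A,  shape (d*s, d*s)
    K = np.kron(Z, np.ones((s, 1)))                      # Z ⊗ 1_s,            shape (d*s, d)
    return np.eye(d) + B @ np.linalg.solve(M, K)

# Explicit midpoint: r(Z) = id + Z(id + Z/2)
A = np.array([[0.0, 0.0], [0.5, 0.0]])
b = np.array([0.0, 1.0])
Z = 0.1 * np.array([[0.0, -1.0], [1.0, 0.0]])            # Z = h*J with h = 0.1

assert np.allclose(stability_matrix(A, b, Z), np.eye(2) + Z @ (np.eye(2) + Z / 2))
```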

Reality check #

The non-linear case #

For a non-linear vector field, we must replace the exponential by the flow (morally, $\varphi_h = e^{hf}$), and there are now additional terms in the power series, $$\begin{aligned} \varphi_h &= \operatorname{id} + hf + \frac{h^2}{2}f'f + \frac{h^3}{6}\bigl[ f''(f,f) + f'f'f \bigr] \\ &\quad+ \frac{h^4}{4!} \left[ f^{(3)}(f, f, f) + 3f''(f, f'f) + f'f''(f,f) + f'f'f'f \right] + \mathcal{O}(h^5). \end{aligned}$$ The computations become much more complex, and in general these series are purely formal, meaning that they diverge even for very small $h$. Therefore the modified vector field $\widetilde{f}_h$ and the corrected vector field $g_h$ can only be computed up to some power of $h$, and the methods may be inaccurate for large time-steps.
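For instance, for the explicit Euler method $\Phi_h = \operatorname{id} + hf$, writing the modified field as $\widetilde{f}_h = f + h f_1 + \mathcal{O}(h^2)$ and matching its flow $\operatorname{id} + h\widetilde{f}_h + \frac{h^2}{2}\widetilde{f}_h'\widetilde{f}_h + \mathcal{O}(h^3)$ against $\Phi_h$ term by term (a standard computation from the backward error analysis literature) gives the classical first correction $$\widetilde{f}_h = f - \frac{h}{2} f'f + \mathcal{O}(h^2),$$ since the $h^2$ coefficient $f_1 + \frac{1}{2}f'f$ must vanish.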

We’re not supposed to know the flow #

Of course these are only examples to explain the reasoning behind the method and why it has a chance to work. In practice, we do not know $e^{h\lambda}$ (or its equivalent, the flow $\varphi_h$ or $e^{h\Lambda}$), because if we did there would be no need for numerical methods. In the linear case, there is actually a lot of literature (which I’m not familiar with) on computing accurate high-order approximations of matrix exponentials. I don’t know if symplecticity is important for these applications.


[^1]: I won’t discuss infinite dimension, i.e. PDEs, because everything is always more complicated for PDEs.

[^2]: In the multi-dimensional case $\dot u = \Lambda u$, we may use the notation $\ln(\mathrm{id} + h\Lambda) = \sum_{n \geq 1} \frac{(-1)^{n+1}}{n}(h \Lambda)^n$, valid for $h |\Lambda| < 1$. For other cases, see for instance the Wikipedia page.

[^3]: See e.g. E. Hairer, G. Wanner, *Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems* (1996), Sec. IV.3.

[^4]: In the generic non-linear case, the energy would be preserved up to a $\mathcal{O}(h^2)$ term.