The figures might not be fully compatible with dark mode! It's a known issue I'm working on. Also, TikZ figures have recently been made compatible! If you're still seeing black figures in dark mode, you can open web tools (in Firefox) > Storage > Indexed DB > right-click TikzJax > Delete.
This post assumes you know the base idea behind backward error analysis. If you are not, you should first read my previous blog post.
As is often the case, a good preliminary simplification is the linear case. Let’s consider here a one-dimensional linear ODE, which is, for λ∈R or in C,
u˙=λu.
In this case, the flow is the linear map (t,u0)↦φt(u0)=etλu0, which also extends to higher dimensions when λ is a matrix.1
I won’t discuss infinite dimension, i.e. PDEs, because everything is always more complicated for PDEs.
The explicit Euler scheme can be summarily written
Φh=id+hλ.
For this simple case, the modified vector field λh and the corrected vector field μh are found easily.
For modified equations, we look for λh such that
φh=Φh,i.e.exp(hλh)=id+hλ.
Extending the logarithm to complex numbers,2
In the multi-dimensional case u˙=Λu, we may use the notation ln(id+hΛ)=n≥1∑n(−1)n+1(hΛ)n, valid for h∣Λ∣<1. For other cases, see for instance
the Wikipedia page.
we find
λh=h1log(1+hλ).
Therefore, the modified equation is
dtduh=h1log(1+hλ),uh.
Note that λh=λ+O(h), which corresponds to the first order of convergence of the scheme: in finite time uh(t)=u(t)+O(h).
Link with stability
In the real case, if hλ<−1 then λh is complex, we write hλh=log∣1+hλ∣+iπ. If additionally hλ<−2, then ℜ(λh)>0, so the method is unstable. Of course in the case hλ>0, the method is also “unstable” in the standard sense, but not more than the original problem. In fact it is more stable in some sense, λh<λ.
This time, we want the numerical solution of a corrected problem
v˙h=μhv˙h
to coincide with the exact solution of u˙=λu. In other words, we want
1+hμh=ehλ,i.e.μh=h1(ehλ−1).
Then applying the Euler method with time-step h to the problem in vh produces the exact solution, vn=u(nh).
Note that once the time-step is chosen for the corrected equation, it should not be changed! Look at the error of the method when correcting for h=0.1 and then simulating with a time-step τ=0.01. Common belief would be that reducing the time-step increases accuracy. It does, but it increases accuracy for a different problem.
Error as a function of time-step (tau) at time (T = 1) with (lambda = 1) code
This result may be extended to any standard Runge-Kutta method in a fairly straightforward manner using the stability function. For a method of
Butcher tableaucAbT the stability function is given by the formula3
See e.g. E. Hairer, G. Wanner, Solving ordinary differential equations II. Stiff and Differential-Algebraic Problems (1996) – Sec. IV.3
For a Runge-Kutta method of stability function z↦r(z), the numerical scheme is exactly
Φh=r(hλ),
and λh is such that for all n∈N, enhλh=(Φh)n=r(hλ)n. Evidently,
λh=h1log(r(hλ)),i.e.dtdun(t)=h1log(r(hλ))un(t).
As for corrected equations, μh may be obtained through the implicit relation
h1ln(r(hμh))=λi.e.r(hμh)=ehλ
Here are two specific methods,
Implicit Euler:λh=−h1ln(1−hλ), μh=h1(1−e−hλ).
Midpoint or Crank-Nicolson:λh=h1ln(1−2hλ1+2hλ), μh=h2ehλ+1ehλ−1
If the ODE on un is well-posed, the following statements are equivalent:
The method is of order q for linear problems;
The modified vector field matches the initial one up to order q, i.e. h1ln(r(hλ))=λ+O(hq);
The q-th first derivatives of uh at t=0 match those of the original equation.
Investigate
Check that these three properties are indeed satisfied for the above examples.
For a two-stage explicit method of order 2, find the order condition b1+b2=1 and b2a12=1/2.
Try to prove this proposition!
Proof of the second property
For the second property, a direct calculation yields
r(z)=1+z(b1b2)(1za1201)(11)=1+z(b1+b2)+z2b2a12,
from which we obtain the truncated series expansion
h1ln(r(hλ))=λ(b1+b2)+hλb2a12−2hλ(b1+b2)+O(h2).
For the method to be of order 2, we need for all λ{λ(b1+b2)=λ,λb2a12−2λ(b1+b2)=0,
i.e. b1+b2=1 and b2a12=1/2.
In case of real data, i.e. u∈R2, the previous multi-dimensional example involving J=(0−101) may be identified with the complex problem z˙=iz with z=u1+iu2. Applying the previous identities with λ=i, we find
Explicit Euler:λh=2h1log(1+h2)+hiarctan(h)
Implicit Euler:λh=2h−1log(1+h2)+hiarctan(h),
Midpoint:λh=h2iarctan(h/2).
For explicit Euler, ℜ(λh)>0, therefore the method gains energy, meaning in this case that the module increases over time. For implicit Euler, ℜ(λh)<0 so the method dissipates energy. For the midpoint method ℜ(λh)=0, which indicates that energy is preserved.4
In the generic non-linear case, the energy would be preserved up to a O(h2) term.
The method is said to be symplectic.
We can also find the corrected coefficients, respectively μh=h1(eih−1), μh=h1(1−e−ih) and μh=h2itan(h/2). Again, only the midpoint method generates an imaginary coefficient.
If Z:=hΛ is an operator of a d-dimensional space, a correct way to write the stability function would be
r(Z)=Id+(Id⊗bT)(Id⊗Is−Z⊗A)−1(Z⊗1s)
where ⊗ is the Kronecker product, a useful tensor product for matrices.
Why the tensor product?
Given a Butcher tableau, the associated Runge-Kutta method is
⎩⎨⎧k1=hΛ(un+a11k1+⋯+a1sks)⋮ks=hΛ(un+as1k1+⋯+assks)
along with un+1=un+b1k1+…+bsks.
The vectors ki∈Rd may be collected in a tensor k∈Rd⊗Rs, and the fixed-point may then be written (Id⊗Is−hΛ⊗A)k=(hΛun)⊗1s. The second part contracts the Rs part with un+1=un+(Id⊗bT)k, where (Id⊗bT)k is identified from Rd⊗R to Rd.
This is cumbersome, so you’ll hopefully understand why I suck with the simple 1d notations. Just in case, here are the expressions for some common methods.
Exp. midpointImp. midpointRK4r(Z)=id+Z(id+21Z).r(Z)=(id−21Z)−1(id+21Z)r(Z)=id+61(K1+2K2+2K3+K4)K1=Z,K2=Z(id+21K1),K3=Z(id+21K2),K4=Z(id+K3)
For a non-linear vector field, we must replace the exponential by the flow (morally, φh=ehf), and there are now additional terms in the power series,
φh=id+hf+2h2f’f+6h3[f’’(f,f)+f’f’f]+4!h4[f(3)(f,f,f)+3f’’(f,f’f)+f’f’’(f,f)+f’f’f’f]+O(h5).
The computations become much more complex, and in general these series are purely formal, meaning that they diverge even for very small h. Therefore the modified vector field fh and the corrected vector field gh can only be computed up to some power of h, and the methods may be inaccurate for large time-steps.
Of course these are only examples to explain the reasoning of the method and why it has a chance to work. In practice, we do not know ehλ (or its equivalent, the flow φh or ehΛ), because if we did there would be no need for numerical methods. In the linear case, there is actually a lot of literature (which I’m not familiar with) on computing accurate high-order approximation of matrix exponentials. I don’t know if symplecticity is important for these applications.
I won’t discuss infinite dimension, i.e. PDEs, because everything is always more complicated for PDEs. ↩︎
In the multi-dimensional case u˙=Λu, we may use the notation ln(id+hΛ)=n≥1∑n(−1)n+1(hΛ)n, valid for h∣Λ∣<1. For other cases, see for instance
the Wikipedia page. ↩︎
See e.g. E. Hairer, G. Wanner, Solving ordinary differential equations II. Stiff and Differential-Algebraic Problems (1996) – Sec. IV.3 ↩︎
In the generic non-linear case, the energy would be preserved up to a O(h2) term. ↩︎