
Section 2.3 Convergence of Sequences

Recall that we defined and showed many examples of sequences in Section 1.3. The material there reviewed many of the main theorems from Calculus used to prove the convergence of sequences. In this section we extend that material to the rates of convergence of sequences.
As we have seen so far, sequences and the convergence of sequences are very important in Numerical Analysis. We now spend a bit of time reviewing some of this material:

Definition 2.3.1.

A sequence is a function \(a(n)\) from the positive (or nonnegative) integers into the real numbers.
Typically a sequence is written as \((a_{1},a_{2},a_{3}, \ldots)\) or \((a_{n})_{n=1}^{\infty}\) where \(a_{n}\) is the \(n\)th term of the sequence \(a\text{.}\) The most important aspect of a sequence is whether or not it converges, and if it converges, what is the value it converges to. Recall that the convergence of a sequence or its limit was explained in Section 1.3.

Subsection 2.3.1 Big “O” Notation

For any algorithm, we typically want to know how fast it performs. Its running time is often proportional to the number of operations, which in turn depends on the size of the problem. This proportionality is generally what we are looking for.

Definition 2.3.2.

Let \((p_{n})\) be a sequence that converges to \(p\text{.}\) If there exist a sequence \((\beta_{n})\) that converges to 0 and a constant \(\lambda \gt 0\text{,}\) independent of \(n\text{,}\) such that
\begin{equation} |p_{n}-p| \leq \lambda |\beta_{n}|\tag{2.3.1} \end{equation}
for all sufficiently large \(n\text{,}\) then \((p_{n})\) is said to converge to \(p\) with rate of convergence \(O(\beta_{n})\text{.}\)

Example 2.3.3.

Let
\begin{equation*} p_{n} = \frac{2n+1}{n} \qquad q_{n} =\frac{8n^{3}-2n}{4n^{3}}. \end{equation*}
Both sequences converge to \(2\text{,}\) but which one converges faster? We will write \(p_{n} = 2 + \alpha_{n}\) and \(q_{n} = 2 + \beta_{n}\text{,}\) where \(\alpha_{n}\) and \(\beta_{n}\) are the amounts by which each term differs from 2.
\begin{equation*} \begin{aligned}\alpha_{n} \amp = p_{n} - 2 = \frac{2n+1}{n}- 2 = \frac{2n+1 - 2n}{n}= \frac{1}{n}\\ \beta_{n} \amp = q_{n} - 2 = \frac{8n^{3}-2n}{4n^{3}}- 2 = \frac{8n^{3}-2n - 2(4n^{3})}{4n^{3}}= -\frac{1}{2n^{2}}\end{aligned} \end{equation*}
Since both \(\alpha_{n}\) and \(\beta_{n}\) fit the form in (2.3.1), \((p_{n})\) converges to 2 with rate \(O(1/n)\) and constant \(1\text{,}\) whereas \((q_{n})\) converges to 2 with rate \(O(1/n^{2})\) and constant \(1/2\text{.}\)
Since \(1/n^{2}\) approaches 0 faster than \(1/n\text{,}\) we generally say that the \(O(1/n^{2})\) sequence converges faster. However, the constant \(\lambda\) is often important in practical terms.
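These rates can be checked numerically. The following short Python sketch (ours, not part of the text's examples) multiplies each error by \(n\) or \(n^{2}\) and watches the product settle at the constants \(1\) and \(1/2\text{:}\)

```python
# Numerically verify the rates in Example 2.3.3: if |p_n - 2| <= lam/n,
# then n*|p_n - 2| should stay near the constant lam, and similarly
# n^2*|q_n - 2| should stay near the constant for q_n.
def p(n):
    return (2 * n + 1) / n

def q(n):
    return (8 * n**3 - 2 * n) / (4 * n**3)

for n in [10, 100, 1000]:
    print(n, n * abs(p(n) - 2), n**2 * abs(q(n) - 2))
# each row shows the scaled errors sitting at 1 and 0.5 respectively
```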
Recall that earlier we asked how fast an algorithm performs. Often we will be interested only in the order. That is, if a known algorithm requires \(O(n^{2})\) operations, can we find an algorithm that requires fewer, perhaps \(O(n)\) or \(O(n \ln n)\text{?}\)
The reason we are generally concerned only with the order is that as \(n\) gets large (that is, as the problem becomes harder to solve), we would like an algorithm that finds the answer quickly. The order gives an easy way to compare algorithms.

Example 2.3.5.

Show using the theorem above that the sequence given by \(\displaystyle p_{n} = \frac{2n+3}{n 2^{n}}\) converges to zero with rate \(O(2^{-n})\text{.}\)
\begin{equation*} \lim_{n \rightarrow \infty}\frac{\frac{2n+3}{n2^{n}}}{2^{-n}} = \lim_{n \rightarrow \infty}\frac{2n+3}{n}= 2 \end{equation*}
This means that \(k=2\text{.}\) By Theorem 2.3.4, the rate of convergence is \(O(2^{-n})\text{.}\)
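The limit above is easy to check numerically; this short Python sketch evaluates the ratio \(p_{n}/2^{-n}\) for increasing \(n\text{:}\)

```python
# Check the limit in Example 2.3.5: p_n / 2^{-n} = (2n+3)/n should approach 2.
def ratio(n):
    p_n = (2 * n + 3) / (n * 2**n)
    return p_n / 2**(-n)

print([round(ratio(n), 4) for n in [1, 10, 100, 1000]])
# the ratios decrease toward the limiting value 2
```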

Subsection 2.3.2 Comparing Order of Operations

We will see that common orders of operations are \(O(n)\text{,}\) \(O(n^2)\) and other powers, but often also \(O(\ln n)\text{,}\) \(O(n \ln n)\) and others of the form \(O(n^p \ln n)\text{.}\) As we have already alluded to, we desire faster algorithms, which correspond to lower orders.
It’s probably clear that \(O(n)\) is lower than \(O(n^2)\text{,}\) and in general \(O(n^p)\) is lower than \(O(n^q)\) when \(p \lt q\text{.}\) However, where does \(O(n^p \ln n)\) fit?
If we look at some numerical data for these, we can make a table
Table 2.3.6. Comparative Growth of Functions
\(n\) \(\log(n)\) \(n \log(n)\) \(n \log^2(n)\) \(n^2\) \(n^2 \log(n)\)
1 0 0 0 1 0
10 1 10 10 100 100
100 2 200 400 10,000 20,000
1,000 3 3,000 9,000 1,000,000 3,000,000
100,000 5 500,000 2,500,000 \(10^{10}\) \(5 \times 10^{10}\)
This should give a sense of how fast these functions grow. Note that the columns are ordered from slowest to fastest growth. Throughout this course, there will be discussions of the order of operations of a given algorithm, and you may need to refer back to this table to get a sense of the “speed” of an algorithm.
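The table's entries (base-10 logarithms, as the \(\log(10)=1\) column indicates) can be regenerated with a short Python sketch:

```python
import math

# Rebuild Table 2.3.6 using base-10 logarithms, as in the text's table.
header = ["n", "log n", "n log n", "n log^2 n", "n^2", "n^2 log n"]
print("  ".join(f"{h:>12}" for h in header))
for n in [10, 100, 1000, 100000]:
    L = math.log10(n)
    row = [n, L, n * L, n * L**2, n**2, n**2 * L]
    print("  ".join(f"{v:>12.6g}" for v in row))
```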

Subsection 2.3.3 Order of operations and evaluating polynomials

Let’s look at how many operations it takes to evaluate a polynomial written in the general form:
\begin{equation*} P(x) = \sum_{k=0}^{n}a_{k} x^{k} \end{equation*}
If \(n\) is the degree of the polynomial, then the number of operations \(E(n)\) needed to evaluate it term by term is:
\begin{equation*} E(n) = (1+1) + (1+2) + (1+3) + \cdots + (1 + n) \end{equation*}
where in each set of parentheses the first 1 is for the multiplication of the coefficient with the power of \(x\) and the other is for forming the power of \(x\) (although that can be done more efficiently). Adding these up results in
\begin{equation*} E(n) = n + \sum_{k=1}^{n} k = n + \frac{n(n+1)}{2} \end{equation*}
Thus the rate of evaluating a polynomial in this manner is \(O(n^{2})\text{.}\) An alternative to this way is to write the polynomial above using Horner’s form (see Subsection 1.1.2):
\begin{equation*} P(x) = b_{0} + x(b_{1} + x(b_{2} + \cdots + x(b_{n-1} + x\,b_{n}) \cdots )) \end{equation*}
by factoring out \(x\) in an appropriate way. Evaluated in this form, the polynomial requires far fewer operations. Counting them:
\begin{equation*} \begin{aligned}E(1) \amp = 3 \\ E(n) \amp = E(n-1) + 2 \\ \amp = E(n-2) + 2 + 2 \\ \amp = E(1) + 2(n-1) \\ \amp = 2n+1\end{aligned} \end{equation*}
and thus this is an \(O(n)\) method. If one is careful about the way algorithms are executed, efficiencies can often be found.
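The operation counts can be seen concretely in code. The following Python sketch (the function names are ours) evaluates a polynomial both term by term and in Horner's form, counting the multiplications and additions as it goes:

```python
# A sketch of naive vs. Horner evaluation of P(x) = sum a_k x^k,
# counting multiplications and additions.
def eval_naive(coeffs, x):
    """coeffs[k] is a_k; each power x^k is built by repeated multiplication."""
    total, ops = 0.0, 0
    for k, a in enumerate(coeffs):
        term = a
        for _ in range(k):       # k multiplications to form a_k * x^k
            term *= x
            ops += 1
        total += term
        ops += 1                 # one addition per term
    return total, ops

def eval_horner(coeffs, x):
    """Horner's form: (...(a_n x + a_{n-1}) x + ...) x + a_0."""
    total, ops = 0.0, 0
    for a in reversed(coeffs):
        total = total * x + a    # one multiplication and one addition
        ops += 2
    return total, ops

coeffs = [1, -3, 0, 2]           # P(x) = 1 - 3x + 2x^3
print(eval_naive(coeffs, 2.0))   # (11.0, 10)
print(eval_horner(coeffs, 2.0))  # (11.0, 8)
```

The naive count grows like \(n^{2}/2\) while Horner's grows like \(2n\text{,}\) matching the analysis above.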

Subsection 2.3.4 Order of Convergence

Here’s a big-picture question: given some problem that we want to solve, we want to find the answer to a high level of accuracy, and we want to do so in as few steps as possible.
In the next definition, we quantify how quickly a sequence reaches its limit. Let \(e_{n}\) be the error (the distance from the actual answer) at step \(n\text{.}\)

Definition 2.3.7.

Let \((p_{n})\) be a sequence that converges to a number \(p\text{,}\) and let \(e_{n} = p_{n}-p\) for \(n\geq0\text{.}\) If there exist positive constants \(\lambda\) and \(\alpha\) such that
\begin{equation*} \begin{aligned}\lim_{n \rightarrow \infty}\frac{|p_{n+1}-p|}{|p_{n}-p|^{\alpha}} \amp = \lim_{n \rightarrow \infty}\frac{|e_{n+1}|}{|e_{n}|^{\alpha}}= \lambda\end{aligned} \end{equation*}
then \((p_{n})\) is said to converge to \(p\) of order \(\alpha\) with asymptotic error constant \(\lambda\text{.}\)
If \(\alpha=1\text{,}\) we say that the sequence converges linearly. If \(\alpha=2\text{,}\) we say that the sequence converges quadratically, and if \(\alpha=3\text{,}\) we say that the sequence converges cubically. Note that \(\alpha\) does not need to be an integer.
Generally, the larger the value of \(\alpha\text{,}\) the faster a sequence converges. The following example shows, for sequences starting with the same initial error, the difference the order makes.

Example 2.3.8.

Consider a case where \(\lambda=0.5\) and \(e_{0}=1\text{.}\) Evaluate \(e_{n+1}= \lambda |e_{n}|^{\alpha}\) for \(\alpha=1,2,\) and \(3\text{.}\)
Table 2.3.9. Table of errors for linear, quadratic and cubic convergence
linear quadratic cubic
\(e_{n+1}\approx 0.5 |e_{n}|\) \(e_{n+1}\approx 0.5 |e_{n}|^{2}\) \(e_{n+1}\approx 0.5 |e_{n}|^{3}\)
\(|e_{1}|\) \(0.5\) \(0.5\) \(0.5\)
\(|e_{2}|\) \(0.25\) \(0.125\) \(0.0625\)
\(|e_{3}|\) \(0.125\) \(7.8125 \times 10^{-3}\) \(1.2207 \times 10^{-4}\)
\(|e_{4}|\) \(0.0625\) \(3.0518 \times 10^{-5}\) \(9.0949 \times 10^{-13}\)
\(|e_{5}|\) \(0.03125\) \(4.6566 \times 10^{-10}\) \(3.7616 \times 10^{-37}\)
\(|e_{6}|\) \(0.015625\) \(1.0842 \times 10^{-19}\)
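The table above can be regenerated directly from the recurrence \(e_{n+1}= \lambda |e_{n}|^{\alpha}\text{;}\) a short Python sketch:

```python
# Reproduce Table 2.3.9: e_{n+1} = lam * |e_n|**alpha with lam = 0.5, e_0 = 1.
def errors(alpha, steps, lam=0.5, e0=1.0):
    e = e0
    out = []
    for _ in range(steps):
        e = lam * abs(e)**alpha
        out.append(e)
    return out

for alpha in (1, 2, 3):
    print(alpha, errors(alpha, 5))
# alpha = 1 halves the error each step; alpha = 2 and 3 crush it far faster
```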
From the example above, the error appears to shrink fastest for cubically convergent methods and slowest for linearly convergent ones. The next example shows that this is essentially correct, but that some care is needed.

Example 2.3.10.

Evaluate the linearly convergent sequence \(e_{n+1}= 0.1 e_{n}\) and the quadratically convergent sequence \(\epsilon_{n+1}= 0.9 \epsilon_{n}^{2}\) for \(n=1,2, \ldots, 9\text{.}\) Let \(e_{0}=1\) and \(\epsilon_{0}=1\text{.}\)
Table 2.3.11. Comparison of linear and quadratic convergence for different coefficients
linear quadratic
n \(e_{n+1}= 0.1 e_{n}\) \(\epsilon_{n+1}= 0.9 \epsilon_{n}^{2}\)
\(1\) \(0.1\) \(0.9\)
\(2\) \(0.01\) \(0.729\)
\(3\) \(0.001\) \(0.478\)
\(4\) \(0.0001\) \(0.206\)
\(5\) \(0.00001\) \(0.0382\)
\(6\) \(0.000001\) \(0.00131\)
\(7\) \(0.0000001\) \(1.54 \times 10^{-6}\)
\(8\) \(0.00000001\) \(2.15 \times 10^{-12}\)
\(9\) \(10^{-9}\) \(4.15 \times 10^{-24}\)
As you can see from the table, the linear sequence converges faster at first; however, the quadratically convergent one eventually surpasses it. In general, if one has two methods, one linearly and one quadratically convergent, it is desirable to use the quadratically convergent one.
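The crossover point is easy to find by iterating both recurrences; a short Python sketch:

```python
# Compare e_{n+1} = 0.1 e_n (linear) with eps_{n+1} = 0.9 eps_n^2 (quadratic)
# and find the first step where the quadratic sequence wins.
e, eps = 1.0, 1.0
for n in range(1, 12):
    e, eps = 0.1 * e, 0.9 * eps**2
    if eps < e:
        print("quadratic overtakes linear at n =", n)
        break
# -> quadratic overtakes linear at n = 8
```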

Subsection 2.3.5 Numerically Determining the order of convergence

From the above examples, we can see that it is important to determine the order of convergence and also find the asymptotic error constant, or \(\lambda\) in the formula above.
We simply use the definition to determine the order:
\begin{equation*} \lambda=\lim_{n \rightarrow \infty} \frac{e_{n+1}}{e_{n}^{\alpha}} \end{equation*}
We’ll take a sufficiently large value of \(n\text{;}\) in practice, this could be less than 10. When the limit has nearly converged, then
\begin{equation*} \lambda \approx \frac{e_{n+1}}{e_{n}^{\alpha}} \approx \frac{e_{n+2}}{e_{n+1}^{\alpha}} \end{equation*}
Treating the approximation on the right as an equality and solving for \(\alpha\text{,}\)
\begin{equation*} \begin{aligned} \frac{e^{\alpha}_{n+1}}{e^{\alpha}_n} \amp \approx \frac{e_{n+2}}{e_{n+1}} \\ \left( \frac{e_{n+1}}{e_n}\right)^{\alpha} \amp \approx \frac{e_{n+2}}{e_{n+1}} \\ \alpha \ln \left( \frac{e_{n+1}}{e_n}\right) \amp \approx \ln \left(\frac{e_{n+2}}{e_{n+1}}\right) \\ \alpha \amp \approx \frac{\ln(e_{n+2}/e_{n+1})}{\ln(e_{n+1}/e_n)} \end{aligned} \end{equation*}
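This estimate is easy to compute from a list of errors. The following Python sketch implements the final formula (the function name estimate_order is ours) and sanity-checks it on a synthetic quadratically convergent sequence:

```python
import math

# Estimate the order alpha from three consecutive errors, using
# alpha ~ ln(e_{n+2}/e_{n+1}) / ln(e_{n+1}/e_n) as derived above.
def estimate_order(e0, e1, e2):
    return math.log(e2 / e1) / math.log(e1 / e0)

# sanity check on a synthetic sequence with e_{n+1} = 0.5 * e_n^2
e0 = 1e-2
e1 = 0.5 * e0**2
e2 = 0.5 * e1**2
print(estimate_order(e0, e1, e2))  # close to 2, the order of the sequence
```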

Example 2.3.12.

Find the order of convergence of the YASRA in Example 2.2.10.
First, build the following table of errors.
Table 2.3.13. Table of errors for YASRA
\(n\) \(x_{n}\) \(e_{n}=|\sqrt{65}-x_{n}|\)
\(0\) \(8\) \(6.225774829854\cdot 10^{-2}\)
\(1\) \(8.06250000000000\) \(2.422517014503\cdot 10^{-4}\)
\(2\) \(8.06225585937500\) \(1.888923549652\cdot 10^{-6}\)
\(3\) \(8.06225776672363\) \(1.842508316013\cdot10^{-8}\)
\(4\) \(8.06225774809718\) \(2.013683321761\cdot10^{-10}\)
\(5\) \(8.06225774830090\) \(2.358481020952\cdot10^{-12}\)
\(6\) \(8.06225774829852\) \(2.894257120155\cdot10^{-14}\)
Looking at the errors, they appear to be dropping in a linear way (each is about 1/100 of the previous one), but let’s use the formula above:
Table 2.3.14. Using a table to estimate the convergence rate of YASRA
\(n\) \(\ln(e_{n+2}/e_{n+1})/\ln(e_{n+1}/e_n)\)
0 0.8747372133045498
1 0.9538685373847787
2 0.975439938281658
3 0.9846245750612836
4 0.9936913420698572
From this analysis, it appears that this method indeed converges linearly. The asymptotic error constant is:
\begin{equation*} \lambda = \frac{e_{n+1}}{e_n} = \frac{2.894257120155\cdot10^{-14}}{2.358481020952\cdot10^{-12}} \approx 0.01227 \end{equation*}
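The entries of Table 2.3.14 can be reproduced directly from the error column of Table 2.3.13:

```python
import math

# Apply the order-estimation formula to the YASRA errors of Table 2.3.13.
errs = [6.225774829854e-2, 2.422517014503e-4, 1.888923549652e-6,
        1.842508316013e-8, 2.013683321761e-10, 2.358481020952e-12,
        2.894257120155e-14]

for n in range(len(errs) - 2):
    alpha = math.log(errs[n + 2] / errs[n + 1]) / math.log(errs[n + 1] / errs[n])
    print(n, alpha)
# the estimates climb toward 1, consistent with linear convergence
```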
Next, we compare the rate of convergence of Newton’s method from Example 2.2.7.

Example 2.3.15.

Determine the order of convergence of Example 2.2.7, which uses Newton’s method to find \(\sqrt{65}\text{.}\)
Solution.
First, calculate the error, \(e_{n}\) of each step.
Table 2.3.16. Table of Errors for Newton’s Method
\(n\) \(x_{n}\) \(e_{n}=|\sqrt{65}-x_{n}|\)
\(0\) \(65\) \(56.937742251701\)
\(1\) \(33\) \(24.937742251701\)
\(2\) \(17.4848484848484848484848484848\) \(9.4225907365499351\)
\(3\) \(10.6011764088020587154036027520\) \(2.5389186605035090\)
\(4\) \(8.36628570312537381937516420214\) \(0.3040279548268241\)
\(5\) \(8.06778188413350275809690744609\) \(0.0055241358349531\)
\(6\) \(8.06225963952944565783908293551\) \(1.89123089600 \cdot 10^{-6}\)
\(7\) \(8.06225774829877147319984891234\) \(2.2182083323568204 \cdot 10^{-13}\)
\(8\) \(8.06225774829854965236661323336\) \(3.06 \cdot 10^{-27}\)
This case is faster than linear: each step seems to double the number of digits of accuracy, which is a sign of quadratic convergence. The next table shows that this indeed appears to be so.
Table 2.3.17. Table that shows Newton’s method converges quadratically
\(n\) \(\ln(e_{n+2}/e_{n+1})/\ln(e_{n+1}/e_n)\)
0 1.178900923244053
1 1.3473842470980333
2 1.6184379407788283
3 1.8884480463362425
4 1.990935253721586
5 1.9997878352631038
From this analysis, it appears that this method converges quadratically. The asymptotic error constant is:
\begin{equation*} \lambda = \frac{e_{n+1}}{e_n^2} = \frac{3.06 \cdot 10^{-27}}{(2.2182083323568204 \cdot 10^{-13})^2} \approx 0.06208 \end{equation*}
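As a check, the following Python sketch re-runs the iteration behind the table, \(x_{n+1} = (x_{n} + 65/x_{n})/2\) starting from \(x_{0}=65\text{,}\) in high precision so the tiny late-stage errors survive, and computes the ratios \(e_{n+1}/e_{n}^{2}\text{:}\)

```python
from decimal import Decimal, getcontext

# Newton's method for sqrt(65): x_{n+1} = (x_n + 65/x_n)/2, x_0 = 65.
# Work in 50-digit precision so errors down to ~1e-27 are meaningful.
getcontext().prec = 50
root = Decimal(65).sqrt()
x = Decimal(65)
errs = []
for _ in range(9):
    errs.append(abs(root - x))
    x = (x + Decimal(65) / x) / 2

# e_{n+1}/e_n^2 should settle near the asymptotic constant 1/(2*sqrt(65))
for n in range(4, 8):
    print(n, errs[n + 1] / errs[n] ** 2)
# the ratios approach roughly 0.062
```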

Subsection 2.3.6 Determining Convergence Rates Theoretically

We saw in Example 2.3.15 that Newton’s method for the square root appears to converge quadratically. In this subsection we will show that this is true.

Example 2.3.18.

Show that the fixed point iteration
\begin{equation*} x_{n+1} = \frac{x_{n}}{2}+ \frac{a}{2x_{n}}= \frac{x_{n}^{2} +a}{2x_{n}} \end{equation*}
converges quadratically. Recall that this method was presented earlier for finding \(\sqrt{a}\text{.}\)
Solution.
The errors \(e_{n+1}\) and \(e_{n}\) are given by:
\begin{equation*} \begin{aligned}e_{n+1} \amp = x_{n+1}- \sqrt{a} \amp e_{n} \amp = x_{n} - \sqrt{a}\\ \amp = \frac{x_{n}^{2} +a}{2x_{n}}- \sqrt{a}\end{aligned} \end{equation*}
And if we find the ratio \(e_{n+1}/e_{n}^{2}\text{,}\) we arrive at:
\begin{equation*} \frac{e_{n+1}}{e_{n}^{2}} = \frac{\frac{x_{n}^{2} +a}{2x_{n}} - \sqrt{a} }{(x_{n} - \sqrt{a})^{2}} \end{equation*}
rearranging and simplifying,
\begin{equation*} = \frac{x_{n}^{2} +a -2x_{n}\sqrt{a}}{2x_{n}(x_{n}^{2} -2x_{n} \sqrt{a}+ a)}= \frac{1}{2x_{n}}. \end{equation*}
Taking the limit
\begin{equation*} \begin{aligned}\lambda = \lim_{n \rightarrow \infty}\frac{e_{n+1}}{e_{n}^{2}}= \lim_{n \rightarrow \infty}\frac{1}{2x_{n}}= \frac{1}{2\sqrt{a}}\end{aligned} \end{equation*}
which is approximately \(0.062\) when \(a=65\text{.}\) This shows that the numerical method of finding rates and asymptotic error constants in Example 2.3.15 is indeed correct.
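The derived constant \(\lambda = 1/(2\sqrt{a})\) can also be checked numerically for another value, say \(a=2\text{;}\) a short Python sketch:

```python
# Check lambda = 1/(2*sqrt(a)) for the iteration x_{n+1} = (x_n + a/x_n)/2
# with a = 2, starting from the (arbitrary) guess x_0 = 2.
a = 2.0
x = 2.0
root = a ** 0.5
for _ in range(4):
    e = x - root
    x = (x + a / x) / 2
    e_next = x - root
    print(e_next / e**2)     # from the derivation, this equals 1/(2*x_old)
print(1 / (2 * root))        # limiting value, about 0.3536
```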

Example 2.3.19.

Show that the YASRA analyzed in Example 2.2.10 and Example 2.3.12 is linearly convergent and find \(\lambda\text{.}\)
Solution.
Since we used Taylor’s Series to find this algorithm, we will use the Taylor Remainder Theorem (Theorem 1.6.4) to find an error bound. Let \([a,b]=[64,65]\) and let \(e_{n}=|f(65)-P_{n}(65)|\) be the error of the approximation at \(x=65\text{.}\)
\begin{equation*} \begin{aligned}e_{n} \amp = |f(65)-P_{n}(65)| \leq \max_{x \in [a,b]}|f^{(n+1)}(x)| \frac{|b-a|^{n+1}}{(n+1)!}\\ \amp = \max_{x \in [64,65]}\left|(-1)^{n+2}\frac{(2n-1)!!}{2^{n+1}}x^{-(2n+1)/2}\right| \frac{1}{(n+1)!}\end{aligned} \end{equation*}
The function \(x^{-(2n+1)/2}\) is decreasing, so the maximum occurs at the left endpoint, \(x=64\text{.}\) For convenience we will call this value \(c\text{.}\)
\begin{equation*} e_{n} \leq \frac{(2n-1)!!\, 64^{-(2n+1)/2}}{(n+1)!\, 2^{n+1}}= \frac{(2n-1)!!\, c^{-(2n+1)/2}}{(n+1)!\, 2^{n+1}} \end{equation*}
and now taking the limit,
\begin{equation*} \begin{aligned}\lim_{n \rightarrow \infty}\frac{e_{n+1}}{e_{n}} \amp = \lim_{n \rightarrow \infty}\frac{\frac{(2n+1)!! c^{-(2n+3)/2}}{2^{n+2}(n+2)!} }{\frac{(2n-1)!! c^{-(2n+1)/2}}{2^{n+1}(n+1)!} }\\ \amp = \lim_{n \rightarrow \infty}\frac{(2n+1)c^{-1}}{2(n+2)}= \frac{1}{c}\end{aligned} \end{equation*}
So this gives \(\lambda=1/c\text{,}\) and with \(c=64\) we obtain \(\lambda = 1/64 = 0.015625\text{,}\) close to the value found in Example 2.3.12.
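The limiting value of the ratio computed above can be checked numerically as well; the simplified ratio \((2n+1)c^{-1}/(2(n+2))\) with \(c=64\) should approach \(1/64\text{:}\)

```python
# Check the limit above: (2n+1) / (2*(n+2)*c) with c = 64
# should approach 1/64 = 0.015625 as n grows.
c = 64
for n in [10, 100, 1000]:
    print(n, (2 * n + 1) / (2 * (n + 2) * c))
# the values increase toward 0.015625
```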