

Section 5.4 Singular Value Decomposition

The singular value decomposition, also known as the SVD, is yet another way to factor a matrix. In short, a matrix \(A\) can be written \(A=U \Sigma V^{\intercal}\text{,}\) where \(U\) and \(V\) are orthogonal matrices and \(\Sigma\) is a diagonal matrix. These can be thought of as a rotation, followed by a scaling, followed by another rotation. It has numerous applications as we will see at the end of this section.
We will show that it is possible to factor any \(m\) by \(n\) matrix \(A\) in this form and also determine how to find the matrices \(U\text{,}\) \(\Sigma\) and \(V\text{.}\) Consider a real \(m\) by \(n\) matrix \(A\text{.}\) Then \(U\) is an \(m\) by \(m\) matrix, \(\Sigma\) is an \(m\) by \(n\) matrix and \(V\) is an \(n\) by \(n\) matrix.
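Before deriving how to compute these factors by hand, it can help to see the shapes concretely. The following is a minimal sketch using NumPy (an assumed tool, not part of the text) on an arbitrary 3 by 5 matrix; the library routine returns the singular values as a vector, so \(\Sigma\) must be rebuilt as an \(m\) by \(n\) matrix.

import numpy as np

# An arbitrary m by n matrix with m = 3 and n = 5.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))

# full_matrices=True returns the square factors: U is m by m, Vt is V transpose, n by n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# s only holds the min(m, n) singular values; rebuild Sigma as an m by n matrix.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)   # (3, 3) (3, 5) (5, 5)
print(np.allclose(U @ Sigma @ Vt, A))   # True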
The general proof is not presented here. Instead, assume that the matrix can be factored as in the Singular Value Decomposition, then
\begin{align} AA^{\intercal} \amp = U \Sigma V^{\intercal} (U \Sigma V^{\intercal})^{\intercal}\notag\\ \amp = U \Sigma V^{\intercal} V \Sigma^{\intercal} U^{\intercal}\notag\\ \text{and since $V^{\intercal}V=I$, then }\notag\\ AA^{\intercal}\amp = U \Sigma \Sigma^{\intercal} U^{\intercal}.\tag{5.4.2} \end{align}
The astute reader will notice that (5.4.2) shows that the matrices \(AA^{\intercal}\) and \(\Sigma \Sigma^{\intercal}\) are similar and thus have the same eigenvalues. Recall also that the columns of \(U\) are the eigenvectors of \(AA^{\intercal}\text{.}\) These properties are used to actually find \(U\) and \(\Sigma\text{.}\) Also, if you can find either \(U\) or \(V\text{,}\) then finding the other is easier by solving (5.4.1), the factorization \(A=U\Sigma V^{\intercal}\text{.}\) If we know \(U\text{,}\) then
\begin{align*} U \Sigma V^{\intercal} \amp = A\\ \Sigma V^{\intercal} \amp = U^{-1} A \end{align*}
and recall that since \(U\) is an orthogonal matrix, \(U^{-1}=U^{\intercal}\text{.}\) This is now left-multiplied by \(\Sigma^{\intercal}\) because if the original matrix is not square, then \(\Sigma\) is not square and its inverse is not defined.
\begin{align} \Sigma^{\intercal} \Sigma V^{\intercal} \amp = \Sigma^{\intercal} U^{\intercal} A\notag\\ V^{\intercal} \amp = (\Sigma^{\intercal} \Sigma)^{-1} \Sigma^{\intercal} U^{\intercal} A \notag\\ V \amp = ((\Sigma^{\intercal} \Sigma)^{-1} \Sigma^{\intercal} U^{\intercal} A)^{\intercal}\notag\\ V \amp = A^{\intercal} U \Sigma ((\Sigma^{\intercal} \Sigma)^{-1})^{\intercal}\tag{5.4.3} \end{align}
where the third property of Theorem 5.1.2 was used. Similarly, if one solves for \(U\text{:}\)
\begin{align} U \Sigma V^{\intercal} \amp = A \notag\\ U \Sigma V^{\intercal}V \amp = AV \notag\\ U \Sigma \amp = A V \notag\\ U \Sigma \Sigma^{\intercal} \amp = A V \Sigma^{\intercal} \notag\\ U \amp = A V \Sigma^{\intercal} (\Sigma \Sigma^{\intercal})^{-1} \tag{5.4.4} \end{align}
where \(V^{\intercal}V=I\) is used in the third step since \(V\) is an orthogonal matrix.
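The procedure described above can be carried out directly in software. Below is a rough sketch in NumPy (assumed here; any linear-algebra package would do) that builds \(U\) from the eigenvectors of \(AA^{\intercal}\text{,}\) \(\Sigma\) from the square roots of the eigenvalues, and then recovers \(V\) from (5.4.3). The sketch assumes a square matrix with nonzero singular values so that \(\Sigma^{\intercal}\Sigma\) is invertible.

import numpy as np

def svd_via_eigen(A):
    # Sketch of the procedure above for a square matrix A with
    # nonzero singular values (so Sigma^T Sigma is invertible).
    m, n = A.shape
    # Eigen-decomposition of the symmetric matrix A A^T; the columns of U
    # are the eigenvectors (eigh returns eigenvalues in increasing order).
    evals, U = np.linalg.eigh(A @ A.T)
    # The singular values are the square roots of the eigenvalues of A A^T.
    Sigma = np.zeros((m, n))
    Sigma[:m, :m] = np.diag(np.sqrt(np.clip(evals, 0.0, None)))
    # Equation (5.4.3): V = A^T U Sigma ((Sigma^T Sigma)^{-1})^T.
    StS_inv = np.linalg.inv(Sigma.T @ Sigma)
    V = A.T @ U @ Sigma @ StS_inv.T
    return U, Sigma, V

# A quick check on a random 3 by 3 matrix.
A = np.random.default_rng(2).normal(size=(3, 3))
U, Sigma, V = svd_via_eigen(A)
print(np.allclose(U @ Sigma @ V.T, A))   # True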
Two examples of finding the SVD are presented below: the first for a square matrix and the second for a non-square matrix. We begin with the SVD of a 2 by 2 matrix in the following example.

Example 5.4.2.

Find \(U, \Sigma,\) and \(V\) of the SVD for the matrix
\begin{equation*} A = \begin{bmatrix} -1 \amp 2 \\ 1 \amp 2 \end{bmatrix} \end{equation*}
Solution.
As explained above, the columns of \(U\) are the eigenvectors of \(AA^{\intercal}\) and
\begin{equation*} A A^{\intercal} = \begin{bmatrix} 5 \amp 3 \\ 3 \amp 5 \end{bmatrix} \end{equation*}
The eigenvalues of \(AA^{\intercal}\) are found by
\begin{equation*} \det(AA^{\intercal}-\lambda I) = \begin{vmatrix} 5 - \lambda \amp 3 \\ 3 \amp 5- \lambda \end{vmatrix} = (5-\lambda)^2 - 9 = 0 \end{equation*}
with solutions \(\lambda = 2, 8\text{.}\) If \(\lambda = 2\text{,}\) then the solution to
\begin{equation*} (AA^{\intercal}-2I)\vec{v}_1 = \begin{bmatrix} 3 \amp 3 \\ 3 \amp 3 \end{bmatrix}\vec{v}_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation*}
is \(\vec{v}_1 = [1,-1]^{\intercal}\text{.}\) And if \(\lambda = 8\text{,}\) then the solution to
\begin{equation*} (AA^{\intercal}-8I) \vec{v}_2 = \begin{bmatrix} -3 \amp 3 \\ 3 \amp -3 \end{bmatrix} \vec{v}_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation*}
is \(\vec{v}_2 = [1,1]^{\intercal}\text{.}\) Thus the columns of \(U\) are the two eigenvectors scaled to be unit vectors,
\begin{equation*} U = \begin{bmatrix} 1/\sqrt{2} \amp 1/\sqrt{2} \\ -1/\sqrt{2} \amp 1/\sqrt{2} \end{bmatrix} \end{equation*}
The singular values in the matrix \(\Sigma\) are the square roots of the eigenvalues of \(AA^{\intercal}\) and the matrix is
\begin{equation*} \Sigma = \begin{bmatrix} \sqrt{2} \amp 0 \\ 0 \amp \sqrt{8} \end{bmatrix} \end{equation*}
To find the matrix \(V\text{,}\) we will use (5.4.3). Since
\begin{equation*} \Sigma^{\intercal} \Sigma = \begin{bmatrix} 2 \amp 0 \\ 0 \amp 8 \end{bmatrix} \end{equation*}
\begin{equation*} (\Sigma^{\intercal} \Sigma)^{-1} = \begin{bmatrix} 1/2 \amp 0 \\ 0 \amp 1/8 \end{bmatrix}, \end{equation*}
then using (5.4.3),
\begin{align*} V \amp = A^{\intercal} U \Sigma ((\Sigma^{\intercal} \Sigma)^{-1})^{\intercal} \\ \amp = \begin{bmatrix} -1 \amp 1 \\ 2 \amp 2 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} \amp 1/\sqrt{2} \\ -1/\sqrt{2} \amp 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} \sqrt{2} \amp 0 \\ 0 \amp \sqrt{8} \end{bmatrix} \begin{bmatrix} 1/2 \amp 0 \\ 0 \amp 1/8 \end{bmatrix} \\ \amp = \begin{bmatrix} -1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \end{align*}
It can be checked that indeed \(A=U\Sigma V^{\intercal}\text{.}\)
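This check is easy to carry out with software; a short NumPy sketch (assumed, not part of the text) multiplying the three hand-computed factors back together might look like the following.

import numpy as np

U = np.array([[ 1, 1],
              [-1, 1]]) / np.sqrt(2)
Sigma = np.diag([np.sqrt(2), np.sqrt(8)])
V = np.array([[-1, 0],
              [ 0, 1]])

A = np.array([[-1, 2],
              [ 1, 2]])
# Reassembling the factors reproduces the original matrix.
print(np.allclose(U @ Sigma @ V.T, A))   # True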
The next example finds the SVD of a 2 by 3 matrix, showing how to handle a non-square matrix.

Example 5.4.3.

Find the SVD of the matrix
\begin{equation*} A = \begin{bmatrix} 2 \amp -3 \amp -3 \\ 2 \amp 3 \amp 3 \end{bmatrix} \end{equation*}
Solution.
In this example, we will first find the columns of \(V\text{,}\) which are the eigenvectors of \(A^{\intercal}A\text{.}\) Generally, if the original matrix is not square, it is convenient to work with the larger of \(AA^{\intercal}\) and \(A^{\intercal}A\text{,}\) since the other factor can then be recovered from (5.4.3) or (5.4.4).
\begin{equation*} A^{\intercal}A = \begin{bmatrix} 8 \amp 0 \amp 0 \\ 0 \amp 18 \amp 18 \\ 0 \amp 18 \amp 18 \end{bmatrix} \end{equation*}
and the eigenvalues of \(A^{\intercal}A\) are found by
\begin{align*} \det(A^{\intercal}A-\lambda I) \amp = \begin{vmatrix} 8-\lambda \amp 0 \amp 0 \\ 0 \amp 18-\lambda \amp 18 \\ 0 \amp 18 \amp 18-\lambda \end{vmatrix} = (8-\lambda)\bigl((18-\lambda)^2-18^2\bigr)\\ \amp = (8-\lambda)(\lambda^2-36\lambda) = 0 \end{align*}
which has the solutions \(\lambda=8,36,0\text{.}\) Thus the singular values are \(\sqrt{8}\) and \(6\text{,}\) and
\begin{equation*} \Sigma = \begin{bmatrix} \sqrt{8} \amp 0 \amp 0 \\ 0 \amp 6 \amp 0 \end{bmatrix} \end{equation*}
and recall that this matrix is always the same size as the original matrix \(A\text{.}\) Also, when the matrix is not square, the larger of \(AA^{\intercal}\) and \(A^{\intercal}A\) will always have zero eigenvalues. The corresponding eigenvectors are
\begin{align*} \vec{v}_1 \amp = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \amp \vec{v}_2 \amp = \begin{bmatrix} 0 \\ 1 \\1 \end{bmatrix}, \amp \vec{v}_3 \amp = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \end{align*}
And the matrix \(V\) with the columns above scaled to make them unit vectors is
\begin{equation*} V = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1/\sqrt{2} \amp -1/\sqrt{2} \\ 0 \amp 1/\sqrt{2} \amp 1/\sqrt{2} \end{bmatrix} \end{equation*}
To find \(U\text{,}\) we will use (5.4.4) and note that
\begin{equation*} (\Sigma \Sigma^{\intercal})^{-1} = \begin{bmatrix} 1/8 \amp 0 \\ 0 \amp 1/36 \end{bmatrix} \end{equation*}
and now using (5.4.4)
\begin{align*} U \amp = A V \Sigma^{\intercal} (\Sigma \Sigma^{\intercal})^{-1} \\ \amp = \begin{bmatrix} 2 \amp -3 \amp -3 \\ 2 \amp 3 \amp 3 \end{bmatrix} \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp 1/\sqrt{2} \amp -1/\sqrt{2} \\ 0 \amp 1/\sqrt{2} \amp 1/\sqrt{2} \end{bmatrix}\begin{bmatrix} \sqrt{8} \amp 0 \\ 0 \amp 6 \\ 0 \amp 0 \end{bmatrix}\begin{bmatrix} 1/8 \amp 0 \\ 0 \amp 1/36 \end{bmatrix} \\ \amp = \begin{bmatrix} \sqrt{2}/2 \amp -\sqrt{2}/2 \\ \sqrt{2}/2 \amp \sqrt{2}/2 \end{bmatrix} \end{align*}
and it can be checked that \(U\Sigma V^{\intercal} = A\text{.}\)
It is noted that the above two examples result in fairly nice matrices \(U\text{,}\) \(V\) and \(\Sigma\text{.}\) This very rarely happens in practice. In addition, as the size of the matrices grows, the eigenvalues generally cannot be found exactly. Therefore for matrices larger than 3 by 3, numerical techniques are used to find the factors.
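For example, the factorization of the 2 by 3 matrix from Example 5.4.3 can be obtained with a library routine; the sketch below assumes NumPy. Note that such routines typically order the singular values from largest to smallest and may flip the signs of corresponding columns of \(U\) and \(V\text{,}\) so the factors can differ from a hand computation even though the product still reproduces \(A\text{.}\)

import numpy as np

A = np.array([[2.0, -3.0, -3.0],
              [2.0,  3.0,  3.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(s)                                 # [6.  2.82842712], i.e. 6 and sqrt(8)

# Rebuild Sigma as a 2 by 3 matrix and check the factorization.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))    # True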

Subsection 5.4.1 The geometry of the SVD

This section will explain in geometric terms what the SVD generates for an example that can be plotted in 2D. First, let’s consider 100 points that are scattered around the origin and perhaps they look like the following:
Figure 5.4.4. Some Random Points
If we store these points as a 2 by 100 matrix, with each column containing the \(x\) and \(y\) values of one point, we can perform an SVD of this matrix. The results for \(U\) and the two diagonal elements of \(\Sigma\) (denoted \(\sigma_1\) and \(\sigma_2\)) are
\begin{align*} \sigma_1 \amp = 54.51177 \amp \sigma_2 \amp = 5.576430 \amp U \amp = \begin{bmatrix} -0.953478 \amp -0.30146\\ -0.30146 \amp 0.953478 \end{bmatrix} \end{align*}
where these have been computed using software (not by hand). If we plot the two vectors in the columns of \(U\) with relative sizes given by those found in \(\Sigma\text{,}\) you see in the plot above that the larger vector points in the direction of stretch of the points and the second vector is orthogonal to it. The relative lengths are the relative stretches of the points. The geometry of the columns of the \(V\) matrix is more difficult to understand in this situation because each of those vectors has length 100. If the original matrix had 3 rows, then each column would be a point in 3 dimensions. The 3 columns of \(U\) would again show the stretching of the points in 3 orthogonal directions, with the stretching factors in the \(\Sigma\) matrix.
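Returning to the 2D example, a sketch of this kind of computation (NumPy assumed; the direction, spread and noise level below are made up for illustration) might look like the following:

import numpy as np

rng = np.random.default_rng(0)

# 100 points stretched along one direction, stored as a 2 by 100 matrix
# with each column holding the x and y values of one point.
direction = np.array([np.cos(0.3), np.sin(0.3)])
t = rng.normal(scale=5.0, size=100)
noise = rng.normal(scale=0.5, size=(2, 100))
P = np.outer(direction, t) + noise

# full_matrices=False avoids building the 100 by 100 matrix V.
U, s, Vt = np.linalg.svd(P, full_matrices=False)
print(s)    # sigma_1 is much larger than sigma_2
print(U)    # the first column points (up to sign) along the direction of stretch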

Subsection 5.4.2 Finding a best-fit line and plane using SVD

You might notice from the above that we can use the SVD to fit a line to a set of points, and this is true. We will use a simpler example to show how this works and how it is similar to other fitting techniques. We can also use a similar technique to find the best-fit plane through points in 3 dimensions. We start with a simple example of points in the plane. Consider the points (1,2), (2,2), (3,5), (6,7). We wish to find the line that best fits these points.
Figure 5.4.5. Best fit points
First, we will find the center of these points by just finding the average of the \(x\) and \(y\) values.
\begin{align*} \overline{x} \amp = \frac{1+2+3+6}{4} = 3 \amp \overline{y} \amp = \frac{2+2+5+7}{4} = 4 \end{align*}
We now generate a new data set where each \(x\) and \(y\) value is adjusted by the means above. The new data set is \(\{(-2,-2),(-1,-2),(0,1),(3,3)\}\text{.}\) We then use these values to find
\begin{equation*} A = \begin{bmatrix} -2 \amp -1 \amp 0 \amp 3 \\ -2 \amp -2 \amp 1 \amp 3 \end{bmatrix} \end{equation*}
As above, the singular values in \(\Sigma\) and the matrix \(U\) will be most important: the columns of \(U\) are the eigenvectors of \(AA^{\intercal}\) and the diagonal elements of \(\Sigma\) are the square roots of its eigenvalues.
\begin{equation*} AA^{\intercal} = \begin{bmatrix} 14 \amp 15 \\ 15 \amp 18 \end{bmatrix} \end{equation*}
Again, using software, the diagonal elements of \(\Sigma\text{,}\) labelled \(\sigma_1\) and \(\sigma_2\text{,}\) and the matrix \(U\) are
\begin{align*} \sigma_1 \amp = 5.5796 \amp \sigma_2 \amp = 0.93126 \amp U \amp = \begin{bmatrix} -0.658724 \amp -0.752384 \\ -0.752384 \amp 0.658725 \end{bmatrix} \end{align*}
The dominant direction corresponds to the larger singular value (the first one), so the eigenvector to use is the first column of \(U\text{.}\) The slope of the line will be the \(y\)-component over the \(x\)-component (rise over run) or
\begin{equation*} m = \frac{-0.752384}{-0.658724} = 1.1421830 \end{equation*}
Lastly, to find the line, we will use the point-slope form with the average values (or center point) of \((3,4)\text{:}\)
\begin{align*} y-4 \amp = 1.1421830 (x-3)\\ y \amp = 1.1421830 x + 0.573451 \end{align*}
and the following is a plot of the data and the line:
Figure 5.4.6. Best fit line using SVD
And as a comparison, if we use least-squares to find the line, the result is
\begin{equation*} y = 1.0714x + 0.785714 \end{equation*}
The reason for the difference is that each method minimizes something different. The least-squares line minimizes the sum of the squares of the vertical distances between the data and the line. The SVD best-fit line minimizes the sum of the squares of the orthogonal distances from the points to the line.
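The whole fit can be reproduced in a few lines. The sketch below (NumPy assumed, with numpy.polyfit used for the ordinary least-squares comparison) centers the points, takes the dominant left singular vector as the direction of the line, and reads off the slope as rise over run.

import numpy as np

pts = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 5.0], [6.0, 7.0]])
center = pts.mean(axis=0)          # (3, 4)
A = (pts - center).T               # the 2 by 4 centered matrix from above

# SVD fit: the slope comes from the dominant column of U (largest singular value first).
U, s, Vt = np.linalg.svd(A)
m_svd = U[1, 0] / U[0, 0]          # rise over run
b_svd = center[1] - m_svd * center[0]

# Ordinary least-squares fit for comparison.
m_ls, b_ls = np.polyfit(pts[:, 0], pts[:, 1], 1)

print(m_svd, b_svd)                # about 1.1422 and 0.5735
print(m_ls, b_ls)                  # about 1.0714 and 0.7857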