6.5: The Method of Least Squares


    Learning Objectives
    1. Learn examples of best-fit problems.
    2. Learn to turn a best-fit problem into a least-squares problem.
    3. Recipe: find a least-squares solution (two ways).
    4. Picture: geometry of a least-squares solution.
    5. Vocabulary words: least-squares solution.

    In this section, we answer the following important question:

    Suppose that \(Ax=b\) does not have a solution. What is the best approximate solution?

    For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

    Least-Squares Solutions

    We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation \(Ax=b\).

    Definition \(\PageIndex{1}\): Least-Squares Solution

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). A least-squares solution of the matrix equation \(Ax=b\) is a vector \(\hat x\) in \(\mathbb{R}^n \) such that

    \[ \text{dist}(b,\,A\hat x) \leq \text{dist}(b,\,Ax) \nonumber \]

    for all other vectors \(x\) in \(\mathbb{R}^n \).

    Recall that \(\text{dist}(v,w) = \|v-w\|\) is the distance, Definition 6.1.2 in Section 6.1, between the vectors \(v\) and \(w\). The term “least squares” comes from the fact that \(\text{dist}(b,A\hat x) = \|b-A\hat x\|\) is the square root of the sum of the squares of the entries of the vector \(b-A\hat x\). So a least-squares solution minimizes the sum of the squares of the differences between the entries of \(A\hat x\) and \(b\). In other words, a least-squares solution solves the equation \(Ax=b\) as closely as possible, in the sense that the sum of the squares of the entries of \(b-Ax\) is minimized.

    Least Squares: Picture

    Suppose that the equation \(Ax=b\) is inconsistent. Recall from Note 2.3.6 in Section 2.3 that the column space of \(A\) is the set of all vectors \(c\) such that \(Ax=c\) is consistent. In other words, \(\text{Col}(A)\) is the set of all vectors of the form \(Ax.\) Hence, the closest vector, Note 6.3.1 in Section 6.3, of the form \(Ax\) to \(b\) is the orthogonal projection of \(b\) onto \(\text{Col}(A)\). This is denoted \(b_{\text{Col}(A)}\text{,}\) following Definition 6.3.1 in Section 6.3.


    Figure \(\PageIndex{1}\)

    Note \(\PageIndex{1}\)

    A least-squares solution of \(Ax=b\) is a solution \(\hat x\) of the consistent equation \(Ax=b_{\text{Col}(A)}\).

    Note \(\PageIndex{2}\)

    If \(Ax=b\) is consistent, then \(b_{\text{Col}(A)} = b\text{,}\) so that a least-squares solution is the same as a usual solution.

    Where is \(\hat x\) in this picture? If \(v_1,v_2,\ldots,v_n\) are the columns of \(A\text{,}\) then

    \[ A\hat x = A\left(\begin{array}{c}\hat{x}_1 \\ \hat{x}_2 \\ \vdots \\ \hat{x}_{n}\end{array}\right)= \hat x_1v_1 + \hat x_2v_2 + \cdots + \hat x_nv_n. \nonumber \]

    Hence the entries of \(\hat x\) are the “coordinates” of \(b_{\text{Col}(A)}\) with respect to the spanning set \(\{v_1,v_2,\ldots,v_n\}\) of \(\text{Col}(A)\). (They are honest \(\mathcal{B}\)-coordinates if the columns of \(A\) are linearly independent.)


    Figure \(\PageIndex{2}\)


    We learned to solve this kind of orthogonal projection problem in Section 6.3.

    Theorem \(\PageIndex{1}\)

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). The least-squares solutions of \(Ax=b\) are the solutions of the matrix equation

    \[ A^TAx = A^Tb \nonumber \]

    Proof

    By Theorem 6.3.2 in Section 6.3, if \(\hat x\) is a solution of the matrix equation \(A^TAx = A^Tb\text{,}\) then \(A\hat x\) is equal to \(b_{\text{Col}(A)}\). We argued above that a least-squares solution of \(Ax=b\) is a solution of \(Ax = b_{\text{Col}(A)}.\)

    In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

    Recipe 1: Compute a Least-Squares Solution

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). Here is a method for computing a least-squares solution of \(Ax=b\text{:}\)

    1. Compute the matrix \(A^TA\) and the vector \(A^Tb\).
    2. Form the augmented matrix for the matrix equation \(A^TAx = A^Tb\text{,}\) and row reduce.
    3. This equation is always consistent, and any solution \(\hat x\) is a least-squares solution.

    To reiterate: once you have found a least-squares solution \(\hat x\) of \(Ax=b\text{,}\) then \(b_{\text{Col}(A)}\) is equal to \(A\hat x\).
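
    In code, this recipe amounts to forming the normal equations and solving them. Here is a minimal sketch, assuming NumPy is available; the matrix and vector below are placeholder data, not taken from any particular example in this section:

```python
import numpy as np

# Placeholder data: any m x n matrix A and vector b in R^m will do.
A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
b = np.array([7., 8., 9.])

# Step 1: compute A^T A and A^T b.
AtA = A.T @ A
Atb = A.T @ b

# Steps 2-3: solve the normal equations A^T A x = A^T b.  This system is
# always consistent; here the columns of A are linearly independent, so
# A^T A is invertible and the least-squares solution is unique.
x_hat = np.linalg.solve(AtA, Atb)

# Once x_hat is known, the projection b_Col(A) is just A times x_hat.
b_proj = A @ x_hat
print("least-squares solution:", x_hat)
print("projection of b onto Col(A):", b_proj)
```

    If the columns of \(A\) may be linearly dependent, a routine such as `numpy.linalg.lstsq`, which handles the rank-deficient case, is the safer choice.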

    Example \(\PageIndex{1}\)

    Find the least-squares solutions of \(Ax=b\) where:

    \[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

    What quantity is being minimized?

    Solution

    We have

    \[ A^TA = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) = \left(\begin{array}{cc}5&3\\3&3\end{array}\right) \nonumber \]

    and

    \[ A^T b = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{c}6\\0\\0\end{array}\right)= \left(\begin{array}{c}0\\6\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[ \left(\begin{array}{cc|c}5&3&0\\3&3&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&-3\\0&1&5\end{array}\right). \nonumber \]

    Therefore, the only least-squares solution is \(\hat x = {-3\choose 5}.\)

    This solution minimizes the distance from \(A\hat x\) to \(b\text{,}\) i.e., the sum of the squares of the entries of \(b-A\hat x = b-b_{\text{Col}(A)} = b_{\text{Col}(A)^\perp}\). In this case, we have

    \[||b-A\hat{x}||=\left|\left|\left(\begin{array}{c}6\\0\\0\end{array}\right)-\left(\begin{array}{c}5\\2\\-1\end{array}\right)\right|\right|=\left|\left|\left(\begin{array}{c}1\\-2\\1\end{array}\right)\right|\right|=\sqrt{1^2+(-2)^2+1^2}=\sqrt{6}.\nonumber\]

    Therefore, \(b_{\text{Col}(A)} = A\hat x\) is \(\sqrt 6\) units from \(b.\)
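
    As a quick numerical check of this example (a sketch assuming NumPy is available):

```python
import numpy as np

A = np.array([[0., 1.],
              [1., 1.],
              [2., 1.]])
b = np.array([6., 0., 0.])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # solve the normal equations
residual = b - A @ x_hat

print(x_hat)                     # [-3.  5.]
print(residual)                  # [ 1. -2.  1.]
print(np.linalg.norm(residual))  # sqrt(6) = 2.449...
```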

    In the following picture, \(v_1,v_2\) are the columns of \(A\text{:}\)


    Figure \(\PageIndex{4}\)


    Example \(\PageIndex{2}\)

    Find the least-squares solutions of \(Ax=b\) where:

    \[ A = \left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right)\qquad b = \left(\begin{array}{c}1\\0\\-1\end{array}\right). \nonumber \]

    Solution

    We have

    \[ A^T A = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right)= \left(\begin{array}{cc}5&-1\\-1&5\end{array}\right)\nonumber \]

    and

    \[ A^T b = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{c}1\\0\\-1\end{array}\right)= \left(\begin{array}{c}2\\-2\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[ \left(\begin{array}{cc|c}5&-1&2\\-1&5&-2\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&1/3\\0&1&-1/3\end{array}\right). \nonumber \]

    Therefore, the only least-squares solution is \(\hat x = \frac 13{1\choose -1}.\)


    The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of \(A\) are linearly dependent, then \(Ax=b_{\text{Col}(A)}\) has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of Corollary 6.3.1 in Section 6.3.

    Theorem \(\PageIndex{2}\)

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). The following are equivalent:

    1. \(Ax=b\) has a unique least-squares solution.
    2. The columns of \(A\) are linearly independent.
    3. \(A^TA\) is invertible.

    In this case, the least-squares solution is

    \[ \hat x = (A^TA)^{-1} A^Tb. \nonumber \]

    Proof

    The set of least-squares solutions of \(Ax=b\) is the solution set of the consistent equation \(A^TAx=A^Tb\text{,}\) which is a translate of the solution set of the homogeneous equation \(A^TAx=0\). Since \(A^TA\) is a square matrix, the equivalence of 1 and 3 follows from Theorem 5.1.1 in Section 5.1. The set of least-squares solutions is also the solution set of the consistent equation \(Ax = b_{\text{Col}(A)}\text{,}\) which has a unique solution if and only if the columns of \(A\) are linearly independent by Recipe: Checking Linear Independence in Section 2.5.
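
    When the columns of \(A\) are linearly independent, the closed-form expression \(\hat x = (A^TA)^{-1}A^Tb\) can be evaluated directly. Below is a brief sketch, assuming NumPy, using the data of Example \(\PageIndex{2}\) above; in practice, solving the normal equations is preferable to forming the inverse explicitly:

```python
import numpy as np

A = np.array([[2., 0.],
              [-1., 1.],
              [0., 2.]])
b = np.array([1., 0., -1.])

# x_hat = (A^T A)^{-1} A^T b, valid because the columns of A are
# linearly independent (equivalently, A^T A is invertible).
x_hat = np.linalg.inv(A.T @ A) @ (A.T @ b)

# Equivalent, but numerically better behaved:
x_hat_solve = np.linalg.solve(A.T @ A, A.T @ b)

print(x_hat)         # [ 0.333... -0.333...], i.e. (1/3, -1/3)
print(x_hat_solve)
```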

    Example \(\PageIndex{3}\): Infinitely many least-squares solutions

    Find the least-squares solutions of \(Ax=b\) where:

    \[ A = \left(\begin{array}{ccc}1&0&1\\1&1&-1\\1&2&-3\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

    Solution

    We have

    \[ A^TA = \left(\begin{array}{ccc}3&3&-3\\3&5&-7\\-3&-7&11\end{array}\right)\qquad A^Tb = \left(\begin{array}{c}6\\0\\6\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[ \left(\begin{array}{ccc|c}3&3&-3&6\\3&5&-7&0\\-3&-7&11&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&1&5\\0&1&-2&-3\\0&0&0&0\end{array}\right). \nonumber \]

    The free variable is \(x_3\text{,}\) so the solution set is

    \[\left\{\begin{array}{rrrrr}x_1 &=& -x_3 &+& 5\\x_2 &=& 2x_3 &-& 3\\x_3 &=& x_3&{}&{}\end{array}\right. \quad\xrightarrow[\text{vector form}]{\text{parametric}}\quad \hat{x}=\left(\begin{array}{c}x_1\\x_2\\x_3\end{array}\right)=x_3\left(\begin{array}{c}-1\\2\\1\end{array}\right)+\left(\begin{array}{c}5\\-3\\0\end{array}\right).\nonumber\]

    For example, taking \(x_3 = 0\) and \(x_3=1\) gives the least-squares solutions

    \[ \hat x = \left(\begin{array}{c}5\\-3\\0\end{array}\right)\quad\text{and}\quad\hat x =\left(\begin{array}{c}4\\-1\\1\end{array}\right). \nonumber \]

    Geometrically, we see that the columns \(v_1,v_2,v_3\) of \(A\) are coplanar:


    Figure \(\PageIndex{7}\)

    Therefore, there are many ways of writing \(b_{\text{Col}(A)}\) as a linear combination of \(v_1,v_2,v_3\).
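
    Numerically, a rank-deficient problem like this one is usually handed to a routine such as `numpy.linalg.lstsq`, which returns one particular least-squares solution (the one of smallest norm) rather than the whole solution set. A sketch using the data of this example:

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [1., 1., -1.],
              [1., 2., -3.]])
b = np.array([6., 0., 0.])

# A^T A is singular here (the columns of A are linearly dependent), so
# np.linalg.solve(A.T @ A, A.T @ b) would fail.  lstsq still succeeds:
x_hat, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print(rank)   # 2: the columns of A only span a plane
print(x_hat)  # one least-squares solution (the minimum-norm one)

# Every least-squares solution has the form x_hat + t*(-1, 2, 1), since
# (-1, 2, 1) spans the null space of A.
```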


    As usual, calculations involving projections become easier in the presence of an orthogonal set. Indeed, if \(A\) is an \(m\times n\) matrix with orthogonal columns \(u_1,u_2,\ldots,u_n\text{,}\) then we can use Theorem 6.4.1 in Section 6.4 to write

    \[ b_{\text{Col}(A)} = \frac{b\cdot u_1}{u_1\cdot u_1}\,u_1 + \frac{b\cdot u_2}{u_2\cdot u_2}\,u_2 + \cdots + \frac{b\cdot u_n}{u_n\cdot u_n}\,u_n = A\left(\begin{array}{c}(b\cdot u_1)/(u_1\cdot u_1) \\ (b\cdot u_2)/(u_2\cdot u_2) \\ \vdots \\ (b\cdot u_n)/(u_n\cdot u_n)\end{array}\right). \nonumber \]

    Note that the least-squares solution is unique in this case, since an orthogonal set is linearly independent, Fact 6.4.1 in Section 6.4.

    Recipe 2: Compute a Least-Squares Solution

    Let \(A\) be an \(m\times n\) matrix with orthogonal columns \(u_1,u_2,\ldots,u_n\text{,}\) and let \(b\) be a vector in \(\mathbb{R}^m \). Then the least-squares solution of \(Ax=b\) is the vector

    \[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \ldots,\; \frac{b\cdot u_n}{u_n\cdot u_n} \right). \nonumber \]

    This formula is particularly useful in the sciences, as matrices with orthogonal columns often arise in nature.
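
    With orthogonal columns, no linear system needs to be solved at all: each entry of \(\hat x\) is a single projection coefficient. A minimal sketch, assuming NumPy; the matrix and vector below are placeholders chosen so that the columns are orthogonal:

```python
import numpy as np

# Placeholder data: the columns of A must be nonzero and orthogonal.
A = np.array([[1., 1., 0.],
              [1., -1., 0.],
              [0., 0., 1.]])
b = np.array([2., 3., 4.])

# Entry i of x_hat is (b . u_i) / (u_i . u_i), where u_i is column i of A.
x_hat = np.array([(b @ u) / (u @ u) for u in A.T])

print(x_hat)  # [ 2.5 -0.5  4. ]
```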

    Example \(\PageIndex{4}\)

    Find the least-squares solution of \(Ax=b\) where:

    \[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

    Solution

    Let \(u_1,u_2,u_3\) be the columns of \(A\). These form an orthogonal set, so

    \[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber \]

    Compare Example 6.4.8 in Section 6.4.

    Best-Fit Problems

    In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

    Example \(\PageIndex{5}\): Best-Fit Line

    Suppose that we have measured three data points

    \[ (0,6),\quad (1,0),\quad (2,0), \nonumber \]

    and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?


    Figure \(\PageIndex{9}\)

    The general equation for a (non-vertical) line is

    \[ y = Mx + B. \nonumber \]

    If our three data points were to lie on this line, then the following equations would be satisfied:

    \[\begin{align}6&=M\cdot 0+B\nonumber \\ 0&=M\cdot 1+B\label{eq:1} \\ 0&=M\cdot 2+B.\nonumber\end{align}\]

    In order to find the best-fit line, we try to solve the above equations in the unknowns \(M\) and \(B\). As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

    Putting our linear equations into matrix form, we are trying to solve \(Ax=b\) for

    \[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right)\qquad x = \left(\begin{array}{c}M\\B\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

    We solved this least-squares problem in Example \(\PageIndex{1}\): the only least-squares solution to \(Ax=b\) is \(\hat x = {M\choose B} = {-3\choose 5}\text{,}\) so the best-fit line is

    \[ y = -3x + 5. \nonumber \]


    Figure \(\PageIndex{10}\)

    What exactly is the line \(y= f(x) = -3x+5\) minimizing? The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the left-hand side of \(\eqref{eq:1}\), and

    \[ A\left(\begin{array}{c}-3\\5\end{array}\right)= \left(\begin{array}{c}-3(0)+5\\-3(1)+5\\-3(2)+5\end{array}\right)= \left(\begin{array}{c}f(0)\\f(1)\\f(2)\end{array}\right). \nonumber \]

    In other words, \(A\hat x\) is the vector whose entries are the \(y\)-coordinates of the graph of the line at the values of \(x\) we specified in our data points, and \(b\) is the vector whose entries are the \(y\)-coordinates of those data points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points:


    Figure \(\PageIndex{11}\)

    \[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}6\\0\\0\end{array}\right)-A\left(\begin{array}{c}-3\\5\end{array}\right)=\left(\begin{array}{c}1\\-2\\1\end{array}\right)}\nonumber\]

    The best-fit line minimizes the sum of the squares of these vertical distances.
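
    In practice, the matrix \(A\) is built directly from the \(x\)-coordinates of the data points: one column of \(x\)-values and one column of ones. Here is a sketch for this example, assuming NumPy; the helper name `best_fit_line` is just for illustration:

```python
import numpy as np

def best_fit_line(xs, ys):
    """Return (M, B) for the least-squares line y = M*x + B."""
    A = np.column_stack([xs, np.ones_like(xs)])  # columns: x-values, 1's
    M, B = np.linalg.lstsq(A, ys, rcond=None)[0]
    return M, B

xs = np.array([0., 1., 2.])
ys = np.array([6., 0., 0.])

M, B = best_fit_line(xs, ys)
print(M, B)  # -3.0  5.0
```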

    Example \(\PageIndex{6}\): Interactive: Best-fit line


    Example \(\PageIndex{7}\): Best-fit parabola

    Find the parabola that best approximates the data points

    \[ (-1,\,1/2),\quad(1,\,-1),\quad(2,\,-1/2),\quad(3,\,2). \nonumber \]


    Figure \(\PageIndex{13}\)

    What quantity is being minimized?

    Solution

    The general equation for a parabola is

    \[ y = Bx^2 + Cx + D. \nonumber \]

    If the four points were to lie on this parabola, then the following equations would be satisfied:

    \[\begin{align} \frac{1}{2}&=B(-1)^2+C(-1)+D\nonumber \\ -1&=B(1)^2+C(1)+D\nonumber \\ -\frac{1}{2}&=B(2)^2+C(2)+D\label{eq:2} \\ 2&=B(3)^2+C(3)+D.\nonumber\end{align}\]

    We treat this as a system of equations in the unknowns \(B,C,D\). In matrix form, we can write this as \(Ax=b\) for

    \[ A = \left(\begin{array}{ccc}1&-1&1\\1&1&1\\4&2&1\\9&3&1\end{array}\right)\qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right)\qquad b = \left(\begin{array}{c}1/2 \\ -1\\-1/2 \\ 2\end{array}\right). \nonumber \]

    We find a least-squares solution by multiplying both sides by the transpose:

    \[ A^TA = \left(\begin{array}{ccc}99&35&15\\35&15&5\\15&5&4\end{array}\right)\qquad A^Tb = \left(\begin{array}{c}31/2\\7/2\\1\end{array}\right), \nonumber \]

    then forming an augmented matrix and row reducing:

    \[ \left(\begin{array}{ccc|c}99&35&15&31/2 \\ 35&15&5&7/2 \\ 15&5&4&1\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&0&53/88 \\ 0&1&0&-379/440 \\ 0&0&1&-41/44\end{array}\right)\implies \hat x = \left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44 \end{array}\right). \nonumber \]

    The best-fit parabola is

    \[ y = \frac{53}{88}x^2 - \frac{379}{440}x - \frac{41}{44}. \nonumber \]

    Multiplying through by \(88\text{,}\) we can write this as

    \[ 88y = 53x^2 - \frac{379}{5}x - 82. \nonumber \]


    Figure \(\PageIndex{14}\)

    Now we consider what exactly the parabola \(y = f(x)\) is minimizing. The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the left-hand side of \(\eqref{eq:2}\), and

    \[A\hat{x}=\left(\begin{array}{c} \frac{53}{88}(-1)^2-\frac{379}{440}(-1)-\frac{41}{44} \\ \frac{53}{88}(1)^2-\frac{379}{440}(1)-\frac{41}{44} \\ \frac{53}{88}(2)^2-\frac{379}{440}(2)-\frac{41}{44} \\ \frac{53}{88}(3)^2-\frac{379}{440}(3)-\frac{41}{44}\end{array}\right)=\left(\begin{array}{c}f(-1) \\ f(1) \\ f(2) \\ f(3)\end{array}\right).\nonumber\]

    In other words, \(A\hat x\) is the vector whose entries are the \(y\)-coordinates of the graph of the parabola at the values of \(x\) we specified in our data points, and \(b\) is the vector whose entries are the \(y\)-coordinates of those data points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points:


    Figure \(\PageIndex{15}\)

    \[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}1/2 \\ -1 \\ -1/2 \\ 2\end{array}\right)-A\left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44\end{array}\right)=\left(\begin{array}{c}-7/220 \\ 21/110 \\ -14/55 \\ 21/220\end{array}\right)}\nonumber\]

    The best-fit parabola minimizes the sum of the squares of these vertical distances.
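
    The same pattern handles the parabola: the columns of \(A\) hold \(x^2\text{,}\) \(x\text{,}\) and \(1\) evaluated at the data points. A sketch assuming NumPy; `best_fit_parabola` is an illustrative name, not a library function:

```python
import numpy as np

def best_fit_parabola(xs, ys):
    """Return (B, C, D) for the least-squares parabola y = B*x^2 + C*x + D."""
    A = np.column_stack([xs**2, xs, np.ones_like(xs)])
    return np.linalg.lstsq(A, ys, rcond=None)[0]

xs = np.array([-1., 1., 2., 3.])
ys = np.array([0.5, -1., -0.5, 2.])

B, C, D = best_fit_parabola(xs, ys)
print(B, C, D)  # 53/88, -379/440, -41/44 (approximately 0.602, -0.861, -0.932)
```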


    Example \(\PageIndex{8}\): Best-fit linear function

    Find the linear function \(f(x,y)\) that best approximates the following data:

    \[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 1 & 0 & 0 \\ 0 & 1 & 1 \\ -1 & 0 & 3 \\ 0 & -1 & 4 \end{array} \nonumber \]

    What quantity is being minimized?

    Solution

    The general equation for a linear function in two variables is

    \[ f(x,y) = Bx + Cy + D. \nonumber \]

    We want to solve the following system of equations in the unknowns \(B,C,D\text{:}\)

    \[\begin{align} B(1)+C(0)+D&=0 \nonumber \\ B(0)+C(1)+D&=1 \nonumber \\ B(-1)+C(0)+D&=3\label{eq:3} \\ B(0)+C(-1)+D&=4\nonumber\end{align}\]

    In matrix form, we can write this as \(Ax=b\) for

    \[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right)\qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right)\qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

    We observe that the columns \(u_1,u_2,u_3\) of \(A\) are orthogonal, so we can use Recipe 2: Compute a Least-Squares Solution:

    \[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber \]

    Alternatively, we can find a least-squares solution by multiplying both sides by the transpose:

    \[ A^TA = \left(\begin{array}{ccc}2&0&0\\0&2&0\\0&0&4\end{array}\right)\qquad A^Tb = \left(\begin{array}{c}-3\\-3\\8\end{array}\right). \nonumber \]

    The matrix \(A^TA\) is diagonal (do you see why that happened?), so it is easy to solve the equation \(A^TAx = A^Tb\text{:}\)

    \[ \left(\begin{array}{ccc|c}2&0&0&-3 \\ 0&2&0&-3 \\ 0&0&4&8\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{ccc|c}1&0&0&-3/2 \\ 0&1&0&-3/2 \\ 0&0&1&2\end{array}\right)\implies \hat x = \left(\begin{array}{c}-3/2 \\ -3/2 \\ 2\end{array}\right).\nonumber \]

    Therefore, the best-fit linear equation is

    \[ f(x,y) = -\frac 32x - \frac32y + 2. \nonumber \]

    Here is a picture of the graph of \(f(x,y)\text{:}\)


    Figure \(\PageIndex{17}\)

    Now we consider what quantity is being minimized by the function \(f(x,y)\). The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the right-hand side of \(\eqref{eq:3}\), and

    \[A\hat{x}=\left(\begin{array}{rrrrr}-\frac32(1)&-& \frac32(0)&+& 2\\-\frac32(0)&-& \frac32(1) &+& 2\\-\frac32(-1) &-& \frac32(0) &+& 2\\-\frac32(0) &-& \frac32(-1) &+& 2\end{array}\right)=\left(\begin{array}{c}f(1,0) \\ f(0,1) \\ f(-1,0) \\ f(0,-1)\end{array}\right).\nonumber\]

    In other words, \(A\hat x\) is the vector whose entries are the values of \(f\) evaluated on the points \((x,y)\) we specified in our data table, and \(b\) is the vector whose entries are the desired values of \(f\) evaluated at those points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points, as indicated in the above picture. The best-fit linear function minimizes the sum of the squares of these vertical distances.


    All of the above examples have the following form: some number of data points \((x,y)\) are specified, and we want to find a function

    \[ y = B_1g_1(x) + B_2g_2(x) + \cdots + B_mg_m(x) \nonumber \]

    that best approximates these points, where \(g_1,g_2,\ldots,g_m\) are fixed functions of \(x\). Indeed, in the best-fit line example we had \(g_1(x)=x\) and \(g_2(x)=1\text{;}\) in the best-fit parabola example we had \(g_1(x)=x^2\text{,}\) \(g_2(x)=x\text{,}\) and \(g_3(x)=1\text{;}\) and in the best-fit linear function example we had \(g_1(x_1,x_2)=x_1\text{,}\) \(g_2(x_1,x_2)=x_2\text{,}\) and \(g_3(x_1,x_2)=1\) (in this example we take \(x\) to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns \(B_1,B_2,\ldots,B_m\)—once we evaluate the \(g_i\text{,}\) they just become numbers, so it does not matter what they are—and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of \(y = f(x)\) to our original data points.
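
    This framework is easy to implement once and reuse: evaluate each \(g_i\) at each data point to build the columns of \(A\text{,}\) then solve the resulting least-squares problem. Here is a sketch assuming NumPy; the helper `fit_coefficients` is an illustrative name, not a standard function:

```python
import numpy as np

def fit_coefficients(gs, xs, ys):
    """Least-squares coefficients B_1, ..., B_m for y = sum_i B_i * g_i(x).

    gs: list of functions g_i; xs: data inputs; ys: observed outputs.
    """
    # Column i of A holds g_i evaluated at every data point.
    A = np.column_stack([np.array([g(x) for x in xs]) for g in gs])
    return np.linalg.lstsq(A, np.asarray(ys, dtype=float), rcond=None)[0]

# Recovering the best-fit line from the earlier example: g_1(x) = x, g_2(x) = 1.
coeffs = fit_coefficients([lambda x: x, lambda x: 1.0], [0, 1, 2], [6, 0, 0])
print(coeffs)  # [-3.  5.]
```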

    To emphasize that the nature of the functions \(g_i\) really is irrelevant, consider the following example.

    Example \(\PageIndex{9}\): Best-fit trigonometric function

    What is the best-fit function of the form

    \[ y=B+C\cos(x)+D\sin(x)+E\cos(2x)+F\sin(2x)+G\cos(3x)+H\sin(3x) \nonumber \]

    passing through the points

    \[ \left(\begin{array}{c}-4\\ -1\end{array}\right),\;\left(\begin{array}{c}-3\\ 0\end{array}\right),\; \left(\begin{array}{c}-2\\ -1.5\end{array}\right),\; \left(\begin{array}{c}-1\\ .5\end{array}\right),\; \left(\begin{array}{c}0\\1\end{array}\right),\; \left(\begin{array}{c}1\\-1\end{array}\right),\; \left(\begin{array}{c}2\\-.5\end{array}\right),\; \left(\begin{array}{c}3\\2\end{array}\right),\; \left(\begin{array}{c}4 \\-1\end{array}\right)? \nonumber \]


    Figure \(\PageIndex{19}\)

    Solution

    We want to solve the system of equations

    \[\begin{array}{rrrrrrrrrrrrrrr}-1 &=& B &+& C\cos(-4) &+& D\sin(-4) &+& E\cos(-8) &+& F\sin(-8) &+& G\cos(-12) &+& H\sin(-12)\\0 &=& B &+& C\cos(-3) &+& D\sin(-3) &+& E\cos(-6) &+& F\sin(-6) &+& G\cos(-9) &+& H\sin(-9)\\-1.5 &=& B &+& C\cos(-2) &+& D\sin(-2) &+& E\cos(-4) &+& F\sin(-4) &+& G\cos(-6) &+& H\sin(-6) \\ 0.5 &=& B &+& C\cos(-1) &+& D\sin(-1) &+& E\cos(-2) &+& F\sin(-2) &+& G\cos(-3) &+& H\sin(-3)\\1 &=& B &+& C\cos(0) &+& D\sin(0) &+& E\cos(0) &+& F\sin(0) &+& G\cos(0) &+& H\sin(0)\\-1 &=& B &+& C\cos(1) &+& D\sin(1) &+& E\cos(2) &+& F\sin(2) &+& G\cos(3) &+& H\sin(3)\\-0.5 &=& B &+& C\cos(2) &+& D\sin(2) &+& E\cos(4) &+& F\sin(4) &+& G\cos(6) &+& H\sin(6)\\2 &=& B &+& C\cos(3) &+& D\sin(3) &+& E\cos(6) &+& F\sin(6) &+& G\cos(9) &+& H\sin(9)\\-1 &=& B &+& C\cos(4) &+& D\sin(4) &+& E\cos(8) &+& F\sin(8) &+& G\cos(12) &+& H\sin(12).\end{array}\nonumber\]

    All of the terms in these equations are numbers, except for the unknowns \(B,C,D,E,F,G,H\text{:}\)

    \[\begin{array}{rrrrrrrrrrrrrrr}-1 &=& B &-&0.6536C&+& 0.7568D &-& 0.1455E &-& 0.9894F &+& 0.8439G &+& 0.5366H\\0&=& B &-& 0.9900C &-& 0.1411D &+& 0.9602E &+& 0.2794F &-& 0.9111G&-& 0.4121H\\-1.5 &=& B &-& 0.4161C &-& 0.9093D &-& 0.6536E &+& 0.7568F &+& 0.9602G &+& 0.2794H\\0.5 &=& B &+& 0.5403C &-& 0.8415D &-& 0.4161E &-& 0.9093F &-& 0.9900G &-& 0.1411H\\1&=&B&+&C&{}&{}&+&E&{}&{}&+&G&{}&{}\\-1 &=& B &+& 0.5403C &+& 0.8415D &-& 0.4161E &+& 0.9093F &-& 0.9900G &+& 0.1411H\\-0.5&=& B &-& 0.4161C &+& 0.9093D &-& 0.6536E &-& 0.7568F &+& 0.9602G &-& 0.2794H\\2 &=& B &-& 0.9900C &+& 0.1411D &+& 0.9602E &-& 0.2794F &-& 0.9111G &+& 0.4121H\\-1 &=& B &-& 0.6536C &-& 0.7568D &-& 0.1455E &+& 0.9894F &+& 0.8439G &-& 0.5366H.\end{array}\nonumber\]

    Hence we want to solve the least-squares problem

    \[\left(\begin{array}{rrrrrrr}1 &-0.6536& 0.7568& -0.1455& -0.9894& 0.8439 &0.5366\\1& -0.9900& -0.1411 &0.9602 &0.2794& -0.9111& -0.4121\\1& -0.4161& -0.9093& -0.6536& 0.7568& 0.9602& 0.2794\\1& 0.5403& -0.8415&-0.4161& -0.9093& -0.9900 &-0.1411\\1& 1& 0& 1& 0& 1& 0\\1& 0.5403& 0.8415& -0.4161& 0.9093& -0.9900 &0.1411\\1& -0.4161& 0.9093& -0.6536& -0.7568& 0.9602& -0.2794\\1& -0.9900 &0.1411 &0.9602& -0.2794& -0.9111& 0.4121\\1& -0.6536& -0.7568& -0.1455& 0.9894 &0.8439 &-0.5366\end{array}\right)\left(\begin{array}{c}B\\C\\D\\E\\F\\G\\H\end{array}\right)=\left(\begin{array}{c}-1\\0\\-1.5\\0.5\\1\\-1\\-0.5\\2\\-1\end{array}\right).\nonumber\]

    We find the least-squares solution with the aid of a computer:

    \[\hat{x}\approx\left(\begin{array}{c}-0.1435 \\0.2611 \\-0.2337\\ 1.116\\ -0.5997\\ -0.2767 \\0.1076\end{array}\right).\nonumber\]
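
    For reference, one way to carry out that computer calculation, as a sketch assuming NumPy:

```python
import numpy as np

xs = np.array([-4., -3., -2., -1., 0., 1., 2., 3., 4.])
ys = np.array([-1., 0., -1.5, 0.5, 1., -1., -0.5, 2., -1.])

# Columns: 1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x),
# matching the unknowns B, C, D, E, F, G, H in that order.
A = np.column_stack([np.ones_like(xs),
                     np.cos(xs), np.sin(xs),
                     np.cos(2 * xs), np.sin(2 * xs),
                     np.cos(3 * xs), np.sin(3 * xs)])

x_hat = np.linalg.lstsq(A, ys, rcond=None)[0]
print(x_hat)  # compare with the approximate values displayed above
```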

    Therefore, the best-fit function is

    \[ \begin{split} y &\approx -0.1435 + 0.2611\cos(x) -0.2337\sin(x) + 1.116\cos(2x) -0.5997\sin(2x) \\ &\qquad\qquad -0.2767\cos(3x) + 0.1076\sin(3x). \end{split} \nonumber \]


    Figure \(\PageIndex{20}\)

    As in the previous examples, the best-fit function minimizes the sum of the squares of the vertical distances from the graph of \(y = f(x)\) to the data points.


    The next example has a somewhat different flavor from the previous ones.

    Example \(\PageIndex{10}\): Best-fit ellipse

    Find the best-fit ellipse through the points

    \[ (0,2),\, (2,1),\, (1,-1),\, (-1,-2),\, (-3,1),\, (-1,-1). \nonumber \]


    Figure \(\PageIndex{22}\)

    What quantity is being minimized?

    Solution

    The general equation for an ellipse (actually, for a nondegenerate conic section) is

    \[ x^2 + By^2 + Cxy + Dx + Ey + F = 0. \nonumber \]

    This is an implicit equation: the ellipse is the set of all solutions of the equation, just like the unit circle is the set of solutions of \(x^2+y^2=1.\) To say that our data points lie on the ellipse means that the above equation is satisfied for the given values of \(x\) and \(y\text{:}\)

    \[\label{eq:4} \begin{array}{rrrrrrrrrrrrl} (0)^2 &+& B(2)^2 &+& C(0)(2)&+& D(0) &+& E(2)&+& F&=& 0 \\ (2)^2 &+& B(1)^2 &+& C(2)(1)&+& D(2)&+&E(1)&+&F&=& 0 \\ (1)^2&+& B(-1)^2&+&C(1)(-1)&+&D(1)&+&E(-1)&+&F&=&0 \\ (-1)^2&+&B(-2)^2&+&C(-1)(-2)&+&D(-1)&+&E(-2)&+&F&=&0 \\ (-3)^2&+&B(1)^2&+&C(-3)(1)&+&D(-3)&+&E(1)&+&F&=&0 \\ (-1)^2&+&B(-1)^2&+&C(-1)(-1)&+&D(-1)&+&E(-1)&+&F&=&0.\end{array}\]

    To put this in matrix form, we move the constant terms to the right-hand side of the equals sign; then we can write this as \(Ax=b\) for

    \[A=\left(\begin{array}{ccccc}4&0&0&2&1\\1&2&2&1&1\\1&-1&1&-1&1 \\ 4&2&-1&-2&1 \\ 1&-3&-3&1&1 \\ 1&1&-1&-1&1\end{array}\right)\quad x=\left(\begin{array}{c}B\\C\\D\\E\\F\end{array}\right)\quad b=\left(\begin{array}{c}0\\-4\\-1\\-1\\-9\\-1\end{array}\right).\nonumber\]

    We compute

    \[ A^TA = \left(\begin{array}{ccccc}36&7&-5&0&12 \\ 7&19&9&-5&1 \\ -5&9&16&1&-2 \\ 0&-5&1&12&0 \\ 12&1&-2&0&6\end{array}\right) \qquad A^T b = \left(\begin{array}{c}-19\\17\\20\\-9\\-16\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[\left(\begin{array}{ccccc|c} 36 &7& -5& 0& 12&-19\\7& 19& 9& -5& 1& 17\\-5& 9& 16& 1& -2& 20\\0& -5& 1&12& 0& -9\\12& 1& -2& 0& 6& -16\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccccc|c}1& 0& 0& 0& 0& 405/266\\0& 1& 0& 0& 0& -89/133\\0& 0& 1& 0& 0& 201/133\\0& 0& 0& 1& 0& -123/266\\0& 0& 0& 0& 1& -687/133\end{array}\right).\nonumber\]

    The least-squares solution is

    \[\hat{x}=\left(\begin{array}{c}405/266\\ -89/133\\ 201/133\\ -123/266\\ -687/133\end{array}\right),\nonumber\]

    so the best-fit ellipse is

    \[ x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} = 0. \nonumber \]

    Multiplying through by \(266\text{,}\) we can write this as

    \[ 266 x^2 + 405 y^2 - 178 xy + 402 x - 123 y - 1374 = 0. \nonumber \]


    Figure \(\PageIndex{23}\)

    Now we consider the question of what quantity is minimized by this ellipse. The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\text{,}\) or equivalently, of \(A\hat x-b\). The vector \(-b\) contains the constant terms of the left-hand sides of \(\eqref{eq:4}\), and

    \[A\hat{x}=\left(\begin{array}{rrrrrrrrr} \frac{405}{266}(2)^2 &-& \frac{89}{133}(0)(2)&+&\frac{201}{133}(0)&-&\frac{123}{266}(2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-& \frac{89}{133}(2)(1)&+&\frac{201}{133}(2)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2 &-&\frac{89}{133}(1)(-1)&+&\frac{201}{133}(1)&-&\frac{123}{266}(-1)&-&\frac{687}{133} \\ \frac{405}{266}(-2)^2&-&\frac{89}{133}(-1)(-2)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-&\frac{89}{133}(-3)(1)&+&\frac{201}{133}(-3)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2&-&\frac{89}{133}(-1)(-1)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-1)&-&\frac{687}{133}\end{array}\right)\nonumber\]

    contains the rest of the terms on the left-hand side of \(\eqref{eq:4}\). Therefore, the entries of \(A\hat x-b\) are the quantities obtained by evaluating the function

    \[ f(x,y) = x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} \nonumber \]

    on the given data points.

    If our data points actually lay on the ellipse defined by \(f(x,y)=0\text{,}\) then evaluating \(f(x,y)\) on our data points would always yield zero, so \(A\hat x-b\) would be the zero vector. This is not the case; instead, \(A\hat x-b\) contains the actual values of \(f(x,y)\) when evaluated on our data points. The quantity being minimized is the sum of the squares of these values:

    \[ \begin{split} &\text{minimized} = \\ &\quad f(0,2)^2 + f(2,1)^2 + f(1,-1)^2 + f(-1,-2)^2 + f(-3,1)^2 + f(-1,-1)^2. \end{split} \nonumber \]

    One way to visualize this is as follows. We can put this best-fit problem into the framework of Example \(\PageIndex{8}\) by asking to find an equation of the form

    \[ f(x,y) = x^2 + By^2 + Cxy + Dx + Ey + F \nonumber \]

    which best approximates the data table

    \[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 0 & 2 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 0 \\ -1 & -2 & 0 \\ -3 & 1 & 0 \\ -1 & -1 & 0\rlap. \end{array} \nonumber \]

    The resulting function minimizes the sum of the squares of the vertical distances from these data points \((0,2,0),\,(2,1,0),\,\ldots\text{,}\) which lie on the \(xy\)-plane, to the graph of \(f(x,y)\).
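
    The corresponding computation is again a routine least-squares solve once \(A\) and \(b\) have been assembled from the data points, as a sketch assuming NumPy:

```python
import numpy as np

points = [(0, 2), (2, 1), (1, -1), (-1, -2), (-3, 1), (-1, -1)]

# For each point, the row of A holds (y^2, xy, x, y, 1) and the entry of b
# is -x^2, coming from  x^2 + B*y^2 + C*xy + D*x + E*y + F = 0.
A = np.array([[y**2, x * y, x, y, 1.0] for (x, y) in points])
b = np.array([-float(x)**2 for (x, y) in points])

B, C, D, E, F = np.linalg.lstsq(A, b, rcond=None)[0]
print(B, C, D, E, F)  # 405/266, -89/133, 201/133, -123/266, -687/133
```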


    Note \(\PageIndex{3}\)

    Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.

