6.5: The Method of Least Squares


    Learning Objectives
    1. Learn examples of best-fit problems.
    2. Learn to turn a best-fit problem into a least-squares problem.
    3. Recipe: find a least-squares solution (two ways).
    4. Picture: geometry of a least-squares solution.
    5. Vocabulary words: least-squares solution.

    In this section, we answer the following important question:

    Suppose that \(Ax=b\) does not have a solution. What is the best approximate solution?

    For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

    Least-Squares Solutions

    We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation \(Ax=b\).

    Definition \(\PageIndex{1}\): Least-Squares Solution

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). A least-squares solution of the matrix equation \(Ax=b\) is a vector \(\hat x\) in \(\mathbb{R}^n \) such that

    \[ \text{dist}(b,\,A\hat x) \leq \text{dist}(b,\,Ax) \nonumber \]

    for all other vectors \(x\) in \(\mathbb{R}^n \).

    Recall that \(\text{dist}(v,w) = \|v-w\|\) is the distance, Definition 6.1.2 in Section 6.1, between the vectors \(v\) and \(w\). The term “least squares” comes from the fact that \(\text{dist}(b,A\hat x) = \|b-A\hat x\|\) is the square root of the sum of the squares of the entries of the vector \(b-A\hat x\). So a least-squares solution minimizes the sum of the squares of the differences between the entries of \(A\hat x\) and \(b\). In other words, a least-squares solution solves the equation \(Ax=b\) as closely as possible, in the sense that the sum of the squares of the entries of \(b-Ax\) is minimized.

    Least Squares: Picture

    Suppose that the equation \(Ax=b\) is inconsistent. Recall from Note 2.3.6 in Section 2.3 that the column space of \(A\) is the set of all vectors \(c\) such that \(Ax=c\) is consistent. In other words, \(\text{Col}(A)\) is the set of all vectors of the form \(Ax.\) Hence, the closest vector, Note 6.3.1 in Section 6.3, of the form \(Ax\) to \(b\) is the orthogonal projection of \(b\) onto \(\text{Col}(A)\). This is denoted \(b_{\text{Col}(A)}\text{,}\) following Definition 6.3.1 in Section 6.3.


    Figure \(\PageIndex{1}\)

    Note \(\PageIndex{1}\)

    A least-squares solution of \(Ax=b\) is a solution \(\hat x\) of the consistent equation \(Ax=b_{\text{Col}(A)}\).

    Note \(\PageIndex{2}\)

    If \(Ax=b\) is consistent, then \(b_{\text{Col}(A)} = b\text{,}\) so that a least-squares solution is the same as a usual solution.

    Where is \(\hat x\) in this picture? If \(v_1,v_2,\ldots,v_n\) are the columns of \(A\text{,}\) then

    \[ A\hat x = A\left(\begin{array}{c}\hat{x}_1 \\ \hat{x}_2 \\ \vdots \\ \hat{x}_{n}\end{array}\right)= \hat x_1v_1 + \hat x_2v_2 + \cdots + \hat x_nv_n. \nonumber \]

    Hence the entries of \(\hat x\) are the “coordinates” of \(b_{\text{Col}(A)}\) with respect to the spanning set \(\{v_1,v_2,\ldots,v_n\}\) of \(\text{Col}(A)\). (They are honest \(\mathcal{B}\)-coordinates if the columns of \(A\) are linearly independent.)


    Figure \(\PageIndex{2}\)


    We learned to solve this kind of orthogonal projection problem in Section 6.3.

    Theorem \(\PageIndex{1}\)

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). The least-squares solutions of \(Ax=b\) are the solutions of the matrix equation

    \[ A^TAx = A^Tb \nonumber \]

    Proof

    By Theorem 6.3.2 in Section 6.3, if \(\hat x\) is a solution of the matrix equation \(A^TAx = A^Tb\text{,}\) then \(A\hat x\) is equal to \(b_{\text{Col}(A)}\). We argued above that a least-squares solution of \(Ax=b\) is a solution of \(Ax = b_{\text{Col}(A)}.\)

    In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

    Recipe 1: Compute a Least-Squares Solution

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). Here is a method for computing a least-squares solution of \(Ax=b\text{:}\)

    1. Compute the matrix \(A^TA\) and the vector \(A^Tb\).
    2. Form the augmented matrix for the matrix equation \(A^TAx = A^Tb\text{,}\) and row reduce.
    3. This equation is always consistent, and any solution \(\hat x\) is a least-squares solution.

    To reiterate: once you have found a least-squares solution \(\hat x\) of \(Ax=b\text{,}\) then \(b_{\text{Col}(A)}\) is equal to \(A\hat x\).
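
    In code, this recipe amounts to forming the normal equations and solving them. Here is a minimal sketch, assuming NumPy is available; the matrix and vector below are placeholder data, not taken from any particular example in this section:

```python
import numpy as np

# Placeholder data: any m x n matrix A and vector b in R^m will do.
A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
b = np.array([7., 8., 9.])

# Step 1: compute A^T A and A^T b.
AtA = A.T @ A
Atb = A.T @ b

# Steps 2-3: solve the normal equations A^T A x = A^T b.  This system is
# always consistent; here the columns of A are linearly independent, so
# A^T A is invertible and the least-squares solution is unique.
x_hat = np.linalg.solve(AtA, Atb)

# Once x_hat is known, the projection b_Col(A) is just A times x_hat.
b_proj = A @ x_hat
print("least-squares solution:", x_hat)
print("projection of b onto Col(A):", b_proj)
```

    If the columns of \(A\) may be linearly dependent, a routine such as `numpy.linalg.lstsq`, which handles the rank-deficient case, is the safer choice.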

    Example \(\PageIndex{1}\)

    Find the least-squares solutions of \(Ax=b\) where:

    \[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

    What quantity is being minimized?

    Solution

    We have

    \[ A^TA = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) = \left(\begin{array}{cc}5&3\\3&3\end{array}\right) \nonumber \]

    and

    \[ A^T b = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{c}6\\0\\0\end{array}\right)= \left(\begin{array}{c}0\\6\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[ \left(\begin{array}{cc|c}5&3&0\\3&3&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&-3\\0&1&5\end{array}\right). \nonumber \]

    Therefore, the only least-squares solution is \(\hat x = {-3\choose 5}.\)

    This solution minimizes the distance from \(A\hat x\) to \(b\text{,}\) i.e., the sum of the squares of the entries of \(b-A\hat x = b-b_{\text{Col}(A)} = b_{\text{Col}(A)^\perp}\). In this case, we have

    \[||b-A\hat{x}||=\left|\left|\left(\begin{array}{c}6\\0\\0\end{array}\right)-\left(\begin{array}{c}5\\2\\-1\end{array}\right)\right|\right|=\left|\left|\left(\begin{array}{c}1\\-2\\1\end{array}\right)\right|\right|=\sqrt{1^2+(-2)^2+1^2}=\sqrt{6}.\nonumber\]

    Therefore, \(b_{\text{Col}(A)} = A\hat x\) is \(\sqrt 6\) units from \(b.\)
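
    As a quick numerical check of this example (a sketch assuming NumPy is available):

```python
import numpy as np

A = np.array([[0., 1.],
              [1., 1.],
              [2., 1.]])
b = np.array([6., 0., 0.])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # solve the normal equations
residual = b - A @ x_hat

print(x_hat)                     # [-3.  5.]
print(residual)                  # [ 1. -2.  1.]
print(np.linalg.norm(residual))  # sqrt(6) = 2.449...
```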

    In the following picture, \(v_1,v_2\) are the columns of \(A\text{:}\)


    Figure \(\PageIndex{4}\)


    Example \(\PageIndex{2}\)

    Find the least-squares solutions of \(Ax=b\) where:

    \[ A = \left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right)\qquad b = \left(\begin{array}{c}1\\0\\-1\end{array}\right). \nonumber \]

    Solution

    We have

    \[ A^T A = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right)= \left(\begin{array}{cc}5&-1\\-1&5\end{array}\right)\nonumber \]

    and

    \[ A^T b = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{c}1\\0\\-1\end{array}\right)= \left(\begin{array}{c}2\\-2\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[ \left(\begin{array}{cc|c}5&-1&2\\-1&5&-2\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&1/3\\0&1&-1/3\end{array}\right). \nonumber \]

    Therefore, the only least-squares solution is \(\hat x = \frac 13{1\choose -1}.\)


    The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of \(A\) are linearly dependent, then \(Ax=b_{\text{Col}(A)}\) has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of Corollary 6.3.1 in Section 6.3.

    Theorem \(\PageIndex{2}\)

    Let \(A\) be an \(m\times n\) matrix and let \(b\) be a vector in \(\mathbb{R}^m \). The following are equivalent:

    1. \(Ax=b\) has a unique least-squares solution.
    2. The columns of \(A\) are linearly independent.
    3. \(A^TA\) is invertible.

    In this case, the least-squares solution is

    \[ \hat x = (A^TA)^{-1} A^Tb. \nonumber \]

    Proof

    The set of least-squares solutions of \(Ax=b\) is the solution set of the consistent equation \(A^TAx=A^Tb\text{,}\) which is a translate of the solution set of the homogeneous equation \(A^TAx=0\). Since \(A^TA\) is a square matrix, the equivalence of 1 and 3 follows from Theorem 5.1.1 in Section 5.1. The set of least-squares solutions is also the solution set of the consistent equation \(Ax = b_{\text{Col}(A)}\text{,}\) which has a unique solution if and only if the columns of \(A\) are linearly independent by Recipe: Checking Linear Independence in Section 2.5.
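
    When the columns of \(A\) are linearly independent, the closed-form expression \(\hat x = (A^TA)^{-1}A^Tb\) can be evaluated directly. Below is a brief sketch, assuming NumPy, using the data of Example \(\PageIndex{2}\) above; in practice, solving the normal equations is preferable to forming the inverse explicitly:

```python
import numpy as np

A = np.array([[2., 0.],
              [-1., 1.],
              [0., 2.]])
b = np.array([1., 0., -1.])

# x_hat = (A^T A)^{-1} A^T b, valid because the columns of A are
# linearly independent (equivalently, A^T A is invertible).
x_hat = np.linalg.inv(A.T @ A) @ (A.T @ b)

# Equivalent, but numerically better behaved:
x_hat_solve = np.linalg.solve(A.T @ A, A.T @ b)

print(x_hat)         # [ 0.333... -0.333...], i.e. (1/3, -1/3)
print(x_hat_solve)
```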

    Example \(\PageIndex{3}\): Infinitely many least-squares solutions

    Find the least-squares solutions of \(Ax=b\) where:

    \[ A = \left(\begin{array}{ccc}1&0&1\\1&1&-1\\1&2&-3\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

    Solution

    We have

    \[ A^TA = \left(\begin{array}{ccc}3&3&-3\\3&5&-7\\-3&-7&11\end{array}\right)\qquad A^Tb = \left(\begin{array}{c}6\\0\\6\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[ \left(\begin{array}{ccc|c}3&3&-3&6\\3&5&-7&0\\-3&-7&11&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&1&5\\0&1&-2&-3\\0&0&0&0\end{array}\right). \nonumber \]

    The free variable is \(x_3\text{,}\) so the solution set is

    \[\left\{\begin{array}{rrrrr}x_1 &=& -x_3 &+& 5\\x_2 &=& 2x_3 &-& 3\\x_3 &=& x_3&{}&{}\end{array}\right. \quad\xrightarrow[\text{vector form}]{\text{parametric}}\quad \hat{x}=\left(\begin{array}{c}x_1\\x_2\\x_3\end{array}\right)=x_3\left(\begin{array}{c}-1\\2\\1\end{array}\right)+\left(\begin{array}{c}5\\-3\\0\end{array}\right).\nonumber\]

    For example, taking \(x_3 = 0\) and \(x_3=1\) gives the least-squares solutions

    \[ \hat x = \left(\begin{array}{c}5\\-3\\0\end{array}\right)\quad\text{and}\quad\hat x =\left(\begin{array}{c}4\\-1\\1\end{array}\right). \nonumber \]

    Geometrically, we see that the columns \(v_1,v_2,v_3\) of \(A\) are coplanar:


    Figure \(\PageIndex{7}\)

    Therefore, there are many ways of writing \(b_{\text{Col}(A)}\) as a linear combination of \(v_1,v_2,v_3\).
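
    Numerically, a rank-deficient problem like this one is usually handed to a routine such as `numpy.linalg.lstsq`, which returns one particular least-squares solution (the one of smallest norm) rather than the whole solution set. A sketch using the data of this example:

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [1., 1., -1.],
              [1., 2., -3.]])
b = np.array([6., 0., 0.])

# A^T A is singular here (the columns of A are linearly dependent), so
# np.linalg.solve(A.T @ A, A.T @ b) would fail.  lstsq still succeeds:
x_hat, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print(rank)   # 2: the columns of A only span a plane
print(x_hat)  # one least-squares solution (the minimum-norm one)

# Every least-squares solution has the form x_hat + t*(-1, 2, 1), since
# (-1, 2, 1) spans the null space of A.
```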


    As usual, calculations involving projections become easier in the presence of an orthogonal set. Indeed, if \(A\) is an \(m\times n\) matrix with orthogonal columns \(u_1,u_2,\ldots,u_n\text{,}\) then we can use Theorem 6.4.1 in Section 6.4 to write

    \[ b_{\text{Col}(A)} = \frac{b\cdot u_1}{u_1\cdot u_1}\,u_1 + \frac{b\cdot u_2}{u_2\cdot u_2}\,u_2 + \cdots + \frac{b\cdot u_n}{u_n\cdot u_n}\,u_n = A\left(\begin{array}{c}(b\cdot u_1)/(u_1\cdot u_1) \\ (b\cdot u_2)/(u_2\cdot u_2) \\ \vdots \\ (b\cdot u_n)/(u_n\cdot u_n)\end{array}\right). \nonumber \]

    Note that the least-squares solution is unique in this case, since an orthogonal set is linearly independent, Fact 6.4.1 in Section 6.4.

    Recipe 2: Compute a Least-Squares Solution

    Let \(A\) be an \(m\times n\) matrix with orthogonal columns \(u_1,u_2,\ldots,u_n\text{,}\) and let \(b\) be a vector in \(\mathbb{R}^m \). Then the least-squares solution of \(Ax=b\) is the vector

    \[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \ldots,\; \frac{b\cdot u_n}{u_n\cdot u_n} \right). \nonumber \]

    This formula is particularly useful in the sciences, as matrices with orthogonal columns often arise in nature.
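
    With orthogonal columns, no linear system needs to be solved at all: each entry of \(\hat x\) is a single projection coefficient. A minimal sketch, assuming NumPy; the matrix and vector below are placeholders chosen so that the columns are orthogonal:

```python
import numpy as np

# Placeholder data: the columns of A must be nonzero and orthogonal.
A = np.array([[1., 1., 0.],
              [1., -1., 0.],
              [0., 0., 1.]])
b = np.array([2., 3., 4.])

# Entry i of x_hat is (b . u_i) / (u_i . u_i), where u_i is column i of A.
x_hat = np.array([(b @ u) / (u @ u) for u in A.T])

print(x_hat)  # [ 2.5 -0.5  4. ]
```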

    Example \(\PageIndex{4}\)

    Find the least-squares solution of \(Ax=b\) where:

    \[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

    Solution

    Let \(u_1,u_2,u_3\) be the columns of \(A\). These form an orthogonal set, so

    \[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber \]

    Compare Example 6.4.8 in Section 6.4.

    Best-Fit Problems

    In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

    Example \(\PageIndex{5}\): Best-Fit Line

    Suppose that we have measured three data points

    \[ (0,6),\quad (1,0),\quad (2,0), \nonumber \]

    and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?


    Figure \(\PageIndex{9}\)

    The general equation for a (non-vertical) line is

    \[ y = Mx + B. \nonumber \]

    If our three data points were to lie on this line, then the following equations would be satisfied:

    \[\begin{align}6&=M\cdot 0+B\nonumber \\ 0&=M\cdot 1+B\label{eq:1} \\ 0&=M\cdot 2+B.\nonumber\end{align}\]

    In order to find the best-fit line, we try to solve the above equations in the unknowns \(M\) and \(B\). As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

    Putting our linear equations into matrix form, we are trying to solve \(Ax=b\) for

    \[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right)\qquad x = \left(\begin{array}{c}M\\B\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

    We solved this least-squares problem in Example \(\PageIndex{1}\): the only least-squares solution to \(Ax=b\) is \(\hat x = {M\choose B} = {-3\choose 5}\text{,}\) so the best-fit line is

    \[ y = -3x + 5. \nonumber \]


    Figure \(\PageIndex{10}\)

    What exactly is the line \(y= f(x) = -3x+5\) minimizing? The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the left-hand side of \(\eqref{eq:1}\), and

    \[ A\left(\begin{array}{c}-3\\5\end{array}\right)= \left(\begin{array}{c}-3(0)+5\\-3(1)+5\\-3(2)+5\end{array}\right)= \left(\begin{array}{c}f(0)\\f(1)\\f(2)\end{array}\right). \nonumber \]

    In other words, \(A\hat x\) is the vector whose entries are the \(y\)-coordinates of the graph of the line at the values of \(x\) we specified in our data points, and \(b\) is the vector whose entries are the \(y\)-coordinates of those data points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points:


    Figure \(\PageIndex{11}\)

    \[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}6\\0\\0\end{array}\right)-A\left(\begin{array}{c}-3\\5\end{array}\right)=\left(\begin{array}{c}1\\-2\\1\end{array}\right)}\nonumber\]

    The best-fit line minimizes the sum of the squares of these vertical distances.
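
    In practice, the matrix \(A\) is built directly from the \(x\)-coordinates of the data points: one column of \(x\)-values and one column of ones. Here is a sketch for this example, assuming NumPy; the helper name `best_fit_line` is just for illustration:

```python
import numpy as np

def best_fit_line(xs, ys):
    """Return (M, B) for the least-squares line y = M*x + B."""
    A = np.column_stack([xs, np.ones_like(xs)])  # columns: x-values, 1's
    M, B = np.linalg.lstsq(A, ys, rcond=None)[0]
    return M, B

xs = np.array([0., 1., 2.])
ys = np.array([6., 0., 0.])

M, B = best_fit_line(xs, ys)
print(M, B)  # -3.0  5.0
```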

    Example \(\PageIndex{6}\): Interactive: Best-fit line


    Example \(\PageIndex{7}\): Best-fit parabola

    Find the parabola that best approximates the data points

    \[ (-1,\,1/2),\quad(1,\,-1),\quad(2,\,-1/2),\quad(3,\,2). \nonumber \]


    Figure \(\PageIndex{13}\)

    What quantity is being minimized?

    Solution

    The general equation for a parabola is

    \[ y = Bx^2 + Cx + D. \nonumber \]

    If the four points were to lie on this parabola, then the following equations would be satisfied:

    \[\begin{align} \frac{1}{2}&=B(-1)^2+C(-1)+D\nonumber \\ -1&=B(1)^2+C(1)+D\nonumber \\ -\frac{1}{2}&=B(2)^2+C(2)+D\label{eq:2} \\ 2&=B(3)^2+C(3)+D.\nonumber\end{align}\]

    We treat this as a system of equations in the unknowns \(B,C,D\). In matrix form, we can write this as \(Ax=b\) for

    \[ A = \left(\begin{array}{ccc}1&-1&1\\1&1&1\\4&2&1\\9&3&1\end{array}\right)\qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right)\qquad b = \left(\begin{array}{c}1/2 \\ -1\\-1/2 \\ 2\end{array}\right). \nonumber \]

    We find a least-squares solution by multiplying both sides by the transpose:

    \[ A^TA = \left(\begin{array}{ccc}99&35&15\\35&15&5\\15&5&4\end{array}\right)\qquad A^Tb = \left(\begin{array}{c}31/2\\7/2\\1\end{array}\right), \nonumber \]

    then forming an augmented matrix and row reducing:

    \[ \left(\begin{array}{ccc|c}99&35&15&31/2 \\ 35&15&5&7/2 \\ 15&5&4&1\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&0&53/88 \\ 0&1&0&-379/440 \\ 0&0&1&-41/44\end{array}\right)\implies \hat x = \left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44 \end{array}\right). \nonumber \]

    The best-fit parabola is

    \[ y = \frac{53}{88}x^2 - \frac{379}{440}x - \frac{41}{44}. \nonumber \]

    Multiplying through by \(88\text{,}\) we can write this as

    \[ 88y = 53x^2 - \frac{379}{5}x - 82. \nonumber \]


    Figure \(\PageIndex{14}\)

    Now we consider what exactly the parabola \(y = f(x)\) is minimizing. The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the left-hand side of \(\eqref{eq:2}\), and

    \[A\hat{x}=\left(\begin{array}{c} \frac{53}{88}(-1)^2-\frac{379}{440}(-1)-\frac{41}{44} \\ \frac{53}{88}(1)^2-\frac{379}{440}(1)-\frac{41}{44} \\ \frac{53}{88}(2)^2-\frac{379}{440}(2)-\frac{41}{44} \\ \frac{53}{88}(3)^2-\frac{379}{440}(3)-\frac{41}{44}\end{array}\right)=\left(\begin{array}{c}f(-1) \\ f(1) \\ f(2) \\ f(3)\end{array}\right).\nonumber\]

    In other words, \(A\hat x\) is the vector whose entries are the \(y\)-coordinates of the graph of the parabola at the values of \(x\) we specified in our data points, and \(b\) is the vector whose entries are the \(y\)-coordinates of those data points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points:


    Figure \(\PageIndex{15}\)

    \[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}1/2 \\ -1 \\ -1/2 \\ 2\end{array}\right)-A\left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44\end{array}\right)=\left(\begin{array}{c}-7/220 \\ 21/110 \\ -14/55 \\ 21/220\end{array}\right)}\nonumber\]

    The best-fit parabola minimizes the sum of the squares of these vertical distances.
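
    The same pattern handles the parabola: the columns of \(A\) hold \(x^2\text{,}\) \(x\text{,}\) and \(1\) evaluated at the data points. A sketch assuming NumPy; `best_fit_parabola` is an illustrative name, not a library function:

```python
import numpy as np

def best_fit_parabola(xs, ys):
    """Return (B, C, D) for the least-squares parabola y = B*x^2 + C*x + D."""
    A = np.column_stack([xs**2, xs, np.ones_like(xs)])
    return np.linalg.lstsq(A, ys, rcond=None)[0]

xs = np.array([-1., 1., 2., 3.])
ys = np.array([0.5, -1., -0.5, 2.])

B, C, D = best_fit_parabola(xs, ys)
print(B, C, D)  # 53/88, -379/440, -41/44 (approximately 0.602, -0.861, -0.932)
```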


    Example \(\PageIndex{8}\): Best-fit linear function

    Find the linear function \(f(x,y)\) that best approximates the following data:

    \[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 1 & 0 & 0 \\ 0 & 1 & 1 \\ -1 & 0 & 3 \\ 0 & -1 & 4 \end{array} \nonumber \]

    What quantity is being minimized?

    Solution

    The general equation for a linear function in two variables is

    \[ f(x,y) = Bx + Cy + D. \nonumber \]

    We want to solve the following system of equations in the unknowns \(B,C,D\text{:}\)

    \[\begin{align} B(1)+C(0)+D&=0 \nonumber \\ B(0)+C(1)+D&=1 \nonumber \\ B(-1)+C(0)+D&=3\label{eq:3} \\ B(0)+C(-1)+D&=4\nonumber\end{align}\]

    In matrix form, we can write this as \(Ax=b\) for

    \[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right)\qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right)\qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

    We observe that the columns \(u_1,u_2,u_3\) of \(A\) are orthogonal, so we can use Recipe 2: Compute a Least-Squares Solution:

    \[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber \]

    Alternatively, we can find a least-squares solution by multiplying both sides by the transpose:

    \[ A^TA = \left(\begin{array}{ccc}2&0&0\\0&2&0\\0&0&4\end{array}\right)\qquad A^Tb = \left(\begin{array}{c}-3\\-3\\8\end{array}\right). \nonumber \]

    The matrix \(A^TA\) is diagonal (do you see why that happened?), so it is easy to solve the equation \(A^TAx = A^Tb\text{:}\)

    \[ \left(\begin{array}{ccc|c}2&0&0&-3 \\ 0&2&0&-3 \\ 0&0&4&8\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{ccc|c}1&0&0&-3/2 \\ 0&1&0&-3/2 \\ 0&0&1&2\end{array}\right)\implies \hat x = \left(\begin{array}{c}-3/2 \\ -3/2 \\ 2\end{array}\right).\nonumber \]

    Therefore, the best-fit linear equation is

    \[ f(x,y) = -\frac 32x - \frac32y + 2. \nonumber \]

    Here is a picture of the graph of \(f(x,y)\text{:}\)


    Figure \(\PageIndex{17}\)

    Now we consider what quantity is being minimized by the function \(f(x,y)\). The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\). The vector \(b\) is the right-hand side of \(\eqref{eq:3}\), and

    \[A\hat{x}=\left(\begin{array}{rrrrr}-\frac32(1)&-& \frac32(0)&+& 2\\-\frac32(0)&-& \frac32(1) &+& 2\\-\frac32(-1) &-& \frac32(0) &+& 2\\-\frac32(0) &-& \frac32(-1) &+& 2\end{array}\right)=\left(\begin{array}{c}f(1,0) \\ f(0,1) \\ f(-1,0) \\ f(0,-1)\end{array}\right).\nonumber\]

    In other words, \(A\hat x\) is the vector whose entries are the values of \(f\) evaluated on the points \((x,y)\) we specified in our data table, and \(b\) is the vector whose entries are the desired values of \(f\) evaluated at those points. The difference \(b-A\hat x\) is the vertical distance of the graph from the data points, as indicated in the above picture. The best-fit linear function minimizes the sum of the squares of these vertical distances.


    All of the above examples have the following form: some number of data points \((x,y)\) are specified, and we want to find a function

    \[ y = B_1g_1(x) + B_2g_2(x) + \cdots + B_mg_m(x) \nonumber \]

    that best approximates these points, where \(g_1,g_2,\ldots,g_m\) are fixed functions of \(x\). Indeed, in the best-fit line example we had \(g_1(x)=x\) and \(g_2(x)=1\text{;}\) in the best-fit parabola example we had \(g_1(x)=x^2\text{,}\) \(g_2(x)=x\text{,}\) and \(g_3(x)=1\text{;}\) and in the best-fit linear function example we had \(g_1(x_1,x_2)=x_1\text{,}\) \(g_2(x_1,x_2)=x_2\text{,}\) and \(g_3(x_1,x_2)=1\) (in this example we take \(x\) to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns \(B_1,B_2,\ldots,B_m\)—once we evaluate the \(g_i\text{,}\) they just become numbers, so it does not matter what they are—and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of \(y = f(x)\) to our original data points.
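
    This framework is easy to implement once and reuse: evaluate each \(g_i\) at each data point to build the columns of \(A\text{,}\) then solve the resulting least-squares problem. Here is a sketch assuming NumPy; the helper `fit_coefficients` is an illustrative name, not a standard function:

```python
import numpy as np

def fit_coefficients(gs, xs, ys):
    """Least-squares coefficients B_1, ..., B_m for y = sum_i B_i * g_i(x).

    gs: list of functions g_i; xs: data inputs; ys: observed outputs.
    """
    # Column i of A holds g_i evaluated at every data point.
    A = np.column_stack([np.array([g(x) for x in xs]) for g in gs])
    return np.linalg.lstsq(A, np.asarray(ys, dtype=float), rcond=None)[0]

# Recovering the best-fit line from the earlier example: g_1(x) = x, g_2(x) = 1.
coeffs = fit_coefficients([lambda x: x, lambda x: 1.0], [0, 1, 2], [6, 0, 0])
print(coeffs)  # [-3.  5.]
```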

    To emphasize that the nature of the functions \(g_i\) really is irrelevant, consider the following example.

    Example \(\PageIndex{9}\): Best-fit trigonometric function

    What is the best-fit function of the form

    \[ y=B+C\cos(x)+D\sin(x)+E\cos(2x)+F\sin(2x)+G\cos(3x)+H\sin(3x) \nonumber \]

    passing through the points

    \[ \left(\begin{array}{c}-4\\ -1\end{array}\right),\;\left(\begin{array}{c}-3\\ 0\end{array}\right),\; \left(\begin{array}{c}-2\\ -1.5\end{array}\right),\; \left(\begin{array}{c}-1\\ .5\end{array}\right),\; \left(\begin{array}{c}0\\1\end{array}\right),\; \left(\begin{array}{c}1\\-1\end{array}\right),\; \left(\begin{array}{c}2\\-.5\end{array}\right),\; \left(\begin{array}{c}3\\2\end{array}\right),\; \left(\begin{array}{c}4 \\-1\end{array}\right)? \nonumber \]


    Figure \(\PageIndex{19}\)

    Solution

    We want to solve the system of equations

    \[\begin{array}{rrrrrrrrrrrrrrr}-1 &=& B &+& C\cos(-4) &+& D\sin(-4) &+& E\cos(-8) &+& F\sin(-8) &+& G\cos(-12) &+& H\sin(-12)\\0 &=& B &+& C\cos(-3) &+& D\sin(-3) &+& E\cos(-6) &+& F\sin(-6) &+& G\cos(-9) &+& H\sin(-9)\\-1.5 &=& B &+& C\cos(-2) &+& D\sin(-2) &+& E\cos(-4) &+& F\sin(-4) &+& G\cos(-6) &+& H\sin(-6) \\ 0.5 &=& B &+& C\cos(-1) &+& D\sin(-1) &+& E\cos(-2) &+& F\sin(-2) &+& G\cos(-3) &+& H\sin(-3)\\1 &=& B &+& C\cos(0) &+& D\sin(0) &+& E\cos(0) &+& F\sin(0) &+& G\cos(0) &+& H\sin(0)\\-1 &=& B &+& C\cos(1) &+& D\sin(1) &+& E\cos(2) &+& F\sin(2) &+& G\cos(3) &+& H\sin(3)\\-0.5 &=& B &+& C\cos(2) &+& D\sin(2) &+& E\cos(4) &+& F\sin(4) &+& G\cos(6) &+& H\sin(6)\\2 &=& B &+& C\cos(3) &+& D\sin(3) &+& E\cos(6) &+& F\sin(6) &+& G\cos(9) &+& H\sin(9)\\-1 &=& B &+& C\cos(4) &+& D\sin(4) &+& E\cos(8) &+& F\sin(8) &+& G\cos(12) &+& H\sin(12).\end{array}\nonumber\]

    All of the terms in these equations are numbers, except for the unknowns \(B,C,D,E,F,G,H\text{:}\)

    \[\begin{array}{rrrrrrrrrrrrrrr}-1 &=& B &-&0.6536C&+& 0.7568D &-& 0.1455E &-& 0.9894F &+& 0.8439G &+& 0.5366H\\0&=& B &-& 0.9900C &-& 0.1411D &+& 0.9602E &+& 0.2794F &-& 0.9111G&-& 0.4121H\\-1.5 &=& B &-& 0.4161C &-& 0.9093D &-& 0.6536E &+& 0.7568F &+& 0.9602G &+& 0.2794H\\0.5 &=& B &+& 0.5403C &-& 0.8415D &-& 0.4161E &-& 0.9093F &-& 0.9900G &-& 0.1411H\\1&=&B&+&C&{}&{}&+&E&{}&{}&+&G&{}&{}\\-1 &=& B &+& 0.5403C &+& 0.8415D &-& 0.4161E &+& 0.9093F &-& 0.9900G &+& 0.1411H\\-0.5&=& B &-& 0.4161C &+& 0.9093D &-& 0.6536E &-& 0.7568F &+& 0.9602G &-& 0.2794H\\2 &=& B &-& 0.9900C &+& 0.1411D &+& 0.9602E &-& 0.2794F &-& 0.9111G &+& 0.4121H\\-1 &=& B &-& 0.6536C &-& 0.7568D &-& 0.1455E &+& 0.9894F &+& 0.8439G &-& 0.5366H.\end{array}\nonumber\]

    Hence we want to solve the least-squares problem

    \[\left(\begin{array}{rrrrrrr}1 &-0.6536& 0.7568& -0.1455& -0.9894& 0.8439 &0.5366\\1& -0.9900& -0.1411 &0.9602 &0.2794& -0.9111& -0.4121\\1& -0.4161& -0.9093& -0.6536& 0.7568& 0.9602& 0.2794\\1& 0.5403& -0.8415&-0.4161& -0.9093& -0.9900 &-0.1411\\1& 1& 0& 1& 0& 1& 0\\1& 0.5403& 0.8415& -0.4161& 0.9093& -0.9900 &0.1411\\1& -0.4161& 0.9093& -0.6536& -0.7568& 0.9602& -0.2794\\1& -0.9900 &0.1411 &0.9602& -0.2794& -0.9111& 0.4121\\1& -0.6536& -0.7568& -0.1455& 0.9894 &0.8439 &-0.5366\end{array}\right)\left(\begin{array}{c}B\\C\\D\\E\\F\\G\\H\end{array}\right)=\left(\begin{array}{c}-1\\0\\-1.5\\0.5\\1\\-1\\-0.5\\2\\-1\end{array}\right).\nonumber\]

    We find the least-squares solution with the aid of a computer:

    \[\hat{x}\approx\left(\begin{array}{c}-0.1435 \\0.2611 \\-0.2337\\ 1.116\\ -0.5997\\ -0.2767 \\0.1076\end{array}\right).\nonumber\]
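
    For reference, one way to carry out that computer calculation, as a sketch assuming NumPy:

```python
import numpy as np

xs = np.array([-4., -3., -2., -1., 0., 1., 2., 3., 4.])
ys = np.array([-1., 0., -1.5, 0.5, 1., -1., -0.5, 2., -1.])

# Columns: 1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x),
# matching the unknowns B, C, D, E, F, G, H in that order.
A = np.column_stack([np.ones_like(xs),
                     np.cos(xs), np.sin(xs),
                     np.cos(2 * xs), np.sin(2 * xs),
                     np.cos(3 * xs), np.sin(3 * xs)])

x_hat = np.linalg.lstsq(A, ys, rcond=None)[0]
print(x_hat)  # compare with the approximate values displayed above
```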

    Therefore, the best-fit function is

    \[ \begin{split} y &\approx -0.1435 + 0.2611\cos(x) -0.2337\sin(x) + 1.116\cos(2x) -0.5997\sin(2x) \\ &\qquad\qquad -0.2767\cos(3x) + 0.1076\sin(3x). \end{split} \nonumber \]


    Figure \(\PageIndex{20}\)

    As in the previous examples, the best-fit function minimizes the sum of the squares of the vertical distances from the graph of \(y = f(x)\) to the data points.


    The next example has a somewhat different flavor from the previous ones.

    Example \(\PageIndex{10}\): Best-fit ellipse

    Find the best-fit ellipse through the points

    \[ (0,2),\, (2,1),\, (1,-1),\, (-1,-2),\, (-3,1),\, (-1,-1). \nonumber \]


    Figure \(\PageIndex{22}\)

    What quantity is being minimized?

    Solution

    The general equation for an ellipse (actually, for a nondegenerate conic section) is

    \[ x^2 + By^2 + Cxy + Dx + Ey + F = 0. \nonumber \]

    This is an implicit equation: the ellipse is the set of all solutions of the equation, just like the unit circle is the set of solutions of \(x^2+y^2=1.\) To say that our data points lie on the ellipse means that the above equation is satisfied for the given values of \(x\) and \(y\text{:}\)

    \[\label{eq:4} \begin{array}{rrrrrrrrrrrrl} (0)^2 &+& B(2)^2 &+& C(0)(2)&+& D(0) &+& E(2)&+& F&=& 0 \\ (2)^2 &+& B(1)^2 &+& C(2)(1)&+& D(2)&+&E(1)&+&F&=& 0 \\ (1)^2&+& B(-1)^2&+&C(1)(-1)&+&D(1)&+&E(-1)&+&F&=&0 \\ (-1)^2&+&B(-2)^2&+&C(-1)(-2)&+&D(-1)&+&E(-2)&+&F&=&0 \\ (-3)^2&+&B(1)^2&+&C(-3)(1)&+&D(-3)&+&E(1)&+&F&=&0 \\ (-1)^2&+&B(-1)^2&+&C(-1)(-1)&+&D(-1)&+&E(-1)&+&F&=&0.\end{array}\]

    To put this in matrix form, we move the constant terms to the right-hand side of the equals sign; then we can write this as \(Ax=b\) for

    \[A=\left(\begin{array}{ccccc}4&0&0&2&1\\1&2&2&1&1\\1&-1&1&-1&1 \\ 4&2&-1&-2&1 \\ 1&-3&-3&1&1 \\ 1&1&-1&-1&1\end{array}\right)\quad x=\left(\begin{array}{c}B\\C\\D\\E\\F\end{array}\right)\quad b=\left(\begin{array}{c}0\\-4\\-1\\-1\\-9\\-1\end{array}\right).\nonumber\]

    We compute

    \[ A^TA = \left(\begin{array}{ccccc}36&7&-5&0&12 \\ 7&19&9&-5&1 \\ -5&9&16&1&-2 \\ 0&-5&1&12&0 \\ 12&1&-2&0&6\end{array}\right) \qquad A^T b = \left(\begin{array}{c}-19\\17\\20\\-9\\-16\end{array}\right). \nonumber \]

    We form an augmented matrix and row reduce:

    \[\left(\begin{array}{ccccc|c} 36 &7& -5& 0& 12&-19\\7& 19& 9& -5& 1& 17\\-5& 9& 16& 1& -2& 20\\0& -5& 1&12& 0& -9\\12& 1& -2& 0& 6& -16\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccccc|c}1& 0& 0& 0& 0& 405/266\\0& 1& 0& 0& 0& -89/133\\0& 0& 1& 0& 0& 201/133\\0& 0& 0& 1& 0& -123/266\\0& 0& 0& 0& 1& -687/133\end{array}\right).\nonumber\]

    The least-squares solution is

    \[\hat{x}=\left(\begin{array}{c}405/266\\ -89/133\\ 201/133\\ -123/266\\ -687/133\end{array}\right),\nonumber\]

    so the best-fit ellipse is

    \[ x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} = 0. \nonumber \]

    Multiplying through by \(266\text{,}\) we can write this as

    \[ 266 x^2 + 405 y^2 - 178 xy + 402 x - 123 y - 1374 = 0. \nonumber \]


    Figure \(\PageIndex{23}\)

    Now we consider the question of what quantity is minimized by this ellipse. The least-squares solution \(\hat x\) minimizes the sum of the squares of the entries of the vector \(b-A\hat x\text{,}\) or equivalently, of \(A\hat x-b\). The vector \(-b\) contains the constant terms of the left-hand sides of \(\eqref{eq:4}\), and

    \[A\hat{x}=\left(\begin{array}{rrrrrrrrr} \frac{405}{266}(2)^2 &-& \frac{89}{133}(0)(2)&+&\frac{201}{133}(0)&-&\frac{123}{266}(2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-& \frac{89}{133}(2)(1)&+&\frac{201}{133}(2)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2 &-&\frac{89}{133}(1)(-1)&+&\frac{201}{133}(1)&-&\frac{123}{266}(-1)&-&\frac{687}{133} \\ \frac{405}{266}(-2)^2&-&\frac{89}{133}(-1)(-2)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-&\frac{89}{133}(-3)(1)&+&\frac{201}{133}(-3)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2&-&\frac{89}{133}(-1)(-1)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-1)&-&\frac{687}{133}\end{array}\right)\nonumber\]

    contains the rest of the terms on the left-hand side of \(\eqref{eq:4}\). Therefore, the entries of \(A\hat x-b\) are the quantities obtained by evaluating the function

    \[ f(x,y) = x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} \nonumber \]

    on the given data points.

    If our data points actually lay on the ellipse defined by \(f(x,y)=0\text{,}\) then evaluating \(f(x,y)\) on our data points would always yield zero, so \(A\hat x-b\) would be the zero vector. This is not the case; instead, \(A\hat x-b\) contains the actual values of \(f(x,y)\) when evaluated on our data points. The quantity being minimized is the sum of the squares of these values:

    \[ \begin{split} &\text{minimized} = \\ &\quad f(0,2)^2 + f(2,1)^2 + f(1,-1)^2 + f(-1,-2)^2 + f(-3,1)^2 + f(-1,-1)^2. \end{split} \nonumber \]

    One way to visualize this is as follows. We can put this best-fit problem into the framework of Example \(\PageIndex{8}\) by asking to find an equation of the form

    \[ f(x,y) = x^2 + By^2 + Cxy + Dx + Ey + F \nonumber \]

    which best approximates the data table

    \[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 0 & 2 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 0 \\ -1 & -2 & 0 \\ -3 & 1 & 0 \\ -1 & -1 & 0\rlap. \end{array} \nonumber \]

    The resulting function minimizes the sum of the squares of the vertical distances from these data points \((0,2,0),\,(2,1,0),\,\ldots\text{,}\) which lie on the \(xy\)-plane, to the graph of \(f(x,y)\).
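
    The corresponding computation is again a routine least-squares solve once \(A\) and \(b\) have been assembled from the data points, as a sketch assuming NumPy:

```python
import numpy as np

points = [(0, 2), (2, 1), (1, -1), (-1, -2), (-3, 1), (-1, -1)]

# For each point, the row of A holds (y^2, xy, x, y, 1) and the entry of b
# is -x^2, coming from  x^2 + B*y^2 + C*xy + D*x + E*y + F = 0.
A = np.array([[y**2, x * y, x, y, 1.0] for (x, y) in points])
b = np.array([-float(x)**2 for (x, y) in points])

B, C, D, E, F = np.linalg.lstsq(A, b, rcond=None)[0]
print(B, C, D, E, F)  # 405/266, -89/133, 201/133, -123/266, -687/133
```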


    Note \(\PageIndex{3}\)

    Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.

