# Least Squares

# Least squares and the normal equations

# inconsistent systems of equations

A system of equations with no solution is called inconsistent.

Since an inconsistent system $Ax = b$ has no exact solution, we instead look for the $x$ that makes $Ax$ as close as possible to $b$. If we choose "closeness" to mean closeness in Euclidean distance, there is a straightforward algorithm for finding this closest $x$. This special $x$ is called the least squares solution.

🌟 Normal equations for least squares

given the inconsistent system

$$Ax = b$$

solve

$$A^TA\bar{x} = A^Tb$$

proof

$$
\begin{align}
(b - A\bar{x}) &\perp \{Ax \mid x\in R^n\} \\
(Ax)^T(b-A\bar{x}) &= 0 \\
x^TA^T(b-A\bar{x}) &= 0 \text{ for all } x \text{ in } R^n \\
A^T(b-A\bar{x}) &= 0 \\
A^TA\bar{x} &= A^Tb
\end{align}
$$

(the step from the third line to the fourth holds because the only vector orthogonal to every $x$ in $R^n$ is the zero vector)
residual

$$r = b - A\bar{x}$$

2-norm

$$||r||_2 = \sqrt{r_1^2+\cdots+r_m^2}$$

squared error

$$SE = r_1^2 + \cdots + r_m^2$$

root mean squared error

$$RMSE = \sqrt{SE/m} = \sqrt{(r_1^2+\cdots +r_m^2)/m}$$
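
As a minimal sketch of how this looks in code (NumPy, with a made-up 3×2 inconsistent system), solving the normal equations and computing the error measures above:

```python
import numpy as np

# made-up 3x2 inconsistent system for illustration
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 1.0, 3.0])

# normal equations: (A^T A) x_bar = A^T b
x_bar = np.linalg.solve(A.T @ A, A.T @ b)

# residual and the three error measures
r = b - A @ x_bar
two_norm = np.linalg.norm(r)       # ||r||_2
se = np.sum(r**2)                  # squared error
rmse = np.sqrt(se / len(b))        # root mean squared error
print(x_bar, two_norm, se, rmse)
```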

# Fitting models to data

Fitting data by least squares

  • choose a model, such as the line $y = c_1 + c_2 t$
  • force the model to fit the data: each data point creates an equation whose unknowns are the parameters, such as $c_1$ and $c_2$ in the line model. This results in a system $Ax = b$, where the unknown $x$ represents the unknown parameters (see the sketch after this list)
  • solve the normal equations
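
A short sketch of the three steps for the line model, with hypothetical data points $(t_i, y_i)$:

```python
import numpy as np

# hypothetical data points (t_i, y_i)
t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

# step 2: each point gives the equation c1 + c2*t_i = y_i,
# so row i of A is [1, t_i] and b holds the y_i
A = np.column_stack([np.ones_like(t), t])
b = y

# step 3: solve the normal equations for x = (c1, c2)
c1, c2 = np.linalg.solve(A.T @ A, A.T @ b)
print(c1, c2)
```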

# A survey of models

# periodic data

model example

$$y = c_1 + c_2 \cos 2\pi t + c_3\sin 2\pi t + c_4 \cos 4\pi t$$
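
The model is still linear in the parameters $c_1, \dots, c_4$, so the same recipe applies; a sketch with hypothetical data:

```python
import numpy as np

# hypothetical periodic measurements (t_i, y_i)
t = np.linspace(0.0, 0.75, 7)
y = np.array([2.0, 1.5, 0.5, 0.3, 0.8, 1.4, 2.1])

# row i of A is [1, cos 2*pi*t_i, sin 2*pi*t_i, cos 4*pi*t_i]
A = np.column_stack([np.ones_like(t),
                     np.cos(2 * np.pi * t),
                     np.sin(2 * np.pi * t),
                     np.cos(4 * np.pi * t)])

c = np.linalg.solve(A.T @ A, A.T @ y)  # normal equations
print(c)  # c1, c2, c3, c4
```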

# Data linearization

# exponential model

$$y = c_1 e^{c_2t}$$

cannot be fit directly by least squares, because $c_2$ does not appear linearly in the model equation

"linearizing" the model

$$\ln y = \ln(c_1e^{c_2t}) = \ln c_1 + c_2t$$

the original least squares problem is to fit the data directly: find the $c_1, c_2$ that minimize

$$(c_1e^{c_2t_1}-y_1)^2 + \cdots + (c_1e^{c_2t_m} - y_m)^2$$

after linearizing, the problem is instead to find the $c_1, c_2$ that minimize

$$(\ln c_1 + c_2t_1 - \ln y_1)^2 + \cdots + (\ln c_1 + c_2t_m - \ln y_m)^2$$

which is linear in the unknowns $\ln c_1$ and $c_2$, so the normal equations apply
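
A sketch of the linearized fit with hypothetical data, using the substitution $k = \ln c_1$ so the unknowns enter linearly:

```python
import numpy as np

# hypothetical data; y must be positive for ln(y) to exist
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 8.3, 16.2])

# substituting k = ln(c1), the model becomes ln(y) = k + c2*t,
# an ordinary line fit in the (t, ln y) plane
A = np.column_stack([np.ones_like(t), t])
k, c2 = np.linalg.solve(A.T @ A, A.T @ np.log(y))
c1 = np.exp(k)   # undo the substitution
print(c1, c2)
```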

# power law model

$$y = c_1 t^{c_2}$$

$$\ln y = \ln c_1 + c_2 \ln t$$
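
The same trick, sketched with hypothetical data; here both $t$ and $y$ must be positive for the logarithms to exist:

```python
import numpy as np

# hypothetical data with t > 0 and y > 0
t = np.array([1.0, 2.0, 4.0, 8.0])
y = np.array([3.1, 5.8, 12.2, 23.9])

# line fit in (ln t, ln y): intercept is ln(c1), slope is c2
A = np.column_stack([np.ones_like(t), np.log(t)])
k, c2 = np.linalg.solve(A.T @ A, A.T @ np.log(y))
c1 = np.exp(k)
print(c1, c2)
```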

it is important to realize that model linearization changes the least squares problem. The solution obtained minimizes the RMSE of the linearized problem, not necessarily that of the original problem

# QR factorization

# Gram-Schmidt orthogonalization and least squares

$$
\begin{align}
y_j &= A_j - q_1(q_1^TA_j) - q_2 (q_2^T A_j) - \cdots - q_{j-1}(q_{j-1}^T A_j) \\
q_j &= \frac{y_j}{||y_j||_2} \\
(A_1|\cdots|A_n) &= (q_1| \cdots |q_n)\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}
\end{align}
$$

where $r_{jj} = ||y_j||_2$ and $r_{ij} = q_i^TA_j$ for $i < j$
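
A sketch of classical Gram-Schmidt as an implementation of these formulas (assuming $A$ has full column rank, so no $y_j$ is zero):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR following the formulas above:
    returns Q with orthonormal columns and upper-triangular R
    such that A = QR (assumes A has full column rank)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        y = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = q_i^T A_j
            y -= R[i, j] * Q[:, i]        # subtract projection onto q_i
        R[j, j] = np.linalg.norm(y)       # r_jj = ||y_j||_2
        Q[:, j] = y / R[j, j]             # q_j = y_j / ||y_j||_2
    return Q, R
```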

geometry of Gram-Schmidt

$$
y_2 = A_2 - q_1(q_1^TA_2) \qquad q_2 = \frac{y_2}{||y_2||_2}
$$

(the subtracted term is the projection of $A_2$ onto the unit vector $q_1$)

a square matrix $Q$ is orthogonal if $Q^T = Q^{-1}$
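
The heading pairs Gram-Schmidt with least squares; the standard connection, sketched here though not spelled out above, is that substituting $A = QR$ into the normal equations and using $Q^TQ = I$ reduces them to the triangular system $R\bar{x} = Q^Tb$:

```python
import numpy as np

# reuses gram_schmidt_qr from the sketch above,
# on the same made-up system as earlier
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 1.0, 3.0])

Q, R = gram_schmidt_qr(A)
x_bar = np.linalg.solve(R, Q.T @ b)   # R x_bar = Q^T b
print(x_bar)
```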