# Least Squares

# Least squares and the normal equations

# inconsistent systems of equations

A system of equations with no solution is called inconsistent.

Since an inconsistent system $Ax = b$ has no exact solution, we instead look for the $x$ that makes $Ax$ as close as possible to $b$. If we choose "closeness" to mean closeness in Euclidean distance, there is a straightforward algorithm for finding this closest $x$. This special $x$ is called the least squares solution.

🌟 Normal equations for least squares

given the inconsistent system

$$Ax = b$$

solve

$$A^TA\bar{x} = A^Tb$$

proof

$$
\begin{align}
(b - A\bar{x}) &\perp \{Ax \mid x\in R^n\} \\
(Ax)^T(b-A\bar{x}) &= 0 \\
x^TA^T(b-A\bar{x}) &= 0 \text{ for all } x \text{ in } R^n \\
A^T(b-A\bar{x}) &= 0 \\
A^TA\bar{x} &= A^Tb
\end{align}
$$

(the step from the third line to the fourth holds because the only vector orthogonal to every $x$ in $R^n$ is the zero vector)
residual

$$r = b - A\bar{x}$$

2-norm

$$||r||_2 = \sqrt{r_1^2+\cdots+r_m^2}$$

squared error

$$SE = r_1^2 + \cdots + r_m^2$$

root mean squared error

$$RMSE = \sqrt{SE/m} = \sqrt{(r_1^2+\cdots +r_m^2)/m}$$
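
As a minimal sketch of how this looks in code (NumPy, with a made-up 3×2 inconsistent system), solving the normal equations and computing the error measures above:

```python
import numpy as np

# made-up 3x2 inconsistent system for illustration
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 1.0, 3.0])

# normal equations: (A^T A) x_bar = A^T b
x_bar = np.linalg.solve(A.T @ A, A.T @ b)

# residual and the three error measures
r = b - A @ x_bar
two_norm = np.linalg.norm(r)       # ||r||_2
se = np.sum(r**2)                  # squared error
rmse = np.sqrt(se / len(b))        # root mean squared error
print(x_bar, two_norm, se, rmse)
```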

# Fitting models to data

Fitting data by least squares

  • choose a model, such as the line $y = c_1 + c_2 t$
  • force the model to fit the data: each data point creates an equation whose unknowns are the parameters, such as $c_1$ and $c_2$ in the line model. This results in a system $Ax = b$, where the unknown $x$ represents the unknown parameters (see the sketch after this list)
  • solve the normal equations
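
A short sketch of the three steps for the line model, with hypothetical data points $(t_i, y_i)$:

```python
import numpy as np

# hypothetical data points (t_i, y_i)
t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

# step 2: each point gives the equation c1 + c2*t_i = y_i,
# so row i of A is [1, t_i] and b holds the y_i
A = np.column_stack([np.ones_like(t), t])
b = y

# step 3: solve the normal equations for x = (c1, c2)
c1, c2 = np.linalg.solve(A.T @ A, A.T @ b)
print(c1, c2)
```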

# A survey of models

# periodic data

model example

$$y = c_1 + c_2 \cos 2\pi t + c_3\sin 2\pi t + c_4 \cos 4\pi t$$
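
The model is still linear in the parameters $c_1, \dots, c_4$, so the same recipe applies; a sketch with hypothetical data:

```python
import numpy as np

# hypothetical periodic measurements (t_i, y_i)
t = np.linspace(0.0, 0.75, 7)
y = np.array([2.0, 1.5, 0.5, 0.3, 0.8, 1.4, 2.1])

# row i of A is [1, cos 2*pi*t_i, sin 2*pi*t_i, cos 4*pi*t_i]
A = np.column_stack([np.ones_like(t),
                     np.cos(2 * np.pi * t),
                     np.sin(2 * np.pi * t),
                     np.cos(4 * np.pi * t)])

c = np.linalg.solve(A.T @ A, A.T @ y)  # normal equations
print(c)  # c1, c2, c3, c4
```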

# Data linearization

# exponential model

$$y = c_1 e^{c_2t}$$

cannot be fit directly by least squares, because $c_2$ does not appear linearly in the model equation

"linearizing" the model

$$\ln y = \ln(c_1e^{c_2t}) = \ln c_1 + c_2t$$

the original least squares problem is to fit the data directly: find the $c_1, c_2$ that minimize

$$(c_1e^{c_2t_1}-y_1)^2 + \cdots + (c_1e^{c_2t_m} - y_m)^2$$

after linearizing, the problem is instead to find the $c_1, c_2$ that minimize

$$(\ln c_1 + c_2t_1 - \ln y_1)^2 + \cdots + (\ln c_1 + c_2t_m - \ln y_m)^2$$

which is linear in the unknowns $\ln c_1$ and $c_2$, so the normal equations apply
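
A sketch of the linearized fit with hypothetical data, using the substitution $k = \ln c_1$ so the unknowns enter linearly:

```python
import numpy as np

# hypothetical data; y must be positive for ln(y) to exist
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 8.3, 16.2])

# substituting k = ln(c1), the model becomes ln(y) = k + c2*t,
# an ordinary line fit in the (t, ln y) plane
A = np.column_stack([np.ones_like(t), t])
k, c2 = np.linalg.solve(A.T @ A, A.T @ np.log(y))
c1 = np.exp(k)   # undo the substitution
print(c1, c2)
```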

# power law model

$$y = c_1 t^{c_2}$$

$$\ln y = \ln c_1 + c_2 \ln t$$
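
The same trick, sketched with hypothetical data; here both $t$ and $y$ must be positive for the logarithms to exist:

```python
import numpy as np

# hypothetical data with t > 0 and y > 0
t = np.array([1.0, 2.0, 4.0, 8.0])
y = np.array([3.1, 5.8, 12.2, 23.9])

# line fit in (ln t, ln y): intercept is ln(c1), slope is c2
A = np.column_stack([np.ones_like(t), np.log(t)])
k, c2 = np.linalg.solve(A.T @ A, A.T @ np.log(y))
c1 = np.exp(k)
print(c1, c2)
```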

it is important to realize that model linearization changes the least squares problem. The solution obtained minimizes the RMSE of the linearized problem, not necessarily that of the original problem

# QR factorization

# Gram-Schmidt orthogonalization and least squares

$$
\begin{align}
y_j &= A_j - q_1(q_1^TA_j) - q_2 (q_2^T A_j) - \cdots - q_{j-1}(q_{j-1}^T A_j) \\
q_j &= \frac{y_j}{||y_j||_2} \\
(A_1|\cdots|A_n) &= (q_1| \cdots |q_n)\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}
\end{align}
$$

where $r_{jj} = ||y_j||_2$ and $r_{ij} = q_i^TA_j$ for $i < j$
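
A sketch of classical Gram-Schmidt as an implementation of these formulas (assuming $A$ has full column rank, so no $y_j$ is zero):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR following the formulas above:
    returns Q with orthonormal columns and upper-triangular R
    such that A = QR (assumes A has full column rank)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        y = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = q_i^T A_j
            y -= R[i, j] * Q[:, i]        # subtract projection onto q_i
        R[j, j] = np.linalg.norm(y)       # r_jj = ||y_j||_2
        Q[:, j] = y / R[j, j]             # q_j = y_j / ||y_j||_2
    return Q, R
```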

geometry of Gram-Schmidt

$$
y_2 = A_2 - q_1(q_1^TA_2) \qquad q_2 = \frac{y_2}{||y_2||_2}
$$

(the subtracted term is the projection of $A_2$ onto the unit vector $q_1$)

a square matrix $Q$ is orthogonal if $Q^T = Q^{-1}$
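
The heading pairs Gram-Schmidt with least squares; the standard connection, sketched here though not spelled out above, is that substituting $A = QR$ into the normal equations and using $Q^TQ = I$ reduces them to the triangular system $R\bar{x} = Q^Tb$:

```python
import numpy as np

# reuses gram_schmidt_qr from the sketch above,
# on the same made-up system as earlier
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 1.0, 3.0])

Q, R = gram_schmidt_qr(A)
x_bar = np.linalg.solve(R, Q.T @ b)   # R x_bar = Q^T b
print(x_bar)
```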