# Least Squares
## Least squares and the normal equations
### Inconsistent systems of equations

A system of equations with no solution is called **inconsistent**.
If we choose "closeness" to mean closeness in Euclidean distance, there is a straightforward algorithm for finding the closest $x$. This special $x$ is called the **least squares solution**.
🌟 Normal equations for least squares
Given the inconsistent system $Ax = b$, solve

$$
A^TA\bar{x} = A^Tb
$$

for the least squares solution $\bar{x}$.
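As a quick illustration (the 3×2 system below is made up for the sketch), the normal equations can be solved directly with NumPy:

```python
import numpy as np

# A hypothetical inconsistent system: 3 equations, 2 unknowns
A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  0.0]])
b = np.array([2.0, 1.0, 3.0])

# Normal equations: A^T A xbar = A^T b
xbar = np.linalg.solve(A.T @ A, A.T @ b)
print(xbar)  # least squares solution, here [2.0, 0.5]
```

In practice `np.linalg.lstsq(A, b, rcond=None)` is usually preferred, since forming $A^TA$ explicitly squares the condition number of the problem.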
**Proof.** The residual $b - A\bar{x}$ of the least squares solution must be orthogonal to the set of all vectors $Ax$:

$$
\begin{align}
& (b - A\bar{x}) \perp \{Ax|x\in R^n\} \\
& (Ax)^T(b-A\bar{x}) = 0 \\
& x^TA^T(b-A\bar{x}) = 0 \text{ for all x in } R^n \\
& A^T(b-A\bar{x})=0 \\
& A^TA\bar{x} = A^Tb
\end{align}
$$

The **residual** of the least squares solution is

$$
r = b - A\bar{x}
$$

Its size can be measured by the **2-norm**
$$
||r||_2 = \sqrt{r_1^2+\cdots+r_m^2}
$$

the **squared error**

$$
SE = r_1^2 + \cdots + r_m^2
$$
or the **root mean squared error**

$$
RMSE = \sqrt{SE/m} = \sqrt{(r_1^2+\cdots +r_m^2)/m}
$$
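Continuing the hypothetical system from the sketch above, all three error measures come straight from the residual:

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  0.0]])
b = np.array([2.0, 1.0, 3.0])
xbar = np.linalg.solve(A.T @ A, A.T @ b)

r = b - A @ xbar              # residual of the least squares solution
m = len(b)
norm_2 = np.linalg.norm(r)    # ||r||_2
SE = np.sum(r**2)             # squared error
RMSE = np.sqrt(SE / m)        # root mean squared error
print(norm_2, SE, RMSE)
```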
## Fitting models to data
### Fitting data by least squares
1. Choose a model, such as $y = c_1 + c_2 t$.
2. Force the model to fit the data: each data point creates an equation whose unknowns are the parameters, such as $c_1$ and $c_2$ in the line model. This results in a system $Ax = b$, where the unknown $x$ represents the unknown parameters.
3. Solve the normal equations (a minimal sketch follows this list).
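A sketch of these three steps for the line model, with made-up data points $(t_i, y_i)$:

```python
import numpy as np

# Made-up data points (t_i, y_i)
t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 2.9, 4.2, 4.8])

# Steps 1-2: the model y = c1 + c2*t turns each data point into an
# equation c1 + c2*t_i = y_i, so A has rows [1, t_i] and b holds the y_i.
A = np.column_stack([np.ones_like(t), t])

# Step 3: solve the normal equations for x = (c1, c2)
c1, c2 = np.linalg.solve(A.T @ A, A.T @ y)
print(c1, c2)  # intercept and slope of the best-fit line
```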
### A survey of models

**Periodic data model example:**

$$
y = c_1 + c_2 \cos 2\pi t + c_3\sin 2\pi t + c_4 \cos 4\pi t
$$
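The periodic model is still linear in the parameters $c_1, \dots, c_4$, so the same recipe applies; only the columns of $A$ change. A sketch with hypothetical samples over one period:

```python
import numpy as np

# Hypothetical data sampled at 8 points over one period
t = np.linspace(0.0, 1.0, 8, endpoint=False)
y = np.array([1.0, 1.6, 1.8, 1.4, 0.9, 0.4, 0.3, 0.6])

# Columns of A: 1, cos 2*pi*t, sin 2*pi*t, cos 4*pi*t
A = np.column_stack([np.ones_like(t),
                     np.cos(2 * np.pi * t),
                     np.sin(2 * np.pi * t),
                     np.cos(4 * np.pi * t)])
c = np.linalg.solve(A.T @ A, A.T @ y)
print(c)  # c1, c2, c3, c4
```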
### Data linearization

**Exponential model:**

$$
y = c_1 e^{c_2t}
$$

This model cannot be fit directly by least squares because $c_2$ does not appear linearly in the model equation.
"linearizing" the model
ln y = ln ( c 1 e c 2 t ) = ln c 1 + c 2 t
\ln y = \ln(c_1e^{c_2t}) = \ln c_1 + c_2t
ln y = ln ( c 1 e c 2 t ) = ln c 1 + c 2 t the original least squares problem was to fit the data , find the c 1 , c 2 c_1,c_2 c 1 , c 2 that minimize
$$
(c_1e^{c_2t_1}-y_1)^2 + \cdots + (c_1e^{c_2t_m} - y_m)^2
$$

For now, we instead minimize the squared error of the linearized model, whose residuals have the form

$$
(\ln c_1 + c_2t_1 - \ln y_1)
$$
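A sketch of the linearized exponential fit; the data are made up and must be positive for the logarithm to apply. We solve the linear problem in $(\ln c_1, c_2)$ and exponentiate to recover $c_1$:

```python
import numpy as np

# Made-up positive data, roughly following y = c1 * exp(c2 * t)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.5, 14.0, 40.0])

# Linearized model: ln y = k + c2*t, where k = ln c1
A = np.column_stack([np.ones_like(t), t])
k, c2 = np.linalg.solve(A.T @ A, A.T @ np.log(y))
c1 = np.exp(k)

# Caveat: (c1, c2) minimizes the error of the linearized problem in
# log space, not the RMSE of the original exponential model.
print(c1, c2)
```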
**Power law model:**

$$
y = c_1 t^{c_2}
$$

Taking logarithms linearizes it as well:

$$
\ln y = \ln c_1 + c_2 \ln t
$$

It is important to realize that model linearization changes the least squares problem. The solution obtained will minimize the RMSE with respect to the linearized problem, not necessarily the original problem.
## QR factorization
### Gram-Schmidt orthogonalization and least squares

Classical Gram-Schmidt orthogonalizes the columns $A_1, \dots, A_n$ of $A$ and yields the reduced QR factorization:

$$
\begin{align}
y_j &= A_j - q_1(q_1^TA_j) - q_2 (q_2^T A_j) - \cdots - q_{j-1}(q_{j-1}^T A_j) \\
q_j &= \frac{y_j}{||y_j||_2} \\
(A_1|\cdots|A_n) &= (q_1| \cdots |q_n)\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
& r_{22} & \cdots & r_{2n} \\
& & \ddots & \vdots \\
& & & r_{nn}
\end{bmatrix}
\end{align}
$$

where $r_{jj} = ||y_j||_2$ and $r_{ij} = q_i^T A_j$ for $i < j$.
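A minimal sketch of classical Gram-Schmidt as written above, assuming the columns of `A` are linearly independent (this is the textbook algorithm; modified Gram-Schmidt is numerically safer):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Reduced QR factorization A = QR by classical Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        y = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = q_i^T A_j
            y -= R[i, j] * Q[:, i]        # subtract the projection onto q_i
        R[j, j] = np.linalg.norm(y)       # r_jj = ||y_j||_2
        Q[:, j] = y / R[j, j]             # q_j = y_j / ||y_j||_2
    return Q, R

A = np.array([[1.0, -4.0],
              [2.0,  3.0],
              [2.0,  2.0]])
Q, R = classical_gram_schmidt(A)
print(np.allclose(Q @ R, A))  # True: the factorization reproduces A
```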
### Geometry of Gram-Schmidt
For the second column, subtracting from $A_2$ its projection onto the unit vector $q_1$ leaves the part of $A_2$ orthogonal to $q_1$:

$$
y_2 = A_2 - q_1\left(\frac{q_1^TA_2}{|q_1|}\right) = A_2 - q_1(q_1^TA_2) \\
q_2 = \frac{y_2}{||y_2||_2}
$$

A square matrix $Q$ is **orthogonal** if $Q^T = Q^{-1}$.
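Because $Q$ has orthonormal columns ($Q^TQ = I$), substituting $A = QR$ into the normal equations reduces them to $R\bar{x} = Q^Tb$, avoiding the explicit formation of $A^TA$. A sketch using NumPy's built-in reduced QR (the vector `b` is made up):

```python
import numpy as np

A = np.array([[1.0, -4.0],
              [2.0,  3.0],
              [2.0,  2.0]])
b = np.array([-3.0, 15.0, 9.0])

Q, R = np.linalg.qr(A)                   # reduced QR factorization
print(np.allclose(Q.T @ Q, np.eye(2)))   # columns of Q are orthonormal

# A^T A x = A^T b becomes R^T Q^T Q R x = R^T Q^T b, i.e. R x = Q^T b
xbar = np.linalg.solve(R, Q.T @ b)
print(xbar)  # least squares solution
```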