Ali K Esfahani
Projection Matrix and Least Square Problems
As discussed earlier, not every vector b ∈ ℝᵐ lies in the column space of a matrix A. When this happens, the equation Ax=b has no exact solution. Nevertheless, the orthogonality between the column space and the left null space allows us to find the best possible approximation to b within the column space of A.
As we previously explained, for any vector b ∈ ℝᵐ, there exists a unique orthogonal decomposition:
b = p + r,  where p ∈ C(A) and r ∈ N(Aᵀ)
The vector p is called the orthogonal projection of b onto the column space of A. It is the closest vector in C(A) to b in Euclidean distance.
Since p ∈ C(A), there exists some vector x′ such that

p = Ax′
where x′ is the best possible solution, in the sense that it produces the best possible approximation to b. The residual vector r can then be calculated as

r = b − p = b − Ax′
Since r lies in the left null space, we have Aᵀr = 0, that is,

Aᵀ(b − Ax′) = 0  ⟹  AᵀAx′ = Aᵀb
This equation is known as the normal equation.
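The normal equation can be checked numerically. The sketch below uses NumPy with a small made-up 3×2 matrix A (full column rank) and a vector b that is not in C(A); the specific numbers are illustrative assumptions, not from the text.

```python
import numpy as np

# Hypothetical tall matrix A (full column rank) and target b:
# Ax = b has no exact solution, so we solve the normal equation AᵀA x′ = Aᵀb.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_prime = np.linalg.solve(A.T @ A, A.T @ b)

# The residual r = b − Ax′ must be orthogonal to every column of A,
# i.e. Aᵀr = 0, because r lies in the left null space N(Aᵀ).
r = b - A @ x_prime
print(np.allclose(A.T @ r, 0))  # True
```

The key point the check confirms: the residual produced by the normal-equation solution is exactly the component of b that the columns of A cannot reach.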
The length of the residual vector is
‖r‖ = ‖b − Ax′‖
Because p is the orthogonal projection of b onto C(A), this distance is minimal among all vectors of the form Ax. Therefore, the vector x′ satisfies
x′ = arg minₓ ‖b − Ax‖²
This is precisely the least squares problem. Thus, least squares is not a separate technique, but the optimization formulation of orthogonal projection.
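To see that the two viewpoints coincide, the sketch below (again with a small assumed matrix) compares NumPy's direct least-squares solver against the normal-equation solution; both should return the same x′.

```python
import numpy as np

# Hypothetical data: np.linalg.lstsq minimizes ‖b − Ax‖² directly,
# while the normal equation solves AᵀA x′ = Aᵀb. The answers agree.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x_lstsq, x_normal))  # True
```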
To find x′: if AᵀA is invertible (equivalently, A has full column rank), then the solution is unique:

x′ = (AᵀA)⁻¹Aᵀb
Substituting back gives the projected vector:
p = Ax′ = A(AᵀA)⁻¹Aᵀb
Moreover, we can define A(AᵀA)⁻¹Aᵀ as a new matrix and call it projection matrix P. The projection matrix maps any vector b ∈ ℝᵐ directly to its projection onto the column space of A.
P = A(AᵀA)⁻¹Aᵀ
Thus, we can say
p = Pb
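A minimal numerical sketch of the projection matrix, using the same kind of small assumed A as before: forming P explicitly and applying it to b gives the same p as solving the normal equation.

```python
import numpy as np

# Hypothetical full-column-rank A and a vector b outside C(A).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Projection matrix P = A(AᵀA)⁻¹Aᵀ maps any b directly to its projection.
P = A @ np.linalg.inv(A.T @ A) @ A.T
p = P @ b

# p equals Ax′ with x′ from the normal equation.
x_prime = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(p, A @ x_prime))  # True
```

(In practice one rarely forms P explicitly, since the inverse is numerically fragile; it is shown here only to illustrate the definition.)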
It turns out that this projection matrix P has several interesting properties:
1) Symmetry

Pᵀ = P
This can be easily checked by transposing P = A(AᵀA)⁻¹Aᵀ: since AᵀA is symmetric, so is its inverse, and the transpose reproduces P. Any symmetric matrix represents a linear transformation that does not introduce skewness; instead, it acts along orthogonal directions. This behavior is exactly what we expect from an orthogonal projection matrix.
2) Idempotence
P² = P
This is very straightforward: once a vector is projected, nothing more happens. If b already lies in C(A), applying the projection again leaves it unchanged (Pb = b).
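Both properties can be verified numerically. The sketch below builds P from a small assumed matrix and checks symmetry and idempotence up to floating-point tolerance.

```python
import numpy as np

# Hypothetical A with full column rank; build P = A(AᵀA)⁻¹Aᵀ.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P.T, P))    # symmetry:    Pᵀ = P  → True
print(np.allclose(P @ P, P))  # idempotence: P² = P  → True
```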
3) Range and Null Space
C(P) = C(A),  N(P) = N(Aᵀ)
Just like any other matrix, the projection matrix splits the space:
everything in C(A) survives unchanged,
everything orthogonal to C(A), which is in fact N(Aᵀ), is annihilated.
Thus, every vector b ∈ ℝᵐ decomposes uniquely as
b = Pb + (I − P)b,  with Pb ∈ C(A) and (I − P)b ∈ N(Aᵀ)
This is the algebraic expression of the orthogonal decomposition theorem.
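The decomposition can be confirmed directly: for a small assumed A and b, the two pieces Pb and (I − P)b recombine to b, and the second piece is annihilated by Aᵀ.

```python
import numpy as np

# Hypothetical A and b; split b into its C(A) and N(Aᵀ) components.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])
P = A @ np.linalg.inv(A.T @ A) @ A.T
I = np.eye(3)

p = P @ b        # component in C(A)
r = (I - P) @ b  # component in N(Aᵀ)

print(np.allclose(p + r, b))    # the two pieces recombine to b → True
print(np.allclose(A.T @ r, 0))  # r is annihilated by Aᵀ       → True
```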
All this said, however, if matrix A does not have full column rank, then AᵀA is not invertible and (AᵀA)⁻¹ does not exist. So the formula x′ = (AᵀA)⁻¹Aᵀb no longer makes sense. Note that in such a case:
the projection p onto C(A) still exists and is unique,
but the vector x′ satisfying Ax′=p is not unique, due to the non-trivial null space of A.
Thus, we can no longer rely on this explicit inverse formula and have to look for other tools, which we will soon explain, such as orthonormal bases or the QR decomposition. As we will see, they are more numerically stable and form the foundation of practical solution techniques for real-world systems.
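The rank-deficient case can still be handled numerically. The sketch below uses a made-up matrix whose second column is twice the first, so AᵀA is singular; NumPy's SVD-based lstsq still finds an x′ (the minimum-norm one among the infinitely many solutions of Ax′ = p), and the projection p remains unique.

```python
import numpy as np

# Hypothetical rank-deficient A: column 2 = 2 × column 1,
# so AᵀA is singular and (AᵀA)⁻¹ does not exist.
A = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 3.0, 5.0])

# lstsq returns a minimum-norm x′ even though Ax′ = p has
# infinitely many solutions; the projection p itself is unique.
x_prime, *_ = np.linalg.lstsq(A, b, rcond=None)
p = A @ x_prime

print(np.allclose(A.T @ (b - p), 0))  # residual is still ⟂ C(A) → True
```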