Least Squares Estimation for Linear Regression

Abstract

This post derives the closed-form solution of least squares estimation for linear regression. It then shows how the variance of the coefficient estimate follows from that closed-form solution.

Scalar by Vector Differentiation

$$
\begin{aligned}
\frac{\partial\, a^T x}{\partial x} &= a && \text{(when $a$ is not a function of $x$)} \\
\frac{\partial\, x^T A x}{\partial x} &= 2Ax && \text{(when $A$ is symmetric)} \\
A &= A^T && \text{(definition of a symmetric matrix $A$)} \\
X^T X &= \left(X^T X\right)^T && \text{(for any $X \in \mathbb{R}^{m \times n}$, $X^T X$ is symmetric)}
\end{aligned}
$$
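
These identities are easy to sanity-check numerically. Below is a minimal NumPy sketch, not part of the original derivation; the matrix sizes and the `num_grad` helper are made up for illustration. It compares the analytic gradients against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                  # made-up dimension
A = rng.standard_normal((n, n))
A = A + A.T                            # make A symmetric, so d(x^T A x)/dx = 2 A x
a = rng.standard_normal(n)
x = rng.standard_normal(n)

def num_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# d(a^T x)/dx = a
assert np.allclose(num_grad(lambda v: a @ v, x), a, atol=1e-5)
# d(x^T A x)/dx = 2 A x, for symmetric A
assert np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x, atol=1e-4)
```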

Forward Process

$$
\begin{aligned}
X &\in \mathbb{R}^{m \times n} && \text{(training data with $m$ instances)} \\
y &\in \mathbb{R}^{m} && \text{(labels for the $m$ instances)} \\
\theta &\in \mathbb{R}^{n} && \text{(regression coefficients as parameters)} \\
\hat{y} &= X\theta && \text{(predictions at test time)} \\
J(\theta) &= (X\theta - y)^T (X\theta - y) && \text{(objective function: least squares)}
\end{aligned}
$$
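
To make this setup concrete, here is a small NumPy sketch; the sizes ($m = 100$, $n = 3$) and the random `X`, `y`, and `theta` are made-up placeholders rather than real data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 3                      # made-up sizes
X = rng.standard_normal((m, n))    # training data, X in R^{m x n}
y = rng.standard_normal(m)         # labels, y in R^m
theta = rng.standard_normal(n)     # coefficients, theta in R^n

y_hat = X @ theta                  # predictions at test time

def J(theta, X, y):
    """Least-squares objective: (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return r @ r

loss = J(theta, X, y)              # a scalar
```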

Get $\theta$ when $\frac{dJ(\theta)}{d\theta} = 0$

$$
\begin{aligned}
J(\theta) &= \left((X\theta)^T - y^T\right)(X\theta - y) \\
J(\theta) &= (X\theta)^T X\theta - (X\theta)^T y - y^T X\theta + y^T y \\
J(\theta) &= \theta^T X^T X \theta - 2(X\theta)^T y + y^T y && \text{(since $(X\theta)^T y = y^T X\theta$, both scalars)}
\end{aligned}
$$

$$
\begin{aligned}
\frac{dJ(\theta)}{d\theta} &= 2X^T X\theta - 2X^T y && \text{(matrix calculus; $X^T X$ is symmetric)} \\
0 &= 2X^T X\theta - 2X^T y && \text{($J(\theta)$ is convex, so $\frac{dJ(\theta)}{d\theta} = 0$ gives the minimum)} \\
X^T X\theta &= X^T y \\
\theta &= \left(X^T X\right)^{-1} X^T y && \text{(least squares estimate of $\theta$, assuming $X^T X$ is invertible)}
\end{aligned}
$$
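
As a sanity check on the closed form, the sketch below (with synthetic data and assumed shapes) solves the normal equations $X^T X\theta = X^T y$ and compares the result against NumPy's `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 3                       # made-up sizes
X = rng.standard_normal((m, n))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.1 * rng.standard_normal(m)

# theta = (X^T X)^{-1} X^T y, computed via np.linalg.solve on the
# normal equations rather than an explicit inverse, for stability.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# NumPy's least-squares solver should agree.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta_hat, theta_lstsq)
```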

Variance of $\theta$

To derive the variance of $\theta$, we need a few useful identities:

$$
\begin{aligned}
\mathrm{Var}(CX) &= E\left[\left(C(X - \bar{X})\right)\left(C(X - \bar{X})\right)^T\right] && \text{($C$ is a constant matrix)} \\
&= E\left[C(X - \bar{X})(X - \bar{X})^T C^T\right] \\
&= C\, E\left[(X - \bar{X})(X - \bar{X})^T\right] C^T \\
&= C\, \mathrm{Var}(X)\, C^T
\end{aligned}
$$
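
This identity can also be checked empirically. The following Monte Carlo sketch (with a made-up covariance $\Sigma$ and constant matrix $C$, not from the post) compares the sample covariance of $CX$ against $C\,\mathrm{Var}(X)\,C^T$:

```python
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                 # Var(X), made up
C = np.array([[1.0,  2.0],
              [0.0,  1.0],
              [3.0, -1.0]])                    # constant matrix, made up

# Draw many samples of X ~ N(0, Sigma) and transform each by C.
samples = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
CX = samples @ C.T                             # each row is C x

empirical = np.cov(CX, rowvar=False)
assert np.allclose(empirical, C @ Sigma @ C.T, atol=0.1)
```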

$$
\left(X^{-1}\right)^T = \left(X^T\right)^{-1} \quad \text{(the transpose of an inverse equals the inverse of the transpose)}
$$

Under the assumption that all labels are independent with the same variance $\sigma^2$, we have:

$$
\mathrm{Var}(y) = \sigma^2 I \quad \text{($I$ is the identity matrix)}
$$

With the above identities, we can calculate the variance of $\theta$:

$$
\begin{aligned}
\mathrm{Var}(\theta) &= \mathrm{Var}\!\left(\left(X^T X\right)^{-1} X^T y\right) \\
&= \left(X^T X\right)^{-1} X^T \,\sigma^2 I\, \left(\left(X^T X\right)^{-1} X^T\right)^T && \text{($\left(X^T X\right)^{-1} X^T$ is constant)} \\
&= \sigma^2 \left(X^T X\right)^{-1} X^T I\, X \left(\left(X^T X\right)^T\right)^{-1} && \text{($\sigma^2$ is a scalar constant)} \\
&= \sigma^2 \left(X^T X\right)^{-1} X^T X \left(X^T X\right)^{-1} && \text{(cancel the identity matrix; $\left(X^T X\right)^T = X^T X$)} \\
&= \sigma^2 \left(X^T X\right)^{-1} && \text{($A^{-1}A = I$)}
\end{aligned}
$$
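
As a final check, the Monte Carlo sketch below (fixed design $X$, made-up $\sigma$ and true $\theta$) re-estimates $\theta$ under freshly drawn label noise many times and compares the empirical covariance of the estimates against $\sigma^2 \left(X^T X\right)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, sigma = 200, 2, 0.5                  # made-up sizes and noise level
X = rng.standard_normal((m, n))            # fixed design matrix
theta_true = np.array([1.0, -2.0])         # made-up true coefficients
XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T                          # the constant matrix (X^T X)^{-1} X^T

# Re-estimate theta over many resampled noise vectors.
estimates = np.array([
    H @ (X @ theta_true + sigma * rng.standard_normal(m))
    for _ in range(50_000)
])
empirical = np.cov(estimates, rowvar=False)

assert np.allclose(empirical, sigma**2 * XtX_inv, atol=1e-3)
```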
