The Elements of Financial Econometrics
1. Chapter 5 The Efficient Portfolios and CAPM
1.1. With Risk-free asset
The allocation vector is chosen by Markowitz's mean-variance optimization, which is equivalent to maximizing expected utility under exponential utility when returns are normally distributed. Note that the risk-free asset removes the constraint that the weights sum to one: whatever is not invested in the risky assets is held in the risk-free asset.
All optimal solutions have the same Sharpe ratio, so the efficient frontier (in the plane of standard deviation versus expected return) is a straight line whose slope is the optimal Sharpe ratio and whose intercept is the risk-free rate.

It can be shown that the market portfolio (with the market-value-weighted allocation vector) lies on the efficient frontier and, by the two-fund separation theorem, any efficient portfolio can be decomposed into the market portfolio and the risk-free asset.
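As a minimal sketch of the claim that all optimal allocations share one Sharpe ratio, the snippet below computes risky-asset weights proportional to $\Sigma^{-1}(\mu - r_f\mathbf{1})$ and checks that scaling them (that is, mixing with the risk-free asset) leaves the Sharpe ratio unchanged. The numbers for `mu`, `Sigma` and `rf` are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def tangency_weights(mu, Sigma, rf):
    """Risky-asset weights proportional to Sigma^{-1}(mu - rf), normalized to sum to one."""
    raw = np.linalg.solve(Sigma, mu - rf)
    return raw / raw.sum()

def sharpe_ratio(w_risky, mu, Sigma, rf):
    """Sharpe ratio when w_risky is held in risky assets and the rest in the risk-free asset."""
    excess = w_risky @ (mu - rf)
    return excess / np.sqrt(w_risky @ Sigma @ w_risky)

# Illustrative inputs (assumed, not taken from the text)
mu = np.array([0.08, 0.12, 0.10])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])
rf = 0.02

w_tan = tangency_weights(mu, Sigma, rf)
sr_full = sharpe_ratio(w_tan, mu, Sigma, rf)        # fully invested in risky assets
sr_half = sharpe_ratio(0.5 * w_tan, mu, Sigma, rf)  # half in the risk-free asset
sr_max = np.sqrt((mu - rf) @ np.linalg.solve(Sigma, mu - rf))
print(w_tan, sr_full, sr_half, sr_max)
```

The three Sharpe ratios agree, which is the slope of the frontier line.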
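CAPM: For the excess return $Y$ of an asset and the excess return $Y_m$ of the market portfolio,
$$Y = \beta Y_m + \epsilon, \qquad \beta = \frac{\mathrm{Cov}(Y, Y_m)}{\mathrm{Var}(Y_m)},$$
and $\epsilon$ is the idiosyncratic noise, with $E\epsilon = 0$ and $\mathrm{Cov}(\epsilon, Y_m) = 0$.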
The CAPM can be validated using
- Econometrics: regression and hypothesis testing
- Maximum likelihood estimation with Wald test or likelihood ratio test.
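The regression route can be sketched as follows: regress the asset's excess return on the market's excess return with an intercept and examine the t-statistic of the intercept, which should be close to zero if the CAPM holds. This is only an illustration on simulated data; the function name `capm_regression` and all parameter values are assumptions.

```python
import numpy as np

def capm_regression(y, ym):
    """OLS of asset excess return y on market excess return ym with an intercept.

    Returns (alpha, beta, t-statistic of alpha)."""
    X = np.column_stack([np.ones_like(ym), ym])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - 2
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of the OLS coefficients
    t_alpha = coef[0] / np.sqrt(cov[0, 0])
    return coef[0], coef[1], t_alpha

# Simulated data under the CAPM (alpha = 0, beta = 1.2); values are made up
rng = np.random.default_rng(0)
ym = rng.normal(0.005, 0.04, size=500)
y = 1.2 * ym + rng.normal(0.0, 0.02, size=500)

alpha, beta, t_alpha = capm_regression(y, ym)
print(alpha, beta, t_alpha)
```

The OLS slope equals the sample version of $\mathrm{Cov}(Y, Y_m)/\mathrm{Var}(Y_m)$, and a large $|t|$ for the intercept would be evidence against the CAPM.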
1.2. Without Risk-free asset
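We need to solve a constrained version of Markowitz's mean-variance optimization problem, which is again straightforward with the Lagrangian method. This time, the efficient frontier is a parabola in the plane of variance versus expected return (a hyperbola if risk is measured by the standard deviation).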
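A minimal sketch of the Lagrangian solution, assuming the two constraints $w^\top\mathbf{1} = 1$ and $w^\top\mu = m$ for a target return $m$: the optimal weights are a linear combination of $\Sigma^{-1}\mathbf{1}$ and $\Sigma^{-1}\mu$, and the minimal variance is quadratic in $m$. The inputs and the name `frontier_weights` are illustrative assumptions.

```python
import numpy as np

def frontier_weights(mu, Sigma, target):
    """Minimum-variance weights subject to w'1 = 1 and w'mu = target."""
    G = np.column_stack([np.ones_like(mu), mu])   # p x 2 constraint matrix [1, mu]
    Sinv_G = np.linalg.solve(Sigma, G)            # Sigma^{-1} [1, mu]
    A = G.T @ Sinv_G                              # 2 x 2 system for the multipliers
    multipliers = np.linalg.solve(A, np.array([1.0, target]))
    return Sinv_G @ multipliers

# Illustrative inputs (assumed)
mu = np.array([0.08, 0.12, 0.10])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])

targets = np.array([0.06, 0.08, 0.10, 0.12])      # equally spaced target returns
weights = [frontier_weights(mu, Sigma, m) for m in targets]
variances = np.array([w @ Sigma @ w for w in weights])
second_diff = np.diff(variances, n=2)             # constant for a parabola
print(variances, second_diff)
```

Constant, positive second differences over equally spaced targets confirm that the variance is a convex quadratic function of the target return.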

For any portfolio on the frontier (other than the global minimum-variance portfolio), there is another frontier portfolio that has zero beta with it. The CAPM is formulated similarly: one frontier portfolio plays the role of the market portfolio, and excess returns are measured relative to its zero-beta counterpart, which plays the role of the risk-free asset.
2. Chapter 6 Factor Pricing Models
2.1. Multi-factor models
The multi-factor model for the return vector $R$ is
$$R = a + Bf + \epsilon.$$
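The multi-factor pricing model for the expected return $\mu = E[R]$ is
$$R = \gamma_0\mathbf{1} + B(f - \gamma_0\mathbf{1}) + \epsilon, \qquad \mu = \gamma_0\mathbf{1} + B\lambda_k,$$
where $\lambda_k = E[f] - \gamma_0\mathbf{1}$. When $f$ and $\epsilon$ are uncorrelated, the covariance matrix decomposes as
$$\mathrm{var}(R) = B\,\mathrm{var}(f)\,B^\top + \mathrm{var}(\epsilon).$$
Notice that the multi-factor model is always valid, since it is a statistical decomposition (p. 245), whereas the multi-factor pricing model needs to be validated.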
2.2. Validate multi-factor pricing model
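If there is a risk-free asset, the multi-factor pricing model can be written as
$$Y = a + B(f - \gamma_0\mathbf{1}) + \epsilon,$$
where $Y$ denotes the vector of excess returns, and the null hypothesis is
$$H_0: a = 0.$$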
If there is no risk-free asset, we need to compare the multi-factor model $R = a + Bf + \epsilon$ with the multi-factor pricing model, and the null hypothesis is
$$H_0: a = \gamma_0(\mathbf{1} - B\mathbf{1}).$$
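Both tests can be carried out with a likelihood ratio test, where $B$, $\lambda_k$ and the remaining parameters are estimated by maximum likelihood under the null and the alternative.
If we use macroeconomic factors, whose excess returns cannot be observed, we can estimate $\lambda_k$ as a parameter of the model.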
2.3. PCA and factor analysis
For the multi-factor pricing model, if we assume the residual covariance matrix is the identity matrix, it can be shown that the MLE coincides with the PCA solution, in which $B$ is estimated from the leading eigenvectors of the sample covariance matrix.
When the portfolio size is large, PCA and factor analysis are approximately the same. See Fan, Liao and Mincheva (2013).
3. Chapter 7 Portfolio Allocation and Risk Assessment
3.1. Risk approximation
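For any portfolio allocation vector $w$ with $w^\top\mathbf{1} = 1$ and gross exposure $c = \|w\|_1$, the total long and short positions are
$$w^+ = \sum_{w_i \ge 0} w_i = \frac{c+1}{2}, \qquad w^- = -\sum_{w_i < 0} w_i = \frac{c-1}{2}.$$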
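The variance of this portfolio is
$$R(w) = w^\top\Sigma w, \qquad \Sigma = \mathrm{var}(R),$$
and, with an estimated covariance matrix $\hat\Sigma$, the estimated risk is
$$\hat R(w) = w^\top\hat\Sigma w.$$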
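For any portfolio with allocation vector $w$ and gross exposure $c = \|w\|_1$,
$$|R(w) - \hat R(w)| \le e_{\max}c^2,$$
where $e_{\max} = \max_{i,j}|\sigma_{ij} - \hat\sigma_{ij}| = \|\hat\Sigma - \Sigma\|_\infty$.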
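The bound follows from $|w^\top(\Sigma - \hat\Sigma)w| \le \sum_{i,j}|w_i||w_j||\sigma_{ij} - \hat\sigma_{ij}| \le e_{\max}\|w\|_1^2$. A small numerical check, with a made-up "true" covariance matrix and a simulated sample (all inputs are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
p, T = 5, 60

# A made-up "true" covariance matrix and a sample covariance estimate
A = rng.normal(size=(p, p))
Sigma = A @ A.T / p + 0.1 * np.eye(p)
R = rng.multivariate_normal(np.zeros(p), Sigma, size=T)
Sigma_hat = R.T @ R / T

w = np.array([0.7, 0.6, -0.3, 0.2, -0.2])        # sums to one, with short positions
c = np.abs(w).sum()                              # gross exposure ||w||_1
long_pos = w[w >= 0].sum()                       # equals (c + 1) / 2
short_pos = -w[w < 0].sum()                      # equals (c - 1) / 2

e_max = np.abs(Sigma - Sigma_hat).max()          # element-wise max error
gap = abs(w @ Sigma @ w - w @ Sigma_hat @ w)     # |R(w) - R_hat(w)|
bound = e_max * c**2
print(c, long_pos, short_pos, gap, bound)
```

With no short sales ($c = 1$) the risk error is at most $e_{\max}$, no matter how many assets there are.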
Fan, Liao and Shi (2015) use the concept of a high-confidence level upper bound (H-CLUB) $\hat U(\tau)$, which satisfies
$$P\left[|R(w) - \hat R(w)| \le \hat U(\tau)\right] \to 1 - \tau, \qquad \tau \in (0,1).$$
3.2. Estimation of a large volatility matrix
Exponential Smoothing
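Denote $\mathrm{var}_t(R_{t+1}) = \Sigma_t$. Using all past returns with equal weights gives the estimator
$$\hat\Sigma_t = \frac{1}{t}\sum_{i=0}^{t-1}R_{t-i}R_{t-i}^\top.$$
To localize the data, one naive way is to use a window of size $h \ll t$:
$$\hat\Sigma_t = \frac{1}{h}\sum_{i=0}^{h-1}R_{t-i}R_{t-i}^\top.$$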
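A better way is exponential smoothing with a smoothing parameter $\lambda \in (0,1)$,
$$\hat\Sigma_t = (1-\lambda)\sum_{i=0}^{t-1}\lambda^i R_{t-i}R_{t-i}^\top,$$
whose weights sum to $1 - \lambda^t \approx 1$ for large $t$, and the estimated volatility matrix can be computed recursively as
$$\hat\Sigma_t = \lambda\hat\Sigma_{t-1} + (1-\lambda)R_tR_t^\top.$$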
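A minimal sketch of the recursion, assuming it is started from $\hat\Sigma_0 = 0$, together with a check that it reproduces the exponentially weighted sum $(1-\lambda)\sum_{i=0}^{t-1}\lambda^i R_{t-i}R_{t-i}^\top$. The simulated returns and the value $\lambda = 0.94$ are illustrative choices, not taken from the text.

```python
import numpy as np

def ewma_cov(returns, lam):
    """Recursive update Sigma_t = lam * Sigma_{t-1} + (1 - lam) * R_t R_t^T, started at zero."""
    p = returns.shape[1]
    Sigma = np.zeros((p, p))
    for r in returns:                             # rows are ordered from oldest to newest
        Sigma = lam * Sigma + (1.0 - lam) * np.outer(r, r)
    return Sigma

rng = np.random.default_rng(2)
R = rng.normal(scale=0.01, size=(250, 3))         # simulated returns, T x p
lam = 0.94                                        # assumed smoothing parameter

Sigma_rec = ewma_cov(R, lam)

# Direct evaluation of the weighted sum: weight (1 - lam) * lam^i on R_{t-i}
t = len(R)
weights = (1.0 - lam) * lam ** np.arange(t)
R_rev = R[::-1]                                   # most recent observation first
Sigma_sum = (R_rev * weights[:, None]).T @ R_rev
print(np.max(np.abs(Sigma_rec - Sigma_sum)))
```

The recursion needs only the previous estimate and the newest return, so each update costs $O(p^2)$ regardless of the history length.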
Thresholding regularization
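When the portfolio size is large, the estimated volatility matrix is ill-conditioned, and a thresholding method can be used to deal with this. Given our estimated matrix $\hat\Sigma = (\hat\sigma_{ij})$, the thresholding estimator with threshold $\lambda$ is
$$\hat\Sigma_\lambda = \left(\hat\sigma_{ij}I(|\hat\sigma_{ij}| \ge \lambda)\right)_{ij},$$
or we can use an adaptive thresholding estimator,
$$\hat\Sigma_\lambda = \left(\hat\sigma_{ij}I(|\hat\sigma_{ij}/\mathrm{SE}(\hat\sigma_{ij})| \ge \lambda)\right)_{ij},$$
where $\mathrm{SE}(\hat\sigma_{ij})$ is the estimated standard error of $\hat\sigma_{ij}$.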
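The hard-thresholding rule above can be sketched in a few lines. One assumption here goes beyond the formula as written: the diagonal (the variances) is left untouched, which is a common convention. The function name and the example matrix are made up.

```python
import numpy as np

def hard_threshold(Sigma_hat, lam):
    """Set entries with |sigma_ij| < lam to zero; the diagonal is kept (an assumption)."""
    out = np.where(np.abs(Sigma_hat) >= lam, Sigma_hat, 0.0)
    np.fill_diagonal(out, np.diag(Sigma_hat))     # keep the variances untouched
    return out

Sigma_hat = np.array([[1.00, 0.30, 0.02],
                      [0.30, 1.00, -0.05],
                      [0.02, -0.05, 1.00]])
Sigma_thr = hard_threshold(Sigma_hat, 0.1)
print(Sigma_thr)
```

Only the 0.30 covariance survives the threshold. The adaptive version would compare $|\hat\sigma_{ij}|/\mathrm{SE}(\hat\sigma_{ij})$ with $\lambda$ instead.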
Projection onto positive definite matrix space
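To make our estimated covariance matrix $\hat\Sigma_\lambda$ positive semidefinite, we can take its spectral decomposition and replace each eigenvalue by its positive part $\lambda_j^+ = \max(\lambda_j, 0)$,
$$\hat\Sigma_\lambda = \Gamma^\top\mathrm{diag}(\lambda_1, \cdots, \lambda_p)\Gamma, \qquad \hat\Sigma_\lambda^+ = \Gamma^\top\mathrm{diag}(\lambda_1^+, \cdots, \lambda_p^+)\Gamma,$$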
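or we can solve an optimization problem: for any symmetric matrix $A$,
$$\min_R \|A - R\|_F^2, \quad \text{s.t.}\ R \succeq 0,\ \mathrm{diag}(R) = I_p.$$
Here $A$ is the correlation matrix corresponding to $\hat\Sigma_\lambda$,
$$\mathrm{diag}(\hat\Sigma_\lambda)^{-1/2}\,\hat\Sigma_\lambda\,\mathrm{diag}(\hat\Sigma_\lambda)^{-1/2},$$
and, for the solution $\hat R_\lambda$ to the optimization problem, the resulting covariance estimator is
$$\mathrm{diag}(\hat\Sigma_\lambda)^{1/2}\,\hat R_\lambda\,\mathrm{diag}(\hat\Sigma_\lambda)^{1/2}.$$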
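The first approach, replacing negative eigenvalues by their positive parts, can be sketched as below. Note that `numpy.linalg.eigh` returns eigenvectors as columns, so the reconstruction reads $V\,\mathrm{diag}(\cdot)\,V^\top$ rather than $\Gamma^\top\mathrm{diag}(\cdot)\,\Gamma$. The example matrix is made up to be symmetric but indefinite, as thresholding can produce; the nearest-correlation-matrix problem is not implemented here.

```python
import numpy as np

def clip_eigenvalues(S, floor=0.0):
    """Replace eigenvalues of a symmetric matrix below `floor` by `floor`."""
    vals, vecs = np.linalg.eigh(S)                # eigenvectors are the columns of vecs
    return (vecs * np.maximum(vals, floor)) @ vecs.T

S = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.9],
              [0.0, 0.9, 1.0]])                   # symmetric, but one eigenvalue is negative
S_plus = clip_eigenvalues(S)
print(np.linalg.eigvalsh(S), np.linalg.eigvalsh(S_plus))
```

Clipping generally changes the diagonal of the matrix, which is one reason to prefer the correlation-scale optimization when unit diagonals must be preserved.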
Regularization by penalized likelihood
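Penalized likelihood is useful for exploiting sparsity. Denote by $l_T(\theta)$ the log-likelihood based on $T$ observations. The penalized likelihood estimator is
$$\arg\min_\theta\left\{-\frac{2}{T}l_T(\theta) + \sum_{j=1}^p p_\lambda(|\theta_j|)\right\},$$
where $p_\lambda$ is a penalty function with regularization parameter $\lambda$.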
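Fan and Li (2001) introduced a family of folded-concave penalty functions such as the smoothly clipped absolute deviation (SCAD), whose derivative for $t > 0$ and some $a > 2$ is
$$p'_\lambda(t) = \lambda\left\{I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a-1)\lambda}I(t > \lambda)\right\}.$$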
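The penalized likelihood with the SCAD penalty can be solved by an iteratively reweighted LASSO using a local linear approximation (Zou and Li (2008)). Given the estimate $\hat\theta^{(k)}$ at the $k$-th step, the next iterate minimizes
$$Q_{k+1}(\theta) = -\frac{2}{T}l_T(\theta) + \sum_{j=1}^p w_{k,j}|\theta_j| + c,$$
where $w_{k,j} = p'_\lambda(|\hat\theta^{(k)}_j|)$ and
$$c = \sum_{j=1}^p\left[p_\lambda(|\hat\theta^{(k)}_j|) - p'_\lambda(|\hat\theta^{(k)}_j|)\,|\hat\theta^{(k)}_j|\right].$$
It can be shown that this is a majorization-minimization algorithm, so the target value is non-increasing along the sequence $\hat\theta^{(k)}$.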
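The LASSO weights $w_{k,j} = p'_\lambda(|\hat\theta^{(k)}_j|)$ used in each local linear approximation step can be computed as in the sketch below. The default $a = 3.7$ is the value commonly used with SCAD and is treated here as an assumption; the function name and the example vector are made up, and the weighted LASSO solve itself is not shown.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    linear_part = (t <= lam).astype(float)                         # I(t <= lam)
    tail_part = np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam) * (t > lam)
    return lam * (linear_part + tail_part)

theta_k = np.array([0.0, 0.5, 2.0, 5.0])          # a hypothetical current estimate
weights = scad_derivative(theta_k, lam=1.0)
print(weights)
```

Small coefficients receive the full LASSO weight $\lambda$, while coefficients larger than $a\lambda$ receive weight zero and are therefore not shrunk in the next step.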
Estimate volatility matrix using factor model
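Given the factor model
$$R = a + Bf + \epsilon,$$
we can run a regression to obtain $\hat B$ and the residual variances $\hat\sigma_1^2, \cdots, \hat\sigma_p^2$. Assuming the noise components are uncorrelated, the covariance estimator is
$$\hat\Sigma_S = \hat B\hat\Sigma_f\hat B^\top + \mathrm{diag}(\hat\sigma_1^2, \cdots, \hat\sigma_p^2),$$
where $\hat\Sigma_f$ is the sample covariance matrix of the factors. This estimator has a better rate for estimating $\Sigma^{-1}$ than the sample covariance matrix.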
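A minimal sketch of this estimator with observed factors: regress each return series on the factors, then combine the fitted loadings, the factor covariance and the residual variances. The simulated data, the degrees-of-freedom correction and the name `factor_cov` are assumptions made for illustration.

```python
import numpy as np

def factor_cov(returns, factors):
    """Strict factor model covariance: B Sigma_f B^T + diag(residual variances)."""
    T = returns.shape[0]
    X = np.column_stack([np.ones(T), factors])            # intercept plus factors
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # shape (1 + K, p)
    B_hat = coef[1:].T                                    # p x K loadings
    resid = returns - X @ coef
    Sigma_f = np.atleast_2d(np.cov(factors, rowvar=False))
    resid_var = resid.var(axis=0, ddof=X.shape[1])        # residual variance per asset
    return B_hat @ Sigma_f @ B_hat.T + np.diag(resid_var)

# Simulated example (all parameter values are made up)
rng = np.random.default_rng(3)
T, p, K = 400, 5, 2
f = rng.normal(0.0, 0.03, size=(T, K))
B = rng.normal(1.0, 0.5, size=(p, K))
R = 0.001 + f @ B.T + rng.normal(0.0, 0.02, size=(T, p))

Sigma_S = factor_cov(R, f)
print(np.linalg.eigvalsh(Sigma_S))
```

The result is positive definite by construction, since it is a positive semidefinite low-rank part plus a diagonal matrix with positive entries.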
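If instead we do not assume that the noise terms are uncorrelated but exploit the sparsity of their covariance matrix, we obtain the approximate factor model estimator
$$\hat\Sigma_A = \hat B\hat\Sigma_f\hat B^\top + \hat\Sigma^*_{\epsilon,\lambda},$$
where $\hat\Sigma^*_{\epsilon,\lambda}$ is the residual covariance matrix after thresholding at level $\lambda$.
Notice that if $\lambda = 1$ and the threshold is applied to correlations, all off-diagonal entries are removed and $\hat\Sigma_A$ reduces to $\hat\Sigma_S$.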
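Now consider the approximate factor model with unobserved factors. We can use the Principal Orthogonal complEment Thresholding (POET) estimator of Fan, Liao and Mincheva (2013):
- Obtain the sample covariance matrix $S$ based on $T$ returns.
- Run a singular value decomposition: $S = \sum_{j=1}^p\hat\lambda_j\hat\xi_j\hat\xi_j^\top$.
- Compute the residual covariance matrix $\hat Q = \sum_{j=K+1}^p\hat\lambda_j\hat\xi_j\hat\xi_j^\top$.
- Regularize $\hat Q$ to obtain $\hat Q_\lambda$, e.g. by correlation matrix thresholding.
- Compute the POET estimator as
$$\hat\Sigma^P_\lambda = \sum_{j=1}^K\hat\lambda_j\hat\xi_j\hat\xi_j^\top + \hat Q_\lambda.$$
POET encompasses many methods: when $\lambda = 0$, no thresholding is applied and it reduces to the sample covariance matrix $S$.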
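The steps above can be sketched as follows. This is an illustrative implementation, not the authors' code: it uses hard thresholding of the residual correlation matrix with a constant threshold `lam`, keeps the diagonal of $\hat Q$, and treats the number of factors `K` as given. The simulated data are made up.

```python
import numpy as np

def poet(returns, K, lam):
    """POET sketch: leading K principal components plus a thresholded residual part."""
    X = returns - returns.mean(axis=0)
    T = X.shape[0]
    S = X.T @ X / T                                   # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(S)                # ascending eigenvalues
    order = np.argsort(eigval)[::-1]                  # reorder to descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    low_rank = (eigvec[:, :K] * eigval[:K]) @ eigvec[:, :K].T
    Q = S - low_rank                                  # residual covariance matrix
    d = np.sqrt(np.diag(Q))
    corr = Q / np.outer(d, d)                         # residual correlation matrix
    keep = np.abs(corr) >= lam                        # hard thresholding on correlations
    np.fill_diagonal(keep, True)                      # residual variances are always kept
    return low_rank + Q * keep

# Simulated returns with two strong factors (values are made up)
rng = np.random.default_rng(5)
T, p = 200, 6
f = rng.normal(size=(T, 2))
B = rng.normal(size=(p, 2))
R = f @ B.T + 0.5 * rng.normal(size=(T, p))

Sigma_poet = poet(R, K=2, lam=0.2)
print(np.round(Sigma_poet, 3))
```

Setting `lam=0.0` returns the sample covariance matrix, while `lam=1.0` leaves only the residual variances, which mirrors the strict factor model.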
High-dimensional PCA and factor analysis
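The factor model can be rewritten as
$$R_t = a + Bf_t + \epsilon_t = a + (BH)(H^{-1}f_t) + \epsilon_t$$
for any non-singular matrix $H$, so $B$ and $f_t$ are not separately identifiable. Under the normalization $\mathrm{var}(f_t) = I_K$,
$$\Sigma = BB^\top + \Sigma_\epsilon.$$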
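If the factors are strong (pervasive), then we have asymptotically
$$\sum_{j=1}^K\lambda_j\xi_j\xi_j^\top \approx \sum_{j=1}^K b_jb_j^\top = BB^\top$$
and
$$f_{jt} \approx \xi_j^\top R_t/\lambda_j^{1/2},$$
where $\{\xi_j\}$ and $\{\lambda_j\}$ are the eigenvectors and eigenvalues of $\Sigma$ and $b_j$ is the $j$-th column of $B$. In practice they are replaced by their sample versions $\hat\xi_j$ and $\hat\lambda_j$.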
3.3. Portfolio allocation with gross-exposure constraints
The Markowitz mean-variance analysis above is usually too sensitive to its input vectors and their estimation errors, and can result in extreme short positions. The problem is more severe for large portfolios.
We consider optimization with a gross-exposure constraint:
$$\min_w R(w) = w^\top\Sigma w \quad \text{s.t.}\quad w^\top\mathbf{1} = 1,\ \|w\|_1 \le c,\ w^\top\mu = \alpha.$$
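If we consider only the risk profile and not the return, we can define the actual and perceived risks
$$R(w) = w^\top\Sigma w, \qquad \hat R(w) = w^\top\hat\Sigma w,$$
and the oracle and empirical allocation vectors
$$w_{opt} = \arg\min_{w^\top\mathbf{1} = 1,\ \|w\|_1 \le c} R(w), \qquad \hat w_{opt} = \arg\min_{w^\top\mathbf{1} = 1,\ \|w\|_1 \le c} \hat R(w).$$
Let $e_{\max} = \|\hat\Sigma - \Sigma\|_\infty$. Then
$$|R(\hat w_{opt}) - R(w_{opt})| \le 2e_{\max}c^2,$$
$$|R(\hat w_{opt}) - \hat R(\hat w_{opt})| \le e_{\max}c^2,$$
$$|R(w_{opt}) - \hat R(\hat w_{opt})| \le e_{\max}c^2.$$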
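We now give the convergence rate of $e_{\max} = \|\hat\Sigma - \Sigma\|_\infty$. Suppose that, for sufficiently large $x$ and a rate $b_T$,
$$\max_{i,j}P\{b_T|\sigma_{ij} - \hat\sigma_{ij}| > x\} < \exp(-Cx^{1/a})$$
for some positive constants $a$ and $C$. Then
$$\|\Sigma - \hat\Sigma\|_\infty = O_P\left(\frac{(\log p)^a}{b_T}\right).$$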
Moreover, suppose that ∥Rt∥∞
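To better understand why the gross-exposure constraint helps risk approximation, we connect it with covariance regularization. By the Lagrange multiplier method, the previous risk-profile optimization has the Lagrangian
$$w^\top\hat\Sigma w/2 + \lambda_1(\|w\|_1 - c) + \lambda_2(1 - w^\top\mathbf{1}).$$
Let $g$ be a subgradient of $\|w\|_1$, with $g_i = \mathrm{sign}(w_i)$ when $w_i \ne 0$ and $g_i \in [-1, 1]$ otherwise. The KKT conditions are
$$\hat\Sigma w + \lambda_1 g - \lambda_2\mathbf{1} = 0, \qquad \lambda_1(c - \|w\|_1) = 0, \quad \lambda_1 \ge 0,$$
in addition to the constraints $w^\top\mathbf{1} = 1$ and $\|w\|_1 \le c$. It can be shown that the solution also solves
$$\min_{w^\top\mathbf{1} = 1} w^\top\tilde\Sigma_c w,$$
in which, with $\tilde g$ the subgradient evaluated at the solution,
$$\tilde\Sigma_c = \hat\Sigma + \lambda_1(\tilde g\mathbf{1}^\top + \mathbf{1}\tilde g^\top).$$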
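We now show the link between regression and risk-profile optimization. Set $Y = R_p$ and $X_j = R_p - R_j$ for $j = 1, \cdots, p-1$. Since $w^\top\mathbf{1} = 1$,
$$\mathrm{var}(w^\top R) = \min_b E(w^\top R - b)^2 = \min_b E(Y - w_1X_1 - \cdots - w_{p-1}X_{p-1} - b)^2,$$
and the constraint on gross exposure can be replaced by a similar $L_1$ constraint on the regression coefficients.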
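The identity rests on $w^\top R = Y - \sum_{j<p}w_jX_j$ with $Y = R_p$ and $X_j = R_p - R_j$, which holds whenever the weights sum to one. A quick numerical check on simulated returns (the data and the weight vector are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
T, p = 300, 4
R = rng.normal(0.001, 0.02, size=(T, p))          # simulated returns, T x p
w = np.array([0.5, 0.4, -0.2, 0.3])               # weights summing to one

Y = R[:, -1]                                      # Y = R_p
X = Y[:, None] - R[:, :-1]                        # X_j = R_p - R_j, j = 1..p-1
resid = Y - X @ w[:-1]                            # Y - w_1 X_1 - ... - w_{p-1} X_{p-1}
b_star = resid.mean()                             # the minimizing intercept b
mse = np.mean((resid - b_star) ** 2)              # min_b of the sample mean squared error
port_var = np.var(R @ w)                          # sample variance of the portfolio return
print(mse, port_var)
```

Minimizing the portfolio variance over $w$ is therefore a least-squares problem in $(w_1, \cdots, w_{p-1}, b)$, to which $L_1$-penalized regression tools apply.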
4. Chapter 8 Consumption based CAPM
4.1. CCAPM
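A representative consumer consumes a representative good with quantity $C_t$ at price $p_t$, receives income $I_t$ and holds $\alpha_t$ units of assets with price vector $S_t$. The budget constraint is
$$p_tC_t = I_t + (\alpha_{t-1} - \alpha_t)^\top S_t.$$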
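The intertemporal choice problem: let $\delta$ be the time discount factor and $U$ the utility function. The consumer solves
$$\max E_t\left(\sum_{j=0}^\infty\delta^jU(C_{t+j})\right).$$
The first-order condition at time $t$ gives
$$S_t = E_t[M_{t+1}S_{t+1}], \qquad M_{t+1} = \frac{\delta U'(C_{t+1})p_t}{U'(C_t)p_{t+1}},$$
where $M_{t+1}$ is the stochastic discount factor.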
The same discount factor prices every asset:
$$S_{i,t} = E_t[M_{t+1}S_{i,t+1}], \qquad 1 = E_t[M_{t+1}(1 + R_{i,t+1})].$$
The price depends on the inflation rate $p_{t+1}/p_t$.
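If there exists a risk-free asset with return $r_{f,t+1}$, then
$$E_tM_{t+1} = (1 + r_{f,t+1})^{-1}.$$
Notice that
$$S_t = E_t[M_{t+1}S_{t+1}] = (E_tM_{t+1})(E_tS_{t+1}) + \mathrm{Cov}_t(M_{t+1}, S_{t+1}) = (1 + r_{f,t+1})^{-1}E_tS_{t+1} + \mathrm{Cov}_t(M_{t+1}, S_{t+1}).$$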
4.2. Power utility and normal distribution assumption
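Assume the utility function has the form
$$U(C) = \frac{C^{1-\gamma} - 1}{1 - \gamma},$$
where $\gamma$ is the coefficient of relative risk aversion. The pricing equation becomes
$$S_t = E_t\left[\delta\frac{p_t}{p_{t+1}}\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}S_{t+1}\right].$$
Denote by $R_{t+1}$ the inflation-adjusted log-return,
$$R_{t+1} = \log\left(\frac{S_{t+1}}{S_t}\Big/\frac{p_{t+1}}{p_t}\right),$$
and assume $Y_{t+1} = R_{t+1} - \gamma\log(C_{t+1}/C_t)$ is conditionally normally distributed. Then
$$E_t(R_{i,t+1} - r_{f,t+1}) = -\frac{1}{2}\mathrm{var}_t(R_{i,t+1} - \gamma\Delta C_{t+1}) + \frac{\gamma^2}{2}\mathrm{var}_t(\Delta C_{t+1}) = -\frac{1}{2}\mathrm{var}_t(R_{i,t+1}) + \gamma\,\mathrm{cov}_t(R_{i,t+1}, \Delta C_{t+1}),$$
where $\Delta C_{t+1} = \log(C_{t+1}/C_t)$.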
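The last display can be derived from the Euler equation and the moment formula $E\exp(Z) = \exp(EZ + \mathrm{var}(Z)/2)$ for a normal $Z$. The sketch below assumes that $r_{f,t+1}$ denotes the log risk-free return in real terms and that it is known at time $t$; this interpretation is not spelled out in the notes.

```latex
\begin{align*}
1 &= E_t\bigl[\delta \exp(R_{i,t+1} - \gamma \Delta C_{t+1})\bigr]
  \;\Longrightarrow\;
  0 = \log\delta + E_t R_{i,t+1} - \gamma E_t \Delta C_{t+1}
      + \tfrac{1}{2}\mathrm{var}_t(R_{i,t+1} - \gamma \Delta C_{t+1}), \\
0 &= \log\delta + r_{f,t+1} - \gamma E_t \Delta C_{t+1}
      + \tfrac{\gamma^2}{2}\mathrm{var}_t(\Delta C_{t+1})
  \qquad \text{(same equation for the risk-free asset)}, \\
E_t(R_{i,t+1} - r_{f,t+1})
  &= -\tfrac{1}{2}\mathrm{var}_t(R_{i,t+1} - \gamma \Delta C_{t+1})
     + \tfrac{\gamma^2}{2}\mathrm{var}_t(\Delta C_{t+1}) \\
  &= -\tfrac{1}{2}\mathrm{var}_t(R_{i,t+1})
     + \gamma\,\mathrm{cov}_t(R_{i,t+1}, \Delta C_{t+1}).
\end{align*}
```

The last step expands $\mathrm{var}_t(R_{i,t+1} - \gamma\Delta C_{t+1}) = \mathrm{var}_t(R_{i,t+1}) - 2\gamma\,\mathrm{cov}_t(R_{i,t+1}, \Delta C_{t+1}) + \gamma^2\mathrm{var}_t(\Delta C_{t+1})$, so the $\gamma^2$ terms cancel.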
4.3. Mean-variance frontier
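Value of a portfolio: $W_t = \alpha_t^\top S_t$. Let $W_{t+1} = \alpha_t^\top S_{t+1}$. Then from the CCAPM, $S_t = E_t M_{t+1}S_{t+1}$ and
$$W_t = E_t M_{t+1}W_{t+1} = \mathrm{Cov}_t(M_{t+1}, W_{t+1}) + (E_tM_{t+1})(E_tW_{t+1}).$$
Therefore, using $E_tM_{t+1} = (1 + r_{f,t+1})^{-1}$, we have
$$E_tW_{t+1} - (1 + r_{f,t+1})W_t = -(1 + r_{f,t+1})\,\mathrm{Cov}_t(M_{t+1}, W_{t+1}).$$
By the Cauchy-Schwarz inequality,
$$\frac{(E_tW_{t+1} - (1 + r_{f,t+1})W_t)^2}{\mathrm{var}_t(W_{t+1})} \le \frac{\mathrm{var}_t(M_{t+1})}{(E_tM_{t+1})^2}.$$
Efficient frontier: the squared excess gain (return) per unit risk, i.e. the squared Sharpe ratio, is bounded by $\mathrm{var}_t(M_{t+1})/(E_tM_{t+1})^2$. This bound is not always achievable, because $M_{t+1} = \delta U'(C_{t+1})p_t/(U'(C_t)p_{t+1})$ is not tradable. If we can construct a stochastic discount factor from a tradable portfolio, the efficient frontier can be attained.
Consider $M^*_{t+1} = \alpha_t^{*\top}S_{t+1}$, where $\alpha^*_t$ minimizes
$$E_t(M_{t+1} - \alpha_t^\top S_{t+1})^2.$$
The first-order condition yields the Euler condition
$$E_tM^*_{t+1}S_{t+1} = E_tM_{t+1}S_{t+1} = S_t.$$
Therefore, we have $E_tM^*_{t+1}S_{t+1} = S_t$ and $E_tM^*_{t+1} = (1 + r_{f,t+1})^{-1}$, and the upper bound for the Sharpe ratio can be derived again using $M^*_{t+1}$:
$$\frac{(E_tW_{t+1} - (1 + r_{f,t+1})W_t)^2}{\mathrm{var}_t(W_{t+1})} \le \frac{\mathrm{var}_t(M^*_{t+1})}{(E_tM^*_{t+1})^2}.$$
The portfolio $M^*_{t+1}$ attains the maximum Sharpe ratio and is called the benchmark portfolio. The excess gain of any portfolio can be expressed as
$$E_tR_{t+1} - r_{f,t+1} = \beta^*_t\left[E_tR^*_{M,t+1} - r_{f,t+1}\right], \qquad \beta^*_t = \frac{\mathrm{Cov}_t(M^*_{t+1}, W_{t+1})}{\mathrm{var}_t(M^*_{t+1})}.$$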
5. Chapter 9 Present-value Models
5.1. Fundamental price
The return of a stock in terms of price and dividend is
$$R_{t+1} = \frac{S_{t+1} + D_{t+1}}{S_t} - 1.$$
Assume $E_t[R_{t+1}] = R$ is a constant. Then
$$S_t = (1+R)^{-1}E_t[S_{t+1} + D_{t+1}] = (1+R)^{-1}E_tD_{t+1} + (1+R)^{-2}E_t[D_{t+2} + S_{t+2}] = \cdots = \sum_{i=1}^K(1+R)^{-i}E_tD_{t+i} + (1+R)^{-K}E_tS_{t+K}.$$
Under the growth condition
$$\lim_{K\to\infty}(1+R)^{-K}E_tS_{t+K} = 0,$$
we have
$$S_t = \sum_{i=1}^\infty(1+R)^{-i}E_tD_{t+i}.$$
This is the discounted value of all dividends paid over the lifetime of the stock; it is called the discounted-cash-flow or present-value model.
If we assume dividends grow at a rate $G$,
$$E_tD_{t+i} = (1+G)E_tD_{t+i-1} = (1+G)^iD_t,$$
then the stock price is
$$S_t = \sum_{i=1}^\infty(1+R)^{-i}(1+G)^iD_t = \frac{(1+G)D_t}{R-G}, \quad \text{if } R > G.$$
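The closed form is a geometric series with ratio $(1+G)/(1+R) < 1$. The sketch below compares a long truncated sum with the formula for made-up values of $D_t$, $R$ and $G$; the function names are hypothetical.

```python
def pv_of_growing_dividends(D, R, G, horizon):
    """Truncated present value: sum over i = 1..horizon of D * ((1 + G) / (1 + R))**i."""
    return sum(D * ((1 + G) / (1 + R)) ** i for i in range(1, horizon + 1))

def growth_model_price(D, R, G):
    """Closed-form price (1 + G) * D / (R - G), valid only when R > G."""
    if R <= G:
        raise ValueError("the discounted sum diverges unless R > G")
    return (1 + G) * D / (R - G)

# Illustrative inputs (assumed): current dividend 2, discount rate 8%, growth rate 3%
D, R, G = 2.0, 0.08, 0.03
truncated = pv_of_growing_dividends(D, R, G, horizon=2000)
closed_form = growth_model_price(D, R, G)
print(truncated, closed_form)
```

The price is very sensitive to $R - G$: halving the gap doubles the price, which is why small changes in expected growth can move valuations substantially.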
5.2. Rational bubbles
Recall the price equation
$$S_t = (1+R)^{-1}E_t(S_{t+1} + D_{t+1}),$$
which has the unique solution
$$S_t = S_{D,t} \quad \text{if } \lim_{K\to\infty}(1+R)^{-K}E_tS_{t+K} = 0.$$
The general solution is $S_t = S_{D,t} + B_t$, where
$$S_{D,t} + B_t = (1+R)^{-1}E_t(S_{D,t+1} + B_{t+1} + D_{t+1}) \quad \text{or} \quad B_t = (1+R)^{-1}E_tB_{t+1}.$$
Here $S_{D,t}$ is called the fundamental value and $B_t$ the rational bubble. The growth condition rules out the bubble. The word "bubble" recalls some famous episodes in history in which asset prices rose higher than could easily be explained by fundamentals (investors betting that other investors would drive prices even higher in the future). Notice that $E_tB_{t+n} = (1+R)^nB_t$.
5.3. Time-varying expected returns
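How do we price an asset if expected returns are not constant? Let $s_t = \log S_t$, $d_t = \log D_t$, and let $\theta$ be the average log dividend-price ratio. A first-order Taylor expansion of the log-return $r_{t+1} = \log((S_{t+1} + D_{t+1})/S_t)$ gives the approximate present-value model
$$s_t = \kappa + \rho s_{t+1} + (1-\rho)d_{t+1} - r_{t+1},$$
where $\rho = (1 + \exp(\theta))^{-1}$ and $\kappa = -\log\rho - (1-\rho)\log(1/\rho - 1)$. Under the condition
$$\lim_{j\to\infty}\rho^js_{t+j} = 0,$$
we obtain the approximate pricing formula
$$s_t = \frac{\kappa}{1-\rho} + (1-\rho)P_{dt} - P_{rt}, \qquad P_{dt} = \sum_{j=0}^\infty\rho^jE_td_{t+1+j}, \quad P_{rt} = \sum_{j=0}^\infty\rho^jE_tr_{t+1+j}.$$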
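To see how accurate the log-linear approximation is, the sketch below compares the exact log-return with $\kappa + \rho s_{t+1} + (1-\rho)d_{t+1} - s_t$. It uses $\kappa = -\log\rho - (1-\rho)\log(1/\rho - 1)$, the constant produced by a first-order expansion of $\log(1 + e^x)$ around $x = \theta$. The dividend-price ratio of 4% and the prices are made-up inputs.

```python
import math

def loglinear_constants(theta):
    """rho and kappa from the average log dividend-price ratio theta."""
    rho = 1.0 / (1.0 + math.exp(theta))
    kappa = -math.log(rho) - (1.0 - rho) * math.log(1.0 / rho - 1.0)
    return rho, kappa

theta = math.log(0.04)                            # assumed average D/P of 4%
rho, kappa = loglinear_constants(theta)

# Hypothetical prices and dividend for one period
S_t, S_next, D_next = 100.0, 105.0, 4.5
exact = math.log((S_next + D_next) / S_t)
approx = kappa + rho * math.log(S_next) + (1.0 - rho) * math.log(D_next) - math.log(S_t)
print(rho, kappa, exact, approx)
```

The approximation is exact when $d_{t+1} - s_{t+1} = \theta$ and its error grows with the squared deviation of the log dividend-price ratio from $\theta$.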