The Elements of Financial Econometrics

1. Chapter 5 The Efficient Portfolios and CAPM

1.1. With a risk-free asset

We optimize the allocation vector using Markowitz's mean-variance optimization, which, for normally distributed returns, is equivalent to maximizing expected utility under exponential utility. The risk-free asset absorbs whatever wealth is not invested in risky assets, so the constraint that the weights sum to one can be dropped.

All optimal solutions have the same Sharpe ratio, so the efficient frontier in the (standard deviation, expected return) plane is a straight line whose intercept is the risk-free rate and whose slope is the optimal Sharpe ratio.
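
A minimal NumPy sketch of the closed-form solution, assuming known inputs `mu`, `sigma` and `rf` and the mean-variance objective $w^T(\mu - r_f\mathbf{1}) - \frac{\gamma}{2}w^T\Sigma w$ with a hypothetical risk-aversion parameter `risk_aversion` (playing the role of $\gamma$). The optimal risky weights are proportional to $\Sigma^{-1}(\mu - r_f\mathbf{1})$, so every risk aversion gives the same Sharpe ratio $\sqrt{(\mu - r_f\mathbf{1})^T\Sigma^{-1}(\mu - r_f\mathbf{1})}$. The input values are illustrative:

```python
import numpy as np

def tangency_portfolio(mu, sigma, rf):
    """Risky-asset weights (summing to one) and the maximal Sharpe ratio."""
    excess = np.asarray(mu, dtype=float) - rf
    z = np.linalg.solve(sigma, excess)       # Sigma^{-1}(mu - rf * 1)
    weights = z / z.sum()                    # assumes z.sum() > 0
    sharpe = np.sqrt(excess @ z)             # sqrt((mu - rf)' Sigma^{-1} (mu - rf))
    return weights, sharpe

def efficient_allocation(mu, sigma, rf, risk_aversion):
    """Maximize w'(mu - rf) - (risk_aversion / 2) w' Sigma w over risky weights w.

    The remaining wealth 1 - w'1 is held in the risk-free asset.
    """
    excess = np.asarray(mu, dtype=float) - rf
    return np.linalg.solve(sigma, excess) / risk_aversion

mu = np.array([0.08, 0.12])
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
w_tan, max_sharpe = tangency_portfolio(mu, sigma, rf=0.02)
```

Changing `risk_aversion` only rescales the risky position (the rest sits in the risk-free asset), which is why the frontier is a line.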

Figure 1. Efficient Frontier with risk-free asset

It can be shown that the market portfolio (whose allocation vector is proportional to market values) lies on the efficient frontier, and by the two-fund separation theorem any efficient portfolio can be decomposed into a combination of the market portfolio and the risk-free asset.

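CAPM: Let $Y$ be the excess return of any portfolio and $Y_m$ the excess return of the market portfolio. Then

$$Y = \beta Y_m + \epsilon, \qquad \beta = \frac{\operatorname{Cov}(Y, Y_m)}{\operatorname{Var}(Y_m)}$$

where $\epsilon$ is a noise term with mean zero that is uncorrelated with $Y_m$.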

The CAPM can be tested using

  • regression and hypothesis testing: regress $Y$ on $Y_m$ with an intercept $\alpha$ and test $H_0: \alpha = 0$;
  • maximum likelihood estimation with a Wald test or a likelihood ratio test.
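
A minimal sketch of the regression approach for a single portfolio, assuming i.i.d. errors so that the classical OLS standard errors apply; the function name `capm_regression` and the simulated data are hypothetical. The fitted slope equals the sample version of $\operatorname{Cov}(Y, Y_m)/\operatorname{Var}(Y_m)$:

```python
import numpy as np

def capm_regression(y, ym):
    """OLS fit of y = alpha + beta * ym + eps and the t-statistic for H0: alpha = 0."""
    y = np.asarray(y, dtype=float)
    ym = np.asarray(ym, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T), ym])          # intercept and market excess return
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (T - 2)                   # unbiased residual variance
    cov_coef = s2 * np.linalg.inv(X.T @ X)         # classical OLS covariance
    alpha, beta = coef
    return alpha, beta, alpha / np.sqrt(cov_coef[0, 0])

# Simulated excess returns that satisfy the CAPM with beta = 1.2.
rng = np.random.default_rng(0)
ym = rng.normal(0.005, 0.04, size=500)
y = 1.2 * ym + rng.normal(0.0, 0.01, size=500)
alpha, beta, t_alpha = capm_regression(y, ym)
```

A large $|t_\alpha|$ (compared with standard normal quantiles) is evidence against the CAPM for this portfolio.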

1.2. Without a risk-free asset

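We need to solve a constrained version of Markowitz's mean-variance problem, in which the weights must sum to one. This is still straightforward with Lagrange multipliers. This time the efficient frontier is a parabola in the (variance, mean) plane (a hyperbola in the (standard deviation, mean) plane).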
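
A sketch of the Lagrangian solution using the standard constants $A = \mathbf{1}^T\Sigma^{-1}\mathbf{1}$, $B = \mathbf{1}^T\Sigma^{-1}\mu$, $C = \mu^T\Sigma^{-1}\mu$ and $D = AC - B^2$ (notation introduced for this example; it assumes $\Sigma$ is invertible and $\mu$ is not proportional to $\mathbf{1}$). The minimum variance for a target mean $m$ is $(Am^2 - 2Bm + C)/D$, which is the parabola; function names are hypothetical:

```python
import numpy as np

def frontier_constants(mu, sigma):
    """A = 1' S^-1 1, B = 1' S^-1 mu, C = mu' S^-1 mu, D = AC - B^2."""
    mu = np.asarray(mu, dtype=float)
    ones = np.ones(len(mu))
    si_one = np.linalg.solve(sigma, ones)
    si_mu = np.linalg.solve(sigma, mu)
    A, B, C = ones @ si_one, ones @ si_mu, mu @ si_mu
    return A, B, C, A * C - B ** 2

def frontier_weights(mu, sigma, target):
    """Minimize w' Sigma w subject to w'1 = 1 and w'mu = target."""
    mu = np.asarray(mu, dtype=float)
    ones = np.ones(len(mu))
    A, B, C, D = frontier_constants(mu, sigma)
    # The first-order conditions give w = g S^-1 1 + h S^-1 mu; the two
    # constraints pin down the coefficients g and h.
    g = (C - B * target) / D
    h = (A * target - B) / D
    return g * np.linalg.solve(sigma, ones) + h * np.linalg.solve(sigma, mu)

def frontier_variance(mu, sigma, target):
    """Minimum variance (A m^2 - 2 B m + C) / D for target mean m."""
    A, B, C, D = frontier_constants(mu, sigma)
    return (A * target ** 2 - 2 * B * target + C) / D
```

The vertex of the parabola, at $m = B/A$ with variance $1/A$, is the global minimum-variance portfolio.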

Figure 2. Efficient Frontier without risk-free asset

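For any portfolio on the frontier other than the global minimum-variance portfolio, there is another frontier portfolio whose return is uncorrelated with it, i.e., which has zero beta with respect to it. The zero-beta CAPM is formulated in the same way as before, with the first portfolio playing the role of the market portfolio and its zero-beta counterpart playing the role of the risk-free asset: excess returns are measured relative to the return of the zero-beta portfolio.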
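
A sketch of how to find the zero-beta partner, using the constants $A = \mathbf{1}^T\Sigma^{-1}\mathbf{1}$, $B = \mathbf{1}^T\Sigma^{-1}\mu$, $C = \mu^T\Sigma^{-1}\mu$, $D = AC - B^2$ of the Lagrangian solution (defined inside the code; this notation is an assumption of the example). Two frontier portfolios with means $m$ and $n$ have covariance $(Amn - B(m + n) + C)/D$, so setting it to zero gives $n = B/A - (D/A^2)/(m - B/A)$. The function name is hypothetical:

```python
import numpy as np

def zero_beta_pair(mu, sigma, target):
    """Frontier portfolio with mean `target` and its zero-beta frontier partner."""
    mu = np.asarray(mu, dtype=float)
    ones = np.ones(len(mu))
    si_one = np.linalg.solve(sigma, ones)
    si_mu = np.linalg.solve(sigma, mu)
    A, B, C = ones @ si_one, ones @ si_mu, mu @ si_mu
    D = A * C - B ** 2

    def frontier(m):
        # Minimum-variance weights with w'1 = 1 and w'mu = m (Lagrangian solution).
        return ((C - B * m) * si_one + (A * m - B) * si_mu) / D

    m_gmv = B / A                      # mean of the global minimum-variance portfolio
    if np.isclose(target, m_gmv):
        raise ValueError("the global minimum-variance portfolio has no zero-beta partner")
    m_zero = m_gmv - (D / A ** 2) / (target - m_gmv)
    return frontier(target), frontier(m_zero), m_zero
```

The returned `m_zero` plays the role of the risk-free rate in the zero-beta CAPM for the chosen frontier portfolio.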

2. Chapter 6 Factor Pricing Models

2.1. Multi-factor models

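The multi-factor model, with return vector $R$, factor loading matrix $B$, factor vector $f$ and idiosyncratic noise $\epsilon$, is

$$R = a + Bf + \epsilon$$

The multi-factor pricing model, with expected return $\mu = E[R]$ and risk-free return $\gamma_0$, is

$$R = \gamma_0\mathbf{1} + B(f - \gamma_0\mathbf{1}) + \epsilon, \qquad \mu = \gamma_0\mathbf{1} + B\lambda_K$$

where $\lambda_K = E[f] - \gamma_0\mathbf{1}$ is the expected excess return of the factors (the factor risk premia). Using the factor structure, and assuming $f$ and $\epsilon$ are uncorrelated, we can estimate the covariance matrix of $R$ with far fewer parameters (when $\operatorname{var}(\epsilon)$ is diagonal):

$$\operatorname{var}(R) = B\operatorname{var}(f)B^T + \operatorname{var}(\epsilon)$$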
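
A small sketch, assuming a diagonal $\operatorname{var}(\epsilon)$, that builds the covariance from the factor structure and counts free parameters: an unrestricted covariance of $p$ assets has $p(p+1)/2$, while the factor version has $pK + K(K+1)/2 + p$. Function names and inputs are illustrative:

```python
import numpy as np

def factor_covariance(B, cov_f, resid_var):
    """var(R) = B var(f) B' + diag(resid_var), assuming a diagonal var(eps)."""
    B = np.asarray(B, dtype=float)
    return B @ np.asarray(cov_f, dtype=float) @ B.T + np.diag(resid_var)

def n_params(p, K):
    """Free parameters: unrestricted covariance vs. factor structure."""
    full = p * (p + 1) // 2
    factor = p * K + K * (K + 1) // 2 + p      # loadings + var(f) + residual variances
    return full, factor

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 2))                    # 6 assets, 2 factors
Sigma = factor_covariance(B, np.array([[1.0, 0.3], [0.3, 0.5]]), np.full(6, 0.2))
```

For 500 assets and 3 factors, `n_params(500, 3)` returns `(125250, 2006)`.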

Note that the multi-factor model always holds, since it is a statistical decomposition (p. 245), whereas the multi-factor pricing model imposes restrictions that need to be validated.

2.2. Validating the multi-factor pricing model

If there is a risk-free asset, the multi-factor pricing model can be written as

$$Y = \alpha + B(f - \gamma_0\mathbf{1}) + \epsilon$$

where $Y$ is the vector of excess returns of all assets, and the hypothesis to test is simply

$$H_0: \alpha = 0$$

If there is no risk-free asset, we compare the multi-factor model with the multi-factor pricing model. Writing the intercept of the multi-factor model as $\alpha$ (the vector $a$ above), the pricing model holds if and only if, for some scalar $\gamma_0$,

$$H_0: \alpha = \gamma_0(\mathbf{1} - B\mathbf{1})$$

Both hypotheses can be tested with a likelihood ratio test. If we assume the residual $\epsilon$ is normally distributed, $B$ and $\lambda_K$ (and $\gamma_0$ when there is no risk-free asset) can be estimated iteratively by maximum likelihood.
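
A minimal sketch of the likelihood ratio test in the risk-free case, assuming traded factors (so `X` holds their observed excess returns $f - \gamma_0\mathbf{1}$) and i.i.d. normal errors. Because every asset uses the same regressors, the Gaussian MLE reduces to equation-by-equation OLS with and without an intercept, and $\text{LR} = T(\log|\hat\Sigma_0| - \log|\hat\Sigma_1|)$ is asymptotically $\chi^2_p$ under $H_0$. The function name is hypothetical:

```python
import numpy as np

def lr_test_alpha_zero(Y, X):
    """LR statistic for H0: alpha = 0 in Y_t = alpha + B x_t + eps_t, eps_t ~ N(0, Sigma).

    Y: T x p asset excess returns; X: T x K factor excess returns (traded factors).
    Under H0 the statistic is asymptotically chi-square with p degrees of freedom.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = Y.shape[0]

    def mle_resid_cov(Z):
        # With identical regressors in every equation, the Gaussian MLE is OLS.
        coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        E = Y - Z @ coef
        return E.T @ E / T

    sigma_unrestricted = mle_resid_cov(np.column_stack([np.ones(T), X]))
    sigma_restricted = mle_resid_cov(X)              # alpha = 0 imposed
    _, logdet_u = np.linalg.slogdet(sigma_unrestricted)
    _, logdet_r = np.linalg.slogdet(sigma_restricted)
    return T * (logdet_r - logdet_u)
```

A p-value can then be taken from a chi-square survival function, e.g. `scipy.stats.chi2.sf(stat, df=p)` if SciPy is available.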

If we use macroeconomic factors, which are not traded, their excess returns cannot be observed. In that case $\lambda_K$ is estimated by MLE as an additional parameter, and the hypothesis is modified accordingly.

2.3. PCA and factor analysis

For the multi-factor model with unobserved factors, if we assume the residual covariance matrix is the identity matrix, it can be shown that the MLE coincides with the PCA solution: the columns of $\hat B$ and the first $K$ principal components (leading eigenvectors) of the sample covariance matrix of $R$ span the same subspace.

When the number of assets is large, PCA and factor analysis give approximately the same results; see Fan, Liao and Mincheva (2013).

3. Chapter 7 Portfolio Allocation and Risk Assessment

3.1. Risk approximation

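For a portfolio allocation vector $w$ with $w^T\mathbf{1} = 1$, define the gross exposure of the portfolio as $c = \|w\|_1$. The total long and short positions are then

$$w^+ = \sum_{w_i \ge 0} w_i = \frac{c + 1}{2}, \qquad w^- = -\sum_{w_i < 0} w_i = \frac{c - 1}{2}$$

The variance (risk) of this portfolio is

$$R(w) = w^T\Sigma w, \qquad \Sigma = \operatorname{var}(R)$$

With an estimated covariance matrix $\hat\Sigma$, the empirical risk is

$$\hat R(w) = w^T\hat\Sigma w$$

For any portfolio with allocation vector $w$,

$$|R(w) - \hat R(w)| \le e_{\max}c^2$$

where $e_{\max} = \max_{i,j}|\sigma_{ij} - \hat\sigma_{ij}| = \|\hat\Sigma - \Sigma\|_\infty$ is the largest elementwise estimation error. The bound follows from $|w^T(\hat\Sigma - \Sigma)w| \le \sum_{i,j}|w_i||w_j||\hat\sigma_{ij} - \sigma_{ij}| \le e_{\max}\|w\|_1^2$.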
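
A quick numerical check of the long/short identities and of the bound, assuming a simulated "true" covariance matrix and a sample estimate from a short history of 60 observations; the helper names are hypothetical:

```python
import numpy as np

def long_short(w):
    """Total long position w+ and total short position w- (as a positive number)."""
    w = np.asarray(w, dtype=float)
    return w[w >= 0].sum(), -w[w < 0].sum()

def risk_error_and_bound(w, sigma, sigma_hat):
    """Return |R(w) - R_hat(w)| and the bound e_max * ||w||_1^2."""
    w = np.asarray(w, dtype=float)
    error = abs(w @ sigma @ w - w @ sigma_hat @ w)
    e_max = np.max(np.abs(sigma_hat - sigma))
    return error, e_max * np.abs(w).sum() ** 2

rng = np.random.default_rng(0)
p = 10
L = rng.normal(size=(p, p))
sigma = L @ L.T / p + np.eye(p)                          # a "true" covariance matrix
returns = rng.multivariate_normal(np.zeros(p), sigma, size=60)
sigma_hat = np.cov(returns, rowvar=False)                # estimate from 60 observations
w = rng.normal(size=p)
w = w / w.sum()                                          # weights sum to one
error, bound = risk_error_and_bound(w, sigma, sigma_hat)
```

The bound grows with $c^2$, so portfolios with large gross exposure can have much larger risk estimation errors.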

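Fan, Liao and Shi (2015) use the concept of a high-confidence level upper bound (H-CLUB) $\hat U(\tau)$ for the risk estimation error:

$$P\big[|R(w) - \hat R(w)| \le \hat U(\tau)\big] \ge 1 - \tau, \qquad \tau \in (0, 1)$$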

3.2. Estimation of a large volatility matrix

Exponential Smoothing

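Denote by $\Sigma_t = \operatorname{var}_t(R_{t+1})$ the conditional covariance matrix of the return at time $t+1$ given the information up to time $t$. The traditional estimate uses the returns from time $1$ to time $t$ and computes the sample covariance matrix, treating returns as having mean zero:

$$\hat\Sigma_t = \frac{1}{t}\sum_{i=0}^{t-1}R_{t-i}R_{t-i}^T$$

To localize the estimate in time, a naive approach keeps only the most recent $h \le t$ observations (a rolling window):

$$\hat\Sigma_t = \frac{1}{h}\sum_{i=0}^{h-1}R_{t-i}R_{t-i}^T$$

A better approach is exponential smoothing with a smoothing parameter $\lambda \in (0, 1)$, which downweights older observations geometrically:

$$\hat\Sigma_t = (1 - \lambda)\sum_{i=0}^{t-1}\lambda^iR_{t-i}R_{t-i}^T$$

The weights sum to $1 - \lambda^t$, which is close to one for large $t$. Starting from $\hat\Sigma_0 = 0$, the estimated volatility matrix can be computed recursively as

$$\hat\Sigma_t = \lambda\hat\Sigma_{t-1} + (1 - \lambda)R_tR_t^T$$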
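
A sketch comparing the recursion with the direct weighted sum, assuming mean-zero returns stored as a $t \times p$ array with the oldest observation first. The value $\lambda = 0.94$ used in the check is the one popularized by RiskMetrics for daily data and serves only as an example; function names are hypothetical:

```python
import numpy as np

def ewma_recursive(returns, lam):
    """Sigma_t = lam * Sigma_{t-1} + (1 - lam) R_t R_t', starting from Sigma_0 = 0."""
    R = np.asarray(returns, dtype=float)          # rows: R_1, ..., R_t (oldest first)
    sigma = np.zeros((R.shape[1], R.shape[1]))
    for r in R:
        sigma = lam * sigma + (1 - lam) * np.outer(r, r)
    return sigma

def ewma_direct(returns, lam):
    """Sigma_t = (1 - lam) * sum_{i=0}^{t-1} lam^i R_{t-i} R_{t-i}'."""
    R = np.asarray(returns, dtype=float)
    t = R.shape[0]
    weights = (1 - lam) * lam ** np.arange(t - 1, -1, -1)   # lam^{t-1}, ..., lam^0
    return (R * weights[:, None]).T @ R

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(250, 4))   # one year of simulated daily returns
```

The recursive form needs only the previous estimate and the newest return, which is why it is preferred for daily updating.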

Thresholding regularization

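When the number of assets is large, the estimated volatility matrix is ill-conditioned (and the sample covariance matrix is singular when the number of assets exceeds the sample size). To deal with this, we can regularize it by thresholding. Given an estimated matrix $\hat\Sigma$ and a threshold $\lambda$, the hard-thresholding estimator keeps only the large off-diagonal entries:

$$\hat\Sigma_\lambda = \big(\hat\sigma_{ij}\,I(|\hat\sigma_{ij}| \ge \lambda)\big)_{i \ne j}$$

with the diagonal entries left unchanged. Alternatively, an adaptive thresholding estimator scales each entry by its uncertainty:

$$\hat\Sigma_\lambda = \big(\hat\sigma_{ij}\,I(|\hat\sigma_{ij}/\operatorname{SE}(\hat\sigma_{ij})| \ge \lambda)\big)_{i \ne j}$$

where $\operatorname{SE}(\hat\sigma_{ij})$ is the estimated standard error of $\hat\sigma_{ij}$.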
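
A minimal sketch of both estimators. The standard-error helper uses $\operatorname{SE}(\hat\sigma_{ij}) \approx \operatorname{sd}_t(x_{it}x_{jt})/\sqrt{T}$ for demeaned returns $x_{it}$, which is an assumption of this example (one common choice), and all function names are hypothetical:

```python
import numpy as np

def hard_threshold(sigma_hat, lam):
    """Keep off-diagonal entries with |sigma_ij| >= lam; the diagonal is never thresholded."""
    S = np.asarray(sigma_hat, dtype=float)
    out = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(out, np.diag(S))
    return out

def adaptive_threshold(sigma_hat, se, lam):
    """Keep off-diagonal entries with |sigma_ij / SE(sigma_ij)| >= lam."""
    S = np.asarray(sigma_hat, dtype=float)
    out = np.where(np.abs(S / se) >= lam, S, 0.0)
    np.fill_diagonal(out, np.diag(S))
    return out

def entry_standard_errors(returns):
    """Assumed SE of each sample covariance: sd over t of x_it * x_jt, divided by sqrt(T)."""
    X = np.asarray(returns, dtype=float)
    X = X - X.mean(axis=0)
    products = X[:, :, None] * X[:, None, :]     # T x p x p array of x_it * x_jt
    return products.std(axis=0) / np.sqrt(X.shape[0])
```

With `se` set to all ones, the adaptive rule reduces to hard thresholding; in practice `se` comes from `entry_standard_errors`.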

Projection onto positive definite matrix space

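Thresholding can destroy positive semi-definiteness. To turn $\hat\Sigma_\lambda$ back into a valid covariance matrix, we can take its spectral decomposition and replace the negative eigenvalues by zero:

$$\hat\Sigma_\lambda = \Gamma\operatorname{diag}(\lambda_1, \ldots, \lambda_p)\Gamma^T, \qquad \hat\Sigma_\lambda^+ = \Gamma\operatorname{diag}(\lambda_1^+, \ldots, \lambda_p^+)\Gamma^T$$

where $\lambda_j^+ = \max(\lambda_j, 0)$ (here $\lambda_j$ denotes an eigenvalue, not the threshold). Alternatively, we can solve an optimization problem: for a symmetric matrix $A$,

$$\min_R \|A - R\|_F^2 \quad \text{s.t.} \quad R \succeq 0, \quad \operatorname{diag}(R) = I_p$$

Here $A$ is set to the thresholded correlation matrix

$$A = \operatorname{diag}(\hat\Sigma_\lambda)^{-1/2}\,\hat\Sigma_\lambda\,\operatorname{diag}(\hat\Sigma_\lambda)^{-1/2}$$

and if $\hat R_\lambda$ is the solution to the optimization problem, the corrected covariance matrix is

$$\operatorname{diag}(\hat\Sigma_\lambda)^{1/2}\,\hat R_\lambda\,\operatorname{diag}(\hat\Sigma_\lambda)^{1/2}$$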
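
A sketch of both corrections. The eigenvalue clipping follows the first formula directly. For the constrained problem, this example swaps in Higham's (2002) alternating-projections method with Dykstra's correction, one standard way to compute a nearest correlation matrix in the Frobenius norm; the tolerance and iteration cap are arbitrary choices and the function names are hypothetical:

```python
import numpy as np

def clip_eigenvalues(S):
    """Gamma diag(max(lambda_j, 0)) Gamma': the Frobenius-nearest PSD matrix to symmetric S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

def nearest_correlation(A, tol=1e-10, max_iter=1000):
    """min ||A - R||_F s.t. R PSD with unit diagonal, by alternating projections
    with Dykstra's correction (Higham, 2002)."""
    Y = np.array(A, dtype=float)
    correction = np.zeros_like(Y)
    for _ in range(max_iter):
        R = Y - correction
        X = clip_eigenvalues(R)                # project onto the PSD cone
        correction = X - R                     # Dykstra's correction term
        Y_new = X.copy()
        np.fill_diagonal(Y_new, 1.0)           # project onto unit-diagonal matrices
        converged = np.linalg.norm(Y_new - Y) <= tol * np.linalg.norm(Y_new)
        Y = Y_new
        if converged:
            break
    return Y

def correct_covariance(sigma_lam):
    """Rescale to a correlation matrix, correct it, and restore the variances."""
    d = np.sqrt(np.diag(sigma_lam))
    A = sigma_lam / np.outer(d, d)             # diag^{-1/2} Sigma_lam diag^{-1/2}
    R_hat = nearest_correlation(A)
    return R_hat * np.outer(d, d)              # diag^{1/2} R_hat diag^{1/2}
```

For example, `correct_covariance(np.array([[4., 2., 0.], [2., 1., 1.], [0., 1., 1.]]))`, whose correlation matrix is indefinite, returns a matrix that is positive semi-definite up to the convergence tolerance and keeps the variances 4, 1, 1 on its diagonal.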

Regularization by penalized likelihood

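Penalized likelihood is useful for exploiting sparsity. Let $\ell_T(\theta)$ be the log-likelihood function of $T$ observations. Assuming $\theta$ is sparse, the penalized likelihood estimator is

$$\arg\min_\theta\Big\{-\frac{2}{T}\ell_T(\theta) + \sum_{j=1}^p p_\lambda(|\theta_j|)\Big\}$$

where $p_\lambda$ is a penalty function. The ideal choice is $p_\lambda(|\theta|) = \lambda I(\theta \ne 0)$, the $L_0$ penalty, which leads to best subset selection; however, it is computationally expensive. The LASSO estimator, which uses the $L_1$ penalty $p_\lambda(|\theta|) = \lambda|\theta|$, is a convex relaxation but introduces bias, because it shrinks large coefficients as well as small ones.

Fan and Li (2001) introduced a family of folded-concave penalty functions, such as the smoothly clipped absolute deviation (SCAD) penalty, whose derivative is

$$p_\lambda'(t) = \lambda\Big\{I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a - 1)\lambda}I(t > \lambda)\Big\}, \qquad t \ge 0, \; a > 2$$

The penalized likelihood with the SCAD penalty can be solved by an iteratively reweighted LASSO using the local linear approximation (LLA) of Zou and Li (2008). Given the estimate $\hat\theta^{(k)}$ at step $k$, the next estimate minimizes

$$Q_{k+1}(\theta) = -\frac{2}{T}\ell_T(\theta) + \sum_{j=1}^p w_{k,j}|\theta_j| + c$$

where $w_{k,j} = p_\lambda'(|\hat\theta_j^{(k)}|)$ and

$$c = \sum_{j=1}^p\big[p_\lambda(|\hat\theta_j^{(k)}|) - p_\lambda'(|\hat\theta_j^{(k)}|)\,|\hat\theta_j^{(k)}|\big]$$

does not depend on $\theta$. Since $p_\lambda$ is concave on $[0, \infty)$, its tangent line at $|\hat\theta_j^{(k)}|$ lies above it, so $Q_{k+1}$ majorizes the penalized objective. The algorithm is therefore a majorization-minimization algorithm, and the objective value is non-increasing along the sequence $\hat\theta^{(k)}$.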
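
A minimal sketch of the LLA iterations in the simplest setting, a least-squares problem with an orthonormal design, where the objective reduces to $\frac12\sum_j(z_j - \theta_j)^2 + \sum_j p_\lambda(|\theta_j|)$ and each weighted-LASSO step is componentwise soft thresholding. The orthonormal setting, the choice $a = 3.7$ (the value suggested by Fan and Li) and the function names are assumptions of this example:

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """SCAD derivative p'_lambda(|t|) from Fan and Li (2001); requires a > 2."""
    t = np.abs(np.asarray(t, dtype=float))
    tail = np.maximum(a * lam - t, 0.0) / ((a - 1) * lam)
    return lam * np.where(t <= lam, 1.0, tail)

def lla_orthonormal(z, lam, a=3.7, n_iter=50):
    """LLA for min_theta 0.5 * ||z - theta||^2 + sum_j p_lambda(|theta_j|).

    Each step solves a weighted LASSO, which for an orthonormal design is
    componentwise soft thresholding with weights w_j = p'_lambda(|theta_j|).
    """
    z = np.asarray(z, dtype=float)
    theta = z.copy()                            # start from the unpenalized fit
    for _ in range(n_iter):
        w = scad_derivative(theta, lam, a)      # w_{k,j} = p'_lambda(|theta_j^(k)|)
        theta = np.sign(z) * np.maximum(np.abs(z) - w, 0.0)
    return theta
```

With `lam = 1`, `lla_orthonormal([5.0, 0.5, 2.0, -3.0], 1.0)` converges to approximately `[5, 0, 1, -2.588]`: the large coefficient is left unshrunk, whereas the LASSO would shrink every nonzero coefficient by `lam`.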

Estimate volatility matrix using factor model

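Given the factor model

$$R = a + Bf + \epsilon$$

we can run a time-series regression of each asset's returns on the observed factors to obtain $\hat B$ and the residual variances $\hat\sigma_1^2, \ldots, \hat\sigma_p^2$, and compute the sample covariance matrix $\hat\Sigma_f$ of $f$. If we assume the noise terms are uncorrelated with each other, we obtain the strict factor model based estimator

$$\hat\Sigma_S = \hat B\hat\Sigma_f\hat B^T + \operatorname{diag}(\hat\sigma_1^2, \ldots, \hat\sigma_p^2)$$

Compared with the sample covariance matrix of $R$, this estimator has a better convergence rate for estimating $\Sigma^{-1}$ and the same rate for estimating $\Sigma$.

If instead we do not assume that the noise terms are uncorrelated, but only that their covariance matrix is sparse, we obtain the approximate factor model estimator

$$\hat\Sigma_A = \hat B\hat\Sigma_f\hat B^T + \hat\Sigma_{\epsilon,\lambda}$$

where $\hat\Sigma_{\epsilon,\lambda}$ is the estimated residual covariance matrix, regularized by thresholding the residual correlation matrix with parameter $\lambda$.

Note that if $\lambda = 1$, all off-diagonal residual correlations are removed and $\hat\Sigma_A = \hat\Sigma_S$; if $\lambda = 0$, nothing is thresholded and $\hat\Sigma_A$ is the sample covariance matrix of $R$.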
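
A sketch of $\hat\Sigma_A$ with observed factors, assuming OLS with an intercept and the same $T - 1$ normalization for every sample covariance. Under those choices $\lambda = 0$ reproduces the sample covariance of $R$ exactly, because OLS residuals are orthogonal to the factors, and `lam=1.0` gives $\hat\Sigma_S$. The function name is hypothetical:

```python
import numpy as np

def factor_model_covariance(returns, factors, lam=1.0):
    """Sigma_A = B Sigma_f B' + residual covariance thresholded on its correlations."""
    R = np.asarray(returns, dtype=float)            # T x p
    F = np.asarray(factors, dtype=float)            # T x K
    T = R.shape[0]
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)    # time-series OLS for every asset
    B = coef[1:].T                                  # p x K loadings
    E = R - X @ coef                                # T x p residuals
    sigma_f = np.atleast_2d(np.cov(F, rowvar=False))
    sigma_e = np.cov(E, rowvar=False)
    d = np.sqrt(np.diag(sigma_e))
    keep = np.abs(sigma_e / np.outer(d, d)) >= lam  # threshold residual correlations
    np.fill_diagonal(keep, True)                    # always keep residual variances
    return B @ sigma_f @ B.T + np.where(keep, sigma_e, 0.0)
```

Intermediate values of `lam` interpolate between the strict factor model and the sample covariance.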

For the approximate factor model with unobserved factors, we can use the Principal Orthogonal complEment Thresholding (POET) estimator of Fan, Liao and Mincheva (2013):

  1. Obtain the sample covariance matrix $S$ based on $T$ returns.
  2. Compute its singular value (equivalently, spectral) decomposition $S = \sum_{j=1}^p\hat\lambda_j\hat\xi_j\hat\xi_j^T$, with eigenvalues in decreasing order.
  3. Compute the residual covariance matrix $\hat Q = \sum_{j=K+1}^p\hat\lambda_j\hat\xi_j\hat\xi_j^T$.
  4. Regularize $\hat Q$ to obtain $\hat Q_\lambda$, e.g. by correlation matrix thresholding.
  5. Compute the POET estimator as

$$\hat\Sigma^P_\lambda = \sum_{j=1}^K\hat\lambda_j\hat\xi_j\hat\xi_j^T + \hat Q_\lambda$$

POET encompasses several estimators as special cases: when $\lambda = 0$, it is the sample covariance matrix; when $\lambda = 1$, it is the strict factor model estimator with unknown factors; when $K = 0$, it is the thresholded sample covariance matrix.
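
A compact NumPy sketch of the five steps, assuming hard thresholding of the residual correlations in step 4 (the text allows other regularizers) and a given number of factors `K`; `poet` is a hypothetical name, not a library function:

```python
import numpy as np

def poet(returns, K, lam):
    """POET estimate with hard thresholding of the residual correlations (step 4)."""
    R = np.asarray(returns, dtype=float)
    S = np.cov(R, rowvar=False)                            # step 1
    vals, vecs = np.linalg.eigh(S)                         # step 2 (ascending order)
    vals, vecs = vals[::-1], vecs[:, ::-1]                 # reorder: largest first
    low_rank = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T    # sum_{j<=K} lambda_j xi_j xi_j'
    Q = S - low_rank                                       # step 3: sum_{j>K} terms
    d = np.sqrt(np.diag(Q))
    keep = np.abs(Q / np.outer(d, d)) >= lam               # step 4: threshold correlations
    np.fill_diagonal(keep, True)                           # never threshold variances
    return low_rank + np.where(keep, Q, 0.0)               # step 5
```

The special cases listed above can be checked directly: `poet(R, K, 0.0)` returns the sample covariance, and `poet(R, 0, lam)` returns the correlation-thresholded sample covariance.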

High-dimensional PCA and factor analysis

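The factor model can be written as

$$R_t = a + (BH)(H^{-1}f_t) + \epsilon_t$$

for any nonsingular matrix $H$, so $B$ and $f_t$ are identified only up to such a transformation. We can therefore normalize $\operatorname{var}(f_t) = I_K$ and take the columns of $B$ to be orthogonal. Then

$$\Sigma = BB^T + \Sigma_\epsilon$$

If the factors are strong (pervasive), meaning that the eigenvalues of $B^TB$ grow proportionally to $p$, then asymptotically as $p \to \infty$,

$$\sum_{j=1}^K\lambda_j\xi_j\xi_j^T \approx \sum_{j=1}^Kb_jb_j^T = BB^T$$

and

$$f_{jt} \approx \xi_j^TR_t/\lambda_j^{1/2}$$

where $\{\xi_j\}$ and $\{\lambda_j\}$ are the eigenvectors and eigenvalues of the covariance matrix of $R_t$, $b_j$ is the $j$th column of $B$, and $R_t$ is taken to be centered ($a = 0$) in the second approximation.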
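
A small simulation illustrating both approximations, under assumptions chosen for the example: $p = 200$ assets, $K = 2$ orthogonal factors with loading norms $2\sqrt{p}$ and $\sqrt{p}$, $\Sigma_\epsilon = I_p$, $\operatorname{var}(f_t) = I_K$ and $a = 0$. With these choices the leading eigenvalues are $\|b_j\|^2 + 1$, so the spectral-norm relative error of the first approximation is exactly $1/800$. The function name is hypothetical:

```python
import numpy as np

def pervasive_factor_demo(p=200, T=500, seed=0):
    rng = np.random.default_rng(seed)
    K = 2
    Q, _ = np.linalg.qr(rng.standard_normal((p, K)))       # orthonormal directions
    B = Q * (np.sqrt(p) * np.array([2.0, 1.0]))             # orthogonal, pervasive loadings
    Sigma = B @ B.T + np.eye(p)                             # Sigma_eps = I_p, var(f) = I_K

    vals, vecs = np.linalg.eigh(Sigma)
    vals, vecs = vals[::-1], vecs[:, ::-1]                  # decreasing eigenvalues
    top = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T          # sum_{j<=K} lambda_j xi_j xi_j'
    rel_err = np.linalg.norm(top - B @ B.T, 2) / np.linalg.norm(B @ B.T, 2)

    f = rng.standard_normal((T, K))
    R = f @ B.T + rng.standard_normal((T, p))               # a = 0
    f_hat = R @ vecs[:, :K] / np.sqrt(vals[:K])             # f_jt ~ xi_j' R_t / sqrt(lambda_j)
    corr = [abs(np.corrcoef(f[:, j], f_hat[:, j])[0, 1]) for j in range(K)]
    return rel_err, corr
```

Absolute correlations are reported because eigenvectors are only determined up to sign, which is the identification issue noted above.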

3.3. Portfolio allocation with gross-exposure constraints

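The Markowitz mean-variance analysis above is very sensitive to its inputs (the estimated expected returns and covariance matrix) and their estimation errors, and it can result in extreme short positions. The problem is more severe for large portfolios.

We therefore consider optimization with a gross-exposure constraint:

$$\min_w R(w) = w^T\Sigma w \quad \text{s.t.} \quad w^T\mathbf{1} = 1, \quad \|w\|_1 \le c, \quad w^T\mu = \alpha$$

Now consider only the risk, dropping the return constraint. Define the actual and perceived risks

$$R(w) = w^T\Sigma w, \qquad \hat R(w) = w^T\hat\Sigma w$$

and the oracle and empirical allocation vectors

$$w_{\text{opt}} = \arg\min_{w^T\mathbf{1} = 1,\,\|w\|_1 \le c}R(w), \qquad \hat w_{\text{opt}} = \arg\min_{w^T\mathbf{1} = 1,\,\|w\|_1 \le c}\hat R(w)$$

Let $e_{\max} = \|\hat\Sigma - \Sigma\|_\infty$. Then

$$\begin{aligned}
|R(\hat w_{\text{opt}}) - R(w_{\text{opt}})| &\le 2e_{\max}c^2 \\
|R(\hat w_{\text{opt}}) - \hat R(\hat w_{\text{opt}})| &\le e_{\max}c^2 \\
|R(w_{\text{opt}}) - \hat R(\hat w_{\text{opt}})| &\le e_{\max}c^2
\end{aligned}$$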
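
A sketch that checks the three inequalities in the simplest case, two assets, where the feasible set $\{w = (t, 1 - t): |t| + |1 - t| \le c\}$ is the interval $t \in [(1 - c)/2, (1 + c)/2]$ and the constrained minimizer is the unconstrained one clipped to that interval. The two-asset reduction, the perturbation used for $\hat\Sigma$ and the function names are assumptions of this example:

```python
import numpy as np

def min_risk_two_assets(S, c):
    """argmin of w'Sw over w = (t, 1 - t) with |t| + |1 - t| <= c, for c >= 1."""
    t_star = (S[1, 1] - S[0, 1]) / (S[0, 0] + S[1, 1] - 2 * S[0, 1])
    t = np.clip(t_star, (1 - c) / 2, (1 + c) / 2)     # the feasible set is an interval
    return np.array([t, 1 - t])

def check_bounds(Sigma, Sigma_hat, c):
    """Check the three risk-approximation inequalities for the oracle and empirical rules."""
    def risk(w, S):
        return w @ S @ w

    w_opt = min_risk_two_assets(Sigma, c)
    w_hat = min_risk_two_assets(Sigma_hat, c)
    e_max = np.max(np.abs(Sigma_hat - Sigma))
    slack = 1e-12                                       # floating-point tolerance
    return (
        abs(risk(w_hat, Sigma) - risk(w_opt, Sigma)) <= 2 * e_max * c**2 + slack,
        abs(risk(w_hat, Sigma) - risk(w_hat, Sigma_hat)) <= e_max * c**2 + slack,
        abs(risk(w_opt, Sigma) - risk(w_hat, Sigma_hat)) <= e_max * c**2 + slack,
    )
```

All three checks return `True` whenever the clipped solutions are the exact constrained minimizers, which holds as long as $\hat\sigma_{11} + \hat\sigma_{22} - 2\hat\sigma_{12} > 0$.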

We now give the convergence rate of $e_{\max} = \|\hat\Sigma - \Sigma\|_\infty$. If, for a sufficiently large $x$,

$$\max_{i,j}P\{b_T|\sigma_{ij} - \hat\sigma_{ij}| > x\} < \exp(-Cx^{1/a})$$

for some positive constants $a$ and $C$ and a rate $b_T$, then

$$\|\Sigma - \hat\Sigma\|_\infty = O_P\big((\log p)^a/b_T\big)$$

Moreover, suppose that $R_t$ is bounded and that the returns are weakly dependent, in the sense that their $\alpha$-mixing coefficient $\alpha(q)$ decays exponentially, namely $\alpha(q) = O(\exp(-Cq^{1/b}))$ for some $b < 2a - 1$. If $\log p = o(n^{1/(2b+1)})$, with $n$ the sample size, then we have the above convergence rate (see Fan, Zhang and Yu (2012)).

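To better understand why the gross-exposure constraint helps with risk approximation, we connect it with covariance regularization. By the Lagrange multiplier method, the risk minimization problem above has the Lagrangian

$$w^T\hat\Sigma w/2 + \lambda_1(\|w\|_1 - c) + \lambda_2(1 - w^T\mathbf{1})$$

Let $g$ be the subgradient vector of the function $\|w\|_1$, whose $i$th element is $1$, $-1$ or any value in $[-1, 1]$ depending on whether $w_i$ is positive, negative or zero, respectively. Then the first-order conditions are

$$\hat\Sigma w + \lambda_1g - \lambda_2\mathbf{1} = 0, \qquad \lambda_1(c - \|w\|_1) = 0, \qquad \lambda_1 \ge 0$$

in addition to the constraints $w^T\mathbf{1} = 1$ and $\|w\|_1 \le c$. Let $\tilde w$ be the solution. It can be shown that $\tilde w$ also solves the portfolio optimization problem without the gross-exposure constraint

$$\min_{w^T\mathbf{1} = 1}w^T\tilde\Sigma_cw$$

in which, with $\tilde g$ the subgradient evaluated at $\tilde w$,

$$\tilde\Sigma_c = \hat\Sigma + \lambda_1(\tilde g\mathbf{1}^T + \mathbf{1}\tilde g^T)$$

so the constraint acts as a regularization of the covariance matrix.

We now show the link between regression and risk minimization. Set $Y = R_p$ and $X_j = R_p - R_j$ for $j = 1, \ldots, p - 1$. Since $w^T\mathbf{1} = 1$, we have $w^TR = Y - w_1X_1 - \cdots - w_{p-1}X_{p-1}$, and therefore

$$\operatorname{var}(w^TR) = \min_bE(w^TR - b)^2 = \min_bE(Y - w_1X_1 - \cdots - w_{p-1}X_{p-1} - b)^2$$

Minimizing the portfolio risk is thus a least-squares regression of $Y$ on $X_1, \ldots, X_{p-1}$, and the gross-exposure constraint becomes a similar $L_1$ constraint on the regression coefficients.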
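
A sketch verifying the regression link on simulated data, assuming the sample covariance replaces $\Sigma$ and no gross-exposure constraint is imposed: OLS of $R_p$ on $R_p - R_j$ with an intercept reproduces the global minimum-variance weights $\hat\Sigma^{-1}\mathbf{1}/\mathbf{1}^T\hat\Sigma^{-1}\mathbf{1}$. With the constraint, the same regression would be fitted under an $L_1$-type restriction instead. Function names are hypothetical:

```python
import numpy as np

def gmv_weights(returns):
    """Global minimum-variance weights Sigma^{-1} 1 / (1' Sigma^{-1} 1) from the sample covariance."""
    S = np.cov(returns, rowvar=False)
    x = np.linalg.solve(S, np.ones(S.shape[0]))
    return x / x.sum()

def gmv_by_regression(returns):
    """Regress Y = R_p on X_j = R_p - R_j (j < p) with an intercept; w_p = 1 - sum_j w_j."""
    R = np.asarray(returns, dtype=float)
    T = R.shape[0]
    Y = R[:, -1]
    X = Y[:, None] - R[:, :-1]
    Z = np.column_stack([np.ones(T), X])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    w = coef[1:]                                   # the intercept plays the role of b
    return np.append(w, 1.0 - w.sum())
```

The two functions agree because minimizing the residual sum of squares over the intercept and slopes is the same as minimizing the sample variance of $w^TR$ subject to $w^T\mathbf{1} = 1$.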

4. Chapter 8 Consumption-based CAPM

4.1. CCAPM

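A representative consumer consumes a representative good, with quantity $C_t$ and price $p_t$ at time $t$. The consumer's income consists of external income $I_t$ and holdings $\alpha_t$ of stocks with price vector $S_t$. The budget constraint is

$$p_tC_t = I_t + (\alpha_{t-1} - \alpha_t)^TS_t$$

The intertemporal choice problem: let $\delta$ be the subjective discount factor. The individual chooses $\alpha_t$ to maximize the discounted expected utility subject to the budget constraint:

$$\max E_t\Big(\sum_{j=0}^\infty\delta^jU(C_{t+j})\Big)$$

The first-order condition at time $t$ is called the Euler condition. With $M_{t+1}$ the stochastic discount factor,

$$S_t = E_t[M_{t+1}S_{t+1}], \qquad M_{t+1} = \frac{\delta U'(C_{t+1})p_t}{U'(C_t)p_{t+1}}$$

The same discount factor prices every asset $i$:

$$S_{i,t} = E_t[M_{t+1}S_{i,t+1}], \qquad 1 = E_t[M_{t+1}(1 + R_{i,t+1})]$$

The price therefore depends on the inflation rate $p_{t+1}/p_t$ and on the intertemporal marginal rate of substitution $\delta U'(C_{t+1})/U'(C_t)$.

If there exists a risk-free asset with return $r_f$, then

$$E_tM_{t+1} = (1 + r_{f,t+1})^{-1}$$

Notice that

$$\begin{aligned}
S_t = E_t[M_{t+1}S_{t+1}] &= (E_tM_{t+1})(E_tS_{t+1}) + \operatorname{Cov}_t(M_{t+1}, S_{t+1}) \\
&= (1 + r_{f,t+1})^{-1}E_tS_{t+1} + \operatorname{Cov}_t(M_{t+1}, S_{t+1})
\end{aligned}$$

so the price is the expected payoff discounted at the risk-free rate plus a risk adjustment given by the covariance with the discount factor.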

4.2. Power utility and normal distribution assumption

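Assume the utility function has the form

$$U(C) = \frac{C^{1-\gamma} - 1}{1 - \gamma}$$

where $\gamma$ is the coefficient of relative risk aversion. Then

$$S_t = E_t\Big[\delta\frac{p_t}{p_{t+1}}\Big(\frac{C_{t+1}}{C_t}\Big)^{-\gamma}S_{t+1}\Big]$$

Let $R_{t+1}$ denote the vector of inflation-adjusted log-returns:

$$R_{t+1} = \log\Big(\frac{S_{t+1}}{S_t}\Big/\frac{p_{t+1}}{p_t}\Big)$$

and assume that $Y_{t+1} = R_{t+1} - \gamma\log(C_{t+1}/C_t)$ is conditionally normally distributed. Then, with $r_{f,t+1}$ the inflation-adjusted log risk-free rate, the expected excess log return is

$$\begin{aligned}
E_t(R_{i,t+1} - r_{f,t+1}) &= -\frac{1}{2}\operatorname{var}_t(R_{i,t+1} - \gamma\Delta C_{t+1}) + \frac{\gamma^2}{2}\operatorname{var}_t(\Delta C_{t+1}) \\
&= -\frac{1}{2}\operatorname{var}_t(R_{i,t+1}) + \gamma\operatorname{cov}_t(R_{i,t+1}, \Delta C_{t+1})
\end{aligned}$$

where $\Delta C_{t+1} = \log(C_{t+1}/C_t)$.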
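
A sketch that checks the formula for one asset, assuming the log return $r$ and consumption growth $\Delta c$ are jointly normal with the illustrative parameter values below, and working directly with inflation-adjusted quantities so that $M = \delta e^{-\gamma\Delta c}$. The log risk-free rate follows from $E[M]e^{r_f} = 1$; choosing $E(r)$ by the formula should then make the Euler equation $E[Me^r] = 1$ hold, which the Monte Carlo average confirms up to sampling error. Names are hypothetical:

```python
import numpy as np

def expected_excess_log_return(var_r, cov_rc, gamma):
    """E(r) - r_f = -var(r)/2 + gamma * cov(r, dc) under joint normality."""
    return -0.5 * var_r + gamma * cov_rc

def euler_check(delta, gamma, mu_c, var_c, var_r, cov_rc, n=400_000, seed=0):
    """Monte Carlo estimate of E[M exp(r)] with M = delta * exp(-gamma * dc); should be about 1."""
    # Log risk-free rate implied by E[M] * exp(r_f) = 1 for lognormal M.
    r_f = -np.log(delta) + gamma * mu_c - 0.5 * gamma**2 * var_c
    mu_r = r_f + expected_excess_log_return(var_r, cov_rc, gamma)
    rng = np.random.default_rng(seed)
    cov = np.array([[var_r, cov_rc], [cov_rc, var_c]])
    r, dc = rng.multivariate_normal([mu_r, mu_c], cov, size=n).T
    M = delta * np.exp(-gamma * dc)
    return np.mean(M * np.exp(r))

value = euler_check(delta=0.99, gamma=5.0, mu_c=0.02, var_c=0.02**2,
                    var_r=0.15**2, cov_rc=0.3 * 0.02 * 0.15)
```

The $-\frac12\operatorname{var}$ term is a Jensen correction for working with log returns; the risk premium proper is $\gamma\operatorname{cov}(r, \Delta c)$.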

4.3. Mean-variance frontier

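The value of a portfolio is $W_t = \alpha_t^TS_t$; let $W_{t+1} = \alpha_t^TS_{t+1}$ be its value at time $t+1$. From the CCAPM, $S_t = E_t[M_{t+1}S_{t+1}]$, and hence

$$W_t = E_t[M_{t+1}W_{t+1}] = \operatorname{Cov}_t(M_{t+1}, W_{t+1}) + (E_tM_{t+1})(E_tW_{t+1})$$

Therefore, using $E_tM_{t+1} = (1 + r_{f,t+1})^{-1}$,

$$E_tW_{t+1} - (1 + r_{f,t+1})W_t = -\operatorname{Cov}_t(M_{t+1}, W_{t+1})(1 + r_{f,t+1})$$

By the Cauchy-Schwarz inequality,

$$\big(E_tW_{t+1} - (1 + r_{f,t+1})W_t\big)^2 \le \operatorname{var}_t(W_{t+1})\frac{\operatorname{var}_t(M_{t+1})}{(E_tM_{t+1})^2}$$

Efficient frontier: the excess gain (return) per unit of risk (the Sharpe ratio) is bounded in absolute value by $\sqrt{\operatorname{var}_t(M_{t+1})}/E_tM_{t+1}$. The bound is not always achievable, because $M_{t+1} = \delta U'(C_{t+1})p_t/(U'(C_t)p_{t+1})$ is not necessarily tradable. If we can construct a stochastic discount factor from a tradable portfolio, the bound can be attained.

Consider $M_{t+1}^* = \alpha_t^{*T}S_{t+1}$, where $\alpha_t^*$ minimizes

$$E_t(M_{t+1} - \alpha_t^TS_{t+1})^2$$

The first-order condition $E_t[(M_{t+1} - M_{t+1}^*)S_{t+1}] = 0$ yields the Euler condition

$$E_t[M_{t+1}^*S_{t+1}] = E_t[M_{t+1}S_{t+1}] = S_t$$

Therefore $E_t[M_{t+1}^*S_{t+1}] = S_t$ and, when the risk-free asset is among the traded assets, $E_tM_{t+1}^* = (1 + r_{f,t+1})^{-1}$, so we can derive the same upper bound for the Sharpe ratio using $M_{t+1}^*$:

$$\big(E_tW_{t+1} - (1 + r_{f,t+1})W_t\big)^2 \le \operatorname{var}_t(W_{t+1})\frac{\operatorname{var}_t(M_{t+1}^*)}{(E_tM_{t+1}^*)^2}$$

Since $M_{t+1}^*$ is itself a portfolio payoff and $\operatorname{Cov}_t(M_{t+1}^*, M_{t+1}^*) = \operatorname{var}_t(M_{t+1}^*)$, the Cauchy-Schwarz inequality holds with equality for $W_{t+1} = M_{t+1}^*$: this portfolio attains the maximum Sharpe ratio and is called the benchmark portfolio. The expected excess return of any portfolio can then be expressed as

$$E_tR_{t+1} - r_{f,t+1} = \beta_t\big[E_tR_{M,t+1} - r_{f,t+1}\big], \qquad \beta_t = \frac{\operatorname{Cov}_t(R_{t+1}, R_{M,t+1})}{\operatorname{var}_t(R_{M,t+1})}$$

where $R_{t+1} = W_{t+1}/W_t - 1$ is the portfolio return and $R_{M,t+1}$ is the return on the benchmark portfolio.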
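
A finite-state sketch of the benchmark portfolio, assuming four equally likely states and three traded assets (the first a risk-free bond), with a hypothetical non-traded discount factor `m_true` used only to generate prices. The projection $\alpha^* = (E[S_{t+1}S_{t+1}^T])^{-1}S_t$ solves the least-squares problem above; all values and names are illustrative:

```python
import numpy as np

def benchmark_sdf(probs, payoffs, prices):
    """M* = payoffs @ alpha*, with alpha* = E[S S']^{-1} S_t (projection of M on payoffs)."""
    P = np.asarray(payoffs, dtype=float)              # states x assets
    second_moment = P.T @ (probs[:, None] * P)        # E[S_{t+1} S_{t+1}']
    alpha_star = np.linalg.solve(second_moment, prices)
    return P @ alpha_star

probs = np.full(4, 0.25)                              # four equally likely states
payoffs = np.array([[1.0, 1.3, 0.9],
                    [1.0, 1.1, 1.2],
                    [1.0, 0.8, 1.0],
                    [1.0, 0.9, 0.7]])                 # column 0: risk-free bond
m_true = np.array([0.8, 0.9, 1.1, 1.1])               # hypothetical non-traded SDF
prices = payoffs.T @ (probs * m_true)                 # S_t = E[M S_{t+1}]
m_star = benchmark_sdf(probs, payoffs, prices)
```

`m_star` prices every traded asset exactly as `m_true` does, has the same mean (the bond price, 0.975 here), and has no larger variance, so its Sharpe bound is the tightest one available from traded assets.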

5. Chapter 9 Present-value Models

5.1. Fundamental price

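The return of a stock, in terms of its price and dividend, is

$$R_{t+1} = \frac{S_{t+1} + D_{t+1}}{S_t} - 1$$

Assume the expected return $E_t[R_{t+1}] = R$ is constant. Then

$$\begin{aligned}
S_t &= (1 + R)^{-1}E_t[S_{t+1} + D_{t+1}] \\
&= (1 + R)^{-1}E_tD_{t+1} + (1 + R)^{-2}E_t[D_{t+2} + S_{t+2}] \\
&= \sum_{i=1}^K(1 + R)^{-i}E_tD_{t+i} + (1 + R)^{-K}E_tS_{t+K}
\end{aligned}$$

Assuming the growth condition

$$\lim_{K\to\infty}(1 + R)^{-K}E_tS_{t+K} = 0$$

we have

$$S_t = \sum_{i=1}^\infty(1 + R)^{-i}E_tD_{t+i}$$

This is the discounted value of all dividends paid over the lifetime of the stock, and it is called the discounted-cash-flow or present-value model.

If we assume dividends grow at a rate $G$,

$$E_tD_{t+i} = (1 + G)E_tD_{t+i-1} = (1 + G)^iD_t$$

then the stock price is

$$S_t = \sum_{i=1}^\infty(1 + R)^{-i}(1 + G)^iD_t = \frac{(1 + G)D_t}{R - G}, \quad \text{if } R > G$$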
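
A plain-Python sketch comparing a long truncated sum with the closed form, assuming illustrative values $D_t = 2$, $R = 8\%$ and $G = 3\%$; both give approximately $41.2 = 1.03 \times 2/0.05$. Function names are hypothetical:

```python
def present_value(D_t, R, G, horizon):
    """Truncated sum of (1 + R)^(-i) * (1 + G)^i * D_t for i = 1, ..., horizon."""
    ratio = (1.0 + G) / (1.0 + R)
    return sum(D_t * ratio**i for i in range(1, horizon + 1))

def gordon_price(D_t, R, G):
    """Closed form (1 + G) D_t / (R - G); the sum diverges unless R > G."""
    if R <= G:
        raise ValueError("present value is infinite unless R > G")
    return (1.0 + G) * D_t / (R - G)

approx = present_value(2.0, 0.08, 0.03, horizon=2000)
exact = gordon_price(2.0, 0.08, 0.03)
```

Because the price is $(1 + G)D_t/(R - G)$, it is very sensitive to the spread $R - G$ when the two rates are close.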

5.2. Rational bubbles

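Recall the price equation

$$S_t = (1 + R)^{-1}E_t(S_{t+1} + D_{t+1})$$

which has the unique solution

$$S_t = S_{D,t} \quad \text{if} \quad \lim_{K\to\infty}(1 + R)^{-K}E_tS_{t+K} = 0$$

where $S_{D,t}$ is the present value above. Without the growth condition, the general solution is $S_t = S_{D,t} + B_t$, where

$$S_{D,t} + B_t = (1 + R)^{-1}E_t(S_{D,t+1} + B_{t+1} + D_{t+1}), \quad \text{or equivalently} \quad B_t = (1 + R)^{-1}E_tB_{t+1}$$

Here $S_{D,t}$ is called the fundamental value and $B_t$ a rational bubble. The word "bubble" recalls some famous episodes in history in which asset prices rose higher than could easily be explained by fundamentals (investors betting that other investors would drive prices even higher in the future). Notice that $E_tB_{t+n} = (1 + R)^nB_t$, so a nonzero bubble grows in expectation at rate $R$ and violates the growth condition; the growth condition therefore rules out bubbles.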

5.3. Time-varying expected returns

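How should an asset be priced if expected returns are not constant? Let $s_t = \log S_t$, $d_t = \log D_t$, and let $\theta$ be the average log dividend-price ratio $d_t - s_t$. For the log-return $r_{t+1} = \log\big((S_{t+1} + D_{t+1})/S_t\big)$, a first-order Taylor expansion of $\log(1 + \exp(d_{t+1} - s_{t+1}))$ around $\theta$ gives the approximate present-value model

$$s_t = \kappa + \rho s_{t+1} + (1 - \rho)d_{t+1} - r_{t+1}$$

where $\rho = (1 + \exp(\theta))^{-1}$ and $\kappa = -\log\rho - (1 - \rho)\log(1/\rho - 1)$. Under the condition

$$\lim_{j\to\infty}\rho^js_{t+j} = 0$$

solving forward and taking expectations gives the approximate pricing formula

$$s_t = \frac{\kappa}{1 - \rho} + (1 - \rho)P_{dt} - P_{rt}, \qquad P_{dt} = \sum_{j=0}^\infty\rho^jE_td_{t+1+j}, \qquad P_{rt} = \sum_{j=0}^\infty\rho^jE_tr_{t+1+j}$$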
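
A plain-Python sketch of the linearization, assuming an illustrative average dividend yield of 4% (so $\theta = \log 0.04$ and $\rho = 1/1.04 \approx 0.96$). The approximation is exact when $d_{t+1} - s_{t+1} = \theta$ and has a second-order error otherwise; names are hypothetical:

```python
import math

def campbell_shiller_constants(theta):
    """rho = 1 / (1 + e^theta) and kappa = -log(rho) - (1 - rho) log(1/rho - 1)."""
    rho = 1.0 / (1.0 + math.exp(theta))
    kappa = -math.log(rho) - (1.0 - rho) * math.log(1.0 / rho - 1.0)
    return rho, kappa

def exact_log_return(s_t, s_next, d_next):
    """r_{t+1} = log(S_{t+1} + D_{t+1}) - log(S_t), from log price and log dividend."""
    return math.log(math.exp(s_next) + math.exp(d_next)) - s_t

def approx_log_return(s_t, s_next, d_next, theta):
    """Linearized r_{t+1} = kappa + rho s_{t+1} + (1 - rho) d_{t+1} - s_t."""
    rho, kappa = campbell_shiller_constants(theta)
    return kappa + rho * s_next + (1.0 - rho) * d_next - s_t
```

A deviation of $0.1$ in the log dividend-price ratio from $\theta$ produces an error of roughly $\frac12\rho(1 - \rho)(0.1)^2 \approx 0.0002$ in the log return.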
