# 13.1 ARCH and GARCH Models

## 13.1.1 ARCH(1): Definition and Properties

The ARCH model of order 1, ARCH(1), is defined as follows:

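Definition 13.1 (ARCH(1))
The process $\varepsilon_t$, $t \in \mathbb{Z}$, is ARCH(1), if $\operatorname{E}[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0$, where $\mathcal{F}_{t-1}$ denotes the information available up to time $t-1$,

$$\sigma_t^2 = \omega + \alpha\, \varepsilon_{t-1}^2 \tag{13.2}$$

with $\omega > 0$, $\alpha \ge 0$ and

• $\operatorname{Var}(\varepsilon_t \mid \mathcal{F}_{t-1}) = \sigma_t^2$ and $Z_t = \varepsilon_t / \sigma_t$ is i.i.d.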
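A minimal simulation sketch may help to fix ideas. It assumes standard normal innovations $Z_t$ and a start value $\varepsilon_0 = 0$, neither of which is required by the definition; the function name `simulate_arch1` and the burn-in length are hypothetical choices for illustration.

```python
import numpy as np

def simulate_arch1(n, omega, alpha, burn_in=500, seed=0):
    """Simulate a strong ARCH(1) path with N(0, 1) innovations Z_t (an assumption)."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    z = rng.standard_normal(total)
    eps = np.zeros(total)
    sigma2 = np.full(total, float(omega))   # sigma_1^2 = omega corresponds to eps_0 = 0
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, total):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2   # conditional variance recursion
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    # Drop the burn-in so that the arbitrary start value has little influence.
    return eps[burn_in:], sigma2[burn_in:]
```

A call such as `eps, sigma2 = simulate_arch1(1000, omega=0.5, alpha=0.5)` returns the simulated series together with its conditional variances.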

## 13.1.2 Estimation of ARCH(1) Models

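Theorem 12.5 says that an ARCH(1) process can be represented as an AR(1) process in $\varepsilon_t^2$. A simple Yule-Walker estimator uses this property:

$$\hat\alpha^{(0)} = \frac{\sum_{t=2}^n (\varepsilon_t^2 - \hat\sigma^2)(\varepsilon_{t-1}^2 - \hat\sigma^2)}{\sum_{t=2}^n (\varepsilon_{t-1}^2 - \hat\sigma^2)^2}$$

with $\hat\sigma^2 = n^{-1}\sum_{t=1}^n \varepsilon_t^2$. Since the distribution of $\varepsilon_t^2$ is naturally not normal, the Yule-Walker estimator is inefficient. However, it can be used as an initial value for iterative estimation methods.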
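The following sketch computes such start values as the lag-1 autoregression coefficient of the squared observations. The clipping of $\hat\alpha^{(0)}$ to $[0, 0.99]$ and the start value for $\omega$, derived from the unconditional variance $\omega/(1-\alpha)$, are assumptions added here for illustration; the function name is hypothetical.

```python
import numpy as np

def yule_walker_arch1(eps):
    """Yule-Walker start values for ARCH(1) from the AR(1) form of eps_t^2."""
    x = np.asarray(eps, dtype=float) ** 2
    m = x.mean()                              # sample mean of the squared observations
    d = x - m
    alpha0 = np.sum(d[1:] * d[:-1]) / np.sum(d[:-1] ** 2)   # lag-1 AR coefficient
    alpha0 = min(max(alpha0, 0.0), 0.99)      # clipping is an assumed safeguard
    omega0 = (1.0 - alpha0) * m               # from Var(eps_t) = omega / (1 - alpha)
    return omega0, alpha0
```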

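The estimation of ARCH models is normally done using the maximum likelihood (ML) method. Assuming that the returns $\varepsilon_t$ have a conditionally normal distribution, we have:

$$p(\varepsilon_t \mid \mathcal{F}_{t-1}) = \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left\{-\frac{1}{2}\frac{\varepsilon_t^2}{\sigma_t^2}\right\} \tag{13.7}$$

The log-likelihood function $l(\omega,\alpha)$ can be written as a function of the parameters $\omega$ and $\alpha$:

$$\begin{aligned} l(\omega,\alpha) &= \sum_{t=2}^n l_t(\omega,\alpha) + \log p_\varepsilon(\varepsilon_1) \\ &= -\frac{n-1}{2}\log(2\pi) - \frac{1}{2}\sum_{t=2}^n \log(\omega + \alpha\varepsilon_{t-1}^2) - \frac{1}{2}\sum_{t=2}^n \frac{\varepsilon_t^2}{\omega + \alpha\varepsilon_{t-1}^2} + \log p_\varepsilon(\varepsilon_1), \end{aligned} \tag{13.8}$$

where $l_t(\omega,\alpha) = \log p(\varepsilon_t \mid \mathcal{F}_{t-1})$ and $p_\varepsilon$ is the stationary marginal density of $\varepsilon_t$. A problem is that the analytical expression for $p_\varepsilon$ is unknown in ARCH models, so (13.8) cannot be calculated. In the conditional likelihood function $l^b = \log p(\varepsilon_n, \ldots, \varepsilon_2 \mid \varepsilon_1)$ the expression $\log p_\varepsilon(\varepsilon_1)$ disappears:

$$l^b(\omega,\alpha) = \sum_{t=2}^n l_t(\omega,\alpha) = -\frac{n-1}{2}\log(2\pi) - \frac{1}{2}\sum_{t=2}^n \log(\omega + \alpha\varepsilon_{t-1}^2) - \frac{1}{2}\sum_{t=2}^n \frac{\varepsilon_t^2}{\omega + \alpha\varepsilon_{t-1}^2} \tag{13.9}$$

For large $n$ the difference $l - l^b$ is negligible.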

Figure 12.5 shows the conditional likelihood of a generated ARCH(1) process of length $n$. The parameter $\omega$ is chosen so that the unconditional variance is the same for every value of $\alpha$, i.e., for a variance of $\sigma^2$, $\omega = (1-\alpha)\sigma^2$. For an ARCH(1) model the maximum of the likelihood can be found by inspecting the graph. Most often we would like to know the precision of the estimator as well. Essentially it is determined by the second derivative of the likelihood at the optimum, through the asymptotic properties of the ML estimator (see Section 13.1.6). For models of larger order one has to use numerical methods, such as the score algorithm introduced in Section 11.8, to estimate the parameters. In this case the first and second partial derivatives of the likelihood must be calculated.

Fig.: Conditional likelihood function of a generated ARCH(1) process as a function of $\alpha$. SFElikarch1.xpl
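The conditional likelihood (13.9) and the one-dimensional search over $\alpha$ can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the figure: the variance $\sigma^2$ is replaced by the sample mean of $\varepsilon_t^2$, and both function names are hypothetical.

```python
import numpy as np

def arch1_cond_loglik(eps, omega, alpha):
    """Conditional Gaussian log-likelihood of ARCH(1), conditioning on eps_1."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = omega + alpha * eps[:-1] ** 2          # sigma_t^2 for t = 2, ..., n
    return -0.5 * np.sum(np.log(2.0 * np.pi) + np.log(sigma2) + eps[1:] ** 2 / sigma2)

def profile_over_alpha(eps, alphas):
    """Evaluate the likelihood on a grid of alpha with omega = (1 - alpha) * variance."""
    eps = np.asarray(eps, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    variance = np.mean(eps ** 2)                    # sample stand-in for sigma^2 (assumption)
    values = np.array([arch1_cond_loglik(eps, (1.0 - a) * variance, a) for a in alphas])
    return values, alphas[int(np.argmax(values))]
```

Plotting `values` against `alphas` gives a curve of the kind described above; the grid should stay below one so that $\omega$ remains positive.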

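With the ARCH(1) model these are, for the summands $l_t$ of the conditional likelihood,

$$\frac{\partial l_t}{\partial \omega} = \frac{1}{2\sigma_t^2}\left(\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) \tag{13.10}$$

$$\frac{\partial l_t}{\partial \alpha} = \frac{1}{2\sigma_t^2}\,\varepsilon_{t-1}^2\left(\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) \tag{13.11}$$

$$\frac{\partial^2 l_t}{\partial \omega^2} = -\frac{1}{2\sigma_t^4}\left(2\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) \tag{13.12}$$

$$\frac{\partial^2 l_t}{\partial \alpha^2} = -\frac{1}{2\sigma_t^4}\,\varepsilon_{t-1}^4\left(2\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) \tag{13.13}$$

$$\frac{\partial^2 l_t}{\partial \omega\,\partial \alpha} = -\frac{1}{2\sigma_t^4}\,\varepsilon_{t-1}^2\left(2\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) \tag{13.14}$$

The first order conditions are $\sum_{t=2}^n \partial l_t / \partial\omega = 0$ and $\sum_{t=2}^n \partial l_t / \partial\alpha = 0$. For the score algorithm the expected value of the second derivative has to be calculated. It is assumed that $\operatorname{E}[Z_t^2] = \operatorname{E}[(\varepsilon_t/\sigma_t)^2] = 1$, so that the expression in the parentheses, $(2Z_t^2 - 1)$, has an expected value of one. From this it follows that

$$\operatorname{E}\left[\frac{\partial^2 l_t}{\partial \omega^2}\right] = -\frac{1}{2}\operatorname{E}\left[\frac{1}{\sigma_t^4}\right].$$

The expectation of $\sigma_t^{-4}$ is consistently estimated by $(n-1)^{-1}\sum_{t=2}^n (\omega + \alpha\varepsilon_{t-1}^2)^{-2}$, so that for the estimator of the expected value of the second derivative we have:

$$\hat{\operatorname{E}}\left[\frac{\partial^2 l_t}{\partial \omega^2}\right] = -\frac{1}{2(n-1)}\sum_{t=2}^n \frac{1}{\sigma_t^4}.$$

Similarly the expected value of the second derivative with respect to $\alpha$ follows with

$$\operatorname{E}\left[\frac{\partial^2 l_t}{\partial \alpha^2}\right] = -\frac{1}{2}\operatorname{E}\left[\frac{\varepsilon_{t-1}^4}{\sigma_t^4}\right]$$

and the estimator is

$$\hat{\operatorname{E}}\left[\frac{\partial^2 l_t}{\partial \alpha^2}\right] = -\frac{1}{2(n-1)}\sum_{t=2}^n \frac{\varepsilon_{t-1}^4}{\sigma_t^4}.$$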

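Theorem 13.6
Given $Z_t \sim N(0,1)$, it holds that

$$\operatorname{E}\left[\left(\frac{\partial l_t}{\partial \omega}\right)^2\right] = -\operatorname{E}\left[\frac{\partial^2 l_t}{\partial \omega^2}\right].$$

Proof:
This follows immediately from

$$\operatorname{E}\left[\left(\frac{\partial l_t}{\partial \omega}\right)^2\right] = \operatorname{E}\left[\frac{1}{4\sigma_t^4}\left(Z_t^4 - 2Z_t^2 + 1\right)\right] = \operatorname{E}\left[\frac{1}{4\sigma_t^4}\right](3 - 2 + 1) = \frac{1}{2}\operatorname{E}\left[\frac{1}{\sigma_t^4}\right].$$

Obviously Theorem 13.6 also holds for the parameter $\alpha$ in place of $\omega$. In addition it essentially holds for more general models, for example the estimation of GARCH models in Section 13.1.6. In more complicated models one can replace the second derivative with the square of the first derivative, which is easier to calculate. It is assumed, however, that the likelihood function is correctly specified, i.e., the true distribution of the error terms is normal.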
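The score algorithm for the ARCH(1) conditional likelihood can be sketched as below. Each step solves a linear system built from the gradient and from minus the estimated expected second derivatives, using $\operatorname{E}[2Z_t^2 - 1] = 1$. The projection onto $\omega > 0$, $\alpha \ge 0$ is an assumed safeguard, and the function name is hypothetical.

```python
import numpy as np

def arch1_scoring(eps, omega, alpha, n_iter=100, tol=1e-8):
    """Score (Fisher scoring) iterations for the ARCH(1) conditional likelihood."""
    eps = np.asarray(eps, dtype=float)
    x, y = eps[:-1] ** 2, eps[1:] ** 2              # eps_{t-1}^2 and eps_t^2
    grad_s = np.vstack([np.ones_like(x), x])        # d sigma_t^2 / d (omega, alpha)
    theta = np.array([omega, alpha], dtype=float)
    for _ in range(n_iter):
        sigma2 = theta[0] + theta[1] * x
        score = (grad_s * (0.5 / sigma2 * (y / sigma2 - 1.0))).sum(axis=1)
        info = (grad_s * (0.5 / sigma2 ** 2)) @ grad_s.T   # minus the estimated expected Hessian
        step = np.linalg.solve(info, score)
        # Keep omega > 0 and alpha >= 0 (assumed safeguard, not part of the algorithm).
        new_theta = np.maximum(theta + step, [1e-8, 0.0])
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta
```

Start values can come from a simple moment estimator; the iteration then typically settles after a few steps.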

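Under the two conditions

1. $\operatorname{E}[Z_t \mid \mathcal{F}_{t-1}] = 0$ and $\operatorname{E}[Z_t^2 \mid \mathcal{F}_{t-1}] = 1$ and
2. $\operatorname{E}[\log(\alpha Z_t^2) \mid \mathcal{F}_{t-1}] < 0$ (strict stationarity)

and under certain technical conditions, the ML estimators are consistent. If $\operatorname{E}[Z_t^4 \mid \mathcal{F}_{t-1}] < \infty$ and $\omega > 0$, $\alpha > 0$ hold in addition, then $\hat\theta = (\hat\omega, \hat\alpha)^\top$ is asymptotically normally distributed:

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{\mathcal{L}} N(0, J^{-1} I J^{-1}) \tag{13.15}$$

with

$$I = \operatorname{E}\left(\frac{\partial l_t(\theta)}{\partial \theta}\frac{\partial l_t(\theta)}{\partial \theta^\top}\right)$$

and

$$J = -\operatorname{E}\left(\frac{\partial^2 l_t(\theta)}{\partial \theta\,\partial \theta^\top}\right).$$

If the true distribution of $Z_t$ is normal, then $I = J$ and the asymptotic covariance matrix is simplified to $J^{-1}$, i.e., the inverse of the Fisher Information matrix. If the true distribution is instead leptokurtic, then the maximum of (13.9) is still consistent, but no longer efficient. In this case the ML method is interpreted as the 'Quasi Maximum Likelihood' (QML) method.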

In a Monte Carlo simulation study in Shephard (1996), 1000 ARCH(1) processes with fixed parameters $\omega$ and $\alpha$ were generated for each sample size and the parameters were estimated using QML. The results are given in Table 12.2. With moderate sample sizes the bias is negligible. The variance, however, is still so large that a relatively large proportion of the estimates (10% at $n = 1000$) is larger than one, which would imply covariance nonstationarity. This, in turn, has a considerable influence on the volatility prediction.

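Table 12.2: Monte Carlo simulation results for QML estimates of the parameter $\alpha$ from an ARCH(1) model with $k = 1000$ replications. The last column gives the proportion of the estimates that are larger than 1 (according to Shephard (1996)).

| $n$ | $k^{-1}\sum_{j=1}^k \hat\alpha_j$ | $\sqrt{k^{-1}\sum_{j=1}^k (\hat\alpha_j - \alpha)^2}$ | #($\hat\alpha_j \ge 1$) |
|---|---|---|---|
| 100 | 0.852 | 0.257 | 27% |
| 250 | 0.884 | 0.164 | 24% |
| 500 | 0.893 | 0.107 | 15% |
| 1000 | 0.898 | 0.081 | 10% |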

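## 13.1.3 ARCH($q$): Definition and Properties

The definition of an ARCH(1) model is now extended to the case of $q > 1$ lags on which the conditional variance depends.

Definition 13.2 (ARCH($q$))
The process $\varepsilon_t$, $t \in \mathbb{Z}$, is ARCH($q$), when $\operatorname{E}[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0$,

$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \ldots + \alpha_q \varepsilon_{t-q}^2 \tag{13.16}$$

with $\omega > 0$, $\alpha_1 \ge 0, \ldots, \alpha_q \ge 0$ and

• $\operatorname{Var}(\varepsilon_t \mid \mathcal{F}_{t-1}) = \sigma_t^2$ and $Z_t = \varepsilon_t / \sigma_t$ is i.i.d.

The conditional variance $\sigma_t^2$ in an ARCH($q$) model is also a linear function of the $q$ squared lags.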

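Theorem 13.7
Let $\varepsilon_t$ be an ARCH($q$) process with $\operatorname{Var}(\varepsilon_t) = \sigma^2 < \infty$. Then

$$\sigma^2 = \frac{\omega}{1 - \alpha_1 - \ldots - \alpha_q}$$

with $\alpha_1 + \ldots + \alpha_q < 1$.

Proof:
as in Theorem 12.2.

If instead $\alpha_1 + \ldots + \alpha_q \ge 1$, then the unconditional variance does not exist and the process is not covariance-stationary.

Theorem 13.8 (Representation of an ARCH($q$) Process)
Let $\varepsilon_t$ be an ARCH($q$) process with $\operatorname{E}[\varepsilon_t^4] = c < \infty$. Then

1. $\eta_t = \sigma_t^2 (Z_t^2 - 1)$ is white noise.
2. $\varepsilon_t^2$ is an AR($q$) process with $\varepsilon_t^2 = \omega + \sum_{i=1}^q \alpha_i \varepsilon_{t-i}^2 + \eta_t$.

Proof:
as in Theorem 12.5.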

A problem with the ARCH($q$) model is that some applications require a large order $q$, since distant lags lose their influence on the volatility only slowly. As an empirical rule of thumb, a fairly high minimum order $q$ is suggested. The disadvantage of a large order is that many parameters have to be estimated under restrictions. The restrictions fall into two groups: conditions for stationarity and the positivity of the parameters. If efficient estimation methods are to be used, for example the maximum likelihood method, estimation in a high dimensional parameter space can be numerically quite complicated.

One possibility of reducing the number of parameters while including a long history is to assume linearly decreasing weights on the lags, i.e.,

$$\sigma_t^2 = \omega + \alpha \sum_{i=1}^q w_i\, \varepsilon_{t-i}^2$$

with

$$w_i = \frac{2(q + 1 - i)}{q(q + 1)},$$

so that only two parameters have to be estimated. In Section 13.1.5 we describe a generalized ARCH model, which on the one hand has a parsimonious parameterization, and on the other hand a flexible lag structure.
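As a small illustration of the two-parameter specification, the sketch below assumes the weights $w_i = 2(q+1-i)/(q(q+1))$, one common linearly declining choice whose weights sum to one. The function names are hypothetical.

```python
import numpy as np

def linear_arch_weights(q):
    """Linearly declining lag weights w_1 > ... > w_q that sum to one (assumed form)."""
    i = np.arange(1, q + 1)
    return 2.0 * (q + 1 - i) / (q * (q + 1))

def linear_arch_variance(eps, omega, alpha, q):
    """Conditional variances omega + alpha * sum_i w_i eps_{t-i}^2 for t = q+1, ..., n."""
    eps2 = np.asarray(eps, dtype=float) ** 2
    w = linear_arch_weights(q)
    # eps2[t - q:t][::-1] is (eps_{t-1}^2, ..., eps_{t-q}^2), matching w_1, ..., w_q.
    return np.array([omega + alpha * np.dot(w, eps2[t - q:t][::-1])
                     for t in range(q, len(eps2))])
```

For $q = 4$ the weights are 0.4, 0.3, 0.2 and 0.1, so the most recent squared shock receives four times the weight of the oldest one.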

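## 13.1.4 Estimation of an ARCH($q$) Model

For the general ARCH($q$) model from (13.16) the conditional likelihood is

$$l^b(\theta) = \sum_{t=q+1}^n l_t(\theta) = -\frac{n-q}{2}\log(2\pi) - \frac{1}{2}\sum_{t=q+1}^n \log \sigma_t^2 - \frac{1}{2}\sum_{t=q+1}^n \frac{\varepsilon_t^2}{\sigma_t^2} \tag{13.17}$$

with the parameter vector $\theta = (\omega, \alpha_1, \ldots, \alpha_q)^\top$. Although for ARCH(1) models one can find the optimum by inspecting a graph such as Figure 12.5, this is complicated and impractical for a high dimensional parameter space. The maximization of (13.17) with respect to $\theta$ is a non-linear optimization problem, which can be solved numerically. The score algorithm is used in practice not only in ARMA models (see Section 11.8) but also in ARCH models. In order to implement this approach the first and second derivatives of the (conditional) likelihood with respect to the parameters need to be formed. For the ARCH($q$) model the first derivative is

$$\frac{\partial l_t}{\partial \theta} = \frac{1}{2\sigma_t^2}\frac{\partial \sigma_t^2}{\partial \theta}\left(\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) \tag{13.18}$$

with

$$\frac{\partial \sigma_t^2}{\partial \theta} = (1, \varepsilon_{t-1}^2, \ldots, \varepsilon_{t-q}^2)^\top.$$

The first order condition is $\sum_{t=q+1}^n \partial l_t / \partial \theta = 0$. For the second derivative and the asymptotic properties of the QML estimator see Section 13.1.6.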
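The gradient of the ARCH($q$) conditional likelihood can be sketched as follows, assuming Gaussian innovations and the parameter ordering $\theta = (\omega, \alpha_1, \ldots, \alpha_q)$. The helper names are hypothetical; the log-likelihood is included so that the analytic score can be compared with finite differences.

```python
import numpy as np

def arch_q_design(eps, q):
    """Rows (1, eps_{t-1}^2, ..., eps_{t-q}^2) for t = q+1, ..., n, plus eps_t^2."""
    eps2 = np.asarray(eps, dtype=float) ** 2
    n = len(eps2)
    cols = [np.ones(n - q)] + [eps2[q - i:n - i] for i in range(1, q + 1)]
    return np.column_stack(cols), eps2[q:]

def arch_q_loglik(eps, theta):
    """Conditional Gaussian log-likelihood, conditioning on the first q observations."""
    theta = np.asarray(theta, dtype=float)
    lags, y = arch_q_design(eps, len(theta) - 1)
    sigma2 = lags @ theta
    return -0.5 * np.sum(np.log(2.0 * np.pi) + np.log(sigma2) + y / sigma2)

def arch_q_score(eps, theta):
    """Sum over t of (1 / (2 sigma_t^2)) * d sigma_t^2/d theta * (eps_t^2 / sigma_t^2 - 1)."""
    theta = np.asarray(theta, dtype=float)
    lags, y = arch_q_design(eps, len(theta) - 1)    # each row is d sigma_t^2 / d theta
    sigma2 = lags @ theta
    return lags.T @ (0.5 / sigma2 * (y / sigma2 - 1.0))
```

Setting `arch_q_score` to zero is the first order condition; a numerical optimizer or the score algorithm can be built on top of these two functions.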

## 13.1.5 Generalized ARCH (GARCH)

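The ARCH($q$) model can be generalized by extending it with autoregressive terms of the volatility.

Definition 13.3 (GARCH($p,q$))
The process $\varepsilon_t$, $t \in \mathbb{Z}$, is GARCH($p,q$), if $\operatorname{E}[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0$,

$$\sigma_t^2 = \omega + \sum_{i=1}^q \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^p \beta_j \sigma_{t-j}^2 \tag{13.19}$$

and

• $\operatorname{Var}(\varepsilon_t \mid \mathcal{F}_{t-1}) = \sigma_t^2$ and $Z_t = \varepsilon_t / \sigma_t$ is i.i.d.

The sufficient but not necessary conditions for

$$\sigma_t^2 > 0 \ \text{a.s.} \quad (\operatorname{P}[\sigma_t^2 > 0] = 1) \tag{13.20}$$

are $\omega > 0$, $\alpha_i \ge 0$, $i = 1, \ldots, q$, and $\beta_j \ge 0$, $j = 1, \ldots, p$. In the case of the GARCH(1,2) model

$$\begin{aligned} \sigma_t^2 &= \omega + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \beta_1 \sigma_{t-1}^2 \\ &= \frac{\omega}{1 - \beta_1} + \alpha_1 \sum_{j=0}^\infty \beta_1^j \varepsilon_{t-j-1}^2 + \alpha_2 \sum_{j=0}^\infty \beta_1^j \varepsilon_{t-j-2}^2 \\ &= \frac{\omega}{1 - \beta_1} + \alpha_1 \varepsilon_{t-1}^2 + (\alpha_1 \beta_1 + \alpha_2) \sum_{j=0}^\infty \beta_1^j \varepsilon_{t-j-2}^2 \end{aligned}$$

with $0 \le \beta_1 < 1$, the conditions $\omega > 0$, $\alpha_1 \ge 0$ and $\alpha_1 \beta_1 + \alpha_2 \ge 0$ are necessary and sufficient for (13.20), assuming that the sum $\sum_{j=0}^\infty \beta_1^j \varepsilon_{t-j-2}^2$ converges.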
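The GARCH(1,2) case can be checked numerically. The sketch below assumes the ARCH($\infty$) form obtained by iterating the recursion, in which the coefficient on $\varepsilon_{t-1}^2$ is $\alpha_1$ and the coefficient on $\varepsilon_{t-k}^2$, $k \ge 2$, is $(\alpha_1\beta_1 + \alpha_2)\beta_1^{k-2}$. The function names are hypothetical.

```python
import numpy as np

def garch12_arch_inf_coeffs(alpha1, alpha2, beta1, n_terms=10):
    """First n_terms coefficients on eps_{t-1}^2, eps_{t-2}^2, ... in the ARCH(inf) form."""
    c = np.empty(n_terms)
    c[0] = alpha1
    c[1:] = (alpha1 * beta1 + alpha2) * beta1 ** np.arange(n_terms - 1)
    return c

def garch12_variance_positive(omega, alpha1, alpha2, beta1):
    """Positivity conditions for the GARCH(1,2) conditional variance."""
    return bool(omega > 0 and alpha1 >= 0 and 0 <= beta1 < 1
                and alpha1 * beta1 + alpha2 >= 0)
```

With $\alpha_1 = 0.2$, $\alpha_2 = -0.05$ and $\beta_1 = 0.5$ all ARCH($\infty$) coefficients are nonnegative although $\alpha_2 < 0$, which illustrates why nonnegativity of every parameter is sufficient but not necessary.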

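Theorem 13.9 (Representation of a GARCH($p,q$) process)
Let $\varepsilon_t$ be a GARCH($p,q$) process with $\operatorname{E}[\varepsilon_t^4] = c < \infty$. Then

1. $\eta_t = \sigma_t^2 (Z_t^2 - 1)$ is white noise.
2. $\varepsilon_t^2$ is an ARMA($m,p$) process with

$$\varepsilon_t^2 = \omega + \sum_{i=1}^m \gamma_i \varepsilon_{t-i}^2 - \sum_{j=1}^p \beta_j \eta_{t-j} + \eta_t \tag{13.21}$$

   with $m = \max(p,q)$, $\gamma_i = \alpha_i + \beta_i$, where $\alpha_i = 0$ when $i > q$, and $\beta_i = 0$ when $i > p$.

Proof:
as in Theorem 12.5.

If $\varepsilon_t$ follows a GARCH process, then from Theorem 13.9 we can see that $\varepsilon_t^2$ follows an ARMA model with conditionally heteroscedastic error terms $\eta_t$. As we know, if all the roots of the polynomial $(1 - \beta_1 z - \ldots - \beta_p z^p)$ lie outside the unit circle, then the ARMA process (13.21) is invertible and can be written as an AR($\infty$) process. Moreover it follows from Theorem 13.8 that the GARCH($p,q$) model can be represented as an ARCH($\infty$) model. Thus, in determining the order of the model, one can draw conclusions analogous to those for ARMA models. There are however essential differences in the definition of the persistence of shocks.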

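Theorem 13.10 (Unconditional variance of a GARCH($p,q$) process)
Let $\varepsilon_t$ be a GARCH($p,q$) process with $\operatorname{Var}(\varepsilon_t) = \sigma^2 < \infty$. Then

$$\sigma^2 = \frac{\omega}{1 - \sum_{i=1}^q \alpha_i - \sum_{j=1}^p \beta_j}$$

with $\sum_{i=1}^q \alpha_i + \sum_{j=1}^p \beta_j < 1$.

Proof:
as in Theorem 12.2.

General conditions for the existence of higher moments of the GARCH($p,q$) models are given in He and Teräsvirta (1999). For the smaller order models and under the assumption of normally distributed innovations we can derive: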

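Theorem 13.11 (Fourth moment of a GARCH(1,1) process)
Let $\varepsilon_t$ be a (semi-)strong GARCH(1,1) process with $\operatorname{Var}(\varepsilon_t) = \sigma^2 < \infty$ and $Z_t \sim N(0,1)$. Then $\operatorname{E}[\varepsilon_t^4] < \infty$ holds if and only if $3\alpha_1^2 + 2\alpha_1\beta_1 + \beta_1^2 < 1$. The kurtosis $\operatorname{Kurt}(\varepsilon_t)$ is given as

$$\operatorname{Kurt}(\varepsilon_t) = \frac{\operatorname{E}[\varepsilon_t^4]}{(\operatorname{E}[\varepsilon_t^2])^2} = 3 + \frac{6\alpha_1^2}{1 - \beta_1^2 - 2\alpha_1\beta_1 - 3\alpha_1^2} \tag{13.22}$$

Proof:
It can be proved that $\operatorname{E}[\varepsilon_t^4] = 3\operatorname{E}[(\omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2)^2]$; the claim then follows from the stationarity of $\varepsilon_t$.

The function (13.22) is illustrated in Figure 12.6. For all $\alpha_1 > 0$, $\operatorname{Kurt}(\varepsilon_t) > 3$, i.e., the distribution of $\varepsilon_t$ is leptokurtic. The kurtosis equals 3 only in the case of the boundary value $\alpha_1 = 0$, where the conditional heteroscedasticity disappears and the process is Gaussian white noise. In addition it can be seen in the figure that the kurtosis increases slowly in $\beta_1$ for a given $\alpha_1$. In contrast, it increases much faster in $\alpha_1$ for a given $\beta_1$.

Fig.: Kurtosis of a GARCH(1,1) process according to (13.22), plotted against the parameters $\alpha_1$ and $\beta_1$. SFEkurgarch.xpl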
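The behaviour of the kurtosis can be explored with a few lines of code. The sketch assumes the standard formula for a GARCH(1,1) process with normal innovations, $3 + 6\alpha_1^2 / (1 - \beta_1^2 - 2\alpha_1\beta_1 - 3\alpha_1^2)$, and returns infinity when the fourth moment does not exist. The function name is hypothetical.

```python
def garch11_kurtosis(alpha1, beta1):
    """Kurtosis of a Gaussian GARCH(1,1) process; inf if the fourth moment does not exist."""
    denom = 1.0 - beta1 ** 2 - 2.0 * alpha1 * beta1 - 3.0 * alpha1 ** 2
    if denom <= 0.0:
        return float("inf")
    return 3.0 + 6.0 * alpha1 ** 2 / denom
```

For example, `garch11_kurtosis(0.0, 0.5)` gives exactly 3, while any $\alpha_1 > 0$ inside the admissible region gives a value above 3.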

Remark 13.3
Nelson (1990) shows that the strong GARCH(1,1) process is strictly stationary when $\operatorname{E}[\log(\alpha_1 Z_t^2 + \beta_1)] < 0$. If $Z_t \sim N(0,1)$, then the conditions for strict stationarity are weaker than those for covariance-stationarity: $\alpha_1 + \beta_1 < 1$.
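The criterion of Remark 13.3 has no simple closed form, but it can be approximated by simulation. The sketch below assumes standard normal $Z_t$ and estimates $\operatorname{E}[\log(\alpha_1 Z_t^2 + \beta_1)]$ by a Monte Carlo average; the function name, the number of draws and the seed are hypothetical choices.

```python
import numpy as np

def nelson_criterion(alpha1, beta1, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[log(alpha1 * Z^2 + beta1)] for Z ~ N(0, 1)."""
    z = np.random.default_rng(seed).standard_normal(n_draws)
    return float(np.mean(np.log(alpha1 * z ** 2 + beta1)))
```

A negative value indicates strict stationarity. For $\alpha_1 = 0.3$, $\beta_1 = 0.7$ the estimate is negative although $\alpha_1 + \beta_1 = 1$, so the process is strictly stationary without being covariance-stationary.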

In practical applications it is frequently shown that models with smaller order sufficiently describe the data. In most cases GARCH(1,1) is sufficient.

A substantial disadvantage of the standard ARCH and GARCH models is that they cannot model asymmetries of the volatility with respect to the sign of past shocks. This results from the squared form of the lagged shocks in (13.16) and (13.19): a shock affects the volatility through its magnitude but not through its sign. In other words, bad news (identified by a negative sign) has the same influence on the volatility as good news (positive sign) if the absolute values are the same. Empirically it is observed that bad news has a larger effect on the volatility than good news. In Section 12.2 and 13.1 we will take a closer look at extensions of the standard models which can account for these observations.

## 13.1.6 Estimation of GARCH($p,q$) Models

Based on the ARMA representation of GARCH processes (see Theorem 13.9), Yule-Walker estimators $\tilde\theta$ are considered once again. These estimators are, as can be shown, consistent and asymptotically normally distributed, $\sqrt{n}\,(\tilde\theta - \theta) \xrightarrow{\mathcal{L}} N(0, \tilde\Sigma)$. However, in the case of GARCH models they are not efficient in the sense that the matrix $\tilde\Sigma - J^{-1} I J^{-1}$ is positive definite, where $J^{-1} I J^{-1}$ is the asymptotic covariance matrix of the QML estimator, see (13.25). In the literature there are several experiments on the efficiency of the Yule-Walker and QML estimators in finite samples, see Section 12.4. In most cases maximum likelihood methods are chosen in order to gain efficiency.

The likelihood function of the general GARCH($p,q$) model (13.19) is identical to (13.17) with the extended parameter vector $\theta = (\omega, \alpha_1, \ldots, \alpha_q, \beta_1, \ldots, \beta_p)^\top$. Figure 12.7 displays the likelihood function of a generated GARCH(1,1) process with parameters $\omega$, $\alpha$, $\beta$ and sample size $n$. The parameter $\omega$ was chosen so that the unconditional variance is the same for every pair $(\alpha, \beta)$, i.e., for a variance of $\sigma^2$, $\omega = (1 - \alpha - \beta)\sigma^2$. As one can see, the function is flat on the right, close to the optimum, thus the estimation will be relatively imprecise, i.e., it will have a larger variance. In addition, Figure 12.8 displays the contour plot of the likelihood function.

Fig.: Likelihood function of a generated GARCH(1,1) process as a function of the parameters $\alpha$ and $\beta$. SFElikgarch.xpl

Fig.: Contour plot of the likelihood function of a generated GARCH(1,1) process, with the parameters $\alpha$ and $\beta$ on the axes. SFElikgarch.xpl

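The first partial derivatives of (13.17) are

$$\frac{\partial l_t}{\partial \theta} = \frac{1}{2\sigma_t^2}\frac{\partial \sigma_t^2}{\partial \theta}\left(\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) \tag{13.23}$$

with

$$\frac{\partial \sigma_t^2}{\partial \theta} = \vartheta_t + \sum_{j=1}^p \beta_j \frac{\partial \sigma_{t-j}^2}{\partial \theta}$$

and $\vartheta_t = (1, \varepsilon_{t-1}^2, \ldots, \varepsilon_{t-q}^2, \sigma_{t-1}^2, \ldots, \sigma_{t-p}^2)^\top$. The first order conditions are $\sum_{t=q+1}^n \partial l_t / \partial \theta = 0$. The matrix of the second derivatives takes the following form:

$$\frac{\partial^2 l_t(\theta)}{\partial \theta\,\partial \theta^\top} = \frac{1}{2\sigma_t^4}\frac{\partial \sigma_t^2}{\partial \theta}\frac{\partial \sigma_t^2}{\partial \theta^\top} - \frac{1}{2\sigma_t^2}\frac{\partial^2 \sigma_t^2(\theta)}{\partial \theta\,\partial \theta^\top} - \frac{\varepsilon_t^2}{\sigma_t^6}\frac{\partial \sigma_t^2}{\partial \theta}\frac{\partial \sigma_t^2}{\partial \theta^\top} + \frac{\varepsilon_t^2}{2\sigma_t^4}\frac{\partial^2 \sigma_t^2(\theta)}{\partial \theta\,\partial \theta^\top} \tag{13.24}$$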
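For the GARCH(1,1) case the variance recursion and the recursion for its derivative can be run side by side. The sketch below assumes Gaussian innovations, the ordering $\theta = (\omega, \alpha, \beta)$, and a start value for the conditional variance equal to the sample mean of $\varepsilon_t^2$, treated as fixed so that its derivative is zero. These choices and the function name are hypothetical.

```python
import numpy as np

def garch11_loglik_and_score(eps, omega, alpha, beta):
    """Conditional Gaussian log-likelihood of GARCH(1,1) and its gradient in (omega, alpha, beta)."""
    eps = np.asarray(eps, dtype=float)
    sigma2_prev = np.mean(eps ** 2)      # assumed start value, held fixed in theta
    dsigma2_prev = np.zeros(3)           # so its derivative with respect to theta is zero
    loglik, score = 0.0, np.zeros(3)
    for t in range(1, len(eps)):
        vartheta = np.array([1.0, eps[t - 1] ** 2, sigma2_prev])
        sigma2 = omega + alpha * eps[t - 1] ** 2 + beta * sigma2_prev
        dsigma2 = vartheta + beta * dsigma2_prev            # derivative recursion
        loglik += -0.5 * (np.log(2.0 * np.pi) + np.log(sigma2) + eps[t] ** 2 / sigma2)
        score += 0.5 / sigma2 * dsigma2 * (eps[t] ** 2 / sigma2 - 1.0)
        sigma2_prev, dsigma2_prev = sigma2, dsigma2
    return loglik, score
```

The returned gradient can be passed to a numerical optimizer, or compared with finite differences of the log-likelihood as a check.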

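Under the conditions

1. $\operatorname{E}[Z_t \mid \mathcal{F}_{t-1}] = 0$ and $\operatorname{E}[Z_t^2 \mid \mathcal{F}_{t-1}] = 1$,
2. strict stationarity of $\varepsilon_t$

and under some technical conditions the ML estimator is consistent. If in addition it holds that $\operatorname{E}[Z_t^4 \mid \mathcal{F}_{t-1}] < \infty$, then $\hat\theta$ is asymptotically normally distributed:

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{\mathcal{L}} N(0, J^{-1} I J^{-1}) \tag{13.25}$$

with

$$I = \operatorname{E}\left(\frac{\partial l_t(\theta)}{\partial \theta}\frac{\partial l_t(\theta)}{\partial \theta^\top}\right)$$

and

$$J = -\operatorname{E}\left(\frac{\partial^2 l_t(\theta)}{\partial \theta\,\partial \theta^\top}\right).$$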

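Theorem 13.12 (Equivalence of $I$ and $J$)
If $Z_t \sim N(0,1)$, then it holds that $I = J$.

Proof:
Building the expectations of (13.24) one obtains

$$\operatorname{E}\left[\frac{\partial^2 l_t(\theta)}{\partial \theta\,\partial \theta^\top}\right] = -\operatorname{E}\left[\frac{1}{2\sigma_t^4}\frac{\partial \sigma_t^2}{\partial \theta}\frac{\partial \sigma_t^2}{\partial \theta^\top}\right].$$

For $I$ we have

$$\begin{aligned} \operatorname{E}\left[\frac{\partial l_t(\theta)}{\partial \theta}\frac{\partial l_t(\theta)}{\partial \theta^\top}\right] &= \operatorname{E}\left[\frac{1}{4\sigma_t^4}\frac{\partial \sigma_t^2}{\partial \theta}\frac{\partial \sigma_t^2}{\partial \theta^\top}\left(Z_t^4 - 2Z_t^2 + 1\right)\right] \\ &= \operatorname{E}\left[\frac{1}{4\sigma_t^4}\frac{\partial \sigma_t^2}{\partial \theta}\frac{\partial \sigma_t^2}{\partial \theta^\top}\right]\left\{\operatorname{Kurt}(Z_t \mid \mathcal{F}_{t-1}) - 1\right\}. \end{aligned} \tag{13.26}$$

From the assumption $Z_t \sim N(0,1)$ it follows that $\operatorname{Kurt}(Z_t \mid \mathcal{F}_{t-1}) = 3$ and thus the claim.

If the distribution of $Z_t$ is specified correctly, then $I = J$ and the asymptotic variance can be simplified to $J^{-1}$, i.e., the inverse of the Fisher Information matrix. If this is not the case and it is instead leptokurtic, for example, the maximum of (13.17) is still consistent but no longer efficient. In this case the ML method is interpreted as the 'Quasi Maximum Likelihood' (QML) method.

Consistent estimators for the matrices $I$ and $J$ can be obtained by replacing the expectation with the simple average.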
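For the ARCH(1) special case, where $\sigma_t^2$ is linear in $\theta = (\omega, \alpha)$ and its second derivative vanishes, the sample versions of $I$ and $J$ and the resulting sandwich covariance can be sketched as follows. The function name is hypothetical, and the parameters are meant to be evaluated at the QML estimate.

```python
import numpy as np

def arch1_qml_covariance(eps, omega, alpha):
    """Sample analogues of I and J and the sandwich covariance J^-1 I J^-1 / m for ARCH(1)."""
    eps = np.asarray(eps, dtype=float)
    x, y = eps[:-1] ** 2, eps[1:] ** 2
    m = len(x)
    sigma2 = omega + alpha * x
    g = np.column_stack([np.ones(m), x])                    # d sigma_t^2 / d theta
    scores = g * (0.5 / sigma2 * (y / sigma2 - 1.0))[:, None]
    I_hat = scores.T @ scores / m                           # average outer product of the scores
    # ARCH(1): sigma_t^2 is linear in theta, so only the first-derivative terms remain.
    neg_hess_w = 0.5 / sigma2 ** 2 * (2.0 * y / sigma2 - 1.0)
    J_hat = (g * neg_hess_w[:, None]).T @ g / m             # average of minus the Hessian
    J_inv = np.linalg.inv(J_hat)
    return J_inv @ I_hat @ J_inv / m, I_hat, J_hat
```

The square roots of the diagonal of the returned covariance are approximate standard errors that remain valid when the innovations are not normal; under normality `I_hat` and `J_hat` should be close to each other in large samples.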