A Competitive Model to Forecast a Stock Market Index

This study proposes a competitive model that applies the Box–Jenkins approach to build an ARIMA-GARCH model in order to improve financial forecasting. Unlike previous studies, we optimize the lagged terms, which helps capture the relationships in the data more accurately. The competitive model is then used to forecast the stock market index in Taiwan. This study conducts out-of-sample forecasting and compares the root mean square errors (RMSEs) against those of previous studies. The results show that the competitive model outperformed the earlier models in terms of both RMSE and consistency.

The application of Box-Jenkins ARIMA models consists of three main steps: determining whether the series is I(0) or I(1) (unit root test and differencing), Box-Jenkins optimization, and training linear and non-linear models. Many studies suggest that a hybrid model should be applied to handle both linear and non-linear relationships. This study instead uses GARCH(1,1) as a solution for non-linear and complex relationships. Moreover, heteroskedasticity is a specific problem in time series analysis (Kristjanpoller and Hernández, 2017; Gray, 1996; Andersen and Bollerslev, 1998; Hakim and McAleer, 2009) that causes biased parameters, which heavily affect the forecasting results. Hence, this study proposes the Box-Jenkins ARIMA-GARCH model.
For comparison purposes, this study chooses various fuzzy time series models, including first-order models (Chen, 1996), bivariate models (Yu and Huarng, 2008), multivariate models (Huarng et al., 2007), and hybrid models (Huarng and Yu, 2006; Yu and Huarng, 2010). In these studies, the fuzzy models do not handle the lagged terms. Therefore, the competitive model is expected to outperform these models.
The contributions of this study are as follows. First, the competitive model successfully builds a time series approach that handles both the linearity and non-linearity of the data by optimizing the lagged terms, and it presents good forecasting results. Second, the application of the Box-Jenkins approach can better capture the lagged term relationships and thus provide a better forecast. Third, optimizing the lagged terms allows the competitive model (a univariate model) to compete with a bivariate model.
This study proposes a competitive model that applies the Box-Jenkins approach to build an ARIMA-GARCH model to improve forecasting. Toward that end, the remainder of this paper consists of the following sections. Section 2 reviews the concepts of Box-Jenkins ARIMA, fuzzy time series, and neural network (NN) models. Section 3 describes the data and explains the competitive model. Section 4 uses an example to demonstrate the forecasting analysis. Section 5 compares the performance of the empirical models. Section 6 concludes the paper.

Literature Review
Box-Jenkins ARIMA
ARIMA(p,d,q) is a well-known linear approach that has been applied in many studies in the forecasting literature. Before using ARIMA(p,d,q), the stationarity of the data series (order d) and the orders (p,q) should be determined. The best-suited ARIMA model can be validated by the Akaike information criterion (AIC). For stock index forecasting, many researchers suggest that ARIMA be combined with a non-linear approach, because the asymmetric volatility of the stock index is typically the researchers' target of interest (Wang et al., 2012; Mostafa, 2010; Kang and Yoon, 2013). The Box-Jenkins approach can optimize the choices of p and q in the ARIMA equation, Equation (1).
In ARIMA, the autoregressive series $\sum_{p=1}^{n} A_p D(t-p, t-p-1)$ captures the linear trend of the data, and the moving average series $\sum_{q=1}^{n} W_q e_{t-q}$ captures the linear terms in the error. Various hybrid models have been employed to handle both the linear and non-linear characteristics of the stock index problem (Blazsek and Mendoza, 2016; Kazem et al., 2013). To build a hybrid model, this study combines ARIMA with GARCH(1,1) to better predict a stock market index. The GARCH model is designed to deal with sensitivity, non-stationarity, and asymmetric volatility in a series (Engle, 2002), and GARCH(1,1) is one of the most popular specifications (Sbrana and Silvestrini, 2013; Kömm and Küsters, 2015). The squared error of ARIMA, $u_{t-1}^2$, can be used exogenously, and $\sigma_{t-1}^2$ can also be adjusted exogenously in the GARCH function, as in Equation (2). Hence, GARCH(1,1) can account for heteroskedasticity in the volatility of a stock market index.
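For reference, the textbook forms of an ARMA model on the differenced series and of the GARCH(1,1) variance equation can be sketched as follows. This is standard notation offered as an illustration, not necessarily the exact form of Equations (1) and (2): the constant $c$ and the coefficients $\omega$, $\alpha$, and $\beta$ are assumed symbols that do not appear in the text, while $A_p$, $W_q$, $D$, $e_{t-q}$, $u_t$, and $\sigma_t^2$ follow the definitions used in this paper.

```latex
\begin{align}
D(t, t-1) &= c + \sum_{p=1}^{n} A_p \, D(t-p,\, t-p-1)
             + \sum_{q=1}^{n} W_q \, e_{t-q} + u_t, \\
\sigma_t^2 &= \omega + \alpha \, u_{t-1}^2 + \beta \, \sigma_{t-1}^2 .
\end{align}
```

In the usual GARCH(1,1) setting, $\omega > 0$, $\alpha \ge 0$, and $\beta \ge 0$ keep the conditional variance positive, and $\alpha + \beta < 1$ keeps it from growing without bound.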

Fuzzy Set Time Series
Fuzzy time series models have been utilized for decades by many researchers. Song and Chissom (1993) proposed the foundation for fuzzy time series models, which consists of four steps: (1) defining the universe of discourse and intervals; (2) fuzzifying; (3) establishing fuzzy relationships; and (4) forecasting. We use Chen's (1996) model (referred to as Model 1) as an example of a first-order model and conduct similar forecasts. The heuristic model integrates heuristics to improve a fuzzy time series model (Huarng, 2001) and has also been extended to a multivariate model (Huarng et al., 2007) (referred to as Model 2).

NN Models
The neural network (NN) is a non-linear technique that is similar to the human brain architecture and is applied widely in forecasting. The first NN-fuzzy time series model for forecasting was suggested by Huarng and Yu (2006) (Model 3). The basic framework uses only the most significant degree of membership for each observation, for both in-sample and out-of-sample forecasts, and ignores the others, which may affect the outcome. The univariate NN-fuzzy time series model (Yu and Huarng, 2010) (Model 4), which uses all the degrees of membership to establish a fuzzy relationship, is a more innovative and complicated model. The bivariate NN-fuzzy time series model (Yu and Huarng, 2008) (Model 5) performs better than a univariate model by using D(TAIFEX) rather than D(TAIEX) as the input to generate a forecast series.

Data
This study employed daily closing data for the Taiwan stock market index, the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX). To facilitate comparisons, the sample period was set to be the same as in Yu and Huarng (2008), covering the years 2000 to 2004. A previous study stressed the importance of out-of-sample observations in forecasting (Martin and Witt, 1989). For each year, observations from January to October were used as in-sample data (training sample), and observations from November to December were used as out-of-sample data (testing sample). Hence, the ratio of in-sample to out-of-sample months was consistently 10/12 : 2/12 every year.
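The yearly split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `split_in_out_sample` and the (date, value) record layout are assumptions, and only the four closing values quoted elsewhere in this paper are used as sample records.

```python
from datetime import date


def split_in_out_sample(observations):
    """Split (date, value) pairs into January-October and November-December sets.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    in_sample = [(d, v) for d, v in observations if d.month <= 10]
    out_of_sample = [(d, v) for d, v in observations if d.month >= 11]
    return in_sample, out_of_sample


# Closing values quoted in the worked example for the year 2000
observations = [
    (date(2000, 1, 4), 8756.55),
    (date(2000, 1, 5), 8849.87),
    (date(2000, 11, 1), 5425.02),
    (date(2000, 11, 2), 5625.08),
]

train, test = split_in_out_sample(observations)
```

Splitting on the calendar month rather than on a fixed row count keeps the 10-month to 2-month ratio the same in every year, even when the number of trading days differs.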

The Competitive Model
We constructed the Box-Jenkins ARIMA-GARCH(1,1) model, referred to as Model 6, as follows.
Step 1. Unit root test. Checking for data stationarity is an important step. If the series is stationary, the TAIEX at time (t − 1) is used directly to forecast the TAIEX at time (t) with Box-Jenkins ARIMA(p;0;q)-GARCH(1,1). If the series is non-stationary, Box-Jenkins ARIMA(p;1;q)-GARCH(1,1) is applied indirectly: the first difference of the TAIEX at time (t − 1) is used to forecast the difference of the TAIEX at time (t). We used the augmented Dickey-Fuller (ADF) unit root test to detect data stationarity (Dickey and Fuller, 1979).
Step 2. Difference. Many previous researchers used the differences of a stock market index for prediction with the ARIMA family (Babu and Reddy, 2014). Hence, the order d is usually set to 1. Following the unit root results, if first differences are used as the input series, then the first difference series is calculated by Equation (3):
$$D(t, t-1) = \mathrm{Stockindex}_t - \mathrm{Stockindex}_{t-1} \qquad (3)$$
Step 3. Box-Jenkins optimization. The orders p and q were determined by the Box-Jenkins method (Zhang, 2003; Hosking, 1981). Each year's data were used to conduct the Box-Jenkins ARIMA(p;d;q) individually. The orders p and q can be observed from the trend and correlation analysis of the series. The order p can be picked from the ACF (autocorrelation function), and the order q can be picked from the PACF (partial autocorrelation function). The candidate orders p and q were substituted into ARIMA(p;d;q), and the final orders were chosen by selecting the smallest Schwarz information criterion (SIC) and Akaike information criterion (AIC).
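The lag screening and information-criterion comparison in Step 3 can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the 0.05 cutoff is taken from the worked example later in the paper, and the criteria use the standard unnormalized definitions, so their scale may differ from values reported by a given statistics package.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


def candidate_lags(acf, threshold=0.05):
    """Lags whose absolute autocorrelation exceeds the threshold (assumed cutoff)."""
    return [k for k, r in enumerate(acf, start=1) if abs(r) > threshold]


def aic_sic(log_likelihood, n_params, n_obs):
    """Standard AIC and SIC (BIC); smaller values indicate a preferred model."""
    aic = 2 * n_params - 2 * log_likelihood
    sic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, sic


# Toy series used only to exercise the functions; it is not TAIEX data
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_acf = sample_acf(toy_series, max_lag=3)
toy_lags = candidate_lags(toy_acf, threshold=0.2)
```

Each candidate combination of lags would then be fitted, and the combination with the smallest AIC and SIC kept.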
Step 4. Building the competitive model. Heteroskedasticity is a statistical problem that causes biased parameter estimates. Hence, the GARCH(1,1) equation is usually used to address heteroskedasticity and to enhance the robustness of the ARIMA family (Brooks, 2014). The hybrid model is a combination of the optimized ARIMA(p;d;q) model with GARCH(1,1), chosen for its capability in handling non-linear relationships. The hybrid simultaneous model, Equation (4), combines Equations (1) and (2). Here, FD(t, t − 1) is the forecasted value of the differenced stock market index at time (t); D(t − p, t − p − 1) is the autoregressive series at time (t − p); $e_{t-q}$ is the moving average series at time (t − q); and $e_{pt}$ and $e_{qt}$ are the error terms in the autoregressive series and moving average series, respectively. Thus, $u_t$ is the total error term, and $\sigma_t^2$ is the variance series of heteroskedasticity estimated from the error term of the autoregressive moving average series.
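To make the variance part of Step 4 concrete, the following Python sketch computes a GARCH(1,1) conditional variance series from a list of residuals. It assumes the standard recursion in which the variance at time t depends on the squared residual and the variance at time t − 1; the parameter names `omega`, `alpha`, and `beta`, the starting variance, and all numeric values are hypothetical and are not estimates from this study.

```python
def garch11_variance(residuals, omega, alpha, beta, initial_variance):
    """Conditional variances aligned with `residuals` under a GARCH(1,1) recursion.

    variance[0] is the assumed starting value; for t >= 1,
    variance[t] = omega + alpha * residuals[t - 1] ** 2 + beta * variance[t - 1].
    """
    variances = [initial_variance]
    for u in residuals[:-1]:
        variances.append(omega + alpha * u ** 2 + beta * variances[-1])
    return variances


# Hypothetical residuals and parameters, chosen only to exercise the recursion
toy_residuals = [1.0, 2.0, 0.0]
toy_variances = garch11_variance(
    toy_residuals, omega=0.1, alpha=0.2, beta=0.7, initial_variance=1.0
)
```

In practice the mean and variance parameters are estimated jointly, typically by maximum likelihood, rather than fixed by hand as in this sketch.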
Step 5. Forecasting. Similar to previous studies, this research converts the forecasted differences back into forecasted values of the stock market index. The final output of the Box-Jenkins ARIMA-GARCH(1,1) model is therefore still the forecasted index ($\mathrm{FStockindex}_t$), and the index at time (t − 1) is used as the input of Equation (5):
$$\mathrm{FStockindex}_t = FD(t, t-1) + \mathrm{Stockindex}_{t-1} \qquad (5)$$
Step 6. Performance evaluation. Following a previous study (Yu and Huarng, 2010), this research also uses root mean square errors (RMSEs) to compare the performance, as in Equation (6):
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=k+1}^{n} \left(\mathrm{FStockindex}_t - \mathrm{Stockindex}_t\right)^2}{n-k}} \qquad (6)$$
where there are n observations, including k in-sample and n − k out-of-sample observations.
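The arithmetic in Steps 2, 5, and 6 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the RMSE is taken over whatever paired out-of-sample values are passed in. The numeric inputs are the January and November 2000 figures quoted in the worked example.

```python
import math


def first_differences(index_values):
    """Day-to-day differences of the index (Step 2)."""
    return [b - a for a, b in zip(index_values, index_values[1:])]


def forecast_index(previous_index, forecast_difference):
    """Turn a forecasted difference back into a forecasted index level (Step 5)."""
    return previous_index + forecast_difference


def rmse(actual, forecast):
    """Root mean square error over paired out-of-sample observations (Step 6)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    squared_errors = [(f - a) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Values quoted in the worked example for the year 2000
january_diff = first_differences([8756.55, 8849.87])[0]
november_2_forecast = forecast_index(5425.02, 14.5274)
```

Because the model forecasts differences, each forecasted level depends on the actual index of the previous day, so the forecasts are one-step-ahead rather than accumulated over the whole test period.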

Forecasting Analysis
We took TAIEX data in the year 2000 as an example to demonstrate the empirical analysis.
Step 1. Unit root test. The null hypothesis of the ADF test is that the TAIEX is non-stationary. We performed this test and rejected the null hypothesis. The results are shown in Table 1. Here, ARIMA(p;d;q) was applied by using the differences of the TAIEX at time (t − 1) as the input to forecast the TAIEX at time (t). The integrated order d was fixed at 1. The ARIMA(p;1;q) model was thus executed.
Table 1 note: *** denotes significance at 99%, ** significance at 95%, and * significance at 90%. The null hypothesis of the augmented Dickey-Fuller (ADF) test is that the TAIEX has a unit root.
Step 2. Difference. The stock market index on January 5 was 8849.87, and it was 8756.55 on January 4. Hence, D(t, t − 1) = 8849.87 − 8756.55 = 93.32.
Step 3. Box-Jenkins optimization. Using the Box-Jenkins method, an order p is considered suitable when the absolute autocorrelation value in the ACF column is larger than 0.05; the observed values range from the largest, |−0.178| (lag p = 18), to the smallest, 0.084 (lag p = 28). We employed a similar approach for the order q (Table 2). We considered ARIMA(18;1;4,15) to be the best model, with SIC = 12.9664 and AIC = 12.9126.

Table 2. Available orders to be chosen as p and q lagged terms (see the full ...).
Model (0;1;4,15,18)   12.9686   12.9131   2
Table 2 note: *** denotes significance at 99%, ** significance at 95%, and * significance at 90%.
Step 4. Building the competitive model. By using Equation (4), we optimized the error terms $u_t$ of both the autoregressive series ($e_{pt}$) and the moving average series ($e_{qt}$) in GARCH(1,1). The new hybrid model, which can account for heteroskedasticity, is expected to generate a better output. We followed the Box-Jenkins ARIMA-GARCH(1,1) method to optimize the forecast model for each year. Table 3 lists all the results.

Table 3. Best model for each year (columns: Year, Best model).
Step 5. Forecasting. The closing index was 5425.02 on November 1 and 5625.08 on November 2. Employing the model in Equation (4), the forecast of the stock difference FD(t, t − 1) was 14.5274 for November 2. Hence, the output of the model ($\mathrm{FStockindex}_t$) was computed as 5425.02 + 14.5274 = 5439.55 for November 2. Table 4 presents the forecasts.
Table 4. Forecasts from the Box-Jenkins ARIMA-GARCH(1,1) model.
Step 6. Performance evaluation. For the year 2000, the RMSE of Model 6 was 122.68.

Empirical Analysis
We repeated the forecasting for all years and compared the performance of the six models in terms of RMSE, as in Table 5. In the table, the competitive model (Model 6) performed best among all the models, because it achieved smaller RMSEs in more years than the other models. It was also the most consistent, because in comparison with each individual model it had a greater total number of years with smaller RMSEs.
Table 5. RMSE comparison of the six models. Recoverable entry: (1) First-order model (Chen, 1996) 176
The competitive model could capture both the linearity and non-linearity of the data, as do the hybrid models in previous studies. In the model, the Box-Jenkins approach allows the researcher to optimize the lagged terms of both the autoregressive series and the moving average series by minimizing the white noise, which means that more of the relevant information is taken into consideration. Hence, the forecasting results are expected to improve.
Model 5 (bivariate NN-fuzzy approach), using all the degrees of membership, exhibited good results. The drawback of taking all the degrees of membership for training and forecasting is that there can be too many fuzzy sets or inputs for the NN.

Conclusion
This study proposes a Box-Jenkins ARIMA-GARCH model as a competitive model to improve forecasting, as it optimizes the lagged terms of both the autoregressive series and the moving average series by minimizing the white noise. Because the optimized lagged terms cover most of the relevant information, the model performed better than the models in many previous studies. Another advantage of using the optimized lagged terms is that they allow a univariate model to compete with a bivariate model.
The competitive model can easily be extended and can also be used to calculate fuzzy relationships. According to the empirical results in this study, the competitive model can handle both the linear and non-linear characteristics of the data. For future work, if other suitable inputs can be observed, then the competitive model can be easily expanded to bivariate models.