FITTING ARIMA MODEL in R
Auto Regressive Integrated Moving Average ( ARIMA) model is generalisation of Auto Regressive Moving Average Model (ARMA) and used to predict future points in Time series.
But what is time series , Acc to google "A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Examples of time series are ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average." and more in a general way it is a series of values of a quantity obtained at successive times, often with equal intervals between them.
ARIMA models are defined for stationary time series , stationary time series is one whose mean, variance, autocorrelation are all constant over time. But for checking that the time series is stationary or not , we have several statistical tests for them namely.
1) For the
Now we check whether the time series is stationary or not using our statistical test.
> kpss.test(kings)
KPSS Test for Level Stationarity
data: kings
KPSS Level = 0.6906, Truncation lag parameter = 1, p-value = 0.0144
> adf.test(skirts)
Augmented Dickey-Fuller Test
data: kings
Dickey-Fuller = -2.1132, Lag order = 3, p-value = 0.529
alternative hypothesis: stationary
As we have mentioned before for the kpss test, p-value should be greater than 0.05 for a time series to be stationary , but whereas in our case it is the opposite which suggests that it is not stationary.
Also with the adf.test that we have performed above ,the p-value is greater than 0.05 whereas it should be less than 0.05 for a stationary time series. Also we can look at the plot of the data and as according to stationary definition mean , variance etc should be constant over time which is not.
Thus here we concluded that the time series is not stationary and we cannot use ARIMA on this , but don't worry its not the end.
Now we will difference the time series until we obtain a stationary time series.But what does differencing mean?. The first difference of a time series is the series of changes from one period to the next. If Yt denotes the value of the time series Y at period t, then the first difference of Y at period t is equal to Yt-Yt-1.Now we will difference the time series.If you have to difference the time series d times to obtain a stationary series, then you have an ARIMA(p,d,q) model, where d is the order of differencing used.
We can use diff() in R to difference the time series
> king_diff=diff(kings, difference=1)
> plot.ts(king_diff)
Now we will check whether this differencing has worked for us to make it stationary or not
> kpss.test(king_diff)
KPSS Test for Level Stationarity
data: king_diff
KPSS Level = 0.0347, Truncation lag parameter = 1, p-value = 0.1
> adf.test(king_diff)
Augmented Dickey-Fuller Test
data: king_diff
Dickey-Fuller = -4.0754, Lag order = 3, p-value = 0.01654
alternative hypothesis: stationary
The above tests gives us the positive results. the p-value of kpss.test is greater than 0.05 and for adf.test it is less than 0.05 , hence suggesting us the time series is now stationary.Since we have differenced the series by 1, an ARIMA(p,1,q) model is probably appropriate for the time series of the age of death of the kings of England.
Now we have made our time series stationary, its time to select appropriate ARIMA model , which means finding the values of most appropriate values of p and q for an ARIMA(p,d,q) model. To do this, you usually need to examine the correlogram and partial correlogram of the stationary time series. First we will see the acf plot.
>acf(king_diff)
> acf(king_diff, plot=F)
0 1 2 3 4 5 6 7 8 9 10 11 12
1.000 -0.360 -0.162 -0.050 0.227 -0.042 -0.181 0.095 0.064 -0.116 -0.071 0.206 -0.017
13 14 15 16
-0.212 0.130 0.114 -0.009
We see from the correlogram that the autocorrelation at lag 1 (-0.360) exceeds the significance bounds, but all other autocorrelations between lags 1-15 do not exceed the significance bounds
Now we see the pacf plot
> pacf(king_diff)
> pacf(king_diff, plot=F)
1 2 3 4 5 6 7 8 9 10 11 12 13
-0.360 -0.335 -0.321 0.005 0.025 -0.144 -0.022 -0.007 -0.143 -0.167 0.065 0.034 -0.161
14 15 16
0.036 0.066 0.081
The partial correlogram shows that the partial autocorrelations at lags 1, 2 and 3 exceed the significance bounds, are negative, and are slowly decreasing in magnitude with increasing lag (lag 1: -0.360, lag 2: -0.335, lag 3:-0.321). The partial autocorrelations tail off to zero after lag 3.
Since the Corrlelogram is 0 after lag 1 and the partial correlelogram tails of to 0 after lag 3.
Thus we can have ARIMA(3,1,1) model
FORECASTING ARIMA MODEL
Now that we have our ARIMA(3,1,1) model we can forecast the future values with the help of "forecast" package.
> king_arima=arima(kings, order=c(3,1,1))
> king_forecast=forecast.Arima(king_arima, h=3)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
43 64.79235 45.69668 83.88803 35.58804 93.99667
44 67.48443 46.98649 87.98238 36.13553 98.83334
45 68.43173 47.31180 89.55165 36.13159 100.73186
The original time series for the English kings includes the ages at death of 42 English kings. The forecast.Arima() function gives us a forecast of the age of death of the next 3 English kings (kings 43-45), as well as 80% and 95% prediction intervals for those predictions. The age of death of the 42nd English king was 56 years (the last observed value in our time series), and the ARIMA model gives the forecasted age at death of the next five kings
as 64.8 years.We can also plot the ages, as can be seen below
> plot.forecast(king_forecast)
Auto Regressive Integrated Moving Average ( ARIMA) model is generalisation of Auto Regressive Moving Average Model (ARMA) and used to predict future points in Time series.
But what is time series , Acc to google "A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Examples of time series are ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average." and more in a general way it is a series of values of a quantity obtained at successive times, often with equal intervals between them.
ARIMA models are defined for stationary time series , stationary time series is one whose mean, variance, autocorrelation are all constant over time. But for checking that the time series is stationary or not , we have several statistical tests for them namely.
1) For the
Box.test
, if p-value < 0.05 => stationary
2) For the
adf.test
, if p-value < 0.05 => stationary
3) For the
kpss.test
, if p-value > 0.05 => stationary
We will use time series data on the age of death of successive kings of England, starting with William the Conqueror.We will load the data in R and see the plot
> kings <- scan("http://robjhyndman.com/tsdldata/misc/kings.dat",skip=3)
> plot.ts(kings)
> plot.ts(kings)
> kpss.test(kings)
KPSS Test for Level Stationarity
data: kings
KPSS Level = 0.6906, Truncation lag parameter = 1, p-value = 0.0144
> adf.test(skirts)
Augmented Dickey-Fuller Test
data: kings
Dickey-Fuller = -2.1132, Lag order = 3, p-value = 0.529
alternative hypothesis: stationary
As we have mentioned before for the kpss test, p-value should be greater than 0.05 for a time series to be stationary , but whereas in our case it is the opposite which suggests that it is not stationary.
Also with the adf.test that we have performed above ,the p-value is greater than 0.05 whereas it should be less than 0.05 for a stationary time series. Also we can look at the plot of the data and as according to stationary definition mean , variance etc should be constant over time which is not.
Thus here we concluded that the time series is not stationary and we cannot use ARIMA on this , but don't worry its not the end.
Now we will difference the time series until we obtain a stationary time series.But what does differencing mean?. The first difference of a time series is the series of changes from one period to the next. If Yt denotes the value of the time series Y at period t, then the first difference of Y at period t is equal to Yt-Yt-1.Now we will difference the time series.If you have to difference the time series d times to obtain a stationary series, then you have an ARIMA(p,d,q) model, where d is the order of differencing used.
We can use diff() in R to difference the time series
> king_diff=diff(kings, difference=1)
> plot.ts(king_diff)
Now we will check whether this differencing has worked for us to make it stationary or not
> kpss.test(king_diff)
KPSS Test for Level Stationarity
data: king_diff
KPSS Level = 0.0347, Truncation lag parameter = 1, p-value = 0.1
> adf.test(king_diff)
Augmented Dickey-Fuller Test
data: king_diff
Dickey-Fuller = -4.0754, Lag order = 3, p-value = 0.01654
alternative hypothesis: stationary
The above tests gives us the positive results. the p-value of kpss.test is greater than 0.05 and for adf.test it is less than 0.05 , hence suggesting us the time series is now stationary.Since we have differenced the series by 1, an ARIMA(p,1,q) model is probably appropriate for the time series of the age of death of the kings of England.
Now we have made our time series stationary, its time to select appropriate ARIMA model , which means finding the values of most appropriate values of p and q for an ARIMA(p,d,q) model. To do this, you usually need to examine the correlogram and partial correlogram of the stationary time series. First we will see the acf plot.
>acf(king_diff)
> acf(king_diff, plot=F)
0 1 2 3 4 5 6 7 8 9 10 11 12
1.000 -0.360 -0.162 -0.050 0.227 -0.042 -0.181 0.095 0.064 -0.116 -0.071 0.206 -0.017
13 14 15 16
-0.212 0.130 0.114 -0.009
We see from the correlogram that the autocorrelation at lag 1 (-0.360) exceeds the significance bounds, but all other autocorrelations between lags 1-15 do not exceed the significance bounds
Now we see the pacf plot
> pacf(king_diff)
> pacf(king_diff, plot=F)
1 2 3 4 5 6 7 8 9 10 11 12 13
-0.360 -0.335 -0.321 0.005 0.025 -0.144 -0.022 -0.007 -0.143 -0.167 0.065 0.034 -0.161
14 15 16
0.036 0.066 0.081
The partial correlogram shows that the partial autocorrelations at lags 1, 2 and 3 exceed the significance bounds, are negative, and are slowly decreasing in magnitude with increasing lag (lag 1: -0.360, lag 2: -0.335, lag 3:-0.321). The partial autocorrelations tail off to zero after lag 3.
Since the Corrlelogram is 0 after lag 1 and the partial correlelogram tails of to 0 after lag 3.
Thus we can have ARIMA(3,1,1) model
FORECASTING ARIMA MODEL
Now that we have our ARIMA(3,1,1) model we can forecast the future values with the help of "forecast" package.
> king_arima=arima(kings, order=c(3,1,1))
> king_forecast=forecast.Arima(king_arima, h=3)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
43 64.79235 45.69668 83.88803 35.58804 93.99667
44 67.48443 46.98649 87.98238 36.13553 98.83334
45 68.43173 47.31180 89.55165 36.13159 100.73186
The original time series for the English kings includes the ages at death of 42 English kings. The forecast.Arima() function gives us a forecast of the age of death of the next 3 English kings (kings 43-45), as well as 80% and 95% prediction intervals for those predictions. The age of death of the 42nd English king was 56 years (the last observed value in our time series), and the ARIMA model gives the forecasted age at death of the next five kings
as 64.8 years.We can also plot the ages, as can be seen below
> plot.forecast(king_forecast)
Very useful. Thanks.
ReplyDeleteGlad i could help.
DeleteThe t-values in your model seem to be insignificant. This is a problem.
ReplyDeleteMay I suggest you consider determinstic variables instead of relying upon just differencing or AR parameters?
Detecting Level Shifts in Time Series
Nathan S. Balke
Journal of Business & Economic Statistics
Vol. 11, No. 1 (Jan., 1993), pp. 81-92
May I also suggest you consider searching and adjusting for outliers (#17,21,27)?
I think this is a better model. It is a constant with a level shift and 3 outliers. The last 4 are 0/1 deterministic dummy variables.
Y(T) = 51.967 kings
+[X1(T)][(- 35.9667)] :PULSE 21
+[X2(T)][(+ 20.0333)] :LEVEL SHIFT 34
+[X3(T)][(+ 34.0333)] :PULSE 27
+[X4(T)][(- 38.9667)] :PULSE 17
+ + [A(T)]
thanks for the update..will surely look into that
DeleteVery nicely explained. Very good. Keep it up.
ReplyDelete