EXPONENTIAL SMOOTHING ON FORECASTING DENGUE CASES IN COLOMBO, SRI LANKA

Predicting the number of people who will be infected is an essential component of studying any major disease. It is particularly important for dengue, which is the most critical mosquito-borne viral disease in the world. The number of reported dengue cases has gradually increased all over the world, including in Sri Lanka, where the majority of dengue cases are reported in the Colombo district. The authors applied the exponential smoothing technique to model and forecast dengue cases in Colombo, Sri Lanka. The data consist of monthly reported dengue cases in the Colombo district from January 2010 to May 2019. Data from January 2010 to February 2019 were used for model building and the rest of the data were used for model validation. Both the original cases and the log-transformed cases were considered for modelling, and Holt-Winters smoothing suited both. The best model in each case, and finally the most parsimonious of these two best models, was selected by considering the AIC, BIC, MAE, RMSE and MAPE measures. The most parsimonious model is the one fitted on the log-transformed dengue cases. Using this model, predictions were made for June to August 2019. It can be concluded that the best model fits the data adequately and that reported dengue cases will increase slowly during the prediction period.


INTRODUCTION
Dengue is one of the fastest spreading infectious diseases in the world. It is transmitted through the bites of infected mosquitoes, identified as Aedes aegypti. The number of people infected with dengue fever has increased drastically all over the world in recent years. The World Health Organization (WHO) estimates that 390 million people worldwide are infected with dengue annually [1]. The most affected areas are Southeast Asia, the Americas and the Western Pacific. Of the annual dengue infections, nearly 500,000 cases develop into the more severe form of the disease, known as dengue hemorrhagic fever, resulting in 25,000 deaths annually worldwide. Therefore, dengue control and management actions are necessary to reduce the burden of the disease.
Sri Lanka is an island located in the Indian Ocean. Dengue was found in Sri Lanka in 1960 [2], and the number of infected people has gradually increased over the years. Approximately 43 % of the dengue fever cases in 2017 were reported from the Western Province of Sri Lanka, and the most affected area, with the highest number of reported dengue cases, is usually the Colombo district [3]. The other highly affected areas are Galle, Jaffna, Kaluthara and Gampaha. A total of 51,659 dengue cases were reported in 2018, and 36,858 suspected dengue cases were reported to the Epidemiology Unit of Sri Lanka from all over the island in March 2019 [3].
Modelling and predicting the number of reported dengue cases is useful for understanding the dynamics of the disease and thus for controlling it. Exponential smoothing is one of the powerful forecasting techniques available in univariate time series analysis. The technique was introduced in the late 1950s in the work of Robert Goodell Brown [4] and was extended by Charles C. Holt [5]. As the name implies, the technique uses exponentially weighted observations to make predictions: more recent observations receive higher weights than older observations. Hence, exponential smoothing is useful for short-term forecasting. Several methods are available within the exponential smoothing family, such as simple exponential smoothing, double exponential smoothing, and the Holt-Winters additive and multiplicative methods. These methods are widely applied in almost all fields because of their simplicity and accuracy, and because they require minimal assumptions. In particular, many applications of exponential smoothing in epidemiology can be found in the literature [6,7,8]. However, there are few studies in the literature on the application of exponential smoothing to dengue fever, especially in the Sri Lankan context.
In this study, monthly dengue cases were predicted for the Colombo district using exponential smoothing. Both the original data set and the log-transformed data set were considered for modelling, with the purpose of finding the best model to predict dengue in Colombo. Multiple model selection criteria were considered in order to select the best prediction model. The availability of an effective prediction model will be helpful in anticipating dengue outbreaks and in taking timely action to control dengue incidence.

Secondary Data and Model Selection
Monthly reported dengue cases in Colombo district were acquired from the Epidemiology Unit of Ministry of Health, Sri Lanka from January 2010 to May 2019. Exponential smoothing models were fitted for both monthly reported dengue cases as well as for the log-transformed monthly reported dengue cases by considering data from January 2010 to February 2019. Data from March to May in 2019 were used for model validation. Forecasted values for three months from June to August in 2019 were generated from the best exponential smoothing model.
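The partition described above can be sketched in a few lines of Python. This is only an illustration of the date ranges stated in the text; the helper name `month_range` and the variable names are hypothetical, and no case counts are involved.

```python
from datetime import date

def month_range(start, end):
    """Return the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

# Full observation window stated in the text: January 2010 to May 2019.
months = month_range(date(2010, 1, 1), date(2019, 5, 1))

# Model building uses data up to February 2019; the remainder is for validation.
cutoff = date(2019, 2, 1)
train_months = [m for m in months if m <= cutoff]
valid_months = [m for m in months if m > cutoff]

# Out-of-sample forecast window: June to August 2019.
forecast_months = month_range(date(2019, 6, 1), date(2019, 8, 1))

print(len(months), len(train_months), len(valid_months), len(forecast_months))
```

With these ranges the series has 113 monthly observations, of which 110 are used for fitting and 3 for validation.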

Statistical Tests and Methods
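The following tests and methods were used in the study.

Augmented Dickey-Fuller (ADF) test: Used for testing the null hypothesis that a time series has a unit root. The test is based on the regression $\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t$, $t = 1, \dots, n$, where $\varepsilon_t$ is the white noise and n is the number of observations. The null hypothesis is that the series is non-stationary ($\gamma = 0$).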

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: Used for testing the null hypothesis that an observable time series is stationary around a deterministic trend against the alternative of a unit root. The KPSS test is based on a linear regression. It breaks up a series into three parts: a deterministic trend ($\beta t$), a random walk ($r_t$), and a stationary error ($\varepsilon_t$), with the regression equation $y_t = r_t + \beta t + \varepsilon_t$. If the data are stationary, the series will have a fixed element for an intercept, or it will be stationary around a fixed level.

Autocorrelation Function (ACF)
The coefficient of correlation between two values in a time series is called the autocorrelation function. For example, the ACF for a time series $y_t$ is given by $\rho_k = \mathrm{Corr}(y_t, y_{t-k})$, $k = 1, 2, \dots$ The value of k is the time gap being considered and is called the lag. A lag 1 autocorrelation ($k = 1$ in the above) is the correlation between values that are one time period apart. More generally, a lag k autocorrelation is the correlation between values that are k time periods apart.
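As a rough illustration of the definition above, the sample autocorrelation at lags 1 to `max_lag` can be computed as follows. This is a minimal sketch using the usual sample estimator (deviations from the overall mean, divided by the total sum of squares); the function name `sample_acf` is hypothetical and is not the routine used in the study.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    # Total sum of squared deviations (the lag-0 term).
    denom = sum((y - mean) ** 2 for y in series)
    acf = []
    for k in range(1, max_lag + 1):
        # Cross-products of deviations that are k periods apart.
        num = sum((series[t] - mean) * (series[t - k] - mean)
                  for t in range(k, n))
        acf.append(num / denom)
    return acf

# Toy series for illustration only (not dengue data).
print(sample_acf([1, 2, 3, 4, 5], max_lag=2))
```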

Partial Autocorrelation Function (PACF)
The partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. It contrasts with the ACF, which does not control for other lags.

Box-Pierce test
A statistical test of whether any of a group of autocorrelations of a time series are different from zero. Instead of testing randomness at each distinct lag, it tests the "overall" randomness based on a number of lags. H0: The data are independently distributed. H1: The data are not independently distributed; they exhibit serial correlation. The test statistic is $Q = n \sum_{k=1}^{h} \hat{\rho}_k^2$, where n is the sample size, $\hat{\rho}_k$ is the sample autocorrelation at lag k, and h is the number of lags being tested. Under H0, the statistic follows $\chi^2_h$. For significance level α, the critical region for rejection of the hypothesis of randomness is $Q > \chi^2_{1-\alpha,h}$, where h is the degrees of freedom.
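The test statistic can be sketched directly from its definition, n times the sum of squared sample autocorrelations over the first h lags. The function name `box_pierce` and the toy series are hypothetical; the critical values are the standard tabulated 95 % chi-square quantiles for 1 to 3 degrees of freedom, hard-coded here to keep the example free of external libraries.

```python
def box_pierce(series, h):
    """Box-Pierce statistic Q = n * sum_{k=1..h} r_k^2."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    q = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation at lag k.
        r_k = sum((series[t] - mean) * (series[t - k] - mean)
                  for t in range(k, n)) / denom
        q += r_k ** 2
    return n * q

# Tabulated 95 % quantiles of the chi-square distribution (df = 1, 2, 3).
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}

# Toy seasonal pattern repeated six times (illustration only, not dengue data).
toy = [1, 2, 3, 4] * 6
h = 2
q = box_pierce(toy, h)
reject_independence = q > CHI2_95[h]
print(round(q, 3), reject_independence)
```

For this strongly periodic toy series the statistic exceeds the critical value at h = 2, so the hypothesis of independence would be rejected at the 5 % level.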

Exponential Smoothing
In exponential smoothing, recent observations are weighted more heavily than older observations. Double exponential smoothing is applicable when the data exhibit a trend. In addition to the smoothing parameter used in simple exponential smoothing (SES), there is another parameter to capture the trend. The h-step-ahead forecast is generated by combining the level estimate at time t, $\ell_t$, and the trend estimate (which is assumed additive) at time t, $b_t$, as follows:
$$\hat{y}_{t+h|t} = \ell_t + h\,b_t,$$
where the level estimate and trend estimate are updated using updating equations with two smoothing parameters α and β:
$$\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}),$$
$$b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}.$$
The first equation states that the level at time t is a weighted average of the actual value at time t, $y_t$, and the level in the earlier period, adjusted for trend. The second equation states that the trend at time t is a weighted average of the trend in the earlier period and the more recent information on the change in level. The parameters α and β lie between 0 and 1.
To capture a multiplicative trend, the following changes must be made to the above equations:
$$\hat{y}_{t+h|t} = \ell_t\,b_t^{\,h},$$
$$\ell_t = \alpha y_t + (1-\alpha)\,\ell_{t-1}\,b_{t-1},$$
$$b_t = \beta\,\frac{\ell_t}{\ell_{t-1}} + (1-\beta)\,b_{t-1}.$$
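A minimal sketch of the additive-trend recursions may help make the updating scheme concrete. The function name `holt_linear` is hypothetical, the smoothing parameters are supplied by the caller rather than optimised, and the initial values (level set to the first observation, trend set to the first difference) are an assumption of this example, not something specified in the text.

```python
def holt_linear(series, alpha, beta, horizon):
    """Holt's additive-trend smoothing; returns forecasts for 1..horizon steps."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialisation: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Level: weighted average of the observation and the trend-adjusted level.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: weighted average of the latest change in level and the old trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast: last level plus h times the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]

# Toy linear series (illustration only): forecasts continue the straight line.
print(holt_linear([2, 4, 6, 8, 10], alpha=0.5, beta=0.5, horizon=3))
```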

Holt-Winters Seasonal Smoothing
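If a data set exhibits both trend and seasonality, this smoothing method is useful for making predictions. The method has two options, to capture either additive or multiplicative seasonality. For an additive Holt-Winters model with seasonal period m, the general forecast equation (for $h \le m$) is:
$$\hat{y}_{t+h|t} = \ell_t + h\,b_t + s_{t+h-m},$$
where $y_t$ is the observed value at time t. The three components, the level $\ell_t$, trend $b_t$ and season $s_t$, are updated using the following updating equations:
$$\ell_t = \alpha(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}),$$
$$b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1},$$
$$s_t = \gamma(y_t - \ell_t) + (1-\gamma)\,s_{t-m},$$
where α, β and γ are the three smoothing parameters that capture the level, trend and seasonality respectively. All three parameters lie between 0 and 1. For multiplicative seasonality the following equations can be used:
$$\hat{y}_{t+h|t} = (\ell_t + h\,b_t)\,s_{t+h-m},$$
$$\ell_t = \alpha\,\frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1}),$$
$$b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1},$$
$$s_t = \gamma\,\frac{y_t}{\ell_t} + (1-\gamma)\,s_{t-m}.$$
By considering the trend, seasonality and resulting error structure as either additive or multiplicative, various models can be constructed and validated.

Akaike Information Criterion (AIC)
AIC is a model selection criterion given by $\mathrm{AIC} = 2k - 2\ln(L)$, where k is the number of parameters in the model and L is the maximum value of the likelihood function for the model. The model with the minimum AIC value is the best model.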
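Returning to the additive Holt-Winters recursions of this section, the following sketch shows how the level, trend and seasonal components could be updated and combined into forecasts. The function name `holt_winters_additive` is hypothetical, the smoothing parameters are passed in rather than estimated, and the initialisation (mean of the first season as level, average change between the first two seasons as trend, first-season deviations as seasonal indices) is one common heuristic assumed for this example only.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing with seasonal period m."""
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons")
    first, second = series[:m], series[m:2 * m]
    # Assumed heuristic initial values.
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    seasonals = [y - level for y in first]
    for t in range(m, len(series)):
        y = series[t]
        s_prev = seasonals[t - m]          # seasonal index from one season ago
        prev_level = level
        # Level: deseasonalised observation vs. trend-adjusted previous level.
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + trend)
        # Trend: latest change in level vs. previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Season: detrended observation vs. index from one season ago.
        seasonals.append(gamma * (y - level) + (1 - gamma) * s_prev)
    n = len(series)
    # Forecast h steps ahead, reusing the most recent season of indices.
    return [level + h * trend + seasonals[n - m + (h - 1) % m]
            for h in range(1, horizon + 1)]

# Toy series with period 4 and no trend (illustration only, not dengue data).
toy = [10, 20, 30, 40] * 3
print(holt_winters_additive(toy, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```

For monthly data such as the dengue series, the seasonal period would be m = 12; a multiplicative variant would replace the subtractions of the seasonal index by divisions, as in the multiplicative equations above.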

Bayesian Information Criterion (BIC)
BIC is a model selection criterion used to select a model among a finite set of available models. The model with the minimum BIC is the best model. The formula for BIC is $\mathrm{BIC} = k\ln(n) - 2\ln(L)$, where L is the maximum value of the likelihood function for the model, n is the sample size and k is the number of parameters in the model.
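Both information criteria are simple functions of the maximised log-likelihood, as the short sketch below illustrates. The function names and the candidate values are hypothetical and purely illustrative; they are not the likelihoods or parameter counts obtained in this study.

```python
import math

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), with ln(L) supplied as log_likelihood."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """BIC = k ln(n) - 2 ln(L), with n the sample size."""
    return k * math.log(n) - 2 * log_likelihood

# Made-up candidates for illustration: name -> (log-likelihood, parameter count).
candidates = {"model_a": (-120.0, 5), "model_b": (-118.5, 9)}
n_obs = 110  # number of months in the model-building period stated in the text

scores = {name: (aic(ll, k), bic(ll, k, n_obs))
          for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores, best_by_aic, best_by_bic)
```

Because the BIC penalty $k\ln(n)$ exceeds the AIC penalty $2k$ whenever n is larger than about 7, BIC favours smaller models more strongly than AIC on a series of this length.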

Root Mean Square Error (RMSE)
RMSE is an accuracy measure that represents the difference between two data sets, the predicted values and the actual values; this difference is the error or residual. It is given by $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}$, where $e_t$ is the residual at time t and n is the total number of time periods.

Mean Absolute Error (MAE)
It is the average absolute difference between predicted and actual values: $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} |e_t|$, where $e_t$ is the residual at time t and n is the total number of time periods.

Mean Absolute Percentage Error (MAPE)
MAPE is a popular measure of prediction accuracy, which is given by the following formula: $\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left|\frac{e_t}{y_t}\right|$, where $e_t$ is the residual at time t, $y_t$ is the actual value at time t and n is the total number of time periods. A MAPE of less than 10 % is interpreted as excellent forecasting accuracy, 10-20 % as good forecasting, 20-50 % as acceptable forecasting and over 50 % as inaccurate forecasting [9].
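The three accuracy measures can be computed together as in the sketch below, which takes the residual as actual minus predicted (an assumption, since the sign convention does not affect any of the three measures). The function names and the toy numbers are hypothetical; `interpret_mape` encodes the interpretation bands quoted from [9].

```python
import math

def rmse(actual, predicted):
    """Root mean square error of the residuals (actual - predicted)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error; requires all actual values to be non-zero."""
    return 100.0 / len(actual) * sum(abs((a - p) / a)
                                     for a, p in zip(actual, predicted))

def interpret_mape(value):
    """Map a MAPE value (in percent) to the bands quoted from [9]."""
    if value < 10:
        return "excellent"
    if value <= 20:
        return "good"
    if value <= 50:
        return "acceptable"
    return "inaccurate"

# Toy values for illustration only (not dengue data).
actual = [100, 200, 400]
predicted = [110, 180, 400]
m = mape(actual, predicted)
print(rmse(actual, predicted), mae(actual, predicted), m, interpret_mape(m))
```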

RESULTS AND DISCUSSION
R software was mainly used for the data analysis [10]. Figure 1 displays the time series plot of monthly reported dengue cases, the associated autocorrelation function is shown in Figure 2 and the partial autocorrelation function in Figure 3. According to Figure 1, reported dengue cases in the Colombo district varied between a minimum of 97 and a maximum of 3620, apart from the highest numbers of dengue cases, which were reported in June and July of 2017. Both the ADF and KPSS tests confirm the non-stationarity of the original dengue series at the 5 % significance level, and the ACF and PACF plots show that there is seasonality in the series. The Box-Pierce test indicated that the observations are serially dependent. Therefore, neither simple nor double exponential smoothing is applicable for modelling the series, and the Holt-Winters smoothing technique was applied to predict future dengue cases. All possible combinations of multiplicative and additive structures for the error, trend and seasonality of the series were fitted. The model with the minimum AIC, BIC, MAE, MAPE and RMSE was selected as the best model to forecast future dengue cases in the Colombo district. It consists of multiplicative error, multiplicative seasonality and an additive structure for the trend. The residual analysis of this model of monthly reported dengue cases is given in Figure 4.
To reduce the variability present in the original series (Figure 1), log-transformed monthly reported dengue cases were also considered for modelling using exponential smoothing. The time series plot (Figure 5), ACF (Figure 6) and PACF (Figure 7) of the log-transformed monthly reported dengue cases show that the transformed series is also non-stationary. This is further confirmed by the ADF and KPSS tests at the 5 % level of significance.
The Box-Pierce test again indicated serial dependence, this time in the transformed series. Because the transformed series is seasonal and also non-stationary, Holt-Winters smoothing may be more appropriate for modelling it. By changing the additive or multiplicative structure of the trend, seasonality and error, all possible combinations were considered, and the model with the minimum AIC, BIC, MAE, RMSE and MAPE was selected as the best model to predict the monthly reported dengue cases in Colombo, Sri Lanka. The best model for the log-transformed series has additive error and additive seasonality with no trend. Although exponential smoothing makes no strong assumption of normality and independence of errors, the residual analysis of the best model for the transformed series is given in Figure 8. The residuals in Figure 8 are closer to normality than those in Figure 4. Among the best models selected for the monthly cases and the transformed monthly cases, the most parsimonious model to predict monthly reported dengue cases was selected by considering the accuracy measures summarized in Table 1. With both models, predicted values were generated for March to August 2019, and the data for March to May 2019 were used to validate (test) the models.
By comparing the summary measures in Table 1, it can be concluded that the minimum AIC, BIC, RMSE, MAPE and MAE values are given by the model fitted on the log-transformed monthly reported dengue series. The forecasts from this model are shown in Figure 9 with 80 % and 95 % confidence intervals. Several model selection criteria were considered in this study in order to select the best model to forecast dengue cases, rather than only one or two criteria. The forecasted values were generated by the best model for the period of March to August 2019. Since the predictions cover upcoming periods in 2019, they will be more useful for controlling the disease and managing dengue-related resources than models fitted in other studies, which use past data that do not cover the current year 2019. Considering the forecasted values of the model, it can be concluded that the monthly dengue cases to be reported in the upcoming months (June to August 2019) will increase slowly in the Colombo district.

CONCLUSION
This study successfully models the monthly reported dengue cases in Colombo, Sri Lanka using the exponential smoothing technique, with the aim of forecasting future dengue cases. In particular, within the exponential smoothing family, the Holt-Winters smoothing technique suits the available data well. Both the original data and the log-transformed data were considered for modelling, with the purpose of finding the best model to predict dengue in Colombo. The best model to predict monthly dengue cases in the Colombo district is the Holt-Winters exponential smoothing model fitted on the log-transformed series, which has additive error and additive seasonality with no trend. The forecasted values generated by the best model for March to August 2019 are 580, 536, 784, 1342, 1898 and 1192. The forecasted values may be useful in taking action to control dengue cases in Colombo, Sri Lanka.