Abstract
Dengue is a serious public health concern in Brazil and globally. In the absence of a universal vaccine or specific treatment, prevention relies on vector control and disease surveillance. Accurate and early forecasts can help reduce the spread of the disease. In this study, we develop a model to predict monthly dengue cases in Brazilian cities one month ahead over the period 2007 to 2019. We compare different machine learning algorithms and feature selection methods using epidemiological and meteorological variables. We find that different models work best in different cities, and that a random forest model trained on monthly dengue cases performs best overall. It produces lower errors than a seasonal naïve baseline model, gradient boosting regression, a feed-forward neural network, and support vector regression. For each city, we compute the mean absolute error between predicted and observed monthly dengue cases on the test set. For the median city, this error is 12.2 cases. It falls to 11.9 cases when the optimal combination of algorithm and input features is selected for each city individually. Machine learning, and especially decision tree ensemble models, may contribute to dengue surveillance in Brazil, as these models produce low out-of-sample prediction errors for a geographically diverse set of cities.
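To make the evaluation metric concrete, the following is a minimal Python sketch of a seasonal naïve baseline scored by per-city mean absolute error and summarized by the median across cities. It assumes a seasonal period of 12 months (each month is forecast with the value from the same month one year earlier) and a simple one-step-ahead loop over the last `n_test` months; the function names and the toy series are hypothetical illustrations, not the study's code or data.

```python
from statistics import median


def seasonal_naive_forecast(history, period=12):
    """Predict next month's cases as the value observed `period` months earlier.

    Assumption: a 12-month seasonal period for monthly data.
    """
    return history[-period]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_city(cases, n_test, period=12):
    """One-step-ahead MAE of the seasonal naive baseline on the last n_test months."""
    start = len(cases) - n_test
    if start < period:
        raise ValueError("need at least `period` months of history before the test set")
    # Each forecast uses only the months before the one being predicted.
    preds = [seasonal_naive_forecast(cases[:t], period) for t in range(start, len(cases))]
    return mean_absolute_error(cases[start:], preds)


def median_city_mae(cases_by_city, n_test, period=12):
    """Median of the per-city MAEs, i.e. the error for the 'median city'."""
    return median([evaluate_city(c, n_test, period) for c in cases_by_city.values()])


# Hypothetical toy series: 24 months per city, last 12 months used as the test set.
toy_cities = {
    "city_a": [10] * 12 + [12] * 12,
    "city_b": [5] * 24,
    "city_c": [0] * 12 + [9] * 12,
}
print(median_city_mae(toy_cities, n_test=12))
```

Using the median rather than the mean of per-city errors keeps the summary from being dominated by a few large cities with very high case counts; a trained model such as a random forest would replace `seasonal_naive_forecast` in the same evaluation loop.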