Aarhus University Seal

Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis

New publication by Song Quan Ong, Hamdan Ahmad

Abstract:

Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored text analytic approaches to elicit public comments from social media for public health. Therefore, this study aims to demonstrate a text analytics pipeline to identify the MBD topics that were discussed on Twitter and significantly influenced public opinion. A total of 25,000 tweets were retrieved from Twitter, topics were modelled using LDA and sentiment polarities were calculated using the VADER model. After data cleaning, we obtained a total of 6,243 tweets, which we were able to process with the feature selection algorithms. Boruta was used as a feature selection algorithm to determine the importance of topics to public opinion. The result was validated using multinomial logistic regression (MLR) performance and expert judgement. Important issues such as breeding sites, mosquito control, impact/funding, time of year, other diseases with similar symptoms, mosquito-human interaction and biomarkers for diagnosis were identified by both LDA and experts. The MLR result shows that the topics selected by LASSO perform significantly better than the other algorithms, and the experts further justify the topics in the discussion.

https://doi.org/10.7717/PEERJ.17045