Mpox Viral Sentiments: LLM and BERT-based Approaches to Sentiment Analysis

Project Overview: This project used social media and news data (Meta, X, Reddit, Google News) and natural language processing techniques to track changing public sentiment about Mpox, a viral disease that emerged as a global epidemic in 2022, with a North American focus, and again in summer 2024, within Africa. The team partnered with Dr. Bouchra Nasri from the University of Montreal, and the project won the Dean’s Prize for best data science capstone.

Approach: My contribution to the team’s work was to use unsupervised topic modeling to identify key themes in the Mpox outbreak and how they evolved from 2023 to 2025. The work primarily used the BERTopic package in Python, which represents each post as a context-based embedding from a pre-trained BERT-family transformer model and then clusters the embeddings, so that posts that are contextually similar to one another are grouped into themes. OpenAI’s ChatGPT was used to summarize the posts in each theme so that the themes were easier to interpret. Other members of the team explored using LLMs to predict sentiment, taking a supervised modeling approach based on hand-labeled X posts.
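To make the topic modeling step more concrete, here is a minimal sketch of how BERTopic can be fit on a set of posts and then asked how topics change over time. It is an illustration under stated assumptions, not the project's actual code: the helper names clean_post and fit_topics, the cleaning rules, and the values for min_topic_size and nr_bins are all hypothetical.

```python
import re


def clean_post(text: str) -> str:
    """Hypothetical light cleaning: drop URLs and @mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def fit_topics(docs, timestamps, min_topic_size=10, nr_bins=12):
    """Fit BERTopic on posts and track topic frequency over time.

    docs is a list of post texts and timestamps is a parallel list of
    datetimes. The parameter defaults are illustrative assumptions.
    """
    # Imported lazily so clean_post can be used without bertopic installed.
    from bertopic import BERTopic

    cleaned = [clean_post(d) for d in docs]

    # Embeds the posts, clusters the embeddings, and labels each cluster.
    topic_model = BERTopic(language="english", min_topic_size=min_topic_size)
    topics, _probs = topic_model.fit_transform(cleaned)

    # Bins the timestamps and reports how often each topic occurs per bin.
    over_time = topic_model.topics_over_time(cleaned, timestamps, nr_bins=nr_bins)
    return topic_model, topics, over_time
```

After fitting, topic_model.get_topic_info() returns a table of the discovered themes with their sizes and top terms, which is a natural starting point for the kind of summarization described above.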

Figure 1: Overview of the BERTopic Process:

Source: “The Algorithm – BERTopic.” Accessed: Mar. 30, 2025. [Online].
Available: https://maartengr.github.io/BERTopic/algorithm/algorithm.html#visual-overview
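
The last stage of the process in the BERTopic documentation cited above is a class-based TF-IDF (c-TF-IDF) weighting that picks the words describing each cluster. The sketch below is a simplified pure-Python version of that weighting, written for illustration only: the function names and the toy tokens are hypothetical, and BERTopic's own implementation works on sparse matrices rather than dictionaries.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score terms per cluster with a simplified class-based TF-IDF.

    clusters maps a cluster id to a list of tokenized documents.
    Each cluster is treated as one long document.
    """
    # Term counts per cluster.
    tf = {
        cid: Counter(tok for doc in docs for tok in doc)
        for cid, docs in clusters.items()
    }

    # Frequency of each term across all clusters.
    total = Counter()
    for counts in tf.values():
        total.update(counts)

    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(tf)

    scores = {}
    for cid, counts in tf.items():
        n_words = sum(counts.values())
        scores[cid] = {
            term: (count / n_words) * math.log(1 + avg_words / total[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, cid, k=3):
    """Return the k highest-scoring terms for one cluster."""
    ranked = sorted(scores[cid].items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Toy clusters (made-up tokens, not project data).
toy = {
    0: [["mpox", "vaccine", "clinic", "vaccine"], ["vaccine", "appointment"]],
    1: [["mpox", "travel", "outbreak", "travel"], ["travel", "border"]],
}
toy_scores = c_tf_idf(toy)
```

In this toy example a word such as "mpox" that appears in both clusters is weighted below a word of the same count that is specific to one cluster, which is why the resulting topic labels tend to be distinctive rather than merely frequent.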

Project Outputs: (Links to GitHub)

  • BERTopic Code
  • Draft Academic Paper

Team Presentation:

Project Collaborators: https://www.linkedin.com/in/jasonzhaotmt/, https://www.linkedin.com/in/ananth-krishnan-dataai/, https://www.linkedin.com/in/thomas-brauch-514738/

Featured Image:
