Topic Modeling in SE
Much of software engineering involves processing unstructured text, e.g., issue discussions, documentation sites, and app reviews. Topic modeling, a technique for capturing the latent (hidden) topics in a collection of documents, has been widely applied in SE.
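To make the idea of latent topics concrete, here is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, using only the Python standard library. The corpus, the stopword list, the function names (`tokenize`, `fit_lda`, `top_words`), and the hyperparameter values are all assumptions made up for illustration. They do not come from the readings or the Colab notebook, and a real study would use a maintained library and a much larger corpus.

```python
import random

# Toy corpus, invented for illustration: short SE-flavoured texts.
DOCS = [
    "app crashes on login after update",
    "login button crashes the app",
    "crash when login fails on startup",
    "docs missing install steps for the api",
    "api docs need install example",
    "update the install docs and api guide",
]
# Hypothetical, hand-picked stopword list for this toy corpus only.
STOPWORDS = {"on", "the", "for", "and", "when", "after", "need"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns (vocab, phi, theta): phi[k][v] estimates P(word v | topic k),
    theta[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables, filled from a random initial topic assignment.
    doc_topic = [[0] * n_topics for _ in tokens]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    for d, doc in enumerate(tokens):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove this token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed point estimates from the final counts.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(tokens[d]) + n_topics * alpha)
         for k in range(n_topics)]
        for d in range(len(tokens))
    ]
    return vocab, phi, theta


def top_words(vocab, phi, n=4):
    """Return the n highest-probability words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in phi
    ]


vocab, phi, theta = fit_lda(DOCS)
for k, words in enumerate(top_words(vocab, phi)):
    print(f"topic {k}: {' '.join(words)}")
for doc, mix in zip(DOCS, theta):
    print([round(p, 2) for p in mix], doc)
```

Each topic is a probability distribution over the vocabulary, and each document is a mixture of topics. The topic labels are arbitrary, and results on a corpus this small can change with the seed and the hyperparameters.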
Learning Outcomes
- Gain overview knowledge of how statistical topic models work.
- Analyze a topic model and understand its benefits and drawbacks.
Lecture Notes
Required Readings
- Campbell, Hindle, Stroulia, Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
- Agrawal, What is Wrong with Topic Modeling
Activity
- Intro to BERTopic - this is a Colab notebook. Please go through it ahead of class.
Optional Readings and Activities
- Compare this to the Novielli work on sentiment analysis.
- Blei, Ng, Jordan, Latent Dirichlet Allocation. The classic paper introducing LDA.
- Barua, What Are Developers Talking About? An Analysis of Topics and Trends in Stack Overflow