Large Language Models in SE
AI-supported development tools such as Codex, Claude, and ChatGPT have recently come to play a major role in software engineering (SE). What underpins these tools, why do they work as well as they do, what ethical concerns do they raise, and what can we expect for SE in an AI-assisted future?
Learning Outcomes
- a more than passing awareness of how large language models “work” on code
- a deep dive into language parsing and embedding for LLMs
- the ability to discuss the (current) tradeoffs of these tools
- the ability to analyze how such tools are evaluated and to separate hype from reality
- knowledge of how to apply these tools to SE problems such as refactoring and code comprehension
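To make the parsing outcome concrete: before a model sees any code, a tokenizer splits it into subword units, and GPT-style models learn those units with byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent pair of symbols. The sketch below is a toy character-level version under stated assumptions (whitespace pre-splitting, no end-of-word markers, ties broken by first occurrence); production tokenizers operate on bytes and learn tens of thousands of merges.

```python
from collections import Counter

# Toy BPE sketch: character symbols and whitespace pre-splitting are
# simplifying assumptions, not how production tokenizers work.

def pair_counts(words):
    """Count adjacent symbol pairs across all words (each word is a tuple of symbols)."""
    counts = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)  # most frequent pair; first one wins ties
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

merges, words = train_bpe("self.x = self.y", 4)
```

On `"self.x = self.y"`, the four merges build up the shared prefix `self.` one symbol at a time, which illustrates why frequent identifiers and idioms in code tend to end up as single tokens.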
Before Class
Lectures
Readings
- Hindle et al., On the Naturalness of Software
- Codex
- SWE Bench Verified
- Components of A Coding Agent
- Chapters 1-3 of Raschka, “Build an LLM”
- Come to class on June 10 with a working embedding built from the book’s code samples.
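As a rough picture of what the pre-class embedding involves: each token ID indexes a row of a token-embedding table, each position indexes a row of a positional-embedding table, and the model input is their element-wise sum. The sketch below is plain Python under loudly stated assumptions (a toy whitespace vocabulary instead of a BPE tokenizer, randomly initialized tables, hypothetical sizes); the book's samples use PyTorch, which this only imitates.

```python
import random

def build_vocab(text):
    """Map each distinct whitespace-separated token to an integer ID (sorted for determinism).
    Assumption: a toy vocabulary; real LLMs use a BPE tokenizer instead."""
    return {tok: i for i, tok in enumerate(sorted(set(text.split())))}

def make_table(rows, dim, rng):
    """A rows x dim table of small random weights, standing in for a trainable layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def embed(token_ids, tok_table, pos_table):
    """Input embedding = token embedding + positional embedding, element-wise."""
    return [
        [t + p for t, p in zip(tok_table[tid], pos_table[pos])]
        for pos, tid in enumerate(token_ids)
    ]

text = "def add ( a , b ) : return a + b"
vocab = build_vocab(text)
ids = [vocab[tok] for tok in text.split()]

rng = random.Random(0)        # fixed seed so runs are repeatable
dim, context_length = 4, 16   # hypothetical toy sizes
tok_table = make_table(len(vocab), dim, rng)
pos_table = make_table(context_length, dim, rng)
vectors = embed(ids, tok_table, pos_table)
```

The two occurrences of `a` share a token vector but end up with different input vectors because their positions differ, which is exactly what the positional table is for.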
In Class
Slides
- In-class notes
- Research Opportunities (no video, covered in class)
Data and code
- Command Pattern: implementing the Command pattern in code
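For reference, a minimal Python sketch of the Command pattern; the receiver is a hypothetical text editor, and the class names (`Document`, `InsertText`, `Editor`) are illustrative rather than taken from the course code. Each command object wraps one request together with how to reverse it, so the invoker can run, log, or undo requests without knowing what they do.

```python
from abc import ABC, abstractmethod

class Command(ABC):
    """Encapsulates a request as an object with execute/undo."""
    @abstractmethod
    def execute(self) -> None: ...
    @abstractmethod
    def undo(self) -> None: ...

class Document:
    """Receiver (hypothetical): the object that actually does the work."""
    def __init__(self) -> None:
        self.text = ""

class InsertText(Command):
    """Concrete command: append text to a document, remembering how to reverse it."""
    def __init__(self, doc: Document, text: str) -> None:
        self.doc, self.text = doc, text

    def execute(self) -> None:
        self.doc.text += self.text

    def undo(self) -> None:
        # Drop exactly the characters this command appended.
        self.doc.text = self.doc.text[: len(self.doc.text) - len(self.text)]

class Editor:
    """Invoker: runs commands and keeps a history so they can be undone."""
    def __init__(self) -> None:
        self.history: list[Command] = []

    def run(self, cmd: Command) -> None:
        cmd.execute()
        self.history.append(cmd)

    def undo(self) -> None:
        if self.history:
            self.history.pop().undo()

# Usage
doc, editor = Document(), Editor()
editor.run(InsertText(doc, "Hello"))
editor.run(InsertText(doc, ", world"))
editor.undo()  # doc.text is back to "Hello"
```

Keeping the undo logic inside each command, rather than in the invoker, is the design choice that lets new operations be added without touching `Editor`.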
Optional Readings and Activities
- Le Goues, Survey of APR (automated program repair)
- DeepSeek-Coder
- Vaswani et al., “Attention Is All You Need”
- See this page for more benchmarks and metrics: https://slds-lmu.github.io/seminar_nlp_ss20/resources-and-benchmarks-for-nlp.html
- https://peterbloem.nl/blog/transformers
- Part 2: Pre-trained BERT Model and Requirements Classification
- Lambert, et al., “Illustrating Reinforcement Learning from Human Feedback (RLHF)”, Hugging Face Blog, 2022.
- https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
- https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training
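Since analyzing how these tools are evaluated is a learning outcome, it helps to see one metric that runs through the benchmark readings: pass@k. The Codex paper (Chen et al., 2021) defines an unbiased estimator for it: generate n samples per problem, count the c that pass the unit tests, estimate the chance that at least one of k samples passes as 1 − C(n−c, k) / C(n, k), and average over problems. The sketch below implements that formula; the paper's reference code uses an equivalent product form, while this version leans on Python's exact integer `math.comb` (Python 3.8+).

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over problems; `results` is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 10 samples of which 3 pass, pass@1 is 3/10, while pass@10 is 1 because any 10 samples include a passing one. This is one reason large-k numbers can look far better than what a developer sees from a single suggestion.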
Helpful tutorials and summaries:
- Alammar, The Illustrated Transformer
- Willison, “understanding GPT tokenizers”
- Self-Attention from Scratch
- https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
- https://jalammar.github.io/illustrated-word2vec/
- https://towardsdatascience.com/attention-is-all-you-need-discovering-the-transformer-paper-73e5ff5e0634
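Several of these tutorials build self-attention from scratch. As a compact companion, here is single-head scaled dot-product self-attention, softmax(QKᵀ / √d_k) V as defined by Vaswani et al., in plain Python with lists as matrices. The inputs and projection matrices are hypothetical toy values; real implementations use batched tensor code and multiple heads, and GPT-style decoders apply the causal mask shown here as an option.

```python
import math

def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x p) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row (exp(-inf) contributes 0)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product self-attention over token vectors x.
    Returns (output vectors, attention weights)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:  # token i may only attend to tokens 0..i
            scaled = [s if j <= i else -math.inf for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, v), weights

# Hypothetical toy input: 3 tokens, 2 dimensions, identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
causal_out, causal_weights = self_attention(x, identity, identity, identity, causal=True)
```

With the causal mask, the first token can only attend to itself, so its output is just its own value vector; without the mask, every token mixes information from the whole sequence.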