2025-07-29
In your team, please discuss the following. We will then go through each pair in turn to discuss the questions.
How did “AI” help in Software Engineering tasks in the past?
As we will see, one way to think about a language model is that it can solve a task like: what does System.out.println("neil is an awesome instructor") do? Source code is a language!
Complete:
System.out.println( ...
Complete:
My cat has three ...
Consider var theVar; ... ; theVar = 5;. We need something to represent this long-range relationship between declaration and use. Tokenization converts the text into numeric representations. Remember that in training we will be using these tokens billions of times.
Try out https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
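The core idea can be sketched with a toy whitespace tokenizer (the corpus and vocabulary here are invented for illustration; real LLM tokenizers like the ones on that page use subword schemes such as byte-pair encoding, so unknown words are split rather than mapped to a single unknown id):

```python
# Toy tokenizer: map whitespace-separated tokens to integer ids.
# This is a didactic sketch, not how production tokenizers work.

def build_vocab(corpus):
    """Assign each distinct token a stable integer id; 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for text in corpus:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text, vocab):
    """Convert text into the numeric representation the model actually sees."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]

corpus = ["my cat has three legs", "my dog has four legs"]
vocab = build_vocab(corpus)
print(encode("my cat has four legs", vocab))  # [1, 2, 3, 7, 5]
```

Note that a word never seen in training ("my pig" would encode as [1, 0]) collapses to the unknown id; subword tokenizers avoid exactly this problem.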
Create a language model by effectively counting how often a particular set of n tokens occurs. For example (from the Hindle paper):
\(p(a_4|a_1a_2a_3) = \frac{count(a_1a_2a_3a_4)}{count(a_1a_2a_3*)}\)
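That counting estimate can be sketched in a few lines (the toy corpus and the function name `ngram_prob` are made up for illustration):

```python
from collections import Counter

def ngram_prob(tokens, context, next_tok):
    """p(next_tok | context) = count(context + next_tok) / count(context + *)."""
    n = len(context) + 1
    # Count every n-gram in the corpus.
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # count(context + *): occurrences of the context that are followed by some token.
    prefixes = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 1))
    return ngrams[tuple(context) + (next_tok,)] / prefixes[tuple(context)]

tokens = "the cat sat on the cat sat on the mat".split()
print(ngram_prob(tokens, ("on", "the"), "mat"))  # 0.5: "on the" continues as "cat" or "mat"
```

The sparsity problem is visible even here: any context not seen verbatim in training gets a zero denominator, which is why real n-gram models add smoothing.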
Measure success by log perplexity, or cross-entropy:
\(H_{\mathcal{M}}(s) = - \frac{1}{n} \log p_{\mathcal{M}}(a_1 \ldots a_n)\)
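A minimal sketch of computing this, assuming the joint probability has been factored into per-token conditional probabilities (so the log of the product becomes a sum of logs), and assuming log base 2 so the result reads as bits per token:

```python
import math

def cross_entropy(token_probs):
    """H_M(s) = -(1/n) * log p_M(a_1 ... a_n), with the joint probability
    supplied as a list of per-token conditional probabilities."""
    n = len(token_probs)
    return -sum(math.log2(p) for p in token_probs) / n

# A model that assigns probability 0.5 to each of 4 tokens in a sequence:
print(cross_entropy([0.5, 0.5, 0.5, 0.5]))  # 1.0 bit per token
```

Lower is better: a model that is more confident about (correct) continuations assigns higher probabilities, and the average negative log shrinks.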
With your project team, build a simple n-gram model for answering Jeopardy questions. I’ve uploaded a sample you can use on Teams.
After preprocessing, we need to encode the tokens (a word like “function”) into a numeric representation:
function = [0.322, 0.113, 0.567, ...]
Importantly, words that “mean” similar things (for our domain of interest) should be closer together under some distance metric. This works OK … but language is pretty complicated.
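The usual distance metric is cosine similarity. A minimal sketch (the 3-dimensional vectors below are invented; real embeddings have hundreds of learned dimensions):

```python
import math

def cosine_similarity(u, v):
    """Similarity of two embedding vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hand-written toy embeddings for illustration only.
function = [0.322, 0.113, 0.567]
method   = [0.310, 0.120, 0.550]   # "means" something similar in our domain
banana   = [0.900, 0.050, 0.010]   # unrelated

print(cosine_similarity(function, method) > cosine_similarity(function, banana))  # True
```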
The word embedding model is restricted to looking a few tokens ahead or behind (the window/context).
That means learning more complex relationships (e.g., that this phrase modifies a previous noun) is hard.
Can we improve on these approaches?
Another approach is to use deep learning with attention mechanisms in transformer models. This is how BERT, GPT-4, T5, etc. work.
Supervision is when a human labels or validates the machine’s results.
A transformer is a ML architecture that encodes an input and decodes output:


Earlier we discussed the problem of sliding windows. Attention is a way to make the model ‘remember’ what it saw before.
attention example from jalammar
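The mechanism in that example can be sketched as scaled dot-product attention. This is a didactic pure-Python version (real implementations use batched matrix operations on a GPU, and the toy queries/keys/values below are invented):

```python
import math

def softmax(xs):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query produces a weighted
    average of the values, weighted by how well it matches each key."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much this position "attends" to each other position
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One query that matches the first key far better than the second,
# so the output is pulled toward the first value vector:
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
print(attention(q, k, v))
```

Because the weights come from the whole sequence, the model can relate a token to any earlier token, which is exactly what the fixed sliding window could not do.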
Get low-paid gig workers to solve the knowledge acquisition bottleneck.
From https://huggingface.co/blog/rlhf:
the policy is a language model that takes in a prompt and returns a sequence of text (or just probability distributions over text). The action space of this policy is all the tokens corresponding to the vocabulary of the language model and the observation space is the distribution of possible input token sequences, which is also quite large given previous uses of RL. The reward function is a combination of the preference model and a constraint on policy shift.
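The reward described in that quote (preference model output combined with a constraint on policy shift) can be sketched numerically. Everything below is invented for illustration; in practice the KL penalty is computed per token over sampled completions:

```python
import math

def kl_divergence(p, q):
    """How far distribution p has drifted from distribution q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rlhf_reward(preference_score, policy_probs, base_probs, beta=0.1):
    """reward = preference model score - beta * KL(policy || base model).
    The penalty keeps the tuned policy from drifting into text the
    preference model scores well but the base model finds implausible."""
    return preference_score - beta * kl_divergence(policy_probs, base_probs)

base   = [0.25, 0.25, 0.25, 0.25]   # base LM's next-token distribution
policy = [0.70, 0.10, 0.10, 0.10]   # tuned policy has shifted toward one token
print(rlhf_reward(preference_score=1.0, policy_probs=policy, base_probs=base))
```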
What other RLHF approaches have you been subjected to?

source: https://huggingface.co/blog/rlhf
What constitutes a good test for LLM coding?
Test set pollution: we have to assume the training data includes problems from online sources (e.g., LeetCode).
Discuss: “match-based metrics are unable to account for the large and complex space of programs functionally equivalent to a reference solution”
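To make the quoted point concrete, here is a sketch (the reference, candidate, and test inputs are invented): two programs that compute the same function, so an exact-match metric scores the candidate as wrong while a functional test passes both.

```python
# Reference solution and a candidate that an exact-match metric rejects,
# even though they are functionally equivalent.
reference = "def add(a, b):\n    return a + b"
candidate = "def add(x, y):\n    total = x + y\n    return total"

exact_match = reference == candidate  # False: the strings differ

def passes_tests(source):
    """Functional check: execute the program and compare behaviour on inputs."""
    ns = {}
    exec(source, ns)
    return all(ns["add"](a, b) == a + b for a, b in [(1, 2), (-3, 3), (0, 0)])

print(exact_match, passes_tests(reference), passes_tests(candidate))  # False True True
```

This is why benchmarks like HumanEval score generated code by running unit tests rather than by string comparison.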
What languages and programming problem types should be in the test set?

Neil Ernst ©️ 2024-5