2026-06-09
nernst@uvic.ca
Associate Professor in Computer Science
Acting BSEng Director
ECS 560
Check in/attendance/roll call
This course is about applying data science to software engineering data. Data science is a broad term, but I will use it to mean using analytical techniques to support decision making. It combines hacking, statistics, domain expertise, and problem solving.
Updates? New topics?
Project in 6 weeks. Plan accordingly!
The syllabus is our contract. I won’t change things like mark distributions or assignment types, but I can change what we cover and when.
I’m looking for feedback on the topics we are covering, especially the latter part of the term.
I hope to minimize the amount of talking I do in favor of exercises. But that means two things on your end:
Project: the project previously ran over 13 weeks and now runs in half that time, so it will require a lot of work in a short period. Plan accordingly!
Accommodations: make sure CAL has your letters so I can adjust accommodations as necessary. If you can, let me know if there are other accommodations you may need beyond extended time.
The format for the slides
We will now spend some time getting the basic tooling for this class installed.
On a piece of paper, write one or two points about what plausible implies when it comes to LLM-generated code.
Make sure your name and V# are clearly on the paper. Hand this in at the end of class.
Organize by topic areas:
What was one takeaway from the readings/videos? Write this down by yourself. Then turn to your neighbor and see what they thought.
With a colleague, select a quote to challenge (disagree with) and a quote to affirm (agree with) from the PDD article. How does this workflow relate to your previous approaches, co-op experiences, or known best practices?
Create a skill for your AI agent following the demo at https://agentskills.io/skill-creation/quickstart
We’ve seen how skills work. I’d like you to go to the Claude skills repository and install a useful skill for your project. For example, there is a research skill and a bibtex editing skill.
Install it on your machine following the instructions, and write a prompt that triggers the skill.
We will do what Andrej Karpathy calls “vibe coding”: a zen-like use of autocomplete to try and get the AI to do something useful.
Kent Beck has this nice model of how this works. You add features, hurting modularity, then get the modularity - the options - back.
gemini.qr code for titanic data
On a piece of paper, list the biggest change in software engineering you have come across.
Make sure your name and V# are clearly on the paper. Hand this in at the end of class.
Get into your project groups. If you do not have a group, now is the time to add yourself to one!
As a group, come up with one example of Goodhart’s Law in action. Ideally in software engineering, but any example could work. What is the outcome of this process?
sonar cloud qr code
Go to the Main tab, then Measures, then Complexity.
Visit https://sonarcloud.io/summary/new_code?id=mediawiki-core and explore the data science dashboard there.
Key concepts: power, effect size, sample size, alpha
Effect size: what is an effect size in the software context?
What are typical sample sizes in SE papers?
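The four concepts are linked: fix any three and the fourth follows. A minimal sketch of that link, assuming a two-sided, two-sample comparison of means and the normal approximation (an exact calculation based on the t distribution gives slightly larger numbers). The function name `n_per_group` is only illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size per group for a two-sided two-sample test.

    effect_size is a standardized mean difference (Cohen's d).
    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Smaller effects need many more participants per group.
    for d in (0.2, 0.5, 0.8):
        print(d, n_per_group(d))
```

Compare the numbers this prints with the sample sizes you find in SE papers.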
Load the two sample files (check Teams) into R, and run a t-test to evaluate the hypothesis that AI makes developers faster. Make sure to print out the descriptive stats first.
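The exercise is meant to be done in R, where `t.test(x, y)` runs Welch's test by default. For comparison, here is a rough sketch of the same steps using only the Python standard library. The task times are made-up numbers, not the files on Teams, and the sketch stops at the t statistic and degrees of freedom; a p-value needs the t distribution, which a statistics package would supply.

```python
import math
from statistics import mean, stdev


def describe(xs):
    """Descriptive statistics to look at before running any test."""
    return {"n": len(xs), "mean": mean(xs), "sd": stdev(xs), "min": min(xs), "max": max(xs)}


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


if __name__ == "__main__":
    # Hypothetical task completion times in minutes.
    without_ai = [30, 34, 29, 41, 37, 33]
    with_ai = [25, 31, 24, 35, 30, 29]
    print(describe(without_ai))
    print(describe(with_ai))
    print(welch_t(without_ai, with_ai), cohens_d(without_ai, with_ai))
```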
Design a sampling strategy for the following question:
You are the CTO at Spotify, concerned that your big AI investment is not being used. How can you measure the extent to which your developers are using AI, and whether it is a useful tool for them?
link to data
Take some time and convert the jm1 dataset to long form. Hint: the long form is of the form ‘id, variable, value’ and values have to be compatible (i.e., there are only three columns in total).
Long forms are usually more useful for data wrangling purposes.
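A small sketch of the wide-to-long reshape, using the standard library only. The rows and the `id` column are assumptions for illustration (the id stands in for a row number you would add yourself), and the boolean defect label is cast to a float so that every value has the same type. In a notebook, `pandas.melt` does the same job.

```python
def to_long(rows, id_key="id"):
    """Reshape wide rows (one dict per module) into id/variable/value records."""
    long_rows = []
    for row in rows:
        for variable, value in row.items():
            if variable == id_key:
                continue
            # Cast everything to float so the single 'value' column is compatible.
            long_rows.append({"id": row[id_key], "variable": variable, "value": float(value)})
    return long_rows


if __name__ == "__main__":
    # Hypothetical wide rows; column names are illustrative only.
    wide = [
        {"id": 1, "loc": 11, "v(g)": 2, "defects": False},
        {"id": 2, "loc": 40, "v(g)": 7, "defects": True},
    ]
    for record in to_long(wide):
        print(record)
```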
Let’s say we are interested in how well Halstead complexity predicts defects.
What is Halstead complexity? What are these complexity metrics in general? Do some research with your partner.
What do you think about the utility of these metrics?
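To check your research, here is a sketch of the commonly cited Halstead formulas. It takes the four base counts as inputs and assumes someone has already decided what counts as an operator and what counts as an operand, which is where tools tend to disagree.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Halstead measures from base counts.

    n1, n2: number of distinct operators and distinct operands.
    N1, N2: total occurrences of operators and operands.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


if __name__ == "__main__":
    # Counts for a tiny hypothetical function.
    print(halstead(n1=2, n2=2, N1=4, N2=4))
```

Notice that every derived measure is built from the same four counts, which matters for the correlation question later in this section.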
What distribution fits the data set? Hint: this is sometimes done with kernel density plots.
Furthermore, strong correlation among the predictors (multicollinearity) is something we try to remove in regression analysis, because it makes the coefficient estimates unstable and hard to interpret.
What types of correlation are likely in the data? Hint: think from first principles about how the metrics are constructed, and what you know about how source code is created (your “data generating process”).
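One quick way to look for correlated predictors is to compute pairwise Pearson correlations before fitting anything. A minimal sketch follows; the metric values are invented for illustration and are not taken from jm1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-module metrics: larger modules tend to have more operators.
    loc = [11, 40, 25, 90, 63]
    total_operators = [30, 118, 70, 260, 171]
    print(pearson(loc, total_operators))
```

A value close to 1 or -1 suggests the two metrics carry much the same information, so including both in a regression adds little.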
I like Weka for exploration but for bigger datasets and more extensive experimentation, the command line or a notebook (hence usually Python) is best.
Load your JM1 data into Weka and explore using a supervised classifier. What performance do you get?
Aside: SAVE YOUR EXPERIMENTS! Always record the steps and params you chose for a given exploration. It will save you headaches later. Tools like Data Version Control can help with this.
What is a valid baseline for a defect predictor?
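One candidate to discuss is the majority-class predictor (Weka calls this ZeroR). The sketch below uses made-up labels, not jm1, to show why its accuracy can look respectable on imbalanced data while it finds no defects at all.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)


def recall(labels, predictions, positive=True):
    """Fraction of truly positive items that were predicted positive."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == positive and p == positive)
    actual_pos = sum(1 for y in labels if y == positive)
    return true_pos / actual_pos if actual_pos else 0.0


if __name__ == "__main__":
    # Hypothetical labels: True means the module is defective.
    labels = [False] * 8 + [True] * 2
    majority, accuracy = majority_baseline(labels)
    predictions = [majority] * len(labels)
    print(majority, accuracy, recall(labels, predictions))
```

Any classifier you train should beat this on a metric that reflects the defective class, not just on accuracy.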
On a piece of paper, explain what hierarchical models are and why they are useful in looking at the AIDev dataset.
Make sure your name and V# are clearly on the paper. Hand this in at the end of class.
chalkboard
simulation
What is happening in the code (see GitHub)?
qr code
With a partner, draw a DAG expressing causality in this experiment.
On a piece of paper, name one ethical implication of moving to GenAI for writing code.
Make sure your name and V# are clearly on the paper. Hand this in at the end of class.
With your group, spend 10 minutes examining the data source(s) for your project (e.g., AIDev, SWE-Bench).
Then use either
to draft the section on ethics for your project report. Take a photo of your notes and commit it to your group’s GitLab repo.
Assignment 2 is to generate a Black Mirror style movie pitch. We will spend some time in class developing these, and present them next week.
In your team, please discuss the following. We will then go through each pair in turn to discuss the questions.
Build a simple n-gram model using the sample book text I’ve uploaded to Teams. I suggest using the smaller “test.txt” data to speed things up.
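If you are not sure where to start, here is one possible shape for a count-based n-gram model, a sketch rather than the expected solution. It uses an inline sentence; to use the Teams file you would read it and split it into tokens yourself (whitespace tokenization is an assumption here).

```python
from collections import Counter, defaultdict


def train_ngrams(tokens, n=2):
    """Count which token follows each (n-1)-token context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model


def next_word_probs(model, context):
    """Maximum-likelihood estimate of P(next word | context); empty if unseen."""
    counts = model.get(tuple(context), Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ran".split()
    bigrams = train_ngrams(tokens, n=2)
    print(next_word_probs(bigrams, ["the"]))
```

An unseen context returns an empty dictionary here; how to handle that case (smoothing, backing off to shorter contexts) is a good question to take up with your partner.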
Go to this issue: pylint 4551

With a partner, read through the logs of the tool I posted to Teams.
On a piece of paper, explain what pass@k is, and why it works (or not) for software tasks.
Make sure your name and V# are clearly on the paper. Hand this in at the end of class.
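For reference after the exercise: pass@k is usually computed with the unbiased estimator popularized by the HumanEval evaluation of Codex. Generate n samples for a task, count the c that pass the tests, and estimate the chance that at least one of k randomly drawn samples passes. A sketch, with argument checks that are my own addition:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task as 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed the tests, k: budget of attempts.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical run: 10 samples generated, 3 passed.
    for k in (1, 5, 10):
        print(k, pass_at_k(n=10, c=3, k=k))
```

The benchmark score is the mean of this value over all tasks. Note that "pass" here means "passes the available tests", which is one of the things to question for software tasks.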
Weight parameters are the fundamental, learned coefficients that define the network’s connections, while attention weights are dynamic, context-specific values.
We set up the model with these representations of (eventually) Queries, Keys, Values.
Our training phase will start predicting next words given the input, and when there’s an error, backprop updates the matrices to improve the results (billions of times).
In inference, we give a query and the model returns the most likely next word(s).
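To make the distinction between the two kinds of weights concrete, here is a sketch of scaled dot-product attention in plain Python. The Q, K, and V matrices are passed in directly; in a real model they come from multiplying token embeddings by learned weight matrices, which this sketch leaves out. The attention weights are recomputed for every input.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors. Returns (outputs, attention_weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # context-specific weights, one per key
        weights.append(w)
        outputs.append([sum(wj * v[col] for wj, v in zip(w, V)) for col in range(len(V[0]))])
    return outputs, weights


if __name__ == "__main__":
    # Toy values: one query attending over two identical keys.
    out, w = attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [1.0, 0.0]], V=[[1.0, 2.0], [3.0, 4.0]])
    print(out, w)
```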
With a colleague, evaluate the code in this repo.
Take the six components described in the article, and examine the source code implementing those pieces. (see Lines 38-46). Take turns going through the 6 modules, and acting as explainer and questioner.
The goal is to be comfortable understanding how a tool like the mini-agent can make AI chat seem magical.
Answer the question with your colleague: where is this mini agent going to have problems?
Go to this issue: pylint 4551

With a partner, read through the logs of the tool I posted to Teams.
On a piece of paper,
TBD
Make sure your name and V# are clearly on the paper. Hand this in at the end of class.
Go through the two estimation readings (COCOMO and Agile Effort…).
Find at least one quote that you agree with (AFFIRM) and one quote you wish to CHALLENGE.
Form a group with 3 other students. Discuss the two quotes you found and the reasons you chose them.
Report back to the class with the group’s two chosen quotes.
How does this change in an LLM era?
A story point or a COCOMO output is just an estimate of what effort and cost will be required.
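As a reminder of where a COCOMO number comes from, here is a sketch of Basic COCOMO (the original 1981 model, which may differ from the version in the reading). It uses the standard published coefficients; the 10 KLOC size is a made-up input. The size estimate is itself a guess, so the output inherits that uncertainty.

```python
# (a, b, c, d) coefficients for the three Basic COCOMO project modes.
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc: float, mode: str = "organic"):
    """Return (effort in person-months, schedule in months) for a size in KLOC."""
    a, b, c, d = COEFFICIENTS[mode]
    effort = a * kloc ** b
    duration = c * effort ** d
    return effort, duration


if __name__ == "__main__":
    # Hypothetical 10 KLOC project under each mode.
    for mode in COEFFICIENTS:
        effort, duration = basic_cocomo(10, mode)
        print(f"{mode}: {effort:.1f} person-months over {duration:.1f} months")
```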
Ask an AI about the “no estimates” movement. What is a problem with no estimates?
TODO: add these?
One example: https://github.com/vuejs/core/pull/13550 and https://github.com/facebook/react/pull/30451 both deal with whitespace issues.
I’ve uploaded two files to the Teams channel. Each contains a method pair that might be a clone.
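Before judging the pairs by eye, it can help to see what a crude automated check looks like. The sketch below computes Jaccard similarity over token sets; this is a simplification I am assuming for illustration, not a real clone detector, and it ignores token order and renamed identifiers. The two methods are invented, not the ones on Teams.

```python
import re


def tokenize(code: str):
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def jaccard_similarity(code_a: str, code_b: str) -> float:
    """Size of the shared token set divided by the size of the combined token set."""
    a, b = set(tokenize(code_a)), set(tokenize(code_b))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical method pair that differs only in names.
    first = "int add(int a, int b) { return a + b; }"
    second = "int sum(int x, int y) { return x + y; }"
    print(jaccard_similarity(first, second))
```

Whitespace-only differences score 1.0 here, while renamed variables lower the score even though the structure is identical. Keep that in mind when deciding what should count as a clone.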
With a partner,
On a piece of paper,
TBD
Make sure your name and V# are clearly on the paper. Hand this in at the end of class.
We have data on change rationale from people who edited StackOverflow (SO) questions. Questions can be edited by moderators or the original asker.
We want to know: “Why do people edit a SO question?”

© Neil Ernst