Within and Out of Sample Data

Neil Ernst

University of Victoria

2025-08-13

Discussion

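We discussed the need for out-of-sample data: data that was not used to train a model and is held back to evaluate it. A model that still performs well on such data is said to generalize. This is known as the principle of generalizability.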
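The gap between in-sample and out-of-sample performance can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy features (files changed, whether tests are touched), the team labels, and the deliberately naive "memorizer" model, which looks up training examples and otherwise guesses the majority label.

```python
from collections import Counter

def fit_memorizer(rows):
    """'Train' by memorizing every (features -> label) pair in the training rows."""
    table = {features: label for features, label in rows}
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return table, majority

def predict(model, features):
    """Return the memorized label, or the majority label for unseen features."""
    table, majority = model
    return table.get(features, majority)

def accuracy(model, rows):
    correct = sum(predict(model, features) == label for features, label in rows)
    return correct / len(rows)

# Hypothetical toy data: (files_changed, touches_tests) -> reviewing team.
train = [
    ((1, True), "core"),
    ((2, False), "core"),
    ((3, False), "core"),
    ((8, False), "dom"),
    ((9, True), "dom"),
]
# Held-out rows the model never saw during training.
held_out = [
    ((1, False), "core"),
    ((3, True), "core"),
    ((7, False), "dom"),
    ((10, True), "dom"),
]

model = fit_memorizer(train)
in_sample = accuracy(model, train)         # scored on the rows it memorized
out_of_sample = accuracy(model, held_out)  # scored on unseen rows

print(f"in-sample accuracy:     {in_sample:.2f}")
print(f"out-of-sample accuracy: {out_of_sample:.2f}")
```

The memorizer scores perfectly on the rows it was trained on and noticeably worse on the held-out rows, which is why in-sample accuracy alone says little about generalizability.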

Exercise

  1. Two tools for building web apps are Vue.js and React. Both serve broadly the same purpose and are written in the same languages (JavaScript/TypeScript).
  2. Go to the React (facebook/react) and Vue.js (vuejs/core) repositories on GitHub.
  3. Find a pull request (PR) in each repo that deals with a similar concern. For example, you might look at security issues or UI issues.

Exercise (cont.)

One example: https://github.com/vuejs/core/pull/13550 and https://github.com/facebook/react/pull/30451 both deal with whitespace issues.

  1. Pretend you and your group are building a data science tool to help with PR assignment. Which characteristics of each project would make it:
    1. Harder to generalize your results?
    2. Easier to generalize your results?
  2. Report back.
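
One way to probe generalizability across projects is to hold out an entire project rather than a random subset of PRs. The following is a minimal sketch of such a leave-one-project-out split. The record fields (`project`, `label`, `files_changed`) and their values are hypothetical and not drawn from the real repositories.

```python
def leave_one_project_out(records, key="project"):
    """Yield (held_out_project, train, test), holding out one project at a time."""
    projects = sorted({r[key] for r in records})
    for held_out in projects:
        train = [r for r in records if r[key] != held_out]
        test = [r for r in records if r[key] == held_out]
        yield held_out, train, test

# Hypothetical PR records, made up for illustration only.
prs = [
    {"project": "react", "label": "bug", "files_changed": 3},
    {"project": "react", "label": "feature", "files_changed": 12},
    {"project": "vue", "label": "bug", "files_changed": 2},
    {"project": "vue", "label": "docs", "files_changed": 1},
]

splits = list(leave_one_project_out(prs))
for held_out, train, test in splits:
    print(f"held out: {held_out}, train on {len(train)} PRs, test on {len(test)} PRs")
```

A tool trained on the `train` rows and scored on the `test` rows is evaluated on a project it has never seen, which is a stricter check than a random split within one project.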