Journey through AI: Weekly Lessons from the Undergraduate Classroom
Data → Models → Outcomes
Teaching AI4All: Weeks 1 & 2 of UNIV 182
This fall I launched something new at George Mason University: UNIV 182 – AI4All: Understanding & Building Artificial Intelligence, the first campus-wide course in AI literacy, open to every undergraduate, regardless of major. It satisfies the Mason Core requirement in Information Technology & Computing, and, more importantly, it’s meant to lower the barrier to entry into AI for every student on campus. This is not an appreciation course. We understand, we apply, we critique, we build. This course has a rhythm. Join us!
Starting with Big Questions
In Week 1, we began where every literacy journey should: with students’ own lived experiences. Where do they see AI in their daily lives? What’s hype, what’s helpful, and what’s worrying? From there we traveled back in time, tracing the origins of AI from Aristotle and Boole to Turing and McCarthy. The students quickly saw that AI is not just about machines—it’s a centuries-long conversation about reasoning, logic, and what it means to “think.” Here is the substack post documenting the first week, if you missed it.
From Philosophy to Practice: Breaking it Down
By Week 2, we shifted from history to mechanics: How AI works. Together we broke the process down: define the problem, gather examples (data), choose a learning approach, train a model, evaluate it, and improve.
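That loop can be sketched in a few lines of code. This is a toy, runnable illustration I'm adding here for readers, not something we built in class; the "model" is deliberately trivial (it always predicts the majority training label) and exists only to make the shape of the pipeline concrete.

```python
# A toy sketch of the lecture's loop: define the problem, gather examples,
# train, evaluate, improve. The "model" here is a trivial majority-class baseline.

def train(examples, labels):
    # "Training": remember the most common label in the training data.
    majority = max(set(labels), key=labels.count)
    return lambda example: majority

def evaluate(model, examples, labels):
    # Fraction of held-out examples the model labels correctly.
    correct = sum(model(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)

# Define the problem and gather examples: is an email spam (1) or not (0)?
train_emails = ["win $$$ now", "meeting at 3", "free prize!!", "lunch?", "agenda attached"]
train_labels = [1, 0, 1, 0, 0]
test_emails, test_labels = ["cheap pills", "see you at 3"], [1, 0]

# Choose an approach, train, evaluate; "improve" would mean repeating
# these steps with better data or a less trivial model.
model = train(train_emails, train_labels)
score = evaluate(model, test_emails, test_labels)
print(score)  # 0.5: a majority-class baseline leaves plenty of room to improve
```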
Here is a cartograph I created for this lecture at the end of August, with the help of GPT-5, to focus the students on the three key components: data, model, outcomes.
This was really useful, because it allowed us to focus first on the data, then on what we can learn from the data and how.
Data: The New Gold
I could not resist pointing students to the ad from Salesforce, where Matthew McConaughey introduces the idea of data being the new gold. And yes, I did do my best impression of McConaughey, “alright alright alright,” but it was obvious I was two decades too late.
Anyway, this was fun, but it also allowed us to actually define a few things, such as data, data instance, labels, and groups.
I did get a wonderful question from a student that went something like this: well, data is certainly important, but the big techs, don’t they compete over the best technology?
This was a fantastic question, because it cut to the heart of how AI ecosystems actually work. So, of course I had to unpack it in class (and doing so here even for you, my Substack readers) and draw out the two sides of the competition. Big tech companies don’t just compete on technology or on data. They compete on both. If data is indeed the raw fuel, the gold, then technology is the engine. Without high-quality, large-scale data, even the best models won’t work well. Without strong algorithms and computing infrastructure, raw data can’t be turned into outcomes. What gives the giants their edge is that they control both: massive proprietary datasets (think Scale AI slurped by Meta) and the cutting-edge architectures and hardware to use them. It’s really the combination that drives the competition. As much as we enjoyed talking about this to make sense of what is happening, it was time to get back to the data → model → outcomes.
Types of Data
We had to dig a bit deeper and understand what kind of data we can have through examples. Emails for spam filters, images for photo tagging or cancer detection, electronic health record data, blood pressure measurements, fitbit data, and more. These examples allowed us to understand key terms in data science: structured versus unstructured data.
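The structured-versus-unstructured distinction is easy to show with two tiny examples. The readings and email below are made up for illustration:

```python
# Structured data: rows with named fields, like blood pressure readings.
patient_readings = [
    {"patient_id": 1, "systolic": 120, "diastolic": 80},
    {"patient_id": 2, "systolic": 140, "diastolic": 95},
]

# Unstructured data: raw text (or pixels, or audio) with no fixed fields.
email_text = "Congratulations! You have won a FREE prize. Click here..."

# Structured fields can be queried directly...
high_bp = [r for r in patient_readings if r["systolic"] >= 130]
# ...while unstructured data must first be turned into features a model can use.
word_count = len(email_text.split())

print(len(high_bp), word_count)
```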
Then, back to where this data goes. This slide brought our focus back to the organizing picture, data → model → outcomes.
Predictive or Generative?
The class discussion allowed us to see the two main umbrellas of what we do with the data: predict something about it or generate something from it. So we got into predictive tasks versus generative tasks, predictive AI versus generative AI. It was good for the students to see that not all of AI is generative AI. In fact, we repeated several times in class: Not all AI is Machine Learning, and not all Machine Learning is Deep Learning, and not all Deep Learning is Large Language Models, but I am sure the students will better appreciate this as we get deeper in this course. For now, they understood that not all AI is generative AI, and that there is something just as important, Predictive AI.
Classic Learning Paradigms
For the rest of the lecture we focused on Predictive AI. Right here, we took a short detour. Having seen that spam filtering, photo tagging, and breast cancer detection are all examples of predictive AI (we are predicting something from the data), we used the opportunity to make some of these concepts crisper. So, students spent some time understanding the difference between training data, validation data, and testing data. We determined that for the time being we would not be distracted by how we learn from the data, by the nuts and bolts hiding inside this black box we are calling a model, but would instead try to understand the big picture.
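The three-way split is simple enough to sketch in a few lines. The 70/15/15 ratio below is a common convention I'm assuming for illustration, not something fixed in class:

```python
import random

# Illustrative split of 100 labeled examples into train / validation / test.
examples = list(range(100))  # stand-ins for (data, label) pairs
random.seed(0)               # shuffle reproducibly before splitting
random.shuffle(examples)

train_set = examples[:70]         # used to fit the model
validation_set = examples[70:85]  # used to tune choices during development
test_set = examples[85:]          # touched once, at the very end

print(len(train_set), len(validation_set), len(test_set))  # 70 15 15
```

The key point for students: the three sets must not overlap, or the evaluation quietly lies to you.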
It helped students to understand:
Then we appreciated the data collectors, we talked about consent, about noisy, missing data that needed to be cleaned, organized and “wrapped in a bow” for the models. Yes, we talked about the data cycle pipeline:
Students laughed when I admitted that data collection isn’t really my thing. Honestly, it drains me. But, hey, if/when I have to, I’ll roll up my sleeves and do it. What I truly enjoy though is building models.
Determining that we had a whole-view understanding of data, we then turned our attention to how we learn from the data in the predictive setting. We focused on the three classic learning paradigms:
Supervised learning – teaching with examples and labels.
Unsupervised learning – finding structure when no labels exist.
Reinforcement learning – learning strategies through rewards and penalties.
We got into the flavors of supervised learning: classification versus regression; binary classification versus multi-class classification versus multi-label classification. There is no other way to learn these distinctions than to engage in lots of examples, so here is one we did in class:
Now, the last one was meant to stir, and it did. I asked students whether it was ok to just talk about it as “something we can do, and here is where it would fall.” We discussed whether it makes sense, whether we wanted something like that. This conversation was a great spark into risky outcomes. I told students we would park this for the moment and return to it when it was time to assess models.
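For readers who want the flavors of supervised learning pinned down, the quickest tell is often the shape of the labels. These labels are hypothetical, made up purely to make the shapes concrete:

```python
# Binary classification: exactly one of two classes per example.
spam_labels = ["spam", "not_spam", "spam"]

# Multi-class classification: one of several classes per example.
animal_labels = ["cat", "dog", "bird"]

# Multi-label classification: each example can carry several labels at once.
photo_tags = [["beach", "sunset"], ["dog"], ["dog", "beach"]]

# Regression: the target is a number, not a category.
house_prices = [412_000.0, 259_500.0, 730_000.0]

print(len(set(spam_labels)), len(set(animal_labels)))  # 2 classes vs 3 classes
```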
Reinforcement Learning
We spent some time here. This is a difficult one to get beyond the superficial. We dug into the vocabulary. We spent some time understanding why this is not generative but predictive AI. It was important to do it through lots of examples.
A really fascinating moment came when we were digging into examples. We were talking about puppies earning treats and robots learning to walk, making a distinction between states and action space, rewards, model alignment, reward hacking, and more, when a student raised their hand: ‘What about military applications?’
I smiled. I had a slide ready for that. The room shifted. The conversation went from playful analogies to some really weighty realities: autonomous weapons, what it would take to build them, and the risks they pose. I watched the students lean in, faces serious, nodding. It was clear that this wasn’t abstract anymore. AI felt immediate, consequential.
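For readers who want to see the vocabulary of states, actions, and rewards in motion, here is a tiny tabular Q-learning sketch in the spirit of the puppy-and-treats example: an agent on a five-cell line learns to walk right toward a treat in the last cell. This is an illustrative toy I'm adding for the post, not anything we built in class.

```python
import random

N_STATES = 5
ACTIONS = [-1, +1]                 # action space: step left or step right
alpha, gamma, epsilon = 0.5, 0.9, 0.1

# Q[(state, action)]: the learned estimate of how good each action is.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:       # episode ends when the treat is reached
        # Explore occasionally; otherwise act greedily on current estimates.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)  # walls at both ends
        reward = 1.0 if s_next == N_STATES - 1 else 0.0  # treat only at the end
        # Q-learning update: nudge toward reward plus discounted future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# The greedy policy in every non-terminal state should now point right (+1).
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

Note that this is predictive, not generative: the model predicts which action earns the most future reward. It also hints at reward hacking, since the agent optimizes exactly the reward we wrote, nothing more.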
Is the Model Good?
And from there, it was the perfect segue into model evaluation. I asked the class how we can know if a model is good, and the answers varied. Risk was on the mind. But for the least risky applications, spam versus non-spam detection, how could we assess models? So, we walked through the basics: accuracy, precision, and recall.
Students saw how the stakes change depending on the context. In spam filtering, a false positive is annoying but survivable. In healthcare or defense, a false positive could be catastrophic. Yes, we got into the confusion matrix, but, more importantly, we opened the door to a deeper point: we, humans, determine what matters.
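The three metrics fall straight out of the four confusion-matrix counts. The numbers below describe a made-up spam filter that looked at 100 emails:

```python
# The four confusion-matrix counts for a hypothetical spam filter.
tp = 40  # spam correctly flagged as spam (true positives)
fp = 5   # real mail wrongly flagged as spam (the annoying false positives)
fn = 10  # spam that slipped through (false negatives)
tn = 45  # real mail correctly let through (true negatives)

accuracy = (tp + tn) / (tp + fp + fn + tn)  # how often we are right overall
precision = tp / (tp + fp)                  # of what we flagged, how much was spam
recall = tp / (tp + fn)                     # of actual spam, how much we caught

print(accuracy, precision, recall)  # 0.85, ~0.889, 0.8
```

The same counts, weighted by different stakes, favor different metrics: an oncologist cares far more about recall than a spam filter does.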
It was important to put things in context. Models do not see people and values. They see numbers and functions by which we attempt to align their behavior with our values. And ultimately, models are only as good as three things:
Our ability to understand and articulate our values.
Our ability to translate those values into mathematical formulas.
The intrinsic ability of mathematics itself to capture values.
We decided that we would get back to this.
Ready, Set, Match!
For the rest of the week, we did two things. In one, the students took their first in-class assessment. “Ready, Set, Match!” Students had 30 minutes to work individually through a series of short problem statements. Their task: break down the problem. What is the data and of what type? What is the task, predictive versus generative? What is the learning paradigm, supervised, unsupervised, or reinforcement learning? If supervised, what flavor of it? And should success be measured by accuracy, recall, or precision, or something else? The exercise was less about memorization and more about pattern recognition and critical thinking. I am anxious to see how they did. I told them I was anxious to see how I had done so far in my ability to prepare them for this assessment.
Hands-on Activity: Smiling or not Smiling?
The rest of the class was time to have some fun. But interestingly, one cannot just have fun. So, we decided to play with Google’s Teachable Machine. I told the students that our task was simple: build a binary classifier to classify images of people into smiling versus non-smiling. We would keep it very simple and have each image be only the face of one person, with nothing else going on in the image.
So we went to TeachableMachine. Now, the moment you go there, the arrow points to the webcam. Hmm, should we just snap photos of ourselves and train on ourselves? I asked the students whether this was okay. Yes, the first lesson! Remember that collection with consent? But it goes deeper: where does that data go? What does Google do with these images? Aren’t there security concerns? What about privacy concerns? This discussion was perfect, because in their first homework, students will pick an AI use case and reflect first on these very points regarding the data.
We said No to the webcam. Then we discussed what we could do. The students learned what synthetic data means. We got into another conversation. Are we comfortable with using image generation engines? We determined that we could do that in our controlled environment. This then became: how do you effectively prompt the agents in PatriotAI, our university’s AI gateway? What about directly accessing our licensed version of Microsoft’s CoPilot?
We saw that the image generation ones have lots of issues. They do not “listen.” They still have trouble with “do not”s. I showed them my agents to generate diverse positive examples (smiling) and diverse negative examples (not smiling). I showed how my CoPilot agent kept asking questions rather than getting stuff done. They could now see the gap between the hype and actually getting stuff done. Anyway, I had prepared some data ahead of time, and we managed to train the tiny classifier, and then the evaluation began, and it was so much fun.
The students realized that the model correlated teeth exposure with smiling. It did badly on images of angry people. But students had fun, because they saw in action the importance of careful data preparation, and they also engaged in the conversations that need to occur before setting something up, before even getting to the model.
What’s Next
Next week we turn the lens from how models learn to what happens when models meet the world. We’ll anchor the discussion in healthcare, education, transportation, climate, and criminal justice and ask: Who’s included (in the data)? Who’s missing? Why doesn’t this model generalize? What’s the impact if we’re wrong, and on whom? We’ll use famous failures and guided prompts to surface trade-offs, measurement choices, and value judgments, then connect those to Mason Core outcomes around ethics, security, privacy, and critical consumption of digital information.
This is the rhythm of UNIV 182: understand, apply, critique, build. Step by step, module by module, we’re laying a shared foundation for AI literacy that’s both technically rigorous and socially aware.