Journey through AI: Weekly Lessons from the Undergraduate Classroom
From Perceptrons to Patterns: When Students Start to Feel the Code
This fall I launched something new at George Mason University: UNIV 182 – AI4All: Understanding & Building Artificial Intelligence, the first campus-wide course in AI literacy, open to every undergraduate, regardless of major. It satisfies the Mason Core requirement in Information Technology & Computing, and, more importantly, it’s meant to lower the barrier to entry into AI for every student on campus. This is not an appreciation course. We understand, we apply, we critique, we build. This course has a rhythm. Join us!
I broke the course flow in my last post to relate the overdue “Two Things can be True” debates. But about two weeks ago, we started our journey into neural networks and deep learning. In a previous post, “Walk this Way: Cue the Neural Networks,” I shared how we untangled the mighty perceptron and then tangled it again in the multi-layer perceptron (MLP).
The second half of our deep-learning module, which I am summarizing in this post, began with a shift in pace and perspective. Up to that point, students had learned about perceptrons and MLPs somewhat in the abstract: neurons, activations, backpropagation (even though we used Desmos a lot). It was time to come back to the historical arc: from the 1950s perceptron to the architectures that could finally handle real-world data, such as images, sequences, and language.
On screen was a timeline: Perceptron → MLP → CNN → RNN → Transformers. This timeline captures the rest of our technical journey in this course. I wanted to give the students concrete architectures that they could point to and connect with seminal steps in AI research and innovation. We just wrapped up transformers, and I will summarize that interesting experience in the next post, but today I will relate how we journeyed through CNNs and RNNs first.
Convolutional Neural Networks: Making Sense of Images
I believe it is important to go through the CNNs and give students a deep understanding of this beautiful architecture. This slide provided the framing and inspiration, the hook.
I told them about AlexNet (2012), the model that changed everything in image recognition. The slide showed Fei-Fei Li on one side and the classic CNN diagram on the other. An interesting side note: a few students recognized Ilya Sutskever and perked up, but I made sure to spend time on Fei-Fei Li, whose ImageNet dataset made it all come together. They had heard from Fei-Fei Li before; I had shared with them YouTube videos where she masterfully breaks down what AI is and isn’t.
But in the rest of this lecture, there were core ideas to digest: convolution, activation, pooling, flattening, dense layer, output. Each deserved its own moment.
The lecture was deliberately slow. We took apart the CNN layer by layer. There are critical concepts to convey to students, each an important foundation stone in how we understand modern AI.
I’ve come to think of this as a kind of classroom dance: a question to challenge and get the gears moving, an inspirational analogy to open the door, a technically faithful one to keep us honest, then a closer look under the hood. Each step followed by an example, a pause, and a breath. Every concept, a milestone in a journey that should end with students feeling fulfilled in their search for deep understanding, and yet, seeking more.
I enjoy asking difficult questions, with a smile: Here is an image, what would you do with it? How would you feed it to a machine? That opens the door to chunking, the need to ingest important elements, which opens the door to convolution, and to convolved features. Every transition becomes a question: Now what? What can we do with this new matrix? How do we get to the end?
The students sense that we need some way to aggregate: but what do we retain, and how do we compress? That opens the door to pooling. Then more challenges: That initial convolution filter is arbitrary, so what if we don’t know what to use? They get it now: we repeat, and the network learns the filters itself. Aha, so yes: repeat this, have multiple channels.
And then another connection, to something we had set aside for a while in favor of a linear conceptual journey: we have talked about activation functions. Now, where do they fit here, and why?
The dance of how we would do it, what makes sense, and what comes next introduces every concept not as arbitrary, but as an answer to a process already unfolding in their own reasoning. They see that these architectural decisions are not the sole domain of a select few, but principled answers to meaningful questions, reasonable answers that they can hopefully see themselves being perfectly capable of giving.
Now, if you get the sense that this lecture is a demanding one, you would be correct. Board, slides, board, slides. Moving around made my Fitbit super happy. Me, too. On a side note: I am clocking 500+ calories in a 75-minute class. But I hope you get a distinct sense that this course is not an appreciation of AI. That would not be my style.
The students know this. But I still reminded them that we would go technical, more technical, and really technical. I wanted them not to feel uncertain or nervous in any way, but to prepare for the journey ahead. If they wanted to stretch, move around, call a time out at any time, or anything they needed to prepare themselves mentally, they could do so, but we would go technical. I wanted them to see that CNNs aren’t magic but beautifully interwoven geometry and linear algebra, a way of exploiting spatial structure.
I think we did it. Students could trace data flow across the boards and the slides: an input image transforming into numbers, then feature maps, then classification. The “sliding-window” animation that reinforced my schematics on the white board showed how a filter moves across pixels, multiplying local patches into a new representation. The next slide showed max pooling, pulling out the strongest activations, reducing redundancy. Once they saw the 2×2 pooling window shrink a 4×4 grid into a smaller one, the abstraction melted away. They could see how the model becomes selective, how each layer captures more meaningful patterns.
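For readers who want to trace the arithmetic themselves, here is a minimal NumPy sketch of those two operations. It is my own illustration rather than anything we showed in class, with toy sizes chosen so that, just as on the slide, a 2×2 pooling window shrinks a 4×4 feature map.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over the image (stride 1, no padding) and
    return the convolved feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # multiply the local patch, then sum
    return out

def max_pool2d(feature_map, size=2):
    """Keep only the strongest activation in each size-by-size window."""
    h, w = feature_map.shape
    return feature_map.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A toy 6x6 "image" and a 3x3 vertical-edge-style filter
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

feature = convolve2d(image, kernel)   # 4x4 convolved feature map
pooled = max_pool2d(feature, size=2)  # 2x2 after max pooling
print(feature.shape, pooled.shape)    # (4, 4) (2, 2)
```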
I am very proud of what we were able to accomplish. I was anxious to see how this flow, this dance, would allow students, many of whom have no computational background, to engage with foreign concepts. But they advanced naturally, predicting the next step as prompted by challenging, open-ended questions.
An important moment tied everything together. We opened the 3D CNN visualizer, Adam Harley’s interactive MNIST demo, where students could draw a digit and watch activations light up through each layer. It’s one thing to talk about filters and feature maps; it’s entirely different to see them respond to your own handwriting in real time. The connection was complete. The students grasped why CNNs revolutionized computer vision: local patterns first, abstractions later, depth as the path to understanding.
Recurrent Neural Networks: Learning from Time
Once we had conquered CNNs, I paused for the next journey. We had spent an entire lecture learning how machines “see.” Now it was time to ask: What about remembering?
The transition slide carried only one compact expression:
F(xᵢ | xᵢ₋₁, …, x₁)
That single formula became the bridge from space to time. I told the students that everything we had learned about CNNs applied to data organized in two dimensions, pixels arranged in a grid. But not all data live in space. Some live in sequence: language, music, stock prices, speech. For those, order matters.
The first slide showed a simple series of stock prices. “How might a model predict what comes next?” I asked. I made a joke about Nvidia stock, how predicting always-up was probably right most of the time. We started there.
I believe RNNs are difficult to introduce properly to undergraduate students, at least if you actually want to go really deep. But we struck the right balance. We started with the need to remember previous inputs, not through storage or copying, but through a hidden state that carries information forward. The concept of a hidden state is a difficult one. So, you can use an analogy, but I would not be happy if we just stayed there.
So, I showed the compact RNN block, followed by its “unrolled” version across time steps.
I told them that unrolling was not a duplication that actually happened internally, but instead a visual metaphor to help us understand better. The computer doesn’t make many copies; it simply reuses the same parameters again and again. That’s the key: shared weights across time.
On the next slide, we wrote the formula together:
hₜ = f(Wxₜ + Uhₜ₋₁ + b)
Before I explained it, I asked them what this reminded them of. I prodded them: we had seen this before. And yes, they identified it. They had seen it before, in the humble but mighty perceptron. This was key. It was important both to encourage them to dig deeper and stay with me, and to show that we build on familiar structures in AI; if they understand the foundations, they can surely build more complex constructs on top of them.
So, yes, we stayed with this formula for a while. We explained it slowly, term by term. W multiplies the new input. U multiplies the memory from before. b shifts the baseline. f is the activation function (often tanh, sometimes ReLU, both activation functions they had seen and characterized before) that decides how much of the past to keep. We challenged it. We turned it over this way and that way.
They saw that the output of one step becomes part of the input to the next. That small detail, feeding back the hidden state, the heart of recurrence, had now been related, and we could build from there.
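For the curious, here is a minimal sketch of that recurrence in NumPy, with arbitrary toy dimensions (again my own illustration, not classroom material): the same W, U, and b are reused at every time step, and each hidden state is fed back into the next step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 5-dimensional inputs, 8-dimensional hidden state
input_dim, hidden_dim = 5, 8
W = rng.normal(size=(hidden_dim, input_dim))   # multiplies the new input
U = rng.normal(size=(hidden_dim, hidden_dim))  # multiplies the memory from before
b = np.zeros(hidden_dim)                       # shifts the baseline

def rnn_step(x_t, h_prev):
    """One step of h_t = f(W x_t + U h_{t-1} + b), with f = tanh."""
    return np.tanh(W @ x_t + U @ h_prev + b)

# A toy sequence of 4 time steps; the same weights are reused at each one
sequence = rng.normal(size=(4, input_dim))
h = np.zeros(hidden_dim)      # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)      # feed the hidden state back in

# After the loop, h holds a distilled memory of the whole sequence,
# ready to be handed to a classifier (e.g. sentiment) or used to
# predict the next element.
print(h.shape)  # (8,)
```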
The rest of the task was to explore what this recurrence enables. When you feed the network a sentence, each word updates the hidden state. By the end of the sequence, that state holds a distilled memory of everything that came before. If you ask it to predict sentiment, it uses that accumulated context to decide “positive” or “negative.” When you ask it to generate text, it uses the same loop, but instead of predicting a final label, it predicts the next word. That output becomes the next input.
Yes, RNNs are quite an undertaking in an undergraduate course, but they are an important vehicle to bring to students, for several reasons. First, they let students see prediction and generation as two sides of the same mechanism, and the versatility of RNNs in supporting both. And, perhaps just as importantly, I truly believe that students will better understand the need for, and the journey to, the transformer if we go through the RNN, and its shortcomings, first.
We talked about the “vanishing gradient,” where memory fades as sequences grow longer. We talked about how LSTMs and GRUs (architectures we brushed over) were designed to address that: how to remember longer without drowning in computation. I reminded them that every new architecture in AI, whether CNN, RNN, Transformer, was born from constraints. When a concept or a model only takes us so far, we innovate.
The final slide brought it all together:
CNNs learn from space. RNNs learn from time. Transformers learn from context.
That sentence stayed up as we closed the lecture.
From Abstraction to Action: Homework 2 – Feel the Code
By the time we reached the end of the RNN lecture, students had gone from the geometry of images to that linear, recurrent rhythm of sequences. It was the perfect moment to bring code into the picture. Yes, code.
I want to set this up for you. Most students in this class are not computer science students. The majority of them have never seen code. And yet, I wanted them to be able to see and appreciate code, and even tweak it. That is why I titled the homework “Feel the Code.” I completely pumped this up. I told them they would love, love this. They could tell their friends, and the companies where I am encouraging them to apply for summer internships, that they, “like,” coded up their own CNN (with a bit of an attitude).
Google Colab is perfect for this. The task: build a simple CNN for digit recognition. I provided the template for them, a fully functional CNN for the MNIST (digit) dataset, mostly prewritten and ready to run.
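The actual Colab notebook is not reproduced here, but a rough sketch of what such a template might look like, assuming Keras and its built-in MNIST loader, gives a flavor of what the students worked with; the layers mirror the lecture vocabulary: convolution, pooling, flattening, dense, output.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST digits and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# A small CNN: convolution -> pooling -> convolution -> pooling -> flatten -> dense -> output
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one output per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train briefly, holding out part of the training data for validation
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
model.evaluate(x_test, y_test)
```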
Their homework was broken down into key tasks:
First, run the model, and explain what you see. We had talked about training, validation, and test data. Now they could see performance reported for each. They could even see saliency maps. The first task was just to situate themselves with what was happening: what had the training done?
Their second task, to explain, in plain English, what each line in the code does: where convolution happens, where pooling occurs, what each parameter controls. I asked them for specificity, to connect with concepts we had learned in class, with specific slides even. This was an important opportunity to reinforce concepts, to amplify understanding.
Their final task was to make small, guided modifications: changing filter counts, removing pooling, swapping activation functions, adjusting training epochs. I encouraged them that this would be fun. This was their time to play and to see the impact of different decisions. This was their discovery.
For students new to computation, this simple homework is transformative. They recognize concepts from the lectures now visible in code. They see how the model trains, how accuracy changes, how an abstract lecture turns into a live system learning before their eyes.
This homework assignment is powerful. It is a quiet confidence builder. They can do things. They can build things. They can go beyond understanding.
What did we do next? We just wrapped up the transformer architecture. Yes, it took three lectures, because we stayed a while, moving through that intentional dance: a question to challenge and get the gears moving, an inspirational analogy to open the door, a technically faithful one to keep us honest, then a closer look under the hood. Each step followed by an example, a pause, and a breath. Every concept, a milestone in a journey that should end with students feeling fulfilled in their search for deep understanding, and yet, seeking more.
A brief reflection. How do I know these lectures worked? A teacher knows. You read the room, the head nods, the eyes that follow the argument across the board, the stillness when an idea lands. But sometimes, a moment cuts through all of that.
After this particular class, a student came up, thanked me for the effort, and said something that stopped me short: “You’re a poet of AI. That’s what you are!”
In the moment, I didn’t know what to do with that. It made me pause, a little uncomfortable, but also unexpectedly moved.
My father taught literature. He taught me that words could hold entire worlds, that clarity and beauty are not separate pursuits. Walking back to my office, I thought of my father, long gone. I realized that, in some quiet way, I’ve been carrying his lessons forward into code, into models, into classrooms that he never imagined for me in the US but that he would have understood instantly.
Maybe this is where my father and I still meet, between the lines of a poem and the lines of code.
Missed our other posts tracking the course? You can find them here:
Journey through AI: Weekly Lessons from the Undergraduate Classroom
Journey through AI: Weekly Lessons from the Undergraduate Classroom
Journey through AI: Weekly Lessons from the Undergraduate Classroom
Journey through AI: Weekly Lessons from the Undergraduate Classroom – Walk this Way: Cue the Neural Networks
Journey through AI: Weekly Lessons from the Undergraduate Classroom





