Journey through AI: Weekly Lessons from the Undergraduate Classroom
Walk this Way: Cue the Neural Networks
This fall I launched something new at George Mason University: UNIV 182 – AI4All: Understanding & Building Artificial Intelligence, the first campus-wide course in AI literacy, open to every undergraduate, regardless of major. It satisfies the Mason Core requirement in Information Technology & Computing, and, more importantly, it’s meant to lower the barrier to entry into AI for every student on campus. This is not an appreciation course. We understand, we apply, we critique, we build. This course has a rhythm. Join us!
I have yet to reflect on the class debates that followed the “Through Student Eyes: The Promises and Perils of AI” homework. In the meantime, I will share in this post how students started a new chapter in this class. We are now venturing deeper into the science and engineering of AI. Enter: Neural Networks.
How do you introduce neural networks in a meaningful way to a class of undergraduates that includes freshmen? You start with the biology, the original inspiration, and the history. You sit there for a while, so that the model of the perceptron can then make sense to them, as an imperfect model, as our aspiration to model neurons. The more I thought about this way of introducing students, the more I was convinced it was the healthy way: to start with the ideal and then relate the imperfect proxy. That way, assumptions going in were calibrated. That way, they could have answers to “why did McCulloch and Pitts think of it this way?”
I then took a short break and told them about Frank, who was decades too early with his amazing invention. Students chuckled when I asked them to compare their phone to a 5-ton computer. But they also realized the ingenuity. This simple model, which we had yet to unravel and lay out on the board, had already been shown useful for an admittedly trivial binary classification task. The seed had been planted.
Then, it was time to get serious. So, I went to the board. I asked students, how might we write down a model for the neuron? What do we need? We need inputs, then a thing for what happens or ought to happen inside, and finally we need outputs. Let’s draw the inputs as fingers going in, let’s draw the thing inside as a disk, and then let’s say only one output, something that goes out, for now. Let’s keep it simple.
Then I asked, what now? What goes in? Some numbers. Some attributes. Where do we get them? These are input data. What goes out? A number? What math do we do inside to cook these input numbers together into an output? Here we were stuck. So we went to this slide.
We went over the mathematical model of the mighty perceptron. Why mighty? Because Rosenblatt was able to do binary classification with this very simple, very naive model of a neuron. We asked, why a sum? The answer: it seems like the easiest thing to do to aggregate inputs. A student asked: How do you control this thing? The answer: through the weights. What we referred to in earlier classes as “training the model over the training data” will now become figuring out what the actual values of these weights should be. Another asked: how do you actually make those binary decisions? Ah, a simple threshold, a Ferris Wheel of sorts: you need to be this tall to ride. We understood the threshold gating. We then said we would call this the step function, because we would soon realize that there are different ways to be a gatekeeper, different activation functions.
But let’s finish with what goes in the perceptron first. We need to talk about the bias. Why do we need that? “The bias is the resting potential,” I said, “a biological inheritance, a whisper of readiness.”
And that’s when a student shot up and said, “Wait, that’s the ghost in the machine! The book! I read that book. The bias is the ghost in the machine!”
For a moment I froze. The enthusiasm caught me off guard. Then I smiled.
“Yes,” I said, “in a way. But the book’s ghost is larger; it’s about whether consciousness itself can live in mechanism.”
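If you want to see the model we had on the board written down, here is a minimal sketch of a perceptron: inputs aggregated by a weighted sum, a bias added in, and a step-function gatekeeper at the end. The weights, bias, and inputs below are made up for illustration, not the numbers we used in class.

```python
# A minimal perceptron sketch: a weighted sum of the inputs, plus a bias,
# gated by a step function. All numbers here are made up for illustration.

def step(z):
    """The gatekeeper: 1 if z clears the threshold at zero, else 0."""
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    """Aggregate the inputs with a weighted sum, add the bias, then gate."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Hypothetical example: two input attributes, two weights, one bias.
print(perceptron([1.0, 0.5], [0.6, -0.4], bias=-0.1))  # 0.6 - 0.2 - 0.1 = 0.3 >= 0, so 1
```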
It was hard to transition from a high-excitement moment to talking shop, but we had to do it. I wanted students to see these things in action. So, I told them to open up their laptops and set up a simple perceptron. A colleague suggested Desmos a while ago. It worked beautifully. Students played with the weights, the bias. They saw the linear decision boundary.
Here was a funny moment. I asked students whether they remembered the equation of the line from high school. One of them, a student who loves math, said: which one do you want? There are, like, three different ways. We pointed to the way the perceptron was set up in Desmos, and they saw the equation of the line. The students could finally see that the perceptron can only trace lines. I told them this is what researchers mean when they colloquially say things like: “the perceptron is only a linear model.”
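If it helps to see why, the decision happens exactly where the weighted sum plus bias crosses the threshold. With two inputs and hypothetical weights and bias, rearranging that condition gives back the slope-intercept form the students remembered:

```latex
% Decision boundary of a two-input perceptron (hypothetical weights w1, w2 and bias b):
w_1 x_1 + w_2 x_2 + b = 0
\quad\Longrightarrow\quad
x_2 = -\frac{w_1}{w_2}\, x_1 - \frac{b}{w_2} \qquad (w_2 \neq 0)
```

Same line, new clothes: a slope and an intercept, both dictated by the weights and the bias.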
We then needed to expand our understanding and go through different activation functions that broaden our options beyond the step function. When you want to teach the whys and hows here, I suggest you have analogies ready. The analogies seemed to get head nods. I was not losing them. They were following.
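In case a concrete reference helps, here is a quick sketch of a few of the usual gatekeepers. The step function is all-or-nothing; the smoother ones answer in shades of gray. Which ones you cover is a matter of taste; this list is just the common suspects.

```python
import math

# A few common activation functions (gatekeepers). The step function is
# all-or-nothing; the smoother ones let a neuron answer in shades of gray.

def step(z):
    return 1.0 if z >= 0 else 0.0       # hard threshold: tall enough to ride, or not

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # smooth squash into (0, 1)

def tanh(z):
    return math.tanh(z)                  # smooth squash into (-1, 1)

def relu(z):
    return max(0.0, z)                   # let positives through, zero out the rest
```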
We went back to “still a linear model” even with fancy activation functions. And then we traced a perceptron with a tanh activation function by hand. Yes, by hand; sort of, with the help of a tanh calculator.
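As a flavor of what that hand-tracing looks like, here is the same forward pass in a few lines, with made-up numbers rather than the ones from class:

```python
import math

# A hypothetical forward pass through one perceptron with a tanh gatekeeper.
x = [1.0, 0.5]    # inputs (made up)
w = [0.5, -1.0]   # weights (made up)
b = 0.2           # bias (made up)

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # 0.5 - 0.5 + 0.2 = 0.2
print(math.tanh(z))                            # ~0.197
```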
We finally saw how input data washed over the network in forward passes. And then, when the output was wrong, we asked: what do we do now? Time to introduce backpropagation. I told students upfront that I would say at that moment the scariest word they would ever hear out of me in that lecture. All eyes on me. OK, brace for it: “gradient.” They looked at me again. Then I said, “derivative.” They looked again. Then I said, “and this is why we always told you Calc I would be useful one day.” They laughed. A good moment. We were able to understand the backpropagation algorithm. We used words such as weight nudging and tweaking. We employed the analogy of the accelerator, of hitting the gas pedal, or the brakes. Did we overshoot? Slow down. Did we undershoot? Raise those weights. How exactly, which one, how much? So, we talked about error as loss. And then we made the distinction between classification and regression, so we could appreciate the loss function(s). And no, we did not calculate gradients, but we did understand their spirit and usage in the backpropagation algorithm.
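To keep the spirit honest without the calculus, here is roughly what “nudging the weights” amounts to for a single neuron with a squared loss. It is a simplified sketch of gradient descent on one neuron, not the full backpropagation through a network, and the data and learning rate are made up.

```python
# A simplified "nudge the weights" sketch: one linear neuron, a squared
# error loss, and a gradient-descent update. Real backpropagation chains
# this same idea backwards through every layer of a network.

def train_step(w, b, x, target, lr=0.1):
    y = sum(wi * xi for wi, xi in zip(w, x)) + b        # forward pass
    error = y - target                                  # how far off were we?
    # Gradient of the loss 0.5 * (y - target)**2 with respect to each weight:
    w = [wi - lr * error * xi for wi, xi in zip(w, x)]  # nudge each weight
    b = b - lr * error                                  # nudge the bias too
    return w, b

# Hypothetical usage: repeat the nudge until the output stops being wrong.
w, b = [0.0, 0.0], 0.0
for _ in range(50):
    w, b = train_step(w, b, x=[1.0, 2.0], target=1.0)
print(w, b)  # the weights have crept toward values that produce an output near 1.0
```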
Then it was time to say goodbye to the mighty perceptron and start stacking them, so that out of linear boundaries we could make complex, curvy ones. We stacked layers of perceptrons by putting them in a mesh, in a network. Yes, a neural network. Repeat with me: a network of neurons. Now we got it. And now, cue the multi-layer perceptron, the simplest neural network, one that will allow us to get into more interesting architectures soon.
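For the curious, here is a minimal sketch of what stacking looks like: two tanh-gated neurons in a hidden layer feeding one output neuron. The weights and biases are made up; the point is only that, once non-linear gatekeepers are stacked, the decision boundary can finally bend.

```python
import math

# A tiny multi-layer perceptron sketch: a hidden layer of two neurons
# feeding a single output neuron. All weights and biases are made up.

def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

def tiny_mlp(x):
    h1 = neuron(x, [1.0, -1.0], 0.5)            # hidden neuron 1
    h2 = neuron(x, [-0.5, 2.0], -0.3)           # hidden neuron 2
    return neuron([h1, h2], [1.5, -2.0], 0.1)   # output neuron

print(tiny_mlp([0.2, 0.8]))
```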
We set one up in Desmos, too, so we could appreciate it. But then we wanted to do more interesting things, so we went through a brief history of neural network architectures. We talked about how images are perfect for convolutional neural networks, and how convolutional neural networks are just made for images. We pulled back the curtain a bit on recurrent neural networks, as precursors to language models. In our next class, we will go over the convolutional neural network in great detail. We will also dig deeper into the recurrent architecture, a must, because what follows them both is the architecture that finally propelled natural language understanding forward: the transformer architecture.
The students are excited. They feel that they are now getting into the more complex things. Now they truly understand the difference between architecture and model. They even understand that the architecture restricts the family of models we can explore. They saw that line moving in the Desmos example but never bending. They understand that chasing gradients gets you to one model in that space, and they finally connected, concretely, one aspect of why we kept saying that training over the training data does not guarantee similar performance over test data.
There is a lot that we cover in this course, beyond the slides, because the class is a conversation. I love building things with whys and hows on the board. We are part explorers, part detectives. When stuck, we go to the slides. When not, we keep exploring forward, and very often, we get ahead of questions and answers that future lectures will set up in greater detail.
We are still having fun! In this course, as in learning, rhythm matters. We keep the beat, question → build → reflect, and walk this way, together.
Missed our other posts tracking the course? You can find them here: