On Evidence in Learning Programming

As we build a new digital and computational program (Digital and Computational Studies, or DCS) at Bates, I wrestle with a simple question: what role should evidence play in the design and development of DCS? Today, my reflection will ultimately focus on the language and tools we use to introduce students to computing. However, I will begin with a short digression about teaching in the classroom, because most people are familiar with the experience of “taking a class,” whether at the primary, secondary, or post-secondary level.

Evidence-Based Practices In the Classroom

Next fall I will be teaching a CURE: a Course-based Undergraduate Research Experience. This is a course structured around students engaging in research inquiry: we will attempt to answer a question whose answer is unknown (and, ideally, of interest to people other than just us). There is evidence in the literature regarding the value of research experiences for undergraduates, which is why these kinds of experiences are being woven into students’ curricula. There are also open questions regarding the efficacy of CUREs as an instructional vehicle. (That makes sense; there is a great deal we do not know about teaching and learning, so to note that there are unknowns anywhere in this space is not meant as a stone thrown.)

There are, however, many things I need to keep track of in the classroom if I value evidence in my practice as an educator. For example, my classroom practice—how I interact with my students—is a critical space for me to focus on. Just a few examples:

  • When I ask the class a question, I should count silently (one Mississippi, two Mississippi... to roughly ten seconds) and give students time to think.
  • Or, perhaps I have my students think, then pair up and discuss, and then share out. This lets them explore ideas in a small group before hazarding the (to some, intimidating) sharing of ideas in a large group.
  • I should randomize my selection of students using an external aid—perhaps a deck of cards with their names on the cards—so that I don't make a habit of calling on only women in the class, or men, and so on.

(As an aside, regarding the last bullet… I had a colleague who learned she only called on students on the left-hand side of the class… and she only learned that after years of teaching because she allowed her classroom to be videoed. It was an ingrained habit that was invisible to her, and clearly left the right-hand side of the room out of every conversation she facilitated in her class.)
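
That deck-of-cards aid is also trivial to mechanize. Here is a minimal sketch in Racket (the roster names are hypothetical): shuffle the class list into a “deck,” then draw names without replacement, so that everyone is called on once before anyone is called on twice.

    #lang racket
    ;; A minimal sketch of the "deck of cards" idea; the roster is hypothetical.
    (define roster '("Aisha" "Ben" "Carmen" "Devon" "Eun" "Farah"))

    ;; The current deck, stored in a box so next-student can update it.
    (define deck (box (shuffle roster)))

    ;; next-student : -> string
    ;; Draw the top name; reshuffle the full roster when the deck runs out.
    (define (next-student)
      (when (empty? (unbox deck))
        (set-box! deck (shuffle roster)))
      (define name (first (unbox deck)))
      (set-box! deck (rest (unbox deck)))
      name)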

This list of evidence-based practices is actually small; there are 20-30 I could keep track of, and at this point in my career, I use many of them on a regular basis. There are still practices I don’t use reflexively, and that’s something to keep working on. (And I don’t track their use in a way I could present as evidence if I were going to write up a report on the work that I do in the classroom.) In other words, I am aware of places where my engagement with students can still be improved by evidence-based practices, and of how much work it would take to communicate that evidence to others.

Evidence-Based Practices in Teaching Programming

All of this, however, leads up to a space that has been invariably difficult in every program and department I have ever taught in: the choice of the programming language we use to teach novice programmers. In truth, it is more complex than “just the language.” We need to consider:

  • the tools we use to program in that language
  • the computing environment that exists around those tools (be it UNIX, Windows, Mac, or the WWW)
  • the text(s) that support the learning of those tools
  • the resources available in the community-at-large (video, weblogs, etc.) from learners and practitioners to support ongoing exploration and multiple perspectives
  • support for transitions to/from the language and tools
  • support from colleagues across campus for the foundational choices being made outside of their department(s)

The list can get very long. My point here is that these are complex tools, involving complex ideas at every level, and that the complexity is a cross product of tools, languages, environment, support resources, and the socio-cultural context of the institution; it arises (in no small part) from the system of considerations that must be made, not from any one dimension. It is often the case that our literature regarding novice programmers fails to peel apart this complexity, or worse, fails to engage in good scholarship and instead appeals to authority as a rationale for our actions when it comes to teaching novices.

On Authority and Evidence

When it comes to discussing these languages and tools, it is commonplace for computer scientists (and any practitioners who work in computational spaces) to appeal to authority when making decisions about how (and what, and why) to teach programming. That authority might be in the literature, but more often than not it is personal authority (years of experience), or a limited set of experiences with a particular environment or book (but no evidentiary inquiry), or—perhaps most dangerous—an appeal to the current marketplace: what is “popular” right now with employers in the post-graduate marketplace, as opposed to what tools are best for introducing students to the learning of computing and programming.

My colleague Mark Guzdial, recently moved to the University of Michigan from Georgia Tech, wrote a piece for the Communications of the ACM that began to explore the idea of authority and evidence in the teaching of programming. His article explored two themes.

  1. One theme of Mark's article was to rebut the myopic and sexist perspectives in an article that was making the rounds at the time. It is important that Mark engaged in this rebuttal, but I don't want to give more oxygen to the small-minded and ridiculous belief that women—simply because they are women—cannot excel in computing. There has never been any evidence of this, nor will there be. This is an important theme unto itself; I agree 100% with Mark that there is nothing in the learning sciences literature that even remotely suggests any biological/physiological difference between human beings when it comes to learning programming. But it is not the thread of my argument here.
  2. The second theme of Mark's article was the preoccupation of computing, as a discipline, with appeals to authority. I want to explore this further.

This came up just yesterday (November 14th, 2018) on a disciplinary mailing list; in particular, the question was asked:

I’m teaching an intro to programming class this coming spring for students with zero background in coding. I plan to use Python to ease them into the basic programming concepts (not sure about the IDE yet), and then transition to Visual Basic to give them access to a nice GUI builder and also the ability to use some of these skill for possible scripting in MS Office or other automation tasks. The second language also serves to demonstrate how much of the knowledge learned in one language can transfer to another.
...
Finally, if anyone would be willing to share their syllabus, or project ideas that were highly engaging and fun for students in a similar course I would be very appreciative. Right now I'm thinking data manipulation/analysis type tasks mostly for Python, while VB and the GUI might be nice for some small utility or db type programs perhaps - open to suggestions.

There’s so much to unpack in this question. I won’t do it justice, but I’ll try to summarize the key issues.

  1. Language Choice. What rationale does the asker have for using Python? What evidence is there to support its use in the classroom? They do go on to mention that the rest of the curriculum is taught in Java...
  2. Tools. The asker has no idea what tools they will use for teaching Python... yet, tools matter a great deal when learning to program. We'll come back to this.
  3. Multiple Languages. What rationale does the asker have for using two (very) different programming languages in a 15-week span of time?
  4. Motivation. The asker suggests that they want "fun" projects. What does the asker mean by "fun"? How does this relate to their goals and outcomes for the course, and (more broadly) for their department and institution?

There is more to unpack in those two paragraphs, but this is a starting point that gets to the core issues and challenges I see in using evidence-based practices in the first teaching of programming at the college level. Mark responded to the thread (referencing his previous CACM article) and reminded us of some important points (which I paraphrase/expand on here):

  • The language matters. It shapes how students think about what they are doing; some languages are easier to learn than others (because they were designed, intentionally, for learners); and we can study this (and have).
  • The UNIX command-line is not simple. It was developed by experts for experts. There are many HCI design principles that are not at work in the UNIX command line. It is effectively a language unto itself, and therefore should be treated as a complex learning space just like the act of programming itself.
  • Professional programming environments are too complex. Environments like RStudio—a popular choice (or nearly the only choice) for writing R scripts for data analysis—were designed by and for experts. (Actually, it is unclear whether the people who developed R were expert software developers with any knowledge of usability; they may have been biologists who learned to write code.)
  • There are programming environments designed for novices. Environments like BlueJ, DrRacket, MakeCode, Scratch, and App Inventor (to name a few) are designed, top-to-bottom, with the beginner in mind. We have good research about some (if not most) of these environments, and we have empirical evidence that they make a difference in the learning our students engage in, in their ability to retain that learning, and in their desire to keep taking courses with us and continue learning more.

We can dive deep into any of these dimensions, but I want to stay with the original question posed on the SIGCSE mailing list: what language do I choose? In particular, I’m going to reflect briefly on the kinds of pressures we often feel as educators in an institutional context when making these kinds of decisions.

Language Choice: Pressures

The rationales for language choice are often motivated by pressures from colleagues, students, and the marketplace. I want to consider each of these briefly.

The marketplace is fickle: every few years, something new is “hot,” and “the thing to learn.” Currently, the flavor-of-the-week might be Google’s Go, intended as a concurrent answer to systems programming languages like C. Or perhaps it isn’t a language but “machine learning,” suggesting that it is important to know how to use TensorFlow (a library for doing machine learning work), or some other tool that was just released last week that I haven’t heard about yet. Either way, the marketplace has nothing to do with the teaching of people who have never written code before; it is the space of experts who spend 40+ hours/week on their task, and who have the time to master complex, and sometimes rapidly changing, tools.

While I have a great deal of respect for my students, the few who have strong opinions about what language we should use have generally had minimal experience with the tools they profess would be best. Or they have read a blog about the most recent Thing to appear in the marketplace, and therefore believe it is critical for us to learn. Students do not walk into Calculus and insist we use some new notation; they expect Leibniz notation (if they have any expectations at all), and that’s that. But they walk into courses involving programming full of ideas. That’s wonderful, but it isn’t evidence.

Colleagues know the tools they know. They’re generally overworked, and rarely have interest in learning new tools. From their perspective—especially if your course is a “feeder” into their courses—it would be best if your course taught the tools they use. It does not matter if your institution has faculty using multiple tools… any one colleague will want your students to learn the tool they use. That choice is rarely evidence-based; it is usually what their research group used, what they learned as an undergraduate, or what the marketplace within their discipline is currently centered on.

At Bates, we use Stata (and some R) in Economics, R in Politics, SPSS in Psychology, Python and MATLAB (and probably some C/C++) in Mathematics and Physics, and Isadora and Max/MSP (amongst other programmatic tools for multimedia work) in Art/Music/Dance. No one is casually prepared to retool their teaching or research, but most faculty would probably prefer that, if there is going to be an introduction to computation and programming, it prepare students for their particular flavor of both. The fact that these are radically different contexts, with radically different tools, is generally secondary in the thinking of any one faculty member or department.

If it were as simple as making an evidence-based choice, I would likely ground students’ first course in a block-based environment, and follow it with two courses that introduce the structured approach to programming epitomized in How to Design Programs, which anchors both the (evidence-based) Bootstrap curriculum for middle-school learners and a design-centric approach to software construction at the college level. However, these choices (when made in a department or on a campus) tend to be political and negotiated, and it isn’t clear that research and evidence alone are enough to convince colleagues that the tools and environments they know might not be the right ones for students taking their first steps on a journey the faculty themselves took so long ago, they’ve forgotten what it was like.
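
To make that “structured approach” concrete: the heart of How to Design Programs is the design recipe—write the signature, a purpose statement, and worked examples before the body of the function. A minimal sketch in HtDP’s Beginning Student Language (the function itself is a toy of my own invention, not from the Bootstrap materials):

    ;; ticket-price : Number -> Number
    ;; Purpose: compute admission; children under 12 pay half the $10 price.
    ;; Examples, written before the definition, become executable tests:
    (check-expect (ticket-price 8) 5)
    (check-expect (ticket-price 30) 10)

    ;; The definition comes last, once the examples make the behavior clear.
    (define (ticket-price age)
      (if (< age 12) 5 10))

Every step of the recipe is explicit and checkable by the environment, which is part of what makes this ecosystem amenable to the kind of research described in the next section.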

Language Choice: Evidence

Weintrop and Wilensky recently published a marvelous study of 4000+ students learning to program for the first time in block-based languages. Their question was the following:

How does block-based programming compare to text-based programming in high school introductory computer science classes with respect to learning outcomes, attitudes, and interest in the field of computer science?

The paper is worth a read. The essence of their results is that students in block-based environments gained more confidence from pre- to post-test, demonstrated greater learning gains on the content, enjoyed themselves more, and were substantially more interested in taking further computing courses.

There are few other programming languages and environments with a body of research around them that is coherent and evidentiary. BlueJ has scholarship around its objects-first approach, including the very coherent STREAM process that Michael Caspersen and Michael Kölling have published (which effectively represents a culmination—though not a stopping point—of this line of work). A great deal of research undergirds the development of the Racket programming language, its associated (free) text How to Design Programs, and the tower of languages provided to support learners (from the Beginner language, to the Intermediate language, and so on)—each of which was designed, based on evidence from use, to support learners from the syntax and structure through to the kinds of errors they can encounter. Kathi Fisler’s studies of the Rainfall problem (The Recurring Rainfall Problem; Sometimes, Rainfall Accumulates) capture the current state of inquiry around this ecosystem of language and environment, which has seen continuous use, development, and study for over 20 years. (Arguably, because Racket is a close design descendant of Scheme, we have been studying these tools and their use with students since the mid-1970s.)
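
For readers who have not encountered it: the Rainfall problem asks students to average the valid readings in a sequence, ignoring negative values and stopping at a sentinel. A sketch of one common formulation in Racket—the sentinel value and the list-processing style here are my own illustrative choices, not Fisler’s exact protocol:

    #lang racket
    ;; rainfall : (listof number) -> number
    ;; Average the non-negative readings that appear before the sentinel.
    ;; The sentinel (-999) is one common choice among the problem's variants.
    (define (rainfall readings)
      (define before-sentinel
        (takef readings (lambda (r) (not (= r -999)))))
      (define valid (filter (lambda (r) (>= r 0)) before-sentinel))
      ;; A full solution would also guard against an empty list of valid readings.
      (/ (apply + valid) (length valid)))

    (rainfall '(3 -2 5 -999 10)) ; => 4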

In Closing: What To Do?

Sally Fincher et al. looked at how we, as educators, change our practice. In their paper Stories of Change: How Educators Change their Practice, they asked 99 educators (mostly computer science educators or those in closely related fields) to address the following question:

Can you think of a time when something—an event, an article, a conversation, a reflection, an idea, a meeting, a plan—caused you to make a change in your teaching? What was it? What happened?

The work led them to the following result:

Of the 99 change stories analyzed, only three demonstrate an active search for new practices or materials on the part of teachers, and published materials were consulted in just eight of the stories. Most of the changes occurred locally, without input from outside sources, or involved only personal interaction with other educators.

Bringing this all the way back from the global to the local, I would claim Fincher’s article should give us pause as we develop a new computational program at Bates. It raises difficult questions regarding the role of evidence in the design and development of courses, our choices of tools and languages for teaching computing, and how we engage across disciplinary boundaries in doing that work.

Perhaps, through intentional design, and a willingness to commit to new learning on the part of ourselves and our colleagues (an expensive proposition in time), we might decide that evidence matters. Or we might decide the evidence is not “good enough,” in which case we will make ourselves comfortable doing what we “know best.” In other words, it is easy for all of us to make the comfortable choice of privileging our own knowledge and expertise—a kind of “internal appeal to authority” when faced with change or the unknown.

I believe the most dangerous reason to make choices is because we are in a hurry. If we rush, we are unlikely to actually explore and discuss evidence-based practices in computing, and will instead “just teach Python and R,” because that is a safe set of choices in the current climate, both on campus and in the marketplace. (These are, after all, the languages of machine learning and data science!) But neither of these tools has a rich base of evidentiary research in the novice programming context, and both lack infrastructure to scaffold the learner well. We could build that infrastructure, and develop the associated research… but that, itself, is a monumental undertaking.

In short, as a computer scientist and computing education researcher who cares deeply about understanding the what, the why, and the how of my teaching… I’m uncertain what the best course of action is when it comes to engaging in what feels very much like a campus-wide (or certainly multi-department) dialogue around the teaching and learning of programming. How (and even whether) to advance the state of evidence, and how to weather the attendant questioning and attacks, is hard.

The question is, in short: should evidence play a role in language and tool choice as we design a new digital and computational program at Bates? I feel like I know how I would want to engage with that question, but that may be different from how the department, or even the community, wants to spend its time and energy.