Example: Improving an Exercise
Case study of refining an exercise
Context
Tutorial Topic: R, Intro to dplyr: select, filter, mutate
Context: we are working with a data frame called "police" that contains information on traffic stops by the police in Evanston, IL.
Concept: the filter function
Goal of exercise: practice filtering using boolean statements
Note: I taught the workshop with the initial exercise - that's how I realized all of the problems! Refining exercises over time is to be expected. Don't worry about perfecting your exercises in your first draft, or even for the first time you teach the materials.
Initial Exercise
Filter police to choose the rows where location is not 60201 or 60202.
Hints:
Think: Not 60201 and not 60202
!= is the not equals operator, or you could use ! in combination with %in%
The "or" operator is |; the "and" operator is &
Answer
filter(police, location != 60201 & location != 60202)
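The hints above point toward several spellings of the same filter. The sketch below shows one plausible reading of those variants, run on a small made-up data frame (the values are hypothetical, not the real Evanston data) to show that they select the same rows when nothing is missing.

```r
library(dplyr)

# Hypothetical stand-in for the workshop's `police` data frame
police <- data.frame(
  location    = c(60201, 60202, 60203, 60076, 60201),
  subject_sex = c("male", "female", "male", "female", "female")
)

# Variants the hints lead toward
by_and    <- filter(police, location != 60201 & location != 60202)
by_comma  <- filter(police, location != 60201, location != 60202)
by_not_or <- filter(police, !(location == 60201 | location == 60202))
by_not_in <- filter(police, !(location %in% c(60201, 60202)))

# With no missing values, every variant keeps the same two rows
by_and$location
by_not_in$location
```

Four correct answers to one question is a lot to field in a short workshop exercise.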
Problems
This tutorial isn't about writing complex conditional statements! It is a tutorial on using dplyr.
In teaching with this exercise, I got questions about "not" and %in% instead of questions about filter.
I was afraid a straightforward exercise with no twist would be too easy, but I was wrong. Being straightforward is better.
The question opens up a potential digression: %in% can produce different results than == when there are missing values. That isn't the case here, but it complicates explaining the code, especially since people had questions about how %in% works. If there were missing values, participants could have ended up with different results from a small change in the code, which would lead to confusion.
With all of the hints, I was leading people to about 4 different possible answers of how to write this code. There may be multiple ways to do it, but the example code in the teaching materials leading up to the exercise should be pointing them to 1 clear answer.
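To make the missing-values digression concrete, here is a minimal sketch using a made-up data frame with one NA in location (an assumption for illustration; the real data has no missing values in that column). It shows how two "equivalent" answers diverge.

```r
library(dplyr)

# Hypothetical data with one missing location
police_na <- data.frame(location = c(60201, 60202, 60203, NA))

# Comparison operators propagate NA, but %in% returns FALSE for NA
NA != 60201                # NA
NA %in% c(60201, 60202)    # FALSE

# filter() keeps only rows where the condition is TRUE,
# so the NA row is dropped here...
with_neq <- filter(police_na, location != 60201 & location != 60202)

# ...but kept here, because !FALSE is TRUE
with_in <- filter(police_na, !(location %in% c(60201, 60202)))

nrow(with_neq)  # 1
nrow(with_in)   # 2
```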
Rewritten Exercise
1. Filter police to choose the rows where location is 60201 or 60202
2. Filter police to choose the rows where location is 60201 or 60202 and subject_sex is "male"
Hints: the "or" operator is |
Challenge: Filter to choose the rows where location is NOT 60201 or 60202
New answers
filter(police, location == 60201 | location == 60202)
filter(police, location == 60201 | location == 60202, subject_sex == "male")
Challenge: filter(police, !(location == 60201 | location == 60202))
Note: there are other ways to do this
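For anyone who wants to try the new answers without the workshop data, the sketch below runs them on a small made-up data frame (hypothetical values, with only the two columns the exercise uses).

```r
library(dplyr)

# Hypothetical stand-in for the workshop's `police` data frame
police <- data.frame(
  location    = c(60201, 60202, 60203, 60076, 60201),
  subject_sex = c("male", "female", "male", "female", "female")
)

part1     <- filter(police, location == 60201 | location == 60202)
part2     <- filter(police, location == 60201 | location == 60202, subject_sex == "male")
challenge <- filter(police, !(location == 60201 | location == 60202))

nrow(part1)      # 3
nrow(part2)      # 1
nrow(challenge)  # 2
```

Because the toy data has no missing values, the rows from the first part and the challenge together account for every row in police.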
Changes
I didn't make any reference to %in%. Learners could still have used it if they were familiar with that operator, but I didn't open myself up to questions on a topic unrelated to the tutorial.
I didn't have to introduce a new operator (!=, which I didn't use in the lead-in examples). I used the "or" operator | in the examples leading into the exercise so it was familiar, and I reminded them of what that operator was.
I went with two simpler, more straightforward steps instead of trying to make one complicated one. This makes the connection to the key concepts cleaner and allows participants to work with 2 variants of using filter instead of just one.
By breaking it into two parts that were related, instead of just giving them #2 (which also requires them to do #1), I'm also guiding them in how to break down something complicated -- take care of each condition in turn.
To make myself feel better about the difficulty of the question (personal issue: I don't want participants to not be challenged enough), I put my original exercise as a challenge exercise. This way if people did find the exercise too simple, they had something to do.
Additional Notes
I used the actual names of the columns they'd need to reference (location, subject_sex) instead of terms like "zip code" and "sex", so they don't have to figure out which columns to use. Figuring out which columns to use is not the skill I want them to practice here.
I put male in quotes ("male") because they'd need to quote it in the expression. It is a subtle reminder, and it makes copying and pasting easier. The zip codes don't need to be quoted.
The challenge question, in addition to being more difficult, connects to the first part of the exercise and is a natural extension of it. It's a question someone might ask if they just saw the first part of the exercise: "But what if I want all zip codes except those?"
There are other answers to #2 above as well, but I don't call them out or focus on them. We aren't looking at the full tutorial here, but in the examples leading to the exercise, I show them that comma-separated expressions are joined together with "and". So I'm sticking with that approach in the "official" answer. If someone asks about using & as an alternative, I can answer that quickly without getting sidetracked.
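If the & question does come up, a quick answer might look like the sketch below (again on a made-up data frame with hypothetical values). The one wrinkle worth knowing in advance is that & binds more tightly than | in R, so the "or" part needs parentheses.

```r
library(dplyr)

# Hypothetical stand-in for the workshop's `police` data frame
police <- data.frame(
  location    = c(60201, 60202, 60203, 60076, 60201),
  subject_sex = c("male", "female", "male", "female", "female")
)

# "Official" answer: comma-separated conditions are combined with "and"
with_comma <- filter(police, location == 60201 | location == 60202,
                     subject_sex == "male")

# Same filter written with &; the parentheses around the "or" are required
with_amp <- filter(police, (location == 60201 | location == 60202) &
                     subject_sex == "male")

# Without parentheses this reads as: 60201, or (60202 and male)
no_parens <- filter(police, location == 60201 | location == 60202 &
                      subject_sex == "male")

nrow(with_comma)  # 1
nrow(with_amp)    # 1
nrow(no_parens)   # 2
```

Sticking with the comma form in the official answer sidesteps the precedence question entirely.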