I've had this idea for a lightweight pair programming app for a while now. One of the main inspirations is the iOS Letterpress app. If you don't know the game, it presents a square grid of letters, and you try to make the longest word you can from those letters. Pretty straightforward stuff, except that you can play and challenge people all around the world, taking turns to make the longest word you can. What's really cool about the app is that you can have a little game with a total stranger. At least I assume you are actually playing with someone rather than against a robot. I'm not sure how drop-outs are managed. When I think about it, playing against robots would be a more reliable experience, but I don't know the stats for drop-outs. I've certainly dropped out of games, but perhaps I'm a sociopath and 99% of people carry on playing with each other till the end?
Anyway, these days there are lots of games where you compete with and even team up with strangers, e.g. Splatoon, League of Legends (LoL) and so on. I'd love to learn more about how these games match people up to try and maximise the user experience, as I think we have a related problem matching people up for pairing sessions in our "Agile Development using Ruby on Rails" MOOC. If Splatoon/LoL are the gaming equivalent of full-screenshare pairing, then simpler games like Letterpress would correspond to a sort of micro-pairing experience on a small toy problem.
Ever since I looked into the different styles of ping-pong pairing I've been fascinated by how protocols like "one undermanship" have a game-like feel. They remind me somehow of turn-based games like chess. So I keep thinking of a Letterpress-style micro-pair-programming game where you are involved in lightweight ping-pong pairing sessions with people all around the world.
Maybe there are nowhere near as many people interested in pair programming as there are in playing word games, so maybe there would never be the critical mass to make it fun, ... unless, robots? I finally spent some time on a micro-pairing robot on a plane recently. There are a couple of challenges: one is working out the rules of this ping-pong pairing game I'm imagining, and another is getting a robot to pair program sensibly with you.
An instance of a pairing game might run like this:
The spec involves input/output pairs, e.g. 2 => 4, 3 => 6, 4 => 8 (in this case a numeric doubler). The "one undermanship" protocol involves writing the absolute minimum amount of code (which we could perhaps measure in terms of characters? code complexity?).
Pair A writes:
describe 'doubler' do
  it 'doubles a number' do
    expect(double(2)).to eq 4
  end
end
This is the initial failing test. So Pair B writes the absolute minimum code to make it pass, e.g.

def double(number)
  4
end

and then writes a new test that will fail:
describe 'doubler' do
  it 'doubles a number' do
    expect(double(2)).to eq 4
  end

  it 'doubles a number' do
    expect(double(3)).to eq 6
  end
end
So Pair A writes the minimum to make this pass, in this case being intentionally obtuse (they could write

number * 2

which would be fewer characters, but perhaps we can give them points for passing only the existing tests and not others?):

def double(number)
  return 4 if number == 2
  6
end

Then Pair A adds another failing test:
describe 'doubler' do
  it 'doubles a number' do
    expect(double(2)).to eq 4
  end

  it 'doubles a number' do
    expect(double(3)).to eq 6
  end

  it 'doubles a number' do
    expect(double(4)).to eq 8
  end
end
And finally Pair B writes the general solution:

def double(number)
  number * 2
end
Of course Pair B could instead write:

def double(number)
  return 4 if number == 2
  return 6 if number == 3
  8
end
But it would be nice if we could somehow end on the more general case, with a stronger set of tests (for edge cases etc.? we could build those into the initial input/outputs). The thing that would make this an enjoyable game might be a scoring system, so that you get points for one-undermanship obtuseness to an extent, but past a certain point there's a refactoring bonus. Maybe the sensible approach is to only score a round of hard coding when its complexity is actually less than that of the general solution?
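To make that concrete, here is one possible scoring rule, sketched in Ruby. Everything here is hypothetical (the method name, the flat bonus value, and the use of character count as a stand-in for complexity): hard coding only earns points while it is genuinely shorter than the general solution, and generalising pays a fixed bonus.

```ruby
# Hypothetical scoring rule for one round of the pairing game.
# hard_coded and general are candidate solution bodies as strings;
# character count stands in for code complexity.
def score_round(hard_coded:, general:, refactored: false)
  saving = general.length - hard_coded.length
  points = saving > 0 ? saving : 0 # obtuseness pays only while it is cheaper
  points += 10 if refactored       # flat refactoring bonus (arbitrary value)
  points
end

# Hard coding '4' beats writing 'number * 2', so it scores;
# a longer hard-coded body scores nothing, but refactoring still pays.
score_round(hard_coded: '4', general: 'number * 2')
score_round(hard_coded: '4 if number == 2; 6', general: 'number * 2', refactored: true)
```

The numbers would need tuning through play, but the shape captures the idea: reward cheekiness only up to the point where the general solution becomes the lazier option.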
So there's also the issue of coming up with a range of simple coding problems that make this more interesting than the most trivial cases. I guess there's enough complexity in a few basic arithmetic problems, and we can collect more over time; there are great repositories like Codewars. Anyway, with any multi-player game we have the classic bootstrap problem: if we had a great game that lots of people were playing, then there would be lots of people to play with and it would be great; but initially there is no one playing it. So in the meantime can we scaffold the gameplay with pretend people? Can we write a robot pairer that can make a test pass, and generate a new chunk of code to move the ping pong on?
For a restricted set of cases I think the answer is yes. At least what I started on the plane was a chunk of code that takes a Ruby exception, analyses it, and writes the necessary code to make the failing test pass. It's not very complex at the moment; it's basically this:
def fix_code(e)
  if e.class == NoMethodError
    /undefined method `(.*)' for main:Object/ =~ e.message
    eval "def #{$1}; 'robot method'; end"
  elsif e.class == ArgumentError
    /wrong number of arguments \(given (\d+), expected \d+\)/ =~ e.message
    num_args = $1.to_i # could use class or arg to auto-infer an appropriate name?
    arg_string = (0...num_args).map { |i| "arg#{i}" }.join(',')
    /\(eval\):1:in `(.*)'/ =~ e.backtrace.first
    method_name = $1
    eval "def #{method_name}(#{arg_string}); 'robot method'; end"
  else
    puts "cannot handle error class #{e.class}; #{e.message}"
  end
end
What it does at the moment is take NoMethodErrors and ArgumentErrors and fix things up so the specified method is created with the correct number of arguments. Assuming that the game involves working through a set of input/output values on a range of basic arithmetic problems, I can imagine it being fairly easy to extend to make the majority of failing tests pass. Given an input/output pair, generating an RSpec test is pretty trivial. So a little more work here and one could have a basic ping-pong pairing partner. I don't fool myself that it wouldn't break fairly quickly, but I think rounds of polishing could make it work reasonably well for a range of introductory problems. Would it create a fun game that many people would want to play? Probably not ... Might it be a good learning experience for some people? ... maybe? I think the process of stack-trace/error analysis is quite interesting, and a nice feature would be to have the robot explain why it does what it does. They would be canned explanations, but they could highlight how the stacktrace/error has been analysed in order to work out what to do next.
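Just to show how trivial that test-generation step is, here's a sketch (the generate_spec helper and the example descriptions are my own invention, not part of the robot yet):

```ruby
# Generate RSpec source for a method from a hash of input/output pairs,
# e.g. generate_spec('double', 2 => 4, 3 => 6) produces a describe block
# with one expectation per pair.
def generate_spec(method_name, io_pairs)
  examples = io_pairs.map do |input, output|
    "  it 'maps #{input} to #{output}' do\n" \
    "    expect(#{method_name}(#{input})).to eq #{output}\n" \
    "  end\n"
  end
  "describe '#{method_name}' do\n#{examples.join}end\n"
end

puts generate_spec('double', 2 => 4, 3 => 6)
```

The robot could then feed one pair at a time into this to drive each ping-pong turn.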
I guess the best initial interface would be a command-line game where the robot edits the file that you are both working on, perhaps? Having started it I'm kind of interested in extending it; we'll see if anyone else thinks this is anything other than mindless navel gazing :-)
So on Friday I followed through with my
plans to get the rest of the FeatureGrader to expose errors in the students’ code to students, rather than just having it respond with “your code timed out or had an error” and I think I was largely successful.
At least I got those few extra code changes deployed into production, and my manual tests through the edX interface showed me that my test examples would display full errors for RSpec failures, migration failures, and Rails failures. Of course I’m blogging before I’ve reviewed how things fared over the weekend, but it feels like a step in the right direction. Even if the students can’t understand the errors themselves, they can copy and paste the output, and perhaps a TA has an increased chance of helping them.
I also wrapped my spike in tests like:
Scenario: student submits a HW4 with migration_error
  Given I set up a test that requires internet connection
  Given an XQueue that has submission "hw4_migration_error.json" in queue
  And has been setup with the config file "conf.yml"
  Then I should receive a grade of "0" for my assignment
  And results should include "SQLException: duplicate column name: director: ALTER TABLE"
to check that the errors would actually be displayed even as we mutate the code. I have a selection of similar scenarios which feel like they are crying out to be DRYed out with a scenario template. Similarly, with these tests in place I wonder if I can’t improve some of the underlying grading code. Maybe we can re-throw these TestFailedError custom errors that look like they might have been intended for communicating submitted code errors back up to the grader. I found myself spending the time I could have been doing further refactoring reaching out to individual students on the forums and in GitHub to add some details about where the grader had been getting stuck for them, and encouraging them to re-submit since the grader had changed and they should now be able to see more details.
I just sneaked a peek at the
GitHub comment thread, and while there are some new issues that could distract me from completing this blog, at the very least I can see some students deriving value from the new level of grader feedback. So grader refactoring? I continue to feel negative about that task. The nested sandboxes of the feature grader … the fear is that refactoring could open new cans of worms, and it just feels like we miss a huge chance by not having students submit their solutions via pull request.
So how would a PR-based grader work? Well, reflecting on the git-immersion grader that we developed for the AV102 Managing Distributed Teams course, we can have students submit their GitHub usernames and have the grader grab details from GitHub. We can get a
list of comments from a PR and so if we had code-climate, CI etc. set up on a repo and had students submit their solutions as pull-requests we could pull in relevant data using a combination of the repo name and their GitHub username.
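As a sketch of that lookup, assuming an Octokit-style client (the pr_feedback helper and repo name are hypothetical; issue_comments is the Octokit method for fetching PR conversation comments, but treat the details as illustrative):

```ruby
# Collect grading-relevant comment data from a student's pull request.
# `client` is anything that responds to issue_comments(repo, number)
# the way Octokit::Client does, which also makes this easy to stub in tests.
def pr_feedback(client, repo, pr_number)
  client.issue_comments(repo, pr_number).map do |comment|
    { author: comment[:user][:login], body: comment[:body] }
  end
end
```

In production `client` would be something like `Octokit::Client.new(access_token: ...)`, and the comments could include output posted by Code Climate or a CI bot.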
Making pull-requests would require students to fork rather than clone repos as they were originally encouraged to do. Switching back to that should not be a big deal. I was keen to remove forking since it didn’t really contribute to the experience of the assignment and was just an additional hoop to jump through. However if submission is by PR then we want students to understand forking and pulling; and of course that’s a valuable real world skill.
This means all the solutions to the assignments would exist in much larger numbers in GitHub repos, but plenty exist there already, so not much change there. What we might have, though, is students submitting assignments through a process that’s worth learning, rather than an idiosyncratic one specific to edX and our custom autograders.
With a CI system like Travis or Semaphore we can run custom scripts to achieve the necessary mutation grading and so forth, although setting that up might be a little involved. The most critical step, however, is some mechanism for checking that the students are making git commits step by step. Particularly since the solutions will be available in even greater numbers, what we need to ensure is that students are not just copying a complete solution verbatim and submitting it in a single git commit. I am less concerned about the students’ ability to master an individual problem completely independently, and more concerned with their being able to follow a git process where they write small pieces of code step by step (googling when they get stuck) and commit each to git.
So for example in the Ruby-Intro assignment I imagine a step that checks that each individual method solution was submitted in a separate commit and that that commit comes from the student in question. Pairing is a concern there, but perhaps we can get the students set up so that a pairing session records one student as author and the other as committer, so that both are credited.
But basically we’d be checking that the first
sum(arr)
method was written and submitted in one commit, and then that
max_2_sum(arr)
was solved in a separate commit, and that the student in question was either the committer or the author on the assignment. In addition we would check that the commits were suitably spaced out in time, and of a recent date. The nature of the assignment changes here from being mainly focused on “can you solve this programming problem?”, to “can you solve this code versioning issue?”. And having the entire architecture based around industry standard CI might allow us to reliably change out the problems more frequently; something that feels challenging with the current grader architecture. The current grader architecture is set up to allow the publication of new assignments, but the process of doing so is understood by few. Maybe better documentation is the key there, although I think if there is a set of well tested assignments, then the temptation for many instructors and maintainers is just to use the existing tried and tested problems and focus their attention on other logistical aspects of a course.
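A pure-Ruby sketch of that kind of check, operating on already-fetched commit metadata (the commits_ok? helper and the hash fields are hypothetical, just the shape of data I imagine pulling from the GitHub commits API):

```ruby
require 'time'

# Check that each required method was introduced in its own commit,
# that the student authored or committed each one, and that commits
# are spaced out in time rather than dumped in one go.
# Each commit is a hash like:
#   { author: 'alice', committer: 'alice',
#     time: Time.parse('2016-05-01 10:00:00'), methods_added: ['sum'] }
def commits_ok?(commits, student:, required_methods:, min_gap: 60)
  by_method = Hash.new { |h, k| h[k] = [] }
  commits.each do |c|
    next unless [c[:author], c[:committer]].include?(student)
    c[:methods_added].each { |m| by_method[m] << c }
  end
  one_commit_each = required_methods.all? { |m| by_method[m].size == 1 }
  distinct = one_commit_each &&
             required_methods.map { |m| by_method[m].first }.uniq.size == required_methods.size
  spaced = commits.map { |c| c[:time] }.sort
                  .each_cons(2).all? { |a, b| b - a >= min_gap }
  distinct && spaced
end
```

A real version would also want the recency check mentioned above, but even this much would catch the copy-paste-everything-in-one-commit case.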
Using existing CI systems effectively removes a large portion of the existing grader architecture, i.e. the complex sandboxing and forking of processes. That removes a critical maintenance burden: the equivalent infrastructure is provided reliably and for free by the many available CI services (Travis, Semaphore, CodeShip etc.). Students also start to experience industry-standard tools that will help them pass interviews and land jobs. The most serious criticism of the idea is that students won’t be trying to solve the problems themselves, but google any aspect of our assignments and you’ll find links like this. The danger of the arms race to keep solutions secret is that we burn all our resources on that, while preventing students from learning by reviewing different solutions to the problem.
I’m sure I’m highly biased, but it feels to me that having students submit a video of themselves pairing on the problem, along with a technical check to ensure they’ve submitted the right sorts of git commits, will reap dividends in terms of students learning the process of coding. Ultimately the big win would be checking that the tests were written before the code, which could be done by asking students to commit the failing tests, and then commit the code that makes them pass. Not ideal practice on the master branch, but acceptable for pedagogical purposes perhaps … especially if we are checking for feature branches, and even that those sets of commits are squashed onto master to ensure it always stays green …
Footnote:
I also reflect that it might be more efficient to use webhooks on the GitHub repos in question, rather than repeatedly querying the API (which is rate limited). We’d need our centralised autograder to store data about all the student PRs so that we could ensure each student’s submission was checked in a timely fashion.
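If we did go the webhook route, the first chunk of code we’d need is signature verification, since anyone can POST to a public endpoint. GitHub documents an HMAC-SHA256 scheme where the signature arrives in the X-Hub-Signature-256 header; the helper name and placeholder values below are mine:

```ruby
require 'openssl'

# Verify a GitHub webhook payload against the shared webhook secret.
# GitHub sends the HMAC-SHA256 hex digest of the raw request body in the
# X-Hub-Signature-256 header, prefixed with 'sha256='.
def valid_signature?(secret, payload_body, signature_header)
  expected = 'sha256=' +
             OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha256'), secret, payload_body)
  # NB: production code should use a constant-time comparison here
  expected == signature_header
end
```

The autograder would check this before recording anything about a student PR, and reject requests whose signatures don’t match.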