Posts that are testing-ish
Stacey asked me for a refresher on Test Driven Learning for Hacker School, so here we go.
Test Driven Learning is a software engineer’s articulation of Wiggins & McTighe’s Understanding by Design framework after being strongly influenced by Ruth Streveler’s “Curriculum, Assessment, and Pedagogy” course at Purdue.
Many software engineers are familiar with the process of Test Driven Development (TDD).
- Decide on the goal.
- Write the test (“how will you know if it’s working, exactly?”)
- Make the code pass the test.
Test Driven Learning (TDL) simply says “it’s the same thing… for your brainnnnn!”
- Decide on the goal (“learning objective”).
- Design the assessment (“how will you know if you’ve learned it, exactly?”)
- Go through the experiences/etc. you need to pass your assessment.
That’s it. Really.
Step 2 is the part most people flub. With software tests, you have a compiler/interpreter forcing you to be precise. With learning assessments, you don’t — but you need exactly the same level of precision and external execution. If you asked a group of external people (with appropriate expertise) whether you’d passed the assessment you set for yourself, there should be no disagreement. If there’s disagreement, your assessment needs a redesign.
A good assessment is a goal that helps you stretch and reach it; sometimes it encourages you to do more. But sometimes it also gives you permission to stop doing stuff – you’ve written the code, you’ve delivered the talk, they met the criteria you set — and now you’re done. You can absolutely set a new goal up and keep on learning. However, you’re no longer allowed to say you Haven’t Learned X, because you’ve just proven that you have.
Here are some rough-draft quality TDL assessments you might start with, and a bit of how you might improve them.
I will learn Python. (What does that even mean? How will you know you’ve learned it?) I will complete and pass any 50 CodingBat exercises in Python. (But I could do that by solving 50 really easy problems.) Only 10 of those 50 problems can be warm-ups, and at least 20 of them must be Medium difficulty or greater. (Does it matter if you get help with the problems?) Nope, I can get as much help as I want from anyone, as long as I could explain the final solution to another programmer.
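One way to see whether an assessment like this is precise enough is to try writing it down as code. Here is a minimal sketch in Python; the difficulty labels, the `Solved` record, and the `assessment_passed` name are all made up for illustration and are not anything CodingBat provides.

```python
from dataclasses import dataclass

# ASSUMPTION: difficulty labels for this sketch, ordered easiest to hardest.
LEVELS = ("warmup", "easy", "medium", "hard")


@dataclass(frozen=True)
class Solved:
    name: str          # exercise name
    level: str         # one of LEVELS
    can_explain: bool  # could I explain the final solution to another programmer?


def assessment_passed(solved):
    """Return True only if every criterion from the refined goal is met."""
    # Help from others is fine, but only solutions I can explain count.
    explained = [s for s in solved if s.can_explain]
    warmups = sum(1 for s in explained if s.level == "warmup")
    others = len(explained) - warmups
    medium_plus = sum(
        1 for s in explained
        if LEVELS.index(s.level) >= LEVELS.index("medium")
    )
    # At most 10 warm-ups may count toward the 50.
    total = others + min(warmups, 10)
    return total >= 50 and medium_plus >= 20
```

If you can't decide what a line of the function should be, that's usually a sign the assessment still has an ambiguity in it.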
I will get better at testing. (What do you mean by “testing”?) I write a lot of code, but I’ve never written tests for any of it. I hear the nose framework is nice. (What do you mean by “better”?) Well, I’ve never written a test at all, so even going from 0 to 1 would be an improvement. I could use nose to write tests for 3 different pieces of working code I’ve already written. (Do these need to be big or exhaustive tests?) Nope, I’m just trying to learn what writing tests is like, not get full test coverage on my code… at least not yet. Even if I write a 3-line test that checks out one minor function, it counts as one of the 3 tests. (What does it mean for a test to be “done”?) When someone else can check out and successfully run my code and my test suite on their computer without needing to modify either bit of code, it’s done.
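For scale, here is roughly what a “3-line test that checks out one minor function” might look like. This is a sketch, not a recommendation: `slugify` is a hypothetical stand-in for code you’ve already written, and the test is a plain function with an `assert`, which is the style nose-type test runners collect when the function name starts with `test`.

```python
# A stand-in for "working code I've already written" (hypothetical example).
def slugify(title):
    """Turn a post title into a lowercase, hyphen-separated slug."""
    return "-".join(title.lower().split())


# Test runners in the nose style collect functions whose names start with "test".
def test_slugify_joins_words_with_hyphens():
    assert slugify("Test Driven Learning") == "test-driven-learning"
```

That would count as one of the 3 tests, as long as someone else can check it out and run it without editing anything.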
I will understand how databases work. (By “understand,” do you mean the mathematical theory behind their design? Or how to actually implement and use one?) Oh geez, the latter. I don’t care about the math so long as I know how to interface with a database. Any sort of database. (So you need to make a demo.) Yeah, but that’s not enough; I can blindly type in code from a tutorial, but that doesn’t mean I’d be able to field questions on it. (What could you do about that?) I will give a presentation to fellow Hacker Schoolers demonstrating a small database interaction in code I have written. That’s an easy binary to check; either I’ve given the presentation or I haven’t.
Thoughts, questions, ideas? Got your own example TDL assessment (at any stage of revision), or ways to improve the ones above? Holler in the comments.
From the latest Sugar on a Stick (SoaS) meeting minutes:
We spent most of our time on the next big urgent milestone: getting testable Sugar 0.90 images out the door for upstream Sugar QA. This isn’t an official SoaS release, but since SoaS is an easy way to get an instance of Sugar up and running, it’s great for testing, and since we’re going to include the 0.90 release of Sugar anyway, Simon has asked us to include it in our test builds by a certain date so it can be used to test the Sugar environment itself. By “certain date,” I mean that the 0.90 Beta release is this Wednesday; here’s what has to happen preferably before then. (For the Fedora folks in the audience, SoaS is a Fedora Spin.)
- Simon updates the sugar, sugar-toolkit, sugar-datastore, sugar-presence-service, sugar-artwork, telepathy-gabble and telepathy-salut packages in Fedora to the correct code versions.
- Mel gets 3 people to test these packages and give them karma in Fedora’s system, which will put them in the stable repositories. I’ll be writing instructions on how to do this shortly.
- Simon or Peter or someone takes the next daily build and makes sure it boots, then announces the test image.
What this means for you, o reader: if you run Fedora (or can run Fedora in a VM, or can follow written instructions on how to do exactly this), you (yes, you!) can help us with 0.90 testing this week. We’re going to have instructions for this coming out once the code is ready to be tested; it should take less than 2 hours (hopefully less than 1) to do your setup and testing from start to finish, and you won’t need any prior experience. We’ll be using the same test setup for Sugar in the future, too.
The catch is that because we’re under intense time pressure to meet release deadlines, the time between when we can say “we’re ready! We need help!” and when we need the testing finished by is going to be VERY short. So this is a heads-up letting folks know this call is going to be coming.
Stay tuned for more QA news in Sugar land! (dun dun DUNN!)
This blog post written under more sleep deprivation than is probably good for me. I’m going to go to bed now so I’ll be more useful in the morning.
Dear lazyweb: there must be a simple answer to this. I’m trying to write a shell script that a cron job can run every week to update our Sugar on a Stick (SoaS) test image repository. The ticket in question is Sugar Labs #2058. Longer explanation than usual given so those new to the dev/test/release cycle can follow along.
Basically, SoaS is a Fedora Spin, so we get nightly composes made here (as in, “Fedora automagically builds our .isos for us so we don’t have to”). In order to (we assume) save on disk space, the Fedora servers only store the latest nightly compose – once a new .iso is made, the old one is gone forever, bwahaha!
This is fantastic for developing, but not so much for testing. Expecting testers to keep up with daily builds is a bit much, and it’s putting a burden on people who are downloading them every day (possibly even getting into trouble with their ISP), so we decided to go with a weekly test cycle – each Thursday evening we’d designate the most recent image as the “image under test” and point everyone there. That way, developers would also know exactly what image people were finding bugs in each week.
Problem: as noted above, the Fedora servers only store the latest nightly compose, so once a new .iso is made, the old one is gone. Each Thursday we need to grab the most recent image (which follows a special naming scheme) and pull it down to the Sugar Labs servers so we have the files at http://download.sugarlabs.org/soas/test/. (We’re also storing the old test images so we can go back and forth between them.) Since the builds contain their build date in their name, and we can’t predict ahead of time what the build date and time will be, we don’t know the exact filename to pull.
So we’re basically looking for a shell script that will:
- Pull the latest iso and checksum from the SL servers
- Rename the checksum so it matches the datetime stamp of the iso (the checksum is currently called – rather unhelpfully – “CHECKSUM-i386”).
- Update the symlinks so that http://download.sugarlabs.org/soas/test/soas-i386-test-latest.iso and http://download.sugarlabs.org/soas/test/soas-i386-test-latest-checksum.sha point to the latest iso and checksum that were just downloaded.
This probably requires some sort of weird wildcard bash-fu that would take me multiple hours to inelegantly figure out, and someone else 5 minutes to write a one-liner to solve.
Can haz halp?
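For what it’s worth, here is a rough sketch of the rename and symlink steps above, in Python rather than shell (a swap made purely for this illustration). It assumes the iso and “CHECKSUM-i386” have already been downloaded into the test directory, and it guesses at two things the post doesn’t spell out: the iso naming pattern (`ISO_PATTERN`) and the name given to the renamed checksum. Both are hypothetical and would need adjusting to the real scheme.

```python
import os
import re

# ASSUMPTION: nightly ISOs are named like "soas-i386-20100429.16.iso".
# The real naming scheme is not given above, so adjust this pattern to match.
ISO_PATTERN = re.compile(r"soas-i386-(\d{8}\.\d{2})\.iso")


def newest_iso(filenames):
    """Return the ISO name with the latest date stamp, or None if there is none."""
    matches = (ISO_PATTERN.fullmatch(name) for name in filenames)
    # Fixed-width stamps sort correctly as plain strings.
    stamped = [(m.group(1), m.group(0)) for m in matches if m]
    return max(stamped)[1] if stamped else None


def point_symlink(link_path, target_name):
    """Repoint link_path at target_name, replacing any existing link."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target_name, tmp)  # relative target in the same directory
    os.replace(tmp, link_path)    # rename over the old link


def publish_latest(test_dir):
    """Rename the checksum to match the newest ISO and refresh the 'latest' links."""
    iso = newest_iso(os.listdir(test_dir))
    if iso is None:
        raise FileNotFoundError("no ISO matching ISO_PATTERN in " + test_dir)
    # ASSUMPTION: hypothetical naming for the renamed checksum file.
    checksum = iso[:-len(".iso")] + "-checksum.sha"
    os.rename(os.path.join(test_dir, "CHECKSUM-i386"),
              os.path.join(test_dir, checksum))
    point_symlink(os.path.join(test_dir, "soas-i386-test-latest.iso"), iso)
    point_symlink(os.path.join(test_dir, "soas-i386-test-latest-checksum.sha"),
                  checksum)
    return iso, checksum
```

A weekly cron job could call `publish_latest` on the test directory once the download step has finished. Picking the newest file by its date stamp, rather than by wildcard order or modification time, means older test images kept in the same directory don’t confuse it.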
It’s less than 3 weeks to the F13 release, and therefore high time we started doing some consistent QA for Sugar on a Stick – which, for the uninitiated, is a Fedora Spin based on the Sugar Learning Platform. Here’s the status as of the April 29, 2010 compose (which is identical to the current May 1st nightly build) for the main build and each of the Activities that ships by default.
|| Turtle Art
|| yes, but no book files to open
What do these results mean, and how can they be reproduced? More details on the Mirabelle page. Basically, this is the most minimal of smoke tests; I’m not going to pretend that this is good work, but it’s something that needs to happen, and the sort of thing where something is better than nothing. The goal was to come up with a way to verify, in under 1 hour total (from “start downloading image” to “finish reporting test results”), that a SoaS image and the Activities therein are minimally functional – they boot, they run, they do something, they shut down cleanly, and they save their results to the Journal.
The current smoke tests take me less than 25 minutes to execute from start to finish, so if we assume that the image can be downloaded and burned (I used liveusb-creator on my Fedora 12 system) in 30 minutes and results can be reported to the wiki table in roughly 5 – which matches with my timing (and I made that wiki page from scratch for the first round, too), we’re set, and so I’ve closed a long-standing bug asking for SoaS test cases. For now.
The test page is linked to from the main getting-involved page, which needs a lot of cleanup. We’ll probably hook the (under construction – anyone want to help make banners?) SoaS Fedora Spin website to openhatch when going through things with the mop n’ bucket (although at this point I’m feeling more like breaking out a weed-whacker).
How can you help?
In terms of running the tests for the upcoming v.3.0 (F13-based) Mirabelle release, we’re actually set. If this takes me less than an hour a day, I can do that for 15 out of the 17 days remaining with no problem (exceptions made for May 8th and 9th, when I will be off camping for my birthday). We need to come up with a better way of doing this for v.4.0, so better test cases, test case/reporting management systems, and Activity criteria (read: tests, please) are extremely welcome – that’s all long-term, though.
In terms of getting Mirabelle ready to go out the door, right now we need:
- this telepathy-gabble update to be pushed to the Fedora updates repo – it has +3 karma but is still in testing, and must be pushed before Tuesday; without it, collaboration in SoaS does not work at all. (By the way, I started a howto for testing updates, which needs some help – writing instructions on how to consistently get a VM running is surprisingly difficult.)
- download/install instructions for burning the image to a stick, for every major operating system, that can be followed by a classroom teacher without technical expertise. We know the Blueberry install instructions do not meet this criterion, and would love for someone – probably a non-engineer – to help rewrite them.
- quotes, stories, screenshots, and photos (CC-BY, please) from Sugar/SoaS users on what they’ve done with the platform, cool things they’ve tried, how this fits into a classroom, and so forth, to be used on the spin webpage and related links – bonus points if you can talk about contributing to SoaS as well as using it!
More coming as we find out what we need. The spin page itself is under construction (along with a SOP on how to make spin pages – the process right now is probably more complex/painful than it should be) and screenshots are on their way. Stay tuned!
I’ve been remiss in posting updates lately. This post is based on a post to fedora-test-list but has more links than the original email.
We’ve been experimenting with using Semantic MediaWiki (SMW) as a test case management system for Fedora QA. There are a few other projects that did/do this, like OLPC and the W3C OWL working group (this one was set up by the SMW developers, so the semantic stuff is quite nice), but this is very much bleeding-edge, not yet a common/widespread notion, and in need of development. Yay, First!
We now have a first working (minimal) set up for the SoaS Fedora Test Day, which is Thursday, September 3 – yes, this is a shameless plug; if you want to try it out, come over and test some liveusb images with us. If you want to see our journey down the rabbit hole on how this was created, Sebastian Dziallas, James Laska, and I logged and took detailed notes of not just our steps but also our thought process through them. (The hampster dance is somewhere in those logs for those of you who remember that particular blast from the past.)
It turns out the SMW community is interested in what we’re doing. (Again: Yay, First!) I was in the #semanticmediawiki IRC channel tonight and was asked if we could log our stuff in the upstream SMW users community and ping the semediawiki-user mailing list when we have something there we want to share – so I made a stub page for Fedora and put in what I could.
Some more knowledgeable thoughts from Markus Krotzsch, one of the lead developers from SMW:
A typical test: http://km.aifb.uni-karlsruhe.de/projects/owltests/index.php/FS2RDF-literals-ar. Click “edit with form” on this page to see how the input is presented. We adopted a tab-based scheme to avoid overwhelming the user.
A typical listing, generated automatically: http://km.aifb.uni-karlsruhe.de/projects/owltests/index.php/Rdfs:range (the properties used here, in fact, were also entered automatically by analysing the user input; this is possible with custom code when the tests are of a sufficiently formal nature)
We also generate bulk exports of tests that are used by a software suite to run the tests and report results. Much of this could be improved in various ways (the site was originally hacked together on one weekend). I learned that one should really plan the details of the underlying knowledge model and user interaction before adding content to the wiki. Also, we use some non-Web software to execute tests and publish results, while your system would probably be more similar to a bug tracker.
I don’t have the bandwidth to be “the SMW Test Case System Person” creating and maintaining this system, but would be happy to teach what I know of the process if someone is interested in taking this on, especially now that we’ve figured out how to actually make it work (it’s usable! we’re in alpha!), what resources to draw on to learn more, and an upstream SMW community to engage with. SMW for test case management is a pretty new thing to do. Any takers?
This was my last email as an OLPC employee.
I’m psyched to be (re)joining the community as a volunteer once more – after all, I’ve been an employee for 3.8 months and a community member for… running over 2 years now. This isn’t an ending, just another step – and I for one welcome our new overlor^H^H^H^H^H^H^H^H I mean, I’m looking forward to it.
I’ll continue to manage 8.2.1 testing as a volunteer, among other things – see the scoop on how QA is moving forward as an all-community entity at http://wiki.laptop.org/go/Community_testing_meetings/2009-01-08. I may make an appearance around Australia (http://linux.conf.au/) and Oceania in general (hello, Welly testers!) and… oh, maybe other places. It’s a big world out there.
Now – I’ve got a bunch of embedded/signal processing/sociology/engineering-education books that have been calling my name for ages… and a piano that I have ignored for years. (Ideas on other fun things to do next are welcomed!) Thank you for all your help, support, teaching, ideas, tolerance, enthusiasm, critiques, inspiration and just plain ol’ companionship along this crazy ride. It’s been good – and will continue to be good. Sweet.
Cheers, then – I’ll be seeing you folks around.
–Mel (mel at melchua dot com, if needed, but @laptop will still work.)
PS: Michael, what’s a social-life? We don’t seem to include that in our packages (http://dev.laptop.org/~bert/joyride/2622)…
I did this for two reasons: (1) I promised myself I’d get to Inbox Zero by the start of the New Year (and really, “tomorrow” doesn’t start until you’ve slept, right?) and (2) I can’t shake my adrenaline, but I need to rest, so I’m running the tank until it’s empty and I crash (controlled, into a padded wall… in this case, my bed). I’m pretty sure I can convince myself that I am tired now.
*checks* Oh whoa. Yeah. I’m tired. Okay.
That is all.
The job of a tester is to get bugs fixed. Here’s a look at how I’m trying to get a couple going.
“What version of the Measure Activity was shipped on this year’s G1G1 machines?”
It was a simple question from Arjun, Measure’s maintainer. This should be easy; it’s reasonable that an Activity maintainer would want to know what version of their work is being used in a release, and you could imagine an XO user wanting to check what version of an Activity they’d get on their machine if they clean-install upgraded to the G1G1 image.
Through the process of answering this question, Chris, Bert, and I learned that it wasn’t a simple answer – but it should be. Here’s how the system works right now, why it’s broken, and what we’ve proposed for fixing it. In the process, I began to learn about the mysteries of OLPC’s software build system.
A bit of backstory: on your XO, you’re probably running an OLPC build, most likely one of the builds designated as a release. For instance, build 767 is release 8.2.0. An analogous situation would be publishing a book; you might have many drafts (builds) of a book, but every year or so you pick a good recent draft you’ve finalized and say “this is what we’re going to publish as the 3rd edition (release) of this book.” Builds are made of packages – think of them as chapters included in a book. The OLPC build you’re running includes packages for the Fedora-based OLPC OS and the Sugar UI (and other things).
On top of this, you have Activities, or what most people think of as “the games that you can play on your XO.” Builds and Activities are separate. (This wasn’t always the case – and it’s led to problems for us today. More on that later.) The idea is that if you’re deploying XOs at your school, you’d choose the Activities you want for your particular deployment/school/classroom instead of being forced to accept a default package of Activities that might not fit your individual needs.
To do this, you’d first install the build, then use customization keys to put the Activities you want on the XOs you have. The analogy breaks down a bit here, but you could imagine Activities as a bunch of stickers you’ve applied to your book’s cover, and a customization key as a sheet you’ve made with all the stickers that you wanted on your book, so you can peel-and-stick it as a single thing. So you’ve got a build with Activities – or a book with stickers on the cover – and the entire shebang together is called an image.
Usually, OLPC lets people “sticker their own books” – pick a release, install it, pick Activities, install them, move on happily with their XO-using lives. For G1G1, we ship pre-stickered books. Arjun’s question can therefore be phrased as “what version of my sticker went out on the books you shipped?”
Here’s where the problem comes in: the process by which we create G1G1 images (the pre-stickered books) is manual and undocumented. This leaves Arjun with three places to look to answer his question.
He could look at the wiki page. This is probably accurate, but isn’t guaranteed to be, since it’s manually updated/edited. I could edit, right now, that page to say that every Activity shipped with version 42, which would be a crying lie.
He could look at an XO. Arjun could download the G1G1 image, install it on an XO, then go into that XO and look at the Activity version number in the files there. This is accurate, but not optimal. It consumes a lot of time and bandwidth and assumes he has a spare XO to reflash for this purpose, which is not the case for many developers.
He could look in the build.log for the image. This is what ended up happening, and it would be great – except for one thing. The G1G1 image with Activities doesn’t have a build log. That build log I’ve just shown you is for the build within the G1G1 image – the book without the stickers – and it has no information whatsoever about what Activities are on the image itself, since the build and the Activities are totally separate things within an image.
If you look through that file, though, you’ll see what confused Arjun; there’s text in there that looks like it’s information about Activities. Stuff like this:
16:25:29 URL:http://mock.laptop.org/repos/local.8.2/XOS/Measure-12.xo  -> "Measure-12.xo" 
16:25:29 URL:http://mock.laptop.org/repos/local.8.2/XOS/Measure-13.xo  -> "Measure-13.xo" 
16:25:30 URL:http://mock.laptop.org/repos/local.8.2/XOS/Measure-14.xo  -> "Measure-14.xo" 
If I was Arjun, here’s what I’d be thinking. “Wait, this build log has stuff about Activities – it must be the build log for the G1G1 image… and it’s telling me that they put an outdated version of my Activity on thousands of XOs – oh no! PANIC TIME!”
What happened? Well, remember how I was talking about builds and Activities being separate, and how this hadn’t been the case in the past? Yep. This is an old, dead artifact. While creating a build, the build system still downloads old Activities; it just doesn’t actually put those Activities in the build. So there’s stuff about Activities in the build.log despite the fact that Activities aren’t anywhere in the build itself. And as we’ve seen, this can be confusing.
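To make the confusion concrete, here’s a small sketch of what you’d get if you scraped those lines for Activity versions. The regular expression is written against the three sample lines quoted above and nothing else, and the function name is invented for this example; whatever it returns is a list of bundles the build system *downloaded*, not what shipped on the image.

```python
import re
from collections import defaultdict

# Matches the bundle name at the end of a download line, e.g. "Measure-12.xo".
BUNDLE = re.compile(r'->\s*"(?P<name>[A-Za-z][\w.]*?)-(?P<version>\d+)\.xo"')


def bundles_downloaded(log_lines):
    """Map each Activity name to the sorted versions the build log says it fetched."""
    seen = defaultdict(set)
    for line in log_lines:
        match = BUNDLE.search(line)
        if match:
            seen[match.group("name")].add(int(match.group("version")))
    return {name: sorted(versions) for name, versions in seen.items()}


sample = [
    '16:25:29 URL:http://mock.laptop.org/repos/local.8.2/XOS/Measure-12.xo  -> "Measure-12.xo" ',
    '16:25:29 URL:http://mock.laptop.org/repos/local.8.2/XOS/Measure-13.xo  -> "Measure-13.xo" ',
    '16:25:30 URL:http://mock.laptop.org/repos/local.8.2/XOS/Measure-14.xo  -> "Measure-14.xo" ',
]
print(bundles_downloaded(sample))  # {'Measure': [12, 13, 14]}
```

Three versions of Measure show up, and none of them tells you which one (if any) ended up on a G1G1 machine, which is exactly why this file is the wrong place to look.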
The solution is pretty simple. The answer is that we need to say that…
- This is not the document you’re looking for. (filed bug #9070)
- The document you’re looking for does not exist.
- However, it should exist; let’s create it. (filed bug #9071)
Does anyone want to help fix these things?
Ok, I’m not quite done preparing for the meeting, but I know what I need to do to get ready for it, and I’m psyched. This is our next community testing meeting (focused on Sugar Activity testing – today, Thursday 12/04, in #olpc-meeting, join us!) To understand why, first look at The Law Of Two Feet, then read on for an excerpt from the meeting announcement email…
This meeting is a little different from the ones we’ve had before… at any given moment in the meeting hour, everyone should be engaged and getting something out of it – with all the work you’re putting in, it’s the least we can do to try to make that happen. Conversely, it means that if you’re there and don’t announce yourself as a lurker, I’m going to assume you’re listening and want to be constantly engaged, and act accordingly… Instead of status updates [and action items], we’ll spend our time on decision-making, discussions, reviews, and brainstorms – things that really need the whole group present.
Here’s the entire email for the whole context of how that’s going to happen. We’ve got a brainstorm planned to smoke out Activity testing blockers. For folks who might be slightly rusty on their brainstorming, check out the brainstorming rules that we’ll be using; they feature dragons. (Pony-eating ones, of course.)
My job is to enable people to test the things they want to test. (Er, OLPC-related things. I’d like to test chocolate cake too, but until we find a way to make it an Activity…) So. How can I help you?
I’ve just joined Planet Sugar Labs, so for those of you who haven’t read my posts before, I write in an uncensored stream-of-consciousness style. What you’re about to read are literally the thoughts that are going through my mind as I wander through a problem.
Using the XO to monitor bees? Sweet! (Thanks to Mike Lee for the tip!)
I’d love to see more experiments and Activities for experiments like this out there, but I’m not sure what needs to be done to foster/encourage/make-it-easier for this to happen. I wonder who else out there is interested in articulating good problem statements for Sugar Labs and OLPC – we have an overabundance of people who want to help us solve problems, and a shortage of well-stated problems for them to solve.
Thinking off the top of my head:
- Maybe we could have a Problems Articulation Sprint (…with a better name) every other week to flesh out and find supplemental resources for things we need help on.
- Short sprints, so people won’t get too tired. Not too frequent, so they won’t get burnt out. No obligations, so you can help flesh out a problem and then walk away and expect others will solve it – the point is to make the problem more attractive to solve, either by providing resources so it’s easier, or making the potential impact larger, deeper, or clearer, so there’s a bigger payoff to solving it.
- It strikes me that this is quite similar to the concept of bug advocacy. Bugs are problems waiting to be solved, after all.
- One could imagine a well-moderated feed (blog?) of OLPC/Sugar Labs “problems to solve” – so that people can watch it to see neat project opportunities floating by.
- I’d want high editorial standards for what gets onto that list, though; we’d have to come up with a set of criteria for what makes a good problem articulation.
- And we’d need to find people on all sides of the pipeline; stakeholders with problems, people to flesh out the problem, people who want to solve articulated problems, people to use and measure the impact of solutions generated for that problem.
- This is also similar to the idea of flash conferences.
- I think this is very closely aligned to my (as yet vague and unarticulated-into-goals – should change this) desire to make both projects into newbie-welcoming communities that volunteers can grow within.
I’m running out of steam trying to brainstorm solo – can someone help me pick this up and flesh it out? It’s a meta-problem statement – the problem of us not having clear problems to solve. Here are some parts of the meta-problem as far as I can see them.
- Our problems don’t have scope or scale. What kinds of skills, tools, resources, time are going to be needed to get this done? It’s scary to start tackling something with an unbounded resource allocation; you have no idea what kind of expenditure or risk you’re in for.
- It’s difficult to tell when a problem’s been solved. What’s an unambiguous way to know when you’ve reached the goal, or how far you are from getting there?
- It’s not clear what impact solving a problem will have. Who wants this? What difference do we think it will make, and what guarantee do I have of knowing the exact difference it’s made – and how long will it take for me to get that information? Is this a difference we want to make? (Is solving this problem aligned with the larger goals of the project, or at least with my own personal goals?)
What else is there?
For my part, here’s what I’m going to try. I’d like to try running half of one of our upcoming (OLPC) community test meetings as a problem articulation sprint – they’ve served parts of this purpose already a bit, but not particularly consciously. Goal: leave meeting with at least one well-articulated, here’s-how-you-know-it’s-done, annotated-with-relevant-resources, clear-measurable-impact problem statement – for a testing-related problem, and some notion of how we’ll find people to take it on.
I’ll think about this and get back to it tomorrow when I start prepping for this week’s meeting. I’d like to talk with people between now and then about how one might do a good testing-related problems articulating sprint for half an hour over IRC, so if you don’t mind being peppered with questions (or have some of your own), find me before Wednesday night. (Reply here, email, IRC, find me in person, whatever means you prefer.)
Signing off for now.