Overlapping types of contribution

Screen Shot 2014-08-21 at 14.02.27TL;DR: Check out this graph!

Ever wondered how many Mozfest Volunteers also host events for Webmaker? Or how many code contributors have a Webmaker contributor badge? Now you can find out

The reason the MoFo Contributor dashboard we’re working from at the moment is called our interim dashboard is because it’s combining numbers from multiple data sources, but the number of contributors is not de-duped across systems.

So if you’re counted as a contributor because you host an event for Webmaker, you will be double counted if you also file bugs in Bugzilla. And until now, we haven’t known what those overlaps look like.

This interim solution wasn’t perfect, but it’s given us something to work with while we’re building out Baloo and the cross-org areweamillionyet.org (and by ‘we’, the vast credit for Baloo is due to our hard working MoCo friends Pierros and Sheeri).

To help with prepping MoFo data for inclusion in Baloo, and by  generally being awesome, JP wired up an integration database for our MoFo projects (skipping a night of sleep to ship V1!).

We’ve tweaked and tuned this in the last few weeks and we’re now extracting all sorts of useful insights we didn’t have before. For example, this integration database is behind quite a few of the stats in OpenMatt’s recent Webmaker update.

The downside to this is we will soon have a de-duped number for our dashboard, which will be smaller than the current number. Which will feel like a bit of a downer because we’ve been enthusiastically watching that number go up as we’ve built out contribution tracking systems throughout the year.

But, a smaller more accurate number is a good thing in the long run, and we will also gain new understanding about the multiple ways people contribute over time.

We will be able to see how people move around the project, and find that what looks like someone ‘stopping’ contributing, might be them switching focus to another team, for example. There are lots of exciting possibilities here.

And while I’m looking at this from a metrics point of view today, the same data allows us to make sure we say hello and thanks to any new contributors who joined this week, or to reach out and talk to long running active contributors who have recently stopped, and so on.

Trendlines and Stacking Logs

TL;DR

  • Our MoFo dashboards now have trendlines based on known activity to date
  • The recent uptick in activity is partly new contributors, and partly new recognition of existing contributors (all of which is good, but some of which is misleading for the trendline in the short term)
  • Below is a rambling analogy for thinking about our contributor goals and how we answer the question ‘are we on track for 2014?’
  • + if you haven’t seen it, OpenMatt has crisply summarized a tonne of the data and insights that we’ve unpicked during Maker Party

Stacking Logs

I was stacking logs over the weekend, and wondering if I had enough for winter, when it struck me that this might be a useful analogy for a post I was planning to write. So bear with me, I hope this works…

To be clear, this is an analogy about predicting and planning, not a metaphor for contributors* 😀

So the trendline looks good, but…

Screen Shot 2014-08-19 at 11.47.27

Trendlines can be misleading.

What if our task was gathering and splitting logs?

Vedstapel, Johannes Jansson (1)

We’re halfway through the year, and the log store is half full. The important questions is, ‘will it be full when the snow starts falling?

Well, it depends.

It depends how quickly we add new logs to the store, and it depends how many get used.

So let’s push this analogy a bit.

Firewood in the snow

Before this year, we had scattered stacks of logs here and there, in teams and projects. Some we knew about, some we didn’t. Some we thought were big stacks of logs but were actually stacked on top of something else.

Vedstapel, Johannes Jansson

Setting a target was like building a log store and deciding to fill it. We built ours to hold 10,000 logs. There was a bit of guesswork in that.

It took a while to gather up our existing logs (build our databases and counting tools). But the good news is, we had more logs than we thought.

Now we need to start finding and splitting more logs*.

Switching from analogy to reality for a minute…

This week we added trendlines to our dashboard. These are two linear regression lines. One based on all activity for the year to-date, and one based on the most recent 4 weeks. It gives a quick feedback mechanism on whether recent actions are helping us towards to our targets and whether we’re improving over the year to-date.

These are interesting, but can be misleading given our current working practices. The trendline implies some form of destiny. You do a load of work recruiting new contributors, see the trendline is on target, and relax. But relaxing isn’t an option because of the way we’re currently recruiting contributors.

Switching back to the analogy…

We’re mostly splitting logs by hand.

Špalek na štípání.jpg

Things happen because we go out and make them happen.

Hard work is the reason we have 1,800 Maker Party events on the map this year and we’re only half-way through the campaign.

There’s a lot to be said for this way of making things happen, and I think there’s enough time left in the year to fill the log store this way.

But this is not mathematical or automated, which makes trendlines based on this activity a bit misleading.

In this mode of working, the answer to ‘Are we on track for 2014?‘ is: ‘the log store will be filled… if we fill it‘.

Scaling

Holzspalter 2

As we move forward, and think about scale… say a hundred-thousand logs (or even better, a Million Mozillians). We need to think about log splitting machines (or ‘systems’).

Systems can be tested, tuned, modified and multiplied. In a world of ‘systems’ we can apply trendlines to our graphs that are much better predictors of future growth.

We should be experimenting with systems now (and we are a little bit). But we don’t yet know what the contributor growth system looks like that works as well as the analogous log splitting machines of the forestry industry. These are things to be invented, tested and iterated on, but I wouldn’t bet on them as the solution for 2014 as this could take a while to solve.

I should also state explicitly that systems are not necessarily software (or hardware). Technology is a relatively small part of the systems of movement building. For an interesting but time consuming distraction, this talk on Social Machines from last week’s Wikimania conference is worth a ponder:

Predicting 2014 today?

Even if you’re splitting logs by hand, you can schedule time to do it. Plan each month, check in on targets and spend more or less time as required to stay on track for the year.

This boils down to a planning exercise, with a little bit of guess work to get started.

In simple terms, you list all the things you plan to do this year that could recruit contributors, and how many contributors you think each will recruit. As you complete some of these activities you reflect on your predictions, and modify the plans and update estimates for the rest of the year.

Geoffrey has put together a training workshop for this, along with a spreadsheet structure to make this simple for teams to implement. It’s not scary, and it helps you get a grip on the future.

From there, we can start to feed our planned activity and forecast recruitment numbers into our dashboard as a trendline rather than relying solely on past activity.

The manual nature of the splitting-wood-like-activity means what we plan to do is a much more important predictor of the future than extrapolating what we have done in the past, and that changing the future is something you can go out and do.

*Contributors are not logs. Do not swing axes at them, and do not under any circumstances put them in your fireplace or wood burning stove.

MoFo Contributor Dashboard(s) – switching to Plan A

tl;dr:

  • We’re wrapping up work on the MoFo Interim Dashboard
    • The only other data source we’ll add is the badge counts for webmaker mentors & hive community members
    • This is still our MoFo working document for the time being
  • Then, we’ll switch our development efforts into integrating with Project Baloo
    • Baloo is where we will de-dupe contributors across teams/tools etc.
    • areweamillionyet.org will become our working document in time (‘time’ is TBC)

Switching to Plan A

The MoFo contributor dashboard we’ve been working with this year is our *interim* counting solution, and just as we’re “completing” it we’re now in a position to switch from an interim solution to a fully integrated system which is properly integrated with MoCo. This is pretty good timing, but it’s a change in scope for our immediate work so is worth a status update.

Within MoFo we’ve deliberately been working on a Plan B solution. This is a relatively crude, not de-duped, not a single-source-of-truth dashboard, to give us the quickest visibility we could get into contributor trends. But as we’ve noted from the start, this solution keeps in mind Plan A so that we can transition to it when we’re ready.

We started our work with this Plan B because two existing projects were underway that could be our Plan A and we didn’t want to duplicate these efforts and/or sit around idle as we didn’t know for sure when either of these could be used ‘in anger’.

Project Baloo (formerly Wormhole/Blackhole) was the most closely aligned piece of work, but there was also work to issue contributor badges which was another possible place to count the number of people contributing. Both of these projects are complex because they span many teams and systems and the philosophically-quicksand-like question “what counts as contribution?”. So we didn’t know exactly when we could use either of these as our data-store.

As of last week, Baloo is a functioning data-warehouse with working aggregations setup for a number of MoCo teams. Which means we can begin our switch from Plan B to Plan A (or more accurately our *transition* to Plan A, as this will take time).

The dashboard at areweamillionyet.org which was being demo’d with purely github data when we launched it a couple of weeks ago, is now using data from Baloo’s de-duped aggregations across Github, Bugzilla and SUMO activity. For now, the output of this work is sitting in this Google Doc, and graphed with the help of a little node app.

There are many more teams and systems to integrate into Baloo going forward (across MoCo and MoFo), and there’s a fair bit of automation to work out, but this dashboard is a real example of how this can work. This is Plan A in action, and where we can focus our efforts next.

So, for MoFo colleagues, what does this mean in practical terms?:

We should:

  • Continue using the existing dashboard as our working document
  • Continue using the adhoc logger for counting contributions that don’t have a record anywhere else

I will limit the open work on the Interim Dashboard to:

  • Minimal update to the interface
  • Adding in numbers for Webmaker Mentors and Hive members via badges data
  • “Won’t-fixing-ing” some bugs where the effort is now better spent on Plan A

I will file bugs and start working on:

  • Getting raw contribution data into Project Baloo
  • Documenting the conversion points in a way that’s more consistent with MoCo’s
  • Joining up the documentation for this with MoCo’s existing info

When ‘less than the sum of our parts’ is a good thing

areweamillionyetHere’s a happy update about our combined Mozilla Foundation (MoFo) and Mozilla Corporation (MoCo) contributor dashboards.

TL;DR: There’s a demo All Mozilla Contributor Dashboard you can see at areweamillionyet.org

It’s a demo, but it’s also real, and to explain why this is exciting might need a little context.

Since January, I’ve been working on MoFo specific metrics. Mostly because that’s my job, but also because this/these organisations/projects/communities take a little while to understand, and getting to know MoFo was enough to keep me busy.

We also wanted to ship something quickly so we know where we stand against our MoFo goals, even if the data isn’t perfect. That’s what we’ve built in our *interim* dashboard. It’s a non de-duped aggregation of the numbers we could get out of our current systems without building a full integration database. It gives us a sense of scale and shows us trends. While not precise and high resolution yet, this has still been helpful to us. Data can sometimes look obvious once you see it, but before this we were a little blind.

So naturally, we want to make this dashboard as accurate as possible, and the next step is integrating and de-duping the data so we can know if the people who run Webmaker events are the people who commit code, are the people who file bugs, are the people who write articles for Source, are the people who teach Software Carpentry Bootcamps, etc, etc.

The reason we didn’t start this project by building a MoFo integration database, is because MoCo were already working on that. And in less than typical Mozilla style (as I’m coming to understand it), we didn’t just build our own version of this. 😉 (though there might be some scope and value for integrating some of this data within the Foundation anyway, but that’s a separate thought).

The integration database in question is MoCo’s project Baloo, which Pierros, and many people on the MoCo side have been working on. It’s a complex project influenced by more than than just technical requirements. Scoping the system is the point at which many teams are first looking at their contributor data in detail and working out what their indicators of contribution look like.

Our plan is that our MoFo interim dashboard data-source can eventually be swapped out for the de-duped ‘single source of truth’ system, at which point it goes from being a fuzzy-interim thing to a finalized precise thing.

While MoCo and ‘Fo have been taking two different approaches to solving this problem, we’ve not worked in isolation. We meet regularly, follow each other’s progress and have been trying to find the point where these approaches can merge into a unified cross Mozilla solution.

The demo we shipped yesterday was the first point where we’ve joined up this work.

Dial-up modems

I want to throw in an internet based analogy here, for those who remember dial-up modems.

Let’s imagine this image shows us *all* the awesome Mozilla contributors and what they are doing. We want there to be 20k of them in 2014.

unknown

It’s not that we don’t know if we have contributors. We’ve seen individual contributors, and we’ve seen groups of contributors, but we haven’t seen them all in one place yet.

So to continue the dial-up modem analogy, let’s think of this big-picture view of contribution as a large uncompressed JPEG, which has been loading slowly for a few months.

The MoFo interim dashboard has been getting us part of this picture. Our approach has revealed the MoFo half of this picture with slowly increasing resolution and accuracy. It’s like an interlaced JPEG, and is about this accurate so far:

interlaced

The Project Baloo approach is precise and can show MoCo and MoFo data, but adds data source at a time. It’s rolling out like a progressive JPEG. The areweamillionyet.org dashboard demo isn’t using Baloo yet, but the data it’s using is a good representation of how Baloo can work. What you can see in the demo dashboard is a picture like this:

progressive

(Original Photo Credit: Gen Kanai)

About areweamillionyet.org

This is commit data extracted from cross team/org/project repositories via github. Even though code contribution is only one part of the big-picture. Seeing this much of the image tells us things we didn’t know before. It gives us scale, trends and ways to ask questions about how to effectively and intentionally grow the community.

The ‘commit history’ over time is also a fascinating data set, and I’ll follow up with a blog post on that soon.

Less that the sum of our parts? When 5 + 5 = 8

With the goal of 20k active contributors this year, shared between MoCo and MoFo, we’re thinking about 10k active contributors to MoCo and to MoFo. And if we counted each org in isolation we could both say “here’s 10k active contributors”, and this would be a significant achievement. But, if we de-dupe these two sets it would be really worrying if there wasn’t an overlap between the people who contribute to MoCo and the people who contribute to MoFo projects.

Though we want to engage many many individual contributors, I think a good measure of our combined community building effectiveness will be how much these ‘pots’ of contributors overlap. When 10k MoFo contributors + 10k MoCo contributors = 15k combined Mozilla contributors, we should definitely celebrate.

That’s the thing I’m most excited about with regards to joining up the data. Understanding how contributors connect across projects; how they volunteer their time, energy and skills is many different ways, and understanding what ‘Many Voices, One Mozilla’ looks like. When we understand this, we can improve it, for the benefit of the project, and the individuals who care about the mission and want to find ways into the project so they can make a difference.

While legal processes define ‘Corporations’ and ‘Foundations’, the people who volunteer and contribute rarely give a **** about which org ‘owns’ the project they’re contributing too; they just want to build a better internet. Mozilla is bigger than the legal entities. And the legal entities are not what Mozilla is about, they are just one of the necessities to making it work.

So the org dashboards, and team dashboards we’re building can help us day-to-day with tactical and strategic decisions, but we always need to keep them in the context of the bigger picture. Even if the big picture takes a while to download.

Here’s to more cross org collaboration.

Want to read more?

Contributor Dashboard Status Update (‘busy work’?)

While I’m always itching to get on with doing the work that needs doing, I’ve spent this morning writing about it instead. Part of me hates this, but another realizes this is valuable. Especially when you’re working remotely and the project status in your head is of no use to your colleagues scattered around the globe.

So here’s the updated status page on our Mozilla Foundation Contributor Dashboard, and some progress on my ‘working open‘.

Filing bugs, linking them to each other, and editing wiki pages can be tedious work (especially wiki tables that link to bugs!) but the end result is very helpful, for me as well as those following and contributing to the project.

And a hat-tip to Pierros, whose hard-work on the project Baloo wiki page directly inspired the formatting here.

Now, back to doing! 🙂

Progress on Contributor Dashboard(s)

Latest dashboard screenshotWe’re seeing real progress getting data into the MoFo Contributor Dashboard now, but we need to keep in mind that counting existing contributors and engaging new contributors are two separate tasks that will help us move this line upwards.

The gains we are seeing right now are counting gains rather than contribution gains.

Getting this dashboard fully populated with our existing contributor numbers will be an achievement, but growing our contributor numbers is the real end goal of this work.

40-50% done?

Using our back-of-the-napkin numbers from 2013 as a guide the current data sources shown on the dashboard today capture about 40% of the numbers we’re expecting to see here. Depending on how good our estimates were, and how many new contributors have joined in Q1, we expect this will be near the 5k mark by the time it’s all hooked up.

Ad-hoc contribution?

AdhoctributionWe think about 10% of the contribution we want to count doesn’t currently exist in any database, so I’ve written a simple tool for logging this activity which will feed into this dashboard automatically.

MoFo staff can play with this tool now to test out the UX, but we can’t start logging data with this until it’s been through a security and privacy review (as this will store email addresses). Follow progress on the security review here.

The next biggest pot of data? Badges.

badgesA significant chunk of the remaining 50% of current contributors we want to count, and a significant number of the new contributors we expect to engage this year will be acknowledged by issuing badges.

This starts with Webmaker Super Mentor badges that will be issued through Badgekit.

My next task here is to work with the Badges team to expose a metrics API endpoint that lets us count the number of people we issue particular badges too.  Initial thinking on this is here.

Along with the tooling to hook up the data, the badges will also need to be designed and issued before these contributors are counted.

Tracking progress?

The dashboard status wiki page is the best place to track current progress about data sources being added (this is linked from the top of the live dashboard). Also within that wiki page is a link to a Working Document with even more detail.

Trending with caution?

For some of the things we’re displaying on this dashboard, we are logging them now for the very first time, even if contributors may have been contributing for a while. This will unavoidably skew the graph in the short term.

This means our latest Active Contributor numbers will be meaningful, but the rate at which it increases may be misleading while we start counting some of these things for the first time.

Initial upward trends may slow as we finish logging existing contributors for the first time, though this may also be balanced out by the real increases that will occur as we engage new contributors.

What’s coming next?

Today, I’m working on front-end dashboards for each of the MoFo teams which may be more helpful on a week-to-week basis in team meetings.

This is only interim?

This dashboard is starting to look useful. For instance, we have the beginnings of our first contributor trend line (and the good news is it’s going upwards). But this is still only our interim solution.

We are adding up counts from a number of different data-sources but not joining up the data. This data is not de-duped and when the data sources are all joined up we will need to add potential margin-of-error warnings to the final graph (the real numbers will be smaller than our graph).

This interim solution is a deliberate decision because it’s more useful to have non-de-duped data now to see trends quick enough to inform our plans for the year than it is to have everything joined up perfectly too late in the year to adjust our activities.

In parallel to this interim solution, we are working with MoCo on Project Baloo which will allow us to de-dupe contributors for more accurate counting in the long run.

Want to know more? (Or get involved)

Here are the components that we have wired together so far:

More updates to follow.

Who’s teaching this thing anyway?

This is an idea for Webmaker teacher dashboards, and some thoughts on metadata related to learning analytics

This post stems from a few conversations around metrics for Webmaker and learning analytics and it proposes some potential product features which need to be challenged and considered. I’m sharing the idea here as it’s easy to pass this around, but this is very much just an idea right now.

For context, I’m approaching this from a metrics perspective, but I’m trying to solve the data gathering challenge by adding value for our users rather than asking them to do any extra work.

These are the kind of questions I want us to be able to answer

and that can inform future decision making in a positive way…

  • How many people using Webmaker tools are mentors, students, or others?
  • Do mentors teach many times?
  • How many learners go on to become mentors?
  • What size groups do mentors typically work with?
  • How many mentors teach once, and then never again? (their feedback would be very useful)
  • How many learners come back to Webmaker tools several days after a lesson?
  • Which partnership programme reached the greatest number of learners?

And the particularly tricky area…

  • What data points show developing competencies in Web Literacy?

Flexible and organic data points to suit the Webmaker ecosystem

The Webmaker suite of tools are very open and flexible and as a result get used by people for many different things. Which personally, I like a lot. However, this also makes understanding our users more difficult.

When looking at the data, how can we tell if a new Thimble Make has come from a teacher, a student, or even an experienced web developer who works at Mozilla and is using the tool to publish their birthday wishes to the web? The waters here are muddy.

We need a few additional indicators in the data to analyze it in a meaningful way, but these indicators have to work with the informal teaching models and practices that exist in the Webmaker ecosystem.

On the grounds that everyone has both something to teach and to learn, and that we want trainers to train trainers and so on, I propose that asking people to self-identify as mentors via a survey/check-box/preferences/etc will not yield accurate flags in the data.

The journey to identifying yourself as a mentor is personal and complex, and though that process is immensely interesting, there are simpler things we can measure.

The simplest measure is that someone who teaches something is a teacher. That sounds obvious, but it’s very slightly different from someone who thinks of themselves as a teacher.

If we build a really useful tool for teaching (I’m suggesting one idea below) and its use identifies Webmaker accounts as teacher(s) and/or learner(s) then we’d have useful metadata to answer almost all of those questions asked above.

When we know who the learners are we can better understand what learning looks like in terms of data (a crucial step in conversations about learning analytics).

If anyone can use this proposed tool as part of their teaching process, and students can engage with it as students. Then anyone can teach, or attend a lesson in any order without having to update their account records to say “I first attended a Maker Party, then I taught a session on remixing for the web, and now I’m learning about CSS and next I want to teach about Privacy”.

A solution like this doesn’t need 100% use by all teachers and learners to be useful (which helps the solution remain flexible if it doesn’t suit). It just needs enough people to use it to use it that we have a meaningful sample of Webmaker teachers and learners flagged in the database.

With a decent sample we can see what teaching with Webmaker looks like at scale. And with this kind of data, continually improve the offering.

An idea: ‘Teacher Lesson Dashboards’

I think Teacher Lesson Dashboards would catch the metadata we need, and I’ll sketch this out here. Don’t get stuck on any naming I’ve made up right now, the general process for the teacher and the learner is the main thing to consider.

1. Starting with a teacher/mentor

User logs in to Webmaker.org

Clicks an option to “Create a new Lesson”

Gets an interface to ‘build-up’ a Lesson (a curation exercise)

Adds starter makes to the lesson (by searching for their own and/or others makes)

e.g. A ‘Lesson’ might include:

  • A teaching kit with discussion points, and a link to X-ray goggles demo
  • A thimble make for students to remix
  • A (deliberately) broken thimble make for students to try and debug
  • A popcorn make to remix and report back what they have learned

They give their lesson a name

Add optional text and an image for the lesson

Save their new Lesson, and get a friendly short URL

Then point students to this at the beginning of the teaching session

2. The learner(s) then…

Go the URL the mentor provides

Optionally, check-in to the lesson (and create a Webmaker account at the same time if required)

Have all the makes and activities they need in one place to get started

One click to view or remix any make in the Lesson

Can reference any written text to support the lesson

3. Then, going back to the mentor

Each ‘Lesson’ also has a dashboard showing:

  • Who has checked-in to the lesson
    • with quick links to their most recent makes
    • links to their public profile pages
    • Perhaps integrating together.js functionality if you’re running a lesson remotely?
  • Metrics that help with teaching (this is a whole other conversation, but it depends first on being able to identify who is teaching who)
  • Feedback on future makes created after the lesson (i.e. look what your session led to further down the line)

4. And to note…

‘Lessons’ as a kind of curated make, can also me remixed and shared in some way.

Useful?

I’m not on the front-lines using the tools right now, so this is a proposal very much from a person who wants flags in a database 🙂

  • Does this feel like it adds value to mentors and/or learners?
  • Do you think is a good way to identify who’s teaching and who’s learning? (and who’s doing both of course)

 

What I see in these graphs of Github contribution

Context: Last week I shared a few graphs (1, 2, 3, 4) looking at data from our repositories on Github, extracted using this Gitribution app thing, as part of our work to dashboard contributor numbers for the Mozilla Foundation.

I didn’t comment on the graphs at the time because I wanted time for others to look at them without my opinions skewing what they might see. This follow up post is a walk-through of some things I see in the graphs/data.

The real value in looking at data is finding ways to make things better by challenging ourselves, and being honest about what the numbers show, so this will be as much about questions as answers…

Also, publishing this last week flagged up some missing repositories and identified some other members of staff so these graphs are based on the latest version of the data (there was no impact on shapes, but some numbers will be different).

What time of day do people contribute (UTC)?

By Hour of DayOur paid staff who are committing code are mostly in US/Canadian timezones and it make sense that most of their commits are during these hours (graphed by UTC). But, what caught my attention here is that the volunteer contribution times follow the same shape.

Questions to ask:

  • Do volunteer contributions follow the same shape because contributing code has a dependency on being able to talk in real time with staff? For example in IRC. If so, is this a bottleneck for contributing code?
  • If not, what is creating this shape for volunteer contributors? Perhaps it’s biased to timezones where more people are interested in the things we are building, and potentially biased by language? But looking at support for Maker Party and other activities there is a global audience for our tools.
  • What does a code contribution pathway look like for people in the 0300-1300UTC times? Is there anything we can do to make things easier or more appealing?

The shape of volunteer contributions

ShapeThe shape of this graph is pretty typical for any kind of volunteering or activity involving human interactions. It’s close to a power law graph with a long-tail.

If you’ve not looked at a data set like this before, don’t panic that so many people only make a single contribution. At the same time, don’t use the knowledge that this is typical not to ask questions about how we can be better.

Lots of people want to get involved in volunteering projects but often their good intentions don’t align with their actual available free time. I say this as someone who signs up for more things than fit into my available hours for personal projects.

The two questions I want to ask of this graph are:

  1. Where could our efforts to support contributors best influence the overall shape?
  2. What does this look like at 10 x scale?

So, starting with where we could influence shape… my opinion (no data here) says to think about people in this range.Shape HighlightTo the left of this highlighted area people are already making code contributions over and above even many staff. Shower them in endless gratitude! But I don’t think they don’t need practical help from us.  To the right of this highlighted area is the natural long tail. Supporting that bigger group of people for single-touch interactions is about clear documentation and easy to follow processes. But I think the group of people roughly highlighted in that graph are people we can reach out to. These people potentially have capacity to do more. We should find out what they are interested in, what they want to get out of contribution and build relationships with them. In practical terms, we have finite time to invest in direct relationships with contributors. I think this is an effective place to invest some of that time.

I think the second question is more  challenging. What does this look like at 10 x scale?

In 2013, ~50 people made a one-time contribution.

  • What do we need in place for 500 people to make a one-time code contribution?
  • Do we have 500 suitable ‘first’ bugs for 2014?
  • Is the amount of setup work required to contribute to our tools appropriate for people making a single contribution?
  • If not, is that a blocker to growing contributor numbers?

In 2013, there were ~1,500 code commits by volunteers.

  • What do we need in place for 15,000 activities on top of planned staff activity?
  • How does this much activity align towards a common product roadmap?
  • How is it scheduled, allocated, reviewed and shipped?

When planning to work with 10 x contributor numbers, possibly the biggest shift to consider is the ratio of staff to volunteers:

ContributorRatio

  • How does impact on time allocated for code reviews?
  • How do we write bugs?
  • How we prioritize bugs? Etc.
  • Even, what does an IRC channel or a dev maling list look like after this change?

Other questions to ask:

  • What do we think is the current ‘ceiling’ on our contributor numbers for people writing code?
    • Is it the number of developers who know about our tools and want to help? (i.e. a ‘marketing’ challenge to inspire more people)
    • Is it the amount of suitable work ready and available for people who want to help? (are we losing people who want to help because it’s too hard to get involved?)
    • Both? With any bias?

 What do you think?

I’m only one set of eyes on this, so please challenge my observations and feel free to build on this too.

Also, as the data in here is publicly accessible already I think I can publish this Tableau view as an interactive tool you can play with, but I need to check the terms first.

Gitribution

Click to embiggen. This was a check to see how well being a member of the github organisation flags someone as being staff.
Click to embiggen. How well does being a member of a github organisation flag someone as being staff?

Over the last week or so I’ve been building a thing: Gitribution. It’s an attempt to understand contributions to Mozilla Foundation work that happen on Github. It’s not perfect yet, but it’s in a state to get feedback on now.

Why did I build this?

For these reasons (in this order):

  1. Counting: To extract counts of contributor numbers from Github across Foundation projects on an automated ongoing basis
  2. Testing: To demo the API format we need for other sources of data to power our interim contributor dashboard
  3. Learning: To learn a bit about node.js so I can support metrics work on other projects more directly when it’s helpful to (i.e. submitting pull-requests rather than just opening bugs)

1. Counting

The data in this tool is all public data from the Github API, but it’s been restructured so it can be queried in ways that answer questions specific to our goals this year, and has some additional categorization of repositories to query against individual teams. The Github API on it’s own couldn’t answer our questions directly.

This also gives me data in a format that can be explored visually in Tableau (I’ll share this in a follow up blog post). We can now count Github contributors, and also analyze contributions.

2. Testing

Part of our interim dashboard plans include a standard format for reporting on numbers of active and new contributors for a given activity. Building this tool was a way to test if that format makes sense. The output is an API that you can ping with a date and see:

  1. The number of unique usernames to contribute in the 12 months prior (excluding those users who are members of the Github organization that owns the repositories – ie Mozilla or openNews)
  2. The number of those who only contributed in the 7 days prior (i.e. new contributors)

You can test the API here (change the date, or the team names – currently webmaker, openbadge, openNews)

We can use this in the dashboard soon.

Learning

I know a lot more about node.js than I did last week. So that’s something 🙂

I started out writing this as though it was a python app using JavaScript syntax before grasping the full implications of node’s non-blocking model.

I descended into what I later found out is called callback hell and felt much better when I learned that callback hell is a shared experience!

I tried an extreme escape from callback hell by re-building the app in a fire-and-forget process that kicked off several thousand pings to the Github API and paid no attention to whether or not they succeeded (clearly not a winning solution).

And I’ve ended up with something that isn’t too hellish but uses callbacks to manage the process flow. The current process is pretty linear, so I was able to sense check what it’s doing but it also works mostly on one task at a time so isn’t getting the potential value out of node’s non-blocking model.

Next steps

  • Tweaks to the categorization of ‘members=staff’
    • See the attached image of contributions by username. There are some members of staff with many contributions who are not members of Mozilla on Github. This is not material when counting number of contributors in relation to targets, but when we analyze contribution activity those users with a lot of contributions skew the data significantly.
  • Check and correct the list of repos assigned to each team
    • Currently a best guess based on my limited knowledge and some time trawling through all the repos on the main Mozilla Github page
  • Work out how to use this with Science Lab projects
    • as Software Carpentry use Github as part of their training (which I love) it means the data in their account doesn’t represent the same kinds of activities in the other repos. I need to think about this.
  • Pick the brains of my knowledgeable colleagues and get a review of this code

What else is this good for?

The immediate value of working in the open

I’m both excited and a tiny bit nervous about how “open” Mozilla are about the way they work.

As I’m getting to know the Foundation, and the projects and priorities, and to make sense of what exactly I’ll be doing here I’ve been reading lots of Etherpads. If you don’t know what an Etherpad is, it’s a bit like a Google Doc (the ‘word doc’ variety) but less slick and more open. If you give someone a link to an Etherpad, the barrier to them contributing to the document is almost non-existent.

Anyway, the value of this open working process somewhat blew my mind today. While lots of these docs have been useful in a general sense, today I read the documents from the initial planning around MoFo metrics that led to recruiting for my role (so it was pretty relevant!). The final document is a fine and very useful thing, like most summary documents, but what was really useful was the option to ‘replay’ the creation of the doc.

As I watched it being outlined, revised and then shared for comment I was able to see many of my new colleagues jump in to add the points they care most about and to challenge and contribute to the rest of the document.  Better than just seeing the final document is seeing how it started and where it changed direction and who was involved.

Even watching the hesitations and re-phrasing of particular sentences tells you something about where the subtleties and complications in the process exist.

I probably spent 15-20 minutes watching the replay of the writing of this document, and think I got more indirect information than I would otherwise pick up in even two-weeks of introduction meetings.

There is so much information in the history of that document that would be lost in any closed system. If for example, that document was a PDF on an intranet behind a password, the value I would have gotten from it as a new employee today would have be greatly reduced.