What I see in these graphs of Github contribution

Context: Last week I shared a few graphs (1, 2, 3, 4) looking at data from our repositories on Github, extracted using this Gitribution app thing, as part of our work to dashboard contributor numbers for the Mozilla Foundation.

I didn’t comment on the graphs at the time because I wanted others to look at them without my opinions skewing what they might see. This follow-up post is a walk-through of some things I see in the graphs/data.

The real value in looking at data is finding ways to make things better by challenging ourselves, and being honest about what the numbers show, so this will be as much about questions as answers…

Also, publishing this last week flagged up some missing repositories and identified some other members of staff so these graphs are based on the latest version of the data (there was no impact on shapes, but some numbers will be different).

What time of day do people contribute (UTC)?

[Graph: By Hour of Day]

Our paid staff who are committing code are mostly in US/Canadian timezones, and it makes sense that most of their commits are during these hours (graphed by UTC). But what caught my attention here is that the volunteer contribution times follow the same shape.
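For anyone who wants to reproduce this kind of cut from their own commit data, here is a minimal sketch of bucketing commit timestamps by UTC hour. It is not the code behind the graph above: the function name `commits_by_utc_hour` and the sample timestamps are made up for illustration, and it assumes each timestamp is an ISO 8601 string that carries a timezone offset.

```python
from collections import Counter
from datetime import datetime, timezone


def commits_by_utc_hour(timestamps):
    """Count commits per UTC hour (0-23) from ISO 8601 timestamp strings.

    Assumes every timestamp carries an offset (e.g. "Z" or "-07:00").
    """
    counts = Counter()
    for ts in timestamps:
        # Older Python versions do not accept a trailing "Z", so normalise it.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        # Convert to UTC before taking the hour, so local offsets are removed.
        counts[dt.astimezone(timezone.utc).hour] += 1
    # Return a fixed-length list so empty hours show up as zero.
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical sample data, not taken from the real repositories.
sample = [
    "2013-06-01T17:05:00Z",
    "2013-06-01T10:30:00-07:00",  # 17:30 UTC
    "2013-06-02T02:15:00+00:00",
]
hourly = commits_by_utc_hour(sample)
```

Running the same function separately over staff and volunteer commits would give the two series to compare by shape.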

Questions to ask:

  • Do volunteer contributions follow the same shape because contributing code has a dependency on being able to talk in real time with staff? For example in IRC. If so, is this a bottleneck for contributing code?
  • If not, what is creating this shape for volunteer contributors? Perhaps it’s biased to timezones where more people are interested in the things we are building, and potentially biased by language? But looking at support for Maker Party and other activities there is a global audience for our tools.
  • What does a code contribution pathway look like for people in the 0300-1300 UTC times? Is there anything we can do to make things easier or more appealing?

The shape of volunteer contributions

[Graph: Shape]

The shape of this graph is pretty typical for any kind of volunteering or activity involving human interactions. It’s close to a power law graph with a long tail.

If you’ve not looked at a data set like this before, don’t panic that so many people only make a single contribution. At the same time, don’t use the knowledge that this is typical not to ask questions about how we can be better.
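As a rough illustration of how a long-tail shape like this can be derived, here is a small sketch that counts contributions per person and then counts how many people made each number of contributions. The author names and the function `contribution_distribution` are hypothetical, and the sample list is invented rather than drawn from our data.

```python
from collections import Counter


def contribution_distribution(authors):
    """Summarise a list of contribution authors (one entry per contribution).

    Returns:
      ranked: contribution counts per person, largest first (the long-tail curve)
      people_by_count: mapping of "number of contributions" -> "number of people"
    """
    per_person = Counter(authors)
    ranked = sorted(per_person.values(), reverse=True)
    people_by_count = dict(Counter(per_person.values()))
    return ranked, people_by_count


# Hypothetical sample: one entry per contribution, labelled by author.
sample_authors = ["ana", "ana", "ana", "ana", "ben", "ben", "cat", "dev", "eli"]
ranked, people_by_count = contribution_distribution(sample_authors)

# How many people made exactly one contribution (the tail of the graph).
one_time = people_by_count.get(1, 0)
```

Plotting `ranked` as a bar chart gives the left-heavy curve with a long tail; `one_time` is the size of the single-contribution group.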

Lots of people want to get involved in volunteering projects but often their good intentions don’t align with their actual available free time. I say this as someone who signs up for more things than fit into my available hours for personal projects.

The two questions I want to ask of this graph are:

  1. Where could our efforts to support contributors best influence the overall shape?
  2. What does this look like at 10 x scale?

So, starting with where we could influence shape… my opinion (no data here) says to think about people in this range.

[Graph: Shape Highlight]

To the left of this highlighted area, people are already making code contributions over and above even many staff. Shower them in endless gratitude! But I don’t think they need practical help from us.

To the right of this highlighted area is the natural long tail. Supporting that bigger group of people for single-touch interactions is about clear documentation and easy-to-follow processes.

But I think the group of people roughly highlighted in that graph are people we can reach out to. These people potentially have capacity to do more. We should find out what they are interested in and what they want to get out of contribution, and build relationships with them. In practical terms, we have finite time to invest in direct relationships with contributors. I think this is an effective place to invest some of that time.

I think the second question is more challenging. What does this look like at 10 x scale?

In 2013, ~50 people made a one-time contribution.

  • What do we need in place for 500 people to make a one-time code contribution?
  • Do we have 500 suitable ‘first’ bugs for 2014?
  • Is the amount of setup work required to contribute to our tools appropriate for people making a single contribution?
  • If not, is that a blocker to growing contributor numbers?

In 2013, there were ~1,500 code commits by volunteers.

  • What do we need in place for 15,000 activities on top of planned staff activity?
  • How does this much activity align towards a common product roadmap?
  • How is it scheduled, allocated, reviewed and shipped?

When planning to work with 10 x contributor numbers, possibly the biggest shift to consider is the ratio of staff to volunteers:

[Graph: Contributor Ratio]

  • How does this impact the time allocated for code reviews?
  • How do we write bugs?
  • How do we prioritize bugs? Etc.
  • Even, what does an IRC channel or a dev mailing list look like after this change?

Other questions to ask:

  • What do we think is the current ‘ceiling’ on our contributor numbers for people writing code?
    • Is it the number of developers who know about our tools and want to help? (i.e. a ‘marketing’ challenge to inspire more people)
    • Is it the amount of suitable work ready and available for people who want to help? (are we losing people who want to help because it’s too hard to get involved?)
    • Both? With any bias?

What do you think?

I’m only one set of eyes on this, so please challenge my observations and feel free to build on this too.

Also, as the data in here is publicly accessible already I think I can publish this Tableau view as an interactive tool you can play with, but I need to check the terms first.

Contribution Graphs part 4: Contributions by Contributors over time

I’m posting a quick series of these without much comment on my part as I’d love to know what you see in each of them.

This is looking at activity in Github (commits and issues), for the repositories listed here. It’s an initial dive into the data, so don’t be afraid to ask questions of it, or request other cuts of this. In the not so distant future, we’ll be able to look at this kind of data across our combined contribution activities, so this is a bit of a taster.

Click for the full-size images.

Contributions by Contributors over time

Last but not least for today, I think there are some stories in this one…

[Graph: Contributions by Contributors over Time]

Is anything here a surprise? What do you see in this?

Contribution Graphs part 3: Distribution of contributions

Distribution of contributions (excluding staff work)

Here are a couple of ways of visualizing this same data.

[Graphs: Distribution 2, Distribution 1]

Is anything here a surprise? What do you see in this?

Contribution Graphs part 2: By hour of the day

By hour of the day

[Graph: By hour of the day]

Is anything here a surprise? What do you see in this?

Contribution Graphs part 1: Contributions over time

Contributions over time

[Graph 1: Combined over time]

Broken down by teams

[Graph 2: By team]

Broken down further by repository

[Graph 3: By repo]

Is anything here a surprise? What do you see in this?