Progress on Contributor Dashboard(s)

Latest dashboard screenshot

We’re seeing real progress getting data into the MoFo Contributor Dashboard now, but we need to keep in mind that counting existing contributors and engaging new contributors are two separate tasks that will help us move this line upwards.

The gains we are seeing right now are counting gains rather than contribution gains.

Getting this dashboard fully populated with our existing contributor numbers will be an achievement, but growing our contributor numbers is the real end goal of this work.

40-50% done?

Using our back-of-the-napkin numbers from 2013 as a guide, the current data sources shown on the dashboard today capture about 40% of the numbers we’re expecting to see here. Depending on how good our estimates were, and how many new contributors have joined in Q1, we expect this will be near the 5k mark by the time it’s all hooked up.

Ad-hoc contribution?

Adhoctribution

We think about 10% of the contribution we want to count doesn’t currently exist in any database, so I’ve written a simple tool for logging this activity which will feed into this dashboard automatically.

MoFo staff can play with this tool now to test out the UX, but we can’t start logging data with this until it’s been through a security and privacy review (as this will store email addresses). Follow progress on the security review here.

The next biggest pot of data? Badges.

badges

A significant chunk of the remaining 50% of current contributors we want to count, and a significant number of the new contributors we expect to engage this year, will be acknowledged by issuing badges.

This starts with Webmaker Super Mentor badges that will be issued through Badgekit.

My next task here is to work with the Badges team to expose a metrics API endpoint that lets us count the number of people we issue particular badges to. Initial thinking on this is here.

Along with the tooling to hook up the data, the badges will also need to be designed and issued before these contributors are counted.

Tracking progress?

The dashboard status wiki page is the best place to track current progress about data sources being added (this is linked from the top of the live dashboard). Also within that wiki page is a link to a Working Document with even more detail.

Trending with caution?

For some of the things we’re displaying on this dashboard, we are logging them now for the very first time, even if contributors may have been contributing for a while. This will unavoidably skew the graph in the short term.

This means our latest Active Contributor numbers will be meaningful, but the rate at which they increase may be misleading while we start counting some of these things for the first time.

Initial upward trends may slow as we finish logging existing contributors for the first time, though this may also be balanced out by the real increases that will occur as we engage new contributors.

What’s coming next?

Today, I’m working on front-end dashboards for each of the MoFo teams which may be more helpful on a week-to-week basis in team meetings.

This is only interim?

This dashboard is starting to look useful. For instance, we have the beginnings of our first contributor trend line (and the good news is it’s going upwards). But this is still only our interim solution.

We are adding up counts from a number of different data-sources but not joining up the data. This data is not de-duped and when the data sources are all joined up we will need to add potential margin-of-error warnings to the final graph (the real numbers will be smaller than our graph).
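To make the gap between summed and de-duped counts concrete, here is a minimal Python sketch. The source names and usernames are invented for illustration, and it assumes the same identifier for a person appears in every source, which is exactly what the interim dashboard cannot yet rely on.

```python
# Hypothetical per-source contributor identifiers (not real data).
sources = {
    "github": {"alice", "bob", "carol"},
    "badges": {"bob", "dave"},
    "adhoc": {"alice", "erin"},
}


def summed_count(sources):
    """Interim approach: add up each source's own count."""
    return sum(len(people) for people in sources.values())


def deduped_count(sources):
    """Joined-up approach: count each person once across all sources."""
    return len(set().union(*sources.values()))


summed = summed_count(sources)
unique = deduped_count(sources)
overcount = summed - unique  # people counted more than once
print(summed, unique, overcount)
```

In this toy data two people appear in more than one source, so the summed figure overstates the real total. The size of that gap on the live data is what a margin-of-error warning would need to communicate.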

This interim solution is a deliberate decision because it’s more useful to have non-de-duped data now to see trends quickly enough to inform our plans for the year than it is to have everything joined up perfectly too late in the year to adjust our activities.

In parallel to this interim solution, we are working with MoCo on Project Baloo which will allow us to de-dupe contributors for more accurate counting in the long run.

Want to know more? (Or get involved)

Here are the components that we have wired together so far:

More updates to follow.

A quick update on the interim contributor dashboard

I’ve just updated the main wiki page tracking our contributor dashboard project, so I won’t repeat everything here.

The quick update is that the puzzle pieces that will make our interim contributor dashboard work are coming together now.

This means we have a live dashboard front-end (with a few data-holes we need to plug!). This screenshot is just data from Github.

It may be missing loads of data, but what’s there is real and updating automatically :)

Let’s gather some more numbers…

Who’s teaching this thing anyway?

This is an idea for Webmaker teacher dashboards, and some thoughts on metadata related to learning analytics

This post stems from a few conversations around metrics for Webmaker and learning analytics and it proposes some potential product features which need to be challenged and considered. I’m sharing the idea here as it’s easy to pass this around, but this is very much just an idea right now.

For context, I’m approaching this from a metrics perspective, but I’m trying to solve the data gathering challenge by adding value for our users rather than asking them to do any extra work.

These are the kind of questions I want us to be able to answer

and that can inform future decision making in a positive way…

  • How many people using Webmaker tools are mentors, students, or others?
  • Do mentors teach many times?
  • How many learners go on to become mentors?
  • What size groups do mentors typically work with?
  • How many mentors teach once, and then never again? (their feedback would be very useful)
  • How many learners come back to Webmaker tools several days after a lesson?
  • Which partnership programme reached the greatest number of learners?

And the particularly tricky area…

  • What data points show developing competencies in Web Literacy?

Flexible and organic data points to suit the Webmaker ecosystem

The Webmaker suite of tools is very open and flexible and as a result gets used by people for many different things. Personally, I like that a lot. However, this also makes understanding our users more difficult.

When looking at the data, how can we tell if a new Thimble Make has come from a teacher, a student, or even an experienced web developer who works at Mozilla and is using the tool to publish their birthday wishes to the web? The waters here are muddy.

We need a few additional indicators in the data to analyze it in a meaningful way, but these indicators have to work with the informal teaching models and practices that exist in the Webmaker ecosystem.

On the grounds that everyone has both something to teach and to learn, and that we want trainers to train trainers and so on, I propose that asking people to self-identify as mentors via a survey/check-box/preferences/etc will not yield accurate flags in the data.

The journey to identifying yourself as a mentor is personal and complex, and though that process is immensely interesting, there are simpler things we can measure.

The simplest measure is that someone who teaches something is a teacher. That sounds obvious, but it’s very slightly different from someone who thinks of themselves as a teacher.

If we build a really useful tool for teaching (I’m suggesting one idea below) and its use identifies Webmaker accounts as teacher(s) and/or learner(s) then we’d have useful metadata to answer almost all of those questions asked above.

When we know who the learners are we can better understand what learning looks like in terms of data (a crucial step in conversations about learning analytics).

If anyone can use this proposed tool as part of their teaching process, and students can engage with it as students, then anyone can teach or attend a lesson in any order without having to update their account records to say “I first attended a Maker Party, then I taught a session on remixing for the web, and now I’m learning about CSS and next I want to teach about Privacy”.

A solution like this doesn’t need 100% use by all teachers and learners to be useful (which helps the solution remain flexible if it doesn’t suit). It just needs enough people to use it that we have a meaningful sample of Webmaker teachers and learners flagged in the database.

With a decent sample we can see what teaching with Webmaker looks like at scale. And with this kind of data, continually improve the offering.
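As a rough illustration of how tool use could flag accounts without asking anyone to self-identify, here is a minimal Python sketch. The event log, the action names `created_lesson` and `checked_in`, and the account names are all hypothetical; nothing like this exists in Webmaker today.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (account, action, lesson_id).
events = [
    ("ana", "created_lesson", "L1"),
    ("ben", "checked_in", "L1"),
    ("cy", "checked_in", "L1"),
    ("ben", "created_lesson", "L2"),
    ("ana", "checked_in", "L2"),
]


def infer_roles(events):
    """Flag an account as teacher and/or learner from what it actually did."""
    roles = defaultdict(set)
    for account, action, _lesson in events:
        if action == "created_lesson":
            roles[account].add("teacher")
        elif action == "checked_in":
            roles[account].add("learner")
    return dict(roles)


roles = infer_roles(events)

# Accounts that have both taught and learned, in any order.
both = sorted(a for a, r in roles.items() if r == {"teacher", "learner"})

# Group size per lesson, counted from check-ins.
group_sizes = Counter(lesson for _, action, lesson in events if action == "checked_in")
print(roles, both, group_sizes)
```

Because roles are sets built up from activity, someone can be a learner one week and a teacher the next without editing a profile, which is the flexibility described above.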

An idea: ‘Teacher Lesson Dashboards’

I think Teacher Lesson Dashboards would catch the metadata we need, and I’ll sketch this out here. Don’t get stuck on any naming I’ve made up right now; the general process for the teacher and the learner is the main thing to consider.

1. Starting with a teacher/mentor

User logs in to Webmaker.org

Clicks an option to “Create a new Lesson”

Gets an interface to ‘build-up’ a Lesson (a curation exercise)

Adds starter makes to the lesson (by searching for their own and/or others’ makes)

e.g. A ‘Lesson’ might include:

  • A teaching kit with discussion points, and a link to X-ray goggles demo
  • A thimble make for students to remix
  • A (deliberately) broken thimble make for students to try and debug
  • A popcorn make to remix and report back what they have learned

They give their lesson a name

Add optional text and an image for the lesson

Save their new Lesson, and get a friendly short URL

Then point students to this at the beginning of the teaching session

2. The learner(s) then…

Go to the URL the mentor provides

Optionally, check-in to the lesson (and create a Webmaker account at the same time if required)

Have all the makes and activities they need in one place to get started

One click to view or remix any make in the Lesson

Can reference any written text to support the lesson

3. Then, going back to the mentor

Each ‘Lesson’ also has a dashboard showing:

  • Who has checked-in to the lesson
    • with quick links to their most recent makes
    • links to their public profile pages
    • Perhaps integrating together.js functionality if you’re running a lesson remotely?
  • Metrics that help with teaching (this is a whole other conversation, but it depends first on being able to identify who is teaching who)
  • Feedback on future makes created after the lesson (i.e. look what your session led to further down the line)

4. And to note…

‘Lessons’, as a kind of curated make, can also be remixed and shared in some way.

Useful?

I’m not on the front-lines using the tools right now, so this is a proposal very much from a person who wants flags in a database :)

  • Does this feel like it adds value to mentors and/or learners?
  • Do you think this is a good way to identify who’s teaching and who’s learning? (and who’s doing both of course)

 

What I see in these graphs of Github contribution

Context: Last week I shared a few graphs (1, 2, 3, 4) looking at data from our repositories on Github, extracted using this Gitribution app thing, as part of our work to dashboard contributor numbers for the Mozilla Foundation.

I didn’t comment on the graphs at the time because I wanted time for others to look at them without my opinions skewing what they might see. This follow up post is a walk-through of some things I see in the graphs/data.

The real value in looking at data is finding ways to make things better by challenging ourselves, and being honest about what the numbers show, so this will be as much about questions as answers…

Also, publishing this last week flagged up some missing repositories and identified some other members of staff so these graphs are based on the latest version of the data (there was no impact on shapes, but some numbers will be different).

What time of day do people contribute (UTC)?

By Hour of Day

Our paid staff who are committing code are mostly in US/Canadian timezones and it makes sense that most of their commits are during these hours (graphed by UTC). But what caught my attention here is that the volunteer contribution times follow the same shape.

Questions to ask:

  • Do volunteer contributions follow the same shape because contributing code has a dependency on being able to talk in real time with staff? For example in IRC. If so, is this a bottleneck for contributing code?
  • If not, what is creating this shape for volunteer contributors? Perhaps it’s biased to timezones where more people are interested in the things we are building, and potentially biased by language? But looking at support for Maker Party and other activities there is a global audience for our tools.
  • What does a code contribution pathway look like for people in the 0300-1300 UTC times? Is there anything we can do to make things easier or more appealing?
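For anyone wanting to reproduce this cut of the data, the bucketing by UTC hour can be sketched in a few lines of Python. The timestamps below are invented, and the sketch assumes each one carries its own UTC offset.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical commit timestamps in ISO 8601 with UTC offsets.
timestamps = [
    "2013-11-04T09:15:00-05:00",  # 14:15 UTC
    "2013-11-04T14:40:00+00:00",  # 14:40 UTC
    "2013-11-05T20:05:00-08:00",  # 04:05 UTC the next day
]


def commits_by_utc_hour(timestamps):
    """Count contributions per hour of day after converting to UTC."""
    hours = Counter()
    for ts in timestamps:
        hour = datetime.fromisoformat(ts).astimezone(timezone.utc).hour
        hours[hour] += 1
    return hours


by_hour = commits_by_utc_hour(timestamps)

# Contributions falling in the quieter 0300-1300 UTC stretch.
quiet_hours = sum(by_hour[h] for h in range(3, 13))
print(by_hour, quiet_hours)
```

Running the same function separately over staff and volunteer timestamps would give the two shapes being compared in the graph.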

The shape of volunteer contributions

Shape

The shape of this graph is pretty typical for any kind of volunteering or activity involving human interactions. It’s close to a power law graph with a long-tail.

If you’ve not looked at a data set like this before, don’t panic that so many people only make a single contribution. At the same time, don’t use the knowledge that this is typical as a reason not to ask questions about how we can be better.
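The long-tail shape comes from counting contributions per person and then counting how many people sit at each level. A minimal Python sketch with made-up usernames and counts (not the real Github data):

```python
from collections import Counter

# Hypothetical contribution log: one username per contribution.
contributions = ["ann"] * 40 + ["raj"] * 6 + ["lee"] * 3 + ["kim", "joe", "sam", "ivy"]

# Contributions per person.
per_person = Counter(contributions)

# How many people made exactly n contributions?
distribution = Counter(per_person.values())

one_time = distribution[1]
top_share = per_person.most_common(1)[0][1] / len(contributions)
print(distribution, one_time, round(top_share, 2))
```

Even in this tiny example most people contribute once while one person accounts for the bulk of the activity, which is the pattern to expect rather than a warning sign.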

Lots of people want to get involved in volunteering projects but often their good intentions don’t align with their actual available free time. I say this as someone who signs up for more things than fit into my available hours for personal projects.

The two questions I want to ask of this graph are:

  1. Where could our efforts to support contributors best influence the overall shape?
  2. What does this look like at 10 x scale?

So, starting with where we could influence shape… my opinion (no data here) says to think about people in this range.

Shape Highlight

To the left of this highlighted area people are already making code contributions over and above even many staff. Shower them in endless gratitude! But I don’t think they need practical help from us.

To the right of this highlighted area is the natural long tail. Supporting that bigger group of people for single-touch interactions is about clear documentation and easy-to-follow processes.

But I think the group of people roughly highlighted in that graph are people we can reach out to. These people potentially have capacity to do more. We should find out what they are interested in, what they want to get out of contribution and build relationships with them. In practical terms, we have finite time to invest in direct relationships with contributors. I think this is an effective place to invest some of that time.

I think the second question is more challenging. What does this look like at 10 x scale?

In 2013, ~50 people made a one-time contribution.

  • What do we need in place for 500 people to make a one-time code contribution?
  • Do we have 500 suitable ‘first’ bugs for 2014?
  • Is the amount of setup work required to contribute to our tools appropriate for people making a single contribution?
  • If not, is that a blocker to growing contributor numbers?

In 2013, there were ~1,500 code commits by volunteers.

  • What do we need in place for 15,000 activities on top of planned staff activity?
  • How does this much activity align towards a common product roadmap?
  • How is it scheduled, allocated, reviewed and shipped?

When planning to work with 10 x contributor numbers, possibly the biggest shift to consider is the ratio of staff to volunteers:

ContributorRatio

  • How does this impact the time allocated for code reviews?
  • How do we write bugs?
  • How do we prioritize bugs? Etc.
  • Even, what does an IRC channel or a dev mailing list look like after this change?

Other questions to ask:

  • What do we think is the current ‘ceiling’ on our contributor numbers for people writing code?
    • Is it the number of developers who know about our tools and want to help? (i.e. a ‘marketing’ challenge to inspire more people)
    • Is it the amount of suitable work ready and available for people who want to help? (are we losing people who want to help because it’s too hard to get involved?)
    • Both? With any bias?

What do you think?

I’m only one set of eyes on this, so please challenge my observations and feel free to build on this too.

Also, as the data in here is publicly accessible already I think I can publish this Tableau view as an interactive tool you can play with, but I need to check the terms first.

Contribution Graphs part 4: Contributions by Contributors over time

I’m posting a quick series of these without much comment on my part as I’d love to know what you see in each of them.

This is looking at activity in Github (commits and issues), for the repositories listed here. It’s an initial dive into the data, so don’t be afraid to ask questions of it, or request other cuts of this. In the not so distant future, we’ll be able to look at this kind of data across our combined contribution activities, so this is a bit of a taster.

Click for the full-size images.

Contributions by Contributors over time

Last but not least for today, I think there are some stories in this one…

Contributions by Contributors over Time

Is anything here a surprise? What do you see in this?

Contribution Graphs part 3: Distribution of contributions

I’m posting a quick series of these without much comment on my part as I’d love to know what you see in each of them.

This is looking at activity in Github (commits and issues), for the repositories listed here. It’s an initial dive into the data, so don’t be afraid to ask questions of it, or request other cuts of this. In the not so distant future, we’ll be able to look at this kind of data across our combined contribution activities, so this is a bit of a taster.

Click for the full-size images.

Distribution of contributions (excluding staff work)

Here are a couple of ways of visualizing this same data.

Distribution 2Distribution 1

Is anything here a surprise? What do you see in this?

Contribution Graphs part 2: By hour of the day

I’m posting a quick series of these without much comment on my part as I’d love to know what you see in each of them.

This is looking at activity in Github (commits and issues), for the repositories listed here. It’s an initial dive into the data, so don’t be afraid to ask questions of it, or request other cuts of this. In the not so distant future, we’ll be able to look at this kind of data across our combined contribution activities, so this is a bit of a taster.

Click for the full-size images.

By hour of the day

By hour of the day

Is anything here a surprise? What do you see in this?

Contribution Graphs part 1: Contributions over time

I’m posting a quick series of these without much comment on my part as I’d love to know what you see in each of them.

This is looking at activity in Github (commits and issues), for the repositories listed here. It’s an initial dive into the data, so don’t be afraid to ask questions of it, or request other cuts of this. In the not so distant future, we’ll be able to look at this kind of data across our combined contribution activities, so this is a bit of a taster.

Click for the full-size images.

Contributions over time

1 combined Over time

Broken down by teams

2 By team

Broken down further by repository

3 By Repo

Is anything here a surprise? What do you see in this?

Is being a member of the mozilla ‘organization’ on github a good proxy indicator of being staff?

Following on from the post about Gitribution, these are my notes around my initial exploration of the data extracted from Github.

One of the challenges of counting volunteer contributors to Mozilla is working out who is a volunteer and who is paid-staff. The concept of a volunteer contributor in itself is full of complications, as paid staff will volunteer their free time on other projects they care about, and contributors become employees, or employees will work using their personal email addresses and so on. The fidelity of tracking that would be required to *perfectly* identify when someone does something on a ‘voluntary’ basis would not be proportionate to the impact this would have on the usefulness of the final reporting. So perfect tracking is not the goal here.

My first pass at filtering out staff from contributor counts on github was to look at whether someone is a member of the mozilla organization on github. I thought this would be a good proxy for ‘staff’, and doing this gave us this breakdown:

Without manually checking usernames, this is how the contribution counts are split between staff and contributors

However, in this non-staff contributor segment of the data, there are a few names I know are definitely staff, and as I don’t know all of Mozilla’s staff I assume others in here are staff too.

Some names here are definitely in the wrong buckets, with significant contribution numbers linked to them

So, it’s safe to say that the inverse of our question is false. That is: not being a member of the org on github is not a good enough proxy to say someone is not a paid member of staff.

This is less critical when counting the number of people. For example this is the split of volunteers to staff using this github membership status as the proxy measure:

There might be ~10 people who technically need to move from the blue to the orange bar, but that’s not important if the aim is growing the blue bar 10x without much change to the orange bar.

But if we want to analyze contribution activity (we do!) I need to manually (with a little automation in Tableau) check these github accounts, and add those who are staff to an extra list within Gitribution to cross check when saving the data:

4 Manually Check

These are the most significant accounts to check for people who are staff

Getting back to the original question… Is being a member of the mozilla ‘organization’ on github a good proxy indicator of being staff? 

The quick no-data-query-required test is to click through to a few profiles and look for examples of people who are not staff: https://github.com/orgs/mozilla/members. I found a few on the first page alone. But as stated earlier, it can also be hard to tell! Mozillians are a connected bunch who often work on other projects too. However, I found enough people in that list employed at other organizations to assume they are not all staff (though in some cases they used to be staff but are not now).

So to answer the question in its strictest sense, the answer is no. Being a member of the github organization is not a certain indicator of being a paid member of staff.

But our context is more specific than this, so I need to refine the question: Is being a member of the mozilla ‘organization’ on github a good proxy indicator of being staff with regards to people actively contributing to Foundation projects on Github?

For this we go back to the data to check the most significant buckets of activity…

These are the priority accounts to manually check as they could skew the overall stats

I can manually check this list of usernames making up the biggest chunks of contribution activity from those marked as ‘staff’.

There are a couple of people in here who are not current staff (and some former staff with less than 100 activities), but this would not skew the data enough that we should need to maintain yet another list of exceptions. There is also a further ‘grey area’ in the overlap between Mozilla contribution, and CDOT-supported/funded contribution to Mozilla.

I think for now at least, I will leave this list as it is, and say that the check against membership of the github organization is a meaningful filter, but we also need to maintain an extra list of ‘further people who are staff’.
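The resulting rule (org membership plus an extra list of known staff) can be sketched as below. This is an illustrative Python version with invented usernames and counts, not the actual Gitribution code.

```python
def is_staff(username, org_members, extra_staff):
    """Staff if in the Github org OR on the manually maintained extra list."""
    return username in org_members or username in extra_staff


def split_activity(activity, org_members, extra_staff):
    """Total contribution activity per bucket."""
    totals = {"staff": 0, "volunteer": 0}
    for username, count in activity.items():
        bucket = "staff" if is_staff(username, org_members, extra_staff) else "volunteer"
        totals[bucket] += count
    return totals


# Hypothetical data: s2 is staff but not a member of the org on Github.
org_members = {"s1"}
extra_staff = {"s2"}
activity = {"s1": 500, "s2": 300, "v1": 20, "v2": 5}

membership_only = split_activity(activity, org_members, set())
with_extra_list = split_activity(activity, org_members, extra_staff)
print(membership_only, with_extra_list)
```

Moving one heavy contributor between buckets changes the head count by one but shifts most of the "volunteer" activity, which is why the extra list matters for activity analysis more than for counting people.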

So, I made these amends to Gitribution, rebuilt the database and ran the queries again, which gets us to here:

Comparing contributor numbers of staff to volunteers has barely changed, but the contribution activity is significantly different and will make our next analysis phase more accurate.

With the data in reasonable shape, we can do some more interesting analysis, which we’ll save for another post.

Gitribution

Click to embiggen. How well does being a member of a github organisation flag someone as being staff?

Over the last week or so I’ve been building a thing: Gitribution. It’s an attempt to understand contributions to Mozilla Foundation work that happen on Github. It’s not perfect yet, but it’s in a state to get feedback on now.

Why did I build this?

For these reasons (in this order):

  1. Counting: To extract counts of contributor numbers from Github across Foundation projects on an automated ongoing basis
  2. Testing: To demo the API format we need for other sources of data to power our interim contributor dashboard
  3. Learning: To learn a bit about node.js so I can support metrics work on other projects more directly when it’s helpful to (i.e. submitting pull-requests rather than just opening bugs)

1. Counting

The data in this tool is all public data from the Github API, but it’s been restructured so it can be queried in ways that answer questions specific to our goals this year, and has some additional categorization of repositories to query against individual teams. The Github API on its own couldn’t answer our questions directly.

This also gives me data in a format that can be explored visually in Tableau (I’ll share this in a follow up blog post). We can now count Github contributors, and also analyze contributions.

2. Testing

Part of our interim dashboard plan is a standard format for reporting on numbers of active and new contributors for a given activity. Building this tool was a way to test if that format makes sense. The output is an API that you can ping with a date and see:

  1. The number of unique usernames to contribute in the 12 months prior (excluding those users who are members of the Github organization that owns the repositories, i.e. Mozilla or openNews)
  2. The number of those who contributed only in the 7 days prior (i.e. new contributors)
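The two numbers above can be sketched in Python as follows. This is an illustration of the reporting format rather than the Gitribution code: the function name, the output keys and the sample data are assumptions, "12 months" is approximated as 365 days, and anyone whose earlier activity falls outside the 12-month window would count as new.

```python
from datetime import date, timedelta


def contributor_counts(contributions, as_of, staff=frozenset()):
    """Return active and new contributor counts for a given date.

    contributions: iterable of (username, date) pairs.
    staff: usernames to exclude (e.g. members of the owning organization).
    """
    year_start = as_of - timedelta(days=365)
    week_start = as_of - timedelta(days=7)

    # Earliest contribution per non-staff user within the 12-month window.
    first_seen = {}
    for username, day in contributions:
        if username in staff or not (year_start <= day <= as_of):
            continue
        if username not in first_seen or day < first_seen[username]:
            first_seen[username] = day

    active = len(first_seen)
    # New: everything this user did in the window happened in the last 7 days.
    new = sum(1 for day in first_seen.values() if day >= week_start)
    return {"active": active, "new": new}


sample = [
    ("amy", date(2013, 6, 1)),
    ("amy", date(2014, 2, 27)),
    ("bo", date(2014, 2, 26)),
    ("cat", date(2012, 12, 1)),   # outside the 12-month window
    ("dee", date(2014, 2, 28)),   # staff, excluded
]
counts = contributor_counts(sample, date(2014, 3, 1), staff={"dee"})
print(counts)
```

Keeping the output to two numbers per date is what makes the format easy for other data sources to implement.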

You can test the API here (change the date, or the team names – currently webmaker, openbadge, openNews)

We can use this in the dashboard soon.

3. Learning

I know a lot more about node.js than I did last week. So that’s something :)

I started out writing this as though it was a python app using JavaScript syntax before grasping the full implications of node’s non-blocking model.

I descended into what I later found out is called callback hell and felt much better when I learned that callback hell is a shared experience!

I tried an extreme escape from callback hell by re-building the app in a fire-and-forget process that kicked off several thousand pings to the Github API and paid no attention to whether or not they succeeded (clearly not a winning solution).

And I’ve ended up with something that isn’t too hellish but uses callbacks to manage the process flow. The current process is pretty linear, so I was able to sense-check what it’s doing, but it also works mostly on one task at a time, so it isn’t getting the potential value out of node’s non-blocking model.
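There is a middle ground between one task at a time and fire-and-forget: run a limited number of requests at once and keep track of which ones failed. Gitribution itself is a node.js app, so the following is only a Python `asyncio` illustration of that idea; `fetch_repo` is a hypothetical stand-in that makes no real Github API calls.

```python
import asyncio


async def fetch_repo(name):
    """Stand-in for a Github API request (hypothetical helper, no network)."""
    await asyncio.sleep(0)
    if name.startswith("bad"):
        raise RuntimeError(f"request failed for {name}")
    return {"repo": name}


async def fetch_all(names, limit=3):
    """Fetch all repos with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(name):
        async with semaphore:
            return await fetch_repo(name)

    # return_exceptions=True keeps failures in the results instead of losing them.
    results = await asyncio.gather(*(guarded(n) for n in names), return_exceptions=True)
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [n for n, r in zip(names, results) if isinstance(r, Exception)]
    return ok, failed


ok, failed = asyncio.run(fetch_all(["thimble", "popcorn", "bad-repo", "goggles"]))
print(len(ok), failed)
```

The list of failed names is what the fire-and-forget version was missing: it can be retried or reported rather than silently dropped.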

Next steps

  • Tweaks to the categorization of ‘members=staff’
    • See the attached image of contributions by username. There are some members of staff with many contributions who are not members of Mozilla on Github. This is not material when counting number of contributors in relation to targets, but when we analyze contribution activity those users with a lot of contributions skew the data significantly.
  • Check and correct the list of repos assigned to each team
    • Currently a best guess based on my limited knowledge and some time trawling through all the repos on the main Mozilla Github page
  • Work out how to use this with Science Lab projects
    • As Software Carpentry use Github as part of their training (which I love), the data in their account doesn’t represent the same kinds of activities as in the other repos. I need to think about this.
  • Pick the brains of my knowledgeable colleagues and get a review of this code

What else is this good for?