Over the last week or so I’ve been building a thing: Gitribution. It’s an attempt to understand contributions to Mozilla Foundation work that happen on Github. It’s not perfect yet, but it’s in a state to get feedback on now.
Why did I build this?
For these reasons (in this order):
- Counting: To extract counts of contributor numbers from Github across Foundation projects on an automated ongoing basis
- Testing: To demo the API format we need for other sources of data to power our interim contributor dashboard
- Learning: To learn a bit about node.js so I can support metrics work on other projects more directly when it’s helpful to (i.e. submitting pull requests rather than just opening bugs)
The data in this tool is all public data from the Github API, but it’s been restructured so it can be queried in ways that answer questions specific to our goals this year, and has some additional categorization of repositories to query against individual teams. The Github API on its own couldn’t answer our questions directly.
This also gives me data in a format that can be explored visually in Tableau (I’ll share this in a follow-up blog post). We can now count Github contributors, and also analyze contributions.
Part of our interim dashboard plans includes a standard format for reporting on numbers of active and new contributors for a given activity. Building this tool was a way to test whether that format makes sense. The output is an API that you can ping with a date and see:
- The number of unique usernames that contributed in the 12 months prior (excluding those users who are members of the Github organization that owns the repositories, i.e. Mozilla or openNews)
- The number of those whose only contributions were in the 7 days prior (i.e. new contributors)
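As a rough sketch of the counting logic described in the two points above (this is not the actual Gitribution code): the data shape, the function name `countContributors` and the output field names are all assumptions made up for illustration.

```javascript
// Hypothetical data shape: each contribution is { username, date },
// where date is an ISO date string. Not the real Gitribution schema.
function countContributors(contributions, orgMembers, asOf) {
  const end = new Date(asOf);

  // Start of the 12-month window (assumption: "12 months" = same day last year).
  const yearStart = new Date(end);
  yearStart.setUTCFullYear(end.getUTCFullYear() - 1);

  // Start of the 7-day window.
  const weekStart = new Date(end.getTime() - 7 * 24 * 60 * 60 * 1000);

  const members = new Set(orgMembers);
  const firstSeen = new Map(); // username -> earliest contribution time in the window

  for (const c of contributions) {
    if (members.has(c.username)) continue; // skip members of the owning organization
    const t = new Date(c.date).getTime();
    if (t < yearStart.getTime() || t > end.getTime()) continue; // outside the window
    const prev = firstSeen.get(c.username);
    if (prev === undefined || t < prev) firstSeen.set(c.username, t);
  }

  // "New" = everything this user did in the window happened in the last 7 days.
  let newCount = 0;
  for (const t of firstSeen.values()) {
    if (t >= weekStart.getTime()) newCount += 1;
  }

  return {
    total_active_contributors: firstSeen.size, // hypothetical field name
    new_contributors_7_days: newCount          // hypothetical field name
  };
}

// Made-up sample data, purely to show the call.
const sampleContributions = [
  { username: 'alice', date: '2013-06-01' },
  { username: 'alice', date: '2014-02-27' },
  { username: 'bob', date: '2014-02-26' },
  { username: 'carol', date: '2012-12-01' },
  { username: 'staffer', date: '2014-02-28' }
];
console.log(countContributors(sampleContributions, ['staffer'], '2014-03-01'));
```

With this sample data, alice and bob count as active, and only bob counts as new, because alice also contributed earlier in the year.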
You can test the API here (change the date, or the team names; currently webmaker, openbadge, openNews).
We can use this in the dashboard soon.
I know a lot more about node.js than I did last week. So that’s something 🙂
I descended into what I later found out is called callback hell, and felt much better when I learned that callback hell is a shared experience!
I tried an extreme escape from callback hell by re-building the app in a fire-and-forget process that kicked off several thousand pings to the Github API and paid no attention to whether or not they succeeded (clearly not a winning solution).
And I’ve ended up with something that isn’t too hellish but uses callbacks to manage the process flow. The current process is pretty linear, so I was able to sense-check what it’s doing. But it also works mostly on one task at a time, so it isn’t getting the potential value out of node’s non-blocking model.
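One possible middle ground between the fire-and-forget version and the one-task-at-a-time version is to let a small, fixed number of requests run at once. The following is only a sketch of that idea, not what Gitribution currently does: `runWithLimit` and `fakeFetch` are hypothetical names, and the tasks here are simulated with timers rather than real Github API calls.

```javascript
// tasks: array of functions that each take a Node-style callback (err, result).
// limit: how many tasks may be in flight at once (1 = strictly one at a time).
// done:  called once, with the first error or with all results in task order.
function runWithLimit(tasks, limit, done) {
  const results = new Array(tasks.length);
  let next = 0;         // index of the next task to start
  let active = 0;       // number of tasks currently in flight
  let finished = false; // guards against calling done() twice

  if (tasks.length === 0) return done(null, results);

  function startNext() {
    while (!finished && active < limit && next < tasks.length) {
      const index = next++;
      active++;
      tasks[index]((err, result) => {
        if (finished) return;
        active--;
        if (err) {
          finished = true;
          return done(err); // unlike fire-and-forget, failures are reported
        }
        results[index] = result;
        if (next === tasks.length && active === 0) {
          finished = true;
          return done(null, results);
        }
        startNext(); // a slot is free, so start another task
      });
    }
  }

  startNext();
}

// Simulated request: stands in for a call to the Github API.
function fakeFetch(repoName) {
  return (callback) => {
    setTimeout(() => callback(null, repoName + ': ok'), 10);
  };
}

// Placeholder repo names, at most two requests in flight at a time.
runWithLimit(
  [fakeFetch('repo-a'), fakeFetch('repo-b'), fakeFetch('repo-c')],
  2,
  (err, results) => {
    if (err) return console.error('a request failed:', err);
    console.log(results);
  }
);
```

Setting the limit to 1 gives the current linear behaviour, so the flow can still be sense-checked before raising the number.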
What’s next?
- Tweaks to the categorization of ‘members=staff’
  - See the attached image of contributions by username. There are some members of staff with many contributions who are not members of Mozilla on Github. This is not material when counting the number of contributors in relation to targets, but when we analyze contribution activity those users with a lot of contributions skew the data significantly.
- Check and correct the list of repos assigned to each team
  - Currently a best guess based on my limited knowledge and some time trawling through all the repos on the main Mozilla Github page
- Work out how to use this with Science Lab projects
  - Software Carpentry use Github as part of their training (which I love), so the data in their account doesn’t represent the same kinds of activity as the other repos. I need to think about this.
- Pick the brains of my knowledgeable colleagues and get a review of this code
What else is this good for?
- This might be useful as one of the ways we get data into the upcoming project Baloo.