Is being a member of the mozilla ‘organization’ on github a good proxy indicator of being staff?

Following on from the post about Gitribution, these are my notes on my initial exploration of the data extracted from Github.

One of the challenges of counting volunteer contributors to Mozilla is working out who is a volunteer and who is paid staff. The concept of a volunteer contributor is itself full of complications: paid staff volunteer their free time on other projects they care about, contributors become employees, employees work using their personal email addresses, and so on. The fidelity of tracking required to *perfectly* identify when someone does something on a ‘voluntary’ basis would be out of proportion to the benefit it would bring to the final reporting. So perfect tracking is not the goal here.

My first pass at filtering out staff from contributor counts on github was to look at whether someone is a member of the mozilla organization on github. I thought this would be a good proxy for ‘staff’, and doing this gave us this breakdown:

Without manually checking usernames, this is how the contribution counts are split between staff and contributors
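As a rough illustration of that first-pass rule, here is a minimal Python sketch. It assumes the per-user contribution counts are already in a dict and that the list of mozilla organization members has already been fetched (for example from GitHub's organization members API). The function name and data shapes are hypothetical, not Gitribution's actual code.

```python
def split_by_org_membership(activity, org_members):
    """Count people and contributions per label, using org membership alone.

    activity: dict mapping github username -> contribution count
              (a hypothetical shape; Gitribution's schema may differ).
    org_members: usernames in the mozilla github organization.
    """
    # GitHub usernames are case-insensitive, so compare in lower case.
    members = {name.lower() for name in org_members}
    people = {"staff": 0, "volunteer": 0}
    contributions = {"staff": 0, "volunteer": 0}
    for username, count in activity.items():
        label = "staff" if username.lower() in members else "volunteer"
        people[label] += 1
        contributions[label] += count
    return people, contributions


people, contributions = split_by_org_membership(
    {"alice": 120, "bob": 5, "carol": 40}, org_members={"Alice"}
)
print(people)         # {'staff': 1, 'volunteer': 2}
print(contributions)  # {'staff': 120, 'volunteer': 45}
```

The same pass produces both breakdowns (people and contribution counts), which is why one misclassified account can look harmless in one chart and significant in the other.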

However, in the non-staff contributor segment of the data there are a few names I know are definitely staff, and as I don’t know all of Mozilla’s staff, I assume there are other staff in here too.

Some names here are definitely in the wrong buckets, with significant contribution numbers linked to them

So it’s safe to say the inverse does not hold: not being a member of the org on github is not a good enough proxy to say someone is not a paid member of staff.

This is less critical when counting the number of people. For example, this is the split of volunteers to staff using github membership status as the proxy measure:

There might be ~10 people who technically need to move from the blue to the orange bar, but that’s not important if the aim is growing the blue bar 10x without much change to the orange bar.
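The reason this matters more for contribution counts than for head counts is that a handful of very active accounts can carry a large share of the activity. A small Python sketch with invented numbers (not Mozilla's data) shows the effect; `volunteer_shares` is a hypothetical helper.

```python
def volunteer_shares(activity, staff):
    """Return (volunteer share of people, volunteer share of contributions).

    activity: dict mapping username -> contribution count (hypothetical shape).
    staff: set of usernames treated as staff.
    """
    volunteers = {user: n for user, n in activity.items() if user not in staff}
    people_share = len(volunteers) / len(activity)
    contribution_share = sum(volunteers.values()) / sum(activity.values())
    return people_share, contribution_share


# Invented example: 98 light contributors plus two very active accounts.
activity = {f"user{i}": 10 for i in range(98)}
activity.update({"heavy_a": 400, "heavy_b": 300})
org_members = {f"user{i}" for i in range(10)}

before = volunteer_shares(activity, org_members)
after = volunteer_shares(activity, org_members | {"heavy_a", "heavy_b"})
print([round(x, 2) for x in before])  # [0.9, 0.94]
print([round(x, 2) for x in after])   # [0.88, 0.52]
```

In this toy data, reclassifying two accounts moves the volunteer share of people by two points but the volunteer share of contributions by more than forty.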

But if we want to analyze contribution activity (we do!), I need to manually check these github accounts (with a little automation in Tableau) and add those who are staff to an extra list within Gitribution, which is cross-checked when saving the data:

These are the most significant accounts to check for people who are staff
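A minimal sketch of that cross-check, assuming Gitribution labels each account as it saves the data: anyone in the mozilla github organization or on the extra staff list counts as staff. `make_labeller` and the shape of its inputs are hypothetical; usernames are lower-cased because GitHub treats them case-insensitively.

```python
def make_labeller(org_members, extra_staff):
    """Build a function that labels a github username 'staff' or 'volunteer'.

    org_members: usernames in the mozilla github organization.
    extra_staff: hypothetical, manually maintained list of staff accounts
                 that are not org members.
    """
    # Merge both sources once, normalised to lower case.
    staff = {n.lower() for n in org_members} | {n.lower() for n in extra_staff}

    def label(username):
        return "staff" if username.lower() in staff else "volunteer"

    return label


label = make_labeller(org_members={"alice"}, extra_staff={"Dave"})
print(label("alice"), label("DAVE"), label("bob"))  # staff staff volunteer
```

Keeping the extra list separate from the org membership data means the membership list can be refreshed from GitHub without losing the manual corrections.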

Getting back to the original question… Is being a member of the mozilla ‘organization’ on github a good proxy indicator of being staff? 

The quick test, which needs no data query, is to click through to a few profiles at https://github.com/orgs/mozilla/members and look for people who are not staff. I found a few on the first page alone. But as stated earlier, it can also be hard to tell! Mozillians are a connected bunch who often work on other projects too. However, I found enough people in that list employed at other organizations to conclude that they are not all staff (though some of them used to be).

So to answer the question in its strictest sense: no. Being a member of the github organization is not a certain indicator of being a paid member of staff.

But our context is more specific than this, so I need to refine the question: Is being a member of the mozilla ‘organization’ on github a good proxy indicator of being staff with regards to people actively contributing to Foundation projects on Github?

For this we go back to the data to check the most significant buckets of activity…

These are the priority accounts to manually check as they could skew the overall stats

I can manually check this list of usernames making up the biggest chunks of contribution activity from those marked as ‘staff’.

There are a couple of people in here who are not current staff (and some former staff with fewer than 100 activities), but they would not skew the data enough to justify maintaining yet another list of exceptions. There is also a further ‘grey area’ in the overlap between Mozilla contribution and CDOT-supported/funded contribution to Mozilla.
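Picking out which accounts are worth checking by hand can be sketched in the same way: filter one segment (here the accounts labelled ‘staff’) to those above an activity cut-off and sort them by activity. The `accounts_to_check` helper is hypothetical, and the default cut-off of 100 is an assumption that only echoes the figure mentioned above.

```python
def accounts_to_check(activity, segment, min_activity=100):
    """Return (username, count) pairs from one segment, most active first.

    activity: dict mapping username -> contribution count (hypothetical shape).
    segment: usernames currently carrying one label, e.g. 'staff'.
    min_activity: hypothetical cut-off; smaller accounts cannot skew the
                  totals much, so they are not worth checking by hand.
    """
    rows = [
        (user, n)
        for user, n in activity.items()
        if user in segment and n >= min_activity
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


activity = {"a": 500, "b": 50, "c": 150, "d": 300}
print(accounts_to_check(activity, segment={"a", "b", "c"}))
# [('a', 500), ('c', 150)]
```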

For now at least, I will leave this list as it is and say that membership of the github organization is a meaningful filter, as long as we also maintain an extra list of ‘further people who are staff’.

So I made these amendments to Gitribution, rebuilt the database and ran the queries again, which gets us here:

Comparing contributor numbers of staff to volunteers has barely changed, but the contribution activity is significantly different and will make our next analysis phase more accurate.

With the data in reasonable shape, we can do some more interesting analysis, which we’ll save for another post.