Getting Bicho Running as a process on Heroku with a Scheduler

By Félicien Victor Joseph Rops (Belgium, Namur, 1833-1898) [Public domain], via Wikimedia Commons

“Ou la lecture du grimoire”

For our almost complete MoFo Interim Dashboard, I’m planning to use an issue tracker parsing tool called Bicho to work out how many people are involved in the Webmaker project in Bugzilla. Bicho is part of a suite of tools called Metrics Grimoire, which I’ll explore in more detail in the near future. When combined with vizGrimoire, it can generate interesting things like this, which are closely related to our own contribution tracking efforts (though not solving exactly the same challenge).

I recently installed a local copy of Bicho and ran it against some products on Bugzilla to test it out. It generates a nicely structured relational database, including the things I want to count and feed into our contributor numbers.

This morning I got this running on Heroku, which means it can run periodically and update a hosted DB, which can then feed numbers into our dashboard.

This was a bit of trial and error for me, as all the work I’ve done with Python was within Google App Engine’s setup and my use of Heroku has been for Node apps, so these notes are here to help me out when I look back at this some time in the future.

Getting this working on Heroku

$ pip freeze

generates a list of the requirements from your working local environment, e.g.

BeautifulSoup==3.2.1
MySQL-python==1.2.5
feedparser==5.1.3
python-dateutil==2.2
six==1.6.1
storm==0.20
wsgiref==0.1.2

Copy this into a requirements.txt file in the root of your project.

But remove the line Bicho==0.9 (otherwise Heroku tries to install it via pip, which fails).
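The filtering step above can also be scripted, so the Bicho line never slips back in when the file is regenerated. This is only a minimal sketch: the function name filter_requirements is hypothetical, and it assumes the freeze output uses the plain name==version form shown above.

```python
def filter_requirements(freeze_output, exclude=("bicho",)):
    """Return pip-freeze text with the packages named in `exclude` removed.

    Hypothetical helper: package names are compared case-insensitively,
    and only simple "name==version" lines are expected.
    """
    kept = []
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name = line.split("==", 1)[0].strip().lower()
        if name in exclude:
            continue  # drop Bicho itself, since pip cannot install it
        kept.append(line)
    return "\n".join(kept) + "\n"


# Sample input taken from the freeze output listed above, plus the Bicho line.
sample = """BeautifulSoup==3.2.1
Bicho==0.9
MySQL-python==1.2.5
"""
print(filter_requirements(sample), end="")
```

The returned text can then be written to requirements.txt in the root of the project.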

Heroku’s notes on specifying dependencies.

You can now push this to Heroku.

Then, I ran:

$ heroku run python setup.py

But I’m actually not sure if that was required.

Then you can run Bicho remotely via heroku run commands:

$ heroku run python bin/bicho --db-user-out=yourdbusername --db-password-out=yourdbuserpassword --db-database-out=yourdbdatabase --db-hostname-out=yourdbhostname -d 5 -b bg --backend-user 'abugzilla@exampleuser.com' --backend-password 'bugzillapasswordexample' -u 'https://bugzillaurl.com?etc'

As a general precaution for anything like this, don’t use a user account that has any special privileges. I create duplicate logins that have the same level of access available to any member of the public.
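In the same spirit, it may be worth keeping the passwords out of the command line itself. The sketch below builds the same Bicho argument list from environment variables, which on Heroku could be set as config vars with heroku config:set. The variable names (BICHO_DB_USER and so on) and the function build_bicho_args are my own assumptions, not something Bicho defines; only the flags come from the command above.

```python
import os


def build_bicho_args(env, tracker_url, delay=5):
    """Build the Bicho command as a list, reading secrets from `env`.

    The environment variable names are hypothetical; the flags are the
    ones used in the working command above.
    """
    return [
        "python", "bin/bicho",
        "--db-user-out=" + env["BICHO_DB_USER"],
        "--db-password-out=" + env["BICHO_DB_PASSWORD"],
        "--db-database-out=" + env["BICHO_DB_NAME"],
        "--db-hostname-out=" + env["BICHO_DB_HOST"],
        "-d", str(delay),
        "-b", "bg",
        "--backend-user", env["BUGZILLA_USER"],
        "--backend-password", env["BUGZILLA_PASSWORD"],
        "-u", tracker_url,
    ]


# Placeholder values for illustration; in a real run, pass os.environ instead.
example_env = {
    "BICHO_DB_USER": "yourdbusername",
    "BICHO_DB_PASSWORD": "yourdbuserpassword",
    "BICHO_DB_NAME": "yourdbdatabase",
    "BICHO_DB_HOST": "yourdbhostname",
    "BUGZILLA_USER": "abugzilla@exampleuser.com",
    "BUGZILLA_PASSWORD": "bugzillapasswordexample",
}
args = build_bicho_args(example_env, "https://bugzillaurl.com?etc")
print(" ".join(args[:2]), "... (%d arguments in total)" % len(args))
```

A small wrapper script could pass the resulting list to subprocess.call, so that the scheduler only ever sees the wrapper’s name and not the credentials.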

Once you’ve got a command that works here, cancel the running script as it might have thousands of issues left to process.

Then set up a scheduler: https://devcenter.heroku.com/articles/scheduler

$ heroku addons:add scheduler:standard
$ heroku addons:open scheduler

Copy your working command into the scheduler, just without the ‘heroku run’ part:

python bin/bicho --db-user-out=yourdbusername --db-password-out=yourdbuserpassword --db-database-out=yourdbdatabase --db-hostname-out=yourdbhostname -d 5 -b bg --backend-user 'abugzilla@exampleuser.com' --backend-password 'bugzillapasswordexample' -u 'https://bugzillaurl.com?etc'

If you set this to run every 10 minutes, the process will cycle and get killed periodically, but the logs usefully show you how the import is progressing.

I’m generally happy with this as a solution for counting contributors in Webmaker’s issue tracking history, but I would need to work on some speed issues if this were of interest across Mozilla projects.

Currently, this is importing about 400 issues an hour, which would be far too slow for processing the 1,000,000+ bugs in bugzilla.mozilla.org. But that’s not a problem to solve right now, and this isn’t necessarily the way you’d want to do it either.
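To put a rough number on that, here is the back-of-the-envelope arithmetic, assuming the rate stays flat at about 400 issues an hour and taking 1,000,000 as the lower bound on the bug count (the function name is just for illustration):

```python
def hours_to_import(total_issues, issues_per_hour):
    """Hours needed to import `total_issues` at a constant hourly rate."""
    return total_issues / issues_per_hour


hours = hours_to_import(1_000_000, 400)  # 2500.0 hours
days = hours / 24                        # a little over 104 days
print(f"{hours:.0f} hours, about {days:.0f} days")
```

So a full import at the current rate would take more than three months of continuous running.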

One response to Getting Bicho Running as a process on Heroku with a Scheduler

  1. Manrique says:

    If you need any help, just let us know. I’ve just joined the MoFo Metrics mailing list, so let’s see if we can help somehow. Feedback and community are always welcome in our grimoire tools ;-)
