This past weekend I attended my second Random Hacks of Kindness event in the space of a few months. I went to the Oxford event, which was kindly organized by Oxfam and White October. My initial plan was actually to go to the Belgian event, organized in my old home town of Antwerp, but my travel plans changed, so Oxford it was.
The organization was the same as at other hackathons I had been to: organizers welcome everybody, idea owners pitch their ideas, chaos ensues, teams are formed. I quickly settled on the NGO Clarity idea by TechSoup Global Data Acquisitions Manager Dinesh Venkateswaran. With the drive to increase transparency in the aid sector, a lot of data on NGOs is being made available. The problem is that the format and quality of that data vary wildly between and within countries. Standardization efforts like IATI are trying to change this, but there is still a long way to go.
The goal of the NGO Clarity project was to set up a system that would analyse the available data on a particular NGO and, based on a set of rules, output a number of scores indicating the quality of the provided data (not the quality of the NGO itself). Such a system would allow a donor, or an organization like TechSoup, to quickly estimate how much effort would be needed to evaluate an NGO. Are there just a few small details missing, or are there large gaping holes and inconsistencies in the data? Solving this involved a three-step process:
- Data Acquisition: Dinesh had brought about 40,000 records on Indian NGOs that needed to be loaded and pre-processed
- Inference: using heuristics, predefined rules & machine learning to assign scores to each NGO
- Presentation: presenting the results in an intuitive manner in the form of a website
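To make the inference step a bit more concrete, here is a minimal sketch of rule-based data-quality scoring in plain Python. It is not the code we wrote at the event: the field names (`name`, `registration_number`, `address`, `phone`, `email`), the validity rules and the functions `score_record` and `missing_fields` are all hypothetical, since the schema of the Indian NGO records is not described here. Each record gets a completeness score (fraction of required fields filled in) and a validity score (fraction of format rules passed), both between 0 and 1.

```python
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical required fields; the real dataset's schema may differ.
REQUIRED_FIELDS = ["name", "registration_number", "address", "phone", "email"]

# Deliberately loose email check: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def missing_fields(record: Record) -> List[str]:
    """List the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]


def completeness(record: Record) -> float:
    """Fraction of required fields that are present and non-blank."""
    return 1 - len(missing_fields(record)) / len(REQUIRED_FIELDS)


def valid_email(record: Record) -> bool:
    return bool(EMAIL_RE.match(record.get("email", "").strip()))


def valid_phone(record: Record) -> bool:
    # Strip spaces, dashes, brackets and '+', then expect 10 to 13 digits
    # (an assumed range covering local and international Indian numbers).
    digits = re.sub(r"[\s\-+()]", "", record.get("phone", ""))
    return digits.isdigit() and 10 <= len(digits) <= 13


# Each rule is a (name, predicate) pair; more rules can simply be appended.
VALIDITY_RULES: List[Tuple[str, Callable[[Record], bool]]] = [
    ("email_format", valid_email),
    ("phone_format", valid_phone),
]


def score_record(record: Record) -> Dict[str, float]:
    """Return data-quality scores in [0, 1] for a single NGO record."""
    passed = sum(1 for _, rule in VALIDITY_RULES if rule(record))
    return {
        "completeness": completeness(record),
        "validity": passed / len(VALIDITY_RULES),
    }


if __name__ == "__main__":
    example = {
        "name": "Example Trust",
        "registration_number": "",
        "address": "12 Some Road, Pune",
        "phone": "+91 20 1234 5678",
        "email": "info@example.org",
    }
    print(score_record(example))    # {'completeness': 0.8, 'validity': 1.0}
    print(missing_fields(example))  # ['registration_number']
```

Keeping each rule as a small named predicate makes it easy to add or reweight rules later, and `missing_fields` answers the "small details or gaping holes?" question directly rather than hiding it inside a single number.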
While we could not solve every step in detail, my hope was to at least set up an end-to-end proof of principle.
Our initial team consisted of four people, including Dinesh, but unfortunately one soon left to join another project. After some discussion we decided to implement all three steps using XPath/XQuery and the eXist XML database, since this was the particular specialty of our third team member, Chris Wallace. By the first show-and-tell we had a working prototype, but unfortunately Chris had to leave at that point and could not return the next day. Two others (Russel and Tim) were then kind enough to join me to continue the work.
The problem, however, was that none of us was particularly familiar with XQuery. I set up the eXist database and managed to run Chris’s scripts, but we were pretty much stuck when it came to extending them in any meaningful way. Russel then gave up and left, and Tim had another project to attend to, so it was essentially just me. After some more poking at the code I decided to give up as well. Progress was just too slow, and since nobody else at the event knew XQuery or was willing to help pick it apart, I would never have finished it.
So, on the train home, I decided to start from scratch. Since I’ve been doing a lot of Python lately, it was the natural choice, although had this been a hobby project I would have used Rails or Play, just to get to know those as well. In any case, I used Django on the backend and jQuery/Bootstrap on the frontend. About two hours later, just as my train arrived, I had reached feature parity with the XQuery version. After some more furious hacking the next morning, with Ben Foxall helping out with the templates, I pushed the demo code to GitHub about 10 seconds before the presentation was due to start. I also tried to get a map view working (geocoding the NGO addresses and plotting them with Google Maps), but couldn’t quite finish it in time.
Our final presentation, excellently prepared by Dinesh, is up on SlideShare.
In sum, it was another great learning experience. In good hackathon tradition the code is rather hairy, so I plan to take some time to clean things up and hopefully improve the rules and structure. I think it’s a great idea and I would like to keep working on it, as much still needs to be done (see the slides). Unfortunately time is in short supply these days, but let’s see how it goes… 🙂