Open Positions

Data Governance Fellowships

The Masakhane Research Foundation, through the support of FAIR Forward, an initiative of the German Development Cooperation (GIZ), is glad to announce two Data Governance Fellowship positions.

Data sovereignty, African-centricity, sustainability, inclusivity and ownership are values fundamental to Masakhane. Masakhane is regularly involved in language data collection activities. These are often volunteer efforts driven by researchers who need to create datasets for their own research. We have also participated in dataset creation activities funded by various organisations. For example, we have received a handful of Lacuna Fund grants to create language datasets and have collaborated with other organisations to implement this work.


While data governance is regularly addressed in our various meetings, we believe it requires dedicated resources to adequately explore, document and disseminate our learnings. These learnings are often by-products of our various activities and therefore run the risk of not being intentionally documented. The ‘Data Governance’ Fellowships will be a step towards ensuring that this documentation is done intentionally. We especially want to hear from women and historically marginalised candidates with a passion for inclusive AI for sustainable development and a strong background in digital and AI-related topics.

Activities and Responsibilities

This handbook will compile learnings and recommendations from the language data collection experiences of those amongst our ranks and others within our network who have undertaken this work. These learnings will broadly address community involvement, with the aim of enabling participatory dataset creation, curation and management, by discussing issues such as:

Lanfrica is a platform that catalogues and links African language resources in order to mitigate the difficulty encountered in discovering African works by creating a centralised catalogue. For instance, if you’re looking for resources (linguistic datasets or research papers) in a particular African language, Lanfrica will point you to the different sources on the web that have such datasets in the desired language.
This project has adopted a participatory and community-led approach, which is in line with the fundamental values of Masakhane.
As the project platform is already set up, this part of the work will focus on:

In our work, we have been encouraging members of Masakhane to upload the datasets created in the course of their work to an African NLP community on Zenodo. Additionally, we ask that each dataset be accompanied by a datasheet. Having noted a lack of adherence to these requirements, we intend for this work to create a tool that streamlines the process of creating metadata for datasets. We envisage a web tool that allows users to enter project information and then prompts them, step by step, to create a datasheet draft or markdown file. The tool starts with a few questions answered through drop-down menus. Given these responses, it then generates a document that serves as version 1 of the datasheet.

The tool will be open source, and its use will be a requirement for datasets that individuals wish to upload to the Zenodo African NLP community. As the community feature on Zenodo allows the curator to decline a request to publish a dataset, this requirement will be enforceable on the platform.
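
As a rough illustration of the kind of output such a tool might produce, the sketch below turns project information and a handful of answers into a markdown datasheet draft. Everything in it is an assumption made for illustration: the names ProjectInfo, SECTIONS and build_datasheet_draft are hypothetical, the section headings are only one possible template, and the actual design of the tool is part of the fellowship work.

```python
from dataclasses import dataclass, field

# Hypothetical section headings; the real tool may use a different template.
SECTIONS = [
    "Motivation",
    "Composition",
    "Collection Process",
    "Uses",
    "Distribution",
    "Maintenance",
]


@dataclass
class ProjectInfo:
    """Basic project information a user might enter first (assumed fields)."""
    title: str
    languages: list[str]
    licence: str
    creators: list[str] = field(default_factory=list)


def build_datasheet_draft(info: ProjectInfo, answers: dict[str, str]) -> str:
    """Return a markdown datasheet draft; unanswered sections get a TODO."""
    lines = [f"# Datasheet: {info.title}", ""]
    lines.append(f"- Languages: {', '.join(info.languages)}")
    lines.append(f"- Licence: {info.licence}")
    if info.creators:
        lines.append(f"- Creators: {', '.join(info.creators)}")
    lines.append("")
    for section in SECTIONS:
        lines.append(f"## {section}")
        lines.append("")
        # Fall back to a placeholder so the draft shows what is still missing.
        lines.append(answers.get(section, "TODO: complete this section."))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    info = ProjectInfo(
        title="Example corpus",
        languages=["Swahili"],
        licence="CC BY 4.0",
    )
    answers = {"Motivation": "Created for machine translation research."}
    print(build_datasheet_draft(info, answers))
```

Leaving unanswered sections as visible TODO placeholders keeps the draft honest about what is missing, which may make it easier for a curator to judge whether a submission is ready to be accepted.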


Professional Requirements of the Candidate:

Number of Individuals: 2

Time commitment: part-time (approximately 18 hours a week)

Remuneration: KES 2,730,240 before tax (approximately USD 20,984 at the current exchange rate of KES 130.11 to USD 1). As the funding is received in KES, payment will be made using the exchange rate published by the Central Bank of Kenya on the day of payment.

Duration: 9 months

Location: Remote

To apply, please share a copy of your latest CV and a one page (A4) motivation letter to before 22h00 GMT on Friday, 19 April 2024.