A grassroots NLP community for Africa, by Africans
Masakhane is a grassroots organisation whose mission is to strengthen and spur NLP research in African languages, for Africans, by Africans. Although 2000 of the world’s languages are African, African languages are barely represented in technology. The tragic past of colonialism has been devastating for African languages in terms of their support, preservation and integration. This has resulted in a technological space that does not understand our names, our cultures, our places, our history.
Masakhane roughly translates to “We build together” in isiZulu. Our goal is for Africans to shape and own these technological advances towards human dignity, well-being and equity, through inclusive community building, open participatory research and multidisciplinarity.
Umuntu Ngumuntu Ngabantu - loosely translated from isiZulu, this means “a person is a person through another person” or “I am because you are”. This philosophy calls for collaboration, participation and community. It favours relationality over individualism, for stronger social cohesion and sustainable communities. It holds that we share our successes and that one’s personhood is evaluated based on one’s contributions to the community.
African-centricity - We centre the narratives of Africans as a remedy to the effects of Eurocentrism on our beliefs. This way we reassert a new way of looking at information from an African perspective and shun any attempts to devalue our knowledge and stories.
Ownership - We believe that Africans should be in charge of owning, driving and participating in the NLP research process, rather than acting only as observers or data providers.
Openness - We believe in sharing our ideas and progress openly, especially on the African continent, for Africans. We’re against research that takes African contributions or data and puts them behind a paywall that makes them inaccessible to Africans.
Multidisciplinarity - We truly believe in participation from all fields and experiences, and that multidisciplinarity leads to a more robust and more inclusive society.
Everyone has valuable knowledge - We believe that each person’s individual experiences have value, and that each person is worth listening to and has something to contribute.
Kindness - We believe that being considerate, friendly and generous within our community is the best way to support it and encourage more inclusivity.
Responsibility - We believe that each person in the technology process has an ethical responsibility for what they produce in the world. For this reason, we actively reckon with the ethical impacts of our work.
Data sovereignty - We believe Africans should be able to decide what data represents our communities globally, retain ultimate ownership of that data, and know how it is used.
Reproducibility - We believe in reproducible research. As a result, we publish our code and data from our research so that others can reproduce and build upon it.
Sustainability - We believe that sustainability is necessary for societal change - that small daily efforts over a long time are what truly change the world. To that end, we aim for sustainability of our work by being fully integrated with technological stakeholders, to ensure the community continues to thrive into the future.
Current State of NLP in Africa
Even in the forums which aim to widen NLP participation, Africa is barely represented, despite having over 2000 languages. The Fourth Industrial Revolution in Africa cannot take place in English. It is imperative that NLP models be developed for the African continent.
As per Martinus (2019), some problems facing NLP in African languages are as follows:
Focus: According to Alexander (2009), African society does not see hope for indigenous languages to be accepted as a more primary mode of communication. As a result, there are few efforts to fund and focus on support of these languages, despite their potential impact.
Low Resourced: The lack of resources for African languages hinders the ability of researchers to do NLP.
Low Discoverability: The resources for African languages that do exist are hard to find. Often one needs to be associated with a specific academic institution in a specific country to gain access to the language data available for that country. This reduces the ability of countries and institutions to combine their knowledge and datasets to achieve better performance and innovations. The existing research itself is often hard to discover, since it is frequently published in smaller African conferences or journals, which are neither electronically available nor indexed by research tools such as Google Scholar.
Lack of publicly-available benchmarks: Due to the low discoverability and the lack of research in the field, there are no publicly available benchmarks or leaderboards to compare new NLP techniques against.
Reproducibility: The data and code of existing research are rarely shared, which means researchers cannot reproduce the results properly.
We propose to change that! Only by working together across the African continent can we do this!
Inclusive Community Building - We have a very active Google Group and Slack, with weekly meetings, talks and socials. There are no prerequisites for joining except abiding by our Code of Conduct. Here we find collaborators, help each other, and build together.
Creating resources - We are building expertise around data gathering and performing "data archeology" to discover and create datasets. We experiment with more inclusive methodologies of data gathering to ensure that the data is truly representative of the culture.
Publication - Our research is already unearthing interesting findings. We write papers together to be submitted to workshops and conferences.
Lowering barriers to participation - By creating easy-to-use Google Colab notebooks using Joey NMT, running workshops, and holding weekly meetings, we aim to help people get up and running in NLP as easily as possible.
Facilitating Collaboration - Our community serves as a perfect place to find others to work with. In line with our values, through our weekly meetings and Slack group, we help one another find collaborators.
Using Machine Translation as a stepping stone - Machine translation was the first task we took on, but our aims have expanded. We're already making massive progress in NER and speech.
To begin, check out our GitHub README.
The community consists of more than 400 participants with diverse educations and occupations, from 30 African countries and more than 3 countries outside Africa. As of February 2020, over 49 translation results for over 38 African languages have been published by over 35 contributors on GitHub.
The EMNLP Findings paper describes our approach to low-resource NLP: participatory research.
At EMNLP 2020, we gave the keynote at the prestigious WMT workshop. The talk features 15 of our participants.
In this talk, we challenge the idea that "low-resourcedness" is just a data problem. Instead, we propose that it is a societal problem, and that the best way to solve this societal problem is through participation.
Where can I help?
You don't need to be an NLP researcher to join us! We want anyone passionate about African languages to join. There are many ways to help:
Accessing or creating datasets
Analysing how good (and bad) our models are
Mentoring budding NLP practitioners
Being a storyteller - capturing our journey
How can I collaborate with Masakhane?
Are you looking to do research in African languages?
Welcome home! Join our Slack community; contribute benchmarks, evaluations and models; join our weekly meetings and socials when you can; find collaborators; express your priorities, constraints and ideas; build relationships; mentor, and find mentors. The best way to work with us is to participate in our community and build long-term relationships.
We do not support “Parachute Research” from the Global North. If your ideas are presented without clear value towards our mission, our community is unlikely to participate.
Masakhane are not just annotators or translators. We are researchers. We can likely connect you with annotators or translators, but we do not support shallow engagement of Africans as only data generators or consumers.
Are you a Research Lab looking to work on African NLP?
Masakhane can help connect you to African researchers with whom you can collaborate. We can help you construct an internship or fellowship that reflects the values of the community and the constraints of the African landscape, and disseminate the opportunity among our participants.
Are you looking to build products for African NLP and would like to partner with African NLP researchers?
We have a jobs channel on our Slack where you can share your job specification. We believe Africans should be hired to develop African NLP products.
We support African-led start-ups who would like to pitch their business idea. You can arrange to pitch your idea at our weekly meeting, engage with potential research partners, and receive feedback.
We can connect you to African companies who work in African NLP and can consult for you.
We can advise on how to construct internships or positions for African NLP scientists.
We do not support “Parachute Technology” from the Global North. If your ideas are presented without clear benefit to Africa, our community is unlikely to participate.
Unsure if your collaboration is beneficial to the Masakhane community? Drop us a note at firstname.lastname@example.org