This post was written in early 2024 but wasn’t published until now (5 November 2025). Hopefully it provides a bit more background to the Learning Companion and it remains relevant with regard to the impact of AI on many sectors.
WASHWeb
The water, sanitation and hygiene (WASH) sector is full of passionate, smart and experienced people, but it is not uniform. There are different communities at work. Some work across countries, spurred by health, humanitarian, and development arguments about the importance of water, handwashing, menstrual hygiene management and wastewater management. There is also a community engaged day to day in keeping services running, services that were planned to last many years because water and sanitation are most often tied to hard assets, whether ancient or new. There is an urgency to reach everyone, and there is also a deep-seated conservatism to keep things the “way they are”. At the same time, the world is changing: the information and knowledge revolution of the 20th and 21st centuries is accelerating the pace of change. Progress no longer comes a step at a time but in leaps and bounds. With the advent of large language models and AI, it is accelerating even faster. Many communities are now tapping into these new technologies, which come from the AI community and from large-scale services like Wikipedia.
I have been focused on bringing people together to think about data and to bridge these communities. I founded WASHWeb as a space for discussions at the intersection of data innovation and sustainable WASH services.
Three forces meet at this intersection: the steadfast operation of water and sanitation services, the urgency of the Sustainable Development Goals and humanitarian responses, and the faster pace of the technology-driven revolution. It is possible to improve exchanges between the communities behind them.
WASH Registry: the vision
Some years back, I started working conceptually on a WASH registry where we can register what we want to share publicly and allow the different communities that work together on WASH to share a validated set of public information. I asked myself, why is it so difficult for non-experts to find the latest information on WASH in their country / region? A registry could become a focal point, a source like Wikipedia where one can find some summarized and validated information and links back to their original sources.
As a monitoring and evaluation expert, I’m very concerned about the quality of information in the sector and the quality of information found online. There is a critical need for a public forum where open information can be scrutinized and validated. There is also a need for this validated information to be more discoverable than other sources of information.
With the wholesale adoption of large language models like OpenAI’s GPT-4, the WASH sector is going to receive a deluge of new sources of information and analysis. Some will come from the users of these large language models, and some from analytical tools that increasingly let users express their research questions in natural language and then automatically collect and analyze the resulting information. This is very exciting, but it depends critically on validated sources of information and on ground-truthing.
As language models become more performant, they are increasingly able to sound like they are making sense while providing false information. They can speak the language of experts and generate nonsense. Even more worryingly, they also encode biases found in the content used to train them and biases introduced during reinforcement learning from human feedback. In other words, as these AI models improve, they become better at hiding biases. A recent article showed that when a language model was asked to make a judgement about what a person said, the dialect of English used greatly influenced that judgement. The authors found that the overt, public racism of powerful models like GPT-4 was reduced, but that models trained with human feedback had nonetheless internalized very strong and measurable racial biases when judging statements written in different dialects.
As we aim to achieve safely managed services for everyone and move towards the SDGs, the need to address bias, especially toward vulnerable groups, is critical for WASH and beyond. What type of information would be useful to have validated? These are some ideas:
- Names, websites, and locations of organizations, such as service providers, government agencies, NGOs, and consultancies
- Datasets and their publishers and other important metadata
- Research publications and their authors and dates
- Standard indicators and their data sources and methodologies
- Mappings of indicators from different methodologies and how they relate
- Service delivery models and links to their legal/standardized descriptions
- Service levels achieved per region and their data sources and methodologies
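To make the list above more concrete, here is a minimal sketch of what a single registry record could look like. It is an illustration only: the class names, fields, and the rule that an entry needs both validation and a source are assumptions made for this example, not an existing WASH registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """Link back to where a piece of information originally came from."""
    title: str
    url: str


@dataclass
class RegistryEntry:
    """One publicly shared item in a hypothetical WASH registry."""
    kind: str                       # e.g. "organization", "dataset", "indicator"
    name: str
    country: Optional[str] = None   # None for items that are not country-specific
    sources: List[Source] = field(default_factory=list)
    validated: bool = False         # assumed to be set by reviewers after scrutiny

    def is_citable(self) -> bool:
        """Assumed rule: citable only when validated and linked to a source."""
        return self.validated and len(self.sources) > 0


# Placeholder entry; the name and URL are made up for illustration.
entry = RegistryEntry(kind="dataset", name="Example water point dataset")
entry.sources.append(Source("Example publisher page", "https://example.org/data"))
print(entry.is_citable())  # still unvalidated at this point
```

Keeping the link to the original source on every record is the important design choice here, since the registry is meant to summarize and point back, not to replace the original publishers.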
Together we can start to bind together the data from different initiatives by providing a consistent interface to access it.
One does not need to look far to see this is already happening and that there are strong reasons to do so. Wikidata is an enormous repository of information online that is used to feed data into Wikipedia articles and is increasingly an important source of information in its own right. Ultimately, projects like Wikidata are exactly where this validated information should end up.
By building a registry, we’ll be able to ask questions such as: which service delivery models are well known and have definitions online with clear regulations? This could be helpful for the next round of GLAAS surveys, as ideally there should be specific information about regulations in each country tied to each response.
This can be done by joining together the practices and technologies of the Wikidata community with those of the WASH sector.
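As a rough sketch of the kind of question a registry could answer, the snippet below filters service delivery models down to those with both an online definition and a linked regulation. Everything in it is hypothetical: the field names, the function, and the placeholder records are assumptions for illustration and do not describe real countries or real data.

```python
from typing import Dict, List, Optional

Model = Dict[str, Optional[str]]

# Placeholder records; names and URLs are invented for illustration only.
MODELS: List[Model] = [
    {"name": "Model A", "country": "Country X",
     "definition_url": "https://example.org/a", "regulation_url": "https://example.org/a-reg"},
    {"name": "Model B", "country": "Country X",
     "definition_url": "https://example.org/b", "regulation_url": None},
    {"name": "Model C", "country": "Country Y",
     "definition_url": "https://example.org/c", "regulation_url": "https://example.org/c-reg"},
]


def well_documented(models: List[Model], country: str) -> List[str]:
    """Names of models in `country` that have both a definition and a regulation link."""
    return sorted(
        str(m["name"])
        for m in models
        if m["country"] == country and m["definition_url"] and m["regulation_url"]
    )


print(well_documented(MODELS, "Country X"))
```

In practice, the same question could be asked of a shared knowledge base rather than a local list, but the filter logic would stay the same.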
A knowledge companion
As a next step, in August 2023, I began exchanging ideas with Jeske Verhoeven, the Director of the WASH Systems Academy, about building an AI-assisted Learning Companion. The idea was simple: could we build a knowledge base and an assistant to help teach and onboard people who want to learn about water and sanitation systems? It should be able not only to search and use the key references used to teach learners of the Academy, but also to respond using the correct language and model of the WASH sector.
If you want to read about the next steps we took, read the article about the learning companion.
