In recent years, a growing number of real-world observations and research findings have shown that machine learning (ML) can lead to unfair stereotyping and disparities in outcomes between social groups. This issue arises across scales, from small datasets (Angwin and Larson 2016, Chouldechova 2017) to extremely large ones (Blodgett et al. 2020). Although the specific patterns of ‘unfair’ behaviours differ widely between fields (stereotypical embeddings in natural language processing, unequal performance across groups in computer vision, etc.), a simple and general mitigation consists in ignoring protected attributes during training. However, this approach is not always straightforward: because of dependencies between features, it provides few guarantees and may even lead to internal representations from which the protected attributes’ values can be reconstructed. This can significantly affect social groups (Angwin and Larson 2016, Corbett-Davies and Goel 2018, Pierson et al. 2018). In other cases, unfair behaviours stem from insufficient coverage of social groups in the training data, e.g. facial recognition algorithms whose performance depends on ethnicity.
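The failure mode of simply dropping protected attributes can be illustrated with a minimal sketch. The data below are entirely synthetic and the feature names hypothetical: a proxy feature (here called `neighborhood`) is correlated with the protected attribute `group`, so even a model that never sees `group` can largely reconstruct it.

```python
import random

random.seed(0)

# Hypothetical synthetic data: the protected attribute "group" is dropped
# from the features, but a correlated proxy ("neighborhood") remains.
n = 10_000
rows = []
for _ in range(n):
    group = random.randint(0, 1)  # protected attribute, excluded from training
    # The proxy agrees with the protected attribute 90% of the time.
    neighborhood = group if random.random() < 0.9 else 1 - group
    rows.append((neighborhood, group))

# A trivial predictor that only reads the proxy feature still recovers
# the protected attribute far better than chance (~0.90 here vs 0.50).
correct = sum(1 for proxy, group in rows if proxy == group)
accuracy = correct / n
print(f"protected attribute recovered with accuracy {accuracy:.2f}")
```

In a real pipeline the proxy would be a combination of many features learned by the model rather than a single column, but the effect is the same: ignoring the attribute does not remove the information from the data.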