| Advisor | Valentin Barriere |
|---|---|
| Areas | Data Science and Engineering, Artificial Intelligence |
| Sub-areas | Data Mining, Information Retrieval, Natural Language Processing |
| Status | Available |
Large Language Models (LLMs) exhibit inequalities across cultural contexts: trained mostly on Global North data, they can behave prejudicially towards other cultures. Moreover, there is a notable lack of resources for detecting biases in non-English languages, especially in Latin American dialects. We propose to leverage the content of Wikipedia, the structure of the Wikidata knowledge graph, and expert knowledge from the social sciences to create a dataset of Question/Answer (Q/A) pairs grounded in the popular and social cultures of various Latin American countries. We also propose to work on a definition of sociocultural bias precise enough that computational methods can both detect it and quantify its associated valence. We will focus on general methods suited to multilingual models in varied contexts, and apply them to Latin America, a continent whose many cultures share a common cultural ground.
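As a rough illustration of how the structure of Wikidata could seed such Q/A pairs, the sketch below queries the public Wikidata SPARQL endpoint for dishes whose country of origin is Chile and fills a question template. The entity and property choices (dish Q746549, country of origin P495, Chile Q298) and the Spanish question template are illustrative assumptions, not the project's final design, which would involve expert-validated questions.

```python
import requests

# Wikidata's public SPARQL endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Hypothetical probe: dishes (Q746549, via subclasses) whose country of
# origin (P495) is Chile (Q298), labelled in Spanish when available.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/wdt:P279* wd:Q746549 .   # instance of (a subclass of) dish
  ?item wdt:P495 wd:Q298 .               # country of origin: Chile
  SERVICE wikibase:label { bd:serviceParam wikibase:language "es,en". }
}
LIMIT 20
"""

def dish_qa_pairs():
    """Turn Wikidata triples into templated Q/A pairs."""
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "culture-qa-sketch/0.1"},
        timeout=30,
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        label = row["itemLabel"]["value"]
        # Placeholder template; real questions would be written and
        # validated with social-science experts, as the proposal describes.
        yield {
            "question": f"¿De qué país es originario el plato {label}?",
            "answer": "Chile",
        }

if __name__ == "__main__":
    for qa in dish_qa_pairs():
        print(qa)
```

The same triple-to-template idea extends to other relations (musicians, festivals, historical figures) by swapping the class and country identifiers in the query.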
The idea is to scrape Wikipedia to retrieve a large set of documents on regional pop-culture facts, and to build a database of questions with their associated answers in order to estimate LLMs' level of knowledge (see the sketch below). We will take inspiration from (i) benchmarks like CaLMQA, INCLUDE, BLEnD, NORMAD, CultureAtlas, and CulturalBench, none of which target South American popular culture [1,2,3,4,5,6], and (ii) a sociology-based methodology that collects data through long interviews [7].
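A minimal sketch of the collection step, assuming the standard MediaWiki API on Spanish Wikipedia; the category name "Categoría:Cultura de Chile" is a hypothetical starting point, and a real crawl would recurse through subcategories and cover every target country.

```python
import requests

# Spanish Wikipedia's MediaWiki API endpoint.
API = "https://es.wikipedia.org/w/api.php"

def category_members(category, limit=50):
    """List page titles that belong to a Wikipedia category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,   # e.g. "Categoría:Cultura de Chile" (assumed)
        "cmtype": "page",
        "cmlimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=30).json()
    return [m["title"] for m in data["query"]["categorymembers"]]

def plain_text_extract(title):
    """Fetch the lead section of a page as plain text."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,  # strip wiki markup
        "exintro": 1,      # lead section only
        "titles": title,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    return page.get("extract", "")

if __name__ == "__main__":
    for title in category_members("Categoría:Cultura de Chile", limit=10):
        print(title, "->", plain_text_extract(title)[:80], "...")
```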
References:
[1] Arora, S., Karpinska, M., Chen, H.-T., Bhattacharjee, I., Iyyer, M., & Choi, E. (2025). CaLMQA: Exploring culturally specific long-form question answering across 23 languages. ACL, 1, 11772–11817. Retrieved from http://arxiv.org/abs/2406.17761
[2] Romanou, A., Foroutan, N., Sotnikova, A., Chen, Z., Nelaturu, S. H., Singh, S., … Bosselut, A. (2024). INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge, 1–33. Retrieved from http://arxiv.org/abs/2411.19799
[3] Myung, J., Lee, N., Zhou, Y., Jin, J., Putri, R. A., Antypas, D., … Oh, A. (2024). BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages. NeurIPS Datasets and Benchmarks Track, 1–36. Retrieved from http://arxiv.org/abs/2406.09948
[4] Rao, A., Yerukola, A., Shah, V., Reinecke, K., & Sap, M. (2024). NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models. Retrieved from http://arxiv.org/abs/2404.12464
[5] Fung, Y., Zhao, R., Doo, J., Sun, C., & Ji, H. (2024). Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking on 1000+ Sub-Country Regions and 2000+ Ethnolinguistic Groups. Retrieved from http://arxiv.org/abs/2402.09369
[6] Chiu, Y. Y., Jiang, L., Lin, B. Y., Park, C. Y., Li, S. S., Ravi, S., … Choi, Y. (2025). CulturalBench: A Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs. In ACL (pp. 1–26).
[7] Montalan, J. R., Ngui, J. G., Leong, W. Q., Susanto, Y., Rengarajan, H., Aji, A. F., & Tjhi, W. C. (2024). Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino. PACLIC. Retrieved from http://arxiv.org/abs/2409.15380