Monojit Choudhury

Microsoft Research India

Monojit Choudhury is currently a Principal Data and Applied Scientist at Turing India, where the team builds large universal language models that form the backbone of various Microsoft products. Prior to this, he was a Principal Researcher at Microsoft Research Lab India, and he still collaborates closely with his colleagues from MSR. His research interests cut across the areas of linguistics, cognition, computation and society. He has a BTech and PhD in Computer Science and Engineering from IIT Kharagpur, and has been at Microsoft Research since 2007. He is interested in understanding the nature of massively multilingual language models (MMLMs) such as Turing ULR, XLM-R and mBERT. As a part of Project LITMUS (Linguistically Aware Testing of Multilingual Systems), he worked on systematic evaluation and estimation of MMLM performance across languages, even in the absence of test datasets. This in turn helps the team understand the factors that affect MMLM performance and build optimal data collection strategies to ensure more equal or equitable performance. He also collaborates on Project ELLORA, which aims to enable the speakers of low-resource languages through appropriate language technology. The team is working with collaborators and NGOs on extremely low-resourced and lesser-known languages such as Gondi, Mundari, Idu-Mishmi, Sheng, Swahili and Igbo. Their decade-long experience of working with low-resource language communities suggests that technology is seldom the bottleneck; more often than not, technological interventions do not work when the human and social contexts are not taken into consideration. On the other hand, participatory design and co-design, whenever possible, lead to simpler yet effective technological solutions.

Monojit Choudhury

Session 1D: Symposium on Developing Chat-GPT for India: Challenges and Opportunities in Building Large Language Models

Convener: Chiranjib Bhattacharyya

Safety, Infrastructure and Language: The Three Scaling Challenges of LLMs