A new tool from Russia’s tech giant can process a thousand academic sources a day — cutting research tasks that once took hours down to minutes
For most of its 200-year history, Russia’s Institute of Oriental Studies (IV RAN) has worked the way all great scholarship does: slowly, carefully, and largely beyond public view. Founded in St. Petersburg in 1818 as the Asiatic Museum of the Imperial Academy of Sciences, the Moscow-based institute has since grown into one of the world’s leading centres for the study of Asia and North Africa — covering everything from ancient manuscripts and epigraphy to the contemporary politics of the Middle East. Its researchers routinely work with primary sources in languages that most people have never heard of.
That pace is now changing.
In November 2025, the institute unveiled an AI assistant built in collaboration with Yandex Cloud, the cloud and AI division of Russia’s dominant technology company. Launched in testing mode, it can process up to 1,000 Chinese-language sources per day, compared with the 8–10 a day that a researcher could realistically handle alone. The system extracts key facts, generates concise analytical summaries, and monitors Chinese-language media, producing Russian-language digests for scholars who need to track regional developments without spending hours on translation.
The numbers tell the story bluntly. According to Alexander Kostyrkin, a senior researcher at the institute’s Laboratory for Digital Studies of the Contemporary East, the tool has cut the time required for a single research task from several hours to roughly 10–15 minutes, a speed-up he puts at six to eight times.
A Library of 1.5 Million Documents
The assistant was built on Yandex AI Studio, Yandex’s platform for developing AI-powered applications, and trained by specialists from Yandex Cloud alongside students from the company’s School of Data Analysis. Its knowledge base was drawn from the institute’s own archive — monographs, academic papers, and other scholarly publications accumulated over decades of research.
By the time the system was announced, that base had already exceeded 1.5 million documents, representing Chinese in four regional variants: mainland, Taiwanese, Hong Kong, and Singaporean. The current version is in a testing phase, available only to institute staff. Once testing concludes, the developers plan to open it to researchers across Russia.
The roadmap is ambitious. Arabic, Japanese, Turkish, and Persian are all slated for addition — a step that would give Russian scholars direct computational access to primary sources across much of the non-Western world.
“Oriental scholars especially need to work with primary sources — to understand the language, context, and logic of the region they write about,” said Anna Lemyakina, Director of National and Strategic Projects at Yandex Cloud, explaining the goal of the collaboration. “AI allows them to do this faster, more precisely, and more deeply.”
The project is part of Yandex’s Center for Technologies for Society, which applies the company’s tools to public-interest problems rather than commercial ones.
The Broader Moment
The Russian initiative arrives at a moment when the use of AI in humanities research is accelerating globally — and drawing serious money.
In December 2025, Schmidt Sciences, the nonprofit founded by Eric and Wendy Schmidt, awarded $11 million to 23 interdisciplinary research teams across multiple continents through its Humanities and AI Virtual Institute (HAVI). The funded projects range from virtually unwrapping damaged ancient scrolls to building multilingual chatbots that let scholars interrogate medieval judicial documents. Researchers at Harvard, Columbia, Berkeley, and the Sorbonne are all involved.
“Rather than destroying the humanities, as many have feared, AI has a role in advancing the humanities,” said Brent Seales, the University of Kentucky professor who directs HAVI and previously gained international recognition for developing technology to read ancient scrolls without physically opening them.
The challenge these projects share — and that the Russian initiative reflects in its own way — is a fundamental mismatch between how mainstream AI systems are built and what humanities scholars actually need. Modern language models are trained on vast quantities of contemporary, predominantly English-language text. They struggle with ancient scripts, low-resource languages, three-dimensional artefacts, and the kind of culturally specific ambiguity that is, in a sense, the whole point of humanistic inquiry.
Two Institutions, One History
The institutional landscape deserves a note here, since the two main Russian centres for this work are closely related but distinct.
IV RAN — the direct partner in the Yandex project — is the research and analytical arm, covering contemporary geopolitics and culture across Asia and Africa alongside classical scholarship. Its director, Alikber Alikberov, has described the Yandex collaboration as part of a sweeping digital transformation of the institute, one that seeks to expand the depth and scale of analysis while preserving what he calls “the fundamentality of the academic approach.”
The Institute of Oriental Manuscripts (IVR RAN, St. Petersburg), housed in the Novo-Mikhailovsky Palace on the Neva, is the custodial institution: a repository of actual manuscripts and written heritage, spun off from the Moscow institute following a 2007 reorganisation and established as an independent institution in 2009. It is the older of the two branches in spirit: the Asiatic Museum of 1818 was established precisely as a repository of Eastern manuscripts and books, and it is that custodial tradition, rather than the analytical one, that IVR RAN most directly inherits.
Both institutions ultimately trace back to the same founding act of 1818. The current AI initiative is a Moscow story, but it sits within that much longer tradition of trying to make the written East legible to the world.
The Yandex collaboration is not IV RAN’s only AI venture. The institute has also developed TRAIT — a computer-assisted translation tool that uses generative AI to help scholars render Tibetan-language Buddhist texts into Russian. The project targets the Kangyur, the canonical collection of Buddha’s teachings, which runs to 333 volumes and whose translation into Russian remains unfinished. TRAIT is designed to accelerate that work, bringing together orientalists, AI engineers, and Buddhist scholars in a collaboration that few institutions outside Russia would be positioned to attempt.
Access as the Problem
One of the less obvious arguments for this kind of tool is simply the language barrier. As Kostyrkin put it, the institute had long faced a situation in which original-language sources were not just difficult to analyse — they were difficult to find in quantity. Many materials from Asian countries are published only in their national languages and are rarely translated. That, combined with the sheer volume of contemporary media output, left researchers dependent on a narrow slice of what actually existed.
The AI assistant addresses the bottleneck at both ends: it broadens the pool of sources that can be monitored and shortens the time required to extract what is useful from each one.
Whether a system trained predominantly on the institute’s own archive will generalise well to sources beyond it remains a practical question to be answered as the project expands. But as a proof of concept — and as an argument for what AI can do when adapted specifically to the needs of a scholarly community — the collaboration between Yandex and an institution whose scholarly roots go back to 1818 represents something genuinely new.
The sources have waited two centuries. The scholars, it seems, are done waiting.

