Kazakhstan’s Bid For AI Sovereignty

Astana is talking a big game on artificial intelligence, but can it deliver?

On March 13, Kazakhstan’s President Kassym-Jomart Tokayev met with Thomas Pramotedham, the CEO of Presight AI, an artificial intelligence firm, to discuss plans for a supercomputer cluster in the country. The project is part of a slew of initiatives from the government to position itself as a regional leader in artificial intelligence. 

Astana is placing hope in the technology not merely for economic growth. There is also a cultural aspect to the push, with a strong domestic AI industry seen as vital for linguistic preservation.

However, as a recent delay to the supercomputer project demonstrates, even the best-laid plans can fall victim to geopolitical forces. While Kazakhstan might talk a big game on AI, can it deliver?

Controlling The Narrative

Large language models, or LLMs, process, understand, and generate human language, and are the basis of AI programs such as ChatGPT. These models are overwhelmingly trained on a handful of dominant languages, such as English, Mandarin, and Spanish, while smaller languages like Kazakh are often overlooked.

“While the larger LLMs are adding additional languages, these languages are not necessarily supported to an equal extent,” said Preslav Nakov, department chair and professor of natural language processing at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in Abu Dhabi. “LLMs use neural networks and have a limited capacity; their developers inevitably ask themselves whether they want to invest in using that capacity to support more languages or to improve in other areas, such as reasoning capabilities.” 

The secondary importance given to smaller languages leads to AI models which promote a Western world view, says Dion Wiggins, CTO of Omniscience, a firm specializing in AI-driven language processing solutions. “If you go to Grok or Llama or ChatGPT, they’re more or less all the same because they all learn from the same data,” he said.

However, if countries such as Kazakhstan could produce their own LLMs, it would mean more control over the narrative. 

“If you have a sovereign LLM, it’s got Kazakh morals, Kazakh history, Kazakh lenses, and a viewpoint from this part of the world,” said Wiggins. He cites China’s DeepSeek, which limits access to information on the Tiananmen Square massacre, and Google’s Gemini, which refuses to answer a simple question such as “Who is the President of the United States?” as examples of how we are already seeing AI being used for censorship.

Mind Your Language

LLMs require enormous amounts of data to train them to be effective.

“And there’s the problem,” said Wiggins. “There’s just not much Kazakh data.”

One of the largest data sources for AI training is Common Crawl, a non-profit that archives online information and makes it freely available to the public. Its statistics show a huge linguistic bias: 43.4 percent of Common Crawl web pages are in English. In fact, over 70 percent of all web-based data is from seven major languages: English, Russian, German, Japanese, Chinese, Spanish and French.

Kazakh accounts for 0.0298 percent. In other words, if you picked 10,000 web pages at random, three would be in Kazakh, 605 in Russian, and 4,337 in English.
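The conversion from percentage shares to pages per 10,000 can be checked with a few lines of Python. This is only a rough sketch: the Kazakh share is the figure quoted above, while the Russian and English shares (6.05 and 43.37 percent) are assumptions back-calculated from the per-10,000 counts, the latter rounding to the 43.4 percent cited earlier. The function name is illustrative.

```python
# Shares of Common Crawl pages by language, in percent.
# Kazakh is the figure quoted in the article; the Russian and English
# values are assumptions back-calculated from the per-10,000 counts.
SHARES_PERCENT = {
    "Kazakh": 0.0298,
    "Russian": 6.05,
    "English": 43.37,
}


def pages_per_sample(share_percent: float, sample_size: int = 10_000) -> int:
    """Expected number of pages in a random sample, rounded to the nearest page."""
    return round(share_percent / 100 * sample_size)


expected = {lang: pages_per_sample(share) for lang, share in SHARES_PERCENT.items()}

for lang, count in expected.items():
    print(f"{lang}: about {count:,} of 10,000 pages")
```

On these figures, a random sample would contain more than 1,400 English pages for every Kazakh one.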

This has real-world consequences: search engines prioritize English content, AI-powered assistants struggle with non-English queries, and automated translation services remain unreliable in many languages. 

“In Kazakhstan, this issue is further compounded by the historically intrinsic problem stemming from reliance on Russian,” said Aisana Kassenova, a Kazakh-born Ph.D. candidate in AI at the Esade Business & Law School in Barcelona. “Many translating tools, like Google Translate, still use Russian as an intermediary when translating Kazakh, making it often inaccurate.”

Astana has a long-standing policy to try to promote the Kazakh language over Russian, which for many years was considered to be the language of the urban elite in the country. Many would argue that it still is: Russian has an enormous head start over Kazakh in the digital space, meaning that the majority of interactions with AI are conducted in Russian.

“This leads to a lack of Kazakh language datasets, reinforcing the perception that Russian remains the more ‘practical’ language for technology and AI development in Kazakhstan,” said Kassenova.

Home Grown LLMs

As such, the search began for Kazakhstan’s first large language model. In December 2024, the country struck gold when Nazarbayev University’s Institute of Smart Systems and Artificial Intelligence (ISSAI) unveiled KazLLM. Designed to process and generate text in Kazakh, Russian, English, and Turkish, KazLLM was developed using a vast dataset collected from sources such as news outlets, government websites, and open-access materials. The model’s performance even drew praise from Yann LeCun, the head of AI and Research at U.S. tech giant Meta.

That was followed in February 2025 by Sherkala, another Kazakh-language AI model, developed collaboratively at MBZUAI in Abu Dhabi.

Professor Nakov, the project’s leader, told The Diplomat that Sherkala is following in the footsteps of JAIS (2023) and NANDA (2024), which are focused on Arabic and Hindi, respectively.

“Sherkala is built on LLaMA, the widely adopted open-source AI model from Meta, which already includes some multilingual support, but not enough to provide the level of accuracy and cultural awareness for languages such as Kazakh,” he said. To develop the model, his team made sure to fine-tune it with extra information about the culture and history of Kazakhstan.

Kassenova argues that KazLLM and Sherkala have not been designed to compete with mainstream AI models, but rather to provide more inclusivity. “Models like ChatGPT, Gemini, and Qwen are built with massive resources, endless multilingual datasets, and cutting-edge computing power, aiming for general intelligence,” she said. “In contrast, Kazakh LLMs were created in relatively small teams (and with a relatively small budget) to ensure that Kazakh speakers have AI tools tailored to our language and cultural context.”

Building AI Infrastructure

Kazakhstan’s AI ambitions extend beyond language models. Another plank of the strategy involves the creation of a national supercomputer. 

“[This] would be key for AI development,” said Kassenova. “The country has long depended on Russian computing systems, but with Russia facing its own AI chip shortages, turning to it isn’t an option.” 

The government has partnered with Presight AI, another UAE firm, to build the supercomputer. However, delays in acquiring high-performance NVIDIA chips due to U.S. export restrictions have slowed progress on the project, which was due to be completed last year. The restrictions have created significant bottlenecks, since NVIDIA controls around 80 percent of the global market for AI chips.

Wiggins suggests that Kazakhstan could turn eastward for help. “Huawei in China has created GPUs that are not as good yet, but they’re good enough,” he said, referencing the recent positive performance of the Huawei Ascend 910C chip, which has begun to close the gap on NVIDIA.

Building an AI ecosystem requires human capital as well as infrastructure. In 2024, Kazakhstan began to introduce AI literacy courses across all universities in the country. The Astana Hub Technopark has also begun an annual project to train 700 AI teachers from 47 national universities. 

Astana also envisions becoming a regional AI hub. Plans are underway to establish an International AI Center in 2025, a move designed to attract global research collaborations and investment.

The Cart Before The Horse

However, announcing that 1 million people will be trained in AI is different from persuading them to undertake the training, just as convincing people to use Sherkala over Russian-language equivalents is not a given. Kazakhstan has been here before, prematurely proclaiming itself a global hub for everything from logistics to religion.

Another issue is openness. Large language models thrive on huge amounts of accurate, comprehensive information. 

While governments that tend toward opacity, such as China, have shown that a highly controlled, top-down approach with state backing, massive data, and corporate alignment can also drive progress, Kazakhstan may not have the resources to emulate that model.

A cheaper approach would be an environment which fosters open and easy access to data, particularly given the relative dearth of Kazakh language sources. However, with Reporters Without Borders ranking the country 142nd out of 180 on its 2024 World Press Freedom Index, this does not appear to be a priority.

For all its grand designs, Astana’s iron grip on information may end up holding the country back.

ISSAI, the creators of KazLLM, did not respond to requests for comment.

Presight AI declined to comment, suggesting that questions be directed toward the government.

The government’s Ministry of Digital Development was not available for comment. 
