© CABAR - Central Asian Bureau for Analytical Reporting

“Hey Akylai”. Why is Kyrgyzstan teaching a neural network the Kyrgyz language?

Akylai, a Kyrgyz-speaking voice assistant, is to be introduced in Kyrgyzstan. A team of Kyrgyzstanis from around the world is working on the project and promises to deliver a working pilot model within a year. Timur Turatali and Ulan Abdurazakov, co-founders of The Cramer Project, told CABAR.asia how it will work and why a voice assistant should be taught the Kyrgyz language.


First of all, let’s define the terms. Artificial intelligence, neural networks and machine learning: what is the difference? And which category do voice assistants fall under?

Timur Turatali. Photo from personal archive

Timur Turatali: In my opinion, artificial intelligence is a marketing term. In fact, there is no artificial intelligence as we imagine it.

Data science is a broad field concerned with extracting insights from data and using them for business purposes. For example, a retail company can use its data to find out who buys the most Snickers from it and at what time of day.

Machine learning is a subfield of data science that deals with creating machines that can learn without being explicitly programmed. This is the definition used at Stanford University. It imitates human behaviour and human intelligence at a basic level.

Neural networks are an area of machine learning that has been hugely popular for the last 10 years because they are a new kind of machine learning algorithm. The idea is to model how the human brain works.

So, can voice assistants be classed as neural networks?

T.T.: Yes. Data science is divided into areas. For example, there is what we all call NLP, natural language processing. This covers voice assistants, chatbots, and ChatGPT as well. The best tool for building such NLP projects is neural networks.

There is also computer vision, i.e. the processing of images of still and moving objects by computer. And there is classic machine learning, which relies on older but effective algorithms invented back in the 20th century.

 

Which of you came up with the idea of creating a voice assistant in the Kyrgyz language?

T.T.: I think we came up with the idea while brainstorming.

We should understand here that we had a goal: to create a project that supports the community. We chose and developed the Akylai project with that in mind. We decided to train the voice assistant from scratch: to collect data, build simple models first, and then more complex ones for the Kyrgyz language.

Why teach a neural network the Kyrgyz language at all?

T.T.: The big problem in the market right now is the lack of tools for the Kyrgyz language.

Every year, the Kyrgyz language is used more and more on the internet. Many of my friends, including those working in the media, keep talking about the lack of tools. In other words, they have a text that needs to be processed with data science methods, but they cannot do it. We decided that this was a problem that needed solving.

Ulan Abdurazakov. Photo from personal archive

Ulan Abdurazakov: In general, colloquial Kyrgyz differs greatly from the literary language. Through the project, we want to popularise literary Kyrgyz so that people can write and speak it. This will contribute to the development of the language itself.

Do you mean that the voice assistant Akylai will speak literary Kyrgyz?

U.A.: Both literary and colloquial. But the assistant will certainly know literary words.

Our everyday Kyrgyz is a mix of Kyrgyz and Russian words. Will Akylai speak only Kyrgyz?

U.A.: We will try to make it use Kyrgyz words as much as possible. Maybe it will even be able to coin abbreviations that resemble real Kyrgyz words.

What data do you use to teach the neural network the Kyrgyz language? I believe the amount of data available in Kyrgyz is limited, especially in literary Kyrgyz.

T.T.: Yes, that’s true. We have parsed all the Kyrgyz-language content on the internet and are now digitising all available books.

U.A.: We are doing this in partnership with the Kyrgyz Technical University and the Hi Tech Park. Kloop.kg also helped by sharing its data with us.

T.T.: There is also a company called Inkubasia. They helped us find good experts in western markets, based in America, who have been working in NLP for 20 years or so.

In other words, you are teaching Akylai Kyrgyz literature and everything that is available on the internet in Kyrgyz?

U.A.: In fact, we use everything that is available in Kyrgyz on the internet. We have also loaded Wikipedia.

How big is the team working on the project?

T.T.: Right now we have about ten people on the team; they make up the expert board. We usually draw up a plan of what needs to be done and how to reach the goal step by step. We break it down into tasks, and every team member has their own subtask. Once a week we hold a conference call and share our progress.

We also have 40-50 volunteers who wanted to take part. But we will involve them later, once we get closer to developing the model itself.

How many women are on your team?

T.T.: We have two women on the expert board. But about half of the volunteers are women. The problem is that few women are represented in artificial intelligence, among engineers and data scientists.

The project generally sounds very ambitious. Isn’t ten people a small team for such a big project?

T.T.: Yes and no. We need to understand that the more people there are, the harder it is to coordinate, because every team member has a main job. The project is largely a volunteer effort, because we want to do something useful. That is why our team is small, but we are planning to expand.

In this project, we are doing what western universities have been doing for 50-70 years. We have divided the project into stages. Stage one is to create a corpus, that is, an aggregated collection of all data in Kyrgyz.

The idea is as follows: first we need to collect the data that will be used for training. We are collecting the largest corpus possible, and we are almost done.

U.A.: If you enter a Kyrgyz word there, the corpus will return the sentence in which the word was first used and the book it comes from.
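Editor's illustration: the lookup described here can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the project's actual implementation. The `Sentence` record, the function names, the book titles and the years are all hypothetical placeholders, and "first used" is assumed to mean the earliest publication year among the books in the corpus.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """One corpus sentence with its source (hypothetical record layout)."""
    text: str
    book: str
    year: int


def tokenize(text):
    # \w matches Unicode letters in Python 3, so Cyrillic words are kept whole.
    return re.findall(r"\w+", text.lower())


def build_first_use_index(sentences):
    """Map each word to the earliest sentence (by year) that contains it."""
    index = {}
    for sentence in sorted(sentences, key=lambda s: s.year):
        for token in tokenize(sentence.text):
            # setdefault keeps the first (earliest) sentence seen for a word.
            index.setdefault(token, sentence)
    return index


def lookup(index, word):
    """Return the earliest sentence containing the word, or None."""
    return index.get(word.lower())


# Placeholder data: the book titles and years are invented for illustration.
SAMPLE = [
    Sentence("Мен китеп окудум.", "Book B (hypothetical)", 1990),
    Sentence("Бул китеп жакшы.", "Book A (hypothetical)", 1950),
]

index = build_first_use_index(SAMPLE)
hit = lookup(index, "китеп")
if hit is not None:
    print(hit.text, "|", hit.book, "|", hit.year)
```

A real corpus would also need to handle Kyrgyz morphology, since one word appears with many different suffixes. This sketch matches exact word forms only.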

Akylai is an open source project. We will make all the data openly available to everyone. We will create a website, call it the “Corpus of the Kyrgyz language” and upload all the data we have collected. It will be available to anyone.

Do you have a roadmap for the project?

T.T.: We do have an approximate roadmap, but you need to understand that the path to our goal is not straight but thorny.

Stage one is to create the corpus. Then we will build simple language models. They will be based on the BERT model (Bidirectional Encoder Representations from Transformers, a language model for natural language processing – editor’s note).

After that, we will build more complex transformer-based models. And then we will move on to speech synthesis and language processing.

When should we expect the finished model?

T.T.: Not sooner than a year from now.

U.A.: It will take almost a year to develop text answers. And it will take another year to make Akylai speak like Alisa. We believe we will have a large pool of partners by then.

Who is your target audience? Who is your project for?

T.T.: Since it is an open source project, we are doing it for everyone. It will belong to the people of Kyrgyzstan, so that they can use the Kyrgyz language in the digital world.

U.A.: Currently, news agencies and businesses are interested in it.

Is the project fully altruistic? Where do you get the funds for the project?

T.T.: Our whole team is made up of Kyrgyzstanis, our compatriots, some of whom hold good positions abroad as ML/AI experts, and they are open to projects like this.

Everyone on our team already has a job, but we can spend a couple of hours a day on the project. And we believe it will be useful in many ways: in teaching, in language development, and in the development of artificial intelligence for Kyrgyz speakers.

So, no one is paid?

T.T.: No, it is a fully volunteer project. But we offer other things instead. During the project, we will be writing scientific articles and publications. In the artificial intelligence world, such research is highly valued. This is how we attract people. It is valued more than money because you leave a legacy: you were the first to create the corpus, the first Kyrgyz model.

U.A.: It is a good addition to their CV and portfolio, which will help them in their careers.

How do you think your project can influence the development of the Kyrgyz language? We have spoken about the importance of the Kyrgyz language for so many years, but nothing has changed. Can your project give it an impetus?

T.T.: Yes, we believe it will help a lot and will be part of this development, because at the moment the Kyrgyz language is not represented in the digital world. For example, when you use an iPhone or an Android phone, you get autosuggest in Russian, but not in Kyrgyz. There is no good dictionary and no good language model that could be used for learning. All of this is available for other languages. Language processing for Russian and English has existed for a long time.

And we believe that our project should help develop the Kyrgyz language. More people will be able to use Kyrgyz.

When you do research, you need data. But the problem is that we do not have the data, or it is unprepared. We are solving the biggest problem in the market so that other data scientists and machine learning researchers can use our data, test their hypotheses and build models faster.

U.A.: And data scientists who want to build something else will be able to use our corpus as a starting point.

You have already touched on speech synthesis. What will Akylai’s voice be like? Will it be a woman’s or a man’s voice?

U.A.: We are still looking for the voice. Since Akylai is a woman’s name, the voice will be a woman’s.

Will there be the option of a man’s voice? For example, I would like to have my assistant speak in a man’s voice.

T.T.: Why not? It is something the model can be trained to do.

How well will Akylai handle ethical standards and tolerance? Can we expect sexism or racism from her? I suppose the data used to train the assistant contains a lot of such things.

T.T.: This is what companies creating such projects usually do: first they launch a beta version, then they listen to expert opinions and revise the product. Take ChatGPT, for example: after release it is analysed by a group of specialists.

We will open the beta version first, let it run, collect feedback, and then improve the project. But given that our country has problems with sexism, feminism and other isms, we will probably end up with an assistant that behaves the same way after learning from such data.

And who will work on eliminating such intolerance?

T.T.: This is the last stage of development, and we will bring in more people from outside who don’t know the code but can point out what is wrong.

What about the sense of humour? Will it be there?

T.T.: Most likely not in the early stages. There will only be a question-and-answer function. But in the future, when we add speech synthesis, voice recognition and dialects, we can add some humour.

Will you collect the data that users provide to Akylai: requests, texts, other information? What will you do with it?

U.A.: We will use the data submitted by users, but the process will be anonymised. We won’t collect personal data. We only need the texts and their ratings, that is, how well they were generated.

Are you planning to make profits from this project?

T.T.: We haven’t thought about commercialisation so far. Some companies interested in cooperation have already contacted us, but we haven’t considered it seriously.

Do you mean that Akylai is a free-of-charge project?

T.T.: So far, we plan for it to be free of charge and open source.

U.A.: But Akylai can learn some skills, like Alexa (Amazon’s voice assistant – Editor’s note). For example, phone calls and customer service are narrow skills, and they will probably be paid. But the main version will be absolutely free of charge.

In your opinion, how attractive is the project commercially? After all, it is only about the Kyrgyz language and is intended only for the Kyrgyzstan market, which is not very big.

U.A.: In fact, what matters is the approach. If you have data in, say, Kazakh, you can do the same for Kazakh.

So, are you planning to work with other languages of Central Asia?

U.A.: Not yet, but in principle it is possible.

Will it be easier to work with other languages? Or do you need to start all over again with each new language?

T.T.: I think the biggest problem is collecting the data. Everything else would be much easier after Akylai.

Do you have long-term plans to make a smart speaker or station for Akylai? Will it be able to connect to a smart home?

U.A.: I think so. We have had a proposal to make a speaker in the shape of a yurt. Once our basic model is ready, it will be easier to work on other things.

Which services do you plan to cooperate with? Will Akylai be able to work with Google, Spotify and other services?

T.T.: We haven’t thought about it so far. We had an idea of cooperating with state channels and websites that have presenters, so that Akylai could host part of a programme. For example, in Russia an AI presenter delivers weather forecasts. We can offer this as an option.

But we haven’t thought about cooperating with companies like Google, Spotify or Amazon. That is a very long-term prospect.

Will Akylai be able to replace a journalist?

T.T.: It will, partially.

U.A.: Well, it will be helping, not replacing.

What is the future of Akylai, in your opinion?

T.T.: Looking far ahead, I think Akylai will be a good project that can help develop the Kyrgyz language and help people who wish to learn it. By talking to a yurt-shaped speaker, a person will learn Kyrgyz. Akylai will be a kind of teacher.

Will Akylai support the Russian language?

T.T.: It will support Kyrgyz, Russian and English.

And how will it work?

T.T.: There is so much data in English, so that will be the easiest to teach. We will simply make sure that Akylai can translate and convey the meaning.

Will it have built-in apps for learning Kyrgyz? What will they be based on?

U.A.: The Kyrgyz language does not have a TOEFL-style test or A1, B1, C1 levels. So we will need help from the expert community and linguists here. The good thing is that people are open and willing to help. A standard, something like a Kyrgyz TOEFL, needs to be developed.

I think such projects need state support so that they can work across the country, as far as language development is concerned. In your opinion, are the authorities open to that?

U.A.: We are actively cooperating with the Hi Tech Park, which is a government entity. They are open and supportive. We are also cooperating with universities. So far, so good. I am not saying that we need money from the state. But we do need information and media support, and assistance with providing and collecting data.
