Shota Rustaveli National Science Foundation of Georgia

For Science, for Future, for Georgia

Ministry of Education and Science of Georgia

Successful projects and scientists

“One More Step Towards Georgian Talking Self-Developing Intellectual Corpus”

The two-year project AR/122/4-105/14, “One More Step Towards Georgian Talking Self-Developing Intellectual Corpus”, was carried out by the Scientific-Educational Center for Georgian Language Technology at the Georgian Technical University. Key personnel: 1. Principal Investigator: Professor Konstantine Pkhakadze, Director of the Center; 2. Investigator: Merab Chikvinidze, PhD, leading investigator of the Center; 3. Investigator: Giorgi Chichua, PhD, leading investigator of the Center. The project is one of the most important subprojects of the Center's long-term project “Technological Alphabet of the Georgian Language”. Its budget was 200 000 GEL, of which 166 000 GEL was granted by the Shota Rustaveli National Science Foundation and 34 000 GEL was contributed by the Georgian Technical University as the leading and co-funding organization.

Before the launch of the project “Technological Alphabet of the Georgian Language”, i.e. before 2012, the Georgian language had very little language-technology support. Even today, technological support for Georgian lags alarmingly behind almost every one of the 21 European languages that, according to the META-NET study “Europe's Languages in the Digital Age” (http://www.meta-net.eu/), are in danger of digital extinction. This makes it urgent to reduce the gap as far and as fast as possible. The AR/122/4-105/14 project “One More Step Towards Georgian Talking Self-Developing Intellectual Corpus” therefore aimed to reduce this gap in the shortest possible time and thereby radically change the situation (for more detail, see the information at the AR/122/4-105/14-17 address, which also hosts the monograph "Georgian Intellectual Web - Corpus: Goals, Methods, Recommendations" (authors: Konstantine Pkhakadze, Merab Chikvinidze, Giorgi Chichua, David Kurtskhalia, Ineza Beriashvili, Shalva Malidze), published as part of the project's research). In addition, on the basis of the new views and methods developed within K. Pkhakadze’s Logical Grammar of Georgian Language, the project built the Georgian intellectual web-corpus, that is, the first trial-applied version of the Georgian universal smart corpus (http://corpus.ge/). Although it is only a first step toward the Georgian universal smart corpus, it is already, in trial mode:

  1. The only Georgian self-developing corpus, which is at the same time the largest corpus of the modern written Georgian language. As of 9 September 2017, this self-developing (i.e. automatically developing) corpus contains 270 886 730 word tokens, of which 4 234 168 are distinct.
  2. The only Georgian multilingual and multimodal corpus. Built into it are the only Georgian-English, Georgian-Russian, and Georgian-Abkhazian trial parallel corpora, which are already functioning. Also built into it are a trial corpus of Georgian subtitled data and tools for constructing different types of corpora of Georgian speech data.
  3. The most technologically supported Georgian corpus. Built into the corpus and already functioning are quite a number of Georgian technological systems that did not exist before, created on the basis of some well-known technological platforms and K. Pkhakadze’s Logical Grammar of Georgian Language. These systems are already partially equipped with abilities of thinking (i.e. morphological, syntactic and logical analysis of Georgian texts and generation of the results of this analysis), talking (i.e. hearing what was said and answering it), voice control (i.e. carrying out received voice commands) and translating (from voice as well as from text) into and from Georgian. Together with the Georgian voice browser and voice-managed reader systems already developed within the project, all this makes the Georgian internet/computer/mobile space partially interactive, i.e. adapted both for persons with vision or speech impairments and for foreign users.

Thus the results of the successfully completed two-year AR/122/4-105/14 project “One More Step Towards Georgian Talking Self-Developing Intellectual Corpus”, a subproject of the project "Technological Alphabet of the Georgian Language" carried out by the Scientific-Educational Center for Georgian Language Technology at the Georgian Technical University, give us the right to emphasize that this project can be regarded as a groundbreaking step for the scientific field concerned with the complete technological processing of the Georgian state languages. In particular, the project's research provides results of fundamental importance for protecting the Georgian state languages (by which we mean both Georgian and Abkhazian) from digital death, and should therefore be regarded as:

  1. One more very important step toward defending the Georgian language from the real danger of digital extinction;
  2. A first very important step toward defending the Abkhazian language from the real danger of digital extinction.

It should also be emphasized that, in addition to the above-mentioned monograph "Georgian Intellectual Web - Corpus: Goals, Methods, Recommendations", the project team has published more than 40 scientific articles and delivered more than 20 scientific reports at international conferences, describing different aspects of the project's results.

Finally, in conclusion, we underline the following. In the approaching, and in part already arrived, digital age, thanks to the results of the AR/122/4-105/14 project the Georgian language is better protected from the threat of digital extinction than it was before. However, as already mentioned above, we must also emphasize that today Georgian, and even more so Abkhazian, lags alarmingly behind technologically advanced languages in language-technology support. This is because the technological development of those advanced languages has made big and rapid strides in recent years. All this makes very clear the need for further rapid development of the ongoing local efforts aimed at the complete technological development of the Georgian and Abkhazian languages.