General Architecture for a Multilingual Information Retrieval System

Project GAMIR
Project Key Information

Project Status: set-up

Start Date: September 2022

End Date: August 2025

Budget (total): 5101.48 K€

Effort:  109.77 PY

Project-ID: C2021/2-3

Project Coordinator

Name: Ahmet Sonmezisik

Company: Turkcell Teknoloji

Country: Turkey

E-mail: ahmet.sonmezisik@turkcell.com.tr

Project Consortium

BEIA Consult International S.R.L, Belgium

Code Creator, Czech Republic

Palestra, Czech Republic

Satturn, Czech Republic

CleanWatts, Portugal

ISEP – Instituto Superior de Engenharia do Porto, Portugal

CIC (Consulting Informatico de Cantabria SL), Spain

ARD Group, Turkey

Carbon Consulting, Turkey

Hiperlink, Turkey

Inosens, Turkey

Turkcell Teknoloji, Turkey

Abstract

As computing power increases and the cost of storage decreases, the amount of data we deal with day to day is growing exponentially. According to IDC, the total amount of data in the world was expected to reach 74 zettabytes by the end of 2021, and IDC predicts that it will grow to 175 zettabytes in 2025. Downloading 175 zettabytes at the current average internet connection speed would take 1.8 billion years. But without a way to retrieve and query it, the information we collect is of little use.

IR systems help in understanding data and transforming it into knowledge. Many applications, ranging from search engines to genetic research, use information retrieval systems in combination with machine learning algorithms to produce relevant results.

The proposed project, GAMIR (General Architecture for a Multilingual Information Retrieval System), offers a reference IR framework in which cutting-edge AI algorithms and NLP techniques can be used to refine the results of multilingual informational queries on different types of content. The framework extracts features from the documents loaded into it, using different AI models for different content types. These features are indexed in separate indexes for different languages, and the result elements are post-processed to refine the results of the query. Below is a very high-level diagram showing the components of the GAMIR framework.

The project also demonstrates the framework through different use cases, such as multilingual text, image and video queries, which will enable more precise and detailed searches as well as search within media.
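
The flow described in the abstract (feature extraction per content type, indexing per language, post-processing of results) can be illustrated with a minimal Python sketch. It is not the GAMIR implementation: every name in it (Document, EXTRACTORS, MultilingualIndex, post_process) is hypothetical, the "AI models" are replaced by simple whitespace tokenizers, and image and video documents are assumed to arrive with a caption or transcript as their payload.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    content_type: str  # "text", "image" or "video" (assumed content types)
    language: str      # language code such as "en" or "tr"
    payload: str       # raw text, or a caption/transcript standing in for media


def extract_text_features(doc):
    # Stand-in for an NLP model: lowercase whitespace tokens.
    return doc.payload.lower().split()


def extract_media_features(doc):
    # Stand-in for a vision or speech model: assumes the payload already
    # holds a caption or transcript, and tokenizes it like plain text.
    return doc.payload.lower().split()


# One feature extractor per content type (hypothetical registry).
EXTRACTORS = {
    "text": extract_text_features,
    "image": extract_media_features,
    "video": extract_media_features,
}


def post_process(scores):
    # Refine the raw matches: rank by number of matched terms (descending),
    # breaking ties by document id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


class MultilingualIndex:
    def __init__(self):
        # language -> term -> set of document ids
        self._indexes = defaultdict(lambda: defaultdict(set))

    def add(self, doc):
        extractor = EXTRACTORS[doc.content_type]
        for term in extractor(doc):
            self._indexes[doc.language][term].add(doc.doc_id)

    def search(self, query, language):
        index = self._indexes.get(language, {})
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id in index.get(term, ()):
                scores[doc_id] += 1
        return post_process(scores)


index = MultilingualIndex()
index.add(Document("d1", "text", "en", "multilingual information retrieval"))
index.add(Document("d2", "image", "en", "diagram of retrieval pipeline"))
index.add(Document("d3", "text", "tr", "bilgi erişim sistemi"))
print(index.search("information retrieval", "en"))  # ['d1', 'd2']
```

Keeping a separate inverted index per language means a query only touches the index for its own language; a real system would replace the tokenizers with language-specific and media-specific models and the term-count ranking with a proper relevance model.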

Overall, the GAMIR project is an initiative to unify the efforts of European countries to employ their own NLP and AI-based tools in cooperation, lowering the barrier to creating and implementing IR-related applications.

Information retrieval systems are the basis of many state-of-the-art automated applications. The main application area of IR systems is search engines.

Today, Google dominates the search market with a 91% share. There are also important regional search engines, such as Baidu in China, Yandex in Russia and Naver in South Korea. These search engine companies have market capitalizations in the billions of USD. In addition, their engagement with users makes these companies important players in digital markets, since they hold vast knowledge about user identity.

When we look at Europe, we see initiatives such as Qwant and Seznam that try to meet nationwide search engine needs. However, they are not yet large enough to cater to the requirements of different European countries.

Moreover, GAMIR is not only a framework for search engines but also a general framework for other IR-related applications, such as voice assistance and context-based queries.
