KISTI Unveils New Version of KONI, Science and Technology-Specialized Generative Large Language Model

이정훈 2024-07-31

- Expectation for Accelerating Innovation in Scientific Research through Large-Scale Training Data and RAG Technology -

The Korea Institute of Science and Technology Information (KISTI, President Kim Jae-soo) announced on July 31 the release of a new version of KONI (KISTI Open Natural Intelligence), its generative large language model (LLM) specialized in science and technology. The new version performs significantly better than the original model developed in December 2023 and is now publicly available to anyone working in science and technology fields.

As a national research institute specializing in science and technology information, KISTI continues to lead the development of LLMs specialized in science and technology by consistently collecting and analyzing national science and technology big data. The new version of KONI comes in two models: KONI-Llama3-8B (a pre-trained LLM) and KONI-Llama3-8B-Instruct (a chat model).

The newly released models were trained on data containing more than twice as much science and technology information as was used for the previous models, which significantly improved performance on tasks such as reasoning, writing, and comprehension. Notably, KONI ranked first among Korean LLMs of the same size on the LogicKor benchmark leaderboard (https://lk.instruct.kr/), which evaluates comprehensive reasoning ability. With a model size of just 8B, KONI scored 8.21 points on LogicKor, passing the 8-point mark and setting a new milestone in Korean LLM development. The KONI models can be downloaded and used without restriction through Hugging Face (https://huggingface.co/KISTI-KONI) and AIDA (https://aida.kisti.re.kr/), KISTI's AI Data Sharing and Utilization Service.

Additionally, KISTI has developed Retrieval-Augmented Generation (RAG) technology that draws on data from its existing information service systems to minimize the hallucinations that LLMs are prone to. By integrating RAG with KONI, KISTI has built a more reliable question-answering system. Performance verification and additional training based on researcher feedback have further improved KONI's capabilities, particularly on science and technology-related laws, regulations, and guidelines.
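
For readers unfamiliar with RAG, the general pattern can be sketched in a few lines of Python. This is an illustration of the technique only, not KISTI's implementation: the function names (`retrieve`, `build_prompt`, `answer`) are hypothetical, a simple word-overlap cosine score stands in for whatever retriever a production system would use, and `generate` is a placeholder for a call to a language model such as KONI.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return up to k documents that share words with the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), i) for i, d in enumerate(documents)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[i] for score, i in scored[:k] if score > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, 1))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def answer(query: str, documents: List[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve supporting passages, then ask the model (a placeholder) to answer."""
    return generate(build_prompt(query, retrieve(query, documents, k)))
```

Because the model is asked to answer from the retrieved passages rather than from memory alone, its output can be checked against the cited context, which is the mechanism by which RAG reduces hallucination.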

Moving forward, KISTI plans to continuously collect national science and technology data and periodically release new versions of KONI in various model sizes with enhanced performance. KISTI will also develop and distribute domain-specific LLMs reflecting the needs of public institutions in sectors such as defense, energy, and policy, as well as national research institutions.

KISTI President Kim Jae-soo stated, “With the new version of KONI, we plan to revolutionize the distribution and analysis systems of science and technology information across various fields, including science, technology, and industry, and ultimately build an autonomous AI researcher—an AI Agent framework based on KONI—to support scientific discoveries.”
