Методика разработки лексико-семантических паттернов для извлечения терминологии научной предметной области

Methodology for Developing Lexical-Semantic Patterns for Extracting Scientific Subject Terminology

Article's language
Russian
Abstract
The article describes an approach to automating terminology extraction from Russian-language texts in order to enrich the ontology of a scientific subject domain. The applicability of methods for automatic ontology enrichment from natural-language texts depends on the characteristics of the text corpus and of the language used. The specifics of the input language, which is highly inflected and has free word order, together with the absence of a large text corpus, motivate a linguistic approach based on lexico-semantic patterns. The proposed information extraction methodology has the following features: a) the subject vocabulary is automatically replenished from the ontology and the text corpus and annotated with a system of semantic features; b) a small set of initial structural meta-patterns is defined, establishing the conceptual contexts for extracting ontological information; c) a set of lexico-semantic patterns that specify the lexical, semantic, and syntactic properties of those contexts is automatically generated from the structural meta-patterns.
DOI
10.31144/si.2307-6410.2022.n20.p25-46
Pages
25-46
Number