System Informatics, No. 13

On the need of ontology for classification and navigation in the computer languages universe

The universe (virtual world) of computer languages comprises thousands of languages of different classes: programming, specification, simulation, etc. A research project aimed at developing a classification of this universe was carried out from 2008 to 2013 at the A.P. Ershov Institute of Informatics Systems. This paper summarizes the main theoretical results of that research on the classification of computer languages (which, in our opinion, are still valid and promising) and discusses new approaches to computer-aided support for this classification, based on machine learning and natural language processing. Further development of the classification project is important because of the need to better understand the universe of computer languages and to take a more objective approach to choosing languages for software projects.

Architecture and main features of PolarDB library of structured data manipulation tools

The article presents the PolarDB library of structured data manipulation tools, created at the Institute of Information Systems of the SB RAS. It is designed for building systems that structure, store, and process data, including large volumes of data. The library is built on a previously developed recursive type system and covers a number of essential tasks: data structuring, serialization, mapping of data to byte streams, index construction, block implementation of dynamic byte streams, data distribution and processing, and backup and recovery. The PolarDB library allows users to create efficient solutions for specialized databases in different paradigms: sequences, relational tables, key-value stores, and graph structures.

Lexico-semantic templates as a tool for the declarative description of language constructs in linguistic text analysis

The paper is devoted to the problem of extracting language constructs, including numerical and symbolic data, that are significant for a given domain. An approach to describing natural language constructs through lexico-semantic templates is presented, and a YAML-based language for constructing such templates is considered. A lexico-semantic template is a structural pattern of the required language construct with a specified structure and lexico-semantic properties. When a template successfully matches a piece of text, a lexical object is formed, to which formal (positional) and semantic (class and properties) characteristics are attributed. The paper presents the architecture of a web editor for developing and testing lexico-semantic templates and describes the creation of two specialized dictionaries: 1) a dictionary of names of institutions and positions and their abbreviations, and 2) a dictionary of numerical/temporal constructions. The designed technology supports template-based lexico-semantic analysis of text and can be used independently for extracting information from small pieces of text, as well as part of other information extraction systems. The proposed method is efficient for recognizing parametric constructions that contain estimates of parameter values (of entities or events) in a domain.

Problems of Extracting Terminological Core of the Subject Domain from Electronic Encyclopedic Dictionaries

The paper is devoted to the problems of automatically constructing the terminological system of a subject domain. A method for extracting domain terms from electronic encyclopedic data sources is proposed. The distinguishing features of the proposed approach are a thorough analysis of term structure, recognition of errors based on their linguistic classification, automatic generation of lexical-syntactic patterns representing multi-component terms, and the use of a set of heuristic methods for processing "special" terms. By analyzing encyclopedic dictionaries, a reference list of concept names is automatically formed, which is used to assess the quality of the dictionaries being developed.