Named Entity Recognition (NER) is the task of extracting spans of text that belong to predefined categories, such as organization names, place names, and people's names. This work presents an approach to the additional training of deep neural networks with an attention mechanism (the BERT architecture). It is shown that preliminary training of the language model on the tasks of recovering a masked word and determining the semantic relatedness of two sentences can significantly improve NER quality. One of the best results was achieved on the named entity extraction task of the RuREBus dataset. A key feature of the described solution is that the problem formulation is close to real business problems, and that the extracted entities are not general-purpose but specific to the economic sector.
Named entity recognition in texts of administrative documents with deep neural networks
Article's language: Russian
Abstract
DOI: 10.31144/si.2307-6410.2020.n16.p137-148
UDK: 004.032.26
Issue: # 16
Pages: 137-148
File: berezinbondarenko.pdf (619.98 KB)