Recognizing Biomedical Named Entities using Skip-chain Conditional Random Fields
Date
2010-07
Authors
Liu, J.; Huang, M.; Zhu, X.
Publisher
Association for Computational Linguistics, Stroudsburg, PA
Abstract
Linear-chain Conditional Random Fields (CRFs) have been applied to the Named Entity Recognition (NER) task in many biomedical text mining and information extraction systems. However, a linear-chain CRF cannot capture long-distance dependencies, which are very common in the biomedical literature. In this paper, we propose to capture such long-distance dependencies by defining two principles for constructing skip-edges in a skip-chain CRF: linking similar words and linking words that have typed dependencies. The approach is applied to recognize gene/protein mentions in the literature. When tested on the BioCreAtIvE II Gene Mention dataset and the GENIA corpus, the approach yields significant improvements over the linear-chain CRF. We also present an in-depth error analysis of inconsistent labeling and study the influence of the quality of skip-edges on labeling performance.
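The first skip-edge principle named in the abstract, linking similar words, can be illustrated with a minimal sketch. The abstract does not state the paper's actual similarity criterion, so everything below is an assumption made for illustration: similarity is taken to mean case-insensitive string identity, the `looks_like_entity` filter (uppercase letter or digit) and the `min_gap` parameter are hypothetical, and the function names are not from the paper. The second principle, typed dependencies, would require a dependency parser and is not shown.

```python
from collections import defaultdict
from itertools import combinations


def looks_like_entity(token):
    """Hypothetical filter: keep tokens containing an uppercase letter or digit."""
    return any(ch.isupper() or ch.isdigit() for ch in token)


def similar_word_skip_edges(tokens, min_gap=2):
    """Return sorted (i, j) index pairs, i < j, linking repeated candidate tokens.

    Assumption: two tokens are "similar" when they are identical ignoring case.
    Pairs closer than min_gap are skipped, since the linear chain already
    connects neighbouring positions.
    """
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if looks_like_entity(tok):
            positions[tok.lower()].append(i)

    edges = []
    for idxs in positions.values():
        for i, j in combinations(idxs, 2):
            if j - i >= min_gap:
                edges.append((i, j))
    return sorted(edges)


tokens = "The IL-2 gene is induced ; expression of IL-2 requires NF-kappaB".split()
edges = similar_word_skip_edges(tokens)
print(edges)  # [(1, 8)]
```

In a skip-chain CRF, each such pair would receive an additional pairwise factor on top of the linear-chain factors, which is what lets the two occurrences of "IL-2" influence each other's labels.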
Description
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
Type
Conference Paper
Format
Text
Keywords
RANDOM FIELD, INFORMATION RETRIEVAL, BIOMEDICAL DATA, ERROR ANALYSIS, LABELLING
Citation
Liu, J., Huang, M., & Zhu, X. (2010). Recognizing Biomedical Named Entities using Skip-chain Conditional Random Fields. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, Uppsala, Sweden (pp. 10-18). Association for Computational Linguistics.