Recognizing Biomedical Named Entities using Skip-chain Conditional Random Fields

Date

2010-07

Publisher

Association for Computational Linguistics, Stroudsburg, PA

Abstract

Linear-chain Conditional Random Fields (CRFs) have been applied to the Named Entity Recognition (NER) task in many biomedical text mining and information extraction systems. However, a linear-chain CRF cannot capture long-distance dependencies, which are very common in the biomedical literature. In this paper, we study how to capture such long-distance dependencies by defining two principles for constructing skip-edges in a skip-chain CRF: linking similar words and linking words that have typed dependencies. The approach is applied to recognizing gene/protein mentions in the literature. When tested on the BioCreAtIvE II Gene Mention dataset and the GENIA corpus, the approach yields significant improvements over the linear-chain CRF. We also present an in-depth error analysis of inconsistent labeling and study the influence of the quality of skip-edges on labeling performance.

Description

Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

Keywords

RANDOM FIELD, INFORMATION RETRIEVAL, BIOMEDICAL DATA, ERROR ANALYSIS, LABELLING

Citation

Liu, J., Huang, M., & Zhu, X. (2010). Recognizing Biomedical Named Entities using Skip-chain Conditional Random Fields. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, Uppsala, Sweden (pp. 10-18). Association for Computational Linguistics.
