Journal of Food, Agriculture and Environment




Vol 11, Issue 1, 2013
Online ISSN: 1459-0263
Print ISSN: 1459-0255


Identification of discriminative features for biological event extraction through linguistically informed feature selection 


Author(s):

Xing Zhang 1, Jingbo Xia 2, 3, Jonathan Webster 1, 2, Alex Chengyu Fang 1, 2*

Received Date: 2012-10-13, Accepted Date: 2013-01-28

Abstract:

Machine learning classifiers have achieved strong performance in biomedical event extraction. For example, support vector machine (SVM) classifiers in the Turku Event Extraction System achieved the best performance in the BioNLP09 task. Such classifiers typically rely on large feature sets. Despite their robust performance, recent research suggests that feature sets produced through automatic training need to be reduced in size in order to improve system performance. This paper seeks ways to reduce feature set size by investigating the contribution of four feature sets constructed according to lexical, grammatical, syntactic and semantic information. It reports an experiment based on BioNLP data prepared by the Turku team for biological event extraction and examines to what extent the dimension of the feature sets can be reduced while the classifier still achieves similar performance. The importance of each feature set is evaluated with an SVM classifier. Our experiments demonstrate that constructing feature sets according to lexical, grammatical and syntactic information can reduce the set size by as much as 86% while maintaining comparable performance, which substantially alleviates the feature dimension issue. The experiments also show that a hybrid feature set combining lexical and semantic information achieves the second highest accuracy, indicating the feasibility of constructing an optimal feature set through dimension reduction and feature combination. We conclude that these experiments provide empirical evidence that linguistic information, in addition to domain knowledge, is important for constructing high-performance feature sets for biomedical event extraction.
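The idea of restricting a feature set to chosen linguistic groups and measuring the resulting size reduction can be illustrated with a minimal sketch. It is not the authors' implementation: the feature-name prefixes (`lex`, `gram`, `syn`, `sem`), the helper names and the toy examples are all assumptions made for illustration, and the SVM training and evaluation step is deliberately left out.

```python
def group_of(feature_name):
    """Return the group prefix of a name such as 'lex:token=binds'.

    The 'group:detail' naming scheme is a hypothetical convention.
    """
    return feature_name.split(":", 1)[0]


def select_groups(examples, keep):
    """Keep only the features whose group prefix is listed in `keep`.

    `examples` is a list of dicts mapping feature name -> value.
    """
    keep = set(keep)
    return [
        {name: value for name, value in ex.items() if group_of(name) in keep}
        for ex in examples
    ]


def dimension(examples):
    """Count the distinct feature names used across all examples."""
    return len({name for ex in examples for name in ex})


def reduction(full, reduced):
    """Fraction by which the feature-set size shrinks."""
    return 1.0 - dimension(reduced) / dimension(full)


# Toy examples, invented for illustration only (not BioNLP data).
examples = [
    {"lex:token=binds": 1, "gram:pos=VBZ": 1,
     "syn:dep=nsubj": 1, "sem:type=Protein": 1},
    {"lex:token=expression": 1, "gram:pos=NN": 1,
     "syn:dep=dobj": 1, "sem:type=Protein": 1},
]

# A hybrid subset built from lexical and semantic features only.
hybrid = select_groups(examples, keep=("lex", "sem"))
print(dimension(examples), dimension(hybrid), reduction(examples, hybrid))
```

In a full experiment, each reduced set would be passed to a classifier and its accuracy compared with that of the full set; that comparison is what determines whether a given reduction is acceptable.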

Keywords:

Turku event extraction system, feature selection, event extraction, support vector machine, linguistic features, syntactic information, semantic information


Journal: Journal of Food, Agriculture and Environment
Year: 2013
Volume: 11
Issue: 1
Category: Environment
Pages: 1032-1036

