Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.13087/1541
Title: The effects of globalisation techniques on feature selection for text classification
Authors: Parlak, Bekir
Uysal, Alper Kürşat
Keywords: Feature selection
globalisation techniques
text classification
Issue Date: 2020
Publisher: Sage Publications Ltd
Abstract: Text classification (TC) is a very important and critical task in the 21st century, as there is a high volume of electronic data on the Internet. In TC, textual data are characterised by a huge number of highly sparse features/terms. A typical TC pipeline consists of many steps, and one of the most important is undoubtedly feature selection (FS). In this study, we comprehensively investigate the effects of various globalisation techniques on local feature selection (LFS) methods, using datasets with different characteristics: multi-class unbalanced (MCU), multi-class balanced (MCB), binary-class unbalanced (BCU) and binary-class balanced (BCB). The globalisation techniques used in this study are summation (SUM), weighted-sum (AVG) and maximum (MAX). To investigate the effect of globalisation techniques, we used three LFS methods: Discriminative Feature Selection (DFSS), odds ratio (OR) and chi-square (CHI2). In the experiments, we used four benchmark datasets, namely Reuters-21578, 20Newsgroup, Enron1 and Polarity, together with Support Vector Machine (SVM) and Decision Tree (DT) classifiers. According to the experimental results, the most successful globalisation technique is AVG when all situations are taken into account. The results indicate that the DFSS method is more successful than the OR and CHI2 methods on datasets with MCU and MCB characteristics. However, the CHI2 method appears more accurate than the OR and DFSS methods on datasets with BCU and BCB characteristics. In addition, the SVM classifier performed better than the DT classifier in most cases.
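The three globalisation techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `globalise` and `top_k`, the toy terms and scores, and the class priors are all hypothetical, and weighting the AVG variant by class prior is an assumption, since the abstract only calls it a "weighted-sum".

```python
def globalise(local_scores, class_priors, how):
    """Collapse the per-class (local) scores of each term into one global score.

    local_scores: {term: [score for class 0, score for class 1, ...]}
    class_priors: [P(class 0), P(class 1), ...], used only by "AVG"
    how: "SUM", "AVG" or "MAX"
    """
    result = {}
    for term, scores in local_scores.items():
        if how == "SUM":
            result[term] = sum(scores)
        elif how == "AVG":
            # Assumption: the weights are the class prior probabilities.
            result[term] = sum(p * s for p, s in zip(class_priors, scores))
        elif how == "MAX":
            result[term] = max(scores)
        else:
            raise ValueError("how must be 'SUM', 'AVG' or 'MAX'")
    return result


def top_k(global_scores, k):
    """Return the k terms with the highest global score."""
    return sorted(global_scores, key=global_scores.get, reverse=True)[:k]


# Hypothetical local scores (e.g. from CHI2) for three terms and two classes.
local = {"oil": [0.0, 3.0], "trade": [2.0, 2.0], "the": [0.5, 0.5]}
priors = [0.75, 0.25]  # hypothetical unbalanced class distribution

sum_scores = globalise(local, priors, "SUM")
avg_scores = globalise(local, priors, "AVG")
max_scores = globalise(local, priors, "MAX")
```

In this toy case MAX ranks "oil" first because it is strongly tied to one class, while SUM and AVG prefer "trade", which scores moderately on both classes. This shows how the choice of globalisation technique can change which features are selected.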
URI: https://doi.org/10.1177/0165551520930897
https://hdl.handle.net/20.500.13087/1541
ISSN: 0165-5515
1741-6485
Appears in Collections:Matematik Bölümü Koleksiyonu
Scopus İndeksli Yayınlar Koleksiyonu
WoS İndeksli Yayınlar Koleksiyonu

Web of Science™ citations: 4 (checked on Jun 22, 2022)
Page view(s): 14 (checked on Oct 3, 2022)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.