Title: Improved inverse gravity moment term weighting for text classification
Authors: Dogan, Turgut
Uysal, Alper Kürşat
Keywords: Inverse gravity moment (IGM)
Term weighting
Text classification
Issue Date: 2019
Publisher: Elsevier Ltd
Abstract: Text classification is a popular high-dimensional classification problem in which better feature vector representations directly improve classification performance. Assigning appropriate weights to features, or terms, is therefore crucial for obtaining effective feature vector representations. The methods used for weighting terms in text classification are called term weighting schemes. Although several term weighting schemes exist for text classification, they are not fully effective, and researchers continue to propose new ones. In this study, two novel term weighting schemes, SQRT_TF-IGM_imp and TF-IGM_imp, derived from the standard inverse gravity moment formula, are proposed to improve the weighting behavior of the existing TF-IGM scheme, especially in some extreme cases. The performance of the proposed schemes is compared with two standard IGM-based schemes and five other state-of-the-art term weighting methods on both unbalanced (Reuters-21578) and balanced (20 Mini Newsgroups and 20 Newsgroups) datasets with KNN, SVM, and NN classifiers. Micro-F1 and macro-F1 are used as success measures. The experiments are conducted with various feature sizes to examine the effect of feature size on the success of weighting. The experimental results show that the proposed SQRT_TF-IGM_imp method generally outperforms all other schemes, including the standard TF-IGM and SQRT_TF-IGM schemes. The proposed TF-IGM_imp scheme also performs better than standard TF-IGM in most cases. To validate the best-performing proposed scheme, a t-test is also applied, and the performance gains obtained by the proposed SQRT_TF-IGM_imp weighting scheme over standard SQRT_TF-IGM are statistically significant. © 2019 Elsevier Ltd
ISSN: 0957-4174
Appears in Collections: Bilgisayar Mühendisliği Bölümü Koleksiyonu
Scopus İndeksli Yayınlar Koleksiyonu

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.