Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.13087/214
Title: A selective approach to index term weighting for robust information retrieval based on the frequency distributions of query terms
Authors: Arslan, Ahmet
Dinçer, Bekir Taner
Keywords: Chi-square goodness-of-fit
Index term weighting
Robustness in retrieval effectiveness
Selective information retrieval
Issue Date: 2019
Publisher: Springer
Abstract: A typical information retrieval (IR) system applies a single retrieval strategy to every information need of users. However, the results of past IR experiments show that a particular retrieval strategy is in general good at fulfilling some types of information needs while failing to fulfil others; that is, retrieval effectiveness varies widely across information needs. On the other hand, the same results also show that an information need that a particular retrieval strategy failed to fulfil could be fulfilled by one of the other existing retrieval strategies. The challenge here is therefore to determine in advance which retrieval strategy should be applied to which information need. This challenge is related to the robustness of IR systems in retrieval effectiveness. For an IR system, robustness can be defined as fulfilling every information need of users with an acceptable level of satisfaction. Maintaining robustness in retrieval effectiveness is a long-standing challenge, and in this article we propose a simple but powerful method as a remedy. The method is a selective approach to index term weighting: for any given query (i.e., information need), it predicts the best term weighting model amongst a set of alternatives, on the basis of the frequency distributions of query terms on a target document collection. To predict the best term weighting model, the method uses the Chi-square statistic, the statistic of the Chi-square goodness-of-fit test. The results of the experiments, performed using the official query sets of the TREC Web track and the Million Query track, reveal in general that the frequency distributions of query terms provide relevant information on the retrieval effectiveness of term weighting models. In particular, the results show that the selective approach proposed in this article is, on average, more effective and more robust than the most effective single term weighting model.
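The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that each candidate term weighting model is associated with a reference frequency distribution (here, hypothetically, Poisson and geometric), that a query term's observed within-document frequencies are binned as counts of documents containing the term 0, 1, 2, ... times, and that the candidate with the smallest Chi-square statistic is selected. The function names, the candidate labels, and the selection rule are illustrative and are not taken from the article.

```python
import math


def chi_square(observed, expected):
    """Pearson Chi-square statistic: sum of (O - E)^2 / E over bins with E > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def poisson_expected(n_docs, mean, n_bins):
    """Expected document counts per frequency bin under a Poisson distribution.

    The last bin collects all frequencies >= n_bins - 1.
    """
    probs = [math.exp(-mean) * mean ** k / math.factorial(k)
             for k in range(n_bins - 1)]
    probs.append(max(0.0, 1.0 - sum(probs)))
    return [n_docs * p for p in probs]


def geometric_expected(n_docs, mean, n_bins):
    """Expected counts under a geometric distribution on {0, 1, 2, ...}."""
    p = 1.0 / (1.0 + mean)  # chosen so that the distribution mean equals `mean`
    probs = [p * (1.0 - p) ** k for k in range(n_bins - 1)]
    probs.append(max(0.0, 1.0 - sum(probs)))
    return [n_docs * q for q in probs]


def select_model(observed, candidates):
    """Return the candidate label with the smallest Chi-square statistic.

    `observed[k]` is the number of documents in which the term occurs k times
    (the last bin is treated as its lower bound when estimating the mean).
    Picking the minimum is an assumption made for this sketch.
    """
    n_docs = sum(observed)
    mean = sum(k * o for k, o in enumerate(observed)) / n_docs
    scores = {
        name: chi_square(observed, expected_fn(n_docs, mean, len(observed)))
        for name, expected_fn in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical candidates: labels stand in for term weighting models.
CANDIDATES = {"poisson": poisson_expected, "geometric": geometric_expected}

if __name__ == "__main__":
    # Made-up frequency counts for one query term over 1000 documents.
    observed_counts = [368, 368, 184, 61, 15, 4]
    best, scores = select_model(observed_counts, CANDIDATES)
    print(best, scores)
```

For a multi-term query, the per-term statistics would need to be combined in some way (for example, summed or averaged); the abstract does not say how, so that step is left out here.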
URI: https://doi.org/10.1007/s10791-018-9347-9
https://hdl.handle.net/20.500.13087/214
ISSN: 1386-4564
1573-7659
Appears in Collections: Materials Science and Engineering Department Collection
Scopus Indexed Publications Collection
WoS Indexed Publications Collection


Scopus citations: 1 (checked on Dec 28, 2022)
Web of Science citations: 1 (checked on Jul 14, 2022)
Page views: 18 (checked on Oct 3, 2022)


Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.