Computing Immutable Regions for Subspace Top-K Queries
Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user's preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation to the corresponding weight for which the query result remains valid. The derived weight ranges, called immutable regions, are useful for performing sensitivity analysis, for finetuning the query weights, etc.
In this presentation, we focus on top-k queries with linear preference functions over the queried dimensions. We codify the conditions under which changes in a dimension's weight invalidate the query result, and develop algorithms to compute the immutable regions. In general, this entails the examination of numerous non-result tuples. To reduce processing time, we introduce a pruning technique and a thresholding mechanism that allow the immutable regions to be determined correctly after examining only a small number of non-result tuples. We demonstrate empirically that the two techniques combine well to form a robust and highly resource-efficient algorithm. We verify the generality of our findings using real high-dimensional data from different domains (documents, images, etc) and with different characteristics.
Prof. Kyriakos Mouratidis
Date & Time
24 Oct 2013 (Thursday) 11:00 - 12:00
J206 (University of Macau)
Department of Computer and Information Science
Kyriakos Mouratidis is an Associate Professor at the School of Information Systems at Singapore Management University (SMU). He received the B.Sc. degree from the Computer Science Department at Aristotle University of Thessaloniki (AUTH) in 2002, and the Ph.D. degree from the Department of Computer Science and Engineering at Hong Kong University of Science and Technology (HKUST) in 2006. His main research area is spatial and spatiotemporal databases, with a focus on continuous query processing, road network databases, and spatial optimization problems. He has also worked on preference-based queries, wireless broadcasting systems, and outsourced database authentication. He has published at and been involved in the programme committees of most major database conferences, including SIGMOD, VLDB and ICDE.