Enhancing SpatialHadoop with Closest Pair Queries
Author/s: García García, Francisco; Corral Liria, Antonio Leopoldo; Iribarne Martínez, Luis Fernando; Vassilakopoulos, Michael; Manolopoulos, Yannis
Given two datasets P and Q, the K Closest Pair Query (KCPQ) finds the K closest pairs of objects from P × Q. The operation is widely used in spatial and GIS applications. Because it combines the K Nearest Neighbor (KNN) query with the spatial join, KCPQ is expensive to compute. Given the increasing volume of spatial data, it is difficult to perform a KCPQ efficiently on a single centralized machine. For this reason, this paper addresses the problem of computing the KCPQ on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently, and proposes a novel algorithm in SpatialHadoop for efficient parallel KCPQ on large-scale spatial datasets. We have evaluated the performance of the algorithm in several situations with big synthetic and real-world datasets. The experiments demonstrate the efficiency and scalability of our proposal.
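To make the query definition concrete, the following is a minimal centralized sketch of what a KCPQ returns. It is not the parallel SpatialHadoop algorithm proposed in the paper, only a brute-force baseline that examines every pair in P × Q. It assumes points are coordinate tuples and that distance is Euclidean; the function name `kcpq_brute_force` and the sample data are hypothetical.

```python
import heapq
import math
from itertools import product


def kcpq_brute_force(P, Q, k):
    """Return the k closest pairs from P x Q as (distance, p, q), nearest first.

    Illustrative baseline only: it computes all |P| * |Q| distances,
    which is the cost that makes KCPQ expensive on large datasets.
    Assumes Euclidean distance between coordinate tuples.
    """
    # Lazily generate one (distance, p, q) tuple per pair in the Cartesian product.
    candidates = ((math.dist(p, q), p, q) for p, q in product(P, Q))
    # Keep only the k smallest tuples, ordered by distance (ties broken by p, then q).
    return heapq.nsmallest(k, candidates)


# Hypothetical sample data.
P = [(0.0, 0.0), (5.0, 5.0)]
Q = [(1.0, 0.0), (5.0, 7.0), (10.0, 10.0)]

for distance, p, q in kcpq_brute_force(P, Q, k=2):
    print(distance, p, q)
```

With this sample data the two closest pairs are ((0.0, 0.0), (1.0, 0.0)) at distance 1.0 and ((5.0, 5.0), (5.0, 7.0)) at distance 2.0. The quadratic number of distance computations in this sketch is the reason a distributed approach becomes attractive as the datasets grow.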