Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark

Moutafis, Panagiotis; Mavrommatis, George; Vassilakopoulos, Michael; Corral Liria, Antonio Leopoldo

doi:10.3390/ijgi10110763

Files

ijgi-10-00763.pdf (3.698 MB)

Identifiers

URI: http://hdl.handle.net/10835/13072
ISSN: 2220-9964
DOI: 10.3390/ijgi10110763

Date

2021-11-11

Abstract

In distributed computing systems, the design and implementation of new distributed spatial query algorithms remains an open challenge. Apache Spark is a memory-based framework suitable for both real-time and batch processing. Spark-based systems let users work on distributed in-memory data without worrying about the data distribution mechanism or fault tolerance. Given two datasets of points (called Query and Training), the group K nearest-neighbor (GKNN) query retrieves the K points of Training with the smallest sum of distances to every point of Query. This spatial query has been actively studied in centralized environments, where several performance-improving techniques and pruning heuristics have been proposed, and our team recently proposed a distributed algorithm in Apache Hadoop. Since, in general, Apache Hadoop exhibits lower performance than Spark, in this paper, we present the first distributed GKNN query algo...
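
To make the GKNN definition in the abstract concrete, the following is a minimal single-machine sketch of the query semantics only. It is not the distributed Spark algorithm the paper presents. The function name `gknn_brute_force` and the sample points are hypothetical, and Euclidean distance is an assumption (the abstract says only "sum of distances").

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k training points with the smallest sum of distances
    to all query points, as (sum_of_distances, point) pairs.

    Assumption: Euclidean distance. Every training point is scored
    against every query point, with no pruning and no partitioning.
    """
    def sum_dist(p):
        # Sum of distances from one training point to each query point.
        return sum(math.dist(p, q) for q in query)

    scored = ((sum_dist(p), p) for p in training)
    # nsmallest orders the pairs by sum of distances, ascending.
    return heapq.nsmallest(k, scored)


# Hypothetical sample data, for illustration only.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (-3.0, 0.0)]
result = gknn_brute_force(query, training, k=2)
```

With these sample points, `(1.0, 0.0)` ranks first because it lies midway between the two query points, giving a sum of distances of 2. The cost of this exhaustive scan grows with the product of the two dataset sizes, which is the kind of cost that pruning heuristics and distributed processing aim to reduce.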

Keywords

big spatial data

spatial query processing

group nearest-neighbor query

Apache Spark

spatial query evaluation

Collections

Journal Articles, Dept. of Informatics [90]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International