Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark
Author
Moutafis, Panagiotis; Mavrommatis, George; Vassilakopoulos, Michael; Corral Liria, Antonio Leopoldo
Date
2021-11-11
Abstract
The design and implementation of new distributed spatial query algorithms is a current challenge for spatial query processing in distributed computing systems. Apache Spark is a memory-based framework suitable for both real-time and batch processing. Spark-based systems allow users to work on distributed in-memory data without worrying about the data distribution mechanism or fault tolerance. Given two datasets of points (called Query and Training), the group K nearest-neighbor (GKNN) query retrieves the K points of Training with the smallest sum of distances to every point of Query. This spatial query has been actively studied in centralized environments, and several performance-improving techniques and pruning heuristics have also been proposed, while a distributed algorithm in Apache Hadoop was recently proposed by our team. Since, in general, Apache Hadoop exhibits lower performance than Spark, in this paper, we present the first distributed GKNN query algo...
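To make the GKNN definition in the abstract concrete, here is a minimal single-machine sketch in Python. It is a brute-force illustration of the query semantics only, not the distributed Spark algorithm the paper presents, and it assumes 2D points with Euclidean distance. The names `sum_of_distances` and `gknn_brute_force` are hypothetical and chosen for this example.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sum_of_distances(candidate: Point, query: Sequence[Point]) -> float:
    """Sum of Euclidean distances from one Training point to every Query point."""
    return sum(math.dist(candidate, q) for q in query)


def gknn_brute_force(
    query: Sequence[Point], training: Sequence[Point], k: int
) -> List[Tuple[float, Point]]:
    """Return the k Training points with the smallest sum of distances to Query.

    Assumption: exhaustive scan with no pruning, costing
    O(|Training| * |Query|) distance computations.
    Results are (sum_of_distances, point) pairs in ascending order of the sum.
    """
    scored = ((sum_of_distances(p, query), p) for p in training)
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Toy data invented for illustration.
    query_points = [(0.0, 0.0), (2.0, 0.0)]
    training_points = [(1.0, 0.0), (5.0, 5.0), (0.0, 3.0), (1.0, 1.0)]
    for total, point in gknn_brute_force(query_points, training_points, k=2):
        print(point, round(total, 3))
```

The exhaustive scan is useful mainly as a correctness baseline: any pruned or partitioned variant should return the same K points on the same input.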
Keywords
big spatial data
spatial query processing
group nearest-neighbor query
Apache Spark
spatial query evaluation