Efficient Distance Join Query Processing in Distributed Spatial Data Management Systems

García-García, Francisco; Corral, Antonio; Iribarne, Luis; Vassilakopoulos, Michael; Manolopoulos, Yannis

dc.contributor.author	García-García, Francisco
dc.contributor.author	Corral, Antonio
dc.contributor.author	Iribarne, Luis
dc.contributor.author	Vassilakopoulos, Michael
dc.contributor.author	Manolopoulos, Yannis
dc.date.accessioned	2023-02-24T11:13:29Z
dc.date.available	2023-02-24T11:13:29Z
dc.date.issued	2020
dc.identifier.issn	0020-0255
dc.identifier.uri	http://hdl.handle.net/10835/14372
dc.description.abstract	Due to the ubiquitous use of spatial data applications and the large amounts of such data these applications use, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Distance Join Queries (DJQs) are important and frequently used operations in numerous applications, including data mining, multimedia and spatial databases. DJQs (e.g., k Nearest Neighbor Join Query, k Closest Pair Query, ε Distance Join Query, etc.) are costly operations, since they involve both the join and distance-based search, and performing DJQs efficiently is a challenging task. Recent Big Data developments have motivated the emergence of novel technologies for distributed processing of large-scale spatial data in clusters of computers, leading to Distributed Spatial Data Management Systems (DSDMSs). Distributed cluster-based computing systems can be classified as Hadoop-based or Spark-based systems. Based on this classification, in this paper, we compare two of the most recent and leading DSDMSs, SpatialHadoop and LocationSpark, by evaluating the performance of several existing and newly proposed parallel and distributed DJQ algorithms under various settings with large spatial real-world datasets. A general conclusion arising from the execution of the distributed DJQ algorithms studied is that, while SpatialHadoop is a robust and efficient system when large spatial datasets are joined (since it is built on top of the mature Hadoop platform), LocationSpark is the clear winner in total execution time efficiency when medium spatial datasets are combined (due to in-memory processing provided by Spark). However, LocationSpark requires higher memory allocation when large spatial datasets are involved in DJQs (even more so when k and ε are large). Finally, this detailed performance study has demonstrated that the new distributed DJQ algorithms we have proposed are efficient, robust and scalable with respect to different parameters, such as dataset sizes, k, ε and number of computing nodes.	es_ES
dc.language.iso	en	es_ES
dc.publisher	Elsevier	es_ES
dc.relation	info:eu-repo/grantAgreement/ES/MINECO/TIN2013-41576-R/ES/Evolución de sistemas dinámicos en la nube: Un escenario marco hacia las interfaces de usuario inteligentes/ESDNEMIUI	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.source	Information Sciences (Elsevier), Volume 512, February 2020, Pages 985-1008	es_ES
dc.title	Efficient Distance Join Query Processing in Distributed Spatial Data Management Systems	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.ins.2019.10.030	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
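
The three Distance Join Queries named in the abstract can be illustrated with a minimal single-machine sketch. This is a brute-force illustration of the query definitions only, assuming 2D points stored as tuples and Euclidean distance; it is not the distributed algorithms evaluated in the paper, and the function names are hypothetical.

```python
from heapq import nsmallest
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def epsilon_distance_join(P, Q, eps):
    """Return every pair (p, q) whose distance is at most eps."""
    return [(p, q) for p, q in product(P, Q) if dist(p, q) <= eps]


def k_closest_pairs(P, Q, k):
    """Return the k pairs (p, q) with the smallest distances."""
    return nsmallest(k, product(P, Q), key=lambda pq: dist(pq[0], pq[1]))


def knn_join(P, Q, k):
    """Map each point p in P to its k nearest neighbours in Q."""
    return {p: nsmallest(k, Q, key=lambda q: dist(p, q)) for p in P}


# Hypothetical toy datasets, chosen only for illustration.
P = [(0, 0), (10, 10)]
Q = [(1, 0), (0, 2), (10, 12)]

eps_pairs = epsilon_distance_join(P, Q, eps=2)
closest = k_closest_pairs(P, Q, k=1)
neighbours = knn_join(P, Q, k=1)
```

Each function compares every point of P with every point of Q, which is quadratic in the dataset sizes. That cost is what makes these queries expensive on large spatial datasets and motivates the distributed processing the abstract describes.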

Files in this item

Name: 2020-Efficient Distance Join ...
Size: 3.503Mb
Format: PDF

This item appears in the following collection(s)

Journal articles, Project TIN2013-41576-R [20]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International