Citation Feinerer, Ingo, Pichler, Reinhard, Sallinger, Emanuel, Savenkov, Vadim. 2015. Using Statistics for Computing Joins with MapReduce. In Using Statistics for Computing Joins with MapReduce, ed. Andrea Cali, Maria-Esther Vidal, pp. 69-74. Lima, Peru: CEUR Workshop Proceedings.



Abstract

The MapReduce model was designed to cope with ever-growing amounts of data, and it has been applied successfully to various computational problems. In recent years, multiple MapReduce algorithms have also been developed for computing joins, one of the fundamental problems in managing and querying data. Current MapReduce algorithms for computing joins (in particular the ones based on the HyperCube algorithm of Afrati and Ullman) take into account the size of the tables and so-called "heavy hitters" (i.e., attribute values that occur particularly often) when determining an optimal distribution of computation tasks to the available reducers. However, in contrast to most state-of-the-art database management systems, they do not use more elaborate statistics on the distribution of data values for optimization purposes. In this short paper, we initiate the study of the following questions: How well do known MapReduce algorithms for join computation perform, in particular on skewed data? Can more fine-grained statistics help to improve these methods? Our initial study shows that such statistics can indeed be used to improve existing methods.
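To make the abstract's terminology concrete, the following is a minimal sketch of what a "heavy hitter" is and why skew matters when join tuples are distributed to reducers. It is not the method of the paper and not the HyperCube algorithm. The function names, the relative-frequency threshold, and the sample data are all hypothetical, and plain hash partitioning stands in for whatever distribution scheme a real system would use.

```python
from collections import Counter


def heavy_hitters(values, threshold):
    """Return the values whose relative frequency exceeds `threshold`.

    The threshold is an illustrative assumption; the abstract only says
    that heavy hitters are values that occur particularly often.
    """
    counts = Counter(values)
    total = len(values)
    return {v for v, c in counts.items() if c / total > threshold}


def reducer_loads(values, num_reducers):
    """Count how many tuples plain hash partitioning sends to each reducer."""
    loads = [0] * num_reducers
    for v in values:
        loads[hash(v) % num_reducers] += 1
    return loads


# Hypothetical skewed join column: the value 7 dominates.
join_column = [7] * 90 + list(range(10))

print(heavy_hitters(join_column, threshold=0.5))
print(reducer_loads(join_column, num_reducers=4))
```

In this toy input, almost all tuples share one join value, so a partitioning that ignores the value distribution sends nearly the whole load to a single reducer. That imbalance is the kind of situation the abstract's question about skewed data refers to.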


Publication's profile

Status of publication Published
Affiliation WU
Type of publication Contribution to conference proceedings
Language English
Title Using Statistics for Computing Joins with MapReduce
Title of whole publication Using Statistics for Computing Joins with MapReduce
Editor Andrea Cali, Maria-Esther Vidal
Page from 69
Page to 74
Location Lima, Peru
Publisher CEUR Workshop Proceedings
Year 2015
URL http://ceur-ws.org/Vol-1378/AMW_2015_paper_13.pdf
JEL C80

Associations

Projects
SPARQL Evaluations and Extensions (short: SEE): Subproject XSPARQL, SPARQL Update & Linked Data
People
Savenkov, Vadim
External
Feinerer, Ingo (Fachhochschule Wiener Neustadt, Austria)
Pichler, Reinhard (Vienna University of Technology, Austria)
Sallinger, Emanuel (Vienna University of Technology, Austria)
Organization
Information Business IN
Research areas (ÖSTAT Classification 'Statistik Austria')
1109 Information and data processing