Scaling Distributed Database Joins by Decoupling Computation and Communication


Abhirup Chakraborty, ACM Member

Abstract

To process large volumes of data, modern data management systems use a collection of machines connected through a network. This paper proposes frameworks and algorithms for processing distributed joins, a compute- and communication-intensive workload in modern data-intensive systems. By exploiting multiple processing cores within the individual machines, we implement a system to process database joins that parallelizes computation within each node, pipelines the computation with communication, and parallelizes the communication by allowing multiple simultaneous data transfers (send/receive). Our experimental results show that, using only four threads per node, the framework achieves a 3.5x gain in intra-node performance compared with a single-threaded counterpart. Moreover, with the join processing workload, the cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational loads; this property brings a near-linear speedup as the number of nodes in the system increases, a feature much desired in modern large-scale data processing systems.
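To make the idea of overlapping communication with computation concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes a simple hash join, and it uses an in-memory list of batches as a stand-in for data arriving over the network. The names `build_hash_table`, `pipelined_join`, `receiver`, `worker`, and `inbox` are hypothetical and chosen only for this illustration.

```python
import queue
import threading
from collections import defaultdict


def build_hash_table(relation):
    """Index the local relation by join key (build phase of a hash join)."""
    table = defaultdict(list)
    for key, payload in relation:
        table[key].append(payload)
    return table


def pipelined_join(local_relation, incoming_batches, num_workers=4):
    """Join local tuples with batches that arrive while workers are probing."""
    table = build_hash_table(local_relation)
    inbox = queue.Queue(maxsize=8)  # bounded buffer between receiver and workers
    results = []
    results_lock = threading.Lock()

    def receiver():
        # Assumption: stands in for a network receive loop; batches come from a list.
        for batch in incoming_batches:
            inbox.put(batch)
        for _ in range(num_workers):
            inbox.put(None)  # one stop marker per worker

    def worker():
        while True:
            batch = inbox.get()
            if batch is None:
                break
            # Probe phase: look up each incoming tuple in the shared hash table.
            matches = [
                (key, left, right)
                for key, right in batch
                for left in table.get(key, [])
            ]
            with results_lock:
                results.extend(matches)

    threads = [threading.Thread(target=receiver)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    local = [(1, "a"), (2, "b"), (2, "c")]
    batches = [[(2, "x")], [(1, "y"), (3, "z")]]
    print(sorted(pipelined_join(local, batches)))
```

The bounded queue is what lets the receiver keep accepting batches while the workers are still probing earlier ones. In CPython the global interpreter lock limits how much CPU-bound work threads can run in parallel, so this sketch shows the structure of the pipeline rather than the speedup reported above.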


Keywords

Distributed joins; Multi-core; Database; Pipelining; Parallel processing


Issue URL: https://airccse.org/journal/ijdms/current2023.html

Abstract URL: https://aircconline.com/abstract/ijdms/v15n1/15123ijdms02.html

Full Article: https://aircconline.com/ijdms/V15N1/15123ijdms02.pdf

http://airccse.org/journal/ijdms/index.html

#distributedjoins #multicore #database #pipelining #parallelprocessing


