Towards Modeling the Performance of a Fast Connected Components
Algorithm on Parallel Machines
We present and analyze a portable, high-performance algorithm for
finding connected components on modern distributed memory
multiprocessors. The algorithm is a hybrid of the classic DFS on the
subgraph local to each processor and a variant of the Shiloach-Vishkin
PRAM algorithm on the global collection of subgraphs. We implement
the algorithm in Split-C and measure performance on the the Cray T3D,
the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs
derived from cluster dynamics methods in computational physics. On a
256 processor Cray T3D, the implementation outperforms all previous
solutions by an order of magnitude. A characterization of graph
parameters allows us to select graphs that highlight key performance
features. We study the effects of these parameters and machine
characteristics on the balance of time between the local and global
phases of the algorithm and find that edge density, surface-to-volume
ratio, and relative communication cost dominate performance. By
understanding the effect of machine characteristics on performance,
the study sheds light on the impact of improvements in computational
and/or communication performance on this challenging problem.
Copyright 1995 by the
Association for Computing Machinery, Inc. (ACM).
PostScript version
HTML version
Supercomputing 1995 Proceedings (HTML)
Split-C code