- Author
- Siddhartha Chatterjee
- Title
- Compiling Data-Parallel Programs For Efficient Execution on
Shared-Memory Multiprocessors
- Date
- October 1991
- Abstract
- Data parallelism is well-suited from algorithmic, architectural,
and linguistic considerations to serve as a basis for portable
parallel programming.
However, the fine-grained parallelism characteristic of data-
parallel programs makes the efficient implementation of such
languages on MIMD machines a challenging task due to the high
overheads these machines incur at small grain sizes.
We claim that compile-time analysis can be used to reduce these
overheads, thereby allowing data-parallel code to run
efficiently on MIMD machines.
This dissertation reports on the design, implementation, and
evaluation of an optimizing compiler for an applicative nested
data-parallel language called VCODE.
The target machine is the Encore Multimax, a coherent-cache
shared-memory multiprocessor.
The source language allows nested aggregate reductions and
permutations.
Such features greatly expand the range of applications that can
be cast into a data-parallel model.
We present a small set of powerful compile-time techniques that
reduce the overheads on MIMD machines in several ways: by
increasing the grain size of the output program, by reducing
synchronization and storage requirements, and by improving
locality of reference.
The two key ideas behind these optimizations are the symbolic
analysis of loop structures, and the hierarchical clustering of
the program graph, first by loop structure, and then by loop
traversal patterns.
This localizes synchronization and work distribution actions to
well-defined points in the output code.
Loop traversal patterns are then used to identify parallel loops
and to eliminate unnecessary intermediate storage.
The most significant aspect of the analysis techniques is that
they are symbolic in nature and work in the presence of control
constructs such as conditionals and recursion.
A compiler has been implemented based on these ideas and has
been used to compile a large number of benchmarks.
The benchmark suite covers a variety of problem domains,
including dense and sparse matrix operations, tree and graph
algorithms, dynamic and data-dependent algorithms, and low- and
medium-level image processing applications.
The performance results substantiate the claim that data-
parallel code can be made to run efficiently on MIMD machines,
and demonstrate that compiler optimizations are essential for
good performance.
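
The overhead-reduction techniques the abstract names (larger grain
size, fewer synchronization points, less intermediate storage) are
easiest to see on a toy example. The C sketch below is an editor's
illustration, not code from the dissertation: it fuses two
fine-grained data-parallel primitives, an elementwise multiply
followed by a sum reduction, into a single loop, which removes the
intermediate vector and enlarges the work done per loop entry.

  /* Hypothetical illustration of loop fusion as a grain-size and
   * storage optimization. Unfused, each primitive is its own loop
   * with an intermediate vector and a synchronization point between
   * the loops:
   *
   *   for (i = 0; i < n; i++) t[i] = a[i] * b[i];  // elementwise
   *   for (i = 0; i < n; i++) s += t[i];           // reduction
   *
   * After fusion, t[] disappears and one loop does all the work. */
  #include <stddef.h>

  double fused_dot(const double *a, const double *b, size_t n)
  {
      double s = 0.0;
      for (size_t i = 0; i < n; i++)
          s += a[i] * b[i];   /* multiply and reduce in one pass */
      return s;
  }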
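
In the same spirit, the abstract's point about localizing
synchronization and work distribution to well-defined points can be
sketched as a fork at loop entry and a join-and-combine at loop exit,
rather than per-element coordination. This too is only an assumed
illustration: POSIX threads stand in here for whatever runtime the
dissertation's Encore Multimax code actually used.

  /* Assumed sketch: the fused parallel loop is split into chunks;
   * work distribution happens once at the fork, synchronization once
   * at the join, where per-thread partial sums are combined. */
  #include <pthread.h>
  #include <stddef.h>

  #define NTHREADS 4

  struct chunk { const double *a, *b; size_t lo, hi; double partial; };

  static void *worker(void *arg)
  {
      struct chunk *c = arg;
      double s = 0.0;
      for (size_t i = c->lo; i < c->hi; i++)   /* coarse-grained chunk */
          s += c->a[i] * c->b[i];
      c->partial = s;
      return NULL;
  }

  double parallel_dot(const double *a, const double *b, size_t n)
  {
      pthread_t tid[NTHREADS];
      struct chunk ch[NTHREADS];
      double s = 0.0;

      for (int t = 0; t < NTHREADS; t++) {     /* work distribution: one point */
          ch[t] = (struct chunk){ a, b, n * t / NTHREADS,
                                  n * (t + 1) / NTHREADS, 0.0 };
          pthread_create(&tid[t], NULL, worker, &ch[t]);
      }
      for (int t = 0; t < NTHREADS; t++) {     /* synchronization: one point */
          pthread_join(tid[t], NULL);
          s += ch[t].partial;
      }
      return s;
  }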
- Category
- CMUTR
Category: CMUTR
Institution: Department of Computer Science, Carnegie
Mellon University
Number: CMU-CS-91-189
Bibtype: TechReport
Month: oct
Author: Siddhartha Chatterjee
Title: Compiling Data-Parallel Programs For Efficient Execution on
Shared-Memory Multiprocessors
Year: 1991
Address: Pittsburgh, PA
Super: @CMUTR