Category: CMUTR
Institution: Department of Computer Science, Carnegie
        Mellon University
Abstract: Data parallelism is well-suited from algorithmic, architectural,
        and linguistic considerations to serve as a basis for portable 
        parallel programming.
        However, the fine-grained parallelism characteristic of data-
        parallel programs makes the efficient implementation of such 
        languages on MIMD machines a challenging task due to the high 
        overheads these machines incur at small grain sizes.
        We claim that compile-time analysis can be used to reduce these
        overheads, thereby allowing data-parallel code to run 
        efficiently on MIMD machines.
        This dissertation reports on the design, implementation, and
        evaluation of an optimizing compiler for an applicative nested
        data-parallel language called VCODE.
        The target machine is the Encore Multimax, a coherent-cache
        shared-memory multiprocessor.
        The source language allows nested aggregate reductions and
        permutations.
        Such features greatly expand the range of applications that can 
        be cast into a data-parallel model.
        
        We present a small set of powerful compile-time techniques that 
        reduce the overheads on MIMD machines in several ways: by 
        increasing the grain size of the output program, by reducing 
        synchronization and storage requirements, and by improving 
        locality of reference.
        The two key ideas behind these optimizations are the symbolic
        analysis of loop structures, and the hierarchical clustering of
        the program graph, first by loop structure, and then by loop
        traversal patterns.
        This localizes synchronization and work distribution actions to
        well-defined points in the output code.
        Loop traversal patterns are then used to identify parallel loops
        and to eliminate unnecessary intermediate storage.
        The most significant aspect of the analysis techniques is that 
        they are symbolic in nature and work in the presence of control
        constructs such as conditionals and recursion.
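
        The following is a minimal sketch, in plain C, of the kind of
        loop fusion alluded to above: two elementwise data-parallel
        operations are combined into one loop, which raises the grain
        size, removes the intermediate vector, and leaves a single
        synchronization point. The C rendering, the fixed length N, and
        the function names are illustrative assumptions, not the
        compiler's actual output for VCODE.

        #include <stdio.h>
        #define N 8

        /* Unfused form: each operation becomes its own parallel loop,
         * with an intermediate vector and a barrier after each loop. */
        static void unfused(const double *a, const double *b, double *c)
        {
            double tmp[N];
            for (int i = 0; i < N; i++)     /* parallel loop 1 */
                tmp[i] = a[i] + b[i];
            /* barrier */
            for (int i = 0; i < N; i++)     /* parallel loop 2 */
                c[i] = 2.0 * tmp[i];
            /* barrier */
        }

        /* Fused form: one loop, no intermediate storage, one barrier. */
        static void fused(const double *a, const double *b, double *c)
        {
            for (int i = 0; i < N; i++)
                c[i] = 2.0 * (a[i] + b[i]);
            /* barrier */
        }

        int main(void)
        {
            double a[N], b[N], c1[N], c2[N];
            for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }
            unfused(a, b, c1);
            fused(a, b, c2);
            for (int i = 0; i < N; i++)     /* both forms agree */
                printf("%g %g\n", c1[i], c2[i]);
            return 0;
        }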
        
        A compiler has been implemented based on these ideas and has 
        been used to compile a large number of benchmarks.
        The benchmark suite covers a variety of problem domains,
        including dense and sparse matrix operations, tree and graph
        algorithms, dynamic and data-dependent algorithms, and low- and
        medium-level image processing applications.
        The performance results substantiate the claim that data-
        parallel code can be made to run efficiently on MIMD machines, 
        and demonstrate that compiler optimizations are essential for 
        good performance.
        
Number: CMU-CS-91-189
Bibtype: TechReport
Month: oct
Author: Siddhartha Chatterjee
Title: Compiling Data-Parallel Programs for Efficient Execution on
        Shared-Memory Multiprocessors
Year: 1991
Address: Pittsburgh, PA
Super: @CMUTR