- Authors
- J. K. Cringean, R. England, G. A. Manson, P. Willett; Young M. Kim, Dik Lun Lee
- Title
- Parallel text searching in serial files using a
processor farm; Efficient search methods for signature files
- Book title
- Proceedings of the 13th International Conference on
Research and Development in Information Retrieval
- Pages
- 429-54
- Date
- March 1990
- Abstract
- The paper discusses the implementation of a parallel
text retrieval system using a microprocessor
network. The system is designed to allow fast
searching in document databases organised using the
serial file structure, with a very rapid initial
text signature search being followed by a more
detailed, but more time-consuming, pattern matching
search. The network is built from transputers, high
performance microprocessors developed specifically
for the construction of highly parallel computing
systems, which are linked together in a processor
farm. The paper discusses the design and
implementation of processor farms, and then reports
initial studies of the efficiency of searching that
can be achieved using this approach to text
retrieval from serial files.
Many approaches have been proposed for searching
signature files efficiently. These methods apply
different techniques to reduce the number of block
signatures that need to be accessed and compared to
the query signature. Owing to the difference in the
performance measures and assumptions used in these
methods, it is difficult to determine which method
is the best under a common condition. In this paper,
we study three basic methods proposed in the
literature, namely, the indexed descriptor
file\cite{Pfaltz:indexedsignature}, the
two-level superimposed coding
scheme\cite{SacksDavis:twosuperimpose}, and the
partitioned signature file
approach\cite{Lee:partition}. The contribution of
this paper is two-fold. We present a uniform
analytic performance model so that these methods can
be compared fairly and consistently. We show that
the two-level superimposed coding scheme, if stored
in a transposed file\cite{Lee:signatureprocessor}, is
the best in performance. We then introduce an
improved method, the multi-level superimposed coding
method, which is an extension to the two-level
superimposed coding method. We demonstrate that the
two-level method is not optimal, and obtain the
optimal number of levels for the multi-level method.
- Comment
- A comparison of fast search methods for signature files.
- Category
- Signature
Category: Signature
Institution: Ohio State University, Computer and Information
Science Research Center
Comment: A comparison of fast search methods for signature files.
Number: OSU-CISRC-3/90-TR8
Bibtype: TechReport
Booktitle: Proceedings of the 13th International Conference on
Research and Development in Information Retrieval
Author: J. K. Cringean
R. England
G. A. Manson
P. Willett
Young M. Kim
Dik Lun Lee
Pages: 429-54
Month: mar
Title: Parallel text searching in serial files using a
processor farm; Efficient search methods for signature files
Year: 1990
Keyword: database management systems, file organisation,
information retrieval, information retrieval
systems, parallel programming, serial files,
processor farm, parallel text retrieval system,
microprocessor network, fast searching, document
databases, pattern matching search, transputers
Address: 2036 Neil Avenue Mall, Columbus, Ohio 43210