Integrating Knowledge Acquisition and Language Acquisition

Author: Kevin Knight
Title: Integrating Knowledge Acquisition and Language Acquisition
Date: August 1991
Abstract: Very large knowledge bases (KBs) constitute an important step for artificial intelligence and will have significant effects on the field of natural language processing. This thesis addresses the problem of effectively acquiring two large bodies of formalized knowledge: knowledge about the world (a KB), and knowledge about words (a lexicon). The central observation is that these two bodies of knowledge are highly redundant. For example, the syntactic behavior of a noun (or a verb) is highly correlated with certain physical properties of the object (or event) to which it refers. It should be possible to take advantage of this type of redundancy in order to greatly reduce both the time and expertise required to build large KBs and lexicons. This thesis describes LUKE, a software tool that allows a knowledge base builder to create an English language interface by associating words and phrases with KB entities. LUKE assumes no linguistic expertise on the part of the user, because that expertise is built directly into the tool itself. LUKE draws its power from a large set of heuristics about how words are typically used to describe the world. These heuristics exploit the redundancy between linguistic and world knowledge. When a word or phrase is associated with some KB entity, LUKE is able to accurately guess features of the word based on features of the KB entity. LUKE can also hypothesize new words and word senses based on the existence of others. All of LUKE's hypotheses are displayed to the user for verification, using a format designed to tap the user's basic linguistic intuitions. LUKE stores its lexicon in the KB. Truth maintenance links ensure that changes in the KB are automatically propagated to the lexicon. LUKE compiles lexical entries into data structures convenient for natural language parsing and generation programs.
Lexicons acquired by LUKE have been used by KBNL, a knowledge-based natural language system, for applications in information retrieval, machine translation, and KB navigation. This work identifies several dozen heuristics that encode redundancies between linguistic representations and representations of world knowledge. It also demonstrates the usefulness of these heuristics in a working lexical acquisition system.
Category: CMUTR

Institution: Department of Computer Science, Carnegie Mellon University
Address: Pittsburgh, PA
Number: CMU-CS-91-209
Bibtype: TechReport