July 1, 2022

Domain Knowledge-Based Information Retrieval for Engineering Technical Documents

Shang-hsien Hsieh; Ken-yu Lin; Nai-wen Chi; Hsien-tang Lin (2015). Domain Knowledge-Based Information Retrieval for Engineering Technical Documents. In Ontology in the AEC Industry: A Decade of Research and Development in Architecture, Engineering, and Construction, Chapter 1.

Technical documents with complicated structures are often produced in architecture/engineering/construction (AEC) projects and research. Information retrieval (IR) techniques offer a possible solution for managing the ever-growing volume and contexts of the knowledge embedded in these documents. However, applying a general-purpose search engine to a domain-specific collection of technical documents often produces unsatisfactory results. To address this problem, we develop a novel IR system based on passage retrieval techniques. The system uses domain knowledge to assist passage partitioning and supports interactive, concept-based query expansion for technical documents in an engineering field. The domain selected in this case is earthquake engineering, although the technologies developed and employed by the system should be applicable to many other engineering domains whose technical documents have similar characteristics. The research is carried out in three steps. First, because the final output of this research is an IR system, we created, as a prerequisite, a reference collection of 111 earthquake engineering technical documents from Taiwan's National Center for Research on Earthquake Engineering; this collection allows the effectiveness of the IR system to be evaluated once it is developed. Second, we built a base domain ontology from an earthquake engineering handbook to represent the domain knowledge and make that knowledge available to the target IR system. Third, we focused on the semantic querying and retrieval mechanisms and developed the OntoPassage approach to support them. OntoPassage partitions a document into smaller passages of around 300 terms each, according to the main concepts in the document.
This approach is then used to implement the target domain knowledge-based IR system, which allows users to interact with the system and perform concept-based query expansions. The results show that the proposed system not only achieves effective IR but also informs search engine users through a clear knowledge representation.
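The abstract does not spell out how OntoPassage decides where to cut a document or how concepts expand a query, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. The tiny `ONTOLOGY` dictionary is a hypothetical stand-in for the earthquake engineering ontology; the greedy rule (start a new passage when the dominant concept changes after at least `min_terms` terms, or when `max_terms` would be exceeded, with defaults bracketing the ~300-term target) is an assumption; query expansion here is automatic and adds terms from matched concepts and their directly related concepts, whereas the described system lets users choose expansions interactively; the 0.5 weight for expanded terms is arbitrary.

```python
import re
from collections import Counter

# Hypothetical toy ontology (NOT the authors' earthquake engineering ontology):
# each concept lists its surface terms and its directly related concepts.
ONTOLOGY = {
    "seismic_isolation": {"terms": {"isolation", "isolator", "bearing"}, "related": {"damping"}},
    "damping": {"terms": {"damping", "damper", "viscous"}, "related": {"seismic_isolation"}},
    "liquefaction": {"terms": {"liquefaction", "sand", "pore"}, "related": {"soil"}},
    "soil": {"terms": {"soil", "foundation", "settlement"}, "related": {"liquefaction"}},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs as terms."""
    return re.findall(r"[a-z]+", text.lower())


def dominant_concept(tokens):
    """Return the concept whose terms occur most often in tokens, or None."""
    counts = Counter()
    for tok in tokens:
        for concept, entry in ONTOLOGY.items():
            if tok in entry["terms"]:
                counts[concept] += 1
    return counts.most_common(1)[0][0] if counts else None


def partition(text, min_terms=200, max_terms=400):
    """Greedy concept-aware partitioning (assumed rule): append sentences to
    the current passage; cut on a concept shift once the passage holds at
    least min_terms terms, or whenever max_terms would be exceeded."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    passages, current = [], []
    for sent in sentences:
        toks = tokenize(sent)
        if current:
            size = len(current)
            shift = size >= min_terms and dominant_concept(toks) not in (
                None, dominant_concept(current))
            if shift or size + len(toks) > max_terms:
                passages.append(current)
                current = []
        current.extend(toks)
    if current:
        passages.append(current)
    return passages


def expand_query(query):
    """Return (original terms, added terms): terms of every concept the query
    mentions plus terms of that concept's directly related concepts."""
    original = set(tokenize(query))
    expanded = set()
    for entry in ONTOLOGY.values():
        if original & entry["terms"]:
            expanded |= entry["terms"]
            for rel in entry["related"]:
                expanded |= ONTOLOGY[rel]["terms"]
    return original, expanded - original


def rank_passages(passages, query, expansion_weight=0.5):
    """Score each passage by term matches (original terms weigh 1.0,
    expanded terms weigh expansion_weight) and sort best first."""
    original, added = expand_query(query)
    scored = []
    for idx, toks in enumerate(passages):
        score = sum(1.0 if t in original else expansion_weight if t in added else 0.0
                    for t in toks)
        scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


doc = ("Base isolation uses an isolator bearing. The bearing reduces forces. "
       "Viscous damping adds energy dissipation. "
       "Liquefaction occurs in loose sand. Pore pressure rises in saturated sand.")
passages = partition(doc, min_terms=5, max_terms=20)  # tiny limits for a short demo
print([len(p) for p in passages])  # → [10, 5, 11]
print(rank_passages(passages, "isolation bearing"))  # → [(3.5, 0), (1.0, 1), (0.0, 2)]
```

In this toy run, the damping passage is retrieved for an isolation query only because the ontology links the two concepts, which is the kind of recall gain that concept-based expansion is meant to provide; a real evaluation would measure such gains against a reference collection like the one described above.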


Architecture; Construction; Engineering; Knowledge-Based Systems; Ontologies (Artificial Intelligence); Query Processing; Search Engines; Knowledge Representation; Concept-Based Query Expansions; Base Domain Ontology; Earthquake Engineering; General-Purpose Search Engine; AEC Projects; Architecture/Engineering/Construction Projects; Complicated Structures; Technical Documents; Domain Knowledge-Based Information Retrieval