As knowledge resources grow and the demand for knowledge mining increases, it is important to enrich the semantics of academic literature. Doing so helps users locate knowledge units in scientific papers quickly and accurately, and helps readers conduct comparative analysis and strategic reading. It is therefore essential to identify and describe the components of scientific papers and their semantic functions in order to promote knowledge discovery and knowledge services.
A scientific paper content ontology is a standardized knowledge representation of the content structure and semantic functions of scientific papers. It is of great significance for the deep indexing, information extraction and knowledge mining of scientific papers. A review of existing research on paper components and attributes, as well as of published ontologies, shows that the existing ontologies are limited by their underlying theories and fall short in revealing the deep semantics of the information embedded in scientific papers. To design and build a component ontology that is better suited to information extraction, the functional unit theory should be considered.
The functional unit theory combines information tasks with genre analysis, which makes it better suited to developing a scientific paper content ontology oriented to knowledge discovery. Based on this theory, a novel ontology named Scientific Paper Functional Units Ontology (FUO) is designed. After reviewing the 41 functional units, 28 components are redesigned, including background, goal, motivation, method description, conclusion and contributions. Based on these components, 12 classes and 28 subclasses are designed. The attributes of the classes are designed with reference to the Bio Event ontology and the News Event ontology. The classes and attributes of FUO are formally represented with Protégé 5.1. Then 10 research papers randomly selected from JASIST are used in a deep indexing experiment with GATE, a semantic annotation software. Finally, the distribution of the different functional units within scientific papers is analyzed.
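The final analysis step, the distribution of functional units across annotated papers, can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the annotation records, the paper identifiers and the function names `unit_distribution` and `units_per_paper` are all hypothetical, and only the unit labels (background, goal, motivation, method description, conclusion, contributions) are taken from the component list above. The sketch assumes the annotations have already been exported from the annotation tool as (paper, functional unit) pairs.

```python
from collections import Counter

# Hypothetical annotation export: one (paper_id, functional_unit) pair per
# annotated text span. The values are invented for illustration only.
ANNOTATIONS = [
    ("paper-01", "background"),
    ("paper-01", "goal"),
    ("paper-01", "method description"),
    ("paper-01", "method description"),
    ("paper-01", "conclusion"),
    ("paper-02", "background"),
    ("paper-02", "motivation"),
    ("paper-02", "method description"),
    ("paper-02", "contributions"),
    ("paper-02", "conclusion"),
]


def unit_distribution(annotations):
    """Return the share of each functional unit over all annotated spans."""
    counts = Counter(unit for _, unit in annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {unit: n / total for unit, n in counts.items()}


def units_per_paper(annotations):
    """Return a per-paper Counter of functional unit occurrences."""
    per_paper = {}
    for paper_id, unit in annotations:
        per_paper.setdefault(paper_id, Counter())[unit] += 1
    return per_paper


if __name__ == "__main__":
    ranked = sorted(unit_distribution(ANNOTATIONS).items(), key=lambda kv: -kv[1])
    for unit, share in ranked:
        print(f"{unit:20s} {share:.0%}")
```

Keeping the records as flat pairs makes it straightforward to aggregate either over the whole sample or per paper; a real export would likely also carry span offsets and attribute values, which are omitted here.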
The originality of this research lies in the clear definition of the functional units and their attributes, and in FUO itself, which reveals the semantic features of scientific paper components in a more comprehensive and detailed manner. The results also demonstrate the potential applicability of FUO to deep semantic indexing, semantic retrieval and knowledge discovery. This research deepens our understanding of the scientific paper as a knowledge container from the perspective of information science. A limitation of this paper is that it does not consider the semantic relationships between the content components of a scientific paper. More detailed definitions of these relationships, and new components such as interactive tables, datasets, audio and video, should be studied in the future. 4 figs. 9 tabs. 47 refs.