Ancient book catalogs and ancient literature are important sources of evidence for much research in the humanities and social sciences. Traditional research on ancient books usually relies on experts' expertise or subjective judgment. The emerging field of digital humanities can help scholars gather relevant information as completely as possible, raise research questions on a larger spatio-temporal scale, and conduct intensive research across a variety of subjects from unprecedented perspectives. This requires a digital humanities platform with relatively complete data and more applicable advanced technologies. The basis of such a platform is a data model that can integrate ancient book catalog data of different kinds and formats.
In this paper, we propose a data model of Chinese ancient books that uses ontology and linked data technology to support researchers in carrying out so-called “evidence-based practice”. The data model is based on knowledge of classical bibliography, combined with philology, bibliology, and related fields. This research also explores new ways of using ancient catalogs and documentation to support research in various disciplines, such as history, linguistics, sociology, literature, culture, and the arts. Web ontologies and linked data are recent achievements of semantic technologies and are among the most suitable technologies for developing “authority control” and “evidence-based” applications. They offer flexibility and scalability that traditional relational databases lack, which is especially important in distributed environments with massive amounts of semi-structured or unstructured data. Such a data model can not only handle semantic (machine-understandable) data directly, but also support knowledge-based queries with reasoning.
The paper describes the design method and the main aspects of the data model, including the bibliographic framework, creators and contributors, classifications, seals, taboo terms, and so on. The bibliographic framework follows a “3 + 2” model, that is, “Work-Instance-Item” + “Annotation” + “Classification”, designed for the needs of evidence-based research on Chinese ancient books, with reference to the four-tier “WEMI” model of FRBR and the three-tier model of the Library of Congress's BIBFRAME 2.0. It adapts flexibly to any kind of ancient book catalog and to metadata schemas based on MARC or DCAP, and it can also integrate the full texts of ancient literature. It can represent the classification of an ancient book and the multiple comments on it recorded in different time periods. For the description of creators and contributors, the BIBFRAME “Contribution” model is used to clarify the relationship between responsibility and the document, and between principal responsibility and shared responsibility. The knowledge of ancient books is structured into fine-grained semantic units to facilitate machine processing.
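As an illustration only, the “Work-Instance-Item” + “Annotation” + “Classification” structure might be sketched with plain Python data classes as follows. All class fields, the helper functions `primary_agents` and `all_items`, and the placeholder values (“Work A”, “Catalog X”, and so on) are assumptions made for this sketch; they are not the vocabulary or the ontology defined in the paper, which is expressed with web ontology and linked data technology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contribution:
    """Links an agent to a work through a role (hypothetical fields)."""
    agent: str
    role: str
    primary: bool = False  # True for principal responsibility


@dataclass
class Annotation:
    """A comment on a record, kept together with the catalog it came from."""
    source: str
    text: str


@dataclass
class Classification:
    """A class label as assigned by one particular catalog or scheme."""
    scheme: str
    label: str


@dataclass
class Item:
    """A single physical copy, which may carry seals and annotations."""
    holder: str
    seals: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Instance:
    """One version (edition) of a work, with the copies that embody it."""
    edition: str
    items: List[Item] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract work, shared by all of its versions and copies."""
    title: str
    contributions: List[Contribution] = field(default_factory=list)
    instances: List[Instance] = field(default_factory=list)
    classifications: List[Classification] = field(default_factory=list)


def primary_agents(work: Work) -> List[str]:
    """Return the agents that hold principal responsibility for a work."""
    return [c.agent for c in work.contributions if c.primary]


def all_items(work: Work) -> List[Item]:
    """Collect every copy recorded under any version of a work."""
    return [item for inst in work.instances for item in inst.items]


# Placeholder data, not taken from any real catalog.
example_work = Work(
    title="Work A",
    contributions=[
        Contribution("Person P", "author", primary=True),
        Contribution("Person Q", "annotator"),
    ],
    instances=[
        Instance("Edition 1", items=[Item("Library L", seals=["Seal S"])]),
        Instance("Edition 2", items=[Item("Library L"), Item("Library M")]),
    ],
    classifications=[
        Classification("Catalog X", "Class 1"),
        Classification("Catalog Y", "Class 2"),
    ],
)
```

Keeping “Classification” and “Annotation” as separate records that name their source is what allows one work to carry several, possibly conflicting, classifications and comments from catalogs of different periods.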
Using this model and its vocabularies, data from 14 typical ancient book catalogs were integrated, including historical catalogs, official catalogs, private catalogs, large modern joint catalogs, and Shanghai Library's ancient book database, and on this basis the platform realizes key functions for evidence-based research on ancient books. The functions include searching the versions and classifications of ancient books, clustering and comparing different versions or copies of an ancient book, exploring the relations between authors and contributors, and statistical analysis of ancient books by time period, area, and topic. The construction of the “Chinese Ancient Book Union Catalog Platform for Evidence-based Research” has verified the availability, flexibility, and scalability of the data model. The paper also puts forward problems that remain to be resolved, such as the identification of a “Work”, the establishment of relationships between “Instances”, and the extraction of structured, fine-grained data from the content of ancient books. 6 figs. 5 tabs. 18 refs.
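The clustering and statistical functions can be pictured with a minimal sketch. It assumes that each catalog record has already been assigned a work identifier, which is itself one of the open problems noted above; the record fields (`catalog`, `work_id`, `edition`, `period`), the function names, and the sample values are hypothetical and do not reflect the platform's actual implementation.

```python
from collections import Counter, defaultdict

# Placeholder records standing in for entries merged from several catalogs.
records = [
    {"catalog": "Catalog X", "work_id": "work-1", "edition": "Edition A", "period": "Period 1"},
    {"catalog": "Catalog Y", "work_id": "work-1", "edition": "Edition B", "period": "Period 2"},
    {"catalog": "Catalog Y", "work_id": "work-2", "edition": "Edition C", "period": "Period 2"},
]


def cluster_by_work(recs):
    """Group records from different catalogs under the work they describe."""
    clusters = defaultdict(list)
    for rec in recs:
        clusters[rec["work_id"]].append(rec)
    return dict(clusters)


def count_by_period(recs):
    """Count how many recorded versions fall in each time period."""
    return Counter(rec["period"] for rec in recs)
```

Calling `cluster_by_work(records)` places the two versions of `work-1` side by side for comparison, and `count_by_period(records)` gives the kind of tally that a statistical view by time period would start from.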