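CHEN Tao, ZHANG Yongjuan, LIU Wei, ZHU Qinghua. Several Specifications and Recommendations for the Publication of Linked Data [J]. Journal of Library Science in China, 2019, 45(1): 34-46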
Several Specifications and Recommendations for the Publication of Linked Data
Chinese title: 关联数据发布的若干规范及建议 (Several Specifications and Recommendations for the Publication of Linked Data)
Received: July 11, 2018  Revised: August 3, 2018
Key words: Linked data; Seven star model; Technical specification; Semantic web; Open data
Chinese key words (translated): Linked data; Seven star model; Technical specification; Semantic web; Open data
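Funding: This article is one of the research outcomes of the Major Program of the National Social Science Fund of China, "Research on the Mechanism and Application of Mobile Visual Search in Digital Libraries for Big Data" (No. 15ZDB126).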
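Author Name | Affiliation | E-mail
CHEN Tao | Shanghai Library, Shanghai 200031 | tchen@libnet.sh.cn
ZHANG Yongjuan | Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 |
LIU Wei | Shanghai Library, Shanghai 200031 |
ZHU Qinghua | School of Information Management, Nanjing University, Nanjing, Jiangsu 210093 |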
Abstract:
The Five Star model of open data proposed by Berners-Lee has long been regarded by the industry as the highest standard for publishing linked data. The model sets requirements that publishers should meet in pursuit of high-quality data. However, many datasets published in accordance with the Five Star model bring little convenience to consumers; they are more like black boxes that users must explore for themselves. Scholars at Aalto University in Finland have therefore proposed a Seven Star model of linked data, which extends the Five Star model with respect to the usability of the dataset. The sixth star requires that the ontology corresponding to a dataset be released together with the dataset, which makes the structure of the dataset easy for users to understand. The seventh star requires declaring how the classes and properties of the ontology are used in the dataset, which makes it easy for users to check the data status of the dataset.
This article is based on the Seven Star model. In this model, the one-star level requires only that data be available online, for example as an image or PDF file; the two-star level requires machine-readable data, such as an Excel file; and the three-star level requires a relatively simple non-proprietary format, such as a CSV file. These first three stars are not the core of the linked data system, so they are not discussed in this article. We analyze the problems that need to be addressed from the four-star to the seven-star levels and put forward corresponding specifications and recommendations. For the four-star level, which requires publishing datasets with W3C open standards, we propose common technical specifications for resource URI design, content negotiation, and the storage and publishing of RDF data, based on the four principles of linked data publication. For data storage, we propose three general-purpose storage modes, chosen according to the structure of the dataset: the “single graph (SG) mode”, the “multi graph (MG) mode”, and the “data hub (DH) mode”. Publishing a dataset is the basis of any linked data application, while linking between different datasets is its purpose. Therefore, for the five-star level, which concerns linking, this paper proposes some frequently overlooked norms and suggestions from the perspective of ontology vocabulary design and reuse. The additional six-star and seven-star levels are not separate from the five-star level; they place higher, more detailed requirements on the application of linked data. For the six-star level, this paper proposes three forms of ontology publishing, ordered by how readable they make the ontology: “point mode” (file mode), “line mode” (page mode), and “face mode” (visualization mode). The “point mode” and “line mode” meet the six-star standard, while the “face mode” reaches six and a half stars.
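The content negotiation mentioned for the four-star level can be illustrated with a minimal sketch. It is not the authors' implementation: the media-type table, the extension-based document URIs, and the example.org resource URI are all assumptions made for illustration. The function picks the representation a server might redirect to, given the client's HTTP Accept header.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from media type to the extension of a stored document.
SUPPORTED = {
    "text/html": "html",
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "application/ld+json": "jsonld",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media_type, q) pairs, highest q first; ties keep header order."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q values stay in header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(resource_uri: str, accept: str,
              default: str = "text/html") -> Optional[str]:
    """Pick the document URI to redirect to, or None if nothing matches."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type == "*/*":
            media_type = default
        if media_type in SUPPORTED:
            # Assumed URI pattern: resource URI plus a format extension.
            return f"{resource_uri}.{SUPPORTED[media_type]}"
    return None
```

For example, under these assumptions a client sending `Accept: text/turtle` for `http://example.org/resource/person/1` would be pointed at the `.ttl` document, while a browser sending `*/*` would receive the HTML page.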
In addition, the metadata description of a dataset is often overlooked, which affects whether the dataset is easy to understand and reuse; corresponding solutions are proposed for these issues. The seven-star standard is relatively simple to meet: it can be realized with existing tools or with simple statistics on the classes and properties used in the dataset.
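The "simple statistics on classes and properties" can be sketched as follows. This is an illustrative sketch, not the tooling used in the paper: it assumes the dataset is already available as an iterable of (subject, predicate, object) string triples, and the function name and sample URIs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def usage_statistics(triples: Iterable[Triple]) -> Tuple[Counter, Counter]:
    """Count how often each class and each property is used in a dataset."""
    class_counts: Counter = Counter()
    property_counts: Counter = Counter()
    for _subject, predicate, obj in triples:
        # Every predicate occurrence is one use of that property.
        property_counts[predicate] += 1
        # An rdf:type triple declares one instance of the object class.
        if predicate == RDF_TYPE:
            class_counts[obj] += 1
    return class_counts, property_counts


# Hypothetical sample data for illustration only.
sample = [
    ("http://example.org/p/1", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/p/1", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/p/2", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
]
classes, properties = usage_statistics(sample)
```

Publishing such counts alongside the dataset lets users see at a glance which parts of the ontology are actually populated.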
In summary, the norms and recommendations presented here are based on the authors' practical experience in digital humanities and linked data in recent years, and the technical solutions are reproducible. The purpose is to sort out and guide the work involved in publishing linked data, so that these measures can promote the development of linked data across industries and help create a healthy linked data ecosystem. 5 figs. 3 tabs. 17 refs.
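Chinese abstract (translated):
The Five Star model of open data proposed by Berners-Lee has for years been regarded in the field as the highest standard for publishing linked data. Aiming at high-quality data, the Five Star model sets requirements for data publishers; however, many datasets published in accordance with it have not brought convenience to users and are more like black boxes that people must explore. Scholars at Aalto University in Finland therefore proposed the Seven Star model of open data, which extends the Five Star model with respect to dataset usability. Based on this Seven Star model, this study analyzes the many issues that need attention from the four-star to the seven-star levels and proposes corresponding specifications and recommendations from the perspectives of ontology design, resource storage, data publishing, and status monitoring. These specifications and recommendations come from practical experience in several digital humanities projects and are technically reproducible. They aim to sort out and guide the various tasks involved from the very start of dataset publication, to jointly promote the development of linked data across industries, and to build a healthy linked data ecosystem. 5 figures. 3 tables. 17 references.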