Co-word analysis is a content analysis technique based on the assumption that the subject of a paper can be summarized in a limited number of key terms. If two terms co-occur within one paper, the two research topics they represent are related, and a higher co-occurrence frequency indicates a stronger correlation between the term pair. However, co-word analysis is built on words and is therefore extremely sensitive to the selection of terms; its quality depends on a variety of factors, such as the quality of terms and indexes, the extraction of high-frequency terms, and the adequacy of the statistical methods. It is therefore necessary to examine the limitations of co-word analysis at its different stages in order to improve and optimize it.
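The co-occurrence assumption described above can be illustrated with a minimal sketch. The function name `cooccurrence_counts` and the sample term lists are hypothetical and serve only as an illustration; a pair is counted once per paper, regardless of how often each term appears in it.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count, for each unordered term pair, the number of papers containing both terms."""
    pair_counts = Counter()
    for terms in papers:
        # Deduplicate and sort so that each pair has one canonical key per paper.
        unique_terms = sorted(set(terms))
        for a, b in combinations(unique_terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical term lists, one per paper (illustrative data only).
papers = [
    ["co-word analysis", "clustering", "bibliometrics"],
    ["co-word analysis", "clustering"],
    ["bibliometrics", "citation analysis"],
]
counts = cooccurrence_counts(papers)
```

Under the assumption stated above, the pair with the highest count is read as the most strongly related pair of topics.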
The co-word analysis conducted in the present study involved six sequential steps: determination of the analysis problem, term source selection, high-frequency term extraction, relevance calculation of terms, multivariate statistical analysis, and visual presentation of results. This paper focuses on these six key steps and, by summarizing the existing relevant research, analyzes and demonstrates the main problems at each. The results indicate the following conclusions. 1) In term source selection, the biggest problem of early co-word analysis is its sole reliance on keywords and index words, which researchers call the “indexer effect”. Keywords are uncontrolled vocabulary, which gives rise to problems of homonymy and synonymy. Meanwhile, terms are expressed differently in different parts of an analysis unit, and ignoring these differences introduces errors into co-word analysis. To address these problems, the semantic structure of the text and the phenomenon that terms differ in quality as well as in quantity can be taken into account. 2) Researchers engaged in co-word analysis have not moved beyond the pattern of basing multivariate statistical analysis on high-frequency terms. Extracting only high-frequency terms not only further marginalizes low-frequency terms but also isolates high-frequency terms that correlate weakly with the clusters. By considering the disciplinary attributes and the multiple semantic types of terms to distinguish how well they represent a subject area, a more comprehensive and in-depth understanding of the research characteristics of the field can be obtained. 3) Two co-occurring terms may be related to each other directly or indirectly, but these semantic relationships are not considered at all, which may ultimately undermine the soundness of the results of co-word analysis.
We therefore summarize the existing methods for calculating semantic correlation and point out the limitations of each. 4) Finally, in multivariate statistical analysis, taking co-word clustering and co-word association analysis as examples, we discuss the problems of applying them in the new data environment and put forward methods and suggestions for improvement.
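The high-frequency term extraction and relevance calculation steps can be sketched as follows. The abstract does not name a specific relevance measure; this sketch assumes the equivalence index, E_ij = c_ij^2 / (c_i * c_j), a normalization commonly used in co-word studies, where c_ij is the number of papers containing both terms and c_i, c_j are the numbers of papers containing each term. The function name, the `min_freq` threshold, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def equivalence_index(papers, min_freq=2):
    """Keep terms occurring in at least min_freq papers, then score each retained pair."""
    # Document frequency: number of papers in which each term appears.
    term_freq = Counter(t for terms in papers for t in set(terms))
    kept = {t for t, f in term_freq.items() if f >= min_freq}

    # Co-occurrence counts restricted to the retained high-frequency terms.
    pair_freq = Counter()
    for terms in papers:
        for a, b in combinations(sorted(set(terms) & kept), 2):
            pair_freq[(a, b)] += 1

    # Equivalence index (assumed measure): c_ij^2 / (c_i * c_j), in the range (0, 1].
    return {
        (a, b): c * c / (term_freq[a] * term_freq[b])
        for (a, b), c in pair_freq.items()
    }


# Hypothetical term lists, one per paper (illustrative data only).
papers = [
    ["clustering", "co-word analysis", "keywords"],
    ["clustering", "co-word analysis"],
    ["clustering", "keywords"],
    ["co-word analysis", "ontology"],
]
scores = equivalence_index(papers, min_freq=2)
```

In this toy data the term "ontology" appears in only one paper and is dropped by the threshold, so its association with "co-word analysis" never reaches the later statistical steps. This is a small instance of the marginalization of low-frequency terms discussed in point 2).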
Co-word analysis has been most commonly utilized in mapping or tracing patterns and trends in term association. Although co-word analysis has been improved in many aspects, it still has some limitations. This paper tries to provide theoretical and operational references for co-word analysis researchers and to enhance the reliability and effectiveness of co-word analysis; moving beyond the traditional theory and practice of co-word analysis is of great significance. 1 fig. 82 refs.