
    The Application of Hierarchical Cluster Analysis in the Evaluation
    2014-09-03 14:33:51   Source: Tranbbs.com   Authors: Hongqiang Li, Jianzhi Wang, Dianhai Wang

    1. Introduction

    Comparing cities against one another helps to establish the development status of urban public transportation accurately: each city learns its own level of transit service and where to direct future effort, which in turn encourages it to strengthen management, raise working efficiency, reduce cost, and achieve better returns. Such cross-city comparison, however, raises questions about which cities are comparable, which evaluation indexes to use, and which criteria to apply. Because cities differ greatly in their economic and social conditions, the choice of comparable cities deserves study. This paper therefore clusters thirty-one large cities of China into eight groups using HCA (Hierarchical Cluster Analysis).

    2. Hierarchical cluster analysis

    Cluster analysis is a classification method in multivariate statistical analysis that studies how cases (or variables) group together. The procedure attempts to identify relatively homogeneous groups of cases based on selected characteristics.

    (Footnote 1: Doctor, Transportation College of Jilin University, Changchun, 130025, Jilin, P.R. China.)

    HCA, one kind of cluster analysis, is widely applied. It can be used in any analysis of cases (or variables) that have numerical values. It combines the cases (or variables) step by step into subclasses until all the cases (or variables) are in one class. The detailed process is as follows:

    Step 1: Transform the values by cases (or variables) before the cluster analysis. (We presume the variables have already been filtered: variables that are weakly correlated with one another and contribute strongly are chosen, and variables that are strongly correlated are eliminated.)

    Step 2: HCA starts with each case (or variable) in a separate cluster (n clusters for n cases or variables), calculates the distances between the cases (or variables), and combines the two nearest cases (or variables) into one cluster.

    Step 3: Choose a method for calculating the distance between clusters, and repeatedly combine the two nearest clusters until only one cluster is left.

    Step 4: Finally, draw the dendrogram for the cases (or variables).

    Different classification criteria (cluster methods, distance measures, and standardization methods) can yield different classification results.
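    Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the paper: the function names, the sample cases, and the single-link (closest pair) rule for the distance between clusters are all assumptions made for the sketch, and the paper's own analysis uses Ward's method instead. The returned merge history is the information from which a dendrogram is drawn.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_link(a, b, points):
    """Distance between clusters a and b: the closest pair of members.

    One possible between-cluster method, assumed here for illustration.
    """
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, cluster_distance=single_link):
    """Merge the two nearest clusters until only one cluster is left.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    # Step 2: every case starts in its own cluster.
    clusters = [(i,) for i in range(len(points))]
    history = []
    # Step 3: repeatedly merge the nearest pair of clusters.
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        history.append((a, b, d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a + b)
    return history


# Four hypothetical cases, assumed to be standardized already (Step 1).
cases = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
merges = agglomerate(cases)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

    Passing a different `cluster_distance` function changes the grouping, which is one concrete way in which different cluster methods lead to different classification results.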

    3. Clustering of cities

    This section first standardizes the original variables of 31 main cities of China in 1997 (Table 1), transforming the values to Z scores by variable. It then calculates the distance (or similarity) matrix between cities using the Euclidean distance, and clusters the cities into groups according to Ward's method.

    table1 .jpg

    3.1 Standardization transform of original data

    This step transforms the variables in two stages: (1) center each variable; (2) standardize it by its standard deviation.

    QQ图片20140903143012.jpg

    After the standardization transform, every variable has a mean of zero and a variance of one. Even if the samples change, the result remains relatively stable. The transformed values of Table 1 are as follows (Table 2); some are omitted.

    Table 2.   Result of standardization transform

    table2.jpg
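    The two stages of the standardization transform in this subsection can be sketched as follows. The function names and the sample numbers are hypothetical (they are not taken from Table 1), and the use of the sample standard deviation with an n - 1 denominator is an assumption, since the paper's formula image is not reproduced here.

```python
from statistics import mean, stdev


def z_scores(column):
    """Center a variable on its mean, then divide by its standard deviation.

    Uses the sample standard deviation (n - 1 denominator), an assumption.
    """
    m = mean(column)
    s = stdev(column)
    return [(x - m) / s for x in column]


def standardize(rows):
    """Standardize a cases-by-variables table, one variable at a time."""
    columns = list(zip(*rows))                     # one tuple per variable
    scaled = [z_scores(col) for col in columns]
    return [list(case) for case in zip(*scaled)]   # back to one row per case


# Hypothetical data: 3 cities x 2 variables.
raw = [[100.0, 2.0], [200.0, 4.0], [300.0, 9.0]]
table = standardize(raw)
```

    After the transform each column of `table` has mean zero and unit variance, so variables measured in very different units contribute on a comparable scale to the distances computed next.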

    3.2 Calculate the distance between cities 

    Euclidean distance is commonly used to calculate distances in cluster analysis. It is the square root of the sum of the squared differences between the values of two items (Formula 4). Table 3 is the distance matrix.

    1.jpg

    Table 3.  Distance matrix

    table3.jpg
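    The distance calculation described in this subsection can be illustrated with a short sketch. The function names and the three sample cities are hypothetical; only the definition of the Euclidean distance comes from the text.

```python
from math import sqrt


def euclidean(u, v):
    """Square root of the sum of squared differences between two cases."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(rows):
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = euclidean(rows[i], rows[j])
    return d


# Hypothetical standardized values for three cities.
cities = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dm = distance_matrix(cities)
```

    Because the matrix is symmetric, only the upper triangle is computed and then mirrored.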

    Ward's method was put forward by Ward in 1963 and was developed by Orloci et al. (1967). It uses an analysis of variance approach to evaluate the distances between clusters. In short, this method attempts to minimize the sum of squared deviations of any two hypothetical clusters that can be formed at each step. If the classification is reasonable, the sum of squared deviations within a cluster is small, while that between clusters is large. Furthermore, the method tends to make the number of cities in each group approximately equal.

    3.3 Hierarchical cluster

    First, each case (city) forms a separate cluster, giving 31 clusters. Then the two clusters whose merger gives the smallest increase in the sum of squared deviations are combined into one cluster. If n cases (or variables) are classified into K clusters, namely G1, G2, …, GK, the sum of squared deviations of the cases in cluster Gt is St.

    2.jpg
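    A single merge step under Ward's criterion can be sketched as below. The sketch assumes that St is the sum of squared deviations of the cases in a cluster from the cluster centroid, and that the pair to merge is the one whose union adds least to the total of these sums; the function names and the sample cases are hypothetical.

```python
def sum_sq_dev(cluster):
    """S_t: squared deviations of the cases in a cluster from its centroid."""
    n = len(cluster)
    centroid = [sum(col) / n for col in zip(*cluster)]
    return sum((x - c) ** 2 for case in cluster for x, c in zip(case, centroid))


def ward_cost(a, b):
    """Increase in the total sum of squared deviations if a and b merge."""
    return sum_sq_dev(a + b) - sum_sq_dev(a) - sum_sq_dev(b)


def ward_step(clusters):
    """Merge the pair of clusters with the smallest Ward cost."""
    pairs = [(ward_cost(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    _, i, j = min(pairs)
    merged = clusters[i] + clusters[j]
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [merged]


# Hypothetical standardized cases, each starting as its own cluster.
start = [[[0.0, 0.0]], [[0.0, 2.0]], [[6.0, 0.0]], [[6.0, 1.0]]]
after_one = ward_step(start)      # merges the two cases near x = 6
after_two = ward_step(after_one)  # then merges the two cases near x = 0
```

    Repeating `ward_step` until K clusters remain gives a K-group classification; continuing until one cluster is left gives the full merge sequence for the dendrogram.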

    3.4 The result of clustering and dendrogram

    In this paper, the 31 cities were combined into 8 largely homogeneous groups (Table 4), which agrees with the simple line chart of the standardized data (omitted). The indexes of the cities in the same cluster are therefore highly comparable and vary over a similar range. Figure 1 is the dendrogram obtained from the result of the last step.

    Table 4.  Clustering result

    table4.jpg

    3.5 Analysis of clustering result

    In terms of absolute indexes, the first, second, and sixth clusters have larger values for total population, developed area, passengers carried by buses and trolley buses, length of paved roads, and length of the operational route network. As far as relative indexes are concerned, the area of paved roads is larger in the cities of the eighth, seventh, and second clusters and smaller in Beijing, Shanghai, and Chongqing; the variation among the other clusters is inconspicuous. The number of taxis per 10,000 persons is larger in the cities of the eighth and first clusters, but the variation within a cluster is also larger because of differing transportation policies. The number of public vehicles per 10,000 persons is larger in the cities of the sixth and eighth clusters, which therefore provide more transportation capacity for society. GDP per capita is higher in the third, sixth, and seventh clusters.

    3.jpg

    4. Conclusion

    The evaluation criterion for each single index can be the average value of that variable within each of the eight clusters. For an integrated index evaluation, the many evaluation factors can first be transformed into a few factors that reflect the main characteristics (or information); an integrated evaluation value can then be calculated for each city with a suitable mathematical model and used for comparison. Moreover, the clustering can be used to analyze the disparities between any two clusters, or to distinguish and classify other cities.
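    The single-index criteria mentioned above, that is, the average of every variable within each cluster, can be computed as in the following sketch. The function name, the index values, and the cluster labels are hypothetical and are not the paper's data.

```python
from collections import defaultdict
from statistics import mean


def cluster_criteria(values, labels):
    """Average of every variable within each cluster.

    values: one row of index values per city.
    labels: the cluster number assigned to each city, in the same order.
    """
    groups = defaultdict(list)
    for row, label in zip(values, labels):
        groups[label].append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


# Hypothetical index values for three cities in two clusters.
values = [[10.0, 1.0], [14.0, 3.0], [40.0, 8.0]]
labels = [1, 1, 2]
criteria = cluster_criteria(values, labels)
```

    A city's own index values can then be compared with the averages of its cluster rather than with the averages over all cities.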

    Acknowledgement

    This paper is based on the project "Study on the Traffic Network Flow Characteristics by Simulating Circuit System", supported by NSFC (59978018).

    References

    He, X.Q. (1998) Modern Statistical Analysis Method and Application. China Renmin University Press, Beijing.

    Urban Social & Economic Investigation Team of State Statistical Bureau of China (1999) Urban Statistical Yearbook of China 1998. China Statistics Press, Beijing.


    Editor: Bai Xiaosong

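    Author profile
      Hongqiang Li (栗红强) is Director of the Chief Engineer's Office at Beijing E-Hualu Information Technology Co., Ltd., where he is responsible for science and technology projects, research institutions, the Zhongguancun "Shi-Bai-Qian" program, alliances, and standards. Born in 1976 in Qin County, Shanxi Province, he received his doctorate from the Transportation College of Jilin University in 2004 and completed his postdoctoral work at the civil engineering postdoctoral station of Tsinghua University (Institute of Transportation Engineering, Department of Civil Engineering) in 2006. He holds the title of senior engineer.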