| Cluster Analysis Tool | |
| Created: | 6/24/2002 |
The Scatterplot
The graph below is called a scatterplot. A scatterplot contains a point for
each record in the dataset (in this case a point for each state). The X-axis of the
scatterplot represents housing value, and the Y-axis represents rent, thus creating
an attribute space within which we can identify clusters. Points that are in
the upper right-hand corner of the scatterplot are states that have high mean rent and high mean value, while points in
the lower left are states that have low mean rent and low mean value. (Note: The scatterplot
pictured here was generated using the Scatterplot Tool, which can be found in ArcObjects Developer Help under
Samples > Analysis and Visualization > Scatterplot Tool.)
Cluster analysis looks at the patterns that these points form in attribute space. The type of cluster
analysis used here (agglomerative hierarchical clustering) starts by placing each point in the scatterplot in
its own cluster. It then finds the two points that are closest to one another in attribute space and merges
them into a new, larger cluster. The process then
repeats, finding the two clusters that are closest to one another and "lumping" them
together into a larger cluster. The process is complete when all individuals in the
data set have been lumped together into one big cluster.
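The merging procedure described above can be sketched in Python. This is a minimal illustration, not the sample's Visual Basic implementation. The function name `agglomerate`, the sample coordinates, and the choice to measure the distance between two clusters as the Euclidean distance between their centroids are all assumptions, since the text does not say how distance between clusters is measured.

```python
import math


def agglomerate(points):
    """Merge clusters bottom-up and return the merges in the order they happen.

    Assumption: the distance between two clusters is the Euclidean distance
    between their centroids. The actual sample may use a different rule.
    """
    # Every point starts in its own cluster, keyed by a cluster id.
    clusters = {i: [p] for i, p in enumerate(points)}
    merges = []
    next_id = len(points)

    def centroid(members):
        n = len(members)
        return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Find the pair of clusters that are closest to one another.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Lump the two clusters together under a new cluster id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges


# Hypothetical (mean value, mean rent) pairs, one per record.
states = [(100.0, 400.0), (105.0, 410.0), (300.0, 900.0), (310.0, 880.0)]
merges = agglomerate(states)
for a, b, d, new_id in merges:
    print(f"merge {a} + {b} -> {new_id} at distance {d:.2f}")
```

Each tuple records which two clusters were merged, the distance at which they merged, and the id given to the resulting cluster. Read in order, the list describes the whole process of going from individual points to one big cluster.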
The Dendrogram
One product of cluster analysis is a tree diagram representing the entire process of going
from individual points to one big cluster. This diagram is called a dendrogram, and is
illustrated below. Once the cluster analysis algorithm has been
run, the user must decide how many clusters to explore (this is sometimes
referred to as "pruning" the dendrogram). In this example we have chosen to look at four
clusters (symbolized in red, yellow, green, and blue).
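One way to picture choosing a number of clusters is to stop the merging process once that many clusters remain and label each record with the cluster it ended up in. The Python sketch below does this under stated assumptions: the function name `cluster_labels`, the sample coordinates, and the centroid-based Euclidean distance are hypothetical and are not taken from the sample's source code.

```python
import math


def cluster_labels(points, k):
    """Label each point by merging the closest clusters until k remain.

    Assumption: clusters are compared by the Euclidean distance between
    their centroids. The actual sample may use a different rule.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Each cluster is a list of point indices; start with one per point.
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        xs = [points[i][0] for i in members]
        ys = [points[i][1] for i in members]
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # b is always greater than a, so removing b leaves index a valid.
        clusters[a].extend(clusters.pop(b))

    # Turn cluster membership into one label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical (mean value, mean rent) pairs, one per record.
states = [(100.0, 400.0), (105.0, 410.0), (300.0, 900.0), (310.0, 880.0)]
labels = cluster_labels(states, 2)
print(labels)
```

A list of labels like this, one per record, is the kind of classification that could be stored in a new field and used to symbolize each cluster in its own color.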
Deciding the number of clusters to map can be aided by looking at the dendrogram.
There are three key pieces of information that you can get from the dendrogram. In the
dendrogram above, the yellow cluster is labeled so that you can see the parts of
it that represent these pieces of information. They are:
The User Interface
The user interface for the cluster analysis sample consists of a context command that must be placed in the Feature
Layer context menu. When executed, that command opens a form that allows you to specify parameters and then to run
the cluster analysis algorithm. Running the algorithm produces a dendrogram and a new field in the
source table storing the new classification. The dendrogram consists of a data frame with three layers: two point
layers (the first containing the nodes and the second the leaves) and a line layer containing the branches.
Each dendrogram has one leaf corresponding to each feature in the map (in the example above, each leaf represents a
state). The leaf feature layer contains all of the data associated with the source feature layer. The nodes and
branches both have the following fields containing results from the cluster analysis:
The parameters in the form include the following: 
| Visual Basic |
| File | Description |
| DendroGenUI.cls | Implements the ICommand interface to add the cluster analysis tool to the UI. |
| DendroGenerator.cls | Uses the cluster analysis engine to run a cluster analysis and build a dendrogram. |
| ClusterAnalysis.vbp | Visual Basic project file. |
| ClusterAnalysis.dll | The compiled component. |
| frmMakeDendro.frm | Dialog box for setting parameters and running cluster analysis to generate a dendrogram. |