Maximum Likelihood Classification Example Essay
How Maximum Likelihood Classification works
- The cells in each class sample are assumed to be normally distributed in multidimensional space.
- Classification is based on Bayes' theorem of decision making.
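Taken together, these two assumptions give a simple decision rule: fit a multivariate Gaussian to each class's training cells and assign every pixel to the class with the highest log-likelihood. Below is a minimal numpy sketch assuming equal class priors; the function and variable names are illustrative, not the tool's API.

```python
import numpy as np

def fit_mlc(samples):
    """Fit per-class Gaussian statistics (mean, inverse covariance, log-det).

    samples: dict mapping class label -> (n_pixels, n_bands) array of
    training cells. Hypothetical input layout, for illustration only.
    """
    stats = {}
    for label, X in samples.items():
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        stats[label] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return stats

def classify_mlc(pixels, stats):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    labels = list(stats)
    scores = []
    for label in labels:
        mu, cov_inv, logdet = stats[label]
        d = pixels - mu
        # log-likelihood up to a constant: -0.5 * (log|Sigma| + d^T Sigma^-1 d)
        mahal = np.einsum('ij,jk,ik->i', d, cov_inv, d)
        scores.append(-0.5 * (logdet + mahal))
    return np.array(labels)[np.argmax(scores, axis=0)]
```

With unequal priors, Bayes' theorem simply adds a log-prior term to each class score before taking the argmax.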
Example
The following example shows the classification of a multiband raster with three layers into five classes. The five classes are dry riverbed, forest, lake, residential/grove, and rangeland. An output confidence raster will also be produced. The input raster bands are displayed below.
The Maximum Likelihood Classification tool is used to classify the stack into five classes. The following settings were used:
The classified raster appears as:
Areas displayed in red are cells that have less than a 1 percent chance of being correctly classified; these cells are assigned NoData because of the reject fraction used. The dry riverbed class is displayed as white, the forest class as green, the lake class as blue, the residential/grove class as yellow, and the rangeland class as orange.
The list below is the value attribute table for the output confidence raster; it shows how many cells were classified at each confidence level. Value 1 represents the highest confidence level; 3, cells were classified with that level of confidence. Value 5 has a 95 percent chance of being correct; 10, cells were classified at that level.
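One common way to implement such a reject fraction (a sketch of the general technique, not necessarily this tool's exact internals) is to threshold each cell's squared Mahalanobis distance to its best class against a chi-squared quantile, since under the Gaussian assumption that distance follows a chi-squared distribution with one degree of freedom per band:

```python
import numpy as np
from scipy.stats import chi2

def reject_mask(mahal_sq, n_bands, reject_fraction=0.01):
    """Mark cells for NoData under a reject fraction.

    mahal_sq: squared Mahalanobis distance of each cell to its best class.
    Cells more extreme than the (1 - reject_fraction) chi-squared quantile,
    i.e. with less than a reject_fraction chance under the class Gaussian,
    are rejected.
    """
    threshold = chi2.ppf(1.0 - reject_fraction, df=n_bands)
    return mahal_sq > threshold
```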
Table 4 shows the best pixel-based classification accuracies of the algorithms. The two unsupervised algorithms could produce results as good as some of the supervised algorithms, but only with a very large number of spectral clusters, more than an image analyst would normally work with; thus, we did not experiment with more clusters. Most supervised algorithms produce satisfactory results when the training samples are sufficient (more than samples per class). However, MLC requires only 60 pixels to reach its highest accuracy, indicating its robustness and capability of generalization.
A small value of K (K = 3) is the better choice for KNN in this study, and distance-based weighting improves the KNN results.

For the simple classification-tree algorithms (CART, C, and QUEST), minNumObj is the minimum number of samples at a leaf node, which determines when tree growing stops. All three simple tree algorithms achieve high accuracies when this value is small; in other words, they all grow big trees and then prune them. However, LMT needs a large minNumInstances to build the tree.

For RF, numFeatures is the number of features randomly selected at each node, and numTrees is the number of trees generated. Usually the suggested value of numFeatures depends on N, the number of features; however, in this research we find a smaller value more suitable.

For SVM, we used a radial basis function (RBF) kernel; the region of feature space affected by each support vector shrinks as the kernel parameter gamma increases. A slightly large gamma (23, 24) is the best choice for this research, which means more support vectors are used to divide the feature space.

MinStdDev in RBFN is the minimum standard deviation for the clusters, and it controls the width of the Gaussian kernel function in the same way gamma does in SVM. numCluster is the number of clusters, which determines the data centers of the hidden nodes. In this research, we found that setting numCluster equal to, or slightly greater than, the number of classes is the better choice.

BagSizePercent in Bagging controls the percentage of training samples randomly drawn from the training set with replacement; sampling 60%–80% of the training set achieved better results. It is similar to weightThreshold in AdaBoost, but the latter resamples the training set according to the weights from the last iteration. AdaBoost achieves good classification results using only 10 iterations.

For SGB, bag.fraction controls the fraction of the training set randomly selected without replacement.
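The parameter roles above can be made concrete with scikit-learn equivalents; the option names differ from the Weka/R ones in the text, so treat this as an illustrative mapping rather than the study's exact configuration:

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# KNN: small K with distance-based weighting, as found best above
knn = KNeighborsClassifier(n_neighbors=3, weights='distance')

# RF: max_features plays the role of numFeatures, n_estimators of numTrees
rf = RandomForestClassifier(n_estimators=200, max_features=2, random_state=0)

# RBF SVM: a larger gamma shrinks each support vector's region of influence,
# so more support vectors are needed to cover the feature space
svm = SVC(kernel='rbf', gamma=1.0)

# Bagging: max_samples corresponds to bagSizePercent (sampling with replacement)
bag = BaggingClassifier(n_estimators=50, max_samples=0.7, bootstrap=True,
                        random_state=0)
```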
Sampling a fraction of the training set in this way reduces the correlations between the models built at each iteration. The shrinkage parameter is the learning rate.
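In scikit-learn's gradient boosting, an illustrative stand-in for the SGB implementation used here, the sampling fraction and shrinkage map directly to subsample and learning_rate (the values below are placeholders, not the study's tuned ones):

```python
from sklearn.ensemble import GradientBoostingClassifier

# subsample < 1.0 draws that fraction of the training set *without*
# replacement at each iteration, decorrelating the successive trees;
# learning_rate is the shrinkage applied to each tree's contribution.
sgb = GradientBoostingClassifier(subsample=0.5, learning_rate=0.1,
                                 n_estimators=100, random_state=0)
```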
From Table 4 we can see that the best classification accuracy for the 6-band case is achieved by logistic regression, followed closely by the maximum likelihood classifier, neural network, support vector machine, and logistic model tree algorithms. In contrast, CBEST and KNN produced the lowest accuracies. For the 4-band case, there is in general a small difference in Kappa for each algorithm, confirming that with fewer spectral bands there is indeed accuracy loss. However, in this experiment the accuracy drop is quite small, implying that including the two middle-infrared bands of TM would not add much separability for our classes. The maximum likelihood classifier produced the highest accuracy for the 4-band case, only slightly inferior to the highest accuracy in the 6-band case.
Table 5 shows the best classification accuracies using the object-oriented method. The results of this kind of classification largely depend on the segmentation. The classification accuracies are highest when the segmentation scale is set to 5 (the smallest). The best performer is SGB, with a clear accuracy improvement over the best pixel-based classification results, followed closely by RF. The accuracies decrease as the threshold increases: a higher threshold produces larger objects, and for the 30 m resolution TM image, fragmentation is relatively high in this urban area, so a high threshold brings more mixed information into the segments under this classification system. Because small segments are relatively homogeneous, classifiers that use statistical properties of the segments rather than individual pixel values improved the results.
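The core of the object-oriented gain described above is that pixels inherit their segment's statistics. Below is a minimal numpy sketch of that step, assuming the segmentation labels are already given; only the per-segment mean spectrum is computed, whereas real systems add shape and texture features as well.

```python
import numpy as np

def segment_means(image, segments):
    """Replace each pixel's band values with its segment's mean spectrum.

    image: (n_pixels, n_bands) array; segments: (n_pixels,) integer labels
    produced by any segmentation algorithm (assumed given).
    """
    n_seg = segments.max() + 1
    counts = np.bincount(segments, minlength=n_seg).astype(float)
    means = np.empty((n_seg, image.shape[1]))
    for b in range(image.shape[1]):
        # per-segment sum of this band, divided by segment size
        means[:, b] = np.bincount(segments, weights=image[:, b],
                                  minlength=n_seg) / counts
    return means[segments]
```

Feeding `segment_means(image, segments)` instead of `image` to any of the pixel-based classifiers gives a basic object-oriented variant.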
Comparing Table 5 with Table 4, we can see that all results improve under the object-oriented approach, even though only spectral features are used. Among the algorithms, SGB produced the best results, followed by RF, C, LMT, LR, and MLC. From another perspective, this shows that these algorithms can deal with high-dimensional data.