Overview

Linear discriminant analysis (LDA) aims to identify combinations of parameters (typically ROIs in the context of this presentation) that best discriminate between two groups of subjects, e.g. patients and controls. It is in many respects similar to the multivariate analysis of regional differences between groups discussed above, but its aim is to provide a discriminant function rather than to determine the significance of regional differences. That discriminant function can then be applied to future cases to determine which group they fit best, e.g. whether a data set belongs to the control or to the patient group. Various methods have been developed for LDA [10], with logistic regression probably being the most widely used and generally applicable. With this method, the linear discriminant function is determined by maximizing the likelihood of the data fit. In R, logistic regression is implemented within the generalized linear models procedure "glm".
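As an illustration of the "glm" call described above, the following minimal sketch fits a logistic regression to simulated regional values. The data frame, the ROI names (frontal, parietal, cerebellum) and the group sizes are hypothetical and chosen only for demonstration; they are not taken from the PMOD scripts.

```r
# Hypothetical example data: 40 controls and 40 patients with three regional values.
set.seed(1)
n <- 40
roi <- data.frame(
  group      = factor(rep(c("control", "patient"), each = n)),
  frontal    = c(rnorm(n, 1.00, 0.2), rnorm(n, 0.80, 0.2)),
  parietal   = c(rnorm(n, 1.00, 0.2), rnorm(n, 0.85, 0.2)),
  cerebellum = rnorm(2 * n, 1.00, 0.2)  # simulated without any group difference
)

# Logistic regression by maximum likelihood. With a factor response, glm models
# the probability of the second factor level (here "patient").
fit <- glm(group ~ frontal + parietal + cerebellum, data = roi, family = binomial)

# The coefficients define the linear discriminant function b0 + b1*x1 + b2*x2 + ...
print(coef(fit))

# Discriminant score (log-odds) and corresponding probability for each subject
score <- predict(fit, type = "link")
prob  <- predict(fit, type = "response")
```

The score is the value of the linear discriminant function for each subject; the probability is obtained from it through the logistic function, so both carry the same ranking of the subjects.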

As with all data fitting, there is a danger of overfitting by including too many variables as predictors. A useful criterion to control for this is the Akaike Information Criterion (AIC), which is grounded in information theory and penalizes the quality of fit by the number of predictors used. It can be used to eliminate variables from the model that provide only minor improvements to the fit and are probably mostly fitting noise in the data. A stepwise elimination of predictor variables is implemented in R by the procedure "stepAIC".
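The stepwise elimination can be sketched as follows, assuming that "stepAIC" is taken from the MASS package, which is part of standard R installations. The simulated data and the variable names (roi1, noise1 to noise3) are hypothetical; only roi1 is generated with a group difference, so the noise variables are candidates for removal.

```r
library(MASS)  # provides stepAIC

# Hypothetical example data: one informative region and three pure-noise variables.
set.seed(2)
n <- 50
roi <- data.frame(
  group  = factor(rep(c("control", "patient"), each = n)),
  roi1   = c(rnorm(n, 1.0, 0.2), rnorm(n, 0.8, 0.2)),
  noise1 = rnorm(2 * n),
  noise2 = rnorm(2 * n),
  noise3 = rnorm(2 * n)
)

# Full model with all predictors
full <- glm(group ~ ., data = roi, family = binomial)

# Backward elimination: a predictor is dropped only if this lowers the AIC,
# where AIC = 2k - 2*log(L) with k the number of estimated coefficients.
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

print(AIC(full))
print(AIC(reduced))
print(formula(reduced))  # predictors retained after elimination
```

Which predictors survive depends on the data at hand; the reduced model is only guaranteed not to have a higher AIC than the full model it started from.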

The quality of discrimination can be checked with the Receiver Operating Characteristic (ROC), which illustrates graphically how sensitivity and specificity change as the threshold used for discrimination is varied. The area under the ROC curve (AUC) is used as a numeric summary; it is a general measure that is independent of the logistic regression model used above. AUC values range between 0.5 (no discrimination) and 1.0 (perfect discrimination). ROC is implemented in R by the procedure "roc".
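To make the relation between threshold, sensitivity, specificity and AUC concrete, the following sketch computes them directly in base R. It deliberately does not call the "roc" procedure, so that it runs without additional packages; the AUC is obtained from the rank-sum (Mann-Whitney) formula. The simulated data and variable names are hypothetical.

```r
# Hypothetical example data and logistic regression fit
set.seed(3)
n <- 40
roi <- data.frame(
  group = factor(rep(c("control", "patient"), each = n)),
  roi1  = c(rnorm(n, 1.00, 0.2), rnorm(n, 0.80, 0.2)),
  roi2  = c(rnorm(n, 1.00, 0.2), rnorm(n, 0.85, 0.2))
)
fit  <- glm(group ~ roi1 + roi2, data = roi, family = binomial)
prob <- predict(fit, type = "response")   # predicted probability of "patient"
is_patient <- roi$group == "patient"

# Sensitivity and specificity when cases with prob >= threshold are called "patient"
thresholds <- sort(unique(c(-Inf, prob, Inf)))
sens <- sapply(thresholds, function(th) mean(prob[is_patient] >= th))
spec <- sapply(thresholds, function(th) mean(prob[!is_patient] < th))

# AUC from the rank-sum formula: the fraction of patient/control pairs in which
# the patient receives the higher predicted probability
r   <- rank(prob)
n1  <- sum(is_patient)
n0  <- sum(!is_patient)
auc <- (sum(r[is_patient]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(auc)

# The ROC curve itself could be drawn with:
# plot(1 - spec, sens, type = "l", xlab = "1 - specificity", ylab = "sensitivity")
```

Because the AUC here is computed on the same subjects that were used for fitting, it tends to be optimistic compared to what would be seen on independent test cases.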

Discriminant analysis using logistic regression provides discriminant functions that can be used to classify individual test cases, with potential diagnostic applications.
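Classification of individual test cases could look like the following sketch. The training data, the two new cases and the probability cut-off of 0.5 are all assumptions made for illustration; in practice the cut-off would be chosen according to the required sensitivity and specificity.

```r
# Hypothetical training data and fitted discriminant function
set.seed(4)
n <- 40
train <- data.frame(
  group = factor(rep(c("control", "patient"), each = n)),
  roi1  = c(rnorm(n, 1.00, 0.2), rnorm(n, 0.80, 0.2)),
  roi2  = c(rnorm(n, 1.00, 0.2), rnorm(n, 0.85, 0.2))
)
fit <- glm(group ~ roi1 + roi2, data = train, family = binomial)

# Two hypothetical new cases with the same regional variables as the training data
new_cases <- data.frame(roi1 = c(1.05, 0.70), roi2 = c(1.00, 0.75))

# Apply the discriminant function and convert the result to a probability
p_patient <- predict(fit, newdata = new_cases, type = "response")

# Assign each case to a group; the 0.5 cut-off is an assumption of this sketch
assigned <- ifelse(p_patient > 0.5, "patient", "control")
print(data.frame(new_cases, p_patient, assigned))
```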