Compare LDA and Logistic Regression (1)

Karl 曹
May 20, 2022

Here we state the initial setting:

We assume there are two classes, Class 1 and Class 2, with 300 observations in Class 1 and 500 in Class 2. There are two covariates, x1 and x2. Within each class they follow a bivariate normal distribution, and both classes share the same variance-covariance matrix Σ. The class means are (-3, 3) for Class 1 and (5, 5) for Class 2.
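Below is a minimal Python sketch of this setup. The post's analysis was done in R, so this is an independent illustration, not the author's code. The shared covariance matrix is not reproduced in this text, so `SIGMA = ((16, 4), (4, 16))` is a hypothetical stand-in. With it, the Mahalanobis distance between the two means is 2, so the classes overlap noticeably. The names `cholesky_2x2` and `simulate` are illustrative, and labels 0/1 stand for Class 1/Class 2.

```python
import math
import random

# Hypothetical shared covariance matrix: the post's actual matrix is not shown here.
SIGMA = ((16.0, 4.0), (4.0, 16.0))


def cholesky_2x2(sigma):
    """Lower-triangular factor L with L @ L.T == sigma (2x2, positive definite)."""
    (a, b), (_, d) = sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    return ((l11, 0.0), (l21, l22))


def simulate(n1=300, n2=500, mu1=(-3.0, 3.0), mu2=(5.0, 5.0), sigma=SIGMA, seed=None):
    """Return (points, labels): n1 draws from N(mu1, sigma) labelled 0 (Class 1)
    followed by n2 draws from N(mu2, sigma) labelled 1 (Class 2)."""
    rng = random.Random(seed)
    (l11, _), (l21, l22) = cholesky_2x2(sigma)
    points, labels = [], []
    for n, mu, label in ((n1, mu1, 0), (n2, mu2, 1)):
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            # x = mu + L z has covariance L L' = sigma
            points.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
            labels.append(label)
    return points, labels
```

Calling `simulate(seed=1)` returns 800 points: the first 300 are labelled 0 and the remaining 500 are labelled 1.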

The simulated data look like this:

Note that R codes Class 1 as factor level 0 and Class 2 as factor level 1.

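We first fit LDA to the simulated data set. Recall that LDA computes a discriminant function for each class k:

$$\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \log \pi_k,$$

where π_k is the prior probability of class k. An observation x is assigned to the class whose discriminant δ_k(x) is largest.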
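The LDA rule can be sketched from scratch. The sketch assumes the standard discriminant $\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \log\pi_k$ with plug-in estimates: class proportions for π_k, sample means for μ_k, and the pooled within-class covariance for Σ. This is not the R routine used in the post, and `lda_fit`, `lda_scores` and `lda_predict` are hypothetical names.

```python
import math


def lda_fit(points, labels):
    """Plug-in LDA estimates: class priors, class means and pooled covariance inverse."""
    n = len(points)
    classes = sorted(set(labels))
    priors, means = {}, {}
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class sum of squares
    for k in classes:
        xs = [p for p, y in zip(points, labels) if y == k]
        priors[k] = len(xs) / n  # prior = class proportion
        mk = (sum(p[0] for p in xs) / len(xs), sum(p[1] for p in xs) / len(xs))
        means[k] = mk
        for p in xs:
            dev = (p[0] - mk[0], p[1] - mk[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += dev[i] * dev[j]
    # Pooled covariance uses n - K degrees of freedom; invert the 2x2 matrix directly.
    (a, b), (c, d) = [[v / (n - len(classes)) for v in row] for row in s]
    det = a * d - b * c
    sigma_inv = ((d / det, -b / det), (-c / det, a / det))
    return priors, means, sigma_inv


def lda_scores(x, priors, means, sigma_inv):
    """delta_k(x) = x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k + log(pi_k) for every class k."""
    scores = {}
    for k, mu in means.items():
        w = (sigma_inv[0][0] * mu[0] + sigma_inv[0][1] * mu[1],
             sigma_inv[1][0] * mu[0] + sigma_inv[1][1] * mu[1])  # S^-1 mu_k
        scores[k] = (x[0] * w[0] + x[1] * w[1]
                     - 0.5 * (mu[0] * w[0] + mu[1] * w[1])
                     + math.log(priors[k]))
    return scores


def lda_predict(x, model):
    """Assign x to the class with the largest discriminant score."""
    scores = lda_scores(x, *model)
    return max(scores, key=scores.get)
```

The posterior probability of class k is proportional to exp(δ_k(x)). A softmax over these scores therefore gives the class probabilities, and a 0.5 threshold can be applied to those.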

This is what we get.

Using 0.5 as the threshold, we obtain the following confusion table:

The two columns correspond to the true classes, and the two rows to the classes predicted by the model.
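Turning predicted probabilities into a confusion table is simple bookkeeping. In this sketch, the hypothetical helper `classify` predicts Class 2 (label 1) when the estimated probability of Class 2 exceeds the threshold. A second helper, `confusion_table`, counts outcomes with rows for predicted classes and columns for true classes, which is the layout described above.

```python
def classify(probs, threshold=0.5):
    """Predict Class 2 (label 1) when P(Class 2 | x) exceeds the threshold, else Class 1 (0)."""
    return [1 if p > threshold else 0 for p in probs]


def confusion_table(true, pred):
    """2x2 counts: table[predicted][true], so rows = predicted class, columns = true class."""
    table = [[0, 0], [0, 0]]
    for t, p in zip(true, pred):
        table[p][t] += 1
    return table
```

For example, true labels `[0, 1, 1, 1, 0]` with probabilities `[0.1, 0.7, 0.4, 0.9, 0.6]` give the table `[[1, 1], [1, 2]]`.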

Next, we fit a logistic regression to the same data. Its predicted results are:

The confusion table for the logistic model, again with a threshold of 0.5:
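For comparison, a logistic regression with an intercept and both covariates can be fitted by maximum likelihood with Newton-Raphson. This is equivalent to the iteratively reweighted least squares that R's `glm` uses. The sketch below is a plain-Python illustration under that assumption, and `logistic_fit`, `logistic_prob` and `solve3` are hypothetical names. If the two classes were perfectly separable, the maximum-likelihood estimate would not exist and the iterations would diverge.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a @ x = b by Cramer's rule."""
    d = det3(a)
    return [det3([[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]) / d
            for k in range(3)]


def logistic_fit(points, labels, max_iter=25, tol=1e-10):
    """Maximum-likelihood fit of P(Class 2 | x) = sigmoid(b0 + b1*x1 + b2*x2)
    by Newton-Raphson: beta <- beta + (X' W X)^-1 X' (y - p)."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0] * 3                      # X' (y - p)
        info = [[0.0] * 3 for _ in range(3)]  # X' W X with W = diag(p (1 - p))
        for (x1, x2), y in zip(points, labels):
            row = (1.0, x1, x2)
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for i in range(3):
                grad[i] += (y - p) * row[i]
                for j in range(3):
                    info[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = solve3(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def logistic_prob(x, beta):
    """Estimated probability that x belongs to Class 2."""
    return sigmoid(beta[0] + beta[1] * x[0] + beta[2] * x[1])
```

At convergence, the residuals y - p are orthogonal to each column of the design matrix (intercept, x1, x2). This gives a quick sanity check on the fit.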

A quick review of the two class-specific rates:

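sensitivity = (number of true Class 2 observations predicted as Class 2) / (total number of true Class 2 observations)

specificity = (number of true Class 1 observations predicted as Class 1) / (total number of true Class 1 observations)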
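Take a 2x2 table laid out with rows for predicted classes and columns for true classes, where label 0 is Class 1 and label 1 is Class 2. With that layout, the two definitions translate directly into code. The overall prediction error is included as well, and all three function names are illustrative.

```python
def sensitivity(table):
    """Share of true Class 2 observations (column 1) predicted as Class 2."""
    return table[1][1] / (table[0][1] + table[1][1])


def specificity(table):
    """Share of true Class 1 observations (column 0) predicted as Class 1."""
    return table[0][0] / (table[0][0] + table[1][0])


def error_rate(table):
    """Overall prediction error: off-diagonal counts over all observations."""
    total = sum(sum(row) for row in table)
    return (table[0][1] + table[1][0]) / total
```

For the table `[[1, 1], [1, 2]]`, sensitivity is 2/3, specificity is 1/2, and the overall error is 2/5.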

We then repeat the whole procedure 100 times: simulate a new data set, fit both models, and predict. This gives 100 prediction errors for each method:
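The repeated-simulation experiment can be sketched end to end in plain Python. The sketch is self-contained so it runs on its own. Several choices here are assumptions rather than details from the post:

- the hypothetical covariance `SIGMA`;
- the seed;
- measuring prediction error on a fresh test set of the same size, since the post does not say whether its errors are training or test errors.

LDA uses the two-class log-odds form of the discriminant rule, and logistic regression is fitted by Newton-Raphson. The numbers this prints will not match the post's figures.

```python
import math
import random
import statistics

# Hypothetical shared covariance matrix; the post's actual matrix is not shown here.
SIGMA = ((16.0, 4.0), (4.0, 16.0))


def simulate(rng, n1=300, n2=500, mu1=(-3.0, 3.0), mu2=(5.0, 5.0)):
    """Draw n1 Class 1 points (label 0) and n2 Class 2 points (label 1)."""
    (a, b), (_, d) = SIGMA
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)  # Cholesky factor of SIGMA
    pts, ys = [], []
    for n, mu, y in ((n1, mu1, 0), (n2, mu2, 1)):
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            pts.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
            ys.append(y)
    return pts, ys


def lda_rule(pts, ys):
    """Two-class LDA: predict Class 2 when
    (x - (m0 + m1) / 2)' S^-1 (m1 - m0) + log(n1 / n0) > 0."""
    groups = [[p for p, y in zip(pts, ys) if y == k] for k in (0, 1)]
    m = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) for g in groups]
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class covariance
    for g, mk in zip(groups, m):
        for p in g:
            dv = (p[0] - mk[0], p[1] - mk[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += dv[i] * dv[j] / (len(pts) - 2)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    dm = (m[1][0] - m[0][0], m[1][1] - m[0][1])
    w = ((s[1][1] * dm[0] - s[0][1] * dm[1]) / det,  # w = S^-1 (m1 - m0)
         (s[0][0] * dm[1] - s[1][0] * dm[0]) / det)
    mid = ((m[0][0] + m[1][0]) / 2, (m[0][1] + m[1][1]) / 2)
    c = math.log(len(groups[1]) / len(groups[0]))
    return lambda x: int((x[0] - mid[0]) * w[0] + (x[1] - mid[1]) * w[1] + c > 0)


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def logistic_rule(pts, ys, max_iter=25):
    """Logistic regression y ~ 1 + x1 + x2 by Newton-Raphson; Class 2 when eta > 0."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        g = [0.0] * 3
        h = [[0.0] * 3 for _ in range(3)]
        for (x1, x2), y in zip(pts, ys):
            r = (1.0, x1, x2)
            eta = sum(b * v for b, v in zip(beta, r))
            p = 1.0 / (1.0 + math.exp(-eta)) if eta >= 0 else math.exp(eta) / (1.0 + math.exp(eta))
            for i in range(3):
                g[i] += (y - p) * r[i]
                for j in range(3):
                    h[i][j] += p * (1.0 - p) * r[i] * r[j]
        d = det3(h)  # Newton step h^-1 g via Cramer's rule
        step = [det3([[g[i] if j == k else h[i][j] for j in range(3)] for i in range(3)]) / d
                for k in range(3)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return lambda x: int(beta[0] + beta[1] * x[0] + beta[2] * x[1] > 0)


def compare(n_reps=100, seed=2022):
    """Test error rates of both methods over n_reps independent simulations."""
    rng = random.Random(seed)
    errors = {"LDA": [], "logistic": []}
    for _ in range(n_reps):
        train_pts, train_ys = simulate(rng)
        test_pts, test_ys = simulate(rng)  # assumption: fresh test set each time
        for name, fit in (("LDA", lda_rule), ("logistic", logistic_rule)):
            rule = fit(train_pts, train_ys)
            wrong = sum(rule(p) != y for p, y in zip(test_pts, test_ys))
            errors[name].append(wrong / len(test_ys))
    return errors


if __name__ == "__main__":
    for name, errs in compare().items():
        print(f"{name}: mean error {statistics.mean(errs):.4f}, "
              f"variance {statistics.variance(errs):.6f}")
```

Running the script prints the mean and variance of the 100 error rates for each method. These are the two quantities compared in the discussion of the results.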

In this setting, the prediction error of logistic regression has lower variance, and the two methods perform similarly overall.

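In terms of sensitivity and specificity, both models classify Class 2 correctly more often than Class 1 (sensitivity is higher than specificity). This is because Class 2 has more observations (500 vs. 300), so both models lean toward it: with a threshold of 0.5, observations near the decision boundary tend to be assigned to Class 2.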
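The role of the class sizes is easiest to see in LDA's two-class decision rule. This assumes the standard discriminant $\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \log\pi_k$, with priors estimated by the class proportions. Under that assumption, a 0.5 threshold assigns $x$ to Class 2 exactly when $\delta_2(x) > \delta_1(x)$, that is, when

$$
\delta_2(x) - \delta_1(x)
= \Big(x - \tfrac{1}{2}(\mu_1 + \mu_2)\Big)^{\top}\Sigma^{-1}(\mu_2 - \mu_1) + \log\frac{\pi_2}{\pi_1} > 0,
\qquad
\log\frac{\pi_2}{\pi_1} = \log\frac{500}{300} \approx 0.51 .
$$

The positive offset $\log(\pi_2/\pi_1)$ moves the decision boundary toward the Class 1 mean. As a result, a point exactly halfway between the two means is assigned to Class 2.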

Looking at these class-specific rates, the two methods have similar sensitivities, while LDA has lower variance in specificity. Overall, they perform about the same.

In the next post, we will discuss how to exploit LDA's advantages and what happens to the overall and class-specific prediction errors when the threshold changes.
