LESSON: Effects of Outliers in Bivariate Data
Site: Mountain Heights Academy OER
Course: Introductory Statistics Q3
Date: Friday, 4 April 2025, 11:54 AM
Identifying Outliers in Scatter Plots
How do we identify outliers in scatter plots? Watch this video to see the different scenarios that can occur with outliers in bivariate data.
Visual Effects of Outliers on Correlation
We've learned about outliers before. An outlier is a data point that visually doesn't "fit" with the majority of the other data points. How do outliers affect bivariate data? Watch this video for a brief introduction to how outliers can affect bivariate data displayed in a scatter plot.
Effects of Outliers on Correlation
As we have seen, outliers can have a strong effect on the correlation coefficient we calculate. After we consider each outlier individually, it is sometimes appropriate to throw it out. Other times, however, it remains necessary to include it.
Watch this video to see an example of how to consider individual outliers and to determine if they should be thrown out or included. (If you have a hard time understanding his accent, turn on Closed Captioning.)
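To make the effect on the correlation coefficient concrete, here is a minimal sketch in Python. The data values are made up for illustration (they are not taken from the video), and `pearson_r` is a hypothetical helper name chosen for this example. It computes the correlation coefficient once with a single outlier included and once with it left out.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: five points that rise together in a straight line...
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
# ...plus one made-up outlier at (6, 0) that does not fit the pattern.
outlier_x, outlier_y = 6, 0

r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [outlier_x], ys + [outlier_y])

print(f"r without the outlier: {r_without:.3f}")  # 1.000
print(f"r with the outlier:    {r_with:.3f}")     # 0.143
```

With these particular numbers, one point out of six is enough to pull the correlation coefficient from 1 down to about 0.14. Whether such a point should be removed still depends on why it is unusual, which the code cannot decide for us.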
Impact of Removing Outliers on Regression
In this lesson video from Khan Academy, you will see examples of the impact of removing outliers on a regression line.
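As a rough illustration of the same idea, the sketch below fits a least-squares regression line to a small made-up data set, first with an outlier and then with the outlier removed. The numbers are invented for this example and do not come from the Khan Academy video, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares regression line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The regression line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: the last point, (6, 0), is the outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 0]

slope_with, intercept_with = fit_line(xs, ys)
# Remove the outlier by dropping the last point from both lists.
slope_without, intercept_without = fit_line(xs[:-1], ys[:-1])

print(f"with outlier:    y = {slope_with:.2f}x + {intercept_with:.2f}")
print(f"without outlier: y = {slope_without:.2f}x + {intercept_without:.2f}")
```

For this data, the line with the outlier is roughly y = 0.29x + 4.00, while the line without it is y = 2.00x + 0.00. Removing a single point makes the slope about seven times steeper.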
Removing outliers can affect a calculation called the "coefficient of determination" ($$r^{2}$$), which we haven't talked about much in our class yet.
The coefficient of determination is a statistic that measures how well the regression line fits the data. It is calculated by first finding the correlation coefficient ($$r$$) and then squaring it. Squaring the correlation coefficient never results in a negative number, so even data sets with a negative correlation coefficient will have a positive coefficient of determination. Because $$r$$ is always between -1 and 1, $$r^{2}$$ is always between 0 and 1. A coefficient of determination equal to 1 indicates that the regression line fits the data perfectly.
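The short Python sketch below works through that calculation on made-up data with a negative correlation. The data values and the helper name `pearson_r` are assumptions made for this example only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data where y tends to fall as x rises (negative correlation).
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]

r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination

print(f"r   = {r:.3f}")          # -0.984
print(f"r^2 = {r_squared:.3f}")  # 0.968

# Made-up data that falls exactly on a straight line: a perfect fit.
perfect_ys = [10, 8, 6, 4, 2]
perfect_r_squared = pearson_r(xs, perfect_ys) ** 2
print(f"r^2 for the perfect line = {perfect_r_squared:.3f}")  # 1.000
```

Here $$r$$ is negative, yet $$r^{2}$$ comes out positive and close to 1, and the points that lie exactly on a falling line give an $$r^{2}$$ of exactly 1.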