# Regression, Correlation and Hypothesis Testing

#### Prerequisite Knowledge

From Statistics 1

• Interpret diagrams for single-variable data, including an understanding that an area in a histogram represents the frequency
• Connect grouped frequency tables to probability distributions
• Interpret scatter diagrams and regression lines for bivariate data, including recognition of scatter diagrams which include distinct sections of the population
• Understand informal interpretation of correlation
• Understand that correlation does not imply causation
• Be able to calculate standard deviation, including from summary statistics
• Recognise and interpret possible outliers in data sets and statistical diagrams
• Be able to clean data, including dealing with missing data, errors and outliers

#### Success Criteria

Correlation:

1. Coefficient Calculation: Calculate and interpret the product-moment correlation coefficient (Pearson’s r).
2. Causation vs. Correlation: Understand the difference between causation and correlation, and the dangers of drawing conclusions about cause and effect based solely on correlation.
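For reference, the product-moment correlation coefficient is usually written in terms of three summary statistics. The notation below ($S_{xx}$, $S_{yy}$, $S_{xy}$) follows common A-Level convention and is an assumption, since this page does not state a formula itself.

```latex
\[
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}},
\qquad
S_{xy} = \sum xy - \frac{\sum x \sum y}{n},
\quad
S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n},
\quad
S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}
\]
```

The value of $r$ always lies between $-1$ and $1$, and its sign matches the sign of the gradient of the regression line.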

Regression:

1. Linear Regression:
• Understand the principles behind linear regression to model relationships between two variables.
• Calculate the equation of the regression line of y on x.
• Interpret the gradient and y-intercept of the regression line in context.
• Use the regression line to make predictions and understand the limitations of extrapolation.
2. Exponential Models via Logarithmic Transformation:
• Identify when data appears to fit an exponential model.
• Understand how to transform the exponential model $y = ab^x$ or the power model $y = ax^b$ using logarithms to achieve a linear form.
• Perform a logarithmic transformation and plot the transformed data.
• For the transformed data, calculate the linear regression line.
• Reverse-transform the linear regression equation back to its exponential form.
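The transformation steps above can be written out as follows. Base-10 logarithms are used here as an assumption; natural logarithms work in exactly the same way.

```latex
\begin{align*}
y = ab^x \;&\Rightarrow\; \log y = \log a + x \log b
  && \text{(plot } \log y \text{ against } x\text{)} \\
y = ax^b \;&\Rightarrow\; \log y = \log a + b \log x
  && \text{(plot } \log y \text{ against } \log x\text{)}
\end{align*}
```

In the first case the transformed line has intercept $\log a$ and gradient $\log b$, so $a = 10^{\text{intercept}}$ and $b = 10^{\text{gradient}}$. In the second case the intercept is again $\log a$, but the gradient is $b$ itself.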

Correlation Hypothesis Testing:

1. Hypotheses Formulation: Formulate the null and alternative hypotheses for correlation testing.
2. Critical Value: Use statistical tables or technology to determine critical values for a given significance level.
3. Conduct Test: Perform a hypothesis test to ascertain the significance of the correlation between two variables, given a dataset.
4. Results Interpretation: Analyze and interpret the results of the hypothesis test in context.
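As a sketch of what the hypotheses look like, writing $\rho$ for the population correlation coefficient (a symbol this page does not introduce, so it is an assumption here):

```latex
\[
H_0 : \rho = 0
\qquad\text{against}\qquad
H_1 : \rho > 0 \;\;\text{or}\;\; H_1 : \rho < 0 \;\;\text{(one-tailed)},
\qquad
H_1 : \rho \neq 0 \;\;\text{(two-tailed)}
\]
```

The sample value $r$ is then compared with the critical value for the sample size $n$ and the chosen significance level. In a two-tailed test the significance level is split equally between the two tails.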

#### Key Concepts

Correlation:

1. Coefficient Calculation:
• Introduce the formula for Pearson’s r.
• Provide worked examples of its calculation.
2. Causation vs. Correlation:
• Discuss real-world examples where correlation does not imply causation.
• Emphasize the importance of external factors and confounding variables.
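A worked calculation of Pearson's r might look like the sketch below. The data (revision hours and test scores) are made up for illustration, and the function name `pearson_r` is hypothetical rather than taken from any library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient from raw paired data."""
    n = len(xs)
    # Summary statistics S_xy, S_xx and S_yy.
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Illustrative (invented) data: hours of revision and test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson_r(hours, score)
print(round(r, 3))  # 0.994, from S_xy = 41, S_xx = 10, S_yy = 170
```

A value this close to 1 indicates strong positive linear correlation, but on its own it says nothing about whether extra revision causes the higher scores.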

Regression:

1. Linear Regression:
• Define the terms “gradient” and “y-intercept.”
• Derive and explain the formula for linear regression.
• Provide exercises that involve making predictions based on a regression line.
2. Exponential Models via Logarithmic Transformation:
• Introduce the properties of logarithms and their application to data transformation.
• Demonstrate the transformation of exponential models to linear models.
• Provide examples of reverse transformation to retrieve the exponential model after regression analysis on transformed data.
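The regression ideas above can be sketched in a few lines of Python. Both data sets are invented for illustration, the helper `regression_line` is a hypothetical name, and base-10 logarithms are an assumption (any base gives the same model).

```python
from math import log10


def regression_line(xs, ys):
    """Least-squares line of y on x, returned as (intercept, gradient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return intercept, gradient


# Linear example (invented data): revision hours against test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
intercept, gradient = regression_line(hours, score)
# gradient is about 4.1 marks per extra hour; intercept is about 47.7 marks.
predicted = intercept + gradient * 3.5  # interpolation, inside the data range

# Exponential example (invented data that follows y = 3 * 2**x exactly).
days = [0, 1, 2, 3, 4]
cells = [3, 6, 12, 24, 48]
log_cells = [log10(y) for y in cells]           # transform: log y against x
log_a, log_b = regression_line(days, log_cells)  # line: log y = log a + x log b
a = 10 ** log_a  # reverse transform, about 3
b = 10 ** log_b  # reverse transform, about 2
```

Predicting a score for 3.5 hours is interpolation. Using the same line for, say, 20 hours would be extrapolation, and the model gives no reason to trust it that far outside the observed data.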

Correlation Hypothesis Testing:

1. Hypotheses Formulation:
• Define the null hypothesis ($H_0$) and alternative hypothesis ($H_1$).
• Discuss scenarios where one would want to test the significance of a correlation.
2. Critical Value:
• Show students how to find critical values using statistical tables.
• Introduce how calculators can aid in this process.
3. Conduct Test:
• Walk through the steps of conducting a hypothesis test, emphasizing the importance of each step.
• Offer varied examples and practice problems for students to try.
4. Results Interpretation:
• Discuss the potential outcomes of a hypothesis test and their implications.
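The decision step of the test can be sketched as below. The sample values (r = 0.62 from n = 10 pairs) are invented, the function name `reject_h0` is hypothetical, and the critical values are those usually tabulated for the product-moment correlation coefficient with n = 10, so they should be checked against the tables supplied by your exam board.

```python
def reject_h0(r, critical_value, tail):
    """Return True if H0 (no population correlation) is rejected.

    tail is "upper" for H1: rho > 0, "lower" for H1: rho < 0,
    or "two" for H1: rho != 0.
    """
    if tail == "upper":
        return r > critical_value
    if tail == "lower":
        return r < -critical_value
    if tail == "two":
        return abs(r) > critical_value
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


# Invented sample: r = 0.62 calculated from n = 10 pairs of observations.
r = 0.62

# Assumed table values for n = 10 at the 5% significance level:
# 0.5494 for a one-tailed test, 0.6319 for a two-tailed test
# (the two-tailed test uses the 2.5% column).
one_tailed = reject_h0(r, 0.5494, "upper")  # True: evidence of positive correlation
two_tailed = reject_h0(r, 0.6319, "two")    # False: insufficient evidence
```

The same sample leads to different conclusions depending on the alternative hypothesis, which is why the hypotheses must be fixed before looking at the data. A rejected null hypothesis is evidence of correlation in the population, not proof, and says nothing about causation.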

#### Common Misconceptions

Correlation:

1. All Correlation is Causation: Students often believe that if two variables are correlated, one must cause the other.
2. Strength of Correlation: Students might think a correlation coefficient of r=0.5 means one variable is “half caused” by the other.
3. Perfect Correlation: Some believe that a correlation coefficient close to 1 or -1 means the data points fall perfectly on a straight line, when this is only true if r is exactly 1 or -1.

Regression:

1. Extrapolation: Students might believe that a regression model is equally accurate everywhere, even beyond the range of observed data.
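2. Intercept Significance: Students may assume that the y-intercept always has real-world significance, even when x = 0 lies outside the range of the data or makes no sense in context.
3. Linearity Assumption: Students may assume that linear regression is suitable for all types of data, including data with a clearly non-linear pattern.
4. Transformed Regression: After transforming exponential data to linear form using logarithms and creating a linear model, students may think the original data is now “linear”, when only the transformed data is.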

Correlation Hypothesis Testing:

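1. Significance = Importance: Students may assume that a statistically significant result means the finding has practical or real-world significance.
2. P-value Misunderstanding: Students may think a p-value represents the probability that the null hypothesis is true, rather than the probability of a result at least as extreme as the one observed, assuming the null hypothesis is true.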
