Regression, Correlation and Hypothesis Testing

Prerequisite Knowledge

From Statistics 1

  • Interpret diagrams for single-variable data, including an understanding that an area in a histogram represents the frequency
  • Connect grouped frequency tables to probability distributions
  • Interpret scatter diagrams and regression lines for bivariate data, including recognition of scatter diagrams which include distinct sections of the population
  • Understand informal interpretation of correlation
  • Understand that correlation does not imply causation
  • Be able to calculate standard deviation, including from summary statistics
  • Recognise and interpret possible outliers in data sets and statistical diagrams
  • Be able to clean data, including dealing with missing data, errors and outliers

Success Criteria

Correlation:

  1. Coefficient Calculation: Calculate and interpret the product-moment correlation coefficient (Pearson’s r).
  2. Causation vs. Correlation: Understand the difference between causation and correlation, and the dangers of drawing conclusions from correlation alone.

Regression:

  1. Linear Regression:
    • Understand the principles behind linear regression to model relationships between two variables.
    • Calculate the equation of the regression line of y on x.
    • Interpret the gradient and y-intercept of the regression line in context.
    • Use the regression line to make predictions and understand the limitations of extrapolation.
  2. Exponential Models via Logarithmic Transformation:
    • Identify when data appears to fit an exponential model.
    • Understand how to use logarithms to transform the exponential model y = ab^x, or the power model y = ax^b, into a linear form.
    • Perform a logarithmic transformation and plot the transformed data.
    • For the transformed data, calculate the linear regression line.
    • Reverse-transform the linear regression equation back to its exponential form.

Correlation Hypothesis Testing:

  1. Hypotheses Formulation: Formulate the null and alternative hypotheses for correlation testing.
  2. Critical Value: Use statistical tables or technology to determine critical values for a given significance level.
  3. Conduct Test: Perform a hypothesis test to ascertain the significance of the correlation between two variables, given a dataset.
  4. Results Interpretation: Analyze and interpret the results of the hypothesis test in context.

Key Concepts

Correlation:

  1. Coefficient Calculation:
    • Introduce the formula for Pearson’s r.
    • Provide worked examples of its calculation.
  2. Causation vs. Correlation:
    • Discuss real-world examples where correlation does not imply causation.
    • Emphasize the importance of external factors and confounding variables.
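
A worked calculation of Pearson's r could look like the sketch below. It uses the summary-statistic form r = Sxy / sqrt(Sxx × Syy); the function name `pearson_r` and the data are made up for illustration and are not taken from the lesson.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    # Summary statistics Sxx, Syy and Sxy
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative (invented) data: hours revised and test score
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 4))  # Sxx = 10, Syy = 6, Sxy = 6, so r = 6 / sqrt(60)
```

Here r is about 0.77, which suggests fairly strong positive correlation in this small invented sample. It says nothing about whether revising causes the higher scores.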

Regression:

  1. Linear Regression:
    • Define the terms “gradient” and “y-intercept.”
    • Derive and explain the formula for linear regression.
    • Provide exercises that involve making predictions based on a regression line.
  2. Exponential Models via Logarithmic Transformation:
    • Introduce the properties of logarithms and their application to data transformation.
    • Demonstrate the transformation of exponential models to linear models.
    • Provide examples of reverse transformation to retrieve the exponential model after regression analysis on transformed data.
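
The regression and reverse-transformation steps above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names `regression_line` and `fit_exponential` are hypothetical, base-10 logarithms are an assumption, and the data are invented so that they follow y = 3 × 2^x exactly.

```python
from math import log10

def regression_line(xs, ys):
    """Least-squares line of y on x, returned as (intercept, gradient)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return intercept, gradient

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing log10(y) on x (every y must be > 0)."""
    log_ys = [log10(y) for y in ys]
    intercept, gradient = regression_line(xs, log_ys)
    # log y = log a + x log b, so undo the logarithm to recover a and b
    return 10 ** intercept, 10 ** gradient

# Invented data generated from y = 3 * 2**x
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]
a, b = fit_exponential(xs, ys)
print(a, b)  # approximately 3 and 2, up to floating-point rounding
```

With real data the points will not lie exactly on the curve, so a and b are estimates, and predictions outside the observed range of x remain unreliable.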

Correlation Hypothesis Testing:

  1. Hypotheses Formulation:
    • Define the null hypothesis (H₀) and alternative hypothesis (H₁).
    • Discuss scenarios where one would want to test the significance of a correlation.
  2. Critical Value:
    • Show students how to find critical values using statistical tables.
    • Introduce how calculators can aid in this process.
  3. Conduct Test:
    • Walk through the steps of conducting a hypothesis test, emphasizing the importance of each step.
    • Offer varied examples and practice problems for students to try.
  4. Results Interpretation:
    • Discuss the potential outcomes of a hypothesis test and their implications.
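
The decision step of the test can be sketched as below. The function `pmcc_test` is a hypothetical helper, and the critical value is passed in rather than computed, since it is normally read from tables or a calculator. The value 0.5494 used in the example is the one-tailed 5% critical value for a sample of size 10 as it appears in standard product-moment tables; check it against the table your exam board supplies.

```python
def pmcc_test(r, critical_value, tail="upper"):
    """Return True if H0: rho = 0 is rejected.

    tail="upper" tests H1: rho > 0, tail="lower" tests H1: rho < 0,
    tail="two" tests H1: rho != 0 (look up the critical value at half
    the significance level for a two-tailed test).
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if tail == "upper":
        return r > critical_value
    if tail == "lower":
        return r < -critical_value
    if tail == "two":
        return abs(r) > critical_value
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

# H0: rho = 0, H1: rho > 0, 5% significance level, n = 10 (invented r values)
reject_strong = pmcc_test(0.62, 0.5494, tail="upper")
reject_weak = pmcc_test(0.41, 0.5494, tail="upper")
print(reject_strong, reject_weak)
```

When the function returns True, the conclusion in context would be that there is sufficient evidence, at the 5% level, of positive correlation in the population. When it returns False, there is insufficient evidence to reject H₀, which is not the same as showing that there is no correlation.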

Common Misconceptions

Correlation:

  1. All Correlation is Causation: Students often believe that if two variables are correlated, one must cause the other.
  2. Strength of Correlation: Students might think a correlation coefficient of r=0.5 means one variable is “half caused” by the other.
  3. Perfect Correlation: Some believe that a correlation coefficient close to 1 or -1 means the data points fall exactly on a straight line, when this is true only if r is exactly 1 or -1.

Regression:

  1. Extrapolation: Students might believe that a regression model is equally accurate everywhere, even beyond the range of observed data.
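  2. Intercept Significance: Students may assume the y-intercept always has real-world significance, even when x = 0 lies outside the range of the data or makes no sense in context.
  3. Linearity Assumption: Students may assume linear regression is suitable for every data set, including data with a clearly non-linear pattern.
  4. Transformed Regression: After transforming exponential data with logarithms and fitting a linear model, students may think the original data is now “linear”, when only the transformed data is.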

Correlation Hypothesis Testing:

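  1. Significance = Importance: Students may assume a statistically significant result must also have practical or real-world importance.
  2. P-value Misunderstanding: Students may think a p-value is the probability that the null hypothesis is true, rather than the probability of a result at least as extreme as the one observed, assuming the null hypothesis is true.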
