Definition:

  • Prediction by Regression Model
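  • Actual value of $y$: $y = \beta_0 + \beta_1 x + \epsilon$
    • Regression equation: $E(y) = \beta_0 + \beta_1 x$
  • Predicted value of $y$: $\hat{y} = b_0 + b_1 x$ (estimated regression equation)
    • $\hat{y}$ is the point estimator of $E(y)$
  • In practice, $\beta_0$ and $\beta_1$ are not known, but can be estimated with $b_0$ and $b_1$
    • where $b_0$ and $b_1$ are computed by the least squares method
  • Some notation:
    • $x_p$: a given value of the independent variable $x$
    • $y_p$: the possible values (a range) of the dependent variable $y$ when $x = x_p$
    • $\hat{y}_p = b_0 + b_1 x_p$: the point estimator of $E(y_p)$ and the predictor of an individual value of $y_p$ when $x = x_p$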
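
A minimal Python sketch of computing $b_0$, $b_1$ by least squares and then $\hat{y}_p$. The function names `least_squares` and `predict` and the small dataset are hypothetical, chosen only for illustration.

```python
def least_squares(x, y):
    """Return (b0, b1) that minimize sum((y_i - (b0 + b1 * x_i))**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_p):
    """Point estimate y_hat_p = b0 + b1 * x_p of E(y_p) at x = x_p."""
    return b0 + b1 * x_p


# Hypothetical illustrative data
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = least_squares(x, y)
y_hat_p = predict(b0, b1, 10)
print(b0, b1, y_hat_p)  # 60.0 5.0 110.0
```

The same $\hat{y}_p$ serves both as the estimate of the mean $E(y_p)$ and as the prediction of a single new $y_p$; only the interval width differs.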

Finding the line of best fit:

  • There are two main methods:
    1. Scattergraph method:
      • Draw a line through data points with about an equal number of points above and below the line.
    2. Linear regression:
      • Using correlation:
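        • We need to choose $a$ and $b$ to minimize the mean square error $E[(Y - (a + bX))^2]$
        • Taking the partial derivatives with respect to $a$ and $b$ and setting them equal to 0 gives $b = \rho \frac{\sigma_Y}{\sigma_X}$ and $a = \mu_Y - b \mu_X$ at the minimum point
          • There is no maximum: the mean square error is a convex quadratic in $a$ and $b$ that grows without bound, so the stationary point must be the minimum
        • The best linear predictor (lowest mean square error) is: $\mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X)$
          • where $\mu_X, \mu_Y$ and $\sigma_X, \sigma_Y$ are the means and standard deviations of $X$ and $Y$, and $\rho$ is the correlation of $X$ and $Y$
          • The mean square error of this predictor is $\sigma_Y^2 (1 - \rho^2)$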
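      • Using least squares on sample data:
        • $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = r \frac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$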
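
The correlation-based formulas can be checked numerically. A minimal Python sketch, assuming the population moments $\mu_X, \mu_Y, \sigma_X, \sigma_Y, \rho$ are replaced by the moments (divisor $n$) of a small hypothetical dataset: the slope $\rho \sigma_Y / \sigma_X$ and the closed-form error $\sigma_Y^2(1 - \rho^2)$ should match the least-squares line and its directly computed average squared residual. The function name `best_linear_predictor` is hypothetical.

```python
import math


def best_linear_predictor(x, y):
    """Return (a, b, mse) for the best linear predictor a + b*X.

    Assumption for this sketch: population moments are replaced by the
    moments of the data, computed with divisor n.
    """
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    sigma_x = math.sqrt(sum((xi - mu_x) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - mu_y) ** 2 for yi in y) / n)
    cov = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n
    rho = cov / (sigma_x * sigma_y)
    b = rho * sigma_y / sigma_x          # slope from the correlation formula
    a = mu_y - b * mu_x                  # intercept
    mse = sigma_y ** 2 * (1 - rho ** 2)  # closed-form minimum mean square error
    return a, b, mse


# Hypothetical illustrative data
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
a, b, mse = best_linear_predictor(x, y)
# Average squared error computed directly, to compare with the closed form
direct = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)
print(round(a, 6), round(b, 6), round(mse, 6), round(direct, 6))  # 60.0 5.0 153.0 153.0
```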

Assumptions about the error term $\epsilon$ in the linear regression model:

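  1. $E(\epsilon) = 0$. This implies $\beta_0$ and $\beta_1$ are constants, and hence $E(y) = \beta_0 + \beta_1 x$
  2. The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of $x$
  3. The values of $\epsilon$ are independent.
  4. $\epsilon$ is a normally distributed random variable for all values of $x$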
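
A small simulation can show what these assumptions buy. This Python sketch draws many datasets from $y = \beta_0 + \beta_1 x + \epsilon$ with independent $N(0, \sigma^2)$ errors (so all four assumptions hold by construction), refits $b_1$ each time, and compares the results with $E(b_1) = \beta_1$ and standard deviation $\sigma / \sqrt{\sum (x_i - \bar{x})^2}$. The parameter values and the name `simulate_slopes` are hypothetical.

```python
import math
import random
import statistics


def simulate_slopes(beta0, beta1, sigma, x, reps, seed=0):
    """Refit b1 on `reps` datasets drawn from y = beta0 + beta1*x + eps.

    Errors are independent N(0, sigma^2) at every x, so E(eps) = 0,
    the variance is constant, and eps is normal and independent.
    """
    rng = random.Random(seed)
    x_bar = statistics.fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = statistics.fmean(y)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    # Theoretical standard deviation of b1 under these assumptions
    return slopes, sigma / math.sqrt(sxx)


# Hypothetical parameters chosen only for the demo
x = list(range(1, 11))
slopes, sd_theory = simulate_slopes(beta0=3.0, beta1=1.5, sigma=2.0, x=x, reps=2000)
print(statistics.fmean(slopes), statistics.stdev(slopes), sd_theory)
```

The simulated mean of $b_1$ should sit close to 1.5 and its spread close to `sd_theory`; breaking an assumption (for example, letting the error variance grow with $x$) would make the second comparison fail.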

Testing for significance of the linear relationship:

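  • Testing for significance: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
  • If $\beta_1 = 0$, then $E(y) = \beta_0$. In this case, we would conclude that $x$ and $y$ are not linearly related
  • If $\beta_1 \neq 0$, we would conclude that the two variables are related.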
  • The t test is commonly used.
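    • It requires an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.
  • With $SSE = \sum (y_i - \hat{y}_i)^2$, we can use the mean square error $s^2 = MSE = \frac{SSE}{n - 2}$ as an estimate of $\sigma^2$
    • The standard error of the estimate $s = \sqrt{MSE}$ is used to estimate $\sigma$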
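  • For the sampling distribution of $b_1$:
    • Expected value: $E(b_1) = \beta_1$
    • Standard deviation: $\sigma_{b_1} = \frac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}}$
      • Since $\sigma$ is unknown, we estimate it with $s$: $s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
    • Test statistic $t = \frac{b_1}{s_{b_1}}$. Reject $H_0$ if p-value $\le \alpha$, where $t$ follows a t distribution with $n - 2$ degrees of freedom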
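  • Confidence interval for $\beta_1$ is $b_1 \pm t_{\alpha/2} s_{b_1}$
    • where $t_{\alpha/2}$ is taken from a t distribution with $n - 2$ degrees of freedom (two-tailed)
    • If the CI includes 0, the hypothesized value $\beta_1 = 0$ is plausible, so we cannot reject $H_0$ at level $\alpha$
    • In that case the linear relationship may not be significant enough for the model to be useful; if the CI excludes 0, we reject $H_0$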
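
A minimal Python sketch of the t test and the confidence interval for $\beta_1$. Because the standard library has no t distribution, the critical value $t_{\alpha/2}$ is passed in from a t table (2.306 is the table value for $\alpha = 0.05$ with 8 degrees of freedom), and the decision uses the critical-value rule $|t| \ge t_{\alpha/2}$, which gives the same answer as comparing the p-value with $\alpha$. The name `slope_t_test` and the data are hypothetical.

```python
import math


def slope_t_test(x, y, t_crit):
    """t test of H0: beta1 = 0 and the CI b1 +/- t_crit * s_b1.

    t_crit is t_{alpha/2} with n - 2 degrees of freedom, read from a t table.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate, sqrt(MSE)
    s_b1 = s / math.sqrt(sxx)     # estimated standard deviation of b1
    t = b1 / s_b1
    ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
    reject = abs(t) >= t_crit     # critical-value rule
    return t, ci, reject


# Hypothetical illustrative data (n = 10, so n - 2 = 8 degrees of freedom)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
t, ci, reject = slope_t_test(x, y, t_crit=2.306)
print(round(t, 2), tuple(round(v, 2) for v in ci), reject)  # 8.62 (3.66, 6.34) True
```

Here the interval excludes 0, which agrees with rejecting $H_0$ in the t test.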

Variance of the predicted value $\hat{y}_p$:

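  • $s^2_{\hat{y}_p} = s^2 \left[ \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]$
    • where $\sum (x_i - \bar{x})^2 = (n - 1) s_x^2$ and $s_x^2$ is the sample variance of all collected $x_i$
    • The $(x_p - \bar{x})^2$ term measures how far $x_p$ is from the center of the collected data, so the estimate is least precise far from $\bar{x}$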
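
A short Python sketch of $s^2_{\hat{y}_p}$, evaluated at several $x_p$ to show the variance growing as $x_p$ moves away from $\bar{x}$. The name `var_y_hat` and the data are hypothetical; $s^2$ is the MSE with $n - 2$ degrees of freedom.

```python
def var_y_hat(x, y, x_p):
    """Estimated variance s^2 * (1/n + (x_p - x_bar)^2 / Sxx) of y_hat_p."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # s^2 = MSE = SSE / (n - 2)
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return mse * (1 / n + (x_p - x_bar) ** 2 / sxx)


# Hypothetical illustrative data; x_bar = 14 here
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
for x_p in (14, 10, 26):
    print(x_p, round(var_y_hat(x, y, x_p), 3))
```

At $x_p = \bar{x}$ only the $1/n$ term remains, so that is where the variance is smallest.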

Confidence interval for the mean value $E(y_p)$:

  • Confidence interval for the mean of all values of $y$ when $x = x_p$: $\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}$
    • with $n - 2$ degrees of freedom
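
A minimal Python sketch of this interval, assuming $t_{\alpha/2}$ is supplied from a t table (2.306 for $\alpha = 0.05$ with 8 degrees of freedom). The name `mean_ci` and the data are hypothetical.

```python
import math


def mean_ci(x, y, x_p, t_crit):
    """CI y_hat_p +/- t_crit * s_y_hat_p for E(y_p); t_crit has n - 2 df."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # s = sqrt(MSE), the standard error of the estimate
    s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat_p = b0 + b1 * x_p
    s_y_hat_p = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
    margin = t_crit * s_y_hat_p
    return y_hat_p - margin, y_hat_p + margin


# Hypothetical illustrative data (n = 10, so 8 degrees of freedom)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
lo, hi = mean_ci(x, y, x_p=10, t_crit=2.306)
print(round(lo, 2), round(hi, 2))  # 98.58 121.42
```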

Prediction interval for an individual value $y_p$: