Definition:

  • Given a dataset of observation vectors, what can we infer about the population mean vector $\mu$?

Simple Mean Vector Test:

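  • Consider the test in $N_p(\mu, \Sigma)$ with $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$
  • Define $T^2 = n(\bar{x} - \mu_0)^{\top} S^{-1} (\bar{x} - \mu_0)$, the T-squared statistic
    • Where $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$ and $S = \frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})^{\top}$
    • Under $H_0$, $T^2 \sim \frac{(n-1)p}{n-p} F_{p,n-p}$, where $F_{p,n-p}$ denotes the F-distribution with $p$ and $n-p$ degrees of freedom
  • At significance level $\alpha$, we reject $H_0$ if $T^2 > \frac{(n-1)p}{n-p} F_{p,n-p}(\alpha)$
    • where $F_{p,n-p}(\alpha)$ is the upper $(100\alpha)$-th percentile of the $F_{p,n-p}$ distribution
    • $T^2$ is too large when $\bar{x}$ is too far from $\mu_0$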
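
The test above can be sketched in a few lines. This is a minimal illustration, assuming the usual one-sample Hotelling statistic $T^2 = n(\bar{x} - \mu_0)^{\top} S^{-1} (\bar{x} - \mu_0)$ compared against $\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)$. The function name `hotelling_t2_test` and the simulated data are hypothetical, not part of the notes.

```python
import numpy as np
from scipy import stats


def hotelling_t2_test(X, mu0, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: mu = mu0 (rows of X are observations)."""
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # sample covariance, divisor n - 1
    diff = xbar - mu0
    t2 = n * diff @ np.linalg.solve(S, diff)
    scale = (n - 1) * p / (n - p)          # converts an F quantile to the T^2 scale
    crit = scale * stats.f.ppf(1 - alpha, p, n - p)
    p_value = stats.f.sf(t2 / scale, p, n - p)
    return t2, crit, p_value, t2 > crit


# Illustrative data only: 40 draws from a 3-dimensional normal population.
rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, 2.0, 3.0], scale=1.0, size=(40, 3))
t2, crit, p_value, reject = hotelling_t2_test(X, mu0=[1.0, 2.0, 3.0])
print(f"T2 = {t2:.3f}, critical value = {crit:.3f}, p = {p_value:.3f}, reject H0: {reject}")
```

Solving the linear system with `np.linalg.solve` avoids forming $S^{-1}$ explicitly, which is usually the more stable choice.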

Confidence Regions of Component Means:

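  • A $100(1-\alpha)\%$ confidence region for the mean of a $p$-dimensional normal distribution is the ellipsoid determined by all $\mu$ such that $n(\bar{x} - \mu)^{\top} S^{-1} (\bar{x} - \mu) \le \frac{p(n-1)}{n-p} F_{p,n-p}(\alpha)$
    • where $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$ and $S = \frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})^{\top}$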
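  • The confidence ellipsoid is centered at $\bar{x}$, with axes $\pm\sqrt{\lambda_i}\sqrt{\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)}\, e_i$, where $S e_i = \lambda_i e_i$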
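  • Let $X_1, \dots, X_n$ be a random sample from an $N_p(\mu, \Sigma)$ population with $\Sigma$ positive definite.
    • Then, simultaneously for all $a$, the interval $a^{\top}\bar{X} \pm \sqrt{\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)\, a^{\top} S a}$ will contain $a^{\top}\mu$ with probability $1-\alpha$
    • these simultaneous intervals are also referred to as $T^2$-intervals
    • ex: We can make statements about the differences $\mu_i - \mu_k$ by choosing $a^{\top} = [0, \dots, 0, a_i, 0, \dots, 0, a_k, 0, \dots, 0]$ where $a_i = 1$ and $a_k = -1$.
      • In this case $a^{\top} S a = s_{ii} - 2s_{ik} + s_{kk}$ and the interval $\bar{x}_i - \bar{x}_k \pm \sqrt{\frac{p(n-1)}{n-p} F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii} - 2s_{ik} + s_{kk}}{n}}$ contains $\mu_i - \mu_k$ with probability $1-\alpha$
  • By choosing $a^{\top} = [0, \dots, 0, a_i, 0, \dots, 0]$ where $a_i = 1$, $i = 1, \dots, p$, the interval $\bar{x}_i \pm \sqrt{\frac{p(n-1)}{n-p} F_{p,n-p}(\alpha)}\sqrt{\frac{s_{ii}}{n}}$ contains $\mu_i$ with probability $1-\alpha$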
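
A simultaneous interval for any linear combination $a^{\top}\mu$ can be computed as follows. This is a sketch under the assumption that the half-width is $\sqrt{\frac{p(n-1)}{n(n-p)}F_{p,n-p}(\alpha)\,a^{\top}Sa}$. The helper name `t2_interval` and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def t2_interval(X, a, alpha=0.05):
    """Simultaneous T^2 interval for the linear combination a' mu."""
    X = np.asarray(X, dtype=float)
    a = np.asarray(a, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    c2 = p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    center = a @ xbar
    half_width = np.sqrt(c2 * (a @ S @ a) / n)
    return center - half_width, center + half_width


# Illustrative data only.
rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, 3.0, 1.0], scale=[1.0, 2.0, 0.5], size=(50, 3))

# Component mean mu_1: a picks out the first coordinate.
print("mu_1:", t2_interval(X, [1, 0, 0]))
# Difference mu_1 - mu_2: a has +1 and -1 in the two positions.
print("mu_1 - mu_2:", t2_interval(X, [1, -1, 0]))
```

Because the same constant covers every choice of `a`, these intervals are wider than one-at-a-time $t$ intervals.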

Large sample inferences about a population mean:

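  • When the sample size is large, tests of hypotheses and confidence regions for $\mu$ can be constructed without the assumption of a normal distribution
  • Let $X_1, \dots, X_n$ be a random sample from a population with mean $\mu$ and positive definite covariance matrix $\Sigma$, and suppose $n - p$ is large:
    1. For $H_0: \mu = \mu_0$, reject $H_0$ if $n(\bar{x} - \mu_0)^{\top} S^{-1} (\bar{x} - \mu_0) > \chi_p^2(\alpha)$
      • for a significance level of approximately $\alpha$
    2. $a^{\top}\bar{X} \pm \sqrt{\chi_p^2(\alpha)}\sqrt{\frac{a^{\top} S a}{n}}$ will contain $a^{\top}\mu$ for every $a$, with probability approximately $1-\alpha$
      • Consequently, we can make the $100(1-\alpha)\%$ simultaneous confidence statements: $\bar{x}_i \pm \sqrt{\chi_p^2(\alpha)}\sqrt{\frac{s_{ii}}{n}}$ contains $\mu_i$ for $i = 1, \dots, p$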
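
The large-sample versions swap the scaled $F$ quantile for a chi-square quantile. The sketch below assumes the statistic $n(\bar{x} - \mu_0)^{\top} S^{-1} (\bar{x} - \mu_0)$ is compared with $\chi_p^2(\alpha)$, and uses deliberately non-normal (exponential) simulated data. The function name `large_sample_inference` is hypothetical.

```python
import numpy as np
from scipy import stats


def large_sample_inference(X, mu0, alpha=0.05):
    """Chi-square test of H0: mu = mu0 and simultaneous intervals for each mu_i."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    crit = stats.chi2.ppf(1 - alpha, p)    # upper alpha quantile of chi-square(p)
    diff = xbar - np.asarray(mu0, dtype=float)
    stat = n * diff @ np.linalg.solve(S, diff)
    half_width = np.sqrt(crit * np.diag(S) / n)
    intervals = np.column_stack([xbar - half_width, xbar + half_width])
    return stat, crit, intervals


# Illustrative data only: exponential with mean 2 in each coordinate.
rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=(500, 3))
stat, crit, intervals = large_sample_inference(X, mu0=[2.0, 2.0, 2.0])
print(f"statistic = {stat:.3f}, chi-square critical value = {crit:.3f}")
print(intervals)
```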

Multivariate Quality Control Chart, $\bar{X}$ chart:

  • Control charts make the variation visible
  • Allow one to distinguish common from special causes of variation.
  • One useful control chart is the $\bar{X}$-chart:
    1. Plot the individual observations or sample means in time order.
    2. Create and plot the centerline $\bar{\bar{x}}$, the sample mean of all of the observations.
    3. Calculate and plot the control limits given by
      • Upper control limit (UCL) $= \bar{\bar{x}}$ + 3(standard deviation)
      • Lower control limit (LCL) $= \bar{\bar{x}}$ - 3(standard deviation)
    4. Plot the data points in order, starting from index 1; draw the UCL, centerline, and LCL as three horizontal lines (each a series of identical values connected across the time axis)
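
The numbers behind such a chart can be computed as below. This is a sketch for a single variable, assuming the standard deviation is the sample standard deviation of the plotted values. The function name `xbar_chart_limits`, the `nonnegative` flag, and the data are hypothetical.

```python
import numpy as np


def xbar_chart_limits(x, nonnegative=False):
    """Centerline, 3-sigma control limits, and indices of out-of-control points."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    sd = x.std(ddof=1)                     # sample standard deviation
    ucl = center + 3 * sd
    lcl = center - 3 * sd
    if nonnegative:
        lcl = max(lcl, 0.0)                # clip the LCL for data that cannot be negative
    out_of_control = np.flatnonzero((x > ucl) | (x < lcl))
    return center, lcl, ucl, out_of_control


# Illustrative measurements in time order; the last value is an obvious spike.
x = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
     11, 9, 10, 12, 10, 9, 11, 10, 10, 30]
center, lcl, ucl, flagged = xbar_chart_limits(x, nonnegative=True)
print(f"center = {center:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}, flagged = {flagged}")
```

To draw the chart, plot `x` against its index and add horizontal lines at `lcl`, `center`, and `ucl`.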

Ellipse Format Chart:

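  • Only two variables are extracted, therefore $p = 2$
    • the 95% quality ellipse consists of all $x$ that satisfy $(x - \bar{x})^{\top} S^{-1} (x - \bar{x}) \le \chi_2^2(0.05)$
  • Find the sample variances and covariance $s_{11}$, $s_{22}$, $s_{12}$ (the entries of $S$)
  • Find the mean $\bar{x}$
  • Determinant $= s_{11}s_{22} - s_{12}^2$
  • Trace $= s_{11} + s_{22}$
  • Find the eigenvalues from the determinant and trace (quadratic formula): $\lambda_{1,2} = \frac{\mathrm{tr} \pm \sqrt{\mathrm{tr}^2 - 4\det}}{2}$
  • Find $c^2$
    • for a small sample, use an $F$-based constant
    • for a large sample, $c^2 = \chi_2^2(0.05) \approx 5.99$ (chi-square)
  • $\theta$ (rad) $= \frac{1}{2}\arctan\frac{2s_{12}}{s_{11} - s_{22}}$, the angle of the major axis (use the two-argument arctangent to get the correct quadrant)
  • Calculate the rotation matrix $R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
  • Sample points on the ellipse:
    • Generate $n$ values of $t$ from $0$ to $2\pi$:
    • x coordinate of that point: $\sqrt{c^2\lambda_1}\cos t\cos\theta - \sqrt{c^2\lambda_2}\sin t\sin\theta$
    • y coordinate of that point: $\sqrt{c^2\lambda_1}\cos t\sin\theta + \sqrt{c^2\lambda_2}\sin t\cos\theta$
    • plot the points and connect them
  • Plot data:
    • For each data point: subtract the mean to center it, and plot it on the same graph as the ellipse
    • then determine whether that point is inside the quality ellipse
      • If $(x - \bar{x})^{\top} S^{-1} (x - \bar{x}) > c^2$: outside, otherwise inside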
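
The recipe above translates fairly directly into NumPy. The sketch assumes the large-sample constant $c^2 = \chi_2^2(0.05) \approx 5.99$, the closed-form 2×2 eigenvalues, and a major-axis angle of $\theta = \frac{1}{2}\operatorname{atan2}(2s_{12},\, s_{11} - s_{22})$. The function name `quality_ellipse` and the simulated data are hypothetical.

```python
import numpy as np


def quality_ellipse(X, c2=5.99, n_points=200):
    """Outline of the quality ellipse (centered at the origin) and an inside/outside mask."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    s11, s12, s22 = S[0, 0], S[0, 1], S[1, 1]

    # Eigenvalues of a 2x2 symmetric matrix from its determinant and trace.
    det = s11 * s22 - s12 ** 2
    tr = s11 + s22
    disc = np.sqrt(tr ** 2 - 4 * det)
    lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

    # Angle of the major axis and the matching rotation matrix.
    theta = 0.5 * np.arctan2(2 * s12, s11 - s22)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])

    # Axis-aligned ellipse with half-lengths sqrt(c2 * lambda), then rotated.
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    axis_aligned = np.vstack([np.sqrt(c2 * lam1) * np.cos(t),
                              np.sqrt(c2 * lam2) * np.sin(t)])
    outline = (R @ axis_aligned).T

    # Center the data and compare each squared statistical distance with c2.
    centered = X - xbar
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)
    return outline, centered, d2 <= c2


# Illustrative data only: correlated bivariate normal draws.
rng = np.random.default_rng(2)
X = rng.multivariate_normal([10.0, 5.0], [[4.0, 1.5], [1.5, 1.0]], size=60)
outline, centered, inside = quality_ellipse(X)
print(f"{inside.sum()} of {len(X)} points fall inside the 95% ellipse")
```

Plotting `outline` as a connected line and `centered` as a scatter on the same axes gives the ellipse format chart.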

$T^2$ chart:

  • For more than 2 dimensions
  • When a point is out of the control region, individual $\bar{X}$ charts are constructed.
  • When the lower control limit is less than zero for data that must be nonnegative, LCL is generally set to zero.
  • Points are displayed in time order rather than as a scatter plot, and this makes patterns and trends visible.
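  • For the $j$-th point, we calculate the T-squared statistic: $T_j^2 = (x_j - \bar{x})^{\top} S^{-1} (x_j - \bar{x})$
  • Then plot the $T^2$-values on a time axis; the lower limit is 0 and the upper limit is $\mathrm{UCL} = \chi_p^2(0.05)$ (sometimes $\chi_p^2(0.01)$); there is no centerline in the $T^2$-chart
  • When the multivariate $T^2$-chart signals that the $j$-th unit is out of control, it should be determined which variables are responsible
  • A region based on Bonferroni intervals is frequently chosen for this purpose. The $k$-th variable is out of control if $x_{jk}$ does not lie in the interval $\left(\bar{x}_k - t_{n-1}(0.005/p)\sqrt{s_{kk}},\ \bar{x}_k + t_{n-1}(0.005/p)\sqrt{s_{kk}}\right)$, where $p$ is the total number of measured variables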
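
A sketch of the chart values and the follow-up check, assuming $T_j^2 = (x_j - \bar{x})^{\top} S^{-1} (x_j - \bar{x})$, an upper control limit of $\chi_p^2(0.05)$, and Bonferroni limits $\bar{x}_k \pm t_{n-1}(0.005/p)\sqrt{s_{kk}}$. The function names, the `tail` parameter, and the simulated data (with one deliberately shifted row) are hypothetical.

```python
import numpy as np
from scipy import stats


def t2_chart(X, alpha=0.05):
    """T^2 value for every observation and the chi-square upper control limit."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    t2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    ucl = stats.chi2.ppf(1 - alpha, p)
    return t2, ucl


def bonferroni_flags(X, j, tail=0.005):
    """Boolean mask of variables whose j-th value lies outside its Bonferroni interval."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    s = X.std(axis=0, ddof=1)              # sqrt(s_kk) for each variable
    t_crit = stats.t.ppf(1 - tail / p, n - 1)
    return np.abs(X[j] - xbar) > t_crit * s


# Illustrative data only: 30 in-control rows, then row 12 is shifted in variable 2.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
X[12] = [0.0, 12.0, 0.0]

t2, ucl = t2_chart(X)
signals = np.flatnonzero(t2 > ucl)
print("out-of-control rows:", signals)
for j in signals:
    print(f"row {j}: variables flagged = {np.flatnonzero(bonferroni_flags(X, j))}")
```

Plotting `t2` against the row index with a horizontal line at `ucl` (and a lower limit of 0) gives the chart itself.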

Inference when some observations are missing:

  • Often, some components of a vector observation are unavailable. We treat situations where data are missing at random.
  • To estimate the incomplete data, we use the EM algorithm.
    1. Prediction step. Given some estimate of the unknown parameters, predict the contribution of any missing observation to the (complete-data) sufficient statistics.
    2. Estimation step. Use the predicted sufficient statistics to compute a revised estimate of the parameters.
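  • When the observations are a random sample from a $p$-variate normal population, the prediction–estimation algorithm is based on the complete-data sufficient statistics
    • $T_1 = \sum_{j=1}^{n} X_j = n\bar{X}$ and $T_2 = \sum_{j=1}^{n} X_j X_j^{\top} = (n-1)S + n\bar{X}\bar{X}^{\top}$
  • We assume that the population mean $\mu$ and covariance $\Sigma$ are unknown and estimated with $\tilde{\mu}$ and $\tilde{\Sigma}$
  • Estimation:
    • $\tilde{\mu} = \frac{\tilde{T}_1}{n}$ and $\tilde{\Sigma} = \frac{1}{n}\tilde{T}_2 - \tilde{\mu}\tilde{\mu}^{\top}$
  • Prediction step:
    • for each vector $x_j$ with missing values, let $x_j^{(1)}$ denote the vector of missing components and $x_j^{(2)}$ denote the vector of available components
    • Predicted contribution of $x_j^{(1)}$ to $T_1$: $\tilde{x}_j^{(1)} = E(X_j^{(1)} \mid x_j^{(2)}; \tilde{\mu}, \tilde{\Sigma}) = \tilde{\mu}^{(1)} + \tilde{\Sigma}_{12}\tilde{\Sigma}_{22}^{-1}(x_j^{(2)} - \tilde{\mu}^{(2)})$
    • Predicted contribution of $x_j^{(1)}$ to $T_2$ is $\widetilde{x_j^{(1)} x_j^{(1)\top}} = \tilde{\Sigma}_{11} - \tilde{\Sigma}_{12}\tilde{\Sigma}_{22}^{-1}\tilde{\Sigma}_{21} + \tilde{x}_j^{(1)}\tilde{x}_j^{(1)\top}$ and $\widetilde{x_j^{(1)} x_j^{(2)\top}} = \tilde{x}_j^{(1)} x_j^{(2)\top}$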
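
The prediction and estimation steps can be sketched as a short loop. This is an illustration rather than a production routine: it assumes missing entries are coded as `NaN`, every row has at least one observed component, the starting values are the per-column observed means and variances, and a fixed number of iterations is run without a convergence check. The function name `em_mvn_missing` and the simulated data are hypothetical.

```python
import numpy as np


def em_mvn_missing(X, n_iter=50):
    """EM estimates of (mu, Sigma) for multivariate normal data with NaN entries."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)                 # starting values (an assumption)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        T1 = np.zeros(p)
        T2 = np.zeros((p, p))
        for j in range(n):
            m, o = miss[j], ~miss[j]
            x = X[j].copy()
            C = np.zeros((p, p))               # conditional covariance of the missing block
            if m.any():
                B = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                x[m] = mu[m] + B @ (X[j, o] - mu[o])          # prediction for T1
                C[np.ix_(m, m)] = sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
            T1 += x
            T2 += np.outer(x, x) + C                          # prediction for T2
        mu = T1 / n                                           # estimation step
        sigma = T2 / n - np.outer(mu, mu)
    return mu, sigma


# Illustrative data only: 200 trivariate normal rows with entries removed in a fixed pattern.
rng = np.random.default_rng(4)
true_mu = np.array([1.0, -2.0, 0.5])
true_sigma = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
X_full = rng.multivariate_normal(true_mu, true_sigma, size=200)
X = X_full.copy()
X[::5, 0] = np.nan
X[2::7, 1] = np.nan

mu_hat, sigma_hat = em_mvn_missing(X)
print("estimated mean:", np.round(mu_hat, 3))
print("estimated covariance:\n", np.round(sigma_hat, 3))
```

The resulting covariance estimate uses divisor $n$, so with no missing values it reduces to $\frac{n-1}{n}S$ rather than $S$.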