In the bivariate case, the contours of constant density, where $f(x)$ equals a constant, are ellipses whose axes lie along the eigenvectors $e_i$ of $\Sigma$.
The solid ellipsoid of $x$ values satisfying $(x-\mu)^\top \Sigma^{-1} (x-\mu) \le \chi_p^2(\alpha)$ has probability $1-\alpha$.
The $p$-variate normal density has its maximum value when the squared distance is zero, that is, when $x = \mu$. Thus $\mu$ is the point of maximum density, or mode, as well as the expected value of $X$, or mean.
The fact that $\mu$ is the mean of the multivariate normal distribution follows from the symmetry exhibited by the constant-density contours: these contours are centered, or balanced, at $\mu$.
The following are true for a random vector X having a multivariate normal distribution:
Linear combinations of the components of X are normally distributed.
All subsets of the components of X have a (multivariate) normal distribution.
Zero covariance implies that the corresponding components are independently distributed.
The conditional distributions of the components are (multivariate) normal.
Proposition:
If $\Sigma$ is positive definite, so that $\Sigma^{-1}$ exists, then $\Sigma e = \lambda e$ implies $\Sigma^{-1} e = \frac{1}{\lambda} e$, so $(\lambda, e)$ is an eigenvalue-eigenvector pair for $\Sigma$ corresponding to the pair $(1/\lambda, e)$ for $\Sigma^{-1}$.
Also, $\Sigma^{-1}$ is positive definite.
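A quick numeric sketch of this proposition (the matrix $\Sigma$ below is made up for illustration): $\Sigma$ and $\Sigma^{-1}$ share eigenvectors, and their eigenvalues are reciprocals.

```python
import numpy as np

# Illustrative positive definite Sigma
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam, E = np.linalg.eigh(Sigma)                         # eigenpairs of Sigma
lam_inv, E_inv = np.linalg.eigh(np.linalg.inv(Sigma))  # eigenpairs of Sigma^{-1}

# eigh sorts eigenvalues ascending, so reverse one list to compare
print(lam, 1 / lam_inv[::-1])  # the two arrays should match
```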
Linear combinations of the components of X: if $X$ is distributed as $N_p(\mu, \Sigma)$, then any linear combination $a^\top X$ is distributed as $N(a^\top \mu,\, a^\top \Sigma a)$.
Also, if $a^\top X$ is distributed as $N(a^\top \mu,\, a^\top \Sigma a)$ for every $a$, then $X$ must be $N_p(\mu, \Sigma)$.
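A minimal simulation check of this result, with illustrative $\mu$, $\Sigma$, and $a$: the sample mean and variance of $a^\top X$ should be close to $a^\top \mu$ and $a^\top \Sigma a$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a = np.array([1.0, -1.0, 2.0])

# Theoretical mean and variance of a'X
print(a @ mu, a @ Sigma @ a)

# Monte Carlo check
X = rng.multivariate_normal(mu, Sigma, size=100_000)
y = X @ a
print(y.mean(), y.var())  # should be close to the values above
```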
Matrix transformation of X: if $X$ is distributed as $N_p(\mu, \Sigma)$, the $q$ linear combinations $A_{q \times p} X_{p \times 1}$ (each row of $A$ defines one combination) are distributed as $N_q(A\mu,\, A \Sigma A^\top)$.
Marginal distribution of a subset: if $X$ is partitioned as $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ with $X_1$ of dimension $q \times 1$, and correspondingly $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and
$$\Sigma_{p \times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},$$
where $\Sigma_{11}$ is $q \times q$, $\Sigma_{12}$ is $q \times (p-q)$, $\Sigma_{21}$ is $(p-q) \times q$, and $\Sigma_{22}$ is $(p-q) \times (p-q)$,
then $X_1$ is distributed as $N_q(\mu_1, \Sigma_{11})$.
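In code, the marginal parameters are just the corresponding slices of $\mu$ and $\Sigma$; the numbers below are illustrative.

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.5, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 1.0, 0.2, 0.0],
                  [0.3, 0.2, 1.5, 0.4],
                  [0.1, 0.0, 0.4, 1.2]])

q = 2
mu1 = mu[:q]              # mean vector of X1
Sigma11 = Sigma[:q, :q]   # covariance matrix of X1
print(mu1, Sigma11, sep="\n")  # X1 ~ N_q(mu1, Sigma11)
```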
Independence between subvectors:
If $X_1$ ($q_1 \times 1$) and $X_2$ ($q_2 \times 1$) are independent, then $\mathrm{Cov}(X_1, X_2) = 0$, a $q_1 \times q_2$ matrix of zeros.
If $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ is $N_{q_1 + q_2}\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)$, then $X_1$ and $X_2$ are independent if and only if $\Sigma_{12} = 0$.
If $X_1$ and $X_2$ are independent and are distributed as $N_{q_1}(\mu_1, \Sigma_{11})$ and $N_{q_2}(\mu_2, \Sigma_{22})$ respectively, then $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ has the multivariate normal distribution $N_{q_1 + q_2}\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \right)$.
Conditional distribution: let $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ be distributed as $N_p(\mu, \Sigma)$ with $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$, $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$, and $|\Sigma_{22}| > 0$. Then the conditional distribution of $X_1$, given that $X_2 = x_2$, is normal and has:
mean $= \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)$
covariance $= \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$
Note that the covariance does not depend on the value $x_2$ of the conditioning variable.
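A sketch of these formulas as a helper function (the name `conditional_mvn`, the partition size, and all numbers are illustrative, not from the source):

```python
import numpy as np

def conditional_mvn(mu, Sigma, q, x2):
    """Parameters of X1 | X2 = x2 when X ~ N_p(mu, Sigma) and X1 is q x 1."""
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
    # Sigma22^{-1}(x2 - mu2) via a linear solve (more stable than inverting)
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
print(conditional_mvn(mu, Sigma, q=1, x2=np.array([1.5, 2.5])))
```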
Let $X$ be distributed as $N_p(\mu, \Sigma)$ with $|\Sigma| > 0$. Then:
$(X - \mu)^\top \Sigma^{-1} (X - \mu)$ is distributed as $\chi_p^2$, the chi-square distribution with $p$ degrees of freedom.
The $N_p(\mu, \Sigma)$ distribution assigns probability $1 - \alpha$ to the solid ellipsoid $\{x : (x - \mu)^\top \Sigma^{-1} (x - \mu) \le \chi_p^2(\alpha)\}$, where $\chi_p^2(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi_p^2$ distribution.
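A Monte Carlo sanity check of the ellipsoid probability, using an illustrative bivariate $\Sigma$ and $\alpha = 0.05$: the fraction of draws inside the ellipsoid should be near $1 - \alpha$.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
alpha = 0.05
p = len(mu)

X = rng.multivariate_normal(mu, Sigma, size=200_000)
D = X - mu
d2 = np.einsum("ij,ij->i", D @ np.linalg.inv(Sigma), D)  # squared distances
cutoff = chi2.ppf(1 - alpha, df=p)  # chi2_p(alpha): upper (100*alpha)th percentile
print((d2 <= cutoff).mean())        # should be close to 0.95
```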
Let $X_1, \dots, X_n$ be mutually independent with $X_j$ distributed as $N_p(\mu_j, \Sigma)$. Then $V_1 = c_1 X_1 + \dots + c_n X_n$ is distributed as $N_p\!\left( \sum_{j=1}^n c_j \mu_j,\; \left( \sum_{j=1}^n c_j^2 \right) \Sigma \right)$.
Moreover, $V_1$ and $V_2 = b_1 X_1 + \dots + b_n X_n$ are jointly multivariate normal with covariance matrix
$$\begin{bmatrix} \left( \sum_{j=1}^n c_j^2 \right) \Sigma & (b^\top c) \Sigma \\ (b^\top c) \Sigma & \left( \sum_{j=1}^n b_j^2 \right) \Sigma \end{bmatrix}$$
Consequently, $V_1$ and $V_2$ are independent if $b^\top c = \sum_{j=1}^n b_j c_j = 0$.
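A simulation sketch of this covariance structure (the coefficients and $\Sigma$ below are illustrative, chosen so that $b^\top c = 0$): the cross-covariance of $V_1$ and $V_2$ should be near $(b^\top c)\Sigma = 0$.

```python
import numpy as np

rng = np.random.default_rng(6)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
n = 3
c = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, -2.0, 1.0])     # chosen so that b'c = 0

reps = 100_000
# reps independent samples of (X1, ..., Xn), each Xj ~ N_2(0, Sigma)
X = rng.multivariate_normal(np.zeros(2), Sigma, size=(reps, n))
V1 = np.einsum("j,rjk->rk", c, X)  # c1 X1 + ... + cn Xn, per replicate
V2 = np.einsum("j,rjk->rk", b, X)

# Cross-covariance of V1 and V2: should be near (b'c) Sigma = 0
cross = (V1 - V1.mean(0)).T @ (V2 - V2.mean(0)) / (reps - 1)
print(cross)
```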
Multivariate Normal Likelihood:
Assume that the $p \times 1$ vectors $X_1, \dots, X_n$ represent a random sample from a multivariate normal population with mean vector $\mu$ and covariance matrix $\Sigma$.
The joint density function of all the observations is the product of the marginal normal densities.
With that, the likelihood function is
$$L(\mu, \Sigma) = (2\pi)^{-np/2}\, |\Sigma|^{-n/2} \exp\!\left\{ -\frac{1}{2} \operatorname{tr}\!\left[ \Sigma^{-1} \left( \sum_{j=1}^n (x_j - \bar{x})(x_j - \bar{x})^\top + n(\bar{x} - \mu)(\bar{x} - \mu)^\top \right) \right] \right\}$$
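A sketch evaluating the log of this likelihood through the trace form above (the data are simulated just to have something to plug in, and `mvn_log_likelihood` is a hypothetical helper name):

```python
import numpy as np

def mvn_log_likelihood(X, mu, Sigma):
    """log L(mu, Sigma), treating the rows of X as an i.i.d. N_p(mu, Sigma) sample."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    Dc = X - xbar
    # sum_j (x_j - xbar)(x_j - xbar)' + n (xbar - mu)(xbar - mu)'
    A = Dc.T @ Dc + n * np.outer(xbar - mu, xbar - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return (-n * p / 2 * np.log(2 * np.pi)
            - n / 2 * logdet
            - 0.5 * np.trace(np.linalg.solve(Sigma, A)))  # tr(Sigma^{-1} A)

rng = np.random.default_rng(2)
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=50)
print(mvn_log_likelihood(X, mu, Sigma))
```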
Maximum likelihood estimation of $\mu$ and $\Sigma$:
Lemma: given a $p \times p$ symmetric positive definite matrix $B$ and a scalar $b > 0$, it follows that
$$\frac{1}{|\Sigma|^b} \exp\{-\operatorname{tr}(\Sigma^{-1} B)/2\} \le \frac{1}{|B|^b} (2b)^{pb} e^{-pb}$$
for all positive definite $p \times p$ matrices $\Sigma$, with equality holding only for $\Sigma = \frac{1}{2b} B$.
Proposition: let $X_1, \dots, X_n$ be a random sample from a normal population with mean $\mu$ and covariance $\Sigma$.
Then $\hat{\mu} = \bar{X}$ and $\hat{\Sigma} = \frac{1}{n} \sum_{j=1}^n (X_j - \bar{X})(X_j - \bar{X})^\top = \frac{n-1}{n} S$ are the maximum likelihood estimators of $\mu$ and $\Sigma$.
Their observed values, $\bar{x}$ and $\frac{1}{n} \sum_{j=1}^n (x_j - \bar{x})(x_j - \bar{x})^\top$, are the maximum likelihood estimates of $\mu$ and $\Sigma$.
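Computing these estimators directly (the data are simulated for illustration); note the ML divisor $n$ versus the divisor $n-1$ used by the sample covariance $S$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, -1.0], [[2.0, 0.5], [0.5, 1.0]], size=200)
n = X.shape[0]

mu_hat = X.mean(axis=0)       # mu_hat = xbar
Dc = X - mu_hat
Sigma_hat = Dc.T @ Dc / n     # ML estimate, divisor n
S = np.cov(X, rowvar=False)   # sample covariance, divisor n-1
print(np.allclose(Sigma_hat, (n - 1) / n * S))  # True: Sigma_hat = ((n-1)/n) S
```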
Maximum likelihood estimators possess an invariance property:
Let $\hat{\theta}$ be the maximum likelihood estimator of $\theta$, and consider estimating the parameter $h(\theta)$. Then the maximum likelihood estimate of $h(\theta)$ is $h(\hat{\theta})$. For example, the maximum likelihood estimator of $\mu^\top \Sigma^{-1} \mu$ is $\hat{\mu}^\top \hat{\Sigma}^{-1} \hat{\mu}$.
Let $X_1, \dots, X_n$ be a random sample from a multivariate normal population with mean $\mu$ and covariance $\Sigma$; then $\bar{X}$ and $S = \frac{1}{n-1} \sum_{j=1}^n (X_j - \bar{X})(X_j - \bar{X})^\top$ are sufficient statistics.
For the multivariate case, $\bar{X}$ has a normal distribution with mean $\mu$ and covariance matrix $(1/n)\Sigma$.
The sampling distribution of the sample covariance matrix is called the Wishart distribution, defined as the sum of independent products of multivariate normal random vectors
$W_m$ = Wishart distribution with $m$ degrees of freedom = distribution of $\sum_{j=1}^m Z_j Z_j^\top$, where the $Z_j$ are each independently distributed as $N_p(0, \Sigma)$;
denoted $W_p(m, \Sigma)$.
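A simulation sketch of this definition (the dimension, $\Sigma$, and $m$ are illustrative): each draw is a sum of $m$ outer products of independent $N_p(0, \Sigma)$ vectors, and the average of many draws should approach $E[W] = m\Sigma$, which scipy reports exactly.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(4)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
m, reps = 10, 20_000

# reps independent Wishart draws, each the sum of m outer products Z_j Z_j'
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=(reps, m))
W = np.einsum("rmi,rmj->rij", Z, Z)

print(W.mean(axis=0))                   # should be near E[W] = m * Sigma
print(wishart.mean(df=m, scale=Sigma))  # scipy's exact mean, m * Sigma
```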
Theorem: let $X_1, \dots, X_n$ be a random sample of size $n$ from a $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then:
$\bar{X}$ is distributed as $N_p(\mu, (1/n)\Sigma)$;
$(n-1)S$ is distributed as a Wishart random matrix with $n-1$ degrees of freedom;
$\bar{X}$ and $S$ are independent.
Assessing multivariate normality:
Use the squared generalized distances $d_j^2 = (x_j - \bar{x})^\top S^{-1} (x_j - \bar{x})$, $j = 1, 2, \dots, n$.
This applies for all dimensions $p \ge 2$.
When the parent population is multivariate normal and both $n$ and $n - p$ are greater than 30, each of the squared distances $d_j^2$ should behave like a chi-square random variable.
Although these distances are not independent or exactly chi-square distributed, it is helpful to plot them as if they were. The resulting plot is called a chi-square plot.
Steps:
estimate $\mu$ with the sample mean vector $\bar{x}$ (compute each component $\bar{x}_i$)
find $S$ by computing all of:
$S_{ii} = \mathrm{Var}(X_i)$: use the sample variance formula
$S_{ij} = \mathrm{Cov}(X_i, X_j)$: use the sample covariance formula
Construct chi-square plot:
order the $d_j^2$ from smallest to largest: $d_{(1)}^2 \le d_{(2)}^2 \le \dots \le d_{(n)}^2$
graph the pairs $\left( q_{c,p}\!\left( \frac{j - 0.5}{n} \right),\; d_{(j)}^2 \right)$
where $q_{c,p}\!\left( \frac{j - 0.5}{n} \right)$ is the $100(j - 0.5)/n$ percentile of the chi-square distribution with $p$ degrees of freedom
in upper-percentile notation, $q_{c,p}\!\left( \frac{j-0.5}{n} \right) = \chi_p^2\!\left( \frac{n-j+0.5}{n} \right)$, since the upper-tail probability is $1 - \frac{j-0.5}{n} = \frac{n-j+0.5}{n}$; in Excel use =CHISQ.INV.RT((n-j+0.5)/n, p), where CHISQ.INV.RT takes the right-tail probability and the degrees of freedom $p$ (a sketch below puts these steps together in Python)
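A minimal end-to-end sketch of these steps in Python; the data are simulated only so the script is self-contained (with real data, load `X` instead).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

rng = np.random.default_rng(5)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=100)
n, p = X.shape

xbar = X.mean(axis=0)        # sample mean vector
S = np.cov(X, rowvar=False)  # sample covariance (divisor n-1)
D = X - xbar
d2 = np.einsum("ij,ij->i", D @ np.linalg.inv(S), D)  # d_j^2 per observation
d2_sorted = np.sort(d2)      # order smallest to largest

j = np.arange(1, n + 1)
q = chi2.ppf((j - 0.5) / n, df=p)  # 100(j-0.5)/n percentiles, p df

plt.scatter(q, d2_sorted)
plt.plot(q, q)               # 45-degree reference line
plt.xlabel("chi-square quantile")
plt.ylabel("ordered squared distance")
plt.show()
```

If the data are multivariate normal, the points should fall roughly along the reference line; systematic curvature or outlying points suggests departures from normality.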