Definition:

  • Features: W_i is the word at position i
  • Predict the label Y conditioned on the feature variables F_1, …, F_n
  • Assume features are conditionally independent given the label
Model: P(Y, F_1, …, F_n) = P(Y) ∏_i P(F_i | Y)
Prediction: y* = argmax_y P(y) ∏_i P(f_i | y)
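A minimal Python sketch of this prediction rule (the names predict, priors, and cond_probs are illustrative assumptions, not from the notes); summing log probabilities is a standard way to avoid numerical underflow when multiplying many small terms:

```python
import math

def predict(features, priors, cond_probs):
    """features: observed values f_1, ..., f_n
    priors: dict mapping label y -> P(y)
    cond_probs: dict mapping label y -> {feature value f: P(f | y)}"""
    best_label, best_score = None, float("-inf")
    for y, prior in priors.items():
        # log P(y) + sum_i log P(f_i | y), i.e. the log of the product above
        score = math.log(prior) + sum(math.log(cond_probs[y][f]) for f in features)
        if score > best_score:
            best_label, best_score = y, score
    return best_label
```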
Parameters:
  • For each word w and class y there is a probability P(W = w | Y = y)
  • ex: Spam Email Filter:
    • MLE for Naive Bayes Spam Classifier:
    • Find a single parameter for each word as P_ML(w | y) = count(w, y) / Σ_w′ count(w′, y)
    • Beware of overfitting: problems with relative-frequency parameters
      • Unlikely to see occurrences of every word in the training data.
      • Likely to see occurrences of a word for only one class in the training data.
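To make the problem concrete, here is a small Python sketch of relative-frequency estimation on made-up data (the word lists and function name are illustrative):

```python
from collections import Counter

# Relative-frequency (MLE) parameters for the spam filter, on toy data.
spam_words = ["free", "money", "free", "offer"]
ham_words = ["meeting", "tomorrow", "money"]

def mle_params(words):
    # P_ML(w | y) = count(w, y) / total words in class y
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

p_spam = mle_params(spam_words)
p_ham = mle_params(ham_words)
print(p_spam["free"])          # 0.5
print(p_ham.get("free", 0.0))  # 0.0 -> "free" never appears in ham, so any
# email containing "free" gets P(ham) = 0 regardless of its other words
```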

Parameter estimation:

  • P(x | θ) means the probability of event x occurring, given the parameter θ.
With maximum likelihood:
  • Estimating the distribution of a random variable
  • Empirically: use training data (learning!)
  • E.g.: red and blue
    • For a simple example of drawing a red or blue bean, the parameter θ is the probability of drawing red (and 1 − θ of drawing blue)
    • For each outcome x, look at the empirical rate of that value: P_ML(x) = count(x) / total samples
    • Maximum Likelihood Estimation
      • The likelihood L(θ) is a function that assigns a value to different possible parameter values based on how well they explain the observed data.
      • Higher values of L(θ) indicate that the parameter value θ is more likely to have generated the observed data.
  • General case: a sequence of n coin flips
    • Flips are independent and identically distributed (i.i.d.)
    • D = {x_1, …, x_n} is the sequence of observed data
    • Hypothesis space: binomial distributions with parameter θ = P(heads)
    • Learning: finding the θ which is optimal
    • MLE: choose θ̂ that maximizes the likelihood, θ̂ = argmax_θ P(D | θ)
    • ex: 2 heads and 1 tail gives P(D | θ) = θ · θ · (1 − θ) = θ²(1 − θ)
    • for n observations with count(H) heads: θ̂_ML = count(H) / n (so here θ̂ = 2/3; derivation below)
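As a worked check of the two-heads-one-tail example, the standard derivation (sketched in LaTeX) maximizes the likelihood by setting its derivative to zero:

```latex
P(D \mid \theta) = \theta \cdot \theta \cdot (1-\theta) = \theta^{2}(1-\theta)

\frac{d}{d\theta}\,\theta^{2}(1-\theta) = 2\theta - 3\theta^{2}
  = \theta(2 - 3\theta) = 0
\quad\Rightarrow\quad
\hat{\theta} = \frac{2}{3} = \frac{\mathrm{count}(H)}{n}
```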

Smoothing:

Laplace Smoothing:
  • Laplace’s estimate: P_LAP(x) = (count(x) + 1) / (N + |X|), where N is the number of samples and |X| the number of possible outcomes
    • Pretend you saw every outcome once more than you actually did
    • Can be derived as MAP estimation with Dirichlet priors
  • Laplace’s extended estimate: P_LAP,k(x) = (count(x) + k) / (N + k|X|)
    • Pretend you saw every outcome k times more than you actually did
    • where k is the strength of the prior
  • Laplace for conditionals:
    • Smooth each conditional independently: P_LAP,k(x | y) = (count(x, y) + k) / (count(y) + k|X|)
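Below is a minimal Python sketch of the add-k formula above; the function name laplace_conditional and the toy vocabulary are illustrative, not from the notes:

```python
from collections import Counter

# Laplace (add-k) smoothing for a conditional distribution,
# following P_LAP,k(x | y) = (count(x, y) + k) / (count(y) + k|X|).
def laplace_conditional(words_in_class, vocab, k=1.0):
    """words_in_class: word tokens observed for one class y.
    vocab: the set of all possible words X (unseen words get mass too)."""
    counts = Counter(words_in_class)
    denom = len(words_in_class) + k * len(vocab)
    return {w: (counts[w] + k) / denom for w in vocab}

vocab = {"free", "money", "meeting", "tomorrow", "offer"}
p = laplace_conditional(["meeting", "tomorrow", "money"], vocab, k=1.0)
print(p["free"])  # 0.125 -> unseen word, but no longer zero probability
```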

Naive Bayes classifier:
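Putting the pieces above together, a minimal end-to-end sketch of a Naive Bayes spam classifier (the toy training data and all names are illustrative assumptions): it fits Laplace-smoothed parameters, then predicts with log-space scores.

```python
import math
from collections import Counter

# Toy training data: (label, word tokens) pairs.
train = [
    ("spam", ["free", "money", "free", "offer"]),
    ("ham", ["meeting", "tomorrow", "money"]),
]
k = 1.0
vocab = {w for _, words in train for w in words}

# Priors P(y) from label frequencies
priors = {y: c / len(train) for y, c in Counter(y for y, _ in train).items()}

# Laplace-smoothed conditionals P(w | y)
cond = {}
for y in priors:
    words = [w for label, ws in train if label == y for w in ws]
    counts = Counter(words)
    denom = len(words) + k * len(vocab)
    cond[y] = {w: (counts[w] + k) / denom for w in vocab}

def classify(words):
    # y* = argmax_y  log P(y) + sum_i log P(w_i | y)
    scores = {
        y: math.log(priors[y]) + sum(math.log(cond[y][w]) for w in words if w in vocab)
        for y in priors
    }
    return max(scores, key=scores.get)

print(classify(["free", "offer"]))        # -> spam
print(classify(["meeting", "tomorrow"]))  # -> ham
```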