Definition:

  • Input: an email
  • Output: spam / ham
  • Setup:
    • Get a large collection of example emails, each labeled “spam” or “ham” (by hand)
    • Goal: learn to predict the labels of new, future emails
    • Features: The attributes used to make the ham / spam decision
      • Words: FREE!, limited,…
      • Text patterns: $dd (dollar sign followed by digits), CAPS (words in all capitals)
      • Non-text: SenderInContacts, WidelyBroadcast
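
The feature types listed above can be sketched as a small extraction function. This is only an illustration, not a definitive implementation: the function name `extract_features`, its parameters, the example addresses, and the `broadcast_threshold` cutoff of 50 recipients are all assumptions made for the example and do not come from the notes.

```python
import re

def extract_features(email_text, sender, contacts, num_recipients,
                     broadcast_threshold=50):
    """Map one email to a dict of boolean features (hypothetical sketch)."""
    # Split the text into alphabetic tokens, e.g. "FREE!" -> "FREE".
    words = re.findall(r"[A-Za-z']+", email_text)
    lowered = {w.lower() for w in words}
    return {
        # Word features
        "contains_free": "free" in lowered,
        "contains_limited": "limited" in lowered,
        # Text-pattern features: "$" followed by digits, and all-caps words
        "has_dollar_amount": bool(re.search(r"\$\d+", email_text)),
        "has_all_caps_word": any(len(w) >= 2 and w.isupper() for w in words),
        # Non-text features (the threshold is an assumed cutoff)
        "SenderInContacts": sender in contacts,
        "WidelyBroadcast": num_recipients >= broadcast_threshold,
    }

# Usage with made-up data
contacts = {"friend@example.com"}
spammy = extract_features("FREE! Win $50 now", "x@example.com", contacts, 200)
friendly = extract_features("See you at lunch", "friend@example.com", contacts, 1)
```

A learning algorithm would then be trained on these feature dictionaries paired with the hand-assigned spam / ham labels.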
