Definition:
- Input: an email
- Output: spam / ham
- Setup:
  - Get a large collection of example emails, each labeled by hand as “spam” or “ham”
  - Goal: learn to predict the labels of new, future emails
- Features: The attributes used to make the ham / spam decision
  - Words: FREE!, limited, …
  - Text patterns: $dd (a dollar sign followed by digits), CAPS (words in all capitals)
  - Non-text: SenderInContacts, WidelyBroadcast
  - …
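
As a rough illustration of the feature list above, here is a minimal Python sketch that turns one email into a dictionary of boolean features. The function name `extract_features`, the `SPAM_WORDS` list, the regular expressions, and the `broadcast_threshold` cutoff are all assumptions made for this example. The notes name the features but do not say how they are computed.

```python
import re

# Hypothetical word list: the notes give only "FREE!" and "limited" as examples.
SPAM_WORDS = ("free!", "limited")


def extract_features(text, sender, contacts, num_recipients, broadcast_threshold=50):
    """Map one email to a dict of boolean features (illustrative only)."""
    lower = text.lower()
    features = {}

    # Word features: does the email contain a given word?
    for word in SPAM_WORDS:
        features["word:" + word] = word in lower

    # Text pattern "$dd": assumed here to mean a dollar sign followed by digits.
    features["pattern:$dd"] = re.search(r"\$\d+", text) is not None

    # Text pattern "CAPS": assumed to mean any all-capitals word of 3+ letters.
    words = re.findall(r"[A-Za-z]+", text)
    features["pattern:CAPS"] = any(len(w) >= 3 and w.isupper() for w in words)

    # Non-text features come from metadata rather than from the message text.
    features["SenderInContacts"] = sender in contacts
    # The threshold for "widely broadcast" is an arbitrary choice for this sketch.
    features["WidelyBroadcast"] = num_recipients >= broadcast_threshold

    return features


example = extract_features(
    "FREE! Win $100 now",
    sender="promo@spam.example",
    contacts={"friend@example.com"},
    num_recipients=500,
)
```

A learning algorithm would then be trained on these feature dictionaries paired with the hand-assigned spam / ham labels. Which algorithm to use is left open here.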