Spam Classifier

Paste an email and the model scores it as spam or not spam (ham) using a Naive Bayes classifier trained on the Enron corpus. All inference runs in your browser; no server is involved.

Binary Naive Bayes with Laplace smoothing. Trained on 5,000 spam + 5,000 ham emails from the Enron dataset. Vocabulary: 8,000 most frequent words.

How it works

A Naive Bayes classifier is trained on 10,000 emails from the Enron spam dataset. For each email, the model computes one score per class (spam and ham) by summing the log-likelihoods of the words present in the message, then compares the two scores.
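The scoring step can be sketched in a few lines. This is a minimal illustration of the arithmetic only, written in Python rather than the demo's browser code. The weight table, the tokenizer regex, and the function names are all hypothetical, and the probabilities are made-up toy values. Class priors are left out, which is harmless only when the classes are balanced, as in the 5,000 + 5,000 training set.

```python
import math
import re

# Hypothetical toy weights: log P(word | class) for a three-word vocabulary.
# The numbers are invented for illustration, not taken from the real model.
LOG_LIKELIHOOD = {
    "spam": {"free": math.log(0.6), "winner": math.log(0.5), "meeting": math.log(0.1)},
    "ham": {"free": math.log(0.1), "winner": math.log(0.05), "meeting": math.log(0.6)},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score(text, label):
    """Sum log-likelihoods over the distinct in-vocabulary words in the text."""
    table = LOG_LIKELIHOOD[label]
    present = set(tokenize(text))  # binary: each word counts once per email
    return sum(table[word] for word in present if word in table)


def classify(text):
    """Return the class with the higher log-score (ties fall back to ham)."""
    return "spam" if score(text, "spam") > score(text, "ham") else "ham"


print(classify("FREE prize, you are a winner! Free free free"))
print(classify("Agenda for the meeting tomorrow"))
```

Working in log space turns a product of many small probabilities into a sum, which avoids floating-point underflow on long emails.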

Binary bag-of-words representation: each word is counted once per email regardless of how often it appears. Laplace smoothing keeps a word that never appeared in one class during training from driving that class's probability to zero. Model weights are pre-computed and loaded as JSON at startup.
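As a rough sketch of how such weights could be pre-computed, the example below counts, per class, how many emails contain each vocabulary word, applies add-one (Laplace) smoothing, and serializes the log-probabilities to JSON. The exact estimator used by the demo is not stated, so the normalization here (smoothed count divided by the class's total count plus the vocabulary size) is an assumption, as are the `train` function, its `alpha` parameter, and the three toy emails.

```python
import json
import math


def train(docs, vocab, alpha=1.0):
    """Estimate log P(word | class) with add-alpha smoothing.

    docs is a list of (set_of_words, label) pairs. Using sets makes the
    counts binary: an email contributes at most 1 to each word's count.
    """
    weights = {}
    for label in sorted({lab for _, lab in docs}):
        class_docs = [words for words, lab in docs if lab == label]
        # Number of emails in this class that contain each vocabulary word.
        doc_freq = {w: sum(1 for words in class_docs if w in words) for w in vocab}
        denom = sum(doc_freq.values()) + alpha * len(vocab)
        weights[label] = {w: math.log((doc_freq[w] + alpha) / denom) for w in vocab}
    return weights


# Toy training set (invented for illustration).
docs = [
    ({"free", "winner"}, "spam"),
    ({"free"}, "spam"),
    ({"meeting"}, "ham"),
]
vocab = ["free", "meeting", "winner"]

weights = train(docs, vocab)
payload = json.dumps(weights)   # what would be shipped to the browser
restored = json.loads(payload)  # what the page would load at startup

# "meeting" never occurs in spam, yet its smoothed probability is non-zero.
print(math.exp(restored["spam"]["meeting"]))
```

Without smoothing, the log-likelihood of "meeting" under spam would be the log of zero, and a single such word would rule out the spam class no matter what else the email contained.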

Training data source: Enron email corpus spam subset. The original classifier used Apache Spark for distributed processing; this demo runs the scoring step entirely in the browser.
