N-gram Deceptive Review Classifier

Paste a hotel review and the classifier computes its perplexity under two language models, one trained on truthful reviews and one on deceptive reviews. Whichever model assigns the lower perplexity decides the verdict.

Unigram models with Laplace (add-1) smoothing. Training set: 256 truthful + 256 deceptive hotel reviews from the Chicago Opinion Spam Dataset.

Truthful Perplexity: lower = more truthful-like

Deceptive Perplexity: lower = more deceptive-like

Deceptive / Truthful: ratio of the two perplexities; above 1.0 = truthful signal (the truthful model fits the review better)

How it works

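Two unigram language models are trained: one on 256 truthful hotel reviews, one on 256 deceptive reviews. Both use Laplace (add-1) smoothing, so a word that never appeared in a model's training reviews still receives a small nonzero probability instead of zeroing out the whole review.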
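Written out, the add-1 estimate for a word w under the model for class c (T = truthful, D = deceptive) takes the standard form below. This assumes count_c(w) is how often w occurs in that class's training reviews, N_c is the total number of training tokens for the class, and |V| is the vocabulary size. The page does not say whether the vocabulary is per-class or shared, or how out-of-vocabulary words are counted, so treat V as an assumption.

```latex
P_c(w) = \frac{\operatorname{count}_c(w) + 1}{N_c + |V|}, \qquad c \in \{T, D\}
```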

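For a new review, perplexity is computed under each model. Perplexity measures how surprised a model is by the text: the less probability the model assigns to the review's words, the higher the perplexity. A model trained on truthful reviews should tend to assign lower perplexity to truthful reviews, and the deceptive-trained model should tend to assign lower perplexity to deceptive ones.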
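For a review of n tokens w_1, ..., w_n, the usual definition of perplexity under the model for class c is the inverse geometric mean of the per-token probabilities, where P_c(w_i) is the smoothed unigram probability of the i-th token. The log form on the right is how it is normally computed, since multiplying many small probabilities underflows. The ratio R corresponds to the "Deceptive / Truthful" readout; R above 1 means the truthful model is less surprised.

```latex
\mathrm{PP}_c(w_1, \dots, w_n)
  = \Bigl(\prod_{i=1}^{n} P_c(w_i)\Bigr)^{-1/n}
  = \exp\Bigl(-\frac{1}{n} \sum_{i=1}^{n} \log P_c(w_i)\Bigr),
\qquad
R = \frac{\mathrm{PP}_D}{\mathrm{PP}_T}
```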

The verdict is determined by which model assigns lower perplexity. Training data: Chicago Opinion Spam Dataset (Ott et al. 2011).
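A minimal Python sketch of the whole pipeline (train, score, decide) follows, under stated assumptions. The regex tokenizer, the vocabulary shared by both models, the single slot that all unseen words share, and the helper names (`tokenize`, `UnigramLM`, `build_models`, `classify`) are illustrative choices, not the demo's actual code. The two toy training lists are hypothetical stand-ins for the 256-review training sets. An exact tie is labelled deceptive here, which is an arbitrary choice.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class UnigramLM:
    """Unigram language model with Laplace (add-1) smoothing."""

    def __init__(self, docs: list[str], vocab: set[str]) -> None:
        self.counts = Counter(tok for doc in docs for tok in tokenize(doc))
        self.total = sum(self.counts.values())  # N_c: training tokens for this class
        # |V| = shared vocabulary plus one slot that all unseen words share (assumption).
        self.vocab_size = len(vocab) + 1

    def prob(self, word: str) -> float:
        # (count + 1) / (N_c + |V|); a word never seen in training gets 1 / (N_c + |V|).
        return (self.counts[word] + 1) / (self.total + self.vocab_size)

    def perplexity(self, text: str) -> float:
        tokens = tokenize(text)
        if not tokens:
            raise ValueError("review contains no tokens")
        # Average log-probability avoids underflow from multiplying many small numbers.
        log_prob = sum(math.log(self.prob(tok)) for tok in tokens)
        return math.exp(-log_prob / len(tokens))


def build_models(truthful_docs: list[str], deceptive_docs: list[str]) -> tuple[UnigramLM, UnigramLM]:
    # One shared vocabulary, so both distributions are defined over the same word set.
    vocab = {tok for doc in truthful_docs + deceptive_docs for tok in tokenize(doc)}
    return UnigramLM(truthful_docs, vocab), UnigramLM(deceptive_docs, vocab)


def classify(review: str, truthful: UnigramLM, deceptive: UnigramLM) -> tuple[str, float, float, float]:
    pp_t = truthful.perplexity(review)
    pp_d = deceptive.perplexity(review)
    ratio = pp_d / pp_t  # Deceptive / Truthful; above 1.0 = truthful signal
    # Lower perplexity wins; an exact tie (ratio == 1.0) falls to "deceptive" here.
    verdict = "truthful" if ratio > 1.0 else "deceptive"
    return verdict, pp_t, pp_d, ratio


# Toy stand-ins for the 256 + 256 training reviews (hypothetical data).
truthful_model, deceptive_model = build_models(
    ["the room was clean and the staff were helpful",
     "parking was expensive but the bed was comfortable"],
    ["my husband and i had an amazing luxury experience",
     "this amazing hotel made our vacation truly amazing"],
)
print(classify("the staff were helpful and the room was clean", truthful_model, deceptive_model))
```

Because each model is trained once and reused, scoring a review costs one dictionary lookup per token under each model.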
