About This Site
You may well ask, "Why on earth would someone program a conversation robot to respond to spam?" — Well, because it was fun to do, for one thing, but also because the program generates some interesting statistics. I will be publishing some of those statistics on this site at some point in the future.
Barry Doesn't Work Here, and He Never Did
I receive around 400 spam messages per day. Every day. I noticed a few years ago that a lot of this spam was sent to email addresses that never existed (and ended up in the "catch-all" inbox). Apparently, spammers had started to use so-called dictionary attacks to come up with email addresses to send their, um, interesting offers to: they would simply include "artificial" entries like some-random-first-name@some-random-domain-name.com in their mailing lists. For instance, one of the domains I own started receiving a lot of email for someone named Barry.
The good thing about this is that any email sent to any of these addresses is known to be spam, so it can be used to train spam filters (the ones that eat almost all of those 400 messages every day). And other things, like publishing dumb robot-generated advice columns.
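The reasoning above (mail addressed to a mailbox that never existed must be spam) can be sketched in a few lines of Python. This is only an illustration, not the Spamalyzer's actual code: the domain `example.com`, the `REAL_MAILBOXES` set, and the function name `is_known_spam` are all assumptions made up for the example.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical values: the mailboxes that really exist on the domain.
DOMAIN = "example.com"
REAL_MAILBOXES = {"info", "sales"}


def is_known_spam(raw_message: str) -> bool:
    """Return True if the message is addressed only to mailboxes that never existed."""
    msg = message_from_string(raw_message)
    # Collect (name, address) pairs from the To and Cc headers.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    # Keep the local part of every recipient on our own domain.
    local_parts = [
        addr.split("@", 1)[0].lower()
        for _, addr in recipients
        if addr.lower().endswith("@" + DOMAIN)
    ]
    # Known spam: at least one local recipient, and none of them is real.
    return bool(local_parts) and all(lp not in REAL_MAILBOXES for lp in local_parts)
```

A message labeled this way could then be handed to whatever spam filter is in use as a training example. Messages that also name a real mailbox are deliberately left unlabeled, since they might be legitimate.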
Responding to Spam
My first idea was to program the robot to actually reply to the spam messages, to see what, if anything, would happen, but I immediately realized that there would be no point. Nobody is stupid enough to send spam from an email address that can in any way be traced back to them. At best, the sender addresses are completely bogus. At worst, they belong to innocent individuals who have nothing to do with the spam I receive. (I know I get a lot of bounces for spam messages that I, of course, never sent.) So, for a while, I posted the robot's responses on a special page on my company's site instead. And now I'm giving the Spamalyzer her own site.
Coming Attractions
I plan to publish some (hopefully interesting) statistics here as well. Quantitative statistics are widely available on the Internet already: total incoming spam volume in various networks, outgoing spam volume from various IP ranges, that sort of thing. I'm more interested in qualitative statistics, such as "spammy" words and phrases, and there appears to be a surprising shortage of such data. (Though I did manage to find some at ProtectWebForm.com.)
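As a rough idea of what a "spammy words" statistic might look like, here is a minimal Python sketch that counts how many known-spam messages contain each word. The function name `word_message_counts` and the simple tokenizing rule are assumptions for illustration only. A real analysis would also compare against legitimate mail, since very common words show up everywhere.

```python
import re
from collections import Counter


def word_message_counts(messages):
    """Count, for each word, the number of messages it appears in."""
    counts = Counter()
    for text in messages:
        # Lowercase, split into simple word tokens, and count each word
        # at most once per message.
        words = set(re.findall(r"[a-z']+", text.lower()))
        counts.update(words)
    return counts
```

Calling `word_message_counts(spam_bodies).most_common(20)` on a list of message bodies would then give the twenty words that occur in the most messages.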