Referrer Spam

So, we’re back up after exceeding our allotted bandwidth. A victim of our own success? Actually, no. A victim of referrer spammers.

Over the last couple of months, we’ve noticed that almost all of our top referrers are referrer spam sites, accounting for, we would guess, about 30 percent of our traffic. And there aren’t just a couple that we can easily block. New, deceptively named referrer spam sites seem to pop up every day. In practice, it’s like a DoS attack. Any advice on how to deal with this would be appreciated.

13 Responses to “Referrer Spam”


  1. I don’t have any good answers, but you might want to post this over at ArborBlogs too. I did find a post about killing referrer spam, which is helpful if you are getting it all from one URL.

    For reducing bandwidth usage, you may want to try enabling mod_deflate compression in your .htaccess.

    How much transfer do you get per month?


  2. and here I thought it was operatives from the OFW Association…:)


  3. Wait, color me stupid. How does this work? What’s the point?
    I mean, from what I would guess, their goal is to increase their search engine ranking by inserting fake referrals from your site to theirs, right? And since that’s a large part of how Google ranks people, that’s probably effective, right?
    Since they probably spider Google too, why doesn’t Google set up a fake blog and then just disallow any URLs that use this technique, thus denying them their goal of a higher page rank?
    I mean, unless I’m totally off on what they’re doing. Then, if I am, just tell me.


  4. Installing MTBlacklist is always a good first step.

    If you have the server horsepower, this guy also has a script that uses MTBlacklist to auto-add referrer spammers to your blacklist.

    This article has some ideas as well.


  5. You could probably use a more robust host. What kind of monthly traffic does this site do (referral spam included)?


  6. No, no, referrer log spam is based on the idea that every webmaster is anxious to find out about incoming links. When a site shows up in the referrer log, it implies that a human followed a link on the referring site to your site.

    Sometimes sites show up in the referrer log by accident. In some web browsers, if a person is at site A, and types in the URL for unrelated site B, the hit to site B (as reported in B’s log) will give site A as the referrer. It is also possible to control what your browser provides as a referring site. But most people are not even aware that their browser is sharing this information with every web site they visit.

    Naturally, when a new site shows up in the referrer log, many webmasters are strongly inclined to check it out. After all, it might be someone reviewing or commenting on your site!

    Using spiders that mimic the behavior of human browsers, spammers put their sites in millions of web sites’ referrer logs — and get rewarded with millions of hits.
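To see what the comment above describes in your own logs, one option is to tally the hosts that appear in the referrer field. The following is a minimal sketch, assuming the server writes Apache's combined log format (the thread does not say which format AAiO uses); `top_referrer_hosts` and the regex are illustrative names, not part of any tool mentioned here.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: combined log format, which ends with
#   "request line" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')

def top_referrer_hosts(log_lines, limit=10):
    """Count hits per referring host and return the most common ones."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # line is not in the assumed format
        # A missing referrer is logged as "-", which has no hostname.
        host = urlsplit(match.group("referer")).hostname
        if host:
            counts[host] += 1
    return counts.most_common(limit)
```

A host that suddenly dominates this list without any matching real visitors is a likely spam referrer, though the tally alone cannot prove it.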


  7. Wow, that’s elaborate when you think that most webmasters are savvy enough to say “Gee, I doubt that the 300 hits from ‘emortgagepenisenlargment.com’ were all here for my content…”
    I mean, I would figure this must have been much more effective when only 10-20 sites were doing it, and they were hitting a bigger swath of the web. Because, y’know, I can’t imagine someone being fooled more than once by this.
    But hey, maybe I have more faith in webmasters than I should.


  8. js - I think most of it is trying to hit sites that have the referrers linked from somewhere on the web, so that the linked site gets PageRank from the site being spammed.

    They don’t bother to check if the site is displaying referrers though, which is how sites like AAiO get caught in the crossfire.


  9. AAiO: I’ve begun aggressive use of mod_rewrite in my .htaccess to counter this. Use regexes to identify spammers (URLs that contain “-foo-”, for example, are good candidates), then rewrite the URL to the spammer’s own URL, sending the spammer back where he came from, to use up his bandwidth. As the coup de grace, feed the client an HTTP 301 status code (“Moved Permanently”) with a new address of his own address, telling the bot that, instead of ever coming to your page, he should just go straight to the new address instead, preventing even the small bandwidth hit of taking incoming requests and rewriting them. (Alternatively, you could rewrite everything to php-soft.net, which appears to be a website that sells lists of referrer/comment-spammable blogs.)

    My spam referrals and comment spam have dropped immensely since I started this, though they do come up with new domains at a regular pace. I only wish I could claim credit for this approach.
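The logic of the redirect trick in the comment above can be sketched outside of mod_rewrite. This is a rough Python illustration, not the commenter's actual .htaccess rules: `SPAM_PATTERNS` and `respond_to_referrer` are hypothetical names, and “-foo-” is the commenter's own placeholder rather than a real spam signature.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern list; "-foo-" is the placeholder from the comment.
SPAM_PATTERNS = [re.compile(r"-foo-", re.IGNORECASE)]

def respond_to_referrer(referrer):
    """Return (status, headers) for a request carrying this Referer value."""
    if referrer:
        host = urlsplit(referrer).hostname or ""
        if any(pattern.search(host) for pattern in SPAM_PATTERNS):
            # 301 Moved Permanently, pointing the client back at its own URL.
            return 301, {"Location": referrer}
    # No referrer, or nothing suspicious: serve the page as usual.
    return 200, {}
```

For example, `respond_to_referrer("http://cheap-foo-pills.example/page")` yields a 301 pointing back at that address, while an ordinary referrer gets a 200. Matching on the hostname rather than the whole URL is a design choice made here to reduce false positives; the comment itself speaks of matching URLs.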


  10. Thanks for the suggestions, everyone. Murph, when I first started getting referrer spam, it was all stuff like freeviagra.info, but now it’s random URLs like crepesuzette.com. The language of spam sites appears not to be a regular language. Although there are so many poker sites this month that filtering for that might help.

    (thanks for the monkeyfilter post, js - didn’t see it till I checked my referrers.)


  11. Well, obviously, 300 hits listing an obvious spammer site as referrer are aimed at sites which automatically link to top referring sites. But I have never had such automatic links, and I don’t see that kind of spam.

    Rather, in my referrer log, I find plausible-looking URLs which turn out to point to pages of nothing but banner ads. Presumably the advertisers pay per impression, so (unless images are disabled) you’re enriching the spammer just by following the link.


  12. Try the de-spamming service Project Honey Pot. You can sign up for free. I did; shazam, spam gone.


  13. Referrer spam is an attempt to game Google’s PageRank system. The more links you have out there, the higher your site gets listed in Google’s returns. Or so the theory goes. Spammers don’t care about your site or who follows their link (or even if they follow it); they only care that Google’s robots register it. Easiest way to get your links spread all across the web: put them in blog comments. Quick, easy, don’t have to ask anyone.

    Google, working with Six Apart, LiveJournal and others, recently came up with the NoFollow concept. Basically, adding the rel="nofollow" attribute to a link tells Googlebot “that a particular link shouldn’t be factored into their PageRank calculations.” So, that’s one option.

    The other option is a bit painful, but it certainly worked wonders for our site: trash Movable Type and go with something like Textpattern. I was forced to do so when my hosting provider announced they would no longer allow MT on their servers, thanks mainly to comment spamming. After this forced switch, I’m pleased to say that we have not received one comment spam in the one year we’ve been using Textpattern, which I also think is easier to use.

    FWIW.
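The NoFollow idea in the last comment is applied per link, as a rel="nofollow" attribute on the anchor tag. A minimal sketch of how a blog might render commenter-supplied links that way, assuming a hypothetical helper named `render_comment_link` (no blogging tool in this thread is known to use this exact function):

```python
from html import escape

def render_comment_link(url, text):
    """Build an anchor tag for an untrusted link, marked rel="nofollow"."""
    # Escape both values so a commenter cannot break out of the attribute
    # or inject markup through the link text.
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )
```

This only removes the search-ranking incentive; it does nothing about the bandwidth the spam requests themselves consume.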