Referrer spam

Have you ever wondered why Google Analytics reports thousands of page views, while Google AdSense shows only a few?

That happened to me. At first I thought that Google AdSense was mistakenly not counting my web site visitors.

Here is what I observed
Why do they do it?
How does it work?
How do I block it?
Why is referrer misspelled?

Here is what I observed

Log in to your Google Analytics account. In the Google Analytics page for your web site, choose Acquisition > All Traffic > Referrals:

Now select Referral Path as the Secondary dimension.

Notice who your referrals are. Mine looked like this:

The first one, from semalt.com, was obviously a crawler. I Googled the web site and found a lot of negative comments about it.

The next one, from buttons-for-website.com, could have been real users. I was quite surprised that they would have a link to my blog on their main page, so I went to visit the web site. As I was typing in their address, something caught my eye:

I thought, what is this referral spam?

Why do they do it?

Through the reports that my Google Analytics plugin sends to Google, they try to make it look as if a lot of traffic is coming from their sites, and thereby push their Google ranking higher.

How does it work?

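When you click a link on a web page, the browser adds a Referer header to the request, telling the server at the other end of the link which page you came from. The header is filled in by the client, so a bot can put any address it likes in it.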
You can observe this yourself if you are using the Google Chrome browser.
Press <Shift + Ctrl + I> (hold down the Shift and Ctrl keys while pressing the I key) to open the Developer Tools.

Click the Network tab.

Now run a Google search, or click a link, and watch the requests appear in the Network tab.

Click any of the requests listed on the left side. Under Headers on the right side, you should find the Referer (or referer) header.
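To see how little stands behind this header, here is a minimal Python sketch using the standard urllib.request module. It builds, but does not send, a GET request whose Referer claims the visitor came from buttons-for-website.com. The target URL is a placeholder (example.com) and the helper name build_request is my own; sending it would only take a call to urllib.request.urlopen, left out here so the sketch needs no network.

```python
import urllib.request

# Placeholder page standing in for your own blog post (hypothetical).
TARGET_URL = "http://example.com/my-blog-post"


def build_request(url: str, referer: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying an arbitrary Referer."""
    # Nothing verifies this value: the client simply claims where it came from.
    return urllib.request.Request(url, headers={"Referer": referer})


if __name__ == "__main__":
    req = build_request(TARGET_URL, "http://buttons-for-website.com/")
    print(req.get_method(), req.full_url)
    print("Referer:", req.get_header("Referer"))
    # urllib.request.urlopen(req) would actually send it.
```

The server receiving such a request sees exactly the same kind of Referer header that a browser would have sent after a real click.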

How do I block it?

I am using Apache version 2.2, and I was able to block these requests by adding the following to my httpd.conf file:

  RewriteEngine on
  RewriteCond %{HTTP_REFERER} darodar\.com [NC,OR]
  RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC,OR]
  RewriteCond %{HTTP_REFERER} semalt\.com [NC]
  RewriteRule .* - [F]

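RewriteEngine on switches on the mod_rewrite rewriting engine (mod_rewrite itself must be available, for example loaded with a LoadModule directive).
RewriteCond sets a condition that must be met for the following RewriteRule to apply. %{HTTP_REFERER} says to match on the Referer header of the request, and darodar\.com is the regular expression to match it against; the backslash makes the dot match a literal dot instead of any character.
RewriteRule .* - [F] applies to every requested URL (.*). The dash means that no substitution is made, and the F flag rejects the request.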

The items in square brackets at the end of the lines are flags. The ones used here are:
NC: No Case, meaning that character case is ignored when matching. Used in the RewriteCond statements.
OR: Combines this condition with the next RewriteCond using a logical OR instead of the default AND, so the rule applies if any one of the conditions matches. The last condition does not carry it.
F: Causes the server to respond with HTTP status code 403 Forbidden. Used in the RewriteRule statement.
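To make the NC, OR and F semantics concrete, here is a rough Python model of what the three RewriteCond lines and the RewriteRule decide for each request. It illustrates the matching logic only and is not how Apache is implemented; the function name status_for, and the use of 200 to mean "passed on to the site as usual", are my own choices.

```python
import re
from typing import Optional

# The same regular expressions as the three RewriteCond lines.
BLOCKED_REFERERS = [r"darodar\.com", r"buttons-for-website\.com", r"semalt\.com"]

# [NC]: compile each pattern case-insensitively.
CONDITIONS = [re.compile(p, re.IGNORECASE) for p in BLOCKED_REFERERS]


def status_for(referer: Optional[str]) -> int:
    """Return the status a request with this Referer would get.

    200 is a stand-in for "passed on to the site as usual".
    """
    # A missing header behaves like an empty string, which matches nothing.
    value = referer or ""
    # [OR]: one matching condition is enough. re.search looks anywhere
    # in the value, like an unanchored RewriteCond pattern.
    if any(cond.search(value) for cond in CONDITIONS):
        return 403  # [F]: 403 Forbidden
    return 200


if __name__ == "__main__":
    for ref in ["http://SEMALT.com/crawler.php", "https://www.google.com/", None]:
        print(ref, "->", status_for(ref))
```

Without NC, the upper-case SEMALT.com referer would slip through; without OR, all three patterns would have to match the same header at once, which never happens, so nothing would be blocked.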

You can see the full description of mod_rewrite, and how to use it here: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html

If you are using Microsoft Internet Information Services (IIS), there is a good description of how to create rewrite rules here: http://www.iis.net/learn/extensions/url-rewrite-module/using-the-url-rewrite-module

Basically, what you want is something like this in your web.config file (the IIS URL Rewrite module must be installed):

<?xml version="1.0" encoding="utf-8" ?>  
<configuration>  
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="block semalt referer" patternSyntax="Wildcard" stopProcessing="true">
          <match url="*" />
          <conditions>
            <add input="{HTTP_REFERER}" pattern="*semalt.com*" />
          </conditions>
          <action type="AbortRequest" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>  

Repeat the rule element, giving each copy a unique name, for each referrer you want to block.
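If you block more than a handful of referrers, copying the rule element by hand gets tedious. Below is a small Python sketch that writes the web.config from a list of domains using the standard xml.etree.ElementTree module. The domain list, the rule naming scheme and the *domain* wildcard pattern are my assumptions, not something IIS prescribes. The output contains nothing but the rewrite rules, so if your web.config already holds other settings, merge the generated rules element into it rather than overwriting the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of referrer domains to block; edit to suit your logs.
BLOCKED = ["semalt.com", "buttons-for-website.com", "darodar.com"]


def build_web_config(domains):
    """Return web.config text with one AbortRequest rule per domain."""
    config = ET.Element("configuration")
    web_server = ET.SubElement(config, "system.webServer")
    rules = ET.SubElement(ET.SubElement(web_server, "rewrite"), "rules")
    for domain in domains:
        rule = ET.SubElement(rules, "rule", {
            "name": f"block {domain} referer",  # each rule needs its own name
            "patternSyntax": "Wildcard",
            "stopProcessing": "true",
        })
        ET.SubElement(rule, "match", {"url": "*"})
        conditions = ET.SubElement(rule, "conditions")
        # Assumption: "*domain*" catches the domain anywhere in the Referer URL.
        ET.SubElement(conditions, "add", {
            "input": "{HTTP_REFERER}",
            "pattern": f"*{domain}*",
        })
        ET.SubElement(rule, "action", {"type": "AbortRequest"})
    ET.indent(config, space="  ")  # pretty-print; needs Python 3.9+
    body = ET.tostring(config, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_web_config(BLOCKED))
```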

Why is referrer misspelled?

When Phillip Hallam-Baker proposed the use of a Referer header, the name was misspelled. No one noticed until the header had been implemented on thousands of servers, and by then it was too late to change it.

Tim Berners-Lee was the head of the HTTP Working Group at the Internet Engineering Task Force when the mistake was documented and solidified. Mark Nottingham is the current chair.

You can find the Hypertext Transfer Protocol Working Group here: http://datatracker.ietf.org/wg/httpbis