How to Install and Automate Referrer Spam Protection on an NGINX Server via the CLI

Introduction

Referrer spam is a common nuisance that can pollute your web analytics and clog up your server logs. In this tutorial, I’ll demonstrate how to block referrer spam on your NGINX server by leveraging a community-maintained referrer spam list from Matomo. I’ll guide you through cloning the Matomo Referrer Spam List, processing it into NGINX configuration rules, and automating the update process every 14 days using systemd timers.


Audience & Environment

Important: This tutorial is intended for users with root-level access to their web server. It is specifically tailored for a Debian-based server running NGINX. If you are not comfortable with using the command line as the root user, or if your server does not use Debian or NGINX, you may need to adapt these instructions to suit your environment.


Prerequisites

Before you begin, ensure your server meets the following requirements:

  • Operating System: Debian 12 (or another Debian-based distribution)
  • Web Server: NGINX
  • User Privileges: Root access or sudo privileges
  • Required Packages:
    • nginx – The web server software.
    • git – For cloning the referrer spam list repository.
    • rsync – For synchronizing files safely.
    • systemd – For managing services and timers (comes pre-installed on Debian 12).
    • bash – The shell in which the script runs (typically installed by default).

To install any missing packages on Debian-based systems, you can run:

sudo apt-get update
sudo apt-get install git rsync
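
Before installing anything, it can help to see which of the required commands are actually missing. The following is a minimal sketch of such a check. The helper name missing_commands is made up for this example and is not part of any package. Run it as root or with sudo, since nginx may not be on a regular user's PATH.

```shell
#!/bin/bash
# Hypothetical helper: print each named command that is not found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
    return 0
}

# Check the tools this tutorial relies on; no output means nothing is missing.
missing_commands nginx git rsync systemctl
```

Any name that is printed can then be passed to apt-get install as shown above.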

Directory Structure and Permissions

For security, the referrer spam list will reside in /etc/nginx/referrer-spam-list. This directory is owned by root, ensuring that only privileged users can modify its contents. A typical listing of the directory might look like:

drwx------ 4 root root  4096 Mar 30 13:08 .
drwxr-xr-x 9 root root  4096 Mar 12 09:50 ..
drwxr-xr-x 8 root root  4096 Mar 30 13:08 .git
drwxr-xr-x 2 root root  4096 Mar 30 13:08 .github
-rw-r--r-- 1 root root   865 Mar 30 13:08 CONTRIBUTING.md
-rw-r--r-- 1 root root  4291 Mar 30 13:08 README.md
-rw-r--r-- 1 root root   254 Mar 30 13:08 composer.json
-rw-r--r-- 1 root root 37045 Mar 30 13:08 spammers.txt

While these permissions are typically sufficient (especially since /etc/nginx is not web-accessible), you might consider further restricting sensitive directories like .git and .github if desired.
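
If you do want to tighten .git and .github, something along these lines should work. This is only a sketch: the function name restrict_repo_metadata is hypothetical, and it simply removes group and other access from those two directories. Note that the update script's rsync -a copies permissions from each fresh clone, so as far as I can tell the change would need to be reapplied after every update (for example, by calling it at the end of the script).

```shell
#!/bin/bash
# Hypothetical helper: remove group/other access from .git and .github
# inside the given checkout, if those directories exist.
restrict_repo_metadata() {
    local repo_dir="$1" sub
    for sub in .git .github; do
        if [ -d "$repo_dir/$sub" ]; then
            chmod -R go-rwx "$repo_dir/$sub"
        fi
    done
}

# Usage (as root): restrict_repo_metadata /etc/nginx/referrer-spam-list
```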


The Update Script

We will create a Bash script named makespammer.sh that:

  1. Downloads the latest referrer spam list from GitHub.
  2. Processes the list to generate NGINX configuration rules.
  3. Tests and reloads the NGINX configuration if the test passes.

Handling Repository Updates Safely

Instead of deleting the current referrer spam list (which might leave your server unprotected if the clone fails), we’ll clone the repository into a temporary directory and use rsync to update the active directory only if the clone is successful.

The Script Code

Below is the complete script with inline comments explaining each section:

#!/bin/bash
# makespammer.sh
# This script updates the referrer spam list and configures NGINX to block spam referrers.

# Define the target directory for the referrer spam list
TARGET_DIR="/etc/nginx/referrer-spam-list"

# Create a temporary directory for safely cloning the repository
tmpdir=$(mktemp -d)

echo "Cloning the referrer spam list from GitHub..."
# Attempt to clone the repository into the temporary directory
if git clone https://github.com/matomo-org/referrer-spam-list.git "$tmpdir"; then
    echo "Repository cloned successfully."
    # Use rsync to synchronize the temporary directory with the target directory
    # The --delete flag removes files that no longer exist in the source, keeping the list in sync
    rsync -a --delete "$tmpdir/" "$TARGET_DIR/"
else
    echo "Failed to update the referrer spam list from remote repository."
    echo "Keeping the existing list to ensure protection."
fi

# Clean up the temporary directory
rm -rf "$tmpdir"

# Define the NGINX snippet file path
NGINX_SNIPPET="/etc/nginx/snippets/referer_spam.conf"

echo "Updating NGINX configuration snippet..."
# Clear the existing configuration snippet file to avoid duplicates
> "$NGINX_SNIPPET"

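# Process the spammers list and append one rule per domain to the snippet.
# sed escapes dots so they match literally in the NGINX regex; read -r keeps
# the backslashes intact, and empty lines are skipped so that no rule with an
# empty (match-everything) pattern is written.
sort -u "$TARGET_DIR/spammers.txt" | sed 's/\./\\./g' | while IFS= read -r host; do
    [ -n "$host" ] || continue
    echo "if (\$http_referer ~ '$host') { return 403; }" >> "$NGINX_SNIPPET"
done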

echo "Testing NGINX configuration..."
# Test the updated NGINX configuration and reload if the test passes
if nginx -t; then
    echo "NGINX configuration test passed. Reloading NGINX..."
    systemctl reload nginx
else
    echo "NGINX configuration test failed. Please review the changes."
fi

Key Points:

  • Safe Update:
    The script clones the repository into a temporary directory and then uses rsync to update the active directory. This avoids removing the current spam list if the clone fails.
  • NGINX Snippet Generation:
    It processes spammers.txt to generate NGINX configuration rules, escaping periods (.) for proper regex matching.
  • NGINX Reload:
    The script tests the configuration with nginx -t and reloads NGINX only if the test passes, so a configuration with syntax errors is never loaded into the running server.
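
To see what the rule generation step produces, you can try it on a single domain in isolation. The sketch below wraps the same sed escaping in a small function. The name make_rule is made up for this illustration and does not appear in the script.

```shell
#!/bin/bash
# Hypothetical helper: turn one domain into the NGINX rule the script writes.
make_rule() {
    local escaped
    # Escape every dot so it matches a literal "." in the NGINX regex.
    escaped=$(printf '%s\n' "$1" | sed 's/\./\\./g')
    printf "if (\$http_referer ~ '%s') { return 403; }\n" "$escaped"
}

make_rule "example.com"
# prints: if ($http_referer ~ 'example\.com') { return 403; }
```

Without the escaping, the dot would match any character, so a pattern meant for one domain could also match unrelated referrers.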

How the referer_spam.conf Snippet Is Included in NGINX

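To activate the referrer spam protection, you need to include the referer_spam.conf snippet inside each server block (or inside the server block of a shared config such as default.conf). The generated if directives are only valid in server or location context, so including the snippet directly in the http block will fail the configuration test.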

Here’s the line you should add inside your NGINX server block(s):

include snippets/referer_spam.conf;

Example:

server {
    listen 80;
    server_name example.com;

    include snippets/referer_spam.conf;

    location / {
        try_files $uri $uri/ =404;
    }
}
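
Once the snippet is included and NGINX has been reloaded, you can check the behavior from the command line by sending a request with a forged Referer header. This is a rough sketch: referer_status is a hypothetical helper name, and both URLs in the usage comment are placeholders that you would replace with your own site and a domain taken from spammers.txt.

```shell
#!/bin/bash
# Hypothetical helper: print the HTTP status code a URL returns when the
# request carries the given Referer header.
referer_status() {
    local url="$1" referer="$2"
    curl -s -o /dev/null -w '%{http_code}\n' -e "$referer" "$url"
}

# Usage (placeholders, replace with your site and a listed spam domain):
#   referer_status http://example.com/ http://spam-domain.example/
```

A referrer that matches one of the generated rules should produce 403, while a request with an unlisted referrer should return whatever status the site normally serves.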

Why This Is Useful

Using a snippet like referer_spam.conf is a modular and reusable approach that provides several benefits—especially when hosting multiple sites (virtual hosts) on a single server:

  • Centralized Management: You only need to maintain and update the spam-blocking logic in one place.
  • Consistency Across Sites: Ensures every domain or subdomain is equally protected from spam referrers.
  • Reduced Errors: Keeps your server blocks clean and easier to read, minimizing the chance of copy-paste mistakes.
  • Flexible Automation: The script regenerates the snippet in place, so each list update reaches every server block on the next reload.

In short, by using include snippets/referer_spam.conf; in your server blocks, you maintain clean, maintainable configs while benefiting from powerful and centralized spam protection logic.


Automating the Script with Systemd Timers

Manual updates can be error-prone, so we’ll automate the update process using systemd timers. This ensures that your spam protection is kept up-to-date every 14 days.

Step 1: Create the Service Unit

Create a file at /etc/systemd/system/referrer-spam-update.service with the following content:

[Unit]
Description=Update NGINX Referrer Spam List

[Service]
Type=oneshot
ExecStart=/usr/local/bin/makespammer.sh

Make sure to place your makespammer.sh script in /usr/local/bin/ and mark it as executable.
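
One way to copy the script into place and set the executable bit in a single step is the install command from coreutils. The sketch below assumes the script sits in your current directory. The function name install_update_script is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: copy the script into a bin directory with mode 0755.
# The destination defaults to /usr/local/bin, the path used by ExecStart.
install_update_script() {
    local src="$1" dest_dir="${2:-/usr/local/bin}"
    install -m 0755 "$src" "$dest_dir/makespammer.sh"
}

# Usage (as root, from the directory holding the script):
#   install_update_script ./makespammer.sh
```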

Step 2: Create the Timer Unit

Next, create the timer file at /etc/systemd/system/referrer-spam-update.timer:

[Unit]
Description=Run referrer spam update script every 14 days

[Timer]
OnBootSec=5min
OnUnitActiveSec=14d
Persistent=true

[Install]
WantedBy=timers.target

Timer Explanation:

  • OnBootSec=5min:
    The timer waits 5 minutes after boot before running the script.
  • OnUnitActiveSec=14d:
    The script is scheduled to run 14 days after the last execution.
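  • Persistent=true:
    According to the systemd.timer documentation, this setting only affects timers that use OnCalendar=. With the OnBootSec= and OnUnitActiveSec= triggers used here it has no effect; after downtime, it is OnBootSec=5min that runs the script shortly after the server comes back up.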

Step 3: Enable and Start the Timer

Reload systemd and enable the timer with:

sudo systemctl daemon-reload
sudo systemctl enable --now referrer-spam-update.timer

You can check the status and next run time by listing timers:

sudo systemctl list-timers --all

Conclusion

By following these steps, you now have a robust solution to protect your NGINX server from referrer spam. This tutorial has guided you through:

  • Downloading and integrating the Matomo Referrer Spam List into your server configuration.
  • Creating an update script that replaces the local list only when the clone from the remote repository succeeds.
  • Automating the update process using systemd timers to run the script every 14 days.

This setup is designed for administrators with root-level access on Debian-based servers running NGINX. With the required packages (nginx, git, rsync, and systemd) installed, your server will remain safeguarded against referrer spam, keeping your logs and analytics clean.

By William McGill

William McGill is an IT veteran and independent contractor with over 20 years of experience in technology, networking, and insurance. He blends tech expertise with real-world problem-solving, working across industries from flood insurance claims to system administration. While most of his writing focuses on tech, freelancing, and adapting to an ever-evolving digital landscape, he occasionally explores topics that simply spark his curiosity—because life isn’t just about work, but about being human.
