Monitoring Web Page Changes with Python

There are times when I need to know that a web page has changed without actively watching it. That might be a policy page, documentation I rely on, pricing information, or even a competitor’s landing page. Checking these pages manually is easy at first, but it does not scale well. You forget to check. You check too late. Or you end up refreshing pages that never change.

What I wanted was something quiet and predictable. A small Python script that runs when I tell it to run, checks a page, remembers what it saw last time, and lets me know when something is different. No accounts. No third-party services. No dashboards. Just a local tool that does one job and does it well.

This article walks through building a simple web page change monitor using Python. The script fetches a page, generates a cryptographic fingerprint of its contents, compares it to the previous run, and sends a desktop notification when a change is detected. It is intentionally straightforward and easy to modify, which makes it useful long after you forget how it was originally written.

Why I Built This

Hosted monitoring services solve this problem, but they tend to introduce friction. Some require subscriptions. Others limit how often checks can run. Many store your monitoring data remotely. For something as basic as “tell me when this page changes,” that felt unnecessary.

I already run Linux everywhere. I already use cron. I already have Python available. That makes a local script the most direct solution. It also means I can inspect every line of code, adjust behavior when needed, and trust exactly what the script is doing.

This approach fits especially well if you already maintain servers, workstations, or automated tasks. It becomes just another small utility in your toolbox.

How the Monitor Thinks

The script follows a simple mental model. It does not attempt to understand HTML structure or semantic meaning. It treats the page as plain text and answers one question. Is this content the same as last time?

It accomplishes this by hashing the page contents using SHA-256. Hashes are extremely sensitive to change. Even a single character difference produces a completely different result. That makes them ideal for detecting updates without storing the entire page.

The script remembers the last hash it saw by writing it to a JSON file. On the next run, it calculates a new hash and compares the two. If they differ, a notification is sent and the new hash is saved.

That is the entire loop. Simple, reliable, and easy to reason about.
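
To make that sensitivity concrete, here is a minimal sketch that stands apart from the monitor itself. The fingerprint helper and the sample strings are my own illustration and do not appear in the script.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of a string (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two made-up page bodies that differ by a single character
before = fingerprint("Price: $10")
after = fingerprint("Price: $11")

print(before)
print(after)
print("changed" if before != after else "unchanged")
```

The two digests share no obvious resemblance even though the inputs are nearly identical, which is why comparing digests is enough to answer the "same as last time?" question.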

Prerequisites

This script assumes you are running Python 3 and that you are comfortable executing scripts from the command line. It also assumes a Linux environment where notify-send is available for desktop notifications.

The only external Python dependency is requests. If it is not installed already, it can be installed with pip.

pip install requests

Desktop notifications on most Linux systems are handled by libnotify. If notify-send is missing, install it using your package manager.

sudo apt install libnotify-bin

The script will still work without notify-send, but notifications will fall back to terminal output.

Breaking the Script Down

Before looking at the complete script, it helps to walk through it piece by piece. Each section builds on the previous one, and seeing the code in small pieces first makes the intent of the full script clearer.

Imports and Setup

The script begins with a standard Python shebang and a set of imports.

#!/usr/bin/env python3
import hashlib
import json
import os
import requests
import subprocess
import sys
from datetime import datetime
from pathlib import Path

Each import serves a specific purpose. hashlib is used to generate a SHA-256 hash of the page content. json handles configuration and state storage. os is used to check whether files exist. requests fetches the web page. subprocess runs notify-send for desktop alerts. sys reads command line arguments and exits the script when the config file is missing. datetime provides timestamps for the state file and notifications.

pathlib is imported but never used in the current version, so the import can be removed without affecting anything. It is useful if you later want to normalize paths or expand the script.

The PageMonitor Class

The entire script is wrapped inside a PageMonitor class.

class PageMonitor:
    def __init__(self, config_file='page_monitor_config.json'):
        self.config_file = config_file
        self.config = self.load_config()
        self.state_file = self.config.get('state_file', 'page_monitor_state.json')

Using a class keeps everything organized and makes it easier to extend the script later. When the class is initialized, it loads the configuration file and determines which state file should be used. If the config does not specify a state file, a default name is used.

This design allows multiple monitors to run side by side using different configuration files without modifying the script.

Loading the Configuration

The configuration file is mandatory. If it does not exist, the script creates an example and exits.

    def load_config(self):
        """Load configuration from JSON file"""
        if not os.path.exists(self.config_file):
            print(f"Error: Config file '{self.config_file}' not found")
            print("Creating example config file...")
            self.create_example_config()
            sys.exit(1)

        with open(self.config_file, 'r') as f:
            return json.load(f)

This behavior is intentional. It prevents silent misconfiguration and ensures that the script always runs with explicit settings. Creating the example config also provides documentation by example.

Creating an Example Config

When the config file is missing, this method writes a usable template.

    def create_example_config(self):
        """Create an example configuration file"""
        example_config = {
            "url": "https://example.com",
            "notification": {
                "enabled": True,
                "title": "Page Change Detected",
                "urgency": "normal",
                "timeout": 10000
            },
            "state_file": "page_monitor_state.json",
            "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }

        with open(self.config_file, 'w') as f:
            json.dump(example_config, f, indent=4)

        print(f"Example config created: {self.config_file}")
        print("Please edit this file with your settings")

The url field is the page being monitored. The notification section controls desktop alerts. The state_file determines where the last known hash is stored. The user_agent helps avoid being blocked by sites that reject default HTTP clients.
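
The script itself trusts whatever is in the config file. If you want a sanity check before the first fetch, something like the following sketch could work. The validate_config function is hypothetical and not part of the script. The urgency values are the three levels notify-send accepts, and the timeout is in milliseconds.

```python
# Urgency levels accepted by notify-send's -u option
VALID_URGENCIES = {"low", "normal", "critical"}


def validate_config(config):
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []

    url = config.get("url", "")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        problems.append("url must start with http:// or https://")

    notification = config.get("notification", {})
    urgency = notification.get("urgency", "normal")
    if urgency not in VALID_URGENCIES:
        problems.append(f"urgency must be one of {sorted(VALID_URGENCIES)}")

    timeout = notification.get("timeout", 10000)
    # bool is a subclass of int, so reject it explicitly
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout < 0:
        problems.append("timeout must be a non-negative integer (milliseconds)")

    return problems


# Example: a config with three mistakes
bad = {"url": "ftp://example.com", "notification": {"urgency": "loud", "timeout": -1}}
for problem in validate_config(bad):
    print(problem)
```

Returning a list instead of raising on the first error means every mistake can be reported in one pass.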

Fetching the Page Content

Once configured, the script retrieves the page contents.

    def get_page_content(self):
        """Fetch the web page content"""
        url = self.config['url']
        headers = {
            'User-Agent': self.config.get('user_agent', 'Mozilla/5.0')
        }

        try:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
            return response.text
        except requests.RequestException as e:
            print(f"Error fetching page: {e}")
            return None

A timeout prevents the script from hanging indefinitely. raise_for_status turns 4xx and 5xx responses into exceptions, so an error page is never hashed and mistaken for a content change. The method returns the page content as a string or None if something goes wrong.

Hashing the Content

The next step is generating a fingerprint of the page.

    def calculate_hash(self, content):
        """Calculate SHA-256 hash of content"""
        return hashlib.sha256(content.encode('utf-8')).hexdigest()

This single line is the core of change detection. Any difference in content produces a different hash. There is no need to store the page itself, only the hash.

Loading and Saving State

The script remembers the last hash it saw by writing it to disk.

    def load_state(self):
        """Load previous state from file"""
        if not os.path.exists(self.state_file):
            return None

        try:
            with open(self.state_file, 'r') as f:
                return json.load(f)
        except json.JSONDecodeError:
            return None

If the state file does not exist or is corrupted, the script treats the run as a first execution.

Saving state writes the new hash along with a timestamp.

    def save_state(self, content_hash):
        """Save current state to file"""
        state = {
            'hash': content_hash,
            'timestamp': datetime.now().isoformat(),
            'url': self.config['url']
        }

        with open(self.state_file, 'w') as f:
            json.dump(state, f, indent=4)

This file becomes the point of comparison for future runs.
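
One optional hardening step, which the script does not do, is to write the state file atomically so that an interrupted run cannot leave a half-written file behind. The sketch below assumes a standalone function named save_state_atomically (my name, not the script's) and relies on os.replace to swap the finished file into place.

```python
import json
import os
import tempfile
from datetime import datetime


def save_state_atomically(state_file, content_hash, url):
    """Write state to a temp file, then rename it over the real one."""
    state = {
        "hash": content_hash,
        "timestamp": datetime.now().isoformat(),
        "url": url,
    }

    # The temp file must live in the same directory so the rename stays
    # on one filesystem
    directory = os.path.dirname(os.path.abspath(state_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=4)
        os.replace(tmp_path, state_file)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this in place, a reader of the state file sees either the old contents or the new contents, never a partial write.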

Sending Notifications

When a change is detected, the script sends a desktop notification.

    def send_notification(self, old_hash, new_hash):
        """Send desktop notification about the change"""
        if not self.config.get('notification', {}).get('enabled', True):
            return

        notification_config = self.config.get('notification', {})
        title = notification_config.get('title', 'Page Change Detected')
        urgency = notification_config.get('urgency', 'normal')
        timeout = notification_config.get('timeout', 10000)

        # astimezone() attaches the local timezone, so %Z prints its actual name
        now = datetime.now().astimezone()
        message = f"{self.config['url']}\n\nThe page content has changed at {now.strftime('%I:%M %p %Z')}"

        try:
            # Use notify-send for desktop notifications
            subprocess.run([
                'notify-send',
                '-u', urgency,
                '-t', str(timeout),
                title,
                message
            ], check=True)
            print("Desktop notification sent successfully")
        except subprocess.CalledProcessError as e:
            print(f"Error sending notification: {e}")
        except FileNotFoundError:
            print("notify-send not found. Install libnotify-bin package.")
            print(f"Change detected at: {now.strftime('%Y-%m-%d %I:%M:%S %p %Z')}")

If notify-send is missing, the script falls back to terminal output. This makes it usable on servers without a graphical environment.
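
An alternative to catching FileNotFoundError is to check for the binary up front with shutil.which. The following is only a sketch of that idea. The build_notify_command and notify functions, and the binary parameter, are hypothetical names I introduced for illustration.

```python
import shutil
import subprocess


def build_notify_command(title, message, urgency="normal", timeout=10000,
                         binary="notify-send"):
    """Assemble the same argument list the script passes to notify-send."""
    return [binary, "-u", urgency, "-t", str(timeout), title, message]


def notify(title, message, binary="notify-send", **options):
    """Send a desktop notification, or print to the terminal as a fallback.

    Returns True only if the notifier ran and exited successfully.
    """
    if shutil.which(binary) is None:
        # No notifier on PATH: fall back to plain terminal output
        print(f"{title}: {message}")
        return False

    command = build_notify_command(title, message, binary=binary, **options)
    result = subprocess.run(command, check=False)
    if result.returncode != 0:
        # The notifier exists but failed, for example with no display
        print(f"{title}: {message}")
        return False
    return True
```

Keeping the command construction in its own function makes it easy to inspect what would be run without actually sending anything.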

Checking for Changes

This method ties everything together.

    def check_for_changes(self):
        """Main monitoring logic"""
        print(f"Checking page: {self.config['url']}")

        content = self.get_page_content()
        if content is None:
            print("Failed to fetch page content")
            return

        current_hash = self.calculate_hash(content)
        previous_state = self.load_state()

        if previous_state is None:
            print("First run - saving initial state")
            self.save_state(current_hash)
            return

        previous_hash = previous_state.get('hash')

        if current_hash != previous_hash:
            print("Change detected!")
            self.send_notification(previous_hash, current_hash)
            self.save_state(current_hash)
        else:
            print("No changes detected")

The first run saves the initial state and exits without sending a notification. Subsequent runs compare hashes and notify only when a change occurs.
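
The decision at the heart of this method can be isolated into a pure function, which makes the three possible outcomes easy to see and test. This is a sketch only. The classify_run name and its return strings are hypothetical and not part of the script.

```python
def classify_run(previous_state, current_hash):
    """Mirror the branching in check_for_changes without any I/O."""
    if previous_state is None:
        # No usable state file: treat this as the first run
        return "first_run"
    if previous_state.get("hash") != current_hash:
        return "changed"
    return "unchanged"


print(classify_run(None, "abc"))
print(classify_run({"hash": "abc"}, "abc"))
print(classify_run({"hash": "abc"}, "xyz"))
```

One detail worth noticing: a state file that parses as JSON but has no hash key is reported as changed, because the stored value comes back as None and never equals a real digest.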

main() and Command Line Support

This is the entry point that makes the script convenient to run.

def main():
    config_file = 'page_monitor_config.json'

    if len(sys.argv) > 1:
        config_file = sys.argv[1]

    monitor = PageMonitor(config_file)
    monitor.check_for_changes()


if __name__ == '__main__':
    main()

By default, it looks for page_monitor_config.json in the current directory. If I pass a filename as the first command line argument, it will use that instead. That small detail matters because it lets me run multiple monitors with different config files without copying the script.

The if __name__ == '__main__': guard ensures the script only runs when executed directly. If I ever import this file into another Python project, it will not automatically start monitoring, which keeps things clean and predictable.

The Complete Script

Here is the full script exactly as discussed above.

#!/usr/bin/env python3
import hashlib
import json
import os
import requests
import subprocess
import sys
from datetime import datetime
from pathlib import Path


class PageMonitor:
    def __init__(self, config_file='page_monitor_config.json'):
        self.config_file = config_file
        self.config = self.load_config()
        self.state_file = self.config.get('state_file', 'page_monitor_state.json')

    def load_config(self):
        """Load configuration from JSON file"""
        if not os.path.exists(self.config_file):
            print(f"Error: Config file '{self.config_file}' not found")
            print("Creating example config file...")
            self.create_example_config()
            sys.exit(1)

        with open(self.config_file, 'r') as f:
            return json.load(f)

    def create_example_config(self):
        """Create an example configuration file"""
        example_config = {
            "url": "https://example.com",
            "notification": {
                "enabled": True,
                "title": "Page Change Detected",
                "urgency": "normal",
                "timeout": 10000
            },
            "state_file": "page_monitor_state.json",
            "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }

        with open(self.config_file, 'w') as f:
            json.dump(example_config, f, indent=4)

        print(f"Example config created: {self.config_file}")
        print("Please edit this file with your settings")

    def get_page_content(self):
        """Fetch the web page content"""
        url = self.config['url']
        headers = {
            'User-Agent': self.config.get('user_agent', 'Mozilla/5.0')
        }

        try:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
            return response.text
        except requests.RequestException as e:
            print(f"Error fetching page: {e}")
            return None

    def calculate_hash(self, content):
        """Calculate SHA-256 hash of content"""
        return hashlib.sha256(content.encode('utf-8')).hexdigest()

    def load_state(self):
        """Load previous state from file"""
        if not os.path.exists(self.state_file):
            return None

        try:
            with open(self.state_file, 'r') as f:
                return json.load(f)
        except json.JSONDecodeError:
            return None

    def save_state(self, content_hash):
        """Save current state to file"""
        state = {
            'hash': content_hash,
            'timestamp': datetime.now().isoformat(),
            'url': self.config['url']
        }

        with open(self.state_file, 'w') as f:
            json.dump(state, f, indent=4)

    def send_notification(self, old_hash, new_hash):
        """Send desktop notification about the change"""
        if not self.config.get('notification', {}).get('enabled', True):
            return

        notification_config = self.config.get('notification', {})
        title = notification_config.get('title', 'Page Change Detected')
        urgency = notification_config.get('urgency', 'normal')
        timeout = notification_config.get('timeout', 10000)

        # astimezone() attaches the local timezone, so %Z prints its actual name
        now = datetime.now().astimezone()
        message = f"{self.config['url']}\n\nThe page content has changed at {now.strftime('%I:%M %p %Z')}"

        try:
            # Use notify-send for desktop notifications
            subprocess.run([
                'notify-send',
                '-u', urgency,
                '-t', str(timeout),
                title,
                message
            ], check=True)
            print("Desktop notification sent successfully")
        except subprocess.CalledProcessError as e:
            print(f"Error sending notification: {e}")
        except FileNotFoundError:
            print("notify-send not found. Install libnotify-bin package.")
            print(f"Change detected at: {now.strftime('%Y-%m-%d %I:%M:%S %p %Z')}")

    def check_for_changes(self):
        """Main monitoring logic"""
        print(f"Checking page: {self.config['url']}")

        content = self.get_page_content()
        if content is None:
            print("Failed to fetch page content")
            return

        current_hash = self.calculate_hash(content)
        previous_state = self.load_state()

        if previous_state is None:
            print("First run - saving initial state")
            self.save_state(current_hash)
            return

        previous_hash = previous_state.get('hash')

        if current_hash != previous_hash:
            print("Change detected!")
            self.send_notification(previous_hash, current_hash)
            self.save_state(current_hash)
        else:
            print("No changes detected")


def main():
    config_file = 'page_monitor_config.json'

    if len(sys.argv) > 1:
        config_file = sys.argv[1]

    monitor = PageMonitor(config_file)
    monitor.check_for_changes()


if __name__ == '__main__':
    main()

Running the Monitor Automatically

Once the script is working, it becomes much more useful when run automatically. Cron is the simplest option. A job that runs every 10 or 15 minutes is usually sufficient.

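Running it from your own user's crontab is the right starting point for desktop notifications. Keep in mind that cron does not inherit your desktop session's environment, so notify-send may also need variables such as DISPLAY and DBUS_SESSION_BUS_ADDRESS set in the crontab before alerts appear. On a server, redirecting output to a log file is often enough.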

Dealing with Dynamic Pages

Some pages change constantly due to timestamps, ads, or tracking scripts. In those cases, hashing the entire page may produce frequent false positives. The simplest workaround is to modify the script to strip known dynamic sections before hashing.
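
As a rough sketch of that workaround, the function below removes a few kinds of noisy content with regular expressions before hashing. The stable_hash name and the specific patterns are my assumptions. Which sections are actually dynamic depends entirely on the page you are watching, so treat the pattern list as a starting point.

```python
import hashlib
import re

# Hypothetical patterns for content that tends to change on every request
DYNAMIC_PATTERNS = [
    re.compile(r"<script\b.*?</script>", re.DOTALL | re.IGNORECASE),  # inline scripts
    re.compile(r"<!--.*?-->", re.DOTALL),                             # HTML comments
    re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"),            # ISO-style timestamps
]


def stable_hash(html):
    """Hash the page after removing known dynamic sections."""
    for pattern in DYNAMIC_PATTERNS:
        html = pattern.sub("", html)
    # Collapse whitespace so pure formatting changes are ignored too
    html = " ".join(html.split())
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


page_a = "<p>Hello</p><script>var t = 1;</script><!-- built 2024-01-01 10:00:00 -->"
page_b = "<p>Hello</p><script>var t = 2;</script><!-- built 2024-01-02 11:30:00 -->"
print(stable_hash(page_a) == stable_hash(page_b))
```

Regular expressions are a blunt tool for HTML, but for stripping a handful of known patterns they are often good enough.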

Using BeautifulSoup to extract only specific elements is a natural extension if you need more control.
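
To show the idea without adding a dependency, the sketch below swaps in the standard library's html.parser in place of BeautifulSoup and hashes only the text inside one element, selected by its id attribute. The class and function names are hypothetical, and the approach assumes reasonably well-formed markup. Unclosed tags inside the target element would throw off the depth count, which is one reason a more forgiving parser like BeautifulSoup is attractive for real pages.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag and must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Collect text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def hash_element_text(html, target_id):
    """Hash only the whitespace-normalized text of one element."""
    parser = ElementTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


page_a = '<div id="ad">Ad 1</div><div id="main"><p>Price: <b>$10</b><br></p></div>'
page_b = '<div id="ad">Ad 2</div><div id="main"><p>Price: <b>$10</b><br></p></div>'
print(hash_element_text(page_a, "main") == hash_element_text(page_b, "main"))
```

Here the rotating ad outside the main element no longer affects the fingerprint, while a change to the price inside it still would.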

Why This Script Holds Up

This monitor works because it avoids unnecessary complexity. It does not attempt to interpret meaning. It does not depend on external services. It does not assume anything about the page structure beyond the fact that it is text.

That simplicity makes it reliable. It also makes it easy to adapt. Whether you run it once a day or every five minutes, the behavior remains predictable.

Tools like this rarely get attention once they are written, but they quietly solve problems for years. That is usually a sign that they were designed correctly.