Building Gatehouse: A REST API for OpenBSD's PF Firewall

Every operation on my home firewall used to start with ssh gw and end with pfctl -f /etc/pf.conf. It works — but it doesn’t scale when you want to toggle YouTube blocking from your phone, give family members limited control, or automate device-level parental controls.

So I built Gatehouse: a REST API that wraps PF operations in authenticated, rate-limited HTTP endpoints.

The Setup

I run a multi-VLAN home network on OpenBSD 7.8 with six network segments, content filtering, device-level blocking, and a growing collection of PF anchors. The configuration lives in /etc/pf.d/ across dozens of files, managed through a Makefile with 80+ targets. It works well from a terminal — but I wanted something I could hit from a browser or a script.

Architecture Overview

Gatehouse sits between clients and the PF firewall. It doesn’t replace pfctl — it wraps it. Every operation ultimately calls a system command (pfctl, tcpdump, make, shell scripts) through a centralised executor, validates inputs, and returns structured JSON.

┌─────────────┐    HTTPS/TLS    ┌────────────────────┐  HTTP   ┌───────────────┐
│   Browser   │ ──────────────► │       relayd       │ ──────► │   Gatehouse   │
│   Script    │   :8443 (TLS)   │ (TLS termination)  │  :8080  │   (FastAPI)   │
│   App       │                 │                    │         │               │
└─────────────┘                 └────────────────────┘         └───────┬───────┘
                                                                       │
                                                          ┌────────────┼────────────┐
                                                          │            │            │
                                                     ┌────▼───┐  ┌────▼───┐  ┌────▼────┐
                                                     │ pfctl  │  │tcpdump │  │ scripts │
                                                     │  (PF)  │  │(pflog) │  │ (make)  │
                                                     └────────┘  └────────┘  └─────────┘

TLS Termination with relayd

The API listens on 127.0.0.1:8080 — localhost only. OpenBSD’s built-in relayd handles TLS termination and reverse proxying:

http protocol "gatehouse" {
    match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
    match request header append "X-Forwarded-Port" value "$REMOTE_PORT"
    match request header set "X-Forwarded-Proto" value "https"
    match response header remove "Server"
    tls keypair "gw.home.arpa"
    tcp { nodelay, socket buffer 65536 }
    websockets
}

relay "gatehouse-tls" {
    listen on 192.168.1.2 port 8443 tls
    forward to <gatehouse> port 8080
}

The result: TLS without baking certificate handling into the application, WebSocket proxying for live monitoring, and automatic X-Forwarded-For headers for audit logging.

Service Isolation

Gatehouse runs as the _gatehouse system user via OpenBSD’s rc.d framework:

daemon="/usr/local/share/gatehouse/venv/bin/python3"
daemon_flags="-m uvicorn app:app --host 127.0.0.1 --port 8080 --log-level info"
daemon_execdir="/usr/local/share/gatehouse"
daemon_user="_gatehouse"
daemon_logger="daemon.info"

The _gatehouse user has doas permissions for exactly the commands it needs — pfctl, tcpdump, rcctl, crontab, and make in /etc/pf.d. Nothing else.
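The article doesn't show the doas.conf, but entries of roughly this shape would grant that access (paths and flags here are assumptions — a real config would likely pin specific arguments as well):

```
permit nopass _gatehouse as root cmd /sbin/pfctl
permit nopass _gatehouse as root cmd /usr/sbin/tcpdump
permit nopass _gatehouse as root cmd /usr/sbin/rcctl
permit nopass _gatehouse as root cmd /usr/bin/crontab
permit nopass _gatehouse as root cmd /usr/bin/make
```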

The Technology Stack

  • Python 3 + FastAPI — async-capable, automatic OpenAPI docs, Pydantic validation

  • Uvicorn — ASGI server

  • python-jose — JWT token creation and validation

  • passlib — bcrypt password hashing (original auth)

  • ctypes — BSD auth_userokay() integration (current auth)

  • relayd — TLS reverse proxy (native OpenBSD)

FastAPI was chosen because it generates Swagger documentation at /docs automatically. Every endpoint, request body, and response model is documented without writing a single line of API docs.

Authentication: From JSON Files to BSD Auth

The authentication system went through an evolution during development.

Version 1: JSON User Database

The initial implementation stored users in a users.json file with bcrypt-hashed passwords:

{
  "admin": {
    "password_hash": "$2b$12$..."
  }
}

A manage_users.py CLI handled user CRUD. Simple, portable, but it meant maintaining a separate user database alongside the system’s own accounts.

Version 2: Native BSD Authentication

The current version uses OpenBSD’s auth_userokay(3) — the same function that validates passwords for SSH, console login, and every other authentication on the system. One user database, one password policy, one source of truth.

import ctypes

_libc = ctypes.CDLL("libc.so")
_libc.auth_userokay.restype = ctypes.c_int
_libc.auth_userokay.argtypes = [
    ctypes.c_char_p,  # name
    ctypes.c_char_p,  # style
    ctypes.c_char_p,  # type
    ctypes.c_char_p,  # password (zeroed by callee)
]

def bsd_auth_verify(username: str, password: str) -> bool:
    pw_buf = ctypes.create_string_buffer(password.encode("utf-8"))
    result = _libc.auth_userokay(
        username.encode("utf-8"),
        None,   # style: use default from login.conf
        None,   # type: default
        pw_buf, # mutable buffer - auth_userokay zeroes it
    )
    return result != 0

The ctypes FFI call is straightforward — auth_userokay takes a username, optional style/type, and a mutable password buffer. It returns non-zero on success and zeroes the password buffer for security. No subprocess, no shelling out, just a direct C library call.

Role-Based Access Control

Roles map to OpenBSD system groups:

Group               Role       Access Level
gatehouseadmin      Admin      Full control — reload PF, modify rules, manage users
gatehouseoperator   Operator   Toggle features, manage tables and devices
gatehouseviewer     Viewer     Read-only — inspection, analytics, status

Adding a user to the appropriate group grants API access at that level. Standard Unix group management, no application-level user administration needed.

JWT Tokens

After authentication, clients receive a JWT valid for 60 minutes:

{
  "sub": "username",
  "role": "admin",
  "exp": 1708732800,
  "iat": 1708729200
}

Every subsequent request includes Authorization: Bearer <token>. The middleware validates the token, extracts the role, and enforces minimum role requirements per endpoint.

Rate Limiting

Two layers of rate limiting protect the login endpoint:

  • Per-IP: 5 attempts per 60 seconds — stops brute-force attacks from a single source

  • Per-account: 5 failures per 300 seconds — prevents credential stuffing against a known username

Both use in-memory sliding windows with automatic cleanup. No external dependencies, no Redis, no database — just timestamp lists in a dictionary.

The Command Executor

Every PF operation ultimately runs a system command. The executor centralises this with consistent error handling, timeout management, and logging:

class CommandResult:
    success: bool       # True if returncode == 0
    stdout: str         # Full stdout
    stderr: str         # Full stderr
    returncode: int     # Exit code (-1 on exception)

class PFExecutor:
    @staticmethod
    def run(cmd: list[str], timeout: int = 30) -> CommandResult:
        # subprocess.run with capture, timeout, logging

    @staticmethod
    def pfctl(*args: str) -> CommandResult:
        # Shorthand: pfctl("-sr") → run(["/sbin/pfctl", "-sr"])

    @staticmethod
    def run_script(script: str, *args: str) -> CommandResult:
        # Run from /etc/pf.d/scripts/

    @staticmethod
    def make(target: str, **kwargs: str) -> CommandResult:
        # Run make target in /etc/pf.d with variables

Every command is logged before execution. Failures log stderr. Exceptions are caught and returned as CommandResult with returncode=-1. Nothing raises — callers always get a result they can inspect.

This pattern makes the API predictable: every mutation endpoint returns {success, message, output}, where output contains the raw command output for debugging.
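A minimal version of the never-raises `run` method could look like this (the real executor adds logging; treat this as a sketch of the contract):

```python
import subprocess
from dataclasses import dataclass

@dataclass
class CommandResult:
    success: bool
    stdout: str
    stderr: str
    returncode: int

def run(cmd: list[str], timeout: int = 30) -> CommandResult:
    """Run a command; failures come back as a result, never an exception."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
        return CommandResult(proc.returncode == 0, proc.stdout,
                             proc.stderr, proc.returncode)
    except Exception as exc:  # timeout, missing binary, permission error...
        return CommandResult(False, "", str(exc), -1)
```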

API Endpoints: 66 Routes Across 9 Modules

The API is organised into nine controller modules, each handling a domain of PF management.

Health Check

GET /  →  {"status": "ok", "service": "gatehouse", "version": "1.0.0"}

Core Operations (6 endpoints)

The "big red buttons" — PF control operations that affect the entire firewall:

POST /core/check         # Syntax-check the production config
POST /core/reload        # Check + reload PF rules
POST /core/restart       # Full disable/enable cycle
POST /core/load-safe     # Load emergency fallback rules
POST /core/flush-reload  # Nuclear option: flush everything, reload
POST /core/backup        # Backup /etc/pf.d to timestamped directory

load-safe is the panic button. It loads a minimal ruleset that allows SSH from internal networks and basic outbound — enough to maintain access while you fix whatever went wrong.

Inspection (9 endpoints)

Read-only queries against the running PF state:

GET /inspection/rules              # pfctl -sr (filter rules)
GET /inspection/nat                # pfctl -sn (NAT rules)
GET /inspection/states             # pfctl -ss (connection states)
GET /inspection/stats              # pfctl -si (statistics)
GET /inspection/tables             # pfctl -sT (table list)
GET /inspection/tables/{name}      # pfctl -t <name> -T show
GET /inspection/anchors            # Status of all 16 anchors
GET /inspection/anchors/{name}     # pfctl -a <name> -sr
GET /inspection/device-block-status  # All 4 device blocking types

The anchors endpoint queries all 16 configured anchors and reports which have rules loaded. A single-call dashboard of what features are active.

Anchor/Feature Management (5 endpoints per feature)

This is where Gatehouse really shines — toggling PF features without editing config files or remembering pfctl syntax:

GET  /anchors/status              # All features
GET  /anchors/{feature}/status    # Single feature
POST /anchors/{feature}/enable    # Load anchor rules
POST /anchors/{feature}/disable   # Clear anchor rules
POST /anchors/{feature}/update    # Refresh IP lists

Sixteen features are mapped to their anchor names and config files. A sample:

Feature          Anchor                             What It Does
youtube          youtube-block                      Block YouTube on guest/IoT VLANs
fortnite         fortnite-block                     Block Fortnite/Epic Games
blackhole        blackhole                          Manual IP blacklisting
pfbadhost        pfbadhost                          Security feed blocklists
squid            squid-proxy                        Transparent HTTP/S proxy
vlan20           vlan20-isolated                    Isolate IoT VLAN
mdns             mdns                               mDNS/Bonjour service discovery
youtube-device   youtube-device-block               Per-device YouTube blocking
device-full      time-based/devices-full-block      Full internet blocking per device
device-services  time-based/devices-services-block  Service-specific blocking

The update endpoint runs IP resolution scripts — for example, resolving YouTube’s current IP ranges via DNS and updating the PF table.
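Under the hood, enable and disable map naturally onto pfctl's anchor flags (`-a <anchor> -f <file>` to load, `-a <anchor> -F rules` to flush). A sketch of the command construction — the anchor names are from the table above, but the `/etc/pf.d` file paths are assumptions:

```python
# Feature -> (anchor name, rules file). Paths are illustrative guesses.
FEATURES = {
    "youtube": ("youtube-block", "/etc/pf.d/anchors/youtube-block.conf"),
    "fortnite": ("fortnite-block", "/etc/pf.d/anchors/fortnite-block.conf"),
}

def enable_cmd(feature: str) -> list[str]:
    anchor, conf = FEATURES[feature]
    # Load the anchor's rules from its config file.
    return ["/sbin/pfctl", "-a", anchor, "-f", conf]

def disable_cmd(feature: str) -> list[str]:
    anchor, _ = FEATURES[feature]
    # Flush the rules out of the anchor; the main ruleset is untouched.
    return ["/sbin/pfctl", "-a", anchor, "-F", "rules"]
```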

Table Management (12 endpoints)

Direct manipulation of PF tables — the data structures that hold IP lists:

Blackhole blocking:

GET    /tables/blackhole            # List blackholed IPs
POST   /tables/blackhole            # Add IP (with RFC1918 safety check)
DELETE /tables/blackhole            # Remove IP
POST   /tables/blackhole/flush      # Clear auto-detected IPs
POST   /tables/blackhole/auto-scan  # Scan pflog for nuisance IPs

FTSO RPC whitelist (dual-table system):

GET    /tables/ftso-rpc             # List permanent + temporary IPs
POST   /tables/ftso-rpc/permanent   # Add permanent (file-backed)
DELETE /tables/ftso-rpc/permanent   # Remove permanent
POST   /tables/ftso-rpc/temporary   # Add temporary (memory only)
DELETE /tables/ftso-rpc/temporary   # Remove temporary
POST   /tables/ftso-rpc/flush-temp  # Clear all temporary

The dual-table pattern — permanent (file-backed) plus temporary (memory-only) — appears throughout the API. Permanent entries survive PF reloads because they’re written to .ips files that PF reads on startup. Temporary entries exist only in kernel memory and vanish on reload, making them perfect for testing.
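The write paths differ only in whether the `.ips` file is touched. A sketch (table name and file path are illustrative):

```python
def add_permanent(table: str, ip: str, ips_file: str) -> list[list[str]]:
    """Permanent: append to the file PF reads on startup, then load it live."""
    with open(ips_file, "a") as f:
        f.write(ip + "\n")
    return [["/sbin/pfctl", "-t", table, "-T", "add", ip]]

def add_temporary(table: str, ip: str) -> list[list[str]]:
    """Temporary: kernel memory only -- gone after the next PF reload."""
    return [["/sbin/pfctl", "-t", table, "-T", "add", ip]]
```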

Device Blocking (10 endpoints per type)

Four independent device blocking modes, each with identical endpoint structure:

POST   /devices/{type}/enable       # Load blocking rules
POST   /devices/{type}/disable      # Unload rules
GET    /devices/{type}/list         # Show permanent + temporary IPs
POST   /devices/{type}/add          # Add device IP (permanent)
POST   /devices/{type}/add-temp     # Add device IP (temporary)
DELETE /devices/{type}/remove       # Remove from permanent
DELETE /devices/{type}/remove-temp  # Remove from temporary
POST   /devices/{type}/flush-temp   # Clear temporary list
POST   /devices/{type}/kill-states  # Kill active connections

Types: full-block, services-block, youtube-24x7, youtube-sched

Every add operation validates the IP address: it must be RFC1918 (private), properly formatted, and not a gateway IP. You can’t accidentally block the router.

The kill-states endpoint is crucial — adding an IP to a block table doesn’t affect existing connections. You need to explicitly kill the connection states for the block to take immediate effect.
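Per pfctl(8), `-k <host>` kills states whose source matches, and a second `-k` narrows by destination. A sketch of what a kill-states endpoint might issue for a blocked device (the exact invocations are an assumption):

```python
def kill_states_cmds(ip: str) -> list[list[str]]:
    """pfctl commands to drop a device's existing connections."""
    return [
        # States originating from the device...
        ["/sbin/pfctl", "-k", ip],
        # ...and states going to it from anywhere.
        ["/sbin/pfctl", "-k", "0.0.0.0/0", "-k", ip],
    ]
```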

Connection States (4 endpoints)

POST /states/kill           # Kill all states from an IP
POST /states/kill/youtube   # Kill YouTube connection states
POST /states/kill/fortnite  # Kill Fortnite states
POST /states/kill/netflix   # Kill Netflix states

Analytics (6 endpoints)

Parse /var/log/pflog for traffic analysis:

GET /analytics/summary     # Total blocked, protocol breakdown
GET /analytics/top-ips     # Top 20 blocked source IPs
GET /analytics/top-ports   # Top 20 blocked destination ports
GET /analytics/protocols   # Protocol distribution with percentages
GET /analytics/hourly      # 24-hour activity breakdown
GET /analytics/recent      # Last 50 blocked connections (raw)

These endpoints run tcpdump -n -e -ttt -r /var/log/pflog and parse the output. Not the fastest approach for large logs, but it works without any additional infrastructure and the pflog binary format ensures accurate packet data.

Cron Management (4 endpoints)

GET  /cron/status           # Status of all 5 scheduled jobs
GET  /cron/{job}/status     # Single job status
POST /cron/{job}/enable     # Add cron entries
POST /cron/{job}/disable    # Remove cron entries

Five managed cron jobs:

Job              Schedule          Purpose
device-block     6pm on / 9am off  Time-based device blocking
youtube-update   Sundays 2:30 AM   Refresh YouTube IP ranges
fortnite-update  Sundays 3:00 AM   Refresh Fortnite IPs
pfbadhost        Daily 3:00 AM     Update security blocklists
blackhole-auto   Every 15 min      Auto-scan pflog for nuisance IPs

Crontab management uses marker comments (# gatehouse: device-block) to identify managed entries. The API reads the crontab, adds/removes marked lines, and writes it back. No external cron libraries — just crontab -l and crontab -.
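One plausible shape for the marker bookkeeping — here the marker rides as a trailing shell comment on each managed line, which the cron command's shell ignores; the exact placement in Gatehouse is an assumption:

```python
MARKER = "# gatehouse: "

def enable_job(crontab: str, job: str, entries: list[str]) -> str:
    """Append the job's cron lines, each tagged with the marker comment."""
    lines = crontab.splitlines()
    lines += [f"{entry} {MARKER}{job}" for entry in entries]
    return "\n".join(lines) + "\n"

def disable_job(crontab: str, job: str) -> str:
    """Drop every line carrying this job's marker."""
    kept = [l for l in crontab.splitlines()
            if not l.endswith(MARKER + job)]
    return "\n".join(kept) + "\n"
```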

DHCP Management (6 endpoints)

GET  /dhcp/status           # Service running/enabled status
POST /dhcp/start            # rcctl start dhcpd
POST /dhcp/stop             # rcctl stop dhcpd
POST /dhcp/restart          # rcctl restart dhcpd
GET  /dhcp/leases           # Parse dhcpd.leases
GET  /dhcp/subnets          # Subnet utilisation from dhcpd.conf

Input Validation and Safety

Every user-provided IP address goes through validation before it touches PF:

def validate_device_ip(ip: str) -> tuple[bool, str]:
    if not validate_ip(ip):
        return False, "Invalid IP address format"
    if not is_rfc1918(ip):
        return False, "Only RFC1918 (private) addresses allowed"
    if is_gateway_ip(ip):
        return False, "Cannot block gateway IP addresses"
    return True, ""

Gateway IPs are hardcoded as protected — you physically cannot block your router through the API:

GATEWAY_IPS = {
    "192.168.1.2",    # Primary LAN gateway
    "192.168.3.2",    # Secondary LAN
    "192.168.110.2",  # Guest WiFi
    "192.168.120.2",  # IoT WiFi
    "192.168.130.2",  # Admin WiFi
}
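The helpers used by `validate_device_ip` aren't shown above; plausible implementations using the standard library's `ipaddress` module might look like this (the gateway set mirrors the one above):

```python
import ipaddress

_RFC1918 = [ipaddress.ip_network(n)
            for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

GATEWAY_IPS = frozenset({
    "192.168.1.2", "192.168.3.2", "192.168.110.2",
    "192.168.120.2", "192.168.130.2",
})

def validate_ip(ip: str) -> bool:
    """Well-formed IPv4 address?"""
    try:
        ipaddress.IPv4Address(ip)
        return True
    except ValueError:
        return False

def is_rfc1918(ip: str) -> bool:
    """Strictly inside one of the three RFC1918 networks."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in net for net in _RFC1918)

def is_gateway_ip(ip: str) -> bool:
    return ip in GATEWAY_IPS
```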

The blackhole endpoint adds an extra layer: by default, it refuses to blackhole RFC1918 addresses. You can override with force: true, but you have to be explicit about it.

Parsing PF Output

PF’s output is designed for humans, not machines. The parsers bridge that gap.

PF statistics (pfctl -si) produces lines like:

State Table                          Total             Rate
  current entries                        47
  searches                         12847293          234.2/s
  inserts                            89432            1.6/s

The parser extracts key-value pairs with regex and returns structured JSON.
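A sketch of that extraction — indented "name  value [rate/s]" lines become keys (regex and function name are illustrative):

```python
import re

# Matches lines like "  searches    12847293    234.2/s" (rate optional).
LINE_RE = re.compile(r"^\s+([a-z][a-z -]*?)\s{2,}(\d+)(?:\s+([\d.]+)/s)?\s*$")

def parse_pf_stats(text: str) -> dict[str, dict]:
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            name, total, rate = m.groups()
            stats[name.strip().replace(" ", "_")] = {
                "total": int(total),
                "rate": float(rate) if rate else None,
            }
    return stats
```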

pflog analysis is more involved — tcpdump output contains timestamps, interfaces, protocols, IPs, and ports in a format that varies by protocol:

2024-01-15 14:23:45.123456 rule 0/(match) block in on igc1: 203.0.113.5.44231 > 192.168.1.2.22: S

The analytics parsers use regex to extract source IPs, ports, protocols, and hours, then aggregate with Python’s Counter.
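For the line format shown above, that parsing might be sketched like so (the regex and function are illustrative, shaped to the example rather than copied from Gatehouse):

```python
import re
from collections import Counter

# Pull source IP, destination port, and hour-of-day out of one
# tcpdump-formatted pflog line like the example above.
PFLOG_RE = re.compile(
    r"\S+ (?P<hour>\d{2}):\d{2}:[\d.]+ rule \S+ block \w+ on \S+: "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):")

def parse_blocked(lines: list[str]):
    top_ips, top_ports, hourly = Counter(), Counter(), Counter()
    for line in lines:
        m = PFLOG_RE.search(line)
        if m:
            top_ips[m["src"]] += 1
            top_ports[int(m["dport"])] += 1
            hourly[int(m["hour"])] += 1
    return top_ips, top_ports, hourly
```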

The Modular Architecture

The codebase follows a controller-service pattern:

api/src/
├── auth/           # Authentication + JWT + rate limiting
├── core/           # PF check/reload/restart/backup
├── inspection/     # Read-only PF queries
├── anchors/        # Feature toggle management
├── tables/         # PF table operations
├── devices/        # Device-level blocking
├── states/         # Connection state management
├── analytics/      # pflog analysis
├── cron/           # Scheduled task management
├── dhcp/           # DHCP service management
└── common/
    ├── executor.py   # Command execution
    ├── parsers.py    # Output parsing
    └── validators.py # Input validation

Controllers handle HTTP concerns — route definitions, request parsing, response formatting, authentication guards. They’re thin wrappers that delegate to services.

Services contain business logic — input validation, command construction, output parsing, file I/O. They use the executor for all system commands and return structured results.

Common modules are shared utilities. The executor, parsers, and validators have no HTTP awareness — they’re pure functions that could be used from a CLI tool or test harness.

What I Learned

OpenBSD is a great API platform

The base system includes everything you need for a secure API deployment: relayd for TLS termination, rc.d for service management, doas for privilege escalation, crontab for scheduling, and auth_userokay for authentication. No third-party packages needed for infrastructure.

PF anchors are the perfect API primitive

PF anchors — named rule containers that can be loaded and unloaded at runtime — map perfectly to REST endpoints. POST /anchors/youtube/enable loads rules into the youtube-block anchor. POST /anchors/youtube/disable clears it. The main ruleset never changes, so there’s no risk of a syntax error taking down the firewall.

Dual tables solve the testing problem

The permanent + temporary table pattern appeared organically. When testing device blocking, I wanted to add an IP temporarily, verify it worked, then either make it permanent or remove it. File-backed tables persist across reloads; memory-only tables don’t. Both are first-class concepts in the API.

State killing is as important as rule loading

The first version of device blocking added IPs to tables but didn’t kill existing connection states. Blocked devices stayed connected for minutes until their TCP sessions naturally expired. Adding kill-states endpoints made blocks take effect immediately.

Parsing shell output is fragile but practical

Ideally, pfctl would have a --json flag. It doesn’t. Parsing text output with regex is fragile — a format change in a future OpenBSD release could break things. But it works today, the output formats have been stable for years, and the alternative (writing a C program that uses PF’s ioctl interface) is dramatically more complex for a home network API.

What’s Next

The Python implementation works, but it has deployment friction: a virtual environment, pip dependencies, Python runtime, and a multi-process ASGI server for what is fundamentally a single-binary problem.

The next step is a Go rewrite — same API surface, same endpoints, same JSON shapes — but compiled to a single static binary. No venv, no pip, no Python runtime. Just copy the binary and run it.

But that’s a story for another post.


Gatehouse is open-source and runs on OpenBSD 7.8. The PF configuration it manages handles six network segments with content filtering, device-level parental controls, security blocklists, and dynamic feature toggles — all controllable through 66 REST endpoints.
