BETA Part 3: Adding PurpleAir AQI (and the Wood Stove Surprise)

I added air quality data to BETA for wildfire smoke season. What I did not expect was to immediately discover a wood stove problem in the Skykomish valley. More on that in a minute.

Why AQI for a climbing tool

PNW wildfire season has gotten worse every year. Driving an hour to the crag only to spend the day breathing 180 AQI air is a real thing that happens, especially on the east side of the Cascades. I’d personally forgotten to check AQI before a trip more than once, so I wanted it surfaced automatically alongside the weather data.

PurpleAir was the obvious choice. Crowd-sourced sensors, global coverage, reasonably documented API, free read key available on request.

The API approach

PurpleAir’s v1 API has a /sensors endpoint that accepts a bounding box query. Rather than hardcoding sensor IDs (which would require manual updates every time sensors go offline or new ones appear), I query a box around each crag’s coordinates and pick the nearest outdoor sensor.

Think of it like casting a net and keeping the closest catch.

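import json
import math
import os
import urllib.parse
import urllib.request

PURPLEAIR_URL    = "https://api.purpleair.com/v1/sensors"
PURPLEAIR_MAX_KM = 30  # smoke is regional; 30km is still meaningful
# Assumption: the read key is supplied via an environment variable.
PURPLEAIR_API_KEY = os.environ.get("PURPLEAIR_API_KEY", "")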

def fetch_aqi(lat: float, lng: float) -> dict | None:
    if not PURPLEAIR_API_KEY:
        return None

    # ~0.27 deg lat ≈ 30km; scale lng by cos(lat)
    pad_lat = 0.27
    pad_lng = pad_lat / math.cos(math.radians(lat))

    params = {
        "fields":        "pm2.5_atm,latitude,longitude,last_seen,name",
        "location_type": "0",     # outdoor sensors only
        "max_age":       "3600",  # must have reported in last hour
        "nwlat":  round(lat + pad_lat, 6),
        "nwlng":  round(lng - pad_lng, 6),
        "selat":  round(lat - pad_lat, 6),
        "selng":  round(lng + pad_lng, 6),
    }

    url = PURPLEAIR_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"X-API-Key": PURPLEAIR_API_KEY})

    try:
        with urllib.request.urlopen(req, timeout=8) as resp:
            data = json.loads(resp.read().decode())
    except Exception as e:
        print(f"  ⚠️  PurpleAir API error: {e}")
        return None

    fields  = data.get("fields", [])
    sensors = data.get("data", [])

    if not sensors:
        return None

    i_pm   = fields.index("pm2.5_atm")
    i_lat  = fields.index("latitude")
    i_lng  = fields.index("longitude")
    i_name = fields.index("name")

    best      = None
    best_dist = float("inf")

    for row in sensors:
        s_pm = row[i_pm]
        if s_pm is None or s_pm < 0:
            continue

        dist = _haversine_km(lat, lng, row[i_lat], row[i_lng])
        if dist < best_dist and dist <= PURPLEAIR_MAX_KM:
            best_dist = dist
            best      = (s_pm, row[i_name], dist)

    if not best:
        return None

    pm25, sensor_name, distance_km = best
    aqi, category = _pm25_to_aqi(pm25)

    return {
        "aqi":         aqi,
        "category":    category,
        "pm25":        round(pm25, 1),
        "sensor_name": sensor_name,
        "distance_km": round(distance_km, 1),
    }

A few design decisions worth unpacking:

Bounding box over radius. The API doesn’t support a true radius query; you give it a rectangular box. I pad by ~0.27 degrees of latitude (≈30km) and widen the longitude padding by dividing it by cos(lat), because a degree of longitude covers less ground at higher latitudes. Then I apply the actual radius filter in Python using Haversine distance.

30km max radius. For wildfire smoke this is actually fine — smoke events are regional. A sensor 25km away is going to show you the same smoke event. For hyperlocal stuff like the inversion layer scenario below, it’s less reliable, which is why I surface the distance to the user.

max_age=3600. Only returns sensors that have reported in the last hour. Without this, you can get readings from sensors that went offline weeks ago. I’d rather return null than stale data.
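The request already asks for last_seen, so the same freshness rule can be rechecked on the client. A sketch only, assuming last_seen is a UNIX timestamp in seconds; `is_fresh` and `MAX_AGE_S` are hypothetical names, not part of BETA.

```python
import time

MAX_AGE_S = 3600  # hypothetical constant mirroring the max_age query parameter


def is_fresh(last_seen: float, now=None, max_age_s: int = MAX_AGE_S) -> bool:
    """True if the reading is no older than max_age_s seconds.

    Assumes last_seen is a UNIX timestamp in seconds.
    """
    if now is None:
        now = time.time()
    return now - last_seen <= max_age_s
```

Passing `now` explicitly keeps the check easy to test; in the pipeline it would default to the current time.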

Field index mapping. PurpleAir returns data as arrays (not objects) for efficiency, with a separate fields array that maps column names to indices. I dynamically find each field’s index rather than hardcoding positions — if they add or reorder fields, nothing breaks.
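An equivalent way to express the same mapping is to zip each row against the fields array. The payload below is made up for illustration (sensor names and values are hypothetical), and `rows_to_dicts` is not a function from the BETA code.

```python
def rows_to_dicts(payload: dict) -> list:
    """Pair each data row with the column names from the fields array."""
    fields = payload.get("fields", [])
    return [dict(zip(fields, row)) for row in payload.get("data", [])]


# Hypothetical response shape: column names once, then one array per sensor.
sample = {
    "fields": ["sensor_index", "name", "latitude", "longitude", "pm2.5_atm"],
    "data": [
        [1001, "Hypothetical Sensor A", 47.82, -121.55, 8.4],
        [1002, "Hypothetical Sensor B", 47.71, -121.36, 41.0],
    ],
}

rows = rows_to_dicts(sample)

# Same data with the columns in a different order gives the same dicts.
reordered = {
    "fields": ["pm2.5_atm", "name", "sensor_index", "longitude", "latitude"],
    "data": [
        [8.4, "Hypothetical Sensor A", 1001, -121.55, 47.82],
        [41.0, "Hypothetical Sensor B", 1002, -121.36, 47.71],
    ],
}
rows_reordered = rows_to_dicts(reordered)
```

This trades a little per-row overhead for readability; the index lookup in fetch_aqi does the same job with less allocation.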

The EPA PM2.5 to AQI conversion

PurpleAir gives you raw PM2.5 in µg/m³. Converting to AQI requires the EPA’s breakpoint table — linear interpolation within 7 bands:

def _pm25_to_aqi(pm: float) -> tuple[int, str]:
    breakpoints = [
        (0.0,   12.0,   0,   50,  "Good"),
        (12.1,  35.4,  51,  100,  "Moderate"),
        (35.5,  55.4, 101,  150,  "Unhealthy for Sensitive Groups"),
        (55.5, 150.4, 151,  200,  "Unhealthy"),
        (150.5, 250.4, 201, 300,  "Very Unhealthy"),
        (250.5, 350.4, 301, 400,  "Hazardous"),
        (350.5, 500.4, 401, 500,  "Hazardous"),
    ]
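    # EPA truncates PM2.5 to 0.1 µg/m³ before the lookup. Without this, a
    # reading like 12.05 falls in the gap between bands and hits the fallback.
    pm = math.floor(max(pm, 0.0) * 10) / 10
    for pm_lo, pm_hi, aqi_lo, aqi_hi, label in breakpoints: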
        if pm_lo <= pm <= pm_hi:
            aqi = round((aqi_hi - aqi_lo) / (pm_hi - pm_lo) * (pm - pm_lo) + aqi_lo)
            return aqi, label
    return 500, "Hazardous"

Standard linear interpolation within each band. Nothing fancy.
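To make the arithmetic concrete, here is one band worked through on its own. A small sketch: `interpolate_band` is a hypothetical stand-in that repeats the formula from the loop above, and 45.0 µg/m³ is an arbitrary example value.

```python
def interpolate_band(pm, pm_lo, pm_hi, aqi_lo, aqi_hi):
    """Linear interpolation of AQI within a single breakpoint band."""
    return round((aqi_hi - aqi_lo) / (pm_hi - pm_lo) * (pm - pm_lo) + aqi_lo)


# 45.0 µg/m³ sits in the 35.5-55.4 band, which maps to AQI 101-150:
#   (150 - 101) / (55.4 - 35.5) * (45.0 - 35.5) + 101 ≈ 124.4
mid_band = interpolate_band(45.0, 35.5, 55.4, 101, 150)    # 124
band_floor = interpolate_band(35.5, 35.5, 55.4, 101, 150)  # 101
```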

The thing I didn’t expect

First run with the key set, Index Town Wall comes back AQI 120. Exit 38 at 115. It’s a rainy day in March — no wildfire anywhere near Washington.

Took me a minute to figure out what was happening. Cold inversion layer. When it’s wet and cold in the Skykomish valley, dense cold air gets trapped under a warmer layer above. The mixing that normally disperses particulates isn’t happening. Everyone in the valley fires up their wood stove. Smoke sits at crag level.

It looks identical to wildfire smoke in the data. PM2.5 is PM2.5. But the cause is completely different. Real data, unexpected insight.

I ended up writing a field note about it for the BETA audience since climbers don’t necessarily think about inversion layers.

Coverage reality

PurpleAir is crowd-sourced, so coverage is uneven. Leavenworth has a sensor 0.7km from the crags — basically local. Some of the more remote crags pull from sensors 20km out.

For wildfire smoke that’s fine. For the wood stove scenario, a sensor 20km away might be in a different microclimate entirely. I surface the sensor name and distance in the UI so users can factor that in themselves.
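One way that display could be put together from the dict fetch_aqi returns. This is a sketch, not the BETA UI code: `describe_reading` is a hypothetical name and the 10 km "far" threshold is an arbitrary assumption.

```python
def describe_reading(reading, far_km: float = 10.0) -> str:
    """Format an AQI reading for display, flagging distant sensors.

    `reading` is a dict shaped like fetch_aqi's return value, or None.
    The far_km threshold is an assumption chosen for illustration.
    """
    if reading is None:
        return "AQI: no recent sensor in range"
    line = (f"AQI {reading['aqi']} ({reading['category']}) from "
            f"{reading['sensor_name']}, {reading['distance_km']} km away")
    if reading["distance_km"] > far_km:
        line += " (distant sensor; local conditions may differ)"
    return line


# Hypothetical reading, shaped like fetch_aqi's output.
example = {
    "aqi": 120,
    "category": "Unhealthy for Sensitive Groups",
    "pm25": 43.2,
    "sensor_name": "Hypothetical Valley Sensor",
    "distance_km": 20.0,
}
text = describe_reading(example)
```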

Out of 12 crags, 11 got sensor hits on the first run. Miller River was the surprise: I expected it to come back null, but there’s a sensor 5km out that covers it. The only consistent nulls are the two BC crags (Squamish), which return nothing even though PurpleAir does cover Canada; BETA handles the null cleanly.

Full pipeline code in the BETA repo. The tool is at beta.trenigma.dev.

Part 3 of 3 in BETA: Building a Climbing Conditions Pipeline ← Part 2 — Wiring in a USGS River Gauge

Thanks for reading