Let me tell you about the gauge that looked alive but wasn’t.

I was building BETA — a climbing conditions tool for Cascade crags — and I wanted river data for the Skykomish corridor. Index Town Wall and Miller River Boulders both sit in that drainage. When the Skykomish is running high and angry, the approaches are a mess regardless of what the weather’s doing. That’s useful signal.

USGS publishes real-time streamflow data through their National Water Information System, free, no API key. I found a gauge right near Miller River — 12132000 — that showed up on the map with an active status. Perfect location, exactly what I needed.

Wired it up, ran the pipeline, got nothing back.

Not an error. Not a timeout. Just… no data. Crickets.

Turns out that gauge has been inactive for years. It’s still on the map. It still shows a green “active” indicator. It returns a valid JSON response with the right structure — just empty values arrays inside. The USGS equivalent of a store that’s been closed for years but never took down the “Open” sign.

values = data["value"]["timeSeries"][0]["values"][0]["value"]
# returns [] — perfectly valid, completely useless

This is the kind of thing that makes you want to flip a table. But okay. I fell back to gauge 12134500 — Skykomish River near Gold Bar — about 10 miles downstream from Miller River. Not ideal, but close enough given the scale of what I’m measuring.


The pipeline

The USGS IV (instantaneous values) API is dead simple:

GET https://waterservices.usgs.gov/nwis/iv/?sites=12134500&parameterCd=00060&period=PT2H&format=json

parameterCd=00060 is discharge in cubic feet per second. period=PT2H gives me the last 2 hours of 15-minute interval readings — about 8 values. I only need the last two to determine trend.

import json
import urllib.parse
import urllib.request

USGS_URL = "https://waterservices.usgs.gov/nwis/iv/"

def fetch_streamflow(gauge_id: str) -> dict | None:
    params = {
        "sites":       gauge_id,
        "parameterCd": "00060",
        "period":      "PT2H",
        "format":      "json",
    }
    url = USGS_URL + "?" + urllib.parse.urlencode(params)

    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            data = json.loads(response.read().decode())
    except Exception as e:
        print(f"  ⚠️  USGS API error for gauge {gauge_id}: {e}")
        return None

    try:
        values = data["value"]["timeSeries"][0]["values"][0]["value"]
    except (KeyError, IndexError):
        return None

    # Filter out USGS sentinel value for missing data
    valid = [v for v in values if float(v["value"]) >= 0]

    if len(valid) < 2:
        return None

    current_cfs  = float(valid[-1]["value"])
    previous_cfs = float(valid[-2]["value"])
    delta        = current_cfs - previous_cfs

    if delta > 5:
        trend = "rising"
    elif delta < -5:
        trend = "falling"
    else:
        trend = "steady"

    return {
        "cfs":      round(current_cfs),
        "trend":    trend,
        "gauge_id": gauge_id,
    }

A few things worth noting:

The -999999 sentinel. USGS uses this as a “no data” marker for individual readings within an otherwise valid response. If I don’t filter those out, I get wildly wrong delta calculations. I filter for value >= 0 which catches it without hardcoding the sentinel value.

5 CFS trend threshold. On a calm river, you get minor fluctuations in consecutive 15-minute readings that aren’t meaningful. 5 CFS eliminates the noise without masking real movement on a river that swings thousands of CFS during a storm.

Null-safe integration. In crags.json, only crags with a gauge_id field get streamflow fetched. The main loop checks if crag.get("gauge_id"): so non-gauged crags are completely unaffected. Adding a new gauged crag is a one-line config change.
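That integration pattern can be sketched as a small loop. The names below (`attach_streamflow`, the `crags` list shape, the injected `fetch` callable) are assumptions for illustration, not the actual BETA code — the real pipeline presumably calls `fetch_streamflow` directly.

```python
def attach_streamflow(crags: list[dict], fetch) -> list[dict]:
    """Attach streamflow only to crags that declare a gauge_id."""
    for crag in crags:
        if crag.get("gauge_id"):  # non-gauged crags are skipped entirely
            crag["streamflow"] = fetch(crag["gauge_id"])
    return crags

# Example with a stubbed fetcher standing in for the USGS call:
crags = [
    {"name": "Index Town Wall", "gauge_id": "12134500"},
    {"name": "Some Dry Crag"},  # no gauge_id — untouched
]
result = attach_streamflow(
    crags,
    fetch=lambda g: {"cfs": 1200, "trend": "steady", "gauge_id": g},
)
```

Injecting the fetcher here is just for testability of the sketch; the point is that adding a gauged crag is only a `gauge_id` key in the config.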


What I learned

The USGS data infrastructure is genuinely impressive — real-time data from thousands of gauges nationwide, free, reliable, well-documented. But the map UI doesn’t clearly distinguish inactive gauges from active ones, which cost me maybe an hour of debugging before I figured out what was happening.

The fix was simple once I understood the problem: validate that you actually got readings, not just a valid response shape. A response with the right JSON structure but empty values arrays will pass most basic error checks. You have to go one level deeper.
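A minimal sketch of that deeper check, using the same response shape the pipeline code above navigates. The `has_readings` helper name and the two hand-built responses are illustrative, not part of BETA:

```python
def has_readings(data: dict) -> bool:
    """True only if the USGS IV response actually contains readings."""
    try:
        values = data["value"]["timeSeries"][0]["values"][0]["value"]
    except (KeyError, IndexError, TypeError):
        return False
    return len(values) > 0

# A dead gauge: valid structure, empty values array.
dead_gauge = {"value": {"timeSeries": [{"values": [{"value": []}]}]}}
# A live gauge: same structure, with at least one reading.
live_gauge = {"value": {"timeSeries": [{"values": [{"value": [{"value": "1200"}]}]}]}}
```

Both responses would pass a naive "did the JSON parse and do the keys exist" check; only the length test tells them apart.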

Also: PT2H is ISO 8601 duration format. Took me longer than I’d like to admit to remember that.
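For the curious: ISO 8601 durations prefix time components with a `T`, hence `PT2H` for two hours (`P2D` would be two days). A throwaway helper like this one — not part of BETA, and handling whole hours only — shows how you might build the `period` parameter from a `timedelta`:

```python
from datetime import timedelta

def iso_duration_hours(td: timedelta) -> str:
    """Format a whole-hour timedelta as an ISO 8601 duration, e.g. PT2H."""
    hours = int(td.total_seconds() // 3600)
    return f"PT{hours}H"
```
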

The pipeline runs on GitHub Actions every 6 hours. That cadence is well within USGS's politeness guidelines — they recommend not hitting the API more than once per minute per gauge, and four requests a day for each of 2 gauges is nowhere near that limit.


Full pipeline code is in the BETA repo. The tool itself is at beta.trenigma.dev.


Part 2 of 3 in BETA: Building a Climbing Conditions Pipeline
Previous: Part 1 — Why I Built It
Next: Part 3 — Adding PurpleAir AQI (and the Wood Stove Surprise)