Quality needs a new interface.
Lead the release.
Release confidence for your CI. One ledger for humans and the agents you ship with.
- Failure inbox
Every failure carries a verdict, confidence, and reasons[]. Never a black box.
- Release confidence
One call, with the evidence behind it: should_block_release(). Override is one click.
- Reads from your CI
GitHub Actions · Playwright · Cypress today. One YAML step. Built to absorb your full pipeline.
- Agent-readable
Every screen has a JSON peek and an MCP tool. Humans, APIs, and agents read the same ledger.
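A minimal sketch of how a release call like should_block_release() could be derived from classified failures. The page names the function but not its signature or logic, so everything else here is an assumption made for illustration, not Bellwether's actual API: the Failure and ReleaseDecision types, the min_confidence threshold, the choice of blocking verdicts, and the override flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Failure:
    test: str
    verdict: str        # "flake", "regression", "infra", or "new"
    confidence: float   # 0.0 to 1.0
    reasons: List[str] = field(default_factory=list)


@dataclass
class ReleaseDecision:
    block: bool
    evidence: List[Failure]   # the failures that justify the call
    overridden: bool = False


# Assumption: only these verdicts can hold a release.
BLOCKING_VERDICTS = {"regression", "new"}


def should_block_release(failures, min_confidence=0.8, override=False):
    """Return a decision together with the failures that support it."""
    evidence = [
        f for f in failures
        if f.verdict in BLOCKING_VERDICTS and f.confidence >= min_confidence
    ]
    # A human override lifts the block but keeps the evidence on record.
    return ReleaseDecision(
        block=bool(evidence) and not override,
        evidence=evidence,
        overridden=override,
    )
```

Returning the evidence alongside the boolean keeps the call inspectable: a caller can show why a release was held, and an override leaves the supporting failures in place rather than erasing them.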
Old word. A bellwether is the lead sheep that wears the bell, the one whose movement the flock reads to know which way the day is going. We took the name because that's the gap. Your CI emits a thousand signals an hour. Someone has to be the bell: the neutral, audible thing that says this is real, that isn't. Bellwether is that bell.
Software changed. QA changed. The way teams decide what to ship hasn't.
- 01 · WAS
QA was a handoff.
A team owned testing. They wrote suites, gated releases, and the rest of engineering trusted the green check. The interface to quality was a person.
- 02 · IS
QA is shared.
Agents write tests. SDETs review them. Engineers triage failures in Slack. Testing is faster and noisier; ownership is everywhere and nowhere. The handoff dissolved.
- 03 · NEEDS
QA needs a decision layer.
Reruns and gut feel don't scale to a thousand runs a day. What's missing is a neutral ledger that classifies every signal, names the owner, and tells you when to ship.
Bellwether is the bell.
This is your quality decision layer today.
It's a Slack thread. Flaky test triage runs on tribal knowledge, "rerun it once," and the muscle memory of whoever happens to be online. CI failure triage is whatever the on-call engineer can remember at 4pm on a Friday. It works. Then your test count crosses a threshold and the noise becomes the signal.
The same failure, twelve seconds later.
Bellwether reads your CI events as they land. Every failure ships with a verdict (flake, regression, infra, or new), a confidence score, weighted reasons, a recommended action, and the likely owner. The event model is neutral by design: build, deploy, and security signals join the same ledger as we widen coverage.
One classification, three calls: the human triage page, the REST API, and the MCP tool agents call. Same evidence, same confidence, same owner. No surface tells a different story.
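One way to picture a single classification feeding all three surfaces is a record that is serialized once and handed out unchanged. This is a sketch only: the field names, the Reason type, and the sample values are assumptions, and only the four verdicts and the list of attributes (confidence, weighted reasons, recommended action, likely owner) come from the text above.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import List


class Verdict(str, Enum):
    FLAKE = "flake"
    REGRESSION = "regression"
    INFRA = "infra"
    NEW = "new"


@dataclass(frozen=True)
class Reason:
    text: str
    weight: float   # relative contribution to the verdict


@dataclass(frozen=True)
class Classification:
    test_name: str
    verdict: Verdict
    confidence: float
    reasons: List[Reason]
    recommended_action: str
    likely_owner: str

    def to_json(self) -> str:
        # One canonical serialization, so no surface can drift from another.
        return json.dumps(asdict(self), sort_keys=True)


# Made-up sample record, for illustration only.
record = Classification(
    test_name="checkout.spec > applies coupon",
    verdict=Verdict.REGRESSION,
    confidence=0.91,
    reasons=[
        Reason("fails on every retry", 0.6),
        Reason("first failure follows a change to the coupon module", 0.4),
    ],
    recommended_action="block and notify owner",
    likely_owner="payments-team",
)

payload = record.to_json()
# The triage page, the REST API, and the MCP tool all receive the same payload.
surfaces = {"triage_page": payload, "rest_api": payload, "mcp_tool": payload}
```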
No black-box verdicts.
Three things we wrote into the spec before we wrote a classifier. They are the difference between a confidence score you can trust and a number that gets ignored after the first wrong call.
Spec refs: CLASSIFY-05 · CLASSIFY-06 · LEARN-01 · LEARN-02. The learning loop is the moat. We built the affordance first.
The neutral layer. Not another silo.
Bellwether sits across your stack. We read CI events; we don't replace your test framework or your tracker. Coverage starts with test signals and absorbs build logs, deploy gates, and security scans as we widen.
What we read: test names · errors · stack frames · retries · durations · git SHAs · branches
What we don't read: source code · env vars · secrets · user data inside fixtures · production traffic
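Reading the two lists above as what is collected and what is left out (an interpretation, since the page does not label them here), the boundary can be sketched as an allowlist applied before anything reaches the ledger. The key names and the to_ledger_event helper are hypothetical.

```python
# Assumption: these keys mirror the signals listed as read from CI.
ALLOWED_FIELDS = frozenset({
    "test_name",
    "error",
    "stack_frames",
    "retries",
    "duration_ms",
    "git_sha",
    "branch",
})


def to_ledger_event(raw: dict) -> dict:
    """Keep only allowlisted fields; anything else is dropped, not stored."""
    return {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
```

An allowlist fails closed: a field nobody anticipated, such as an env dump attached by a new CI plugin, is excluded by default instead of needing its own deny rule.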
Three design partners. Twelve weeks. Numbers, not adjectives.
Cohort 01 starts Q3 ’26. We’ll publish weekly numbers (baselines and progress against these targets) once pilots are running. No retroactive revisions; the first reading is the first reading.
Cohort 01 for Q3 '26.
A small first cohort. Real CI traffic. Direct line to the founders.
We onboard design partners one cohort per quarter. Cohort 01 is the first set of teams shipping on Bellwether. Their classifications, overrides, and owner-mappings shape V1.1. Their feedback is weighted higher in the learning loop than that of anyone who comes after, because they showed up first.
What you get
- V1 access from day one
- Co-design on V1.1
- Founder Slack channel
- Classification feedback weighted ×3 in the learning loop
What we ask
- Real CI traffic, not a sandbox
- One review call per cohort
- Permission to learn from your labelled overrides
Lead the release.
Don't just run the tests.
Cohort 01 is forming for Q3 '26. Cohort 02 picks up Q4.
One repo, one CI, one Slack channel. Four minutes to set up.
No new dashboards to live in.
no credit card · revocable token · works with what you already ship on