Breaking StrideBoard: AI-Powered Regression Testing with Passmark

Writing code, breaking things, then pretending it was a feature. 🤡 | Senior Software Engineer | Dreaming of Big Tech, a GDE badge, and a Koenigsegg.
I built StrideBoard as a real-time community hype wall for runners preparing for race day. A few weeks after shipping it, a running foundation reached out about productizing it. That made me want to stress-test it properly before going further, which is exactly why the Breaking Apps Hackathon was the right moment.
I joined to pressure-test a real public app with Passmark and see how far natural-language regression testing can go before needing deterministic fallbacks.
App under test: Stride Board
About StrideBoard
StrideBoard is a lightweight, real-time community hype wall for runners. The core user loop: post a goal, get hyped by the community, track momentum.
What users can do:
Post a goal time, optional pace, and motivation
Choose a goal category (Personal Best, Sub-60 Attempt, First Ever Race)
React to other runners with 🔥 hype interactions
Follow community stats, countdown, and training progress widgets
The stack is intentionally lean:
Frontend: Vanilla HTML/CSS/JS
Backend: Vercel Serverless Functions (Redis proxy)
Data: Upstash Redis
I wrote about building it here: How I Built a Real-Time Community Hype Wall for Runners Using Redis and Vercel
Why StrideBoard was a good hackathon target
Public and accessible, with no auth walls
Interaction-heavy (posting, filtering, hype actions)
Simple enough to iterate quickly
Realistic enough to expose flaky automation patterns
I know the codebase intimately, so bugs are obvious
Setup
Start from a Playwright TypeScript project and add Passmark:
npm init playwright@latest my-hackathon-tests
cd my-hackathon-tests
npm install passmark dotenv
.env:
OPENROUTER_API_KEY=sk-or-...
playwright.config.ts:
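import dotenv from "dotenv";
import path from "path";
import { configure } from "passmark";

// Load OPENROUTER_API_KEY from the .env file next to this config.
dotenv.config({ path: path.resolve(__dirname, ".env") });

// Route Passmark's AI calls through OpenRouter.
configure({
  ai: {
    gateway: "openrouter",
  },
});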
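The config above does not guard against a missing key. Here is a minimal sketch of a fail-fast check, assuming the key is read from the environment as the .env file suggests. requireEnv is a hypothetical helper, not part of Passmark or dotenv:

```typescript
// Hypothetical helper: fail fast when a required variable is absent or blank.
// The env map is passed in so the function can be exercised in isolation.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing ${name}; add it to .env before running the suite`);
  }
  return value;
}
```

In playwright.config.ts this could be called right after dotenv.config(...) as requireEnv("OPENROUTER_API_KEY", process.env), so a missing key surfaces before any test starts.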
What I tested
Four user-critical flows:
Landing page integrity
Goal posting with anonymous toggle
Filter switching across all goal categories
Hype action confirmation behavior
Structure:
tests/strideboard.passmark.spec.ts: main spec
tests/helpers/strideboard.ts: shared helpers
Sample Passmark flow
Here's the anonymous posting test:
await runSteps({
  page,
  userFlow: "StrideBoard anonymous posting",
  steps: [
    { description: `Navigate to ${STRIDEBOARD_URL}` },
    {
      description:
        "Click the toggle 'Post anonymously - hide my nickname' so anonymous mode is enabled",
    },
    {
      description: "In the race goal input, enter the message",
      data: { value: goalText },
    },
    {
      description: `Select the goal category ${GOAL_CATEGORIES[1]}`,
    },
    {
      description: "Click POST TO BOARD",
      waitUntil: "The newly posted goal appears on the board",
    },
  ],
  assertions: [
    {
      assertion: `You can see a posted goal containing the text ${goalText}`,
    },
  ],
  test,
  expect,
});
The plain-English steps are readable by anyone on the team: there is no need to understand selectors or DOM structure to follow the intent.
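Because the steps are plain data, the same posting flow can be parameterised per category. This is a sketch, assuming the step shape used in the sample (description, optional data, optional waitUntil) and assuming GOAL_CATEGORIES lists the three categories in the order given earlier. FlowStep and buildPostingSteps are hypothetical names, not Passmark exports, and the anonymous toggle step is left out for brevity:

```typescript
// Assumed contents and order; the real helper file may differ.
const GOAL_CATEGORIES = ["Personal Best", "Sub-60 Attempt", "First Ever Race"] as const;
type GoalCategory = (typeof GOAL_CATEGORIES)[number];

// Local stand-in for the step objects passed to runSteps in the sample.
interface FlowStep {
  description: string;
  data?: { value: string };
  waitUntil?: string;
}

// Build the posting steps for one category from plain inputs.
function buildPostingSteps(
  url: string,
  goalText: string,
  category: GoalCategory,
): FlowStep[] {
  return [
    { description: `Navigate to ${url}` },
    {
      description: "In the race goal input, enter the message",
      data: { value: goalText },
    },
    { description: `Select the goal category ${category}` },
    {
      description: "Click POST TO BOARD",
      waitUntil: "The newly posted goal appears on the board",
    },
  ];
}
```

Looping over GOAL_CATEGORIES and passing each result as steps would cover all three categories with one flow definition, at the cost of one AI-driven run per category.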
What broke first (and how I fixed it)
The first flaky area was the AI-driven steps for filter switching and hype interactions. The app itself was fine; the tests were timing out during broad natural-language assertions on fast, repetitive interactions.
Changes I made:
Kept Passmark for high-level posting and user flow tests where it adds the most value
Switched repetitive UI interactions (filter clicks) to deterministic Playwright checks
Targeted the exact newly created post before triggering hype, which avoids false positives on a shared public board
Used unique test data per run (timestamped strings) to prevent state collisions across concurrent visitors
This hybrid approach gave consistent, debuggable results without losing Passmark's speed advantage for authoring flows.
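The last two changes work together: a string that is unique per run gives a deterministic locator something exact to match. Below is a minimal sketch under stated assumptions. uniqueGoalText and hypeOwnPost are hypothetical helpers, the .goal-card selector and the "hype" button name are guesses about StrideBoard's markup, and PageLike and LocatorLike are local stand-ins that list only the Playwright methods used (a real Playwright Page should satisfy them structurally):

```typescript
// Local stand-ins for the few Playwright Page/Locator methods used here.
interface LocatorLike {
  filter(options: { hasText: string }): LocatorLike;
  getByRole(role: "button", options: { name: RegExp }): LocatorLike;
  count(): Promise<number>;
  click(): Promise<void>;
}
interface PageLike {
  locator(selector: string): LocatorLike;
}

// Timestamp plus a short random suffix, so concurrent runs are unlikely to collide.
function uniqueGoalText(prefix: string, now: number = Date.now()): string {
  const suffix = Math.random().toString(36).slice(2, 8);
  return `${prefix} ${now}-${suffix}`;
}

// Click hype only on the card that contains this run's text.
// The ".goal-card" default is an assumption, not StrideBoard's known markup.
async function hypeOwnPost(
  page: PageLike,
  goalText: string,
  cardSelector = ".goal-card",
): Promise<void> {
  const card = page.locator(cardSelector).filter({ hasText: goalText });
  const matches = await card.count();
  if (matches !== 1) {
    throw new Error(`Expected exactly 1 card for "${goalText}", found ${matches}`);
  }
  await card.getByRole("button", { name: /hype/i }).click();
}
```

This version checks the count once and throws, which is reasonable after a Passmark step has already waited for the post to appear. In a real spec, await expect(card).toHaveCount(1) from @playwright/test would retry until the card renders instead of failing immediately.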
Results
After tightening selectors and splitting responsibilities between AI and deterministic checks:
Suite runs consistently in Chromium
Flaky false negatives from generic modal/selector matching eliminated
Report output is clean and easy to debug
Run commands:
npx playwright test tests/strideboard.passmark.spec.ts --project=chromium --reporter=list
npx playwright test tests/strideboard.passmark.spec.ts
npx playwright show-report
Key learnings
Natural-language testing is excellent for fast scenario authoring. Writing intent in plain English is faster than hunting selectors, and it reads like documentation.
Deterministic selectors are still best for repetitive or exact-state checks. When you're clicking the same filter button five times or asserting an exact count, Playwright's precision wins.
Hybrid suites are the sweet spot. AI handles intent and broad user flows. Playwright handles precision and repetition. Neither replaces the other.
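As an illustration of that split, repetitive filter checks can be a plain loop with no AI involved. This is a sketch, not the suite's actual code. It assumes each filter is a button named after its category and that each card shows its category in an element matched by a hypothetical .goal-category selector. PageLike and LocatorLike are local stand-ins listing only the Playwright methods used:

```typescript
// Local stand-ins for the few Playwright Page/Locator methods used here.
interface LocatorLike {
  click(): Promise<void>;
  allTextContents(): Promise<string[]>;
}
interface PageLike {
  getByRole(role: "button", options: { name: string }): LocatorLike;
  locator(selector: string): LocatorLike;
}

// Click one filter, then poll until every visible card label matches it.
// Returns the labels that still do not match when the timeout expires.
async function strayLabelsAfterFilter(
  page: PageLike,
  category: string,
  labelSelector = ".goal-category", // assumed selector
  timeoutMs = 2000,
): Promise<string[]> {
  await page.getByRole("button", { name: category }).click();
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const labels = await page.locator(labelSelector).allTextContents();
    const stray = labels.map((t) => t.trim()).filter((t) => t !== category);
    if (stray.length === 0 || Date.now() >= deadline) return stray;
    // Short pause before re-reading the board.
    await new Promise<void>((resolve) => setTimeout(resolve, 50));
  }
}
```

A spec could loop over the categories and expect an empty array each time. Note that an empty board also returns an empty array, so a separate count check is needed if a category must contain at least one card. Playwright's own expect.poll offers the same retry behaviour without the hand-written loop.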
GitHub
Full test suite: github.com/skarthikeyan96/hashnode-hackthon-passmark
If you're joining the hackathon
Pick a public app with real interactions. Start with a thin happy-path suite. Then harden the flaky edges with deterministic checks as you learn the DOM patterns.
That gives you a practical submission and a regression suite worth keeping.
If you're building for the same hackathon, drop your article in the comments. I'd love to compare notes on where Passmark shines most in your app.
Tagged: #BreakingAppsHackathon



