Breaking StrideBoard: AI-Powered Regression Testing with Passmark


Writing code, breaking things, then pretending it was a feature. 🤡 | Senior Software Engineer | Dreaming of Big Tech, a GDE badge, and a Koenigsegg. 🚀

I built StrideBoard as a real-time community hype wall for runners preparing for race day. A few weeks after shipping it, a running foundation reached out about productizing it. That made me want to stress-test it properly before going further, which is exactly why the Breaking Apps Hackathon was the right moment.

I joined to pressure-test a real public app with Passmark and see how far natural-language regression testing can go before needing deterministic fallbacks.

App under test: Stride Board


About StrideBoard

StrideBoard is a lightweight, real-time community hype wall for runners. The core user loop: post a goal, get hyped by the community, track momentum.

What users can do:

  • Post a goal time, optional pace, and motivation

  • Choose a goal category (Personal Best, Sub-60 Attempt, First Ever Race)

  • React to other runners with 🔥 hype interactions

  • Follow community stats, countdown, and training progress widgets

The stack is intentionally lean:

  • Frontend: Vanilla HTML/CSS/JS

  • Backend: Vercel Serverless Functions (Redis proxy)

  • Data: Upstash Redis

I wrote about building it here: How I Built a Real-Time Community Hype Wall for Runners Using Redis and Vercel


Why StrideBoard was a good hackathon target

  • Public and accessible, with no auth walls

  • Interaction-heavy (posting, filtering, hype actions)

  • Simple enough to iterate quickly

  • Realistic enough to expose flaky automation patterns

  • I know the codebase intimately, so bugs are obvious


Setup

Start from a Playwright TypeScript project and add Passmark:

npm init playwright@latest my-hackathon-tests
cd my-hackathon-tests
npm install passmark dotenv

.env:

OPENROUTER_API_KEY=sk-or-...

playwright.config.ts:

import dotenv from "dotenv";
import path from "path";
import { configure } from "passmark";

dotenv.config({ path: path.resolve(__dirname, ".env") });

configure({
  ai: {
    gateway: "openrouter"
  }
});
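
One optional addition I'd consider here is a fail-fast check for the API key, so a missing .env shows up as a clear error instead of a confusing failure partway through a run. This is a minimal sketch: requireEnv is a hypothetical helper of my own, not part of Passmark or dotenv, and I'm assuming the OpenRouter gateway picks the key up from OPENROUTER_API_KEY as the .env above suggests.

```typescript
// Hypothetical helper (not a Passmark or dotenv API): returns the trimmed
// value of an environment variable, or throws if it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing ${name}: add it to .env before running the suite`);
  }
  return value;
}

// In playwright.config.ts this would sit right after dotenv.config(...):
// requireEnv("OPENROUTER_API_KEY", process.env);
```

Passing the environment in as an argument keeps the helper easy to unit test without touching the real process.env.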

What I tested

Four user-critical flows:

  1. Landing page integrity

  2. Goal posting with anonymous toggle

  3. Filter switching across all goal categories

  4. Hype action confirmation behavior

Structure:

  • tests/strideboard.passmark.spec.ts: main spec

  • tests/helpers/strideboard.ts: shared helpers
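
The sample test further down uses STRIDEBOARD_URL, GOAL_CATEGORIES, and goalText without showing where they come from. Here is a rough sketch of what the helpers file could contain. The URL is a placeholder, the category order is my assumption based on the list in the About section, and uniqueGoalText is a hypothetical name, so treat this as an illustration and not the actual file from the repo.

```typescript
// Sketch of tests/helpers/strideboard.ts (in a real module these would be exported).

// PLACEHOLDER: swap in the real StrideBoard URL.
const STRIDEBOARD_URL = "https://example.com";

// ASSUMPTION: order follows the category list in the About section.
const GOAL_CATEGORIES = [
  "Personal Best",
  "Sub-60 Attempt",
  "First Ever Race",
] as const;

// Hypothetical helper: builds goal text that is unique per run by combining a
// compact UTC timestamp with a short random suffix, so posts from concurrent
// runs on the shared public board do not collide.
function uniqueGoalText(
  prefix: string = "Passmark test goal",
  now: Date = new Date(),
): string {
  const stamp = now.toISOString().replace(/[-:.TZ]/g, "");
  const suffix = Math.random().toString(36).slice(2, 8);
  return `${prefix} ${stamp}-${suffix}`;
}

// Usage inside a spec: generate once per test, then reuse the same string
// for both the posting step and the assertion.
const goalText = uniqueGoalText();
```

The timestamp makes a leftover post easy to trace back to a specific run, and the random suffix covers two runs that start within the same millisecond.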


Sample Passmark flow

Here's the anonymous posting test:

await runSteps({
  page,
  userFlow: "StrideBoard anonymous posting",
  steps: [
    { description: `Navigate to ${STRIDEBOARD_URL}` },
    {
      description:
        "Click the toggle Post anonymously — hide my nickname so anonymous mode is enabled",
    },
    {
      description: "In the race goal input, enter the message",
      data: { value: goalText },
    },
    {
      description: `Select the goal category ${GOAL_CATEGORIES[1]}`,
    },
    {
      description: "Click POST TO BOARD",
      waitUntil: "The newly posted goal appears on the board",
    },
  ],
  assertions: [
    {
      assertion: `You can see a posted goal containing the text ${goalText}`,
    },
  ],
  test,
  expect,
});

The plain-English steps are readable by anyone on the team, with no need to understand selectors or DOM structure to follow the intent.


What broke first (and how I fixed it)

The first flaky area was the AI-heavy paths for filter switching and hype interactions. The app itself was fine; the tests were timing out during broad natural-language assertions on fast, repetitive interactions.

Changes I made:

  • Kept Passmark for high-level posting and user flow tests where it adds the most value

  • Switched repetitive UI interactions (filter clicks) to deterministic Playwright checks

  • Targeted the exact newly created post before triggering hype, which avoids false positives on a shared public board

  • Used unique test data per run (timestamped strings) to prevent state collisions across concurrent visitors

This hybrid approach gave consistent, debuggable results without losing Passmark's speed advantage for authoring flows.
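
To make the deterministic half concrete, here is roughly what a filter check can look like once the AI is out of the loop. This is a sketch under assumptions, not the code from my repo: I'm assuming each filter is a button whose accessible name is the category, and that the active one exposes aria-pressed="true". PageLike and LocatorLike are small stand-in types so the snippet compiles on its own; Playwright's Page and Locator should satisfy them, so a real page can be passed in.

```typescript
// Stand-in types (hypothetical) covering only the calls this sketch needs.
interface LocatorLike {
  click(): Promise<void>;
  getAttribute(name: string): Promise<string | null>;
}

interface PageLike {
  getByRole(role: "button", options: { name: string }): LocatorLike;
}

// Click each category filter once, in order, and fail immediately if the
// clicked filter does not report itself as active.
// ASSUMPTION: active filters expose aria-pressed="true"; adjust to the real markup.
async function cycleFilters(
  page: PageLike,
  categories: readonly string[],
): Promise<string[]> {
  const activated: string[] = [];
  for (const name of categories) {
    const button = page.getByRole("button", { name });
    await button.click();
    if ((await button.getAttribute("aria-pressed")) !== "true") {
      throw new Error(`Filter "${name}" did not become active after click`);
    }
    activated.push(name);
  }
  return activated;
}
```

In a real spec I'd swap the manual getAttribute check for Playwright's auto-retrying assertion, expect(button).toHaveAttribute("aria-pressed", "true"), since a one-shot read can race the UI update. The same scoping idea applies to hype: narrow a locator to the new post with filter({ hasText: goalText }) before clicking, so the click can't land on someone else's card.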


Results

After tightening selectors and splitting responsibilities between AI and deterministic checks:

  • Suite runs consistently in Chromium

  • Flaky false negatives from generic modal/selector matching are eliminated

  • Report output is clean and easy to debug

Run commands:

npx playwright test tests/strideboard.passmark.spec.ts --project=chromium --reporter=list
npx playwright test tests/strideboard.passmark.spec.ts
npx playwright show-report

Key learnings

Natural-language testing is excellent for fast scenario authoring. Writing intent in plain English is faster than hunting selectors, and it reads like documentation.

Deterministic selectors are still best for repetitive or exact-state checks. When you're clicking the same filter button five times or asserting an exact count, Playwright's precision wins.

Hybrid suites are the sweet spot. AI handles intent and broad user flows. Playwright handles precision and repetition. Neither replaces the other.


GitHub

Full test suite: github.com/skarthikeyan96/hashnode-hackthon-passmark


If you're joining the hackathon

Pick a public app with real interactions. Start with a thin happy-path suite. Then harden the flaky edges with deterministic checks as you learn the DOM patterns.

That gives you a practical submission and a regression suite worth keeping.

If you're building for the same hackathon, drop your article in the comments. I'd love to compare notes on where Passmark shines most in your app.

Tagged: #BreakingAppsHackathon
