How Marriska works

This page is for curious minds. If you just want to run a test, start with the Quickstart.

The problem

Traditional test automation has two painful qualities:

Writing tests is code. You need Playwright, Cypress, or Selenium — plus the language they run in.
Maintaining tests is code. Button renamed? Test breaks. Design refactored? Selectors break. CSS class swapped? You’re filing tickets.

The result: QA and product teams wait on engineers to keep tests alive, and engineers resent writing them because it’s not what they were hired to do. Everyone loses.

The approach

You describe what a user should do and see, in plain English:

Click the "Sign in" button
Type "user@example.com" in the email field
Type "hunter2" in the password field
Click submit
Expect to see "Welcome back"

Marriska’s pipeline turns that into a Playwright test and runs it. The sequence:

1. Translation

If your description isn’t in English, an LLM translates it first. Tests get authored in whatever language reads naturally to the team that owns them; the executor only ever sees English.

2. Parsing

Each English sentence becomes a structured action. The parser uses an LLM with a focused system prompt, and the LLM’s output is validated against the action schema before it goes anywhere. Hallucinated actions are rejected, and the parser asks the model again.

The result is a YAML-shaped step list — see the YAML schema for the exact format and step actions for the catalog of actions the executor knows.
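The validate-and-re-ask loop can be sketched roughly like this. It is an illustration only, not Marriska's actual parser: the action names, the required fields, the JSON wire format, and the ask_llm callable are all assumptions made up for the example. The real catalog lives in the step actions reference.

```python
import json
from typing import Callable

# Hypothetical action schema: action name -> required fields.
# The real catalog is documented in the step actions reference.
ACTION_SCHEMA = {
    "click": {"target"},
    "type": {"target", "text"},
    "expect_text": {"text"},
}


def validate_step(step: dict) -> list[str]:
    """Return a list of problems; an empty list means the step is valid."""
    action = step.get("action")
    if not isinstance(action, str) or action not in ACTION_SCHEMA:
        return [f"unknown action: {action!r}"]
    missing = ACTION_SCHEMA[action] - set(step)
    return [f"missing field: {name}" for name in sorted(missing)]


def parse_sentence(
    sentence: str,
    ask_llm: Callable[[str, list[str]], str],  # hypothetical model wrapper
    max_attempts: int = 3,
) -> dict:
    """Ask the model for one step, re-asking with the errors on failure."""
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = ask_llm(sentence, errors)
        try:
            step = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(step, dict):
            errors = ["expected a JSON object"]
            continue
        errors = validate_step(step)
        if not errors:
            return step
    raise ValueError(f"no valid step after {max_attempts} attempts: {errors}")
```

Passing the previous errors back into ask_llm gives the model a chance to correct itself instead of repeating the same mistake, and the attempt cap keeps a stubborn sentence from looping forever.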

3. Locator resolution

At run time, Playwright needs to know which button to click. Marriska builds locators contextually: accessibility roles first (role=button[name="Sign in"]), then label associations, falling back to common CSS patterns and fuzzy text matching as needed.

Because it’s accessibility-first, tests survive CSS refactors, class renames, and minor layout changes that would break hand-written Playwright.
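To make the fallback order concrete, here is a minimal sketch of a resolver that tries strategies from most to least robust and keeps the first one that matches exactly one element. The role-to-tag mapping, the "exactly one match" rule, and the count_matches callable are assumptions for illustration. The label entry is a placeholder string, since in real Playwright code a label lookup would go through page.get_by_label rather than a selector.

```python
from typing import Callable, Optional

# Hypothetical mapping from accessibility role to a typical HTML tag.
ROLE_TO_TAG = {"button": "button", "link": "a", "textbox": "input"}


def candidate_locators(role: str, name: str) -> list[tuple[str, str]]:
    """List (strategy, selector) pairs from most to least robust."""
    tag = ROLE_TO_TAG.get(role, "*")
    return [
        ("role", f'role={role}[name="{name}"]'),
        # Placeholder only: real code would call page.get_by_label(name).
        ("label", f"label={name}"),
        ("css", f'{tag}:has-text("{name}")'),
        # Unquoted text= matches case-insensitively on a substring.
        ("text", f"text={name}"),
    ]


def resolve(
    role: str,
    name: str,
    count_matches: Callable[[str], int],  # hypothetical: elements matched
) -> Optional[tuple[str, str]]:
    """Return the first candidate that matches exactly one element."""
    for strategy, selector in candidate_locators(role, name):
        if count_matches(selector) == 1:
            return strategy, selector
    return None
```

Requiring a single match is one way to avoid clicking the wrong element when a looser strategy finds several candidates. A real resolver might instead rank the matches or use surrounding context.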

4. Execution

Playwright drives a real Chromium, Firefox, or WebKit browser. On the hosted SaaS, that’s our cloud runners. With Starter+ you can also plug in a local agent — your laptop or a CI machine runs Playwright and streams results back over WebSocket.

5. Visual regression (optional)

For any step, you can capture a baseline screenshot. On later runs the fresh screenshot goes through a three-stage cascade — byte-equality, pixel diff, then a vision-LLM judgement if needed. The model is told to ignore antialiasing and dynamic content like timestamps and focus on meaningful regressions. See Visual regression for the full mechanics.
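The cascade idea, cheap checks first and the expensive model last, can be sketched as follows. This is not the shipped implementation: the per-channel tolerance, the changed-pixel threshold, the decode helper, and the ask_vision_model callable are all assumed for the example.

```python
from typing import Callable, Sequence

Pixel = tuple[int, int, int]  # (red, green, blue)


def changed_ratio(a: Sequence[Pixel], b: Sequence[Pixel], tolerance: int = 8) -> float:
    """Fraction of pixels whose largest channel difference exceeds tolerance."""
    if len(a) != len(b):
        return 1.0  # different dimensions: treat every pixel as changed
    if not a:
        return 0.0
    changed = sum(
        1 for p, q in zip(a, b)
        if max(abs(x - y) for x, y in zip(p, q)) > tolerance
    )
    return changed / len(a)


def compare(
    baseline: bytes,
    fresh: bytes,
    decode: Callable[[bytes], Sequence[Pixel]],       # hypothetical image decoder
    ask_vision_model: Callable[[bytes, bytes], bool],  # True means "no real regression"
    threshold: float = 0.001,                           # assumed cut-off
) -> str:
    """Run the three stages in order, stopping at the first one that decides."""
    if baseline == fresh:
        return "match:bytes"  # stage 1: identical files, nothing to decode
    if changed_ratio(decode(baseline), decode(fresh)) <= threshold:
        return "match:pixels"  # stage 2: differences are below the noise floor
    # Stage 3: only now pay for a model call.
    return "match:vision" if ask_vision_model(baseline, fresh) else "regression"
```

Ordering the stages this way means most unchanged screenshots never reach the model, which keeps both latency and inference cost down.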

6. Reporting

Step events stream back over SSE (browser sessions) or WebSocket (local agents) so you watch each step finish in real time, not at the end. The report URL is stable and deep-linked — anyone in your org with the right role can open it. Public share links aren’t shipped yet; see Sharing reports.
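For a feel of what a step event looks like on the wire, here is a small sketch that formats one Server-Sent Events frame. The id, event, and data field names and the blank-line terminator come from the SSE format itself. The event name and payload keys are hypothetical and are not Marriska's documented event schema.

```python
import json
from typing import Optional


def sse_event(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Format one Server-Sent Events frame, terminated by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


# Hypothetical payload for a finished step.
frame = sse_event("step_finished", {"index": 1, "status": "passed"}, event_id=7)
print(frame, end="")
```

Sending the id field lets a browser that reconnects report the last event it saw, so a server that supports it can resume the stream rather than replaying the whole run.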

Architecture at a glance

  ┌──────────┐  plain English  ┌───────────┐   YAML steps   ┌─────────────┐
  │  Author  ├────────────────►│  Parser   ├───────────────►│  Executor   │
  └──────────┘                 │   (LLM)   │                │ (Playwright)│
                               └───────────┘                └──────┬──────┘
                                                                   │
                               ┌───────────┐  real-time events     │
                               │  Reporter │◄──────────────────────┘
                               │  (SSE/WS) │
                               └─────┬─────┘
                                     ▼
                               Marriska Dashboard

Translation, parsing, the executor’s orchestration, and reporting all live in the FastAPI backend, which is backed by Postgres. The Playwright runner can sit alongside it (cloud) or on your own machine (local agent).

Why this matters

When tests are plain English:

QA writes tests without waiting for engineers
Product managers can read tests like specs
Tests survive refactors because they describe intent, not DOM paths
Designers, engineers, and QA all touch the same artifact

This isn’t a replacement for low-level unit tests. It’s for the user-journey-level tests most teams skip because they cost too much to maintain.

Visual regression — the cascade in detail
BYOK — moving the AI inference cost off your Marriska quota
Security model — auth, isolation, and what gets logged
Step actions reference — the verbs Marriska knows