How Marriska works
This page is for curious minds. If you just want to run a test, start with the Quickstart.
The problem
Traditional test automation has two painful qualities:
- Writing tests is code. You need Playwright, Cypress, or Selenium — plus the language they run in.
- Maintaining tests is code. Button renamed? Test breaks. Design refactored? Selectors break. CSS class swapped? You’re filing tickets.
The result: QA and product teams wait on engineers to keep tests alive, and engineers resent writing them because it’s not what they were hired to do. Everyone loses.
The approach
You describe what a user should do and see, in plain English:
    Click the "Sign in" button
    Type "user@example.com" in the email field
    Type "hunter2" in the password field
    Click submit
    Expect to see "Welcome back"
Marriska’s pipeline turns that into a Playwright test and runs it. The sequence:
1. Translation
If your description isn’t in English, an LLM translates it first. Tests get authored in whatever language reads naturally to the team that owns them; the executor only ever sees English.
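The translate-only-if-needed gate can be sketched as below. The `detect` and `translate` callables are hypothetical stand-ins: the page says translation is an LLM call but does not describe how the source language is detected. The point of the sketch is that only English text leaves the function.

```python
from typing import Callable

# Hypothetical hooks: `detect` returns an ISO 639-1 code such as "en" or "de";
# `translate` (an LLM call in the real pipeline) returns English text.
Detector = Callable[[str], str]
Translator = Callable[[str, str], str]


def to_english(description: str, detect: Detector, translate: Translator) -> str:
    """Return the test description in English, translating only when needed."""
    source_lang = detect(description)
    if source_lang == "en":
        # Already English: pass it through without spending a model call.
        return description
    # Otherwise translate once; later stages only ever see this result.
    return translate(description, source_lang)


if __name__ == "__main__":
    # Stubs standing in for real detection and translation services.
    def fake_detect(text: str) -> str:
        return "de" if text.startswith("Klicke") else "en"

    def fake_translate(text: str, lang: str) -> str:
        return 'Click the "Sign in" button'

    print(to_english('Klicke auf "Anmelden"', fake_detect, fake_translate))
    print(to_english("Click submit", fake_detect, fake_translate))
```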
2. Parsing
Each English sentence becomes a structured action. The parser uses an LLM with a focused system prompt; the LLM’s output is validated against the action schema before it goes anywhere. Hallucinated actions are rejected and the LLM is asked again.
The result is a YAML-shaped step list — see the YAML schema for the exact format and step actions for the catalog of actions the executor knows.
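The validate-and-re-ask loop from the parsing step can be sketched as follows. The three action names, their required fields, the JSON reply format, and the `ask_llm` callable are illustrative assumptions, not Marriska's real schema; the step actions reference lists the actual catalog.

```python
import json
from typing import Callable

# Hypothetical subset of an action schema: action name -> required fields.
ACTION_SCHEMA: dict[str, set[str]] = {
    "click": {"target"},
    "type": {"target", "text"},
    "expect_text": {"text"},
}


class ParseError(Exception):
    """Raised when the model never produces a schema-valid action."""


def validate(action: dict) -> list[str]:
    """Return the problems with a candidate action; an empty list means valid."""
    name = action.get("action")
    if not isinstance(name, str) or name not in ACTION_SCHEMA:
        return [f"unknown action: {name!r}"]
    missing = ACTION_SCHEMA[name] - set(action)
    return [f"missing field: {f}" for f in sorted(missing)]


def parse_sentence(sentence: str, ask_llm: Callable[[str], str], max_attempts: int = 3) -> dict:
    """Ask the model for a JSON action, re-asking with feedback until it validates."""
    prompt = sentence
    for _ in range(max_attempts):
        try:
            action = json.loads(ask_llm(prompt))
        except json.JSONDecodeError:
            action = None
        problems = validate(action) if isinstance(action, dict) else ["not a JSON object"]
        if not problems:
            return action  # Only schema-valid actions leave the parser.
        # Re-ask, telling the model exactly why its last answer was rejected.
        prompt = f"{sentence}\nPrevious answer rejected: {'; '.join(problems)}"
    raise ParseError(f"no valid action for {sentence!r} after {max_attempts} attempts")
```

Feeding the rejection reasons back into the next prompt is one reasonable way to "re-ask"; a fixed retry with the unchanged prompt would also fit the description above.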
3. Locator resolution
At run time, Playwright needs to know which button to click. Marriska builds locators contextually: accessibility roles first (role=button[name="Sign in"]), label associations next, then common CSS patterns and fuzzy text matching as fallbacks.
Because it’s accessibility-first, tests survive CSS refactors, class renames, and minor layout changes that would break hand-written Playwright.
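The fallback order can be sketched over a toy element list instead of a live page. The `Element` fields, the four strategy functions, the slug-style CSS guess, and the 0.6 fuzzy cutoff are all hypothetical; fuzzy matching uses the standard library's `difflib.get_close_matches`. This illustrates the ordering only, not Marriska's resolver.

```python
import difflib
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Element:
    """Toy stand-in for a DOM node; all fields are hypothetical."""
    tag: str
    role: Optional[str] = None   # ARIA role (explicit or implicit)
    name: str = ""               # accessible name
    label: str = ""              # text of an associated <label>
    classes: list[str] = field(default_factory=list)
    text: str = ""               # visible text content


Strategy = Callable[[list[Element], str, str], list[Element]]


def by_role(els: list[Element], role: str, name: str) -> list[Element]:
    return [e for e in els if e.role == role and e.name.lower() == name.lower()]


def by_label(els: list[Element], role: str, name: str) -> list[Element]:
    return [e for e in els if e.label and e.label.lower() == name.lower()]


def by_css_pattern(els: list[Element], role: str, name: str) -> list[Element]:
    # Naming-convention guess: "Sign in" -> a class containing "sign-in".
    slug = "-".join(name.lower().split())
    return [e for e in els if any(slug in c for c in e.classes)]


def by_fuzzy_text(els: list[Element], role: str, name: str) -> list[Element]:
    # A difflib similarity ratio >= 0.6 counts as close enough (assumed cutoff).
    best = difflib.get_close_matches(name, [e.text for e in els], n=1, cutoff=0.6)
    return [e for e in els if best and e.text == best[0]]


# Most robust first: role, then label, then CSS conventions, then fuzzy text.
STRATEGIES: list[tuple[str, Strategy]] = [
    ("role", by_role),
    ("label", by_label),
    ("css", by_css_pattern),
    ("fuzzy-text", by_fuzzy_text),
]


def resolve(els: list[Element], role: str, name: str) -> tuple[str, Element]:
    """Return (strategy, element) from the first strategy with exactly one match."""
    for strategy_name, strategy in STRATEGIES:
        matches = strategy(els, role, name)
        if len(matches) == 1:
            return strategy_name, matches[0]
    raise LookupError(f"no unique element for role={role!r} name={name!r}")
```

One design choice the sketch makes explicitly: a strategy that finds two candidates is treated as inconclusive and the next strategy runs. A production resolver might instead stop and report the ambiguity.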
4. Execution
Playwright drives a real Chromium, Firefox, or WebKit browser. On the hosted SaaS, tests run on our cloud runners. With Starter+ you can also plug in a local agent: your laptop or a CI machine runs Playwright and streams results back over WebSocket.
5. Visual regression (optional)
For any step, you can capture a baseline screenshot. On later runs the fresh screenshot goes through a three-stage cascade — byte-equality, pixel diff, then a vision-LLM judgement if needed. The model is told to ignore antialiasing and dynamic content like timestamps and focus on meaningful regressions. See Visual regression for the full mechanics.
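The cascade's short-circuit structure can be sketched like this. Screenshots are modeled as raw RGB byte buffers to keep the example dependency-free; the per-channel tolerance of 8, the 0.1% changed-pixel threshold, and the `judge` callable standing in for the vision LLM are all assumptions rather than Marriska's actual parameters.

```python
from typing import Callable

# Hypothetical stand-in for the vision-LLM stage: returns True when the
# difference between baseline and current is a meaningful regression.
Judge = Callable[[bytes, bytes], bool]


def changed_pixel_ratio(a: bytes, b: bytes, channels: int = 3, tolerance: int = 8) -> float:
    """Fraction of pixels where any channel differs by more than `tolerance`."""
    if len(a) != len(b) or len(a) % channels:
        raise ValueError("expected same-size buffers of whole pixels")
    pixels = len(a) // channels
    changed = sum(
        1
        for i in range(0, len(a), channels)
        if any(abs(a[i + c] - b[i + c]) > tolerance for c in range(channels))
    )
    return changed / pixels if pixels else 0.0


def compare(baseline: bytes, current: bytes, judge: Judge, max_ratio: float = 0.001) -> tuple[str, str]:
    """Return (verdict, deciding_stage); each stage runs only if the cheaper one can't decide."""
    # Stage 1: byte-equality. Identical buffers need no further work.
    if baseline == current:
        return "pass", "bytes"
    # Stage 2: pixel diff. Small per-channel noise and a tiny share of changed
    # pixels pass here; a size change skips straight to the judge.
    if len(baseline) == len(current) and changed_pixel_ratio(baseline, current) <= max_ratio:
        return "pass", "pixel-diff"
    # Stage 3: only now pay for the vision model.
    return ("fail" if judge(baseline, current) else "pass"), "vision-llm"
```

Ordering the stages from cheapest to most expensive means most unchanged screenshots never reach the model at all.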
6. Reporting
Step events stream back over SSE (browser sessions) or WebSocket (local agents) so you watch each step finish in real time, not at the end. The report URL is stable and deep-linked — anyone in your org with the right role can open it. Public share links aren’t shipped yet; see Sharing reports.
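For browser sessions, each step event travels as a Server-Sent Events message. The framing below (`event:`, `id:`, and `data:` lines ended by a blank line) follows the SSE format; the event name and payload fields are hypothetical, not Marriska's documented wire format.

```python
import json
from typing import Optional


def sse_frame(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message carrying a JSON payload."""
    lines = [f"event: {event}"]
    if event_id is not None:
        # A reconnecting EventSource sends the last seen id back as Last-Event-ID.
        lines.append(f"id: {event_id}")
    # Compact JSON never contains a raw newline, so one data: line is enough.
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    # A blank line (two newlines in a row) ends the message.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Hypothetical step event; real field names aren't documented here.
    print(sse_frame("step", {"index": 3, "status": "passed"}, event_id="3"), end="")
```

In a FastAPI backend, frames like these could be yielded from a generator passed to `StreamingResponse(..., media_type="text/event-stream")` and consumed in the browser with `EventSource`; the docs don't say whether Marriska's backend is wired exactly this way.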
Architecture at a glance
    ┌──────────┐  plain English  ┌───────────┐   YAML steps   ┌─────────────┐
    │  Author  ├────────────────►│  Parser   ├───────────────►│  Executor   │
    └──────────┘                 │   (LLM)   │                │ (Playwright)│
                                 └───────────┘                └──────┬──────┘
                                                                     │
                                 ┌───────────┐   real-time events    │
                                 │  Reporter │◄──────────────────────┘
                                 │  (SSE/WS) │
                                 └─────┬─────┘
                                       ▼
                              Marriska Dashboard
Translation, parsing, the executor’s orchestration, and reporting all live in the FastAPI backend on Postgres. The Playwright runner can sit alongside it (cloud) or on your own machine (local agent).
Why this matters
When tests are plain English:
- QA writes tests without waiting for engineers
- Product managers can read tests like specs
- Tests survive refactors because they describe intent, not DOM paths
- Designers, engineers, and QA all touch the same artifact
This isn’t a replacement for low-level unit tests. It’s for the user-journey-level tests most teams skip because they cost too much to maintain.
Related reading
- Visual regression — the cascade in detail
- BYOK — moving the AI inference cost off your Marriska quota
- Security model — auth, isolation, and what gets logged
- Step actions reference — the verbs Marriska knows