We have Zapier already. Why would we need this?

Zapier is great for two-tool, low-stakes glue. The moment a workflow needs branching logic, retry handling, audit logs, or LLM calls inside a step, you hit a wall fast. We build the next layer down, custom code with proper observability, and only reach for an LLM when classification or generation is the actual job.

What models are you using under the hood?

Whichever fits the task. Claude for reasoning and code-adjacent work, GPT-4 for general agents, Gemini for cost-sensitive classification, locally hosted Llama for privacy-sensitive enterprise jobs. We tell you which model and why in the scoping doc, and we keep the choice swappable.
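Here is one way to keep the model choice swappable, sketched under assumptions. Every vendor sits behind a small interface of our own, and a routing table maps task types to providers. `ModelProvider`, `ModelRouter`, the `Task` names, and the provider labels are hypothetical placeholders, not any vendor SDK's API. A real adapter would wrap each vendor's client inside `complete`.

```typescript
// Hypothetical provider-agnostic interface: each vendor adapter implements this.
interface ModelProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

type Task = "reasoning" | "general" | "classification" | "private";

// Routing table: swapping a model means editing one entry here, not every call site.
// Labels mirror the mapping described above; they are placeholders, not model IDs.
const routes: Record<Task, string> = {
  reasoning: "claude",
  general: "gpt-4",
  classification: "gemini",
  private: "llama-local",
};

class ModelRouter {
  private providers = new Map<string, ModelProvider>();

  register(provider: ModelProvider): void {
    this.providers.set(provider.name, provider);
  }

  // Looks up the provider for a task and forwards the prompt to it.
  async run(task: Task, prompt: string): Promise<string> {
    const name = routes[task];
    const provider = this.providers.get(name);
    if (!provider) {
      throw new Error(`No provider registered for "${name}" (task: ${task})`);
    }
    return provider.complete(prompt);
  }
}
```

With this design, replacing Gemini for classification means registering a different provider and editing one entry in `routes`. The call sites stay the same.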

Will the agents work after you finish the build, or do we need a retainer?

They work. Every build ships with a runbook, monitoring dashboards, and an on-call escalation path. A monthly operating retainer is optional — it covers alerts, edge-case fixes, and model swaps as the providers update. Many teams take it for the first six months and then move ops in-house.

Where does the data live?

In your infrastructure. Postgres, Supabase, your VPC, your S3 bucket. We never run a multi-tenant database. API keys for the models stay in your environment variables. Everything we build is fully transferable on day one.
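To make "API keys stay in your environment variables" concrete, here is a small sketch that fails fast at startup when a key is missing instead of falling back to a shared or hardcoded key. The variable names are hypothetical. The environment is passed in as a plain object so the check stays testable; in Node you would pass `process.env`.

```typescript
// Hypothetical key names; use whatever your providers and environments require.
const REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// Returns the required keys read from the environment, or throws listing every missing one.
function loadKeys(env: Env, required: readonly string[] = REQUIRED_KEYS): Record<string, string> {
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const keys: Record<string, string> = {};
  for (const name of required) keys[name] = env[name] as string;
  return keys;
}
```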

Can you build a custom CRM, or do you only do automations?

Yes. Custom CRMs are a regular engagement — same stack as the automations, with admin pages, role-based access, audit logs, and the workflow automations layered on top. Designed around your team's actual workflow, not adapted from Salesforce.

How do you handle errors and edge cases in agent workflows?

Every flow has a retry policy, a fallback path, and a Slack alert when both fail. The run log records every invocation with its status and duration. Edge cases that surface in production get added to the test suite. Nothing fails silently.
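Here is a minimal sketch of that error path. It retries the primary step with exponential backoff, then tries the fallback, and alerts only if both fail. Every invocation writes a run-log entry with status, attempts, and duration. `runFlow`, `RetryPolicy`, and the `sleep`/`alert`/`log` hooks are hypothetical names. In a real deployment, `alert` might post to a Slack webhook, and `sleep` would wrap a timer.

```typescript
// Status values recorded per invocation in the run log.
type RunStatus = "ok" | "fallback" | "failed";

interface RunLogEntry {
  flow: string;
  status: RunStatus;
  attempts: number;
  durationMs: number;
}

interface RetryPolicy {
  maxAttempts: number; // attempts on the primary path before falling back
  baseDelayMs: number; // exponential backoff: base * 2^(attempt - 1)
}

// Runs the primary step with retries, then the fallback; alerts only if both fail.
async function runFlow<T>(
  flow: string,
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  policy: RetryPolicy,
  hooks: {
    sleep: (ms: number) => Promise<void>;
    alert: (message: string) => Promise<void>;
    log: (entry: RunLogEntry) => void;
  },
): Promise<T> {
  const start = Date.now();
  let attempts = 0;
  let lastError: unknown;

  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    attempts = attempt;
    try {
      const result = await primary();
      hooks.log({ flow, status: "ok", attempts, durationMs: Date.now() - start });
      return result;
    } catch (err) {
      lastError = err;
      // Back off before the next attempt; no wait after the final one.
      if (attempt < policy.maxAttempts) {
        await hooks.sleep(policy.baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }

  try {
    const result = await fallback();
    hooks.log({ flow, status: "fallback", attempts, durationMs: Date.now() - start });
    return result;
  } catch (fallbackError) {
    hooks.log({ flow, status: "failed", attempts, durationMs: Date.now() - start });
    await hooks.alert(
      `Flow "${flow}" failed after ${attempts} attempts and fallback: ${String(fallbackError)} (last primary error: ${String(lastError)})`,
    );
    throw fallbackError; // surfaced to the caller, never swallowed
  }
}
```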

Node · WL-AI-2026/Dossier 07 of 09 · Automation practice


Claude Agent SDK · n8n · RAG · MCP · Voice · Evals

Production AI agents. Shipped, not pitched.

Senior AI engineering. Real agents in production, not slideware. Internal ops bots, RAG-backed support chat, lead enrichment, content pipelines, voice agents. Built with the Claude Agent SDK, n8n, and LangGraph, with Langfuse for evals. Code lives in your repo. Token costs pass through transparently.

01 · Senior AI engineer named on day one
02 · Code in your repo, on your infra, your API keys
03 · Evals and tracing wired from day one
04 · Free workflow audit in four business days

Founding partner · Q4 2026 · Token costs pass through

Node · audit.intake

v 1.0

Free workflow audit · Form 01 / 02

Send the workflow. Get the diagram.

Tell us what to automate. Within four business days, you get an agent diagram, an eval plan, a token-cost estimate, the named senior engineer, and a price.

Reply within one business day · Read by a senior engineer

Agent diagram · sanitized snapshot

The pipeline we actually ship.

Five nodes, each with a real latency and a real cost. This is the diagram you get back in your audit, with the exact stack we'd ship for your workflow. Evals run on every node. Tracing is wired before the agent goes live. Token cost is transparent, dashboarded, and yours to inspect.

Diagram v 04 · client-11

Pipeline · support.deflection · prod

Live · 1.2k req/hr

01 · Trigger · Form / Webhook · p95 80ms · cost $0.001
02 · Retrieve · pgvector / Pinecone · p95 120ms · cost $0.0008
03 · LLM · Claude Opus 4.7 · p95 1.4s · cost $0.012
04 · Eval · Langfuse / Braintrust · p95 60ms · cost $0.0003
05 · Action · CRM / Slack / DB · p95 90ms · cost $0.0005

Total p95

1.75s

Cost / call

$0.014

Eval pass rate

94.2%
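The summary figures roll up from the five nodes. Below is a minimal sketch of that arithmetic, assuming the totals are simple sums of the per-node numbers. Summed p95 values are a rough, typically conservative estimate rather than a true end-to-end p95, because the slowest calls of separate steps rarely coincide. Summed, the per-node costs come to about $0.0146 per call.

```typescript
interface PipelineNode {
  step: string;
  p95Ms: number;
  costUsd: number;
}

// Per-node figures from the support.deflection diagram above.
const nodes: PipelineNode[] = [
  { step: "Trigger", p95Ms: 80, costUsd: 0.001 },
  { step: "Retrieve", p95Ms: 120, costUsd: 0.0008 },
  { step: "LLM", p95Ms: 1400, costUsd: 0.012 },
  { step: "Eval", p95Ms: 60, costUsd: 0.0003 },
  { step: "Action", p95Ms: 90, costUsd: 0.0005 },
];

// Sums latency and cost across nodes. Adding p95s gives a rough estimate,
// not a measured end-to-end p95.
function pipelineTotals(ns: PipelineNode[]): { p95Ms: number; costUsd: number } {
  return ns.reduce(
    (acc, n) => ({ p95Ms: acc.p95Ms + n.p95Ms, costUsd: acc.costUsd + n.costUsd }),
    { p95Ms: 0, costUsd: 0 },
  );
}

const totals = pipelineTotals(nodes);
console.log(`${(totals.p95Ms / 1000).toFixed(2)}s · $${totals.costUsd.toFixed(4)} / call`);
```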

Shipped agents · sanitized · in production

Code in your repo

Support deflection bot · RAG · multi-turn · 47% deflection · $0.014 / msg
Lead enricher · CRM · webhook · 12k leads / week · $0.008 / lead
Call summarizer · Audio · agent · 640 hrs saved / qtr · $0.21 / call
Content pipeline · Multi-agent · n8n · 1 pillar → 12 atoms · $0.42 / piece

This is what an audit response looks like. A real diagram, real costs, real evals. No strategy deck.

Send the workflow

Workflow audit · sample from a Q3 onboarding

What we look at before an agent ships.

Every engagement opens with an eight-point workflow audit. Six rows below are pulled from a real onboarding. FAIL items become sprint one. PASS items get held in place.

Audit card · client-11 · 6 of 8 findings · Exported with the diagram

FAIL
Hallucination management
No retrieval grounding. No confidence threshold. No HITL fallback for low-confidence answers.
FAIL
Observability and tracing
No tracing wired. Can't inspect prompt, retrieval context, or tool calls. Debugging is a black box.
WARN
Eval suite
Zero evals. No regression detection between prompt versions. Quality drift goes unnoticed.
WARN
Vendor lock-in
Hardcoded to a single provider. No abstraction layer. Migrating away requires a rewrite.
PASS
Data residency and security
Vendor confirmed not to train on inputs. PII scrubbing in place. Audit log retained 90 days.
PASS
API key custody
Keys live in your vault, rotated quarterly. Per-environment scoping. No shared keys in the codebase.

Eight rows like these land in your inbox inside four business days, alongside the agent diagram.

Commission the audit

The offer · in four numbers · 04 / 04

I · Retainer floor: $3,500 / month + tokens
II · Project pricing floor: $10K+ / agent build
III · Production gate: Day 01 · evals + tracing
IV · From brief to audit: 4 business days

Multi-agent build from $50K · Voice agent from $25K · Token cost passthrough, transparent dashboards

agents.registry · production manifest

Six entries in the registry. Each one typed and shipped.

06 / 06 · live · code in your repo

agents · prod registry/v2026.11

06 entries · 5 prod · 1 staging

agent.ops@ v3.4.1
Internal agents
Sync · prod
Ops bots, sales-call summarizers, lead enrichment, content drafters, internal admin tools. Models routed by task. Human-in-the-loop where it matters.
Signature
ops: (task: Brief) → Result<Action[]>
stack · Claude Agent SDK · n8n
- Model
  Claude Opus 4.7
- Latency · p95
  2.1s
- Cost / call
  $0.018
- Eval pass
  96.2%
agent.rag@ v2.7.0
RAG support chat
Streaming · prod
Customer-facing chat with retrieval over your knowledge base. Confidence thresholds, citation surfacing, and fallback to a human. Deflection metrics on the dashboard.
Signature
rag: (msg: Msg, ctx: Conv) → Stream<Reply>
stack · pgvector · LangGraph
- Model
  Claude Sonnet 4.6 · pgvector
- Latency · p95
  1.4s
- Cost / call
  $0.014
- Eval pass
  94.1%
agent.voice@ v1.9.0
Voice agents
Realtime · prod
Inbound qualification, outbound sales, appointment booking. Sub-600ms latency. Real-time transcription, structured output, CRM-side write-back.
Signature
voice: (audio: Stream<PCM>) → Stream<Turn>
stack · Vapi · Retell · WebRTC
- Model
  Haiku 4.5 · Vapi · Retell
- Latency · p95
  560ms
- Cost / call
  $0.21 / min
- Eval pass
  91.8%
flow.automate@ v4.1.2
Workflow automations
Async · prod
Form-to-action chains, CRM enrichment, scoring, nurture cadence, reporting pipelines. n8n self-hosted for engineer-grade control.
Signature
automate: (trigger: Event) → Run<Job>
stack · n8n · Make · Zapier
- Model
  Routed · Sonnet / Haiku
- Latency · p95
  320ms
- Cost / call
  $0.004
- Eval pass
  98.4%
agent.content@ v2.3.0
Content pipelines
Multi-agent · prod
One pillar to twelve atoms. Brief, draft, edit, schedule, distribute. Voice-locked to your tone document. AI drafts, senior editor reviews, you ship.
Signature
content: (pillar: Brief) → Atom[12]
stack · Claude · n8n
- Model
  Claude Opus 4.7 · Sonnet 4.6
- Latency · p95
  4.8s
- Cost / call
  $0.42 / piece
- Eval pass
  89.6%
agent.admin@ v1.5.4
Internal LLM admin tools
Sync · staging
Lightweight web apps backed by LLMs for ops teams. Tag classification, summarization queues, batch-extract jobs, knowledge-base curation.
Signature
admin: (rows: Record[]) → Annotated[]
stack · Next.js · Claude
- Model
  Haiku 4.5
- Latency · p95
  180ms
- Cost / call
  $0.0006 / row
- Eval pass
  95.3%
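The registry signatures use a TypeScript-style shorthand with `→` for the return type. Here is a sketch of how two of them might read as actual TypeScript. Every type below (`Brief`, `Action`, `RecordRow`, `Annotated`, the `Result` shape) is a hypothetical stand-in, since the registry does not publish these definitions. Return types are wrapped in `Promise` on the assumption that each agent calls a model asynchronously. The stubs make no model call; they only show that the contracts type-check.

```typescript
// Hypothetical stand-ins; the registry does not publish these definitions.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface Brief { goal: string }
interface Action { kind: "slack" | "crm" | "db"; payload: string }

// "Record" in the registry is renamed RecordRow: TypeScript already has a built-in Record type.
interface RecordRow { id: string; text: string }
interface Annotated extends RecordRow { tags: string[] }

// ops: (task: Brief) → Result<Action[]>
type OpsAgent = (task: Brief) => Promise<Result<Action[]>>;

// admin: (rows: Record[]) → Annotated[]
type AdminAgent = (rows: RecordRow[]) => Promise<Annotated[]>;

// Toy implementations with no model call.
const opsStub: OpsAgent = async (task) =>
  task.goal.trim() === ""
    ? { ok: false, error: "empty brief" }
    : { ok: true, value: [{ kind: "slack", payload: task.goal }] };

const adminStub: AdminAgent = async (rows) =>
  rows.map((row) => ({ ...row, tags: row.text.toLowerCase().includes("refund") ? ["billing"] : [] }));
```

Returning a `Result` instead of throwing keeps a failed brief in the type system, so callers have to handle the error branch explicitly.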

The stack is yours to inspect, swap, or fork. Every entry above is a senior call we can defend in writing, with the eval suite to back it.

Send the workflow

When the call usually comes in

Three reasons the workflow lands in our inbox.

Reason 01
You have an ops problem that should be a script.
Sales call summaries, lead enrichment, internal classification, content distribution. All of it manual. The senior AI engineer you wanted to hire wants $300k loaded. We're the bench you bring in instead, shipping production agents inside the first sprint.
Reason 02
Your team built a Zapier flow and it's breaking weekly.
Twelve steps, three branches, a janky LLM call in the middle, no error handling. We rebuild it on n8n self-hosted with evals, retries, and observability. Faster, cheaper, traceable. Code lives in your repo so your team can take it back any time.
Reason 03
You're tired of AI consultants who ship slides.
Three months and $80k bought a strategy deck and zero shipped agents. We do the opposite: the audit lands in four business days, the first agent prototype is in your repo by day seven, and evals are wired before production traffic.

The honest comparison

When senior-led AI is the right answer. And when it isn't.

Criterion · In-house hire · Zapier consultant · AI agency · Grovant

Senior AI engineer named on the work · yes · no · depends · yes
Evals + tracing on day one · depends · no · rare · yes
Code in your repo · yes · no · depends · yes
Token cost passthrough, transparent · n/a · no · rare · yes
HITL fallback wired in · depends · no · depends · yes
Mutual NDA + no-train clause · n/a · no · depends · yes
Fully loaded annual cost · $280-400k · $60-120k · $60-180k · $42-180k
Time to first agent in production · 10 weeks · 1 day (fragile) · 12 weeks · 1 week

Operating principles · in writing

Principles I to IV

Four principles.
Wired into every agent.

Most AI projects fail for a handful of recurring reasons. These four principles address them, and each one is in place before any agent goes live.

Principle I
You own the prompts, code, and infra
Every prompt lives in your repo under version control, alongside the agent code. The infra runs on your accounts (AWS, Vercel, Railway, your call). Token costs pass through transparently from the provider. We don't resell tokens or hold your API keys.
Principle II
Evals and tracing on day one
Langfuse or Braintrust wired before the first production call. Every prompt change runs against the eval suite. Tracing on every node. You can inspect what the agent did, why it did it, and how much it cost, per request.
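Per-request tracing can be sketched as a wrapper that records a span for each node: its name, duration, cost, and status. This is not the Langfuse or Braintrust API. `Span`, `traceNode`, `requestCost`, and the `costOf` callback are hypothetical, and a real setup would send spans to one of those tools rather than to an array. For simplicity, a failed call records zero cost here, although a provider may still bill for tokens it processed.

```typescript
interface Span {
  requestId: string;
  node: string;
  durationMs: number;
  costUsd: number;
  ok: boolean;
}

// Wraps one pipeline node so every call emits a span, on success or failure.
async function traceNode<T>(
  spans: Span[],
  requestId: string,
  node: string,
  fn: () => Promise<T>,
  costOf: (result: T) => number, // hypothetical: derive cost from token usage in the result
): Promise<T> {
  const start = Date.now();
  try {
    const result = await fn();
    spans.push({ requestId, node, durationMs: Date.now() - start, costUsd: costOf(result), ok: true });
    return result;
  } catch (err) {
    spans.push({ requestId, node, durationMs: Date.now() - start, costUsd: 0, ok: false });
    throw err;
  }
}

// Per-request cost: the sum of span costs recorded for that request.
function requestCost(spans: Span[], requestId: string): number {
  return spans.filter((s) => s.requestId === requestId).reduce((sum, s) => sum + s.costUsd, 0);
}
```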
Principle III
Hallucination has a human-in-the-loop fallback
Confidence thresholds enforced on every output. Below threshold routes to a human queue, not a guess. Citations surfaced wherever the agent claims a fact. Retrieval is the default; generation is the assist.
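Here is a minimal sketch of the threshold rule. It assumes the agent returns a confidence score in [0, 1] along with its answer and the sources it cited. The default of 0.65 echoes the HITL cutoff in the support.deflection eval ledger. `DraftAnswer`, `Routed`, and `routeAnswer` are hypothetical names.

```typescript
interface DraftAnswer {
  text: string;
  confidence: number; // assumed to be in [0, 1]
  citations: string[]; // e.g. knowledge-base document IDs used to ground the answer
}

type Routed =
  | { route: "reply"; text: string; citations: string[] }
  | { route: "human"; reason: string };

// Below the threshold, the answer goes to a human queue instead of the user.
function routeAnswer(draft: DraftAnswer, threshold = 0.65): Routed {
  if (draft.confidence < threshold) {
    return { route: "human", reason: `confidence ${draft.confidence.toFixed(2)} < ${threshold}` };
  }
  // At or above threshold: reply, surfacing the citations alongside the text.
  return { route: "reply", text: draft.text, citations: draft.citations };
}
```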
Principle IV
Mutual NDA, no training on your data
Signed before we look at your workflows. Vendor contracts confirmed not to train on your inputs. PII scrubbing in place. Audit log retained on your schedule, not ours.

Operator's mark

You can leave after the first sprint. We'd rather you stay because the agents are paying for themselves.

Send the workflow · Book a 20-minute call

Eval ledger · v0.1 → v1.0 · 28 days

The cadence is an eval suite. No version ships without it.

Production gate · ≥ 90% pass

evals.run · support.deflection

06 iterations · 312 test cases

Version · Date · Diff (what changed) · Pass rate · p95 · Cost

v0.1 · Day 03
Baseline · zero-shot Claude Sonnet, no retrieval, no HITL.
84 tests · Senior engineer
Pass rate 46.2% · p95 1.9s · Cost $0.018

v0.2 · Day 07
Added pgvector retrieval over your KB. Confidence threshold at 0.72.
124 tests · Senior engineer
Pass rate 62.8% (▲ 16.6) · p95 1.7s · Cost $0.014

v0.3 · Day 12
System prompt rewrite. Voice locked to brand doc. Banned phrases enforced.
188 tests · Senior engineer
Pass rate 71.4% (▲ 8.6) · p95 1.6s · Cost $0.014

v0.4 · Day 18
HITL fallback wired below 0.65 confidence. Citation surfacing added.
246 tests · Senior engineer
Pass rate 82.0% (▲ 10.6) · p95 1.5s · Cost $0.013

v0.5 · Day 22
Router added. Haiku for tier-2 queries, Opus for tier-1. Cost per call down from $0.013 to $0.009.
312 tests · Senior engineer
Pass rate 87.6% (▲ 5.6) · p95 1.4s · Cost $0.009

v1.0 · shipped · Day 28
Eval suite passed the production gate. Rolled out to 100% of traffic.
312 tests · Account lead
Pass rate 94.2% (▲ 6.6) · p95 1.4s · Cost $0.009
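The production gate is a simple rule over the ledger. The sketch below uses the figures above to recompute the pass-rate deltas and check each version against the ≥ 90% gate. The `LedgerRow` shape is hypothetical. A pass-rate gate alone does not show which individual cases regressed between versions, so comparing per-test results would be a natural extension.

```typescript
interface LedgerRow {
  version: string;
  tests: number;
  passRate: number; // percent of tests passing
}

// Figures from the support.deflection ledger above.
const ledger: LedgerRow[] = [
  { version: "v0.1", tests: 84, passRate: 46.2 },
  { version: "v0.2", tests: 124, passRate: 62.8 },
  { version: "v0.3", tests: 188, passRate: 71.4 },
  { version: "v0.4", tests: 246, passRate: 82.0 },
  { version: "v0.5", tests: 312, passRate: 87.6 },
  { version: "v1.0", tests: 312, passRate: 94.2 },
];

const GATE = 90; // production gate: >= 90% pass

// Pass-rate change vs the previous version, rounded to one decimal to hide float noise.
function deltas(rows: LedgerRow[]): number[] {
  return rows.slice(1).map((row, i) => Math.round((row.passRate - rows[i].passRate) * 10) / 10);
}

// Versions allowed to ship under the gate.
function shippable(rows: LedgerRow[], gate = GATE): string[] {
  return rows.filter((r) => r.passRate >= gate).map((r) => r.version);
}
```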

First agent ships to production inside three weeks. Diagram and audit land in four business days.

Start the audit

AI questions

What people ask before they let an agent into the stack.

Plain answers about prompt ownership, hallucination management, evals, costs, and how a production AI program actually runs.


Don't see your question? Send a quick message →

Reply · within 1 business day

Still have a question we didn't cover? Ask it in the brief. A senior engineer reads every submission inside one business day.

Send the workflow · Book a call

What happens after you send

Three steps. Four business days.

You send the workflow. The agent diagram and audit land four business days later. Here is exactly what runs in between.

01 · Within 24 hours
A senior replies
Not a coordinator, not an SDR. The senior AI engineer who would run your workflow replies from a real email, confirms scope, and sends the mutual NDA with the no-training clause.
02 · Days 2 to 3
We run the workflow audit
After the NDA is signed, we audit the workflow. Hallucination management, observability, evals, vendor lock-in, data residency, API key custody. Eight points across the agent surface.
03 · Day 04
Diagram and price land
You get the agent diagram, the audit document, an eval plan, a token-cost estimate, the named senior engineer, and a transparent price. Retainer or project, your call.

No follow-up sequence. No drip campaign. If we miss the four-day window, we eat the first sprint.

Start step 01

Node · WL-AI-2026/Closing brief

Page 09 of 09

One step

Send the workflow.
We'll send back a diagram.

Inside four business days, you get an agent diagram, a workflow audit, an eval plan, a token-cost estimate, the named senior engineer, and a transparent price. Retainer or project, your call.

01 · Free agent diagram + workflow audit in 4 biz days
02 · Senior AI engineer named on day one
03 · Code in your repo, evals on day one
04 · Token cost passthrough, transparent dashboards

Rather talk first?

Book a 20-minute call with a senior engineer

Node · audit.intake

v 1.0

Free workflow audit · Form 02 / 02

Send the workflow. Diagram comes back in four.

Five fields. One business day to a senior reply. No follow-up sequence.

Sealed · Read by a senior engineer

