Node · WL-AI-2026/Automation

Claude Agent SDK · n8n · RAG · MCP · Voice · Evals

Production AI agents. Shipped, not pitched.

Senior AI engineering. Real agents in production, not slideware. Internal ops bots, RAG-backed support chat, lead enrichment, content pipelines, voice agents. Built with Claude Agent SDK, n8n, LangGraph, and Langfuse for evals. Code lives in your repo. Token costs pass through transparently.

  • 01 · Senior AI engineer named on day one
  • 02 · Code in your repo, on your infra, your API keys
  • 03 · Evals and tracing wired from day one
  • 04 · Free workflow audit in four business days
Founding partner · Q4 2026 · Token costs pass through
Node · audit.intake
v 1.0
Free workflow audit · Form 01 / 02

Send the workflow. Get the diagram.

Tell us what to automate. Inside four business days you get an agent diagram, an eval plan, a token-cost estimate, the named senior engineer, and a price.

Mutual NDA · Code in your repo · Token cost passthrough

Reply within one business day · Read by a senior engineer

Agent diagram · sanitized snapshot

The pipeline we actually ship.

Five nodes, each with a real latency and a real cost. This is the diagram you get back in your audit, with the exact stack we'd ship for your workflow. Evals run on every node. Tracing is wired before the agent goes live. Token cost is transparent, dashboarded, and yours to inspect.

Diagram v 04 · client-11

Pipeline · support.deflection · prod
Live · 1.2k req/hr
  1. Trigger · Form / Webhook · p95 80ms · cost $0.001
  2. Retrieve · pgvector / Pinecone · p95 120ms · cost $0.0008
  3. LLM · Claude Opus 4.7 · p95 1.4s · cost $0.012
  4. Eval · Langfuse / Braintrust · p95 60ms · cost $0.0003
  5. Action · CRM / Slack / DB · p95 90ms · cost $0.0005

Total p95 (sum of per-node p95s) · 1.75s
Cost / call · $0.0146
Eval pass rate · 94.2%
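The totals are plain sums of the five node rows. A minimal Python sketch of that arithmetic, using the figures from the diagram; `Node`, `latency_budget_ms`, and `cost_per_call_usd` are illustrative names, not code from a shipped repo. Summing per-node p95s gives a rough serial latency budget, not a measured end-to-end p95, because the slowest requests at each node do not necessarily coincide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    p95_ms: float    # per-node p95 latency, milliseconds
    cost_usd: float  # per-call cost, US dollars


# Figures copied from the sanitized support.deflection diagram.
PIPELINE = [
    Node("Trigger", 80, 0.001),
    Node("Retrieve", 120, 0.0008),
    Node("LLM", 1400, 0.012),
    Node("Eval", 60, 0.0003),
    Node("Action", 90, 0.0005),
]


def latency_budget_ms(nodes: list[Node]) -> float:
    """Sum of per-node p95s: a serial latency budget, not a measured end-to-end p95."""
    return sum(n.p95_ms for n in nodes)


def cost_per_call_usd(nodes: list[Node]) -> float:
    """Assumes each node runs once per request, so per-call costs add up."""
    return sum(n.cost_usd for n in nodes)


if __name__ == "__main__":
    print(f"p95 budget: {latency_budget_ms(PIPELINE) / 1000:.2f}s")
    print(f"cost/call:  ${cost_per_call_usd(PIPELINE):.4f}")
```

Run as a script, it reports a 1.75s budget and $0.0146 per call, which is what the five node costs add up to.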

Shipped agents · sanitized · in production
Code in your repo
  • Support deflection bot · 47% deflection · $0.014 / msg
  • Lead enricher · 12k leads / week · $0.008 / lead
  • Call summarizer · 640 hrs saved / qtr · $0.21 / call
  • Content pipeline · 1 pillar → 12 atoms · $0.42 / piece

This is what an audit response looks like. A real diagram, real costs, real evals. No strategy deck.


Workflow audit · sample from a Q3 onboarding

What we look at before an agent ships.

Every engagement opens with an eight-point workflow audit. Six rows below are pulled from a real onboarding. FAIL items become sprint one. PASS items get held in place.

Audit card · client-11 · 6 of 8 findings · Exported with the diagram
  • FAIL
    Hallucination management
    No retrieval grounding. No confidence threshold. No HITL fallback for low-confidence answers.
  • FAIL
    Observability and tracing
    No tracing wired. Can't inspect prompt, retrieval context, or tool calls. Debugging is a black box.
  • WARN
    Eval suite
    Zero evals. No regression detection between prompt versions. Quality drift goes unnoticed.
  • WARN
    Vendor lock-in
    Hardcoded to a single provider. No abstraction layer. Migrating away requires a rewrite.
  • PASS
    Data residency and security
    Vendor confirmed not to train on inputs. PII scrubbing in place. Audit log retained 90 days.
  • PASS
    API key custody
    Keys live in your vault, rotated quarterly. Per-environment scoping. No shared keys in the codebase.
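The vendor lock-in finding usually comes down to application code calling one vendor's client directly. A minimal sketch of the common fix, a thin provider-neutral interface, assuming hypothetical names throughout: `ChatModel`, the two stub providers, and the `PROVIDERS` registry are illustrations only, and a real adapter would wrap each vendor's own SDK inside `complete`.

```python
from typing import Callable, Protocol


class ChatModel(Protocol):
    """Provider-neutral interface the application codes against (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class StubProviderA:
    # Stand-in adapter; a real one would call vendor A's SDK here.
    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"


class StubProviderB:
    # Stand-in adapter; a real one would call vendor B's SDK here.
    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"


# The provider is a configuration value, not a hardcoded import at call sites.
PROVIDERS: dict[str, Callable[[], ChatModel]] = {
    "provider-a": StubProviderA,
    "provider-b": StubProviderB,
}


def get_model(name: str) -> ChatModel:
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


def summarize(ticket: str, model: ChatModel) -> str:
    # Application logic depends only on ChatModel, never on a specific vendor.
    return model.complete(f"Summarize: {ticket}")
```

Swapping vendors then means writing one adapter and changing the provider name in config; `summarize` does not change.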

Eight rows like these land in your inbox inside four business days, alongside the agent diagram.

The offer · in four numbers · 04 / 04
  • I · Retainer floor · $3,500 / month + tokens
  • II · Project pricing floor · $10K+ / agent build
  • III · Production gate · Day 01 · evals + tracing
  • IV · From brief to audit · 4 business days

Multi-agent build from $50K · Voice agent from $25K · Token cost passthrough, transparent dashboards

agents.registry · production manifest

Six entries in the registry. Each one typed and shipped.

06 / 06 · live · code in your repo

agents · prod registry
06 entries · 5 prod · 1 staging
  1. agent.ops@v3.4.1

    Internal agents

    Sync · prod

    Ops bots, sales-call summarizers, lead enrichment, content drafters, internal admin tools. Models routed by task. Human-in-the-loop where it matters.

    Signature

    ops: (task: Brief) → Result<Action[]>

    stack · Claude Agent SDK · n8n

    • Model · Claude Opus 4.7
    • Latency (p95) · 2.1s
    • Cost / call · $0.018
    • Eval pass · 96.2%

  2. agent.rag@v2.7.0

    RAG support chat

    Streaming · prod

    Customer-facing chat with retrieval over your knowledge base. Confidence thresholds, citation surfacing, fallback to a human. Deflection metrics on the dashboard.

    Signature

    rag: (msg: Msg, ctx: Conv) → Stream<Reply>

    stack · pgvector · LangGraph

    • Model · Claude Sonnet 4.6, with pgvector retrieval
    • Latency (p95) · 1.4s
    • Cost / call · $0.014
    • Eval pass · 94.1%

  3. agent.voice@v1.9.0

    Voice agents

    Realtime · prod

    Inbound qualification, outbound sales, appointment booking. Sub-600ms latency. Real-time transcription, structured output, CRM-side write-back.

    Signature

    voice: (audio: Stream<PCM>) → Stream<Turn>

    stack · Vapi · Retell · WebRTC

    • Model · Haiku 4.5, on Vapi / Retell
    • Latency (p95) · 560ms
    • Cost · $0.21 / min
    • Eval pass · 91.8%

  4. flow.automate@v4.1.2

    Workflow automations

    Async · prod

    Form-to-action chains, CRM enrichment, scoring, nurture cadence, reporting pipelines. n8n self-hosted for engineer-grade control.

    Signature

    automate: (trigger: Event) → Run<Job>

    stack · n8n · Make · Zapier

    • Model · Routed (Sonnet / Haiku)
    • Latency (p95) · 320ms
    • Cost / call · $0.004
    • Eval pass · 98.4%

  5. agent.content@v2.3.0

    Content pipelines

    Multi-agent · prod

    One pillar to twelve atoms. Brief, draft, edit, schedule, distribute. Voice-locked to your tone document. AI drafts, senior editor reviews, you ship.

    Signature

    content: (pillar: Brief) → Atom[12]

    stack · Claude · n8n

    • Model · Claude Opus 4.7 + Sonnet 4.6
    • Latency (p95) · 4.8s
    • Cost · $0.42 / piece
    • Eval pass · 89.6%

  6. agent.admin@v1.5.4

    Internal LLM admin tools

    Sync · staging

    Lightweight web apps backed by LLMs for ops teams. Tag classification, summarization queues, batch-extract jobs, knowledge-base curation.

    Signature

    admin: (rows: Record[]) → Annotated[]

    stack · Next.js · Claude

    • Model · Haiku 4.5
    • Latency (p95) · 180ms
    • Cost · $0.0006 / row
    • Eval pass · 95.3%

Avg eval pass · 94.2% · all entries traced + dashboarded

Stack is yours to inspect, swap, or fork. Every entry above is a senior call we can defend in writing — with the eval suite to back it.
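Several entries above run more than one model, and the ops entry routes models by task. A minimal sketch of static task-based routing, assuming a hand-written lookup table; the task names and model labels are hypothetical placeholders, not real API model identifiers. A production router might classify the request first and then consult a table like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # placeholder label, not a real API model identifier
    reason: str


# Hypothetical task-to-model table; labels echo the registry entries above.
ROUTES = {
    "classify": Route("haiku-4.5", "short, high-volume, cost-sensitive"),
    "extract": Route("haiku-4.5", "structured output over short rows"),
    "draft": Route("sonnet-4.6", "longer-form generation"),
    "reason": Route("opus-4.7", "multi-step reasoning over tools"),
}
DEFAULT_ROUTE = Route("sonnet-4.6", "fallback for unrecognized task types")


def route(task_type: str) -> Route:
    """Pick a model by task type; unknown types fall back to a mid-tier default."""
    return ROUTES.get(task_type, DEFAULT_ROUTE)
```

Keeping the table in one place makes each routing decision reviewable, and two candidate tables can be compared by running the same eval suite against each.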


When the call usually comes in

Three reasons the workflow lands in our inbox.

  1. Reason 01

    You have an ops problem that should be a script.

    Sales call summaries, lead enrichment, internal classification, content distribution. All of it manual. The senior AI engineer you wanted to hire wants $300k loaded. We're the bench you bring in instead, shipping production agents inside the first sprint.

  2. Reason 02

    Your team built a Zapier flow and it's breaking weekly.

    Twelve steps, three branches, a janky LLM call in the middle, no error handling. We rebuild it on n8n self-hosted with evals, retries, and observability. Faster, cheaper, traceable. Code lives in your repo so your team can take it back any time.

  3. Reason 03

    You're tired of AI consultants who ship slides.

    Three months and a strategy deck cost $80k. Zero shipped agents. We do the opposite. Audit lands in four business days, first agent prototype in your repo by day seven, evals wired before production traffic.

The honest comparison

When senior-led AI is the right answer. And when it isn't.

Criterion · In-house hire · Zapier consultant · AI agency · Grovant
  • Senior AI engineer named on the work · yes · no · depends · yes
  • Evals + tracing on day one · depends · no · rare · yes
  • Code in your repo · yes · no · depends · yes
  • Token cost passthrough, transparent · n/a · no · rare · yes
  • HITL fallback wired in · depends · no · depends · yes
  • Mutual NDA + no-train clause · n/a · no · depends · yes
  • Fully loaded annual cost · $280-400k · $60-120k · $60-180k · $42-180k
  • Time to first agent in production · 10 weeks · 1 day (fragile) · 12 weeks · 1 week

Operating principles · in writing
Principles I to IV

Four principles.
Wired into every agent.

Most AI projects fail for the reasons these four principles address, so we put them in place before any agent goes live.

  1. Principle I

    You own the prompts, code, and infra

    Every prompt lives in your repo, version-controlled. The agent code lives in your repo. The infra runs on your accounts (AWS, Vercel, Railway, your call). Token costs pass through transparently from the provider. We don't resell tokens or hold your API keys.

  2. Principle II

    Evals and tracing on day one

    Langfuse or Braintrust wired before the first production call. Every prompt change runs against the eval suite. Tracing on every node. You can inspect what the agent did, why it did it, and how much it cost, per request.

  3. Principle III

    Hallucination has a human-in-the-loop fallback

    Confidence thresholds enforced on every output. Below threshold routes to a human queue, not a guess. Citations surfaced wherever the agent claims a fact. Retrieval is the default, generation is the assist.

  4. Principle IV

    Mutual NDA, no training on your data

    Signed before we look at your workflows. Vendor contracts confirmed not to train on your inputs. PII scrubbing in place. Audit log retained on your schedule, not ours.
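The human-in-the-loop principle can be sketched as a small gate between the model and the user. Assumptions: `Draft`, `Outcome`, and `gate` are hypothetical names; confidence is taken to be a score in [0, 1] produced upstream; the 0.65 default mirrors the HITL threshold in the sample eval ledger on this page; and, for simplicity, every answer is treated as a factual claim that needs at least one citation.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    text: str
    confidence: float  # score in [0, 1], assumed to come from upstream scoring
    citations: list[str] = field(default_factory=list)


@dataclass
class Outcome:
    route: str   # "auto" (send to user) or "human" (send to review queue)
    reason: str


def gate(draft: Draft, threshold: float = 0.65) -> Outcome:
    """Route low-confidence or uncited answers to a human queue instead of guessing."""
    if draft.confidence < threshold:
        return Outcome("human", f"confidence {draft.confidence:.2f} below {threshold:.2f}")
    if not draft.citations:
        return Outcome("human", "no citation for a factual answer")
    return Outcome("auto", "confident and cited")
```

An answer exactly at the threshold goes out; anything below it waits for a person, and the `reason` string is what a reviewer sees in the queue.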

Operator's mark

You can leave after the first sprint. We'd rather you stay because the agents are paying for themselves.

Eval ledger · v0.1 → v1.0 · 28 days

The cadence is an eval suite. No version ships without it.

Production gate · ≥ 90% pass

evals.run · support.deflection
06 iterations · 312 test cases
  1. v0.1
    Day 03

    Baseline · zero-shot Claude Sonnet, no retrieval, no HITL.

    84 tests · Senior engineer

    46.2%
    p95 1.9s · Cost $0.018
  2. v0.2
    Day 07

    Added pgvector retrieval over your KB. Confidence threshold at 0.72.

    124 tests · Senior engineer

    62.8% (+16.6 pts)
    p95 1.7s · Cost $0.014
  3. v0.3
    Day 12

    System prompt rewrite. Voice locked to brand doc. Banned phrases enforced.

    188 tests · Senior engineer

    71.4% (+8.6 pts)
    p95 1.6s · Cost $0.014
  4. v0.4
    Day 18

    HITL fallback wired below 0.65 confidence. Citation surfacing added.

    246 tests · Senior engineer

    82.0% (+10.6 pts)
    p95 1.5s · Cost $0.013
  5. v0.5
    Day 22

    Router added. Haiku for tier-2 queries, Opus for tier-1. Cost per call down from $0.013 to $0.009.

    312 tests · Senior engineer

    87.6% (+5.6 pts)
    p95 1.4s · Cost $0.009
  6. v1.0 · shipped
    Day 28

    Cleared the production gate on the eval suite. Rolled out to 100% of traffic.

    312 tests · Account lead

    94.2% (+6.6 pts)
    p95 1.4s · Cost $0.009
Production gate · cleared
v1.0 · pass 94.2% · ≥ 90% required · cost ↓ 50%
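Under this sketch's assumptions, the production gate reduces to two checks: the suite's pass rate meets the threshold, and no case that passed in the previous version now fails. Names such as `Case`, `run_suite`, `pass_rate`, `regressions`, and `production_gate` are hypothetical, and `check` is whatever grader a case needs (exact match here). Tools like Langfuse or Braintrust record and compare these runs; this code does not use either tool's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    case_id: str
    input: str
    expected: str


def run_suite(cases: list[Case], agent: Callable[[str], str],
              check: Callable[[str, str], bool]) -> dict[str, bool]:
    """Run every case once; map case_id -> passed."""
    return {c.case_id: check(agent(c.input), c.expected) for c in cases}


def pass_rate(results: dict[str, bool]) -> float:
    if not results:
        raise ValueError("empty eval suite")
    return sum(results.values()) / len(results)


def regressions(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Case IDs that passed in the previous version but fail (or are missing) now."""
    return sorted(cid for cid, ok in before.items() if ok and not after.get(cid, False))


def production_gate(after: dict[str, bool], before: dict[str, bool],
                    threshold: float = 0.90) -> bool:
    """Ship only if the pass rate meets the gate and nothing regressed."""
    return pass_rate(after) >= threshold and not regressions(before, after)
```

With 312 cases, clearing a 90% gate means at least 281 passing (281 / 312 ≈ 90.06%).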

First agent ships to production inside three weeks. Diagram and audit land in four business days.

AI questions

What people ask before they let an agent into the stack.

Plain answers about prompt ownership, hallucination management, evals, costs, and how a production AI program actually runs.

  • Zapier is great for two-tool, low-stakes glue. The moment a workflow needs branching logic, retry handling, audit logs, or LLM calls inside a step, you hit the wall fast. We build the next layer down — custom code with proper observability — and only reach for an LLM when classification or generation is the actual job.

  • Whichever fits the task. Claude for reasoning and code-adjacent work, GPT-4 for general agents, Gemini for cost-sensitive classification, locally hosted Llama for privacy-sensitive enterprise jobs. We tell you which model and why in the scoping doc, and we keep the choice swappable.

  • They work. Every build ships with a runbook, monitoring dashboards, and an on-call escalation path. A monthly operating retainer is optional — it covers alerts, edge-case fixes, and model swaps as the providers update. Many teams take it for the first six months and then move ops in-house.

  • In your infrastructure. Postgres, Supabase, your VPC, your S3 bucket. We never run a multi-tenant database. API keys for the models stay in your environment variables. Everything we build is fully transferable on day one.

  • Yes. Custom CRMs are a regular engagement — same stack as the automations, with admin pages, role-based access, audit logs, and the workflow automations layered on top. Designed around your team's actual workflow, not adapted from Salesforce.

  • Every flow has a retry policy, a fallback path, and a Slack alert when both fail. The run log records every invocation with its status and duration. Edge cases that surface in production get added to the test suite. Nothing fails silently.
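The retry, fallback, and alert policy in the last answer can be sketched as a wrapper around each step. Assumptions: `run_step` and `RunRecord` are hypothetical names, three attempts with exponential backoff are illustrative defaults, the `alert` callable stands in for whatever posts to Slack, and an in-memory list stands in for the run log.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class RunRecord:
    step: str
    status: str        # "ok", "fallback", or "failed"
    attempts: int      # attempts made on the primary path
    duration_s: float


def run_step(name: str, primary: Callable[[], T], fallback: Callable[[], T],
             alert: Callable[[str], None], log: list[RunRecord],
             retries: int = 3, base_delay_s: float = 0.5) -> Optional[T]:
    """Retry the primary path, then try the fallback, then alert. Every run is logged."""
    start = time.monotonic()
    for attempt in range(1, retries + 1):
        try:
            result = primary()
            log.append(RunRecord(name, "ok", attempt, time.monotonic() - start))
            return result
        except Exception:
            if attempt < retries:
                time.sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff
    try:
        result = fallback()
        log.append(RunRecord(name, "fallback", retries, time.monotonic() - start))
        return result
    except Exception as exc:
        log.append(RunRecord(name, "failed", retries, time.monotonic() - start))
        alert(f"{name}: primary and fallback both failed ({exc!r})")
        return None
```

Every call ends in a log record, and the only path that returns `None` also fires an alert, so a failure cannot pass unnoticed.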


Still have a question we didn't cover? Ask it in the brief. A senior engineer reads every submission inside one business day.

What happens after you send

Three steps. Four business days.

You send the workflow. The agent diagram and audit land four business days later. Here is exactly what runs in between.

  1. Within 24 hours

    A senior replies

    Not a coordinator, not an SDR. The senior AI engineer who would run your workflow replies from a real email, confirms scope, and sends the mutual NDA with the no-training clause.

  2. Days 2 to 3

    We run the workflow audit

    After the NDA is signed, we audit the workflow against eight points across the agent surface, including hallucination management, observability, evals, vendor lock-in, data residency, and API key custody.

  3. Day 04

    Diagram and price land

    You get the agent diagram, the audit document, an eval plan, a token-cost estimate, the named senior engineer, and a transparent price. Retainer or project, your call.

No follow-up sequence. No drip campaign. If we miss the four-day window, we eat the first sprint.

Node · WL-AI-2026/Closing brief

One step

Send the workflow.
We'll send back a diagram.

Inside four business days, you get an agent diagram, a workflow audit, an eval plan, a token-cost estimate, the named senior engineer, and a transparent price. Retainer or project, your call.

  • 01 · Free agent diagram + workflow audit in 4 biz days
  • 02 · Senior AI engineer named on day one
  • 03 · Code in your repo, evals on day one
  • 04 · Token cost passthrough, transparent dashboards
Free workflow audit · Form 02 / 02

Send the workflow. Diagram comes back in four.

Five fields. One business day to a senior reply. No follow-up sequence.


Sealed · Read by a senior engineer

Free workflow audit · 4 biz days

Production AI agents, not slides
