The NLT Labs Pipeline

One sentence in.
Demo site deployed end-to-end.

Phase 1 researches and evaluates (five analysts in parallel, Devil's Advocate stress-test, then a PE Firm verdict). Phase 2 builds and deploys the demo. A FUND verdict (pe:fund) adds the issue to the scheduled POC queue; founder retrospective + GitHub review keep humans in the loop before deploy.

≤15

Specialists · full hardware path

2

Phases

~30–40 min

Wall-clock · 3 recorded runs

$15–20

Hardware POC total

~6+

Typical FUND threshold (avg)

What the pipeline now outputs

The pipeline is no longer abstract. Nine demos live on nltlabs.ai today — four featured here.

Open pipeline output ↗

Collage of demo sites deployed by the NLT Labs pipeline

Pawvlov

Hardware device that detects storms and rewards calm behavior so dogs unlearn thunder anxiety.

Complio

Compliance docs for the trades — lien waivers, subcontractor agreements, change orders at $39/mo.

BidFast

Voice notes turned into branded, legally-compliant estimates — win the job before leaving the driveway.

Clario

HIPAA, OSHA, and state-health compliance monitoring for small healthcare practices.

Why This Exists

The machine behind every pipeline run.

01

Most companies spend months and $50,000 figuring out if an idea is worth building. NLT Labs runs the same evaluation through a nine-section rubric, with forced Optimist, Skeptic, and Pragmatist lenses, then a Devil's Advocate stress-test. A FUND verdict (typically an average score of about 6 or higher across the full rubric) queues the build crew for a deployed demo site with evaluation notes attached.

02

The key innovation is research-first. Every agent builds on real competitive data, real customer quotes from Reddit, real market evidence. Nothing is invented cold. The Creative Director runs gap-filling searches before writing a brief. The Market Researcher cites primary sources. The Competitive Analyst pulls actual Crunchbase funding data. By the time the PE Firm scores the idea, it's working with substance, not temperature.

03

Hardware products (only when explicitly in scope; the default pipeline skews software-first) get a dedicated Product Designer that thinks like an industrial designer: use scenarios, mechanism specs, electrical schematics, and real BOM pricing at 1K and 10K unit volumes, all before any code or 3D work. The 3D render prompt is derived from the mechanism spec, not invented.

04

The output is a demo deployed as a two-door site: a polished consumer experience for browsers, and a tabbed evaluation-notes page (/inside) with real market data, unit economics, roadmap, capital ask, and a contact button. Hardware POCs add a rotating 3D model, lifestyle hero render, studio render, and a downloadable Tech Specs sheet. Everything is grounded in what the agents actually found.

End-to-End Flow

The machine that runs from idea to URL.

Each box is a real specialist agent. The Creative Director briefs five parallel researchers; the Devil's Advocate stress-tests the bull case while the Number Auditor inspects every number for broken math; the PE Firm then issues the verdict. Dig Deeper only runs when more research is required. Not on every idea.
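The ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual `workflows/*.yaml` wiring: `run_agent` is a hypothetical stand-in for whatever invokes each specialist, the role identifiers are made up for the example, and running the Devil's Advocate and Number Auditor concurrently is an assumption drawn from the description.

```python
import asyncio

# Hypothetical role identifiers for the five parallel analysts.
ANALYSTS = [
    "market_researcher",
    "competitive_analyst",
    "feasibility_analyst",
    "financial_modeler",
    "regulatory_analyst",
]


async def run_agent(role: str, context: dict) -> str:
    """Hypothetical stand-in for a real agent call; returns a placeholder report."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{role} report ({len(context)} inputs)"


async def phase_one(seed: str) -> dict:
    # 1. The Creative Director expands the seed into a research brief.
    brief = await run_agent("creative_director", {"seed": seed})

    # 2. Five analysts work from the same brief in parallel.
    reports = await asyncio.gather(
        *(run_agent(role, {"brief": brief}) for role in ANALYSTS)
    )
    findings = dict(zip(ANALYSTS, reports))

    # 3. Devil's Advocate and Number Auditor review the analyst output
    #    (concurrency here is an assumption, not a confirmed detail).
    critique, audit = await asyncio.gather(
        run_agent("devils_advocate", findings),
        run_agent("number_auditor", findings),
    )

    # 4. The PE Firm reads everything and issues the verdict.
    verdict = await run_agent(
        "pe_firm", {**findings, "critique": critique, "audit": audit}
    )
    return {
        "brief": brief,
        "findings": findings,
        "critique": critique,
        "audit": audit,
        "verdict": verdict,
    }


result = asyncio.run(phase_one("example seed idea"))
print(result["verdict"])  # pe_firm report (7 inputs)
```

The PE Firm step receives seven inputs in this sketch: five analyst reports plus the critique and the audit. Dig Deeper is left out because it only runs conditionally.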

Phase 1: Evaluation

~13–15 min wall-clock · brief → 5 parallel → Devil's Advocate + Number Auditor → PE Firm

pe:fund on the issue → POC pipeline queues on schedule · GitHub review as needed

Phase 2: Build

~17–25 min wall-clock · 4–6 agents

Software path

Hardware-only path

Meet the Team

15 specialists wired into `workflows/*.yaml`. Each one does one job.

Software-only evaluations use fewer roles; hardware adds Product Designer. Dig Deeper is a follow-on loop, not a full-time seat on every idea.

 ✓ 9 Phase 1 · 6 Phase 2 · synced 2026-05-13 

Phase 1: Evaluation · 9 roles · ~13–15 min wall-clock

🎨

Creative Director

Claude Sonnet

Expands the seed into a research brief and checks scope. No scoring. Gap-fills before researchers start: missing customer pain? Pulls real Reddit threads. Thin comps? Searches pricing pages. The five analysts inherit a brief that is specific and sourced, not invented.

📊

Market Researcher

Claude Sonnet

Finds the real market, with sources. No invented TAM numbers. Real customer voice from Reddit and reviews. Real industry data with citations.

🥊

Competitive Analyst

Claude Sonnet

Maps the competitive landscape with real funding data. Crunchbase raises in the space, what they built, why they raised. Grounds the PE Firm's valuation reality check.

⚙️

Feasibility Analyst

Claude Sonnet

Runs scope:deep sequential mode, not a shallow scan. Answers "can we actually build this?" The real questions: what tech, what team, what timeline, and what might blow up in year two.

💰

Financial Modeler

Claude Sonnet

Runs the numbers, with primary sources required: customer acquisition cost, lifetime value, break-even point. Broken math surfaces in the PE verdict instead of being rubber-stamped.

⚖️

Regulatory Analyst

Claude Sonnet

Hunts for legal landmines with citations. Licensing requirements, data privacy exposure, liability. Primary source required. No guessing.

😈

Devil's Advocate

Claude Opus

Before the PE write-up: attacks the thesis. Fact-checks quantitative claims, stress-tests financials, hunts hidden competitors, documents kill shots with evidence. The PE Firm reads this pass first, then owns the verdict.

🧮

Number Auditor

Claude Sonnet

The math gate. Inspects every quantitative claim across the briefs: TAM/SAM/SOM math, unit economics, contradictions between sections, missing formulas, implausible projections. Owns a broken-math hard cap that limits the PE Firm score when arithmetic is unsound. This is why "derived from cited numbers" actually means something.
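A minimal sketch of how a broken-math hard cap could work, assuming a 0–10 score scale. The cap value, the function names, and the specific checks are all hypothetical; the real auditor covers far more than market-sizing order.

```python
from dataclasses import dataclass

BROKEN_MATH_CAP = 4.0  # hypothetical cap value; the real threshold is not stated


@dataclass
class MarketSizing:
    tam: float  # total addressable market, USD
    sam: float  # serviceable addressable market, USD
    som: float  # serviceable obtainable market, USD


def audit_market_sizing(m: MarketSizing) -> list[str]:
    """Return a list of arithmetic problems; empty means the sizing is consistent."""
    problems = []
    if min(m.tam, m.sam, m.som) < 0:
        problems.append("negative market size")
    if m.sam > m.tam:
        problems.append("SAM exceeds TAM")
    if m.som > m.sam:
        problems.append("SOM exceeds SAM")
    return problems


def apply_hard_cap(raw_score: float, problems: list[str]) -> float:
    """Cap the score whenever the audit found at least one problem."""
    return min(raw_score, BROKEN_MATH_CAP) if problems else raw_score


# Placeholder figures, for illustration only.
sound = MarketSizing(tam=1_000_000_000, sam=200_000_000, som=10_000_000)
broken = MarketSizing(tam=1_000_000_000, sam=2_000_000_000, som=10_000_000)

print(apply_hard_cap(7.5, audit_market_sizing(sound)))   # 7.5
print(apply_hard_cap(7.5, audit_market_sizing(broken)))  # 4.0
```

The point of the design is that a strong narrative cannot outscore unsound arithmetic: once any check fails, the score has a ceiling regardless of what the other sections say.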

🏦

The PE Firm

Claude Opus

The judge. Runs a nine-section Gate 1 rubric (forced Optimist / Skeptic / Pragmatist lenses), then summarizes the headline scorecard most people see: Market, Execution, Moat, and Timing, each with explicit justification.
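One way to picture the rubric arithmetic is below. This is a sketch under stated assumptions: each of the nine sections gets one score per lens on a 0–10 scale, the lenses are averaged per section, and the sections are averaged overall. The section names, the exact threshold of 6.0, and the averaging scheme are not confirmed details.

```python
from statistics import mean

LENSES = ("optimist", "skeptic", "pragmatist")
FUND_THRESHOLD = 6.0  # assumption: FUND is described only as "typically ~6+ average"


def section_score(lens_scores: dict[str, float]) -> float:
    """Average the three forced lenses for one rubric section."""
    missing = [lens for lens in LENSES if lens not in lens_scores]
    if missing:
        raise ValueError(f"missing lens scores: {missing}")
    return mean(lens_scores[lens] for lens in LENSES)


def rubric_average(sections: dict[str, dict[str, float]]) -> float:
    """Average the per-section scores across the whole rubric."""
    if len(sections) != 9:
        raise ValueError("expected nine rubric sections")
    return mean(section_score(s) for s in sections.values())


def meets_fund_threshold(sections: dict[str, dict[str, float]]) -> bool:
    """True when the rubric average reaches the assumed FUND threshold."""
    return rubric_average(sections) >= FUND_THRESHOLD


# Placeholder section names and scores, for illustration only.
example = {
    f"section_{i}": {"optimist": 8.0, "skeptic": 5.0, "pragmatist": 6.5}
    for i in range(1, 10)
}
print(rubric_average(example))        # 6.5
print(meets_fund_threshold(example))  # True
```

Forcing all three lenses per section means a missing Skeptic score raises an error instead of silently inflating the average.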

Dig Deeper (conditional): when the PE verdict asks for more research, a targeted follow-up run answers specific questions, sometimes with a human gate, then the issue is re-scored. It does not replace the Devil's Advocate pass on a full evaluation.

Phase 2: Build · 4–6 agents · ~17–25 min wall-clock

🎯

POC Director

Claude Opus

Names the product, researches competitors, and writes a brief grounded in real market data. Saves a competitive-research.md with competitor table, pain points, category visual language, and differentiation angle. Every downstream agent reads it.

🏭

Product Designer

New

Claude Sonnet

Hardware only. The industrial design brain. Runs 6 stages: use scenario narrative, form language rationale, mechanism spec, electrical schematics, sensory design, competitive contrast. Outputs an interactive BOM at 1K and 10K unit volumes, a three-size SKU strategy (S/M/L breed-matched dimensions, retail and gross margin per tier), and materials + regulatory cert sheet (FDA food-grade, UL94 V-0, FCC Part 15 pre-launch). The Visual Assets prompts are derived from this spec. Not invented.
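The BOM arithmetic behind the two volume tiers can be sketched as follows. The part names, prices, and retail price are placeholders invented for illustration, and `BomLine`, `bom_total`, and `gross_margin` are hypothetical names, not the pipeline's own.

```python
from dataclasses import dataclass


@dataclass
class BomLine:
    part: str
    qty: int
    unit_cost_1k: float   # USD per part at 1,000-unit volume
    unit_cost_10k: float  # USD per part at 10,000-unit volume


def bom_total(lines: list[BomLine], volume: str) -> float:
    """Sum the per-device cost at the given volume tier ('1k' or '10k')."""
    if volume not in ("1k", "10k"):
        raise ValueError("volume must be '1k' or '10k'")
    attr = f"unit_cost_{volume}"
    return round(sum(line.qty * getattr(line, attr) for line in lines), 2)


def gross_margin(retail_price: float, unit_cost: float) -> float:
    """Gross margin as a fraction of retail price."""
    return round((retail_price - unit_cost) / retail_price, 4)


# Placeholder parts and prices, invented for illustration only.
bom = [
    BomLine("microcontroller", 1, 3.00, 2.50),
    BomLine("sensor", 1, 2.00, 1.50),
    BomLine("enclosure", 1, 5.00, 4.00),
    BomLine("fasteners", 4, 0.25, 0.25),
]

cost_1k = bom_total(bom, "1k")
cost_10k = bom_total(bom, "10k")
print(cost_1k, cost_10k)              # 11.0 9.0
print(gross_margin(40.00, cost_10k))  # 0.775
```

A per-tier margin for the S/M/L SKUs would repeat the same calculation with one BOM and one retail price per size.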

🖼️

Visual Assets

New

Claude Opus

Hardware only. Generates four photorealistic product renders — studio, lifestyle hero, exploded internals, and S/M/L product family — via Nano Banana (Gemini 2.5 Flash Image). An anchor-then-variant pattern locks subject identity across scenes so the dog, the device, and the room are the same across renders. Containerized agent; ships or blocks on judge scores.

🎨

Web Designer

Claude Sonnet

Reads competitive research before touching any layout. A Visual Differentiation section is mandatory: "competitors do X, we do Y, because Z." Every layout decision is grounded in something real.

✍️

Copywriter

Claude Sonnet

Research-first. Every line grounded in real customer pain points from the competitive-research.md. Hardware copy must match the actual mechanism. Runs at the same time as the Web Designer.

🔨

Frontend Builder

Claude Opus

Builds the two-door site (consumer-facing + the 8-tab /inside investor brief), wires the Visual Assets renders and BomTable into the Product & BOM tab, then deploys under a custom subdomain and notifies Bill with the live link + a 1–5 star quality rating prompt. Also ships Path to Product and Tech Specs.

The Output

Not a prototype. A demo site with full evaluation notes attached.

Every FUND-verdict idea is built into all of this. The /inside evaluation-notes page alone would cost $10,000+ from a consultant.

🏠

Consumer Homepage

Emotional, product-focused. Hero, pain statement, feature callouts, lifestyle imagery, and a CTA that converts.

⚙️

Product Experience Pages

How It Works, Dashboard, and Pricing. Not a feature list. A story about what changes for the customer.

🚪

/inside Evaluation Notes

New

Tabbed investor brief, 8 panels: Overview → Market Research → Competitive → Product & BOM → Financial Model → Engineering → Path to Product → Process.

🛣️

Path to Product

New

Derived roadmap with real timelines and costs. Capital ask from actual component pricing and development estimates. Not templated.

🔩

Tech Specs + 3D Model

Hardware

Hardware only. Four photorealistic renders (studio, lifestyle hero, exploded internals, S/M/L product family), interactive BomTable at 10K volume, three-size SKU strategy with breed-matched dimensions and per-tier margin, materials + cert sheet, dedicated Engineering tab (architecture, integrations, data model, security posture, scalability), and a rotating GLB model viewer.

📋

Business Plan

Full PE analysis, competitive table with funding data, headline scorecard (Market/Execution/Moat/Timing from the nine-section rubric), and GTM. All designed in.

📱

Fully Responsive

Looks right on a phone, a tablet, and a widescreen monitor. No exceptions. Mobile-first from the start.

Self-Improvement

The system improves itself.

Every deploy feeds back into the pipeline. Over time, the system gets better at predicting which ideas produce good outcomes.

⭐

01

Quality Rating

After each deploy, the Provisioning Agent prompts Bill for a 1–5 star quality rating on Telegram. Site quality, brief quality, anything off.

📡

02

Market Signal Check

7 days later: a check-in. 🔥 Strong interest / 👍 Some / 😐 None / 🗑 Kill. Real market response from real people who saw the site.

📝

03

Structured Debrief

Hot signals trigger a structured debrief. What worked? What resonated? That becomes the v2 brief, or a funded product.

🧠

04

Hive Memory Update

All ratings and signals save to shared memory. Every agent recalls them at the start of each run. The pipeline learns what works, and what doesn't.
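The four steps above amount to a small record that accumulates over time. Below is a minimal sketch of that record, assuming an in-memory list in place of the real shared memory. `PocFeedback`, `HiveMemory`, and `needs_debrief` are hypothetical names, and treating only "strong interest" as a hot signal is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MarketSignal(Enum):
    STRONG = "strong interest"
    SOME = "some"
    NONE = "none"
    KILL = "kill"


@dataclass
class PocFeedback:
    poc_name: str
    quality_rating: int                    # 1-5 stars, collected after deploy
    signal: Optional[MarketSignal] = None  # filled in at the 7-day check-in

    def __post_init__(self) -> None:
        if not 1 <= self.quality_rating <= 5:
            raise ValueError("quality rating must be between 1 and 5")


@dataclass
class HiveMemory:
    """Hypothetical in-memory stand-in for the shared memory store."""
    records: list = field(default_factory=list)

    def save(self, feedback: PocFeedback) -> None:
        self.records.append(feedback)

    def recall(self) -> list:
        # Return a copy so callers cannot mutate the stored history.
        return list(self.records)


def needs_debrief(feedback: PocFeedback) -> bool:
    """Assumption: only a 'strong interest' signal counts as hot."""
    return feedback.signal is MarketSignal.STRONG


memory = HiveMemory()
fb = PocFeedback("example-poc", quality_rating=4)
fb.signal = MarketSignal.STRONG  # recorded at the 7-day check-in
memory.save(fb)

print(needs_debrief(fb))     # True
print(len(memory.recall()))  # 1
```

An agent would call `recall()` at the start of a run and use past ratings and signals as context.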

The feedback loop is what separates a pipeline from a machine that learns. Each rated POC makes the Creative Director's seed enrichment more calibrated, the PE Firm's scoring more predictive, and the whole system more likely to surface the ideas that actually become products.

Why It Matters

The old way costs a fortune. This doesn't.

Most ideas die not because they're bad, but because validating them is expensive. We changed that math.

Traditional approach

Strategy consultant $15,000–$40,000

Market research firm $8,000–$25,000

Design agency (MVP) $20,000–$60,000

Development team $30,000–$100,000+

Industrial designer $5,000–$20,000

Timeline 3–9 months

Total $78K–$245K+

NLT Labs pipeline

PE evaluation: 5 analysts + PE Firm + Devil's Advocate $3–6 in AI tokens

Hardware POC: 6 agents + Nano Banana renders $8–15 in AI tokens

Software POC: 4 agents $3–5 in AI tokens

Hosting (Render static CDN) Marginal · bundled in ops

Custom subdomain + deploy Automated under NLT umbrella

Timeline ~30–40 min wall-clock

Hardware POC total ~$15–20
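The totals in both columns can be checked by adding the line-item ranges. The sketch below uses only the figures listed above; `add_ranges` is a hypothetical helper, and treating the hardware total as evaluation plus build is an assumption. Summing the listed ranges gives outer bounds of $11–21, which bracket the reported ~$15–20.

```python
def add_ranges(*ranges: tuple[int, int]) -> tuple[int, int]:
    """Add (low, high) ranges component-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


# Traditional approach, USD (high ends are lower bounds where the page shows "+").
traditional = {
    "strategy consultant": (15_000, 40_000),
    "market research firm": (8_000, 25_000),
    "design agency (MVP)": (20_000, 60_000),
    "development team": (30_000, 100_000),
    "industrial designer": (5_000, 20_000),
}
traditional_total = add_ranges(*traditional.values())

# Pipeline token costs, USD.
pe_evaluation = (3, 6)
hardware_poc = (8, 15)
software_poc = (3, 5)
hardware_total = add_ranges(pe_evaluation, hardware_poc)
software_total = add_ranges(pe_evaluation, software_poc)

# Wall-clock minutes: Phase 1 plus Phase 2.
wall_clock = add_ranges((13, 15), (17, 25))

print(traditional_total)  # (78000, 245000)
print(hardware_total)     # (11, 21)
print(software_total)     # (6, 11)
print(wall_clock)         # (30, 40)
```

The traditional column sums exactly to the stated $78K–$245K+, and the two phase timings sum to the stated ~30–40 minutes.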

"We're not replacing human judgment. We're running it at a scale and speed no human team could match. So the ideas that deserve to exist get a real shot."

Each AI agent has a single job and does it the way a specialist consultant would. The PE Firm scores through nine forced-lens sections, then lands on a verdict; the Devil's Advocate tries to tear the thesis apart with evidence. The Product Designer outputs a real BOM before anyone generates a 3D model. The Copywriter reads customer pain quotes before writing a headline. The result is an honest evaluation and a real demo deployment. Not a pitch deck.

See what the pipeline outputs.

Every demo above was built by these agents through this pipeline. Click any card to open the demo site and the evaluation notes behind it.

View pipeline output →