DONE FOR YOU · OUTSOURCED AI ENGINEERING

Ship production AI for your startup in 4 weeks.

For seed/Series-A founders without an in-house AI team. RAG, agents, ingestion pipelines, evals. One scoping week, then a 4-week build sprint to production.

Book a 30-min scoping call · See the proof

Outcomes

What you get

RAG-in-prod in 4 weeks

Discovery + spec + a working pipeline behind your data.

Agents with eval-CI guardrails

Tool-using agents with eval CI, red-team, and observability.

Document ingestion pipelines

OpenSearch / Bedrock ingestion that stays inside your AWS account.

Track record

Real numbers, real systems

99.9%

API availability on a client's production platform

10×

reduction in a client's infrastructure costs

93%

retrieval accuracy on a client's production document AI (CI-gated)

15+ yrs

shipping production software

Real numbers from production systems. The full portfolio, with NDA-gated client names and the eval/ADR receipts, is in the brief.

Read the full brief →

Receipts

Proof of capability

Case study

Production agentic platform — multi-model orchestration + MCP toolchain

An agent-driven analytics platform for a Series-B DeFi company (founder-direct under NDA; founder takes reference calls behind a mutual NDA). MCP tool servers for token resolution, asset search, swap quotes and oracle/gas; multi-model routing across Claude, GPT-5.5 and Gemini behind LiteLLM; a live multi-service Gemini 2.5→3.5 migration with RAG indexing, AI guardrails and PII anonymisation. The hard 90% wasn't the model call — it was making tool-using agents reliable in production: 99.9% API availability, p95 swap-quote under 300 ms, MTTR under 5 min.

Read the full case study →

A working RAG + agent stack

Full pipeline: ingest, retrieve, agent, eval CI. AI-coded, walked through live on the scoping call.

Eval CI as a merge gate

Recorded judge cassettes gate every PR, so no model regression ships to production.

Observability from day one

OpenTelemetry + Prometheus + structlog wired in, not bolted on later.

Architecture you own

ADRs, a chunked implementation spec, and the code transferred to you — no lock-in.
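
For readers who want to see what "Eval CI as a merge gate" can look like, here is a minimal sketch, not the production harness. It assumes a cassette is a dictionary of previously recorded judge scores keyed by a hash of the question and answer; `ReplayJudge`, `CassetteMiss`, `gate`, and the 0.8 threshold are illustrative names and values.

```python
import hashlib
import json


class CassetteMiss(KeyError):
    """Raised when CI asks the judge about a pair that was never recorded."""


def cassette_key(question: str, answer: str) -> str:
    # Stable key for a (question, answer) pair.
    payload = json.dumps([question, answer]).encode()
    return hashlib.sha256(payload).hexdigest()


class ReplayJudge:
    """Replays recorded judge scores instead of calling a live model."""

    def __init__(self, cassette: dict[str, float]):
        self._cassette = cassette

    def score(self, question: str, answer: str) -> float:
        key = cassette_key(question, answer)
        if key not in self._cassette:
            # The system's answer changed: re-record and review before merging.
            raise CassetteMiss(key)
        return self._cassette[key]


def gate(judge: ReplayJudge, cases: list[tuple[str, str]], threshold: float = 0.8) -> bool:
    """Return True only if the mean judge score clears the threshold."""
    scores = [judge.score(q, a) for q, a in cases]
    return bool(scores) and sum(scores) / len(scores) >= threshold
```

Because the judge is replayed rather than called live, the gate is deterministic and makes no model calls per PR; a changed answer surfaces as a cassette miss that has to be re-recorded and reviewed.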

Models

Which LLMs we run — and why

Claude (Opus 4.8 / Sonnet 4.6)

Default for agentic and orchestration work: Sonnet 4.6 handles front-desk routing into Opus 4.8 specialists, chosen for the strongest tool-calling reliability and code reasoning.

GPT-5.5 / GPT-5.4 mini + text-embedding-3-large

Structured extraction and batch matching where GPT-5.4 mini wins on cost-per-call (capped at 200–800 tokens); 3072-dim embeddings for hybrid retrieval.

Gemini 2.5 → 3.5

Migrated across multiple services under live traffic: proof that the routing layer survives a provider version jump without a rewrite.

AWS Bedrock (Titan V2 / Cohere Embed v4)

When embeddings and inference must stay inside the customer's AWS account — regulated-data engagements end-to-end.

Everything sits behind LiteLLM with a per-task primary + fallback chain (ADR: single adapter, not per-provider SDKs). We don't marry a model — each workload is chosen on tool-calling reliability, cost-per-token, and data residency.
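
A per-task primary + fallback chain can be sketched in a few lines. This is an illustration under stated assumptions, not the gateway configuration itself: the `ROUTES` table, the placeholder model names, and the injected `call` function (which in practice would wrap the gateway's completion call) are all hypothetical.

```python
from typing import Callable

# Hypothetical routing table: the first model is the primary, the rest are fallbacks.
ROUTES: dict[str, list[str]] = {
    "agent": ["primary-model", "fallback-model-a"],
    "extraction": ["small-model", "primary-model"],
}


class AllProvidersFailed(RuntimeError):
    """Every model in the task's chain raised an error."""


def complete(task: str, prompt: str, call: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model in the task's chain in order; return (model_used, output)."""
    errors: list[str] = []
    for model in ROUTES[task]:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # e.g. provider outage, rate limit, timeout
            errors.append(f"{model}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))
```

Keeping the chain in data rather than in code is what lets a workload move to a different model by editing one table entry.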

Stack

The AI automation stack

Models

Claude · GPT-5.5 · Gemini · Bedrock · LiteLLM gateway

Orchestration

LangGraph · Claude Code · MCP tool servers · eval CI

Retrieval

pgvector · OpenSearch hybrid · RRF k=60 · Bedrock embeddings

Backend

Python · FastAPI · TypeScript · Go / Rust hot path

Apps & voice

Tauri · Electron · Speech-to-text streaming · Next.js

Data

PostgreSQL · Redis · Elasticsearch · S3

Infra & obs

AWS (EKS / ECS) · OpenTelemetry · Prometheus · Langfuse
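
The `RRF k=60` entry under Retrieval refers to reciprocal rank fusion, which merges ranked lists by summing 1 / (k + rank) for each document. A minimal sketch follows; the document ids are made up, and the two input lists merely stand in for pgvector and OpenSearch results.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked doc-id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # stand-in for a vector search result
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # stand-in for a keyword search result
fused = rrf([vector_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF only needs ranks, not scores, so the vector and keyword retrievers never have to be calibrated against each other.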

Skills

Specialized skills for every component we ship

Each engagement runs on versioned, productized skills: the domain expertise our coding agents load before they touch your system. The relevant skills transfer with the handover, so your team keeps shipping after the sprint.

RAG & retrieval

Chunking, hybrid retrieval (pgvector + OpenSearch), recall measured in CI — not assumed.

Federated + semantic search, reranking, MCP-compliant search surfaces.

Pipelines & DAGs

Airflow / Temporal orchestration — idempotent, schema-versioned, resumable.

Specialized agents

Tool-calling loops with guardrails, MCP tool servers, scoped and audit-logged write actions.

Evals

Golden sets and judge cassettes that gate every merge — regressions stop before production.

Ingestion

OCR, parsing, embedding, index sync — documents into queryable knowledge.

Observability

OTel traces, per-tenant token cost accounting, alerting that fires before customers notice.

Deploy & SRE

Containerized, autoscaled, circuit-broken — operated under real traffic.

~85 skills under ADOPT / FORK / BUILD governance with semver pinning — the same library both products draw from.
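
"Recall measured in CI" can be as small as the sketch below. It assumes a golden set of (retrieved ids, relevant ids) pairs and a hypothetical floor of 0.9; the function names and thresholds are illustrative, not the shipped skill.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant doc ids that appear in the top-k results."""
    if not relevant:
        raise ValueError("golden case has no relevant documents")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(golden: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs from a golden set."""
    if not golden:
        raise ValueError("golden set is empty")
    return sum(recall_at_k(r, rel, k) for r, rel in golden) / len(golden)


def recall_gate(golden: list[tuple[list[str], set[str]]], k: int = 5, floor: float = 0.9) -> bool:
    """True when mean recall@k meets the floor; CI fails the job otherwise."""
    return mean_recall(golden, k) >= floor
```

Run against a fixed golden set on every PR, a chunking or index change that quietly drops relevant documents shows up as a failed check rather than a support ticket.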

Integrations

Wired into your real systems.

Agents that take action have to reach your stack: HubSpot, MS Graph, Twilio, SendGrid, internal REST/GraphQL APIs, and event webhooks. Webhooks ship idempotent, signature-verified, and dead-lettered, with RFC 7231 Retry-After backoff; write actions are scoped, confirmed, and audit-logged.
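
As a rough illustration of those webhook properties, the sketch below combines an HMAC-SHA256 signature check, an idempotency check on the event id, and a dead-letter list. It makes several simplifying assumptions: state is held in memory rather than a database, a failed handler is dead-lettered immediately rather than after retries, and `retry_delay` only understands the delta-seconds form of Retry-After (the header may also carry an HTTP date). All class and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Optional


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 hex signature over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


class WebhookReceiver:
    def __init__(self, secret: bytes):
        self._secret = secret
        self._seen: set[str] = set()  # processed event ids
        self.dead_letters: list[tuple[str, bytes]] = []  # failed events kept for replay

    def handle(self, event_id: str, body: bytes, signature_hex: str,
               process: Callable[[bytes], None]) -> str:
        if not verify_signature(self._secret, body, signature_hex):
            return "rejected"
        if event_id in self._seen:
            return "duplicate"  # redelivery is acknowledged, not re-applied
        try:
            process(body)
        except Exception:
            self.dead_letters.append((event_id, body))
            return "dead-lettered"
        self._seen.add(event_id)
        return "processed"


def retry_delay(retry_after: Optional[str], attempt: int, cap: float = 60.0) -> float:
    """Honour a delta-seconds Retry-After value, else use capped exponential backoff."""
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    return min(2.0 ** attempt, cap)
```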

Laws we live by · LLMs in production

10 laws for shipping AI to production without burning cash

Every sprint runs on these laws. They're how a RAG system gets to production in four weeks.

01
Spec hard, code soft.
A page of working spec is worth a week of throwaway code. LLMs accelerate the wrong thing if the spec is wrong.
02
Evals before the model call.
Write the failing eval first. Without a passing bar you don't have a product, you have a forever-prototype.
03
Tools beat prompts.
A 20-line tool with a strict schema beats a 2,000-token system prompt. The model recovers from a wrong tool call; it doesn't recover from a vague instruction.
04
Cache aggressively, route ruthlessly.
Prompt cache, embedding cache, response cache. Cheap model routes; expensive model produces. 80% of the bill is the wrong model on the wrong call.
05
Monolith for LLMs, microservices for humans.
LLMs read a monolith faster than a 12-service mesh. Split early and you lose the one thing they're best at: holding the whole system at once.
06
Curate the context, don't polish the prompt.
The model is only as good as what's in the window. Cut every token that isn't a fact, an example, or a constraint.
07
Receipts, not claims.
Every “it works” is a test run, an eval row, or a git log line. Vibes don't ship.
08
Subagent review = 2 stages.
Finder + adversarial verifier. One agent gets confidently wrong; two disagreeing agents force the truth.
09
Manager mode: 5 → 1.
One operator orchestrating 5 parallel agents ships what used to need a team. Lanes, async, documented handoffs.
10
Player-coach.
Ship code and review agents in the same day. The skill is reading whether today needs hands-on-keyboard or orchestration.
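
Law 03 is easier to picture with a concrete tool. The sketch below validates the arguments of a hypothetical swap-quote tool against a strict schema and returns a precise error the model can act on. The field names, the `ALLOWED_CHAINS` values, and the `ToolInputError` type are assumptions made for illustration only.

```python
from dataclasses import dataclass

FIELDS = ("chain", "sell_token", "buy_token", "amount")
ALLOWED_CHAINS = {"ethereum", "arbitrum", "base"}  # hypothetical values


class ToolInputError(ValueError):
    """Sent back to the model so it can correct the call and retry."""


@dataclass(frozen=True)
class SwapQuoteArgs:
    chain: str
    sell_token: str
    buy_token: str
    amount: int  # smallest unit of the sell token, must be positive

    @classmethod
    def parse(cls, raw: dict) -> "SwapQuoteArgs":
        # Reject missing and unexpected fields alike: no silent defaults.
        if set(raw) != set(FIELDS):
            raise ToolInputError(
                f"expected exactly {sorted(FIELDS)}, got {sorted(map(str, raw))}"
            )
        if not isinstance(raw["chain"], str) or raw["chain"] not in ALLOWED_CHAINS:
            raise ToolInputError(f"chain must be one of {sorted(ALLOWED_CHAINS)}")
        for field in ("sell_token", "buy_token"):
            if not isinstance(raw[field], str) or not raw[field]:
                raise ToolInputError(f"{field} must be a non-empty string")
        amount = raw["amount"]
        if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
            raise ToolInputError("amount must be a positive integer")
        return cls(**raw)
```

A wrong call fails fast with a message that names the field and the allowed values, which is the kind of error a model can recover from on the next turn.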

Fit

Who this is for

✓ Pre-seed – Series A
✓ No in-house AI team
✓ Regulated or proprietary data
✓ Wants to own the code

— Needs a 24/7 support contract
— Wants commodity per-hour body-shop pricing
— No clear data or problem yet

Who you're hiring

You work directly with the engineer who builds it

Teo Deleanu

I'm Teo. I've shipped production software for 15+ years — lead engineer at Comodo, MiOS, and forfone, then nine years owning an IoT platform's cloud infrastructure, and production AI on contract since 2025. On a sprint you work directly with me: I scope it, build it, and own the result.

LinkedIn ↗ · X ↗

Pricing

Fixed-fee sprints — no hourly billing

$20k flat, per 4-week build sprint

✓ One flat fee for the sprint — you know the number before we start
✓ Scope, timeline, and deliverables agreed in the scoping week
✓ Code, ADRs, eval CI, and the specialized skills transferred to you — no lock-in, no retainer

Most seed and Series-A builds fit one sprint. Larger scopes get a fixed multi-sprint quote on the free scoping call.

FAQ

Questions founders ask

How much does it cost?

A flat $20k per 4-week build sprint. No hourly billing, no range — you know the number before the build starts. Larger scopes get a fixed multi-sprint quote on the free scoping call.

What if you miss the 4-week timeline?

That's what the scoping week is for. We agree the deliverable up front, and if something slips we cut scope, not quality. You ship something that works.

Is my data safe? Can it stay in my cloud?

Yes. For regulated or proprietary data we run embeddings and inference inside your own AWS account (Bedrock / OpenSearch). Write actions are scoped, confirmed, and audit-logged.

Do I own the code?

Completely. You get the code, ADRs, a chunked implementation spec, and eval CI — transferred to you. No lock-in, no mandatory retainer.

What do you need from me to start?

A real problem, access to (or a sample of) your data, and one scoping call. Week one is discovery + spec, followed by a 4-week build sprint to production.

Can I see proof before I commit?

Yes. There are live demos, the numbers above, and a working POC I walk you through on the call. The full brief with NDA-gated receipts is available too.

Book a 30-min scoping call · Email us