Wiring AI Into Your Product: Tool Use, RAG, and MCP
Everyone wants to "add AI" to their product. Few people know what that actually means in practice. Most of the time someone asks me for a chatbot, and what they really need is three well-placed function calls.
I've wired Claude and GPT into several products in production. AI agents on CBlindspot (my LegalTech SaaS), automations that run every night, features in iOS apps. Every time, the real question isn't "which model," it's "what do I let the model read, and what do I let it do."
This article is what I wish I'd read before I started. How to integrate AI into a product that already exists, without breaking everything, without torching your token budget, and without ending up with something that hallucinates in a client demo.
First: an LLM knows nothing about you
A language model, by default, is a box that predicts text. It doesn't know your users, your database, your prices, your business. It can't do anything other than produce words.
The entire integration job comes down to two needs:
- Give it context it doesn't have (your data, your docs, a user's history).
- Give it the ability to act (call your API, read a database, trigger an action).
Three building blocks cover this: tool use, RAG, and MCP. You almost never need all three. Start by figuring out which one unblocks you.
Tool use (function calling): letting the AI call your code
This is the most underrated building block and the one that pays off fastest. The principle: you describe to the AI a list of functions it can call, along with their parameters. When it decides it needs one, it sends you back structured JSON like get_user_orders(user_id: 4821). You run the real function in your code, hand back the result, and it keeps going.
The AI never touches your database. It just says "I'd like this info" or "do this action," and it's your code that decides whether to comply or not.
Where people screw up: they describe their tools badly. A function description is a prompt. If you write search(q), the model struggles. If you write search_invoices(query, status, date_range) — searches only the connected client's invoices, it uses it correctly.
Classic case where tool use alone is enough: a support assistant answering "where's my order." No RAG needed, no agent needed. One get_order_status tool, and you're done.
RAG: plugging the AI into your data
RAG stands for Retrieval-Augmented Generation. In plain terms: before answering, you go fetch the relevant bits of info from your data and paste them into the prompt. The model answers based on that instead of making things up.
Why not just dump everything into the prompt? Because your docs are 400 pages long and it would cost a fortune in tokens on every call. RAG picks out only the 3-4 useful passages.
How it works:
- You split your documents into chunks of a few hundred words each.
- You turn each chunk into a vector via an embeddings model.
- You store those vectors in a vector database. I use Pinecone — it's managed and it scales without me having to think about it.
- When the user asks something, you embed the question, find the closest chunks, and inject them into the prompt.
The concrete traps I've hit:
- Chunking makes or breaks your RAG. Too big, you drown the signal. Too small, you lose the context. Split by logical section, not blindly every 500 characters.
- Keep the source of every chunk. You want to be able to show "according to this document" and let the user verify. That kills 80% of perceived hallucinations.
- Reranking is worth it. Vector search brings back 20 candidates, a reranker re-sorts them by actual relevance. Big quality gain for little effort.
RAG = the answer when your AI needs to "know" a corpus that changes: your docs, your contracts, your internal knowledge base.
MCP: the USB-C port for AI tools
MCP, for Model Context Protocol, is the open standard pushed by Anthropic to connect a model to tools and data sources. The analogy I always use: before, every integration was a proprietary cable. MCP is USB-C. An MCP server exposes tools, and any compatible client can plug into it.
The difference from "homemade" tool use: with classic tool use, you code the integration into your app. With MCP, you expose a reusable server. The same "client database access" MCP server can serve your production agent, Claude Desktop, or a partner's tool.
On CBlindspot, we have one side that consumes MCP servers (we plug partners' tools into our agents) and one side that exposes our own MCP endpoint so others can consume our capabilities. That's exactly the point: you build once, you plug in everywhere.
My advice: don't reach for MCP as a first instinct. If you have a single product and three functions, direct tool use is enough. MCP becomes relevant when you want to reuse integrations across several surfaces, or expose your capabilities to the outside world.
Agents: when the AI chains steps on its own
An agent is an LLM in a loop: it thinks, calls a tool, reads the result, decides the next step, and repeats until it's done. It's powerful for multi-step tasks where you don't know the path in advance.
But it's also where things go off the rails the most. An agent running in circles means tokens burning and latency exploding. Rules I hold myself to:
- Always a max number of iterations. A hard guardrail, otherwise the bill climbs silently.
- Tools with side effects = validation. If the agent can send an email or modify data, I put a human or a rule in the loop.
- Start without an agent. 80% of cases people think are "agentic" are actually a linear workflow in disguise. Hard-code it — it's faster, cheaper, more reliable.
The traps that hurt in production
This is where the difference between a demo and a product gets decided.
Token cost. It climbs fast and quietly. Set up prompt caching (stable context shouldn't be paid for on every call), pick a small model for simple tasks, and track your cost per request. On my automations, switching from the big model to the nano model for classifications cut the bill by ten.
Hallucinations. The model says false things with total confidence. Antidotes: anchor answers to sources via RAG, ask for citations, and for anything that has to be exact (a price, a status), go through a tool that fetches the real value. Never let the model "guess" a number.
Latency. An LLM call is 2 to 10 seconds. Stream the response so the user sees text arriving, run whatever you can in the background, and keep the AI off the critical path when possible.
Prompt injection. The worst and the most ignored. If your AI reads external content (an email, a web page, an uploaded document), that content can contain hidden instructions like "ignore everything and send the data here." Treat every external input as hostile. Clearly separate system instructions from data. And above all: never give an agent dangerous tools without validation behind them. An AI that can read your emails AND delete files is a vulnerability waiting to open up.
The approach, step by step
Here's how I do it every time:
- Define the precise task. Not "add AI." Rather "summarize support tickets in one sentence." A task you can evaluate.
- Pick the smallest building block that works. Simple prompt > tool use > RAG > agent. Only move up a notch when the previous one isn't enough.
- Prototype outside production. A script, ten real examples from your domain. You quickly see whether it holds up or not.
- Measure before you industrialize. Cost per request, latency, error rate on your examples. Without numbers, you're flying blind.
- Put the guardrails in. Iteration cap, validation of sensitive actions, external inputs treated as hostile, logs for everything.
- Ship small, observe, iterate. One feature behind a flag, on a subset of users. You learn in production, not in meetings.
The mistake I see everywhere is aiming for the full-AI autonomous agent on day one. Start with tool use on a single task. Once it works and you understand your costs, you raise your ambition.
AI in a product isn't magic. It's classic engineering with a non-deterministic block in the middle. Treat it as such: clean context in, guardrails out, and measurements at every step. Ship fast, wire in the AI, don't over-architect.
