B2B marketing · AI division build-out
Rescoping a chatbot from embedded product data to grounded retrieval.
A B2B marketing company had built a conversational product-recommendation widget with product data packed into prompts — accurate enough in small catalogues, fragile at enterprise scale. We moved the product knowledge into a retrieval layer six months ahead of the business case for it. The first regulated-enterprise pilot onboarded on the strength of the demo.
The shift
When a B2B marketing operator first pitched an AI-powered sales funnel in early 2023, the novel capability was that a large language model could hold a multi-turn qualifying conversation about a technical product. The constraint that mattered: the operator's niche was specialty catalogue suppliers, where sales conversations turn fast on product-level detail — composition, grade, regulatory class, application — and the generalist salespeople in the industry's hiring pool weren't trained to carry those conversations. The commercial question was whether a domain-competent conversational AI could do what the generalists couldn't.
The stakes
The operator's wager was commercial, not existential: a marketing agency with healthy fundamentals betting on a new product line as a path to exit-sized valuations. For the operator's clients, specialty-catalogue manufacturers whose products demanded technical conversations, the stakes were real. Leads were going cold at the first technical question. An AI that could qualify on grade, specification, and application before routing to human sales would close that gap. Nobody in their vendor market was pitching one.
The promise
The destination was a widget-scale product line: a domain-competent conversational AI at the top of a specialty catalogue, capable of qualifying a prospect's requirement in the first three exchanges and routing the lead to a human sales team with context attached. Reusable across catalogues of widely different sizes. Hardened enough for regulated industrial review. Priceable both as an SME subscription and as an enterprise tier. If any one of those failed, the product line wouldn't work; if they all held, the market was largely uncontested.
The obstacles
Three obstacles shaped the programme.
Domain knowledge couldn't live in the prompt. Specialty-catalogue product data (safety data sheets, PDF catalogues, troubleshooting hierarchies) was encoded in unstructured documents, not in clean spec tables, and each customer catalogue was structured differently. The initial chatbot held product context in the system prompt; this worked for a six-product pilot and failed as catalogues crossed a few dozen items.
The business case assumed a simpler architecture than the problem warranted. The commercial sponsor initially treated retrieval as a narrow feature — useful only for specific troubleshooting steps, not as the underlying architecture of a system that could serve catalogues of any size. Six months separated the technical decision from the commercial team's mental model of it.
Regulated-enterprise clients required guardrails that small-business clients didn't care about. Regulated industrial buyers brought review processes, data-residency questions, and regulatory constraints the specialty-catalogue niche had never surfaced. The guardrail discipline would eventually travel down to every client — but it had to be built first for the enterprise tier.
The work and the outcome
The controversial call came early in what became the 2.0 architecture. Product data moved out of the prompt and into a retrieval layer — vector-indexed documents surfaced at inference time, with sources shown alongside every answer. The decision preceded the business case by six months, and it held against pressure to treat retrieval as an optional feature rather than the new core.
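A minimal sketch of the pattern described above: product text is looked up at question time and passed to the model with its sources attached, instead of living in the prompt. Everything here is illustrative. The `Chunk` type, the file names, and the toy bag-of-words `embed` function are assumptions standing in for whatever embedding model and vector index the production system used.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical document reference, shown next to the answer
    text: str


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lower-cased term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[Chunk], k: int = 2) -> list[Chunk]:
    # Rank every chunk against the query and keep the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[Chunk]) -> str:
    # Only the retrieved chunks reach the model, each tagged with its source.
    context = "\n".join(f"[{c.source}] {c.text}" for c in chunks)
    return (
        "Answer using only the context below and cite the bracketed sources.\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical catalogue fragments.
index = [
    Chunk("catalogue.pdf#p12", "Grade 316 stainless tubing, rated for marine and chemical applications."),
    Chunk("sds-0042.pdf", "Solvent X safety data sheet: flammable liquid, store below 25 C."),
    Chunk("catalogue.pdf#p30", "Grade 304 stainless tubing for general indoor applications."),
]

question = "Which stainless tubing grade suits marine use?"
hits = retrieve(question, index)
prompt = build_prompt(question, hits)
```

The prompt size now depends on `k`, not on the size of the catalogue, which is the property that let the same design serve six products or several thousand.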
The programme shipped across three product lines from the same team:
- A conversational product-recommendation widget for specialty-catalogue sites, tiered by the depth of domain training each catalogue required.
- A natural-language product-search layer over enterprise e-commerce — ingesting structured and unstructured catalogue data, classifying intent through a purpose-fit model, and returning results through a white-labelled search index.
- A data-enrichment pipeline that extracted structured attributes from unstructured descriptions at catalogue scale, so the search and recommendation layers had filterable facets to work with.
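As a rough illustration of the third product line, the sketch below turns free-text descriptions into structured attributes and then filters on them. The regular expressions, attribute names, and SKUs are all hypothetical; the source does not say how extraction was done, and a production pipeline would more plausibly use a model than hand-written patterns.

```python
import re

# Hypothetical patterns for two attributes; chosen only for illustration.
GRADE = re.compile(r"\bgrade\s+([A-Za-z0-9]+)", re.IGNORECASE)
DIAMETER = re.compile(r"(\d+(?:\.\d+)?)\s*mm\b", re.IGNORECASE)


def extract_attributes(description: str) -> dict:
    # Pull whatever structured attributes the description exposes.
    attrs = {}
    if m := GRADE.search(description):
        attrs["grade"] = m.group(1).upper()
    if m := DIAMETER.search(description):
        attrs["diameter_mm"] = float(m.group(1))
    return attrs


def filter_by(products: list, **facets) -> list:
    # Keep products whose extracted attributes match every requested facet.
    return [
        p for p in products
        if all(p["attrs"].get(name) == value for name, value in facets.items())
    ]


# Hypothetical unstructured catalogue descriptions.
descriptions = {
    "T-100": "Seamless tubing, grade 316, 12 mm outer diameter, marine use.",
    "T-200": "Welded tubing, Grade 304, 25mm OD for indoor installs.",
    "X-900": "Cleaning solvent, 5 litre drum.",
}

products = [
    {"sku": sku, "attrs": extract_attributes(text)}
    for sku, text in descriptions.items()
]
marine_grade = filter_by(products, grade="316")
```

Products with no recognisable attributes, such as `X-900` here, simply carry an empty attribute set and drop out of faceted filters rather than causing errors.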
All three were held together by a single prompt-versioning and red-team discipline, tested continuously against an LLM evaluation framework mapped to the public security taxonomies for large-language-model systems.
The first specialty-catalogue customer, on the product-search layer:
“The search is much better than anything we ever had.”
The first enterprise pilot — a regulated multinational — onboarded on the strength of the demo and reported new incoming business directly attributable to the system within months. Specialty-catalogue clients stayed in production across multiple years; one expansion stalled on customer-side organisational change, not on product performance.
The honest version: sustained production use across a multi-year customer book, customer-stated preference over legacy alternatives, one enterprise pilot that would not have onboarded without the retrieval rescope.
The carry-forward
The decisions that earned their keep belong in every subsequent AI programme:
- Grounding product knowledge in retrieval — not in prompts — is the default architecture for any catalogue-scale AI product.
- Fine-tuning is rarely the right lever for accuracy; prompt and retrieval work almost always dominates on a dollars-per-point-of-accuracy basis.
- One LLM per stage beats one LLM for every stage, when accuracy, latency, and cost differ by task.
- Guardrail discipline scales down from regulated-enterprise clients to everyone else, not up.
- Low-code earns its place in internal tools, not on the customer path.
Each of these was a live argument at the time. None is automatic now.