We rewrote the search on prosystem.com.bd a few weeks ago. The box looks the same, but underneath it's two things now: a live typeahead that resolves pages in under 50 ms, and an AI agent that answers open-ended questions with inline citations. This is a short write-up of how it's put together, what it costs, and what we'd choose differently if we did it again.
Two tiers, not one
Most visitors to a site's search bar aren't asking a question. They're navigating — "bms", "pricing", "case studies" — and they want the right page in the next keystroke. We didn't want every such query to round-trip through a model.
So the search panel runs a client-side index as the user types. A MiniSearch instance holds every page, blog post, case study, client company, job listing and FAQ anchor on the site — flattened into a few dozen searchable documents — and renders grouped results under the box in real time. Typing "how long does a rollout take" surfaces the matching BMS FAQ; typing a client's name surfaces the case study; an empty query shows a curated "popular" list.
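The grouped rendering step can be sketched in a few lines — the `Doc` shape and `groupResults` helper here are hypothetical stand-ins, since the real index carries more fields:

```typescript
// Hypothetical document shape — the real index stores more fields.
type Doc = { id: string; title: string; url: string; type: string };

// Group a flat list of typeahead matches by document type so the panel
// can render "Pages", "Case studies", "FAQ" sections under the box.
function groupResults(matches: Doc[]): Map<string, Doc[]> {
  const groups = new Map<string, Doc[]>();
  for (const doc of matches) {
    const bucket = groups.get(doc.type) ?? [];
    bucket.push(doc);
    groups.set(doc.type, bucket);
  }
  return groups;
}
```

Because this runs entirely client-side over a prebuilt index, every keystroke re-groups in microseconds with no network round-trip.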
The AI layer kicks in only when the user hits Enter. At that point the panel switches from a list of page matches to a streaming conversation, and the question goes to the server.
The index, built once
Every content source in the repo — posts.js, case-studies.js, companies.js, jobs.js, plus the static landing pages — exports its own content module. A small server-side layer imports them all at boot, flattens each FAQ into its own deep-linked document, and hands the result to a MiniSearch index with weighted fields (title 3×, summary 2×, tags 2×, keywords 1.5×).
That same index is serialised into a slim JSON payload for the browser. The client build carries ids, titles, summaries, URLs and tags, but not full bodies. The server keeps the bodies and exposes them through a tool the agent can call when it needs to quote or reason across detail. One index, two views.
An agent, not a mega-prompt
The temptation with any AI-on-a-website feature is to stuff the whole corpus into the system prompt. We didn't. The system prompt is short — a site map, a block of company facts (legal name, offices, hours, emails, registrations, leadership, socials) and a four-line tool-use policy. Everything else the agent needs, it fetches.
Three tools hang off it. search_site runs MiniSearch over the server-side index and returns a ranked list. get_document returns the full body of one doc by id, so the model can quote or cross-reference. web_search is a thin wrapper around Tavily, used only when the question is clearly off-site (industry trends, competitor info, current events) or when the first tool came back thin.
The agent calls search_site first, escalates to get_document if the summaries aren't enough, and only reaches for web_search when the question genuinely isn't about us.
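The tool layer can be sketched independently of the SDK wiring — plain async functions the agent dispatches to. Everything here is a stand-in: the in-memory map and the stubbed search replace the real MiniSearch index and Tavily client.

```typescript
// SDK-agnostic sketch of the three-tool layer; shapes are illustrative.
type SearchHit = { id: string; title: string; summary: string; url: string };
type FullDoc = SearchHit & { body: string };

// Stand-in for the server-side store that keeps full bodies.
const fullDocs = new Map<string, FullDoc>([
  ['bms-faq-1', { id: 'bms-faq-1', title: 'BMS launch FAQ',
    summary: 'Rollout timeline.', url: '/bms#faq-1',
    body: 'A typical rollout takes…' }],
]);

// Stub — the real version runs MiniSearch over the server-side index.
async function searchIndex(query: string): Promise<SearchHit[]> {
  const q = query.toLowerCase();
  return [...fullDocs.values()]
    .filter(d => `${d.title} ${d.summary}`.toLowerCase().includes(q))
    .map(({ body, ...hit }) => hit); // summaries only, never bodies
}

const tools = {
  // Cheap first pass: ranked summaries.
  search_site: (query: string) => searchIndex(query),
  // Escalation path: one full body, for quoting or cross-referencing.
  get_document: async (id: string) =>
    fullDocs.get(id) ?? { error: `no document with id ${id}` },
  // Off-site questions only; stands in for the Tavily wrapper.
  web_search: async (query: string) => ({ query, results: [] as string[] }),
};
```

The escalation order falls out of the shapes: search_site is the only tool that takes free text over our content, and get_document only accepts ids that search_site surfaced.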
Because the company-facts block lives in the system prompt, any identity or contact question — "what's the CEO's name", "what are your office hours", "how do I email support" — is answered without spending a tool call. Everything else does the full search dance.
Three defenses before the model sees a thing
A public AI endpoint is a free lunch someone will take. Before a request ever reaches the model, it passes three checks.
First, a Sec-Fetch-Site / Origin / Referer same-origin test. If the browser claims this request isn't coming from our own page, it's rejected at the door. Second, an HMAC-signed session cookie — minted on first page load, re-verified on every /api/search hit. A cookie forged on the client won't verify. Third, a per-IP sliding-window rate limit — 30 requests per 10 minutes, with separate buckets for /api/search and /api/message-assist so the contact-form helper doesn't steal the search budget.
None of these checks is airtight on its own. Together they raise the cost of abuse well above what this endpoint is worth.
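The second and third checks fit in a few dozen lines of Node. This is a minimal sketch, assuming an env-provided secret and an in-memory store — the real middleware differs in details:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

const SECRET = process.env.SESSION_SECRET ?? 'dev-only-secret'; // assumption

// Mint "payload.signature"; a client can read it but can't re-sign it.
function mintSession(payload: string): string {
  const sig = createHmac('sha256', SECRET).update(payload).digest('hex');
  return `${payload}.${sig}`;
}

// Verify by recomputing the HMAC; constant-time compare avoids leaks.
function verifySession(cookie: string): boolean {
  const dot = cookie.lastIndexOf('.');
  if (dot < 0) return false;
  const given = Buffer.from(cookie.slice(dot + 1), 'hex');
  const expected = createHmac('sha256', SECRET)
    .update(cookie.slice(0, dot)).digest();
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// Per-IP sliding window: keep timestamps, drop those outside the window.
const WINDOW_MS = 10 * 60 * 1000;
const LIMIT = 30;
const hits = new Map<string, number[]>(); // keyed by `${bucket}:${ip}`

function allow(bucket: 'search' | 'message-assist', ip: string,
               now = Date.now()): boolean {
  const key = `${bucket}:${ip}`; // separate buckets per endpoint
  const recent = (hits.get(key) ?? []).filter(t => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) { hits.set(key, recent); return false; }
  recent.push(now);
  hits.set(key, recent);
  return true;
}
```

An in-memory map is fine for a single server; behind multiple instances the window would need to move into something shared like Redis.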
Streaming, citations, state machine
The route handler streams the response with streamText and toUIMessageStreamResponse, and the client renders it through streamdown so the markdown arrives progressively. The model is told to cite inline with relative URLs — [BMS launch FAQ](/bms#faq-1) — and external sources with full origins, and to lead with the answer rather than narrate the search.
The search panel itself is a small state machine. Empty input shows popular items. Non-empty input renders live typeahead with no API calls. Submitting swaps the list for the streaming conversation; a "← View results" link swings back to the typeahead without losing scroll position. Follow-ups accumulate in the same thread for the session.
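The states and transitions above can be written down as a discriminated union and a reducer — a sketch of the shape, not the component's actual code:

```typescript
// The panel's three modes as a discriminated union.
type PanelState =
  | { mode: 'popular' }                                    // empty input
  | { mode: 'typeahead'; query: string }                   // live matches
  | { mode: 'conversation'; query: string; thread: string[] }; // after Enter

type PanelEvent =
  | { type: 'input'; value: string }
  | { type: 'submit'; question: string }
  | { type: 'back' };  // the "← View results" link

function transition(state: PanelState, event: PanelEvent): PanelState {
  switch (event.type) {
    case 'input':
      return event.value === ''
        ? { mode: 'popular' }
        : { mode: 'typeahead', query: event.value };
    case 'submit': {
      // Follow-ups accumulate in the same thread for the session.
      if (state.mode === 'conversation')
        return { ...state, thread: [...state.thread, event.question] };
      const query = state.mode === 'typeahead' ? state.query : event.question;
      return { mode: 'conversation', query, thread: [event.question] };
    }
    case 'back':
      // Swing back to the typeahead, keeping the pre-submit query.
      return state.mode === 'conversation'
        ? { mode: 'typeahead', query: state.query }
        : state;
  }
}
```

Keeping the pre-submit query in the conversation state is what lets "← View results" restore the typeahead instead of resetting to the popular list.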
What we'd do differently
Two things, if we started over. First, we'd separate content indexing from the Next build. Right now each deploy rebuilds the MiniSearch index at boot; a separate worker that re-indexes on content changes would remove a small warm-up cost for the first user of the day.
Second, we'd add light observability — tool latency, token counts, cache status — from day one. We bolted it on later, when an interesting answer came back slower than expected and we had no way to see which tool call was slow.
The rest has held up. The typeahead is free, the AI path is cheap, abuse hasn't materialised, and most importantly the search does what a search is supposed to do: the next thing you see is almost always useful.