Case Study

Conversational analytics in practice: building an AI analyst for the entire business

This case study shows how an AI data agent can transform the way a company works with data. By giving every team member an always-on AI analyst, questions that once waited hours, days, or weeks get answered in minutes, often in seconds, cutting time-to-answer by roughly 90%. Decisions move faster and at lower cost. Together we turned a classic reporting DWH into an AI-ready architecture with a semantic layer, evals, and an optimized NL2SQL pipeline, so teams now get accurate answers to complex questions without writing SQL by hand.

Impact snapshot

Business impact, delivered fast

A quick look at what changed for the client once the AI-ready architecture, semantic layer, and evals were in place: queued BI requests became on-demand answers for every team, with response times cut from days or weeks to minutes (often 10x+ faster). The full story is below.

Want to explore what this could look like for your team? We can walk you through it.

100%

Accuracy on simple questions

93%

Complex query accuracy (up from 50%)

65%

LLM token usage reduction

3 days

First audit insights delivered

The Client & the Challenge We Partnered to Solve

Our client is a high-growth, data-driven consumer platform operating across multiple channels, where data has long been a critical asset, from optimizing operations to management-level strategic decisions. As business intelligence demands grew, teams still had to queue for answers, often waiting hours or days, and some questions stretched into weeks. That slowed decisions and raised the cost of insight. Leadership and the data team decided to elevate reporting and began building an in-house AI data agent that could translate natural-language questions directly into SQL.

Value snapshot

Decision velocity, unlocked

With an always-on AI analyst for every team, questions move from weeks to minutes and many answers arrive in seconds. The result is faster decisions without the BI queue.

In practice, we saw 10x+ faster response times and about a 90% reduction in time-to-answer.

10x+

Faster response times

~90%

Time-to-answer reduction

Minutes

Complex answer turnaround

Seconds

Everyday questions answered

The client had a working data warehouse, a clear vision, and the courage to pursue AI analytics ahead of most of the market. But like every pioneer, it hit a challenge that was not yet widely understood: moving from a classic data warehouse to an AI-ready data architecture takes more than new technology. The agent could generate SQL, but metric calculations were inaccurate and the numbers did not match internal dashboards, so users fell back on the reporting workflows they already trusted.

The client made a smart move — instead of a long period of experimentation, it decided to bring in specialized expertise. That is when QuantumSpring.ai entered as a partner for an AI-first data architecture.

The Core Challenge: Classic DWH Requirements vs. AI Analytics

The client wanted a functional conversational analytics experience that would significantly relieve the BI team, provide people with reliable real-time answers, and speed up analysts’ work. The ambition was an always-on AI analyst for every team, so questions no longer waited in a reporting queue and turnaround was measured in minutes instead of days, often an order of magnitude faster. The client’s data warehouse worked perfectly for dashboards and classic reporting, but AI analytics has entirely different requirements.

Readiness gap

Why classic DWH fell short for AI

Classic reporting worked for dashboards, but the AI agent needed explicit metric definitions, relationships, and business language to avoid ambiguity.

Without that layer, complex questions looped or returned inconsistent answers, especially when multiple metrics were involved.

50%

Baseline complex accuracy

No

Unified semantic layer

Ambiguous

Metric definitions

Looping

Multi-step questions

While classic reporting only needs well-structured data, an AI agent needs more: detailed metadata, business context, explicit metric definitions, synonyms, and unambiguous relationships between entities. With more complex questions, the agent often got stuck in a loop and eventually returned an “unable to answer” error. It was clear that moving to an AI-first approach required extending the data architecture with a new layer.

The core DWH technology was in place, and the reporting-ready data infrastructure worked — now it had to be prepared for AI. That is where our collaboration began.

Our Approach: A Partnership Built on AI-First Data Expertise

QuantumSpring did not join as a vendor delivering the entire solution, but as a strategic AI & Data partner. In close collaboration with the client’s data team, we focused on what truly determines the success of conversational analytics.

Approach blueprint

A partnership that shipped, not just advised

We embedded with the data team, aligned on business-critical metrics, and iterated quickly. The focus was accuracy, reliability, and scale — not just a demo.

The result was a clear path from audit to production, with measurable quality gains at every step.

3 days

AI readiness audit

Semantic layer

Metric definitions & logic

Evals

Quality guardrails

Weekly syncs

Client team cadence

AI Readiness Audit: 3 Days to First Insights

The first step was clear: we needed to understand exactly what the AI agent required for higher accuracy. We systematically reviewed table and column metadata, naming consistency, metric definitions, table relationships, and—most importantly—how users’ business language mapped to the data structure.
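To make one audit check concrete, the sketch below counts undocumented columns per table, since columns with no description give an AI agent no business context to reason with. This is a minimal sketch assuming a BigQuery warehouse and the google-cloud-bigquery client; the my_project.my_dataset path is a placeholder, not the client’s dataset.

from google.cloud import bigquery

# Audit sketch: for each table, count column field paths that have no
# description. High counts flag tables the agent cannot interpret.
client = bigquery.Client()

sql = """
SELECT table_name,
       COUNTIF(description IS NULL OR description = '') AS undocumented,
       COUNT(*) AS total_columns
FROM `my_project.my_dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
GROUP BY table_name
ORDER BY undocumented DESC
"""

for row in client.query(sql).result():
    print(f"{row.table_name}: {row.undocumented}/{row.total_columns} columns undocumented")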

The audit pinpointed specific areas for expanding the data architecture and delivered an immediate, prioritized plan of next steps. Thanks to the audit’s speed, we could move straight into implementation.

The audit also included interviews with business users across key teams. Their experience helped us rank priorities correctly and focus on the metrics and problems that truly hurt in day-to-day work.

Semantic Layer: Built from the Ground Up with the Team

Together with the client’s data team, we built a semantic layer that defines what metrics truly mean in the business context—what they are calculated from, which columns are critical, what the business-language synonyms are, and how the LLM should choose between multiple possible interpretations.

Practical example: Take the “number of orders” metric that every e-commerce company knows. But each business defines it slightly differently. An AI agent without internal knowledge will run a generic calculation:

SELECT COUNT(DISTINCT order_id)
FROM orders

The problem is that this answer ignores client-specific context — should canceled orders be counted? Offline orders? With the semantic layer, the agent generates the correct query:

SELECT COUNT(DISTINCT order_id)
FROM orders
WHERE order_status <> 'canceled'
AND order_stream = 'online'

Technical implementation: Today the semantic layer lives in structured JSON files with metric definitions that the LLM agent searches and uses to improve answer quality. Each metric has clear parameters: the calculation formula, source tables, columns that determine granularity, and client-specific business rules. Longer term, the plan is to expand toward a RAG architecture and a broader knowledge base that will let the agent handle even more complex questions.
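For illustration, a single metric entry might look like the sketch below, shown as a Python dict mirroring one JSON definition. Every field name and the build_where_clause helper are hypothetical, chosen for readability rather than taken from the client’s actual files.

# One illustrative semantic-layer entry; all field names are hypothetical.
orders_metric = {
    "name": "number_of_orders",
    "synonyms": ["orders", "order count", "total orders"],
    "formula": "COUNT(DISTINCT order_id)",
    "source_tables": ["orders"],
    "grain_columns": ["order_id"],
    # Client-specific business rules the agent must always apply.
    "business_rules": [
        "order_status <> 'canceled'",
        "order_stream = 'online'",
    ],
}

def build_where_clause(metric: dict) -> str:
    """Combine a metric's business rules into a SQL WHERE clause."""
    rules = metric.get("business_rules", [])
    return ("WHERE " + " AND ".join(rules)) if rules else ""

print(build_where_clause(orders_metric))
# WHERE order_status <> 'canceled' AND order_stream = 'online'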

Semantic layer

Business logic the model can trust

We codified how metrics are calculated, which tables and columns are authoritative, and how people phrase questions across teams.

The result is a single source of truth that aligns answers with dashboards and eliminates ambiguity.

Single source

Metric logic

Business rules

Context-aware math

Synonym map

Aligned language

Clarity

Consistent answers

Thanks to this, the AI agent started delivering answers that matched the business reality — not just the model’s general logic. The difference was immediately visible.

Optimizing the NL2SQL Pipeline: Efficiency and Stability

One of the key outcomes was a dramatic drop in the number of generated SQL queries: from 763 queries down to just 270 for the same set of questions. That meant a 65% reduction in token usage, lower load on BigQuery, faster answers, and—most importantly—less room for errors.

Even more important was resolving the looping issue. By optimizing the pipeline and adding the semantic layer, we ensured that even complex questions with multiple metrics are answered consistently.
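The production pipeline is the client’s, but the loop-guard pattern behind this can be sketched simply: cap retries, feed the database error back to the model for one more attempt, and stop as soon as the model regenerates a query it has already tried. The generate_sql and run_query callables below are stand-ins for the NL2SQL model and the warehouse client, not the real interfaces.

MAX_ATTEMPTS = 3  # hard cap instead of open-ended retries

def answer(question: str, generate_sql, run_query):
    seen, last_error = set(), None
    for _ in range(MAX_ATTEMPTS):
        sql = generate_sql(question, feedback=last_error)
        if sql in seen:  # an identical retry means the model is stuck
            break
        seen.add(sql)
        try:
            return run_query(sql)
        except Exception as err:
            last_error = str(err)  # give the model the error for the next try
    raise RuntimeError("No reliable answer; escalate to a human analyst.")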

Pipeline impact

Faster answers, lower costs

We streamlined the NL2SQL flow so the agent needed fewer retries and produced reliable queries faster.

That meant fewer tokens, less warehouse load, and a stable path to scale.

65%

Token usage reduction

763 → 270

Queries per question set

Minutes

Complex answers

Stable

Loop-free results

Evals: A Complete Evaluation Framework

We built a comprehensive eval system that continuously measures SQL safety, answer accuracy, question understanding, correct use of business metrics, and the impact of changes in both the models and the database. Thanks to evals, we can quantify improvements and verify that quality does not degrade with further changes. Evals became a core part of both development and validation of the agent.
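To give a flavor of one such eval, the sketch below checks a golden question against a reference answer and gates on read-only SQL. The golden set, the expected value, and the agent and exec_sql interfaces are illustrative stand-ins, not the client’s actual eval suite.

GOLDEN_SET = [
    {"question": "How many online orders did we have last month?",
     "expected": 41250},  # hypothetical reference answer
]

def run_evals(agent, exec_sql):
    passed = 0
    for case in GOLDEN_SET:
        sql = agent(case["question"])
        # SQL-safety gate: the agent must only ever read data.
        assert sql.lstrip().upper().startswith("SELECT"), "unsafe SQL"
        if exec_sql(sql) == case["expected"]:
            passed += 1
    return passed / len(GOLDEN_SET)  # accuracy tracked across releases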

Working Closely with the Client’s Data Team

We delivered the project in close collaboration with the internal data team. Their knowledge of business logic and the data model was critical to success. Together we iteratively defined metrics, aligned interpretations, started expanding metadata, validated answers, and prioritized semantic layer development. This partnership accelerated the project and ensured the outcome matched real business needs.

Results snapshot

Outcomes you can measure

Accuracy improved, answer times collapsed, and costs dropped without sacrificing reliability.

We tracked progress with evals and user feedback at each stage of rollout.

100%

Core KPI question accuracy

93%

Complex analytical questions accuracy (from 50%)

10x+

Faster response times

65%

Token usage reduction

Implementation Speed

First audit insights were available in just three days, allowing immediate work on key areas. From project kickoff to the first production version, it typically takes no more than three weeks — a pace that lets companies test hypotheses quickly and see the value of their investment.

Performance & Efficiency

A 65% reduction in token usage and a drop from 763 to 270 SQL queries for the same set of questions means not only lower operating costs, but also a more stable and faster system. That efficiency is what makes an always-on AI analyst viable at company scale, keeping answers in minutes, not days.

Why this matters for the business: an AI analyst can be a powerful helper, but at scale across the company you have to account for LLM token costs. By optimizing the NL2SQL pipeline, we significantly reduced operating costs and enabled the client to scale without proportional cost growth.

Answer Quality

We evaluated answer quality in two phases — simple questions focused on a single metric and complex analytical questions.

The measured accuracy improvements were dramatic:

  • Simple questions (single metric — number of orders, AOV, revenue, different margin views, and other metrics defined in the semantic layer): 100% accuracy
  • Complex analytical questions: 93% accuracy (up from 50%)

Remaining inaccuracies on more complex questions stem from the fact that the semantic layer currently covers only a portion of the metrics. We are continuing to expand it, and accuracy improves with each newly defined metric.

Failure elimination: For complex questions, the agent now consistently delivers an answer instead of an “unable to answer” error.

Team Impact

The BI team reclaimed capacity for real analysis instead of routine operational tasks. Managers gained faster access to data insights. With an always-on AI analyst, teams self-serve answers in seconds or minutes instead of waiting days or weeks, shifting decision cycles from multi-day to same-day. And the client’s entire data ecosystem moved toward an AI-first approach.

"Our client was among the first in its market to embrace AI analytics and had the courage to experiment with technology that is not yet common. The collaboration with their data team was excellent — their knowledge of business logic and willingness to iterate were key to the rapid success. This project shows how a reporting-ready data warehouse can quickly transform into an AI-ready architecture."

Radek Duha, QuantumSpring

Three Key Lessons from the Project

Key lessons

The three takeaways that mattered most

Our collaboration confirmed three lessons about how to build conversational analytics the right way.

Evals first

Quality measurement is non-negotiable. Accuracy and correctness must be tracked during development, with each new release, and on every user answer across the company.

AI-ready DWH

Reporting models lack the needed context. LLMs need richer metadata, definitions, and relationships to reason correctly.

Semantic layer

Turns data into consistent answers. It encodes business logic, synonyms, and edge cases so the agent stops guessing.

Next step

Want similar results for your team?

Speak directly with Radek Duha

A short expert call to evaluate your data environment, identify the biggest risks, and see if AI & Data Foundations is the right move.

Clear guidance. Senior expertise. No sales talk.


© 2026 QuantumSpring.ai. All rights reserved.