Article

Nov 13, 2025

How Apna scaled 75 Lakh AI interview minutes using BlueMachines.ai and ElevenLabs.ai

The Hidden Engineering Behind a “Simple” Mock Interview by the BlueMachines.ai and Apna Engineering Teams

The Hidden Complexity Behind a “Simple” Mock Interview

A mock interview sounds trivial - a question, an answer, a score.

At 60 million users and 10,000+ companies across 30,000+ roles, that illusion collapses.

Interview preparation has long been broken - generic, one-size-fits-all, and disconnected from real hiring experiences.

Apna set out to make interview preparation feel like a real interview, personalised to every role, company, and candidate.

The goal wasn’t to build a course, but to engineer conversation itself: lifelike timing, empathy, and domain depth, at planetary scale.

To achieve this, Apna partnered with BlueMachines.ai, an enterprise-grade Voice AI orchestration platform, and ElevenLabs.ai, the world leader in hyper-realistic, multilingual voice technology.

Together, they built one of the world’s most advanced AI interview ecosystems: 15 lakh AI interviews, 75 lakh voice minutes, sub-300 ms latency, and one guiding belief:

Simple experiences demand extraordinary engineering.


Engineering for Human Realism at Machine Scale

A mock interview that feels real demands four simultaneous guarantees:

  1. Contextual Accuracy — the AI must know what the role and company expect.

  2. Conversational Flow — no talking over, no dead air, perfect hand-offs.

  3. Emotional Calibration — tone shifts with difficulty, empathy, or stress.

  4. Systemic Scale — thousands of these happening concurrently, anywhere.

BlueMachines.ai’s orchestration stack was designed around one principle:

Latency is the new UX.

If human pause perception begins at ~400 ms, then 399 ms is the frontier.

RAG × Persona Graph — Context is Computation

Every company interviews differently.

A Flipkart PM expects metrics reasoning; an HDFC Credit Officer probes compliance logic; a Zomato Ops Lead checks route optimisation.

Apna and BlueMachines built a Retrieval-Augmented Generation (RAG) graph for each role × company intersection:

  • 10,000+ companies × 50–100 roles ≈ 500,000–1,000,000 role-specific micro-models.

  • Each model anchored to company-specific rubrics, tone, and vocabulary.

  • Served via BlueMachines’ hot-cache vector store (< 120 ms fetch).

When a candidate says, “I handled pricing for electronics,” the AI doesn’t nod — it queries pricing heuristics, pulls category metrics, and probes elasticity, all mid-conversation, without breaking flow.
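For intuition, here is a minimal Python sketch of that fetch pattern: a vector index partitioned by company × role, queried with the candidate’s latest utterance. Every name here (HotCacheStore, the toy embed function) is illustrative, not BlueMachines’ actual API.

```python
# A minimal sketch of the role x company RAG fetch described above.
# All names and the hashing "embedding" are stand-ins, not production code.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in embedding: hash tokens into a fixed-size vector."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class HotCacheStore:
    """In-memory vector index partitioned by (company, role)."""
    def __init__(self):
        self.partitions: dict[tuple[str, str], list[tuple[np.ndarray, str]]] = {}

    def add(self, company: str, role: str, snippet: str) -> None:
        self.partitions.setdefault((company, role), []).append((embed(snippet), snippet))

    def query(self, company: str, role: str, utterance: str, k: int = 2) -> list[str]:
        # Search only the candidate's company x role partition: keeping the
        # scan tiny is what makes a sub-120 ms fetch plausible at all.
        part = self.partitions.get((company, role), [])
        q = embed(utterance)
        scored = sorted(part, key=lambda p: -float(p[0] @ q))
        return [snippet for _, snippet in scored[:k]]

store = HotCacheStore()
store.add("Flipkart", "PM", "Probe price elasticity when a candidate mentions pricing.")
store.add("Flipkart", "PM", "Ask for the metric tree behind any growth claim.")
print(store.query("Flipkart", "PM", "I handled pricing for electronics"))
```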

“Every conversation is a distributed query. Each candidate turn fans out across ASR, RAG, and persona logic, all returning before the brain expects a reply.”

-Abhishek Ranjan, CTO, BlueMachines.ai


The BlueMachines Stack
A Technical Marvel of Voice Orchestration

In Apna’s AI Interview Prep, this stack powers every moment, from the instant the AI interviewer opens with “Tell me about yourself” to its empathetic follow-up: “That’s interesting. Could you explain how you handled that project?”

Each turn triggers real-time orchestration of speech-to-text, language understanding, context retrieval, and voice synthesis, all synchronized seamlessly through BlueMachines.

It’s the same backbone that powers enterprise-grade deployments of BlueMachines.ai.

Under the hood, BlueMachines.ai runs on a six-layer architecture engineered for precision, compliance, and resilience — designed not just to run voice AI, but to orchestrate conversation itself in under 300 milliseconds.


1. Connectivity & Compliance Layer

Regional edge gateways ensure all sessions stay low-latency and compliant with Indian TRAI and RBI norms.

Apna’s voice data never leaves the region — essential for BFSI and education partners that demand full data residency.

2. Speech Processing Layer

Every interview begins with a candidate’s voice — captured, cleaned, and transformed in real time.

The Speech Processing layer converts raw audio into precise text using multilingual ASR capable of understanding English, Hindi, and regional dialects, often mixed in the same sentence (e.g., “Main ne growth 25% tak increase kiya last quarter”).

Key features include:

  • Noise and echo cancellation for crowded environments, where even background fan noise is removed to keep downstream processing accurate.

  • Voice Activity Detection (VAD) to distinguish speech from silence.

  • Adaptive Code-Switching Models for fluid conversations that move between Hindi, English, and Hinglish.

Each spoken phrase reaches the language model in ~100 ms, preserving the natural rhythm of speech.
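As a rough illustration of that capture-to-ASR hand-off, the sketch below buffers audio frames behind an energy-based VAD and flushes each utterance to a stubbed transcriber as soon as the speaker pauses. The frame sizes, threshold, and transcribe() stub are all assumptions, not the production models.

```python
# A toy version of the capture -> VAD -> ASR hand-off described above.
import numpy as np

FRAME_MS = 20          # typical VAD frame length
SILENCE_FRAMES = 15    # ~300 ms of silence ends an utterance
ENERGY_THRESHOLD = 0.01

def is_speech(frame: np.ndarray) -> bool:
    """Energy-based VAD; real systems use trained models, not a threshold."""
    return float(np.mean(frame ** 2)) > ENERGY_THRESHOLD

def transcribe(samples: np.ndarray) -> str:
    """Stub for a multilingual, code-switching ASR model."""
    return f"<{len(samples)} samples transcribed>"

def stream_utterances(frames):
    """Yield one transcript per utterance, as soon as the speaker pauses."""
    buffer, silent = [], 0
    for frame in frames:
        if is_speech(frame):
            buffer.append(frame)
            silent = 0
        elif buffer:
            silent += 1
            if silent >= SILENCE_FRAMES:   # pause detected: flush to ASR
                yield transcribe(np.concatenate(buffer))
                buffer, silent = [], 0
    if buffer:
        yield transcribe(np.concatenate(buffer))

# Synthetic input: 10 "speech" frames followed by enough silence to flush.
frames = [np.full(320, 0.2)] * 10 + [np.zeros(320)] * 20
for text in stream_utterances(frames):
    print(text)
```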

3. Conversational Intelligence (RAG and Workflow Layer)

Once text is available, BlueMachines routes it into the Conversational Intelligence layer — where meaning, tone, and intent take shape.

This layer blends RAG-based retrieval, persona logic, and user workflows to decide the next move in the conversation.

For Apna’s AI Interview Prep, this means dynamically generating role-specific and company-specific questions based on a candidate’s resume and the targeted job description.

For instance, a candidate interviewing for a Sales Manager role at HDFC might face questions around conversion ratios, while one preparing for a Data Analyst position at Flipkart gets probed on SQL joins and funnel optimization.

The process is near-instantaneous:

  1. The candidate’s latest response is parsed and classified.

  2. The RAG engine fetches contextual data from a pre-indexed library of up to a million role–company patterns.

  3. The LLM ensemble (OpenAI, Anthropic, or in-house tuned models) composes the next question, calibrated for difficulty and tone.

All of this happens in parallel threads — returning output to the Orchestration Core within ~120 ms.
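A simplified version of that parallel fan-out might look like the following, with each stage stubbed out by a timed sleep. The asyncio structure is our illustration of the pattern, not BlueMachines’ actual codebase; the ~120 ms figure is the article’s.

```python
# A sketch of the parallel turn pipeline above, with stubbed-out stages.
import asyncio, time

async def classify(answer: str) -> str:
    await asyncio.sleep(0.02)                 # stand-in for an intent model
    return "behavioral" if "team" in answer else "technical"

async def retrieve(company: str, role: str, answer: str) -> list[str]:
    await asyncio.sleep(0.05)                 # stand-in for the hot-cache RAG fetch
    return [f"{company}/{role} rubric: probe for metrics"]

async def compose(label: str, context: list[str]) -> str:
    await asyncio.sleep(0.04)                 # stand-in for the LLM ensemble
    return f"[{label}] Given {context[0]}, what metric moved and why?"

async def next_question(company: str, role: str, answer: str) -> str:
    start = time.perf_counter()
    # Classification and retrieval do not depend on each other, so fan out.
    label, context = await asyncio.gather(
        classify(answer), retrieve(company, role, answer)
    )
    question = await compose(label, context)
    print(f"turn latency: {(time.perf_counter() - start) * 1000:.0f} ms")
    return question

print(asyncio.run(next_question("Flipkart", "Data Analyst", "I improved funnel conversion")))
```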

4. Security & Reliability

BlueMachines ensures AES-256 encryption, RBAC controls, and zero-downtime failover.

If an AI node drops mid-interview, session state transfers instantly to a sibling pod with no perceptible pause.

Every turn is logged through the observability mesh for latency, routing accuracy, and contextual drift — all while remaining compliant with ISO 27001 and SOC 2 standards.
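One plausible shape for that hot-swap recovery is turn-level checkpointing: persist the session state after every turn so a sibling pod can rehydrate it instantly. The ReplicatedStore below is a plain dict standing in for whatever replicated state store a real deployment would use.

```python
# A minimal sketch of turn-level session checkpointing for failover.
import json

class ReplicatedStore:
    """Stand-in for a replicated key-value store (not the real backend)."""
    def __init__(self):
        self._data: dict[str, str] = {}
    def set(self, key: str, value: str) -> None:
        self._data[key] = value
    def get(self, key: str) -> str | None:
        return self._data.get(key)

store = ReplicatedStore()

def checkpoint(session_id: str, turn: int, history: list[str]) -> None:
    """Persist everything a sibling pod needs to continue the dialogue."""
    store.set(session_id, json.dumps({"turn": turn, "history": history}))

def resume(session_id: str) -> dict:
    """Called by the replacement pod after a node drops mid-interview."""
    raw = store.get(session_id)
    return json.loads(raw) if raw else {"turn": 0, "history": []}

checkpoint("sess-42", 3, ["Tell me about yourself", "I led a pricing team..."])
print(resume("sess-42"))   # the sibling pod picks up at turn 3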

5. Orchestration Core — The Heartbeat of the System

The Orchestration Core is where BlueMachines’ engineering truly shines.

It coordinates every millisecond of traffic between the STT, LLM, and TTS pipelines — the three pillars of real-time conversation.

STT (Speech-to-Text)
Receives the candidate’s voice, cleaned through VAD and echo suppression, and streams partial transcripts to the NLU before full sentences are complete. This streaming mode minimizes delay and allows predictive intent modeling.
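To see why streaming partials matter, consider the toy example below: a stand-in intent classifier is re-run on every partial transcript, so the guess firms up mid-sentence, before the final transcript lands. The keyword rules are purely illustrative.

```python
# A toy illustration of streaming partials into the NLU.
def classify_partial(text: str) -> str:
    """Stand-in for an incremental intent model."""
    lowered = text.lower()
    if any(w in lowered for w in ("increase", "grew", "improved")):
        return "claims_metric_improvement"
    return "unknown"

partials = [
    "main ne",
    "main ne growth",
    "main ne growth 25% tak increase kiya",   # code-mixed, as in the example above
]
for p in partials:
    # In streaming mode the guess firms up mid-sentence, before the ASR final.
    print(f"{p!r:45} -> {classify_partial(p)}")
```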

LLM (Language + Context Layer)
Runs the reasoning logic, retrieves relevant context from Apna’s company-role knowledge graph, and triggers persona-specific question generation. This enables realistic interviewer personas — HR, Hiring Manager, or Technical Lead — each with distinct tone, patience, and follow-up depth.

TTS (Text-to-Speech)
This is where ElevenLabs.ai transforms machine responses into human voice with remarkable fidelity.

ElevenLabs’ multilingual, emotionally nuanced voices are tuned for Indian English, Hindi, and code-mixed speech, allowing the AI interviewer to modulate tone — empathetic when encouraging, firm when challenging.

Each synthesized response begins playback within ~150–180 ms, thanks to ElevenLabs’ low-latency streaming APIs integrated directly into BlueMachines’ orchestration layer.
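A hedged sketch of that integration, using ElevenLabs’ public streaming variant of the text-to-speech endpoint over plain HTTP. The voice ID and the play() sink are placeholders; a production system would pipe chunks straight into the telephony or WebRTC leg rather than buffering them.

```python
# A sketch of low-latency TTS streaming against ElevenLabs' public HTTP API.
import os
import requests

VOICE_ID = "your-voice-id"                     # placeholder
URL = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"

def play(chunk: bytes) -> None:
    pass  # placeholder: wire to your audio output / telephony leg

def speak(text: str) -> None:
    resp = requests.post(
        URL,
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        stream=True,                            # ask for chunked audio
        timeout=30,
    )
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=4096):
        # The first chunk arrives well before the full reply is synthesized,
        # which is what makes a ~150-180 ms playback start achievable.
        play(chunk)
```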

Combined, these three pipelines create a continuous conversational loop, fast enough that users forget they’re talking to AI.

“Every millisecond is a design choice. We don’t just monitor latency — we architect it,” says Abhishek Ranjan, CTO, BlueMachines.ai.

6. Continuity & Governance

The final layer ensures explainability and consistency.

Each session is auditable, replayable, and testable through nightly synthetic interviews that catch prompt drift or bias before production.
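In spirit, such a nightly check might replay scripted candidate turns and assert that the generated follow-ups still hit the rubric. ask_next() below is a placeholder for the real pipeline entry point, and the golden cases are invented for illustration.

```python
# A sketch of a nightly synthetic-interview drift check.
def ask_next(company: str, role: str, answer: str) -> str:
    """Placeholder for the production question-generation pipeline."""
    return "What conversion ratio did you own, and how did it move?"

GOLDEN_CASES = [
    # (company, role, scripted answer, phrases the follow-up must contain)
    ("HDFC", "Sales Manager", "I ran a branch sales team.", ["conversion"]),
]

def run_drift_check() -> None:
    for company, role, answer, required in GOLDEN_CASES:
        question = ask_next(company, role, answer)
        missing = [p for p in required if p not in question.lower()]
        assert not missing, f"prompt drift for {company}/{role}: missing {missing}"
    print("synthetic interviews passed")

run_drift_check()
```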

Every score is logged, replayable, and auditable — meeting enterprise compliance without sacrificing latency.

Persistent memory allows the AI to remember candidate progress and resume from prior sessions, enabling a continuous, personalized learning journey.

Every millisecond is designed, not assumed.

Every voice packet is a proof of intent — that infrastructure can feel human.

Inside the Sub-300 ms Pipeline


At 300 ms end-to-end, the system reaches conversational invisibility — where latency ceases to exist and human realism begins.

Every microservice, queue, and cache lives under an SLO of < 50 ms, monitored by per-turn observability meshes. Fail one, and the illusion breaks.
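One simple way to enforce a per-stage budget is a timing guard wrapped around every stage. The sketch below is illustrative: the printed breach report stands in for emission into the observability mesh, and the budget value is the article’s own figure.

```python
# A sketch of per-stage SLO instrumentation.
import time
from functools import wraps

SLO_MS = 50  # the per-stage budget cited above

def slo_guard(stage: str, budget_ms: float = SLO_MS):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = (time.perf_counter() - start) * 1000
            if elapsed > budget_ms:
                # In production this would emit to the observability mesh.
                print(f"SLO BREACH {stage}: {elapsed:.1f} ms > {budget_ms} ms")
            return result
        return wrapper
    return decorator

@slo_guard("vector-fetch")
def fetch_context(query: str) -> list[str]:
    time.sleep(0.005)                      # stand-in for the real fetch
    return ["rubric snippet"]

fetch_context("pricing for electronics")
```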

Persona Logic Engine & Turn Detection

BlueMachines’ Persona Engine runs interviewer archetypes on independent logic threads — HR, hiring manager, tech lead — each with unique tolerance, question depth, and speech rhythm.

ElevenLabs synchronizes tone and pacing in real time, producing sub-200 ms voice synthesis streams that match candidate tempo.

“Empathy itself is computed. The system listens for hesitation or over-confidence and modulates warmth or challenge on the next question.”

-Piotr Dabkowski, Co-Founder and CTO, ElevenLabs.ai
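Conceptually, each archetype can be modelled as an independent persona config whose pacing is relayed to the voice layer. The field names and values below are our assumptions, not BlueMachines’ actual schema.

```python
# A sketch of interviewer archetypes as independent persona configs.
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    name: str
    patience_turns: int      # how long it tolerates rambling before redirecting
    follow_up_depth: int     # how many "why" levels it drills into
    speaking_rate: float     # relayed to TTS so pacing matches the archetype

PERSONAS = {
    "hr": Persona("HR", patience_turns=3, follow_up_depth=1, speaking_rate=0.95),
    "hiring_manager": Persona("Hiring Manager", patience_turns=2, follow_up_depth=2, speaking_rate=1.0),
    "tech_lead": Persona("Tech Lead", patience_turns=1, follow_up_depth=3, speaking_rate=1.05),
}

def tts_params(persona: Persona) -> dict:
    """Map persona pacing onto synthesis parameters for the voice layer."""
    return {"speaking_rate": persona.speaking_rate}

print(tts_params(PERSONAS["tech_lead"]))
```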

Scale & Reliability

  • Thousands of multilingual interviews run concurrently.

  • Micro-agents cold-start in under 150 ms and auto-scale per interview.

  • Hot-swap recovery within one dialogue turn.

  • Per-region isolation + end-to-end encryption + RBAC.

“Every millisecond is a design choice. We don’t just monitor latency; we treat it as a product metric.”

-Abhishek Ranjan, CTO, BlueMachines.ai

Impact


“For the first time, candidates can practice interviews that feel truly real — tailored to their résumé, company, and dream role.”

Kartik Narayan, CEO, Apna - Job Marketplace Vertical

AI for Good — Democratising Opportunity

A 24-year-old from Pune said:

“The AI interviewer knew my résumé, switched between Hindi and English, and challenged me like a real HDFC panel.

I cracked the job on my next attempt. It felt like I finally had the same prep chance as everyone else.”

This is the quiet revolution: AI not replacing people, but equalising access — giving millions of job seekers the same preparation once reserved for a privileged few.

Beyond engineering, it’s a story of AI for good — technology that levels the playing field for opportunity.

Engineering Takeaways

  1. Latency = Believability — 300 ms defines human conversation.

  2. Context is Computation — every query is a personalisation event.

  3. Explainability = Trust — auditable AI builds confidence.

  4. Orchestrate, Don’t Monolith — micro-agents scale safely.

  5. Human Experience = SLO — empathy, tone, timing are engineered outputs.

Closing

Apna’s AI Interview Prep looks effortless — a simple Q&A on a screen.

Beneath it runs a sub-300 ms orchestration lattice synchronising speech, reasoning, and emotion across millions of voices.

That’s what modern engineering looks like:

Human-level realism, delivered at planetary scale.

© All rights reserved
