Model selection assistant

Supporting confident STT model selection through explainable interfaces

This project explored how UX design and explainable AI could better support language analysts as their workflows expanded from relying on a single default model to navigating multiple available models. As options increased, analysts struggled to understand tradeoffs, build trust, and make confident decisions due to limited explanation and comparison support. A research-driven, tiered model comparison experience structured insights by depth, enabling quick scanning, meaningful comparison, and deeper exploration only when needed. The result was a scalable interaction approach that reduced cognitive load while aligning with real analyst workflows.

Client:

LAS

Duration:

Jan - Dec 2025

Role:

End-to-end Design & Research

Team:

2 PIs and 3 Research Assistants

background

Designing within a research-driven analytical environment

The Laboratory for Analytical Sciences (LAS) is a research organization that partners across government, industry, and academia to explore how data, analytics, and emerging technologies can better support human decision-making. Projects at LAS often investigate future-facing scenarios, using research and design to understand how evolving tools could change analyst workflows.

At the time of this project, analysts typically worked with a single speech-to-text (STT) model, limiting their ability to adapt model selection to different audio conditions. This project explored a speculative yet realistic future in which analysts have access to multiple STT models and must select the most suitable one for their specific audio files. Designing for this expanded decision space introduced new challenges around transparency, trust, and confidence.

the problem

When choice increases, confidence can decrease

As speech-to-text systems continue to evolve, analysts are moving toward a future where multiple STT models may be available rather than relying on a single default option. While this shift creates opportunities for more accurate and context-aware transcriptions, it also introduces a new challenge: how analysts evaluate, compare, and trust model recommendations when the decision space expands.


Without clear explanations, access to multiple models risks increasing cognitive load rather than supporting better decisions. Analysts must understand not only which model is recommended, but why it is appropriate for a specific audio file, especially when working with real-world conditions such as background noise, accents, or multi-speaker recordings. The problem, then, is not simply enabling choice, but designing systems that help analysts confidently assess model performance without overwhelming them.

How might an explainable user interface (XUI) help analysts assess STT model recommendations and calibrate trust without adding cognitive overload?

design process overview

A research-driven approach to supporting analyst decision-making

This project followed an iterative design thinking process that balanced human-centered research with the technical complexity of explainable AI systems. Rather than treating model selection as a purely algorithmic problem, the process emphasized understanding analyst workflows, decision pressures, and trust-building behaviors before moving into solution design.

Each phase built intentionally on the last. Early research focused on understanding how analysts currently work and how future access to multiple STT models could change their decision-making. Insights from this phase informed clear problem framing and prioritization, which then guided concept exploration, prototyping, and testing. Throughout the process, designs were continuously refined to reduce cognitive load, surface meaningful signals, and support confidence without oversimplifying complexity.

The result was a layered, explainable interface concept shaped through research, iteration, and feedback, designed to adapt to varying analyst needs while remaining transparent, flexible, and grounded in real-world use cases.

research

Uncovering analyst workflows and constraints

To inform the design, we conducted research across three areas: analyst workflows, existing speech-to-text tools, and explainable AI within human–AI collaboration. Together, this research helped us understand how analysts make decisions today, where current STT systems fall short, and how explanation plays a role in building confidence and accountability when working with AI-driven recommendations.

A review of existing STT platforms showed that many tools prioritize performance metrics and outputs, offering limited support for understanding why a model is recommended for a given audio scenario. At the same time, research in explainable AI emphasized that trust is not built through accuracy alone, but through reasoning that users can access, question, and stand behind, especially in high-stakes analytical work.

Key finding: layered explanations support trust, transparency, and adaptability

One of the most important insights from this research was the value of layered explanations. Instead of presenting all system logic at once, explanations can be structured to allow analysts to progressively access deeper levels of detail about a recommendation.

In this framework, high-level signals communicate what the system recommends, additional context explains why a model is appropriate for a specific audio scenario, and deeper layers reveal how the system arrived at that recommendation. This approach supports both quick decision-making and deeper investigation, giving analysts control over how much information they engage with while maintaining transparency and trust.
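
To make this framework concrete, below is a minimal sketch of how a three-level explanation could be modeled and progressively revealed. The field names, model name, and example values are hypothetical illustrations, not the system's actual schema.

```python
from dataclasses import dataclass

@dataclass
class LayeredExplanation:
    what: str   # Level 1 — glanceable, plain-language recommendation
    why: str    # Level 2 — why this model fits this audio scenario
    how: dict   # Level 3 — how the system arrived at the recommendation

def render(explanation: LayeredExplanation, depth: int = 1) -> list[str]:
    """Return only the layers the analyst has chosen to expand."""
    layers = [explanation.what]
    if depth >= 2:
        layers.append(explanation.why)
    if depth >= 3:
        layers.append(str(explanation.how))
    return layers

# An analyst scanning quickly sees only Level 1; expanding the
# recommendation progressively reveals Levels 2 and 3.
rec = LayeredExplanation(
    what="Recommended: NoisyRoom-STT for this clip",
    why="This clip has heavy background noise and overlapping speakers",
    how={"word_error_rate_on_noisy_audio": 0.12,
         "training_data": "multi-speaker field recordings"},
)
print(render(rec, depth=1))
```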

This insight became a foundational principle for the rest of the project and directly informed how model recommendations, confidence signals, and interaction patterns were later designed.

reframing the problem

The analyst persona and journey

Synthesizing the analyst persona and end-to-end journey helped clarify where friction, uncertainty, and decision pressure occur when working with speech-to-text systems. While access to multiple STT models introduces flexibility, it also raises new questions around trust, usability, and interpretation, especially for analysts who are not machine learning experts.

To focus the design space and guide exploration, we reframed these insights into a set of How Might We questions that capture the core challenges of model selection, explainability, and analyst confidence.

How might we design an audio playback system that supports access to multiple STT models while helping analysts confidently determine which model is most useful for a given use case?

How might visual and interpretative design approaches improve understanding of machine learning model behavior for non-technical domain experts?

How might differences in domain expertise and seniority influence explanation preferences, and how can a system support these varying needs without adding cognitive overload?

exploring early design concepts

Low-fidelity concepts grounded in research

Based on research insights, we explored three low-fidelity concept directions to understand how multiple STT models and explanations could be introduced within an existing audio playback system. All concepts were intentionally designed on top of a provided screen from the current analyst workflow tool, allowing us to explore new interactions while staying grounded in a familiar interface. These early concepts helped us evaluate tradeoffs between clarity, depth, and cognitive load.

Concept 1: List-Based Model Recommendations

When users are working with a model, the system automatically suggests alternative models based on performance and community usage data, presented in a concise list format. This approach focuses on quick comparison and supports progressively deeper explanation levels as users engage further.

Concept 2: Visual Comparison

The system visually compares the current model with top alternatives across key performance metrics, allowing users to quickly identify strengths and weaknesses. This concept emphasizes pattern recognition and supports comparison without relying solely on text.

Concept 3: Task-Focused Expansion

Based on the user’s current activity, the system suggests not only better-suited models but also potential next tasks. This approach supports forward-thinking exploration beyond the user’s initial query.

Learning Through Feedback and Iteration

Feedback on the early concepts challenged several assumptions and helped clarify what truly supported analyst decision-making. I initially expected visual comparisons to be the most effective way to evaluate STT models, but stakeholders found these visualizations difficult to interpret and more cognitively demanding than anticipated. While informative, they required additional education and slowed quick decision-making.

In contrast, stakeholders responded strongly to the three-level explanation structure, noting its value in supporting analysts with different levels of expertise. The ability to choose how deeply to engage with a recommendation kept explanations glanceable while still allowing deeper investigation when needed. We also learned that analysts typically work with multiple tabs open rather than a single continuous flow, reinforcing the need to surface recommendations early within the broader workflow.

One of the most positive responses centered on community usage and peer-driven insights. Stakeholders saw this as a powerful way to build trust alongside the system, allowing analysts to learn from how others approached similar audio and model decisions. Rather than replacing individual judgment, community signals were viewed as a collaborative layer that supported confidence, sensemaking, and accountability. Based on this feedback, the list-based and task-focused concepts were favored over the visualization-heavy approach. Moving forward, we used the provided green screen layouts to place these ideas within a more complete system view.

Exploring Mid-Fidelity Design Directions

Moving forward, we transitioned into mid-fidelity designs that aligned with the visual system and interaction patterns established by the provided green screens. This allowed us to move beyond isolated concepts and explore how model recommendations, explanations, and collaboration signals could function as part of a more complete system. At this stage, we explored two distinct directions that reflected different balances between automation and analyst control.

Direction 1: System-Initiated Recommendations

In this direction, the system detects key audio characteristics based on the selected audio cut and automatically generates a ranked list of STT model recommendations. Analysts are presented with glanceable Level 1 explanations that combine system recommendations and community usage signals to support quick decision-making.

To go deeper, analysts can preview a model’s transcript before fully committing, then access additional explanation layers once they enter the model view. This approach emphasizes system initiative and efficiency, supporting fast workflows while still allowing deeper validation when needed.
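
To illustrate the kind of matching this direction implies, the sketch below ranks models by how many detected audio characteristics they cover. The tags, model names, and scoring rule are hypothetical placeholders rather than the production recommendation logic; the same matching could equally be driven by analyst-selected characteristics, as in Direction 2 below.

```python
# Characteristics detected from the selected audio cut (hypothetical tags).
audio_tags = {"background_noise", "multi_speaker"}

# Each model's strengths, in the same tag vocabulary (hypothetical profiles).
model_profiles = {
    "NoisyRoom-STT": {"background_noise", "multi_speaker"},
    "Broadcast-STT": {"clean_studio"},
    "Accent-STT": {"accented_speech", "background_noise"},
}

def rank_models(tags: set[str], profiles: dict[str, set[str]]) -> list[str]:
    """Order models by how many detected characteristics they cover."""
    return sorted(profiles, key=lambda m: len(profiles[m] & tags), reverse=True)

print(rank_models(audio_tags, model_profiles))
# -> ['NoisyRoom-STT', 'Accent-STT', 'Broadcast-STT']
```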

Direction 2: Analyst-Driven Model Selection

This direction emphasizes analyst agency by allowing users to explicitly select which audio characteristics best match their current audio selection. Based on these inputs, the system returns a tailored set of STT model recommendations presented as comparable model cards.

Community notes and analyst feedback are surfaced as trust signals alongside each recommendation, helping analysts validate system suggestions through real-world usage context. This approach supports more deliberate exploration and deeper understanding of model behavior while keeping explanations structured and accessible.

testing mid-fidelity concepts with a survey

Understanding explanation preferences, trust calibration, and collaboration

To evaluate the two mid-fidelity directions, we conducted a survey with 28 participants spanning a range of analyst seniority levels and familiarity with AI systems. Participants included junior, intermediate, and senior analysts with varying exposure to STT and AI-driven tools. The goal of this study was to understand how analysts interpret STT model recommendations, which explanation strategies they rely on when making selections, and how trust is built across different levels of experience.

We then analyzed responses related to usability, confidence, trust, and likelihood of using the system in real workflows. We found no correlation between seniority level and an analyst’s ability to understand the system or make appropriate model selections. Instead, trust and confidence varied based on individual familiarity with AI. Across experience levels, participants consistently valued the three-level explanation structure, which supported both quick decision-making and deeper investigation when needed.

Open-ended responses revealed several strong themes. Analysts across all seniority levels emphasized the importance of community usage and peer-driven insights as trust signals, often viewing them as more influential than system-generated recommendations alone. Community feedback was seen as a way to validate decisions, learn from peers, and feel more confident standing behind model selections.

Participants also highlighted the importance of workflow efficiency and explanation clarity, including the ability to preview transcripts, jump to keywords, and better understand model limitations. Together, these insights reinforced the need for model selection tools that balance transparency, flexibility, and collaboration.

Senior Analyst 

[…] if there is user feedback to crowdsource […] wouldn't that ALWAYS trump the system suggestions? Users are going to be more influenced by other users rather than a nebulous “system suggestion…”

— Calibrating Trust

Intermediate Analyst

The ability to immediately jump to keywords and have a provided STT transcript is an incredibly valuable tool that would speed up my workflow considerably […] to filter through audio much more quickly than I am able to now.

— Improving Tools

Senior Analyst 

Model description & tags are helpful but don't explain how audio is processed […] would be helpful to have examples of original vs processed clip & what the model is NOT good at […] too many options can make the selection difficult.

— Explanation Needs

high-fidelity design solution

A layered, explainable system shaped by analyst feedback

Insights from the survey directly informed the final design direction. Findings confirmed that trust was not tied to seniority, but to how supported analysts felt in understanding and validating model recommendations. Analysts consistently valued adaptability, community insight, and the ability to control explanation depth, which reinforced the need for a layered system that could support both quick decisions and deeper investigation.

The final solution brings these insights together into a high-fidelity system that integrates seamlessly into the existing audio analysis workflow. Rather than forcing a single explanation style, the design supports progressive engagement, allowing analysts to move between glanceable recommendations, contextual reasoning, and technical detail based on their needs, confidence, and task complexity.

Scenario Context

To bring the final system to life, we used a scenario-based approach that follows Riley, a fictional language analyst working at the Nixon Historical Audio Task Force. Riley is piloting a new suite of specialized speech-to-text models optimized for different audio conditions, and her task is to identify conversations related to the Apollo space missions. This context allows the system to be shown in both fast-moving analysis and deeper validation moments within a realistic workflow.

Scenario 1: Level 1 — Glanceable Decision-Making

Prioritizes plain-language, task-specific recommendations.

Scenario 2: Level 2 — Context and Community Insight

Synthesizes technical data with community input.

Scenario 3: Level 3 — Technical Validation

Tailored for technically oriented users who need granular metrics (e.g., Word Error Rate, training data).
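
For readers unfamiliar with the Level 3 metrics, Word Error Rate has a standard definition: the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the model's output, divided by the number of reference words. A minimal sketch, with an illustrative transcript pair:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("go to the moon", "go to moon base"))  # 0.5: 2 edits / 4 words
```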

Key insight 1

Layered Explanations Support Trust, Adaptability, and Analyst Growth

Trust is built through consistent, transparent interactions that adapt to an analyst’s needs over time. Layered explanations allow analysts to start with glanceable guidance and progressively explore deeper reasoning as their confidence and curiosity grow. By supporting this flexibility, the system reinforces analyst agency while also enabling learning and long-term trust calibration throughout the analyst’s journey.

Key insight 2

Leveraging Social Intelligence to Calibrate Trust

Analysts have consistently shown they rely on peer knowledge to build confidence in complex decisions, especially when system recommendations alone feel abstract. Surfacing community usage, peer-validated performance, and task appropriateness adds a collaborative layer of trust that complements system reasoning rather than replacing it. By embedding social intelligence directly into the workflow, the system helps analysts calibrate trust through shared experience and collective insight.

Key insight 3

Support Adaptability by Aligning With Analyst Workflows

The system is designed to align with existing analyst workflows by allowing a quick preview and comparison of model outputs before a full selection is made. This supports flexible exploration without disrupting momentum or forcing early commitment. By making comparisons glanceable and accessible, the system reinforces adaptability while helping analysts build confidence in their decisions.

impact / reflection

This project demonstrates how expanding analyst tools from a single-model workflow to a multi-model decision system can significantly improve confidence, flexibility, and decision quality. By introducing layered explanations and community-driven trust signals, the final design reduces cognitive overhead while enabling analysts to make faster, more informed model selections. The system is designed to scale across varying expertise levels, lowering onboarding friction and supporting consistent decision-making in complex, high-stakes environments.

From a business and systems perspective, this work highlights how explainable AI can reduce uncertainty, support accountability, and increase adoption of advanced tools by meeting analysts where they are. Rather than requiring users to become technical experts, the system adapts to their needs while encouraging learning over time, helping analysts grow more confident and capable with continued use. Seeing how small design decisions could meaningfully influence trust, efficiency, and collaboration reshaped how I think about building AI-powered systems that are both responsible and operationally impactful.

Thank you for being here!

Created with curiosity, passion, & the occasional coffee ©2025 Amaya Hush
