Context
From personal experience as a clinical data manager, I've observed that data cleaning for clinical trials relies heavily on site queries in the clinical trial database. A query is raised manually by the DM or automatically by the system; the site then attends to it, either responding or updating the data. In 21% of cases there is back and forth because the data change or the query response is unclear.
Industry research suggests a moderate-sized trial (200 patients in Phase 3) can generate between 3,000 and 10,000 queries, each costing $30–$70. A query also takes an average of 52 days to close, which can delay database lock and analysis timelines.
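To make the stakes concrete, a quick sketch of the per-trial cost range implied by the figures above (only the quoted ranges are used; nothing else is assumed):

```python
# Per-trial query-cost range implied by the figures above:
# 3,000-10,000 queries per moderate Phase 3 trial, at $30-$70 per query.
queries_low, queries_high = 3_000, 10_000
cost_low, cost_high = 30, 70

total_low = queries_low * cost_low      # best case: fewest, cheapest queries
total_high = queries_high * cost_high   # worst case: most, priciest queries
print(f"${total_low:,} - ${total_high:,}")  # → $90,000 - $700,000
```

Even at the low end, query handling is a six-figure line item per trial, before counting the 52-day closure delay.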
The Market
| Player | Approach | Takeaway |
|---|---|---|
| Medidata Rave · Veeva Vault · Oracle Inform | Static query text + email clarification. Sites dig through lengthy eCRF guidelines. | Missing: real-time explanation of query intent |
| Medidata Detect · IQVIA AI Coding | Anomaly detection and AI-assisted coding for data quality. | Gap: no AI for human-in-the-loop query resolution |
| GitHub Copilot · MS Copilot for Healthcare | Workflow augmentation tools in regulated environments (dev workflows, clinical notes). | Proof point: human augmentation is viable even in regulated spaces |

Market Trends
- RBQM reducing total query volume → remaining queries increasingly complex
- AI copilots proving workflow augmentation works in regulated industries
The Audience
Persona 1 · Primary
The Site Coordinator "So-Much-To-Do"
- Motivated by
  - Resolving data management queries and other data quality issues so their site doesn't accrue bad metrics or protocol deviations.
- Pain points
  - Documentation is scattered everywhere (labs, data management, devices, pharmacy). They must review a 50+ page eCRF completion guideline just to understand a data entry issue, and maybe work out how to resolve it.
- What they hope for
  - Immediate assistance when answering queries. Today they raise issues to the CRA, who emails the DM if they can't solve it, with back and forth and Clinical Science sometimes looped in for complex issues.
Persona 2 · Secondary
The Data Manager "Deep-in-the-Data"
- Motivated by
  - The highest quality data, and ways to detect and clean data efficiently.
- Pain points
  - Some queries must be manual, and communication with the site happens only through the query itself. On large trials with time zone differences, DMs cannot support sites at the moment the coordinator needs them. They receive emails with a lag, then write detailed, time-consuming emails on how to resolve each issue, and the same email is often sent by multiple CRAs.
Current Query Resolution Journey
1. DM Detects Discrepancy
   Manual review by the Data Manager identifies a data discrepancy in the EDC.
2. DM Writes Query in EDC
   The query is raised in the EDC and a notification is sent to the site.
3. Site Coordinator Sees Query
   The coordinator sees the query among 20–30+ others in the database.
   Pain Point: No context about query intent.
4. Site Interprets & Responds
   The coordinator attempts to understand and respond to the query.
   Pain Point: Must cross-reference patient documents plus a 50-page eCRF manual.
5. DM Reviews Response
   The Data Manager reviews the site's response, with a 1–3 day lag.
   Pain Point: 21% require re-query clarification.
6. Re-Query Issued (21% of Cases)
   A new query is issued for the same case. If the site is unsure, they escalate to the CRA, who escalates to the DM via email.
   Pain Point: The site re-works the same case; the DM writes repetitive re-queries or lengthy clarification emails.
7. Final Resolution
   The query is eventually closed.
   Pain Point: Critical queries block analysis timelines.
Big Takeaways
Competitive Whitespace
No EDC (Medidata Rave, Veeva Vault) offers real-time query context at response time — only static text + email.
Structural Shift
RBQM reduces total query volume, making efficient resolution of remaining complex queries mission-critical.
User Behaviour
Sites spend time digging through docs; DMs write repetitive clarification emails — both are solvable with contextual AI guidance.
Proven Pattern
Regulated copilots (e.g., MS Healthcare) show that human workflow augmentation is viable — even in highly regulated industries.
Problem
Re-query and Burnout Loop
Sites receive queries with little context, dig through 50-page eCRF guides, and send best-guess responses; DMs then re-query 21% of them, rewriting similar clarifications by email and increasing burnout on both sides.
No Real-Time Query Guidance
Existing EDC systems (Rave, Vault, Inform) only provide static query text and email chains—there is no real-time assistant to explain why a query exists and suggest compliant resolution options at the moment the site is responding.
The Goal
Site Coordinator
Reduced digging through 50-page eCRF completion guidelines; less coordination with CRAs when stuck; faster time to clean query counts in the database.
Data Manager
Less re-querying; less time to review and write queries; fewer repeated emails to CRAs and sites.
Sponsor / CRO / Study
Faster time to close queries; shorter time to database lock; higher quality data, faster.
Feature Prioritization & MVP Definition
| Feature | Reach | Impact | Confidence | Effort | Result |
|---|---|---|---|---|---|
| "Why?" button | 100% | High | High | Med | MVP |
| Chat avatar | 60% | High | Med | High | MVP |
| Analytics | 40% | Med | Med | Med | V2 |
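The table above can be turned into comparable scores with the standard RICE formula. The numeric mapping of the qualitative High/Med ratings below is purely illustrative, an assumption for this sketch rather than the weights actually used:

```python
# Illustrative RICE scoring for the prioritization table above.
# The High/Med -> number mapping is an assumption for illustration only.
def rice(reach, impact, confidence, effort):
    """RICE score = (Reach * Impact * Confidence) / Effort."""
    return reach * impact * confidence / effort

scale = {"High": 3, "Med": 2, "Low": 1}

features = {
    "Why? button": rice(1.00, scale["High"], scale["High"], scale["Med"]),
    "Chat avatar": rice(0.60, scale["High"], scale["Med"],  scale["High"]),
    "Analytics":   rice(0.40, scale["Med"],  scale["Med"],  scale["Med"]),
}

for name, score in sorted(features.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Under this mapping the ranking matches the table's Result column: the "Why?" button scores highest, the chat avatar second (both MVP), and analytics last (V2).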
Final Solution — Core Flow
1. Site opens query
   A "Why?" button appears that site coordinators can select if they choose.
2. Popup: Context & Guidance
   The query intent, the relevant eCRF reference, and resolution examples are surfaced in a popup.
3. [Optional] Chat
   Site: "I'm still confused about how to solve this query."
   DM chat: "This query asks you to verify the start date of the drug administration, as it duplicates the start date from Week X."
Prototype of the Virtual DM Assistant embedded directly in the EDC query view.
Launch & GTM Strategy
Pilot approach (3 months):
- 1 oncology Phase 3 maintenance study + 1 new oncology Phase 3 study
- 20 high-performing sites, chosen for low training burden and to reduce confounding variables in pilot data
- Opt-in "Why?" button on 50% of queries
Channels
- Site training webinars
- In-EDC nudge when the site coordinator first logs in after the feature goes live
Measuring Success
North Star Metric
Re-query rate: 21% → 12%
Leading Indicators
- Button click rate > 60%
- Chat usage > 20%
Lagging Indicators
- Query closure: 52 → 36 days
Counter Metric
- Site response time increase: +10% max
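As a sanity check, the relative improvements implied by the absolute targets above can be computed directly (only the stated targets are used):

```python
# Relative improvements implied by the absolute metric targets above.
requery_before, requery_after = 0.21, 0.12   # re-query rate target
closure_before, closure_after = 52, 36       # query closure time, days

requery_cut = 1 - requery_after / requery_before
closure_cut = 1 - closure_after / closure_before
print(f"re-query reduction: {requery_cut:.0%}")      # ≈ 43% relative
print(f"closure-time reduction: {closure_cut:.0%}")  # ≈ 31% relative
```

So the 21% → 12% target is roughly a 40% relative cut in re-queries, and 52 → 36 days is about a 30% faster closure.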
Risks & Tradeoffs
Regulatory: Leading the Site Coordinator
Risk: Chat guidance is interpreted as directing data entry.
Mitigation: 100% non-prescriptive language ("Common resolutions include..."), full audit trail, regulatory pre-validation.
Tradeoff: Sacrifices some guidance richness for GCP compliance.
AI Bias / Hallucination
Risk: The chat gives a wrong eCRF/protocol interpretation, making data quality worse.
Mitigation: Study-specific document grounding only.
Tradeoff: Conservative guidance vs. comprehensive coverage.
RBQM Query Reduction
Risk: Too few queries remain for ROI as RBQM reduces total volume.
Mitigation: Scope to complex queries (protocol interpretation, lab adjudication).
Tradeoff: Smaller market vs. higher impact per query.
Compliance Constraints
Guidance must remain auditable, non-prescriptive, and grounded only in study-specific documentation.
- Use non-prescriptive phrasing (e.g., “Common resolutions include A, B”).
- Restrict grounding to protocol and study documents.
- Avoid leading queries or chats that could bias data entry.
Tradeoff: Slightly less directive guidance in exchange for regulatory safety.
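The grounding constraint above can be sketched as a simple retrieval filter. All names here (`Passage`, `retrieve`, the study IDs) are hypothetical illustrations, not the product's actual API:

```python
# Minimal sketch of study-specific grounding: the assistant may only
# retrieve passages from the current study's own documents (protocol,
# eCRF completion guideline), never from other studies.
from dataclasses import dataclass

@dataclass
class Passage:
    study_id: str
    source: str   # e.g. "protocol", "ecrf_guideline"
    text: str

def retrieve(corpus, study_id, keyword):
    """Return passages from this study only that mention the keyword."""
    return [
        p for p in corpus
        if p.study_id == study_id and keyword.lower() in p.text.lower()
    ]

corpus = [
    Passage("STUDY-A", "ecrf_guideline",
            "Drug start date must match the Week 1 visit date."),
    Passage("STUDY-B", "protocol",
            "Drug start date rules differ in this study."),
]
hits = retrieve(corpus, "STUDY-A", "start date")
# Only STUDY-A passages are eligible grounding material for STUDY-A queries.
```

The point of the filter is auditability: every piece of guidance can be traced to a specific passage in this study's own documentation.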
Future Iterations
- Full chat avatar (post-regulatory validation), including a clinical science avatar for complex queries.
- Cross-study pattern learning (scale beyond single study) for query resolutions.
Closing
We started with a 21% re-query rate that wastes site and DM time. The solution delivers contextual guidance at the moment of decision, targeting a roughly 40% cut in re-queries and a two-week reduction in time to database lock, unlocking faster, cleaner trials.