Context
From personal experience as a clinical data manager, I've observed that data cleaning for clinical trials relies heavily on site queries in the clinical trial database. A query is raised manually by the DM or automatically by the system; the site then attends to it, either responding or updating the data. In 21% of cases there is back and forth because the data change or the query response is unclear.
Industry research suggests a moderate-sized trial (200 patients in Phase 3) can generate between 3,000 and 10,000 queries, each costing $30–$70. A query also takes an average of 52 days to close, which can delay database lock and analysis timelines.
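To make the stakes concrete, a quick sketch of the per-trial cost range implied by the figures above (only the quoted ranges are used; nothing else is assumed):

```python
# Per-trial query-cost range implied by the figures above:
# 3,000-10,000 queries per moderate Phase 3 trial, at $30-$70 per query.
queries_low, queries_high = 3_000, 10_000
cost_low, cost_high = 30, 70

total_low = queries_low * cost_low      # best case: fewest, cheapest queries
total_high = queries_high * cost_high   # worst case: most, priciest queries
print(f"${total_low:,} - ${total_high:,}")  # → $90,000 - $700,000
```

Even at the low end, query handling is a six-figure line item per trial, before counting the 52-day closure delay.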
The Market
| Player | Approach | Takeaway |
|---|---|---|
| Medidata Rave · Veeva Vault · Oracle Inform | Static query text + email clarification. Sites dig through lengthy eCRF guidelines. | Missing: real-time explanation of query intent |
| Medidata Detect · IQVIA AI Coding | Anomaly detection and AI-assisted coding for data quality. | Gap: no AI for human-in-the-loop query resolution |
| GitHub Copilot · MS Copilot for Healthcare | Workflow augmentation tools in regulated environments (dev workflows, clinical notes). | Proof point: human augmentation is viable even in regulated spaces |

Market Trends
- RBQM reducing total query volume → remaining queries increasingly complex
- AI copilots proving workflow augmentation works in regulated industries
The Audience
Persona 1 · Primary
The Site Coordinator "So-Much-To-Do"
- Motivated by
  - Resolving data management queries and other data quality issues so their site doesn't accrue bad metrics or protocol deviations.
- Pain points
  - Documentation is scattered everywhere (labs, data management, devices, pharmacy). They must review a 50+ page eCRF completion guideline just to understand a data entry issue, and maybe work out how to resolve it.
- What they hope for
  - Immediate assistance when answering queries. Today they raise issues to the CRA, who emails the DM if they can't solve it, with back and forth and Clinical Science sometimes looped in for complex issues.
Persona 2 · Secondary
The Data Manager "Deep-in-the-Data"
- Motivated by
  - The highest quality data, and ways to detect and clean data efficiently.
- Pain points
  - Some queries must be manual, and communication with the site happens only through the query itself. On large trials with time zone differences, DMs cannot support sites at the moment the coordinator needs them. They receive emails with a lag, then write detailed, time-consuming emails on how to resolve each issue, and the same email is often sent by multiple CRAs.
Current Query Resolution Journey
1. DM Detects Discrepancy
   Manual review by the Data Manager identifies a data discrepancy in the EDC.
2. DM Writes Query in EDC
   The query is raised in the EDC and a notification is sent to the site.
3. Site Coordinator Sees Query
   The coordinator sees the query among 20–30+ others in the database.
   Pain Point: No context about query intent.
4. Site Interprets & Responds
   The coordinator attempts to understand and respond to the query.
   Pain Point: Must cross-reference patient documents plus a 50-page eCRF manual.
5. DM Reviews Response
   The Data Manager reviews the site's response, with a 1–3 day lag.
   Pain Point: 21% require re-query clarification.
6. Re-Query Issued (21% of Cases)
   A new query is issued for the same case. If the site is unsure, they escalate to the CRA, who escalates to the DM via email.
   Pain Point: The site re-works the same case; the DM writes repetitive re-queries or lengthy clarification emails.
7. Final Resolution
   The query is eventually closed.
   Pain Point: Critical queries block analysis timelines.
Big Takeaways
Competitive Whitespace
No EDC (Medidata Rave, Veeva Vault) offers real-time query context at response time — only static text + email.
Structural Shift
RBQM reduces total query volume, making efficient resolution of remaining complex queries mission-critical.
User Behaviour
Sites spend time digging through docs; DMs write repetitive clarification emails — both are solvable with contextual AI guidance.
Proven Pattern
Regulated copilots (e.g., MS Healthcare) show that human workflow augmentation is viable — even in highly regulated industries.
Problem
Re-query and Burnout Loop
Sites receive queries with little context, dig through 50-page eCRF guides, and send best-guess responses; DMs then re-query 21% of them, rewriting similar clarifications by email and increasing burnout on both sides.
No Real-Time Query Guidance
Existing EDC systems (Rave, Vault, Inform) only provide static query text and email chains—there is no real-time assistant to explain why a query exists and suggest compliant resolution options at the moment the site is responding.
The Goal
Site Coordinator
Reduced digging through 50-page eCRF completion guidelines; less coordination with CRAs when stuck; faster time to clean query counts in the database.
Data Manager
Less re-querying; less time to review and write queries; fewer repeated emails to CRAs and sites.
Sponsor / CRO / Study
Faster time to close queries; shorter time to database lock; higher quality data, faster.
Feature Prioritization & MVP Definition
| Feature | Reach | Impact | Confidence | Effort | Result |
|---|---|---|---|---|---|
| "Why?" button | 100% | High | High | Med | MVP |
| Chat avatar | 60% | High | Med | High | MVP |
| Analytics | 40% | Med | Med | Med | V2 |
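The table above can be turned into comparable scores with the standard RICE formula. The numeric mapping of the qualitative High/Med ratings below is purely illustrative, an assumption for this sketch rather than the weights actually used:

```python
# Illustrative RICE scoring for the prioritization table above.
# The High/Med -> number mapping is an assumption for illustration only.
def rice(reach, impact, confidence, effort):
    """RICE score = (Reach * Impact * Confidence) / Effort."""
    return reach * impact * confidence / effort

scale = {"High": 3, "Med": 2, "Low": 1}

features = {
    "Why? button": rice(1.00, scale["High"], scale["High"], scale["Med"]),
    "Chat avatar": rice(0.60, scale["High"], scale["Med"],  scale["High"]),
    "Analytics":   rice(0.40, scale["Med"],  scale["Med"],  scale["Med"]),
}

for name, score in sorted(features.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Under this mapping the ranking matches the table's Result column: the "Why?" button scores highest, the chat avatar second (both MVP), and analytics last (V2).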
Final Solution — Core Flow
1. Site opens query
   A "Why?" button appears that site coordinators can select if they choose.
2. Popup: Context & Guidance
   The query intent, the relevant eCRF reference, and resolution examples are surfaced in a popup.
3. [Optional] Chat
   Site: "I'm still confused about how to solve this query."
   DM chat: "This query asks you to verify the start date of the drug administration, as it duplicates the start date from Week X."
Prototype of the Virtual DM Assistant embedded directly in the EDC query view.
Launch & GTM Strategy
Pilot approach (3 months):
- 1 oncology Phase 3 maintenance study + 1 new oncology Phase 3 study
- 20 high-performing sites, chosen for low training burden and to reduce confounding variables in pilot data
- Opt-in "Why?" button on 50% of queries
Channels
- Site training webinars
- In-EDC nudge when the site coordinator first logs in after the feature goes live
Measuring Success
North Star Metric
Re-query rate: 21% → 12%
Leading Indicators
- Button click rate > 60%
- Chat usage > 20%
Lagging Indicators
- Query closure: 52 → 36 days
Counter Metric
- Site response time increase: +10% max
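As a sanity check, the relative improvements implied by the absolute targets above can be computed directly (only the stated targets are used):

```python
# Relative improvements implied by the absolute metric targets above.
requery_before, requery_after = 0.21, 0.12   # re-query rate target
closure_before, closure_after = 52, 36       # query closure time, days

requery_cut = 1 - requery_after / requery_before
closure_cut = 1 - closure_after / closure_before
print(f"re-query reduction: {requery_cut:.0%}")      # ≈ 43% relative
print(f"closure-time reduction: {closure_cut:.0%}")  # ≈ 31% relative
```

So the 21% → 12% target is roughly a 40% relative cut in re-queries, and 52 → 36 days is about a 30% faster closure.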
Risks & Tradeoffs
Regulatory: Leading the Site Coordinator
Risk: Chat guidance is interpreted as directing data entry.
Mitigation: 100% non-prescriptive language ("Common resolutions include..."), full audit trail, regulatory pre-validation.
Tradeoff: Sacrifices some guidance richness for GCP compliance.
AI Bias / Hallucination
Risk: The chat gives a wrong eCRF/protocol interpretation, making data quality worse.
Mitigation: Study-specific document grounding only.
Tradeoff: Conservative guidance vs. comprehensive coverage.
RBQM Query Reduction
Risk: Too few queries remain for ROI as RBQM reduces total volume.
Mitigation: Scope to complex queries (protocol interpretation, lab adjudication).
Tradeoff: Smaller market vs. higher impact per query.
Compliance Constraints
Guidance must remain auditable, non-prescriptive, and grounded only in study-specific documentation.
- Use non-prescriptive phrasing (e.g., “Common resolutions include A, B”).
- Restrict grounding to protocol and study documents.
- Avoid leading queries or chats that could bias data entry.
Tradeoff: Slightly less directive guidance in exchange for regulatory safety.
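The grounding constraint above can be sketched as a simple retrieval filter. All names here (`Passage`, `retrieve`, the study IDs) are hypothetical illustrations, not the product's actual API:

```python
# Minimal sketch of study-specific grounding: the assistant may only
# retrieve passages from the current study's own documents (protocol,
# eCRF completion guideline), never from other studies.
from dataclasses import dataclass

@dataclass
class Passage:
    study_id: str
    source: str   # e.g. "protocol", "ecrf_guideline"
    text: str

def retrieve(corpus, study_id, keyword):
    """Return passages from this study only that mention the keyword."""
    return [
        p for p in corpus
        if p.study_id == study_id and keyword.lower() in p.text.lower()
    ]

corpus = [
    Passage("STUDY-A", "ecrf_guideline",
            "Drug start date must match the Week 1 visit date."),
    Passage("STUDY-B", "protocol",
            "Drug start date rules differ in this study."),
]
hits = retrieve(corpus, "STUDY-A", "start date")
# Only STUDY-A passages are eligible grounding material for STUDY-A queries.
```

The point of the filter is auditability: every piece of guidance can be traced to a specific passage in this study's own documentation.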
Future Iterations
- Full chat avatar (post-regulatory validation), including a clinical science avatar for complex queries.
- Cross-study pattern learning (scale beyond single study) for query resolutions.
Closing
We started with a 21% re-query rate that wastes site and DM time. The solution delivers contextual guidance at the moment of decision, targeting a roughly 40% cut in re-queries and a two-week reduction in time to database lock, unlocking faster, cleaner trials.