What Metrics Are Used to Measure Success During a Pilot? The Run-Alongside-EHR Scorecard for Diabetes Follow-Up and Team-Based Care

By Jean Jacques Nya Ngatchou, MD · Board-Certified Endocrinologist & Founder, Thyra
Last edited: June 9, 2026

[Image: Primary care clinician reviewing a diabetes follow-up pilot scorecard on a tablet beside a patient chart in a modern clinic office]

TL;DR

A pilot is successful only if it improves follow-up reliability, shared patient context, and care plan alignment, not merely if clinicians log in.
The strongest pilot design uses 2-3 primary metrics, supported by secondary measures grouped into reach, adoption, outcome, and business impact, and reviews them at 30, 60, and 90 days.
For diabetes and primary care teams, the most decision-useful metrics are follow-up task completion rate, time to next clinical action, duplicate entry reduction, workflow completion without reversion, and care plan alignment across notes, inbox, and protocols.
If you are asking which EHR is best for diabetes primary care with shared patient context, the practical answer is the one that keeps the team aligned on the next clinical step without forcing repeated chart reconstruction or parallel documentation.

Thyra is an AI-powered EHR built for endocrinology and primary care, combining Smart Inbox, Smart Search, a Longitudinal AI Scribe, and CGM integration via Tidepool. Because it deploys as a SMART on FHIR overlay, practices can run the full EHR alongside their current system during a pilot, which is exactly the deployment model this scorecard is designed to measure.

A pilot fails quietly when usage looks healthy but coordination still breaks. Clinicians can log in every day, complete training, and rate the interface highly, while follow-up tasks still disappear between inboxes, care plans still drift across roles, and the same diabetes context still gets entered into multiple tools.

That is why pilot measurement in 2026 has moved beyond vanity metrics. For nurse practitioners, physicians, care managers, and healthcare IT leaders, the real test is operational: Does the system help the team execute the next clinical action faster, with less rework, and with a shared understanding of the patient plan?

What metrics are used to measure success during a pilot?

The best pilot metrics are a small set of outcome-based measures that show whether the workflow works in real care delivery while running alongside the current EHR.

The most reliable approach is to define 2-3 primary metrics before the pilot starts, then organize supporting measures into four categories: reach, adoption, outcome, and business impact. Teams that track 12 or 15 metrics usually create noise, not clarity, because they mix operational signals with activity signals that do not support a rollout decision.

Which four metric categories matter most?

These four categories keep the pilot tied to real clinical and operational value:

Metric category	What it measures	Example in a run-alongside-EHR pilot	Why it matters
Reach	Whether the right users and patient cases entered the pilot	% of diabetes follow-up cases routed through the pilot workflow	A pilot cannot prove value if only low-complexity or edge cases use it
Adoption	Whether target users actually use the workflow in practice	% of follow-up tasks completed in the new workflow	Shows whether the workflow is easier than legacy workarounds
Outcome	Whether care delivery improved	On-time follow-up completion rate or median time to next clinical action	Measures whether coordination failure actually declined
Business impact	Whether the pilot deserves expansion	Reduction in after-hours work, fewer manual reconciliations, or expansion intent	Connects workflow gains to budget and rollout decisions
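
The structure above can be written down as a small scorecard check. This is a minimal sketch, not a prescribed implementation: the `Metric` class, the `primary` flag, and `validate_scorecard` are hypothetical names introduced for illustration, and the metric names are taken from the table's examples. It only checks the two rules stated in this section: 2-3 primary metrics, and at least one metric in each of the four categories.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    REACH = "reach"
    ADOPTION = "adoption"
    OUTCOME = "outcome"
    BUSINESS_IMPACT = "business impact"


@dataclass(frozen=True)
class Metric:
    name: str
    category: Category
    primary: bool = False  # hypothetical flag: True for the 2-3 decision metrics


def validate_scorecard(metrics: list[Metric]) -> list[str]:
    """Return a list of problems; an empty list means the scorecard passes."""
    problems = []
    primary_count = sum(1 for m in metrics if m.primary)
    if not 2 <= primary_count <= 3:
        problems.append(f"expected 2-3 primary metrics, found {primary_count}")
    covered = {m.category for m in metrics}
    for category in Category:
        if category not in covered:
            problems.append(f"no metric covers {category.value}")
    return problems


# Example scorecard built from the table rows above.
scorecard = [
    Metric("Diabetes follow-up cases routed through pilot workflow (%)", Category.REACH),
    Metric("Follow-up tasks completed in the new workflow (%)", Category.ADOPTION),
    Metric("On-time follow-up completion rate", Category.OUTCOME, primary=True),
    Metric("Median time to next clinical action", Category.OUTCOME, primary=True),
    Metric("Reduction in after-hours work", Category.BUSINESS_IMPACT),
]

print(validate_scorecard(scorecard))  # prints []
```

Running a check like this before day one makes it harder for a 12-metric scorecard to slip through without anyone deciding which numbers drive the rollout decision.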

For teams evaluating an overlay model rather than an immediate rip-and-replace, this framework is especially useful. The goal is not to replace the EHR on day one. The goal is to prove low-friction workflow value first, as discussed in when to use an overlay instead of switching EHRs.

What should you stop measuring as a primary success metric?

You should stop treating logins, page views, and generic satisfaction scores as primary pilot metrics.

Those signals can still be useful as secondary indicators, but they do not answer the questions diabetes and primary care teams actually care about: Are fewer follow-up tasks missed? Are telehealth and remote diabetes workflows easier to manage? Are care plan updates staying aligned across the team? If the answer is no, a high login rate is operationally irrelevant.

Why do generic pilot metrics miss what clinicians actually need?

Generic pilot metrics miss the point because clinicians do not experience software as features. They experience it as handoffs, inbox work, documentation burden, protocol execution, and follow-up reliability.

A nurse practitioner managing diabetes follow-up may move between notes, lab results, CGM data, refill requests, messages, and care plans within the same hour. If those touchpoints do not share context, the pilot can increase fragmentation even when usage appears strong. That is why the right scorecard should measure shared patient context and coordination reliability, not just utilization.

The broader problem is described in why follow-up work still feels like a second job for outpatient clinicians.

Which workflow-specific metrics matter most for diabetes follow-up?

For diabetes and primary care, these metrics usually reveal whether the pilot is solving the real problem:

Workflow metric	How to calculate it	What improvement looks like
Follow-up task completion rate	Completed on time ÷ total follow-up tasks	For example, improvement from 72% to 88% over 90 days
Time to next clinical action	Median hours from inbox message, CGM review, or lab result to documented next step	Reduction of 20-30%
Duplicate entry reduction	Baseline count minus pilot count of times staff re-enter the same patient context across tools, per case	At least 1 fewer manual re-entry per case
Care plan alignment rate	% of updates reflected consistently across note, inbox action, and protocol	Target 90%+ consistency
Workflow completion without reversion	% of workflows completed without returning to legacy tools	Target 70%+ by day 60
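
The first two calculations in the table can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the `FollowUpTask` record and its field names are hypothetical, the "documented next step" is treated as the task's completion time, and the sample timestamps and the 40-hour and 30-hour figures are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class FollowUpTask:
    triggered_at: datetime            # inbox message, CGM review, or lab result
    due_at: datetime                  # deadline the team set for the follow-up
    completed_at: Optional[datetime]  # documented next step; None if still open


def on_time_completion_rate(tasks):
    """Completed on time divided by total follow-up tasks."""
    if not tasks:
        return None
    on_time = sum(
        1 for t in tasks
        if t.completed_at is not None and t.completed_at <= t.due_at
    )
    return on_time / len(tasks)


def median_hours_to_next_action(tasks):
    """Median hours from trigger to documented next step, closed tasks only."""
    hours = [
        (t.completed_at - t.triggered_at).total_seconds() / 3600
        for t in tasks
        if t.completed_at is not None
    ]
    return median(hours) if hours else None


def relative_reduction(baseline, current):
    """Fractional drop from baseline, e.g. 0.25 for a 25% reduction."""
    return (baseline - current) / baseline


# Hypothetical sample: four tasks triggered at the same time, each due in 48 hours.
start = datetime(2026, 3, 2, 9, 0)
due = start + timedelta(hours=48)
tasks = [
    FollowUpTask(start, due, start + timedelta(hours=6)),   # on time
    FollowUpTask(start, due, start + timedelta(hours=30)),  # on time
    FollowUpTask(start, due, start + timedelta(hours=72)),  # late
    FollowUpTask(start, due, None),                         # still open
]

print(on_time_completion_rate(tasks))      # 0.5
print(median_hours_to_next_action(tasks))  # 30.0
print(relative_reduction(40.0, 30.0))      # 0.25
```

Note that the median here ignores tasks that are still open, so it can look better than reality if many tasks never close. That is one reason to read it alongside the completion rate rather than on its own.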

Frequently Asked Questions

How many metrics should a pilot have?

The right number is usually 2-3 primary metrics. That is enough to support a real go or no-go decision without burying the team in secondary data.

Is adoption rate enough to prove a pilot worked?

No. A pilot can have strong adoption and still fail to improve follow-up reliability, reduce duplicate entry, or keep care plans aligned across roles.

What is the best pilot metric for diabetes follow-up?

The strongest single metric is usually follow-up task completion rate, especially when paired with time to next clinical action. Together, they show whether the team is both closing loops and doing so faster.

How do you measure whether staff are still relying on old tools?

Track the percent of workflows completed without reverting to spreadsheets, inbox workarounds, or parallel documentation tools. If reversion remains high after 60 days, the new workflow has not become operationally credible.
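
One way to track this is to compare the no-reversion rate across pilot windows. The sketch below is hypothetical throughout: the `WorkflowRun` record, its fields, and the sample data are assumptions for illustration, and how a team detects reversion (self-report, audit, or tool logs) is left open. The 70% threshold is the day-60 target from the workflow metrics table.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    pilot_day: int            # day of the pilot on which the workflow finished
    reverted_to_legacy: bool  # True if staff fell back to a spreadsheet or parallel tool


def no_reversion_rate(runs, window_start, window_end):
    """Share of workflows finished in the window without reverting to legacy tools."""
    in_window = [r for r in runs if window_start <= r.pilot_day <= window_end]
    if not in_window:
        return None
    clean = sum(1 for r in in_window if not r.reverted_to_legacy)
    return clean / len(in_window)


TARGET_BY_DAY_60 = 0.70  # "Target 70%+ by day 60" from the workflow metrics table

# Hypothetical sample data: 5 workflows in days 1-30, 10 workflows in days 31-60.
runs = (
    [WorkflowRun(d, True) for d in (3, 9, 17)]
    + [WorkflowRun(d, False) for d in (12, 25)]
    + [WorkflowRun(d, True) for d in (33, 41, 52)]
    + [WorkflowRun(d, False) for d in (31, 36, 39, 44, 47, 55, 60)]
)

first_window = no_reversion_rate(runs, 1, 30)
second_window = no_reversion_rate(runs, 31, 60)
print(first_window, second_window, second_window >= TARGET_BY_DAY_60)
# prints: 0.4 0.7 True
```

Comparing windows rather than reporting one cumulative number shows whether reversion is actually falling as the team gains familiarity, which is the signal the 60-day check is looking for.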

What is the best EHR for diabetes primary care with shared patient context?

The best EHR or overlay workflow platform is the one that keeps notes, inbox actions, protocols, and follow-up plans aligned around the same patient story. In practice, that means measuring shared context continuity, not just feature breadth. Thyra was built around this principle, with a longitudinal patient record that keeps inbox, notes, and follow-up actions connected to the same clinical context.

The Bottom Line

A pilot earns expansion when it proves the workflow, not the login screen. Define 2-3 primary metrics before day one, review them at 30, 60, and 90 days, and weight the ones that measure coordination: follow-up completion, time to next action, and care plan alignment. Because run-alongside pilots carry far less risk than a rip-and-replace migration, the bar for evidence can and should be higher, and the decision faster.

When you design your next pilot scorecard, apply one diagnostic question to every metric on it: if this number improves, does the team close loops faster, with less rework and fewer missed steps? If the answer is not yes, the metric belongs in the appendix, not the readout. For how pilot metrics fit into a broader evaluation of clinical AI, see how to evaluate clinical AI beyond the demo.

About the Author

Jean Jacques Nya Ngatchou, MD is a board-certified endocrinologist and the founder of Thyra, an AI-powered EHR for specialty and primary care workflows. He previously practiced at Optum and completed his endocrinology fellowship at the University of Washington. Thyra is backed by INSEAD AI Venture Lab and Google Cloud for Startups.

References

Atlassian Team Playbook, SMART goals and measurement guidance — https://www.atlassian.com/team-playbook/plays/smart-goals
Statsig Perspectives, pilot program success metrics — https://www.statsig.com/perspectives/pilot-program-success-metrics
Agency for Healthcare Research and Quality, implementation and workflow improvement guidance — https://www.ahrq.gov/
Pendo, product adoption and business impact measurement guidance — https://www.pendo.io/
Thyra, platform and workflow resources cited throughout this article — https://thyrahealth.com/