Safe and Understandable AI-Powered Software to Transform your RPM, RTM, CCM, and APCM Program
FairPath helps practices run profitable remote care programs—without audit risk, billing confusion, or compliance gaps. FairPath Pro goes further, managing your entire RPM operation end-to-end.
With the increased scrutiny and regulatory demands for running remote care programs, software that handles sudden regulatory changes is more important than ever. FairPath is an intelligent compliance management system purpose-built for remote care programs facing dynamic, demanding regulatory environments.
Patient Consent & Education Automation
Real-time, HIPAA-compliant audio recordings and transcriptions during onboarding.
Continuous Patient Compliance
Automated text and AI-driven interactions significantly boost patient adherence, while providing verifiable communication records.
Audit-Ready Documentation
Automated, timestamped, tamper-proof documentation of every clinician interaction.
Real-time AI Oversight
Proactively flags potential compliance gaps before claims submission, so no critical data is missing after a claim goes out.
The Tech Under the Hood
Our proprietary ontology engine, Buffaly, lets us adapt to fluid regulatory changes faster than the competition, while ensuring interoperability between disparate coding systems like ICD-10, SNOMED, and CPT®.
If regulations change, we change. Fast. No need to wait for slow rollouts.
The Intelligence Factory Difference
How We Empower Your Practice
The FairPath platform has processed over 1.1 million claims and recovered more than $36.7 million. By training FairPath on millions of real patient and financial transactions, we’ve achieved a 98% RPM payment success rate.
Keeps Your Data Safe and Secure
Built from the ground up to meet HIPAA standards, our solutions protect your sensitive information without sending it outside your control—peace of mind included.
Accurate Billing You Can Trust
Our technology ensures every claim is right the first time, cutting errors that lead to denials. No complicated AI gimmicks—just dependable results tailored for healthcare billing.
Affordable for Small Practices
FairPath skips the big setup fees and tech headaches. You get expert billing support customized to your needs, at a price that fits your budget.
Full Service Billing Assistance
Larger partners can integrate FairPath's platform for their own RCM needs, leveraging our proven technology.
Try FairPath Today
How Does FairPath Work? Try Our Low-Risk Starter
Discover how FairPath processes your billing with a low-risk starter package:
Upload 1-3 claims
Let our AI handle eligibility, coding, and status checks
See results in just 24-48 hours: 98% payment success, under 5% denials, and 90% of payments within 30 days—no big fees
Since 2018, we’ve delivered precise results for practices like yours. Start exploring today!
At Intelligence Factory, we harness cutting-edge AI to solve healthcare's toughest challenges. Our solutions streamline billing, enhance patient engagement, and ensure compliance, all powered by hallucination-free technology designed for your success.
FairPath
End-to-End Software Package
What It Is: FairPath is a compliance-first platform that lets practices run their own remote care programs with built-in audit readiness. From onboarding, device management, and program management to clinical reviews, patient communications, billing, and claims submission, FairPath provides all the tools you need to run your RPM program.
Why It Matters: FairPath aligns every claim with CMS rules, reducing fraud risk and denial rates. You stay compliant without adding tech staff or stress.
What It Is: A turnkey service where Intelligence Factory manages your full RPM program—staffing, onboarding, monitoring, billing, compliance.
Why It Matters: You gain the benefits of remote care without learning Medicare billing rules or adding overhead. It’s plug-and-play RPM, built right.
What It Is: A virtual care agent that improves patient follow-through. Nurse Amy automates reminders, support calls, and satisfaction check-ins for RPM, RTM, and CCM patients.
Why It Matters: Higher patient compliance means more billable events, better outcomes, and less staff burden. Amy keeps patients engaged automatically.
What It Is: A medical-grade ontology engine that transforms messy notes and alerts into clean, structured billing and compliance data. Additionally, Buffaly allows for interoperability between disparate systems: ICD-10, CPT, SNOMED.
Why It Matters: It solves messy data problems with precision, turning chaos into clear outputs that save time and boost accuracy.
Not all AI is created equal. In an era where everyone claims to be "AI-powered," the technology beneath the surface matters more than ever. We've spent nearly two decades building AI that doesn't just sound intelligent—it delivers reliable, transparent, and actionable results in environments where mistakes aren't acceptable.
Battle-tested across industries for 16 years
Since 2009, we've been solving complex problems with AI—in transportation systems, clinical environments, aviation operations, supply chain monitoring, and beyond. This cross-industry experience means our platform has been stress-tested against diverse requirements, from split-second logistics decisions to life-critical healthcare protocols. We've weathered the entire evolution of AI technology and emerged with solutions that actually work in the real world.
Not an LLM wrapper: complete technical independence
The AI boom made access to language models widespread, and with it came a flood of 'AI solutions' that are really just prompt engineering on top of ChatGPT or similar platforms. We're fundamentally different. Our entire AI stack is proprietary, built from the ground up by our team. No prompt engineering shortcuts. No dependency on OpenAI, Google, or any third-party AI provider.
Explainable, auditable, hallucination-free AI
Generic LLMs operate as black boxes that generate plausible-sounding text—sometimes accurate, sometimes fabricated. Our Buffaly Ontology Engine takes a fundamentally different approach using OGAR (Ontology-Guided Augmented Retrieval): structured domain knowledge that the AI navigates with precision rather than statistical pattern matching.
This gives you:
Data sovereignty: Your proprietary information never leaves your infrastructure or touches external AI services
Security assurance: No exposure to third-party vulnerabilities, policy changes, or service outages
Performance optimization: Technology tuned to your specific domain, not trained on general internet knowledge
Future-proof architecture: You're not locked into someone else's technology roadmap or pricing model
The practical difference:
Zero hallucinations: The system can only draw from your curated, validated knowledge base
Complete transparency: Every output includes the reasoning and sources behind it
Regulatory compliance: Audit trails and documentation that satisfy even the strictest requirements
Expert control: Your domain specialists define what the AI knows and how it applies that knowledge
When your teams can trace exactly how the AI reached each conclusion, adoption accelerates and trust builds naturally.
Compliance Without Complexity
The Five Pillars of a Compliant, Scalable RPM Program
FairPath directly addresses the issues highlighted in the OIG’s 2024 RPM audit—preventing fraud, missed revenue, and denials.
Consolidated Data Platform
Unified dashboard for all device data
AI flags urgent readings
No more portal-hopping or missed interventions
Billing & Charge Optimization
Fully automates 99453, 99454, and 99457/99458 billing
Calibrates charges to avoid payer scrutiny
Flags duplicates and multi-episode risks
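To make the billing automation concrete, here is a deliberately simplified sketch of the kind of threshold checks involved. The thresholds follow commonly cited CMS guidance (a one-time setup code, 16 days of readings for 99454, 20-minute increments for 99457/99458), but actual requirements vary by payer and year, so treat this as illustrative only, not as FairPath's actual rules engine or as billing advice.

```python
# Simplified sketch of RPM billing-code checks.
# Thresholds are illustrative (16 days of readings, 20-minute increments);
# real rules vary by payer and calendar year.

def billable_codes(setup_done, days_with_readings, clinical_minutes):
    codes = []
    if setup_done:
        codes.append("99453")                 # one-time setup and education
    if days_with_readings >= 16:
        codes.append("99454")                 # device supply, 16+ days of data
    if clinical_minutes >= 20:
        codes.append("99457")                 # first 20 min of management
        extra = (clinical_minutes - 20) // 20
        codes.extend(["99458"] * extra)       # each additional 20 min
    return codes

print(billable_codes(True, 18, 45))  # -> ['99453', '99454', '99457', '99458']
```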
Compliance & Documentation Engine
Timestamps every interaction in a HIPAA-compliant system
Tracks who did what, when
Proven to defend audits and clawbacks
Patient Engagement Tools
30% improvement in usage from calls/texts
Captures 99453 consent and education digitally
Flags inactive patients before it’s too late
Eligibility Verification System
Real-time checks for Medicare, Advantage, and dual plans
Flags ineligible patients pre-enrollment
Prevents non-reimbursable claims and wasted setups
Portfolio Highlights
Structured Solutions for Remote Care
Each of these projects reflects the same principles behind FairPath: structured AI, built for trust, transparency, and real-world complexity. From scalable eligibility checks to seamless EHR integration, these solutions show how our technology performs under pressure—exactly where it counts.
Turn Medical Chaos into Structured Insight
Seamlessly unify fragmented EHR and EMR data with a semantic engine designed for healthcare.
FairPath’s integration layer normalizes inputs from over 30 EHR systems—including Epic and eClinicalWorks—transforming disconnected diagnoses, labs, and billing codes into one coherent data model that powers eligibility checks, reporting, and automation.
After critical alerts, every patient still deserves attention—but time is finite.
FairPath uses adaptive algorithms to help clinicians decide who to engage next—balancing need, compliance, and sustainability. It’s not about cutting corners; it’s about using every minute wisely to maximize real patient impact.
Eligibility Without the Guesswork—or the Per-Transaction Fees
Automated coverage checks built for practices that can’t afford enterprise systems.
With FairPath, eligibility validation is no longer a bottleneck. Our ontology-driven engine delivers high-accuracy checks across insurers and program types—fully auditable and designed for underserved providers.
While healthcare is our focus, Intelligence Factory's AI has a proven track record across industries. Our Feeding Frenzy suite has optimized sales and support workflows for IT companies, showcasing our technology's versatility and reliability beyond medical billing.
Our AI solution transforms your billing process with a structured, step-by-step approach:
Eligibility Verification
Instantly confirm patient coverage with AI that retrieves accurate, real-time insurance details.
Claims Coding
Generate precise CPT codes and ICD-10 mappings to prevent denials and resubmissions.
Prior Authorization
Skip the manual process—our AI gathers required information and expedites approvals.
Seamless Integration
Easily connect with your EHR, practice management systems, and billing software through scalable APIs.
Take the First Step with Intelligence Factory
Ready to transform your billing process? Whether you're a small practice seeking our expert billing service or a larger partner looking to integrate FairPath's technology, we're here to help you succeed.
Over the past several years, I’ve heard it all. Remote patient care is a scam. It doesn’t work. RPM is designed to fail. I’ve listened to the frustrations from doctors, managers, and administrators who swear that remote care is nothing but another profit scheme wrapped in good intentions.
And to be fair — I understand why they feel that way.
Because what they’ve seen isn’t remote care. It’s the watered-down, third-party version of it. It’s the one-size-fits-all model sold to practices desperate for relief, only to leave them burned, broke, and more skeptical than before. The noise has become so loud that it’s drowning out the truth: remote care, done right, is one of the most important evolutions in modern healthcare.
How We Got Here
The skepticism didn’t happen overnight. Many practices tried remote patient monitoring (RPM) through outside vendors that promised easy money, minimal effort, and plug-and-play compliance. What they got instead were programs that:
Pushed every patient into a standardized template.
Prioritized billable events over clinical outcomes.
Left staff overwhelmed by alerts and workflows they didn’t own.
Collapsed as soon as the vendor walked away.
When that happens enough times across enough communities, trust erodes. The very words “remote care” or “RPM” start to trigger eye rolls.
But the truth is, the failure isn’t in the idea of remote care — it’s in how we’ve implemented it.
The Pain We Don’t Talk About
The same practices that dismiss remote care as a scam are often the ones struggling the most with operational and financial pressure. They can’t pay staff competitively. They’re battling insurance denials, regulatory penalties, and burnout that’s eating away at patient access and continuity of care.
So when someone comes along promising “passive revenue” through RPM, it feels like a lifeline. But when that lifeline turns out to be a liability, it reinforces the fear that nothing truly helps.
And yet, ignoring remote care altogether isn’t the answer either. Because as the population ages and chronic disease management becomes more complex, traditional office visits alone can’t keep up. If we don’t evolve, the gap between care need and care capacity will only widen — and our most vulnerable patients will fall through it.
Remote Care, Reimagined
Here’s what I’ve learned after years of watching both sides of this debate: Remote care works when it’s built by the practice, for the patient — not when it’s outsourced for convenience.
Successful programs share a few common traits:
Ownership: The practice controls patient selection, protocols, and data.
Integration: The technology fits into existing workflows, not the other way around.
Purpose: The goal isn’t to maximize billing codes; it’s to strengthen relationships, reduce friction, and extend the reach of the care team.
Accountability: Every metric ties back to clinical outcomes — not arbitrary activity thresholds.
When a remote care program operates under those principles, the results speak for themselves. Staff engagement rises. Patients feel seen between visits. Providers finally get the full picture of what’s happening beyond the office walls.
And yes — the financial lift follows, but as a result of better care, not the reason for it.
The Third-Party Trap
It’s worth calling this out plainly: The third-party RPM model was built to serve billing infrastructure, not patient outcomes. It filled a short-term need during the pandemic, but it’s not sustainable long term.
Here’s why: When an outside company manages your patient communication, your alerts, and your data, you lose control of the very thing you’re being paid to deliver — coordinated, continuous care.
It’s like trying to run a restaurant where another company cooks, serves, and collects payment while you just rent the kitchen. It might look like it works at first, but it’s not your operation anymore. And when it collapses, it’s your name, your patients, and your staff left holding the pieces.
If we want to fix the reputation of remote care, we have to stop outsourcing the soul of it.
Why Remote Care Still Matters
Despite the noise, I’ve never been more convinced that remote care is essential to the future of healthcare.
Here’s why:
Chronic disease management is now the majority of care delivered in the U.S.
Preventable hospitalizations cost billions annually, and many are tied to gaps in monitoring and follow-up.
Patients increasingly expect digital continuity and proactive outreach.
Ignoring these realities because of bad experiences with early RPM vendors would be like abandoning telehealth after the first clunky video platform. The concept isn’t the problem — it’s the execution.
Remote care isn’t just a new revenue stream. It’s a new care layer — one that captures the daily reality of patients’ lives and turns it into insight, intervention, and trust.
The Call for a Unified Future
The healthcare industry doesn’t need another quick fix. It needs alignment — across providers, payers, and technology partners — around one simple truth: Holistic, continuous care is the only way forward.
That means:
Practices owning their programs instead of renting them.
Technology supporting intuition, not replacing it.
Reimbursement aligned with value, not volume.
Patients becoming active participants, not passive data points.
For practices ready to step forward, the path isn’t as complicated as it seems. The tools now exist to make remote care simple, sustainable, and clinically meaningful — without losing your identity or your control.
What Comes Next
We’re entering a new era of remote care — one where artificial intelligence and intuitive technology can finally support clinicians instead of overwhelming them. Where data isn’t noise but clarity. Where operational autonomy and financial health can coexist with better patient outcomes.
That’s the kind of future I want to help build. It’s why we created FairPath, a plug-and-play remote care platform designed to teach practices how to run their own programs intuitively, powered by an AI foundation built for healthcare. It’s time to reclaim patient care.
If we get this right, remote care won’t just work — it’ll redefine how care works.
Let’s stop letting bad actors write the story. Let’s write our own.
--
This content is for informational purposes only. Program requirements and reimbursement rates vary by MAC, plan, and region.
The 8% Problem: Why State-of-the-Art LLMs Are Useless for High-Stakes Precision Tasks
In the race to solve complex problems with AI, the default strategy has become brute force: bigger models, more data, larger context windows. We put that assumption to the ultimate test on a critical healthcare task, and the results didn’t just challenge the “bigger is better” mantra; they shattered it. Here’s a preview of what our experiments uncovered:
Out of the box, no leading LLM could break 9% accuracy. The top performer on our 1,150-term test achieved a jaw-dropping 8.43% on exact-match accuracy, while consistently inventing non-existent medical codes.
Grok-4’s 1% accuracy is the most important result. While scoring a dismal 1.22% on exact matches, it demolished every other model on finding the right semantic neighborhood (44% accuracy), proving LLMs understand medical context but are structurally incapable of precision on their own.
How to burn $18,000 proving the “large context” myth is a lie. A partner experiment that stuffed the entire SNOMED Disorders list into the prompt for every query wasn’t just wildly expensive — it was unstable, less accurate, and still produced hallucinations.
This is the precision problem, a challenge rooted in the messy reality of clinical language. A doctor jots down “AAA,” “abdo aneurysm,” or “aortic dilation” in a patient’s chart. A human clinician knows these all point to the same diagnosis: Abdominal Aortic Aneurysm. But for the computer systems processing this data, that human shorthand is a massive problem. To make sense of it all, modern healthcare relies on standardized ontologies like SNOMED CT — a vast, structured vocabulary that assigns a unique, canonical identifier to every diagnosis and procedure. Get this mapping right, and you unlock powerful analytics and safer patient care. Get it wrong, and you introduce costly errors.
Unguided Large Language Models, as our results show, are not just ineffective for this task — they are fundamentally the wrong tool for the job. And solving it requires a new approach.
Enter Ontology-Guided Augmented Retrieval (OGAR). Instead of asking an LLM to recall a precise identifier from its vast, murky memory, OGAR does something far more reliable: it gives the LLM a multiple-choice test.
The core idea is simple but powerful:
Retrieve: First, use the ontology (our source of truth) to find a small, relevant list of valid candidate concepts for a given clinical term.
Reason: Then, present this constrained list to the LLM and ask it to use its powerful reasoning abilities to select the best option.
Validate: Finally, confirm the LLM’s choice against the ontology’s rules.
This approach combines the best of both worlds: the structured, verifiable authority of an ontology and the flexible, contextual understanding of an LLM. It constrains the model’s decision-making space, effectively eliminating hallucinations and forcing it to choose from a set of known-good answers.
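The retrieve, reason, validate loop can be sketched in a few lines. Everything below is a toy stand-in: the three-concept ontology is illustrative, and the word-overlap "reasoner" takes the place of a real LLM call, since the guard structure, not the scoring, is the point.

```python
# Minimal OGAR sketch: retrieve -> reason -> validate.
# Toy ontology: concept ID -> preferred term (illustrative data only).
ONTOLOGY = {
    "233958001": "Abdominal aortic aneurysm",
    "67362008":  "Aortic aneurysm",
    "85898001":  "Cardiomyopathy",
}

def retrieve(term, ontology):
    """Step 1: find candidate concepts whose label shares a word with the term."""
    words = set(term.lower().split())
    return [cid for cid, label in ontology.items()
            if words & set(label.lower().split())]

def reason(term, candidates, ontology):
    """Step 2: stand-in for the LLM 'multiple-choice' step.
    Here we pick the candidate with the most word overlap;
    a real system would ask the LLM to choose."""
    def overlap(cid):
        return len(set(term.lower().split()) & set(ontology[cid].lower().split()))
    return max(candidates, key=overlap) if candidates else None

def validate(choice, ontology):
    """Step 3: the chosen ID must exist in the ontology -- no fabricated codes."""
    return choice if choice in ontology else None

def ogar_map(term, ontology=ONTOLOGY):
    candidates = retrieve(term, ontology)
    choice = reason(term, candidates, ontology)
    return validate(choice, ontology)

print(ogar_map("abdominal aortic aneurysm"))  # a known ID, or None; never invented
```

Because the final answer must survive the `validate` step, the pipeline can only ever emit identifiers that already exist in the ontology.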
In this article, we’ll walk you through the experiments that prove this method works. Using a dataset of over 1,150 clinical tags provided by our collaborator, Dr. Tisloh Danboyi (UK), we’ll show you the hard data comparing naïve approaches, pure LLMs, and three progressively powerful levels of OGAR. You’ll see exactly how grounding an LLM transforms it from an unreliable guesser into a precise, trustworthy reasoning engine — a pattern that holds promise far beyond just SNOMED mapping.
The OGAR Framework: A “Multiple-Choice Test” for LLMs
The catastrophic failure of unguided LLMs reveals a core truth: asking a model to recall a precise, canonical identifier is like giving it an open-ended essay question when what you need is a single, verifiable fact. The model will give you a beautifully written, contextually aware, and often completely wrong answer.
Ontology-Guided Augmented Retrieval (OGAR) fixes this by changing the nature of the test. Instead of an essay, we give the LLM a multiple-choice exam where the ontology provides the only valid answers. The LLM’s job is no longer to recall but to reason and select.
At its heart, OGAR is a simple, reusable pattern. It equips an LLM with a minimal set of “tools” that allow it to interact with a structured knowledge base (like SNOMED CT). These tools are not complex; they are basic functions:
Search Tool: Finds candidate concepts based on an input term.
Neighborhood Tool: Explores related concepts (parents, children, siblings) to add context.
Validation Tool: Confirms if a chosen concept is valid according to ontology rules.
Canonicalization Tool: Formats the final choice into a standard ID and label.
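The four tools above can be sketched against a toy ontology graph. The IDs, labels, and parent links here are illustrative only, not real SNOMED CT data, and the function bodies are minimal stand-ins for the real tool implementations.

```python
# Sketch of the four OGAR tools over a toy ontology graph (illustrative data).
TOY = {
    "67362008":  {"label": "Aortic aneurysm", "parent": None},
    "233958001": {"label": "Abdominal aortic aneurysm", "parent": "67362008"},
    "433068007": {"label": "Thoracic aortic aneurysm", "parent": "67362008"},
}

def search(term):
    """Search Tool: candidate concepts matching the input term."""
    t = term.lower()
    return [cid for cid, c in TOY.items() if t in c["label"].lower()]

def neighborhood(cid):
    """Neighborhood Tool: parent and sibling concepts for added context."""
    parent = TOY[cid]["parent"]
    siblings = [x for x, c in TOY.items() if c["parent"] == parent and x != cid]
    return {"parent": parent, "siblings": siblings}

def validate(cid):
    """Validation Tool: a choice is valid only if it exists in the ontology."""
    return cid in TOY

def canonicalize(cid):
    """Canonicalization Tool: standard 'ID | preferred label' output."""
    return f"{cid} | {TOY[cid]['label']}"

print(canonicalize(search("abdominal aortic aneurysm")[0]))
```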
The power of OGAR is its flexibility. It’s not a single, rigid method but a spectrum of implementations that trade human oversight for automation. We tested three distinct levels.
Level 1: The Human-in-the-Loop Selector
At its simplest, OGAR acts as a safeguard. A human analyst (or a simple script) performs a basic search against the ontology to retrieve a list of plausible candidates. The LLM is then given one, highly constrained job: choose the best option from this pre-approved list.
Think of this as handing the LLM a curated list of answers. It can’t hallucinate, because it never writes a code of its own; it can only select from the list. This approach is ideal for initial validation or for workflows where human oversight is paramount. It’s safe, low-cost, and immediately eliminates the risk of fabricated codes.
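The Level 1 guard is tiny. In this sketch, `ask_model` is a placeholder for the real LLM call (here it just returns the first option); the point is the membership check that rejects anything outside the pre-approved candidate list.

```python
# Level 1 sketch: the model may only answer from a pre-approved list.

def ask_model(term, options):
    # Placeholder for "LLM, pick the best option for `term`".
    return options[0] if options else None

def level1_select(term, candidates):
    """Return the model's pick only if it is one of the retrieved candidates."""
    pick = ask_model(term, candidates)
    if pick in candidates:
        return pick          # guaranteed to be a real, pre-approved concept
    return None              # anything else is rejected: no fabricated codes

print(level1_select("AAA", ["233958001", "67362008"]))
```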
Level 2: The Creative Query Expander
What if the correct term isn’t found in the initial search? This is where clinician shorthand gets tricky. Level 2 leverages the LLM’s linguistic creativity to solve this recall problem.
Instead of asking the LLM to choose a final answer, we ask it to brainstorm alternate phrasings. Given the term “heart failure with preserved EF,” the LLM might generate a list of synonyms clinicians actually use:
“HFpEF”
“diastolic heart failure”
“heart failure normal ejection fraction”
We then run exact searches on these LLM-generated strings. The LLM never outputs a code; it only provides better search terms. This dramatically increases the chances of finding the right concept, bridging the gap between clinical slang and formal ontology descriptions.
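The Level 2 flow can be sketched as follows. The lookup index and concept IDs are made up for illustration, and `expand` hard-codes the synonym list where a real system would ask the LLM to brainstorm; the key property is that codes come only from exact index hits, never from the model.

```python
# Level 2 sketch: the model proposes phrasings; only exact lookups yield codes.

LOOKUP = {  # toy exact-match index: description -> concept ID (illustrative)
    "heart failure with preserved ejection fraction": "446221000",
    "diastolic heart failure": "418304008",
}

def expand(term):
    """Stand-in for LLM brainstorming of clinician phrasings (hard-coded here)."""
    return [
        term,
        "heart failure with preserved ejection fraction",
        "diastolic heart failure",
    ]

def level2_map(term):
    for phrasing in expand(term):
        cid = LOOKUP.get(phrasing.lower())
        if cid:                      # the code always comes from the index,
            return cid               # never from the model's own output
    return None

print(level2_map("heart failure with preserved EF"))
```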
Level 3: The Autonomous Agent
This is OGAR in its most advanced form. Here, the LLM becomes an autonomous agent, equipped with the ontology tools and a clear goal: “Map this term to the correct SNOMED concept.” The agent then works through a reason-act loop:
Search for initial candidates.
Analyze the results. Are they good enough?
If unsure, Navigate the ontology’s hierarchy (check parent or child concepts) to refine its understanding.
Validate its final candidate against ontology rules.
Decide on a final answer or, crucially, choose to safely abstain if confidence is low.
Picture an expert researcher navigating a library’s card catalog, pulling related files, and cross-referencing until they find the exact document they need. The agentic approach is perfect for high-volume, automated workflows where you need both high accuracy and the intelligence to know when to stop.
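The agent loop, with its crucial option to abstain, can be sketched like this. The two-concept ontology, the crude word-overlap confidence score, and the 0.8 threshold are all illustrative choices, not the production agent's logic.

```python
# Level 3 sketch: a tiny reason-act loop with safe abstention (illustrative).

ONTO = {
    "67362008":  "Aortic aneurysm",
    "233958001": "Abdominal aortic aneurysm",
}

def score(term, label):
    """Crude confidence: fraction of the term's words found in the label."""
    tw = set(term.lower().split())
    lw = set(label.lower().split())
    return len(tw & lw) / max(len(tw), 1)

def agent_map(term, threshold=0.8):
    # 1. Search candidates, 2. analyze, 3. validate, 4. decide or abstain.
    best, best_score = None, 0.0
    for cid, label in ONTO.items():
        s = score(term, label)
        if s > best_score:
            best, best_score = cid, s
    if best_score >= threshold and best in ONTO:   # validate + confidence gate
        return best
    return None                                    # abstain rather than guess

print(agent_map("abdominal aortic aneurysm"))  # confident match
print(agent_map("cardiomyopathy"))             # abstains: nothing close enough
```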
Ultimately, OGAR is a flexible framework. You can choose the level of autonomy that fits your needs, but the guiding principle remains the same: by grounding the LLM’s powerful reasoning within the strict, verifiable boundaries of an ontology, you get the best of both worlds — intelligent understanding and certifiable precision.
The Crucible: Experimental Design and Results
A framework is only as good as the evidence that supports it. To prove OGAR’s effectiveness, we designed a rigorous, head-to-head comparison against the most common approaches to this problem. Our goal was to create a fair, reproducible test that would show, with hard numbers, what really works.
Part 1: The Setup
Before diving into the results, it’s crucial to understand the components of our experiment: the raw data, the technical implementation of our knowledge base, and the metrics we used to define success.
The Dataset: Real-World Clinical Messiness
We didn’t use clean, synthetic data. Our testbed was the Health Tags Dataset, a collection of approximately 1,150 real-world clinical terms graciously provided by Dr. Tisloh Danboyi (UK). This dataset is a true representation of the challenge, filled with the messy, abbreviated, and sometimes misspelled shorthand that clinicians actually use.
To ensure fair and objective scoring, we manually curated a “golden set” — a definitive answer key containing the correct SNOMED CT mapping for every single term in the dataset. Recognizing that clinical language can be ambiguous, our curation process involved multiple reviewers. A concept was only included in the golden set as a possible correct answer if at least two independent reviewers agreed on its validity. During this process, we always prioritized the most specific and precise SNOMED concept available that accurately captured the term’s meaning.
The Ontology: A Programmable Knowledge Graph
Treating a vast ontology like SNOMED CT as a static file would be slow and inefficient. Instead, we implemented it as a ProtoScript-based programmable ontology. This transformed SNOMED from a simple dictionary into a dynamic, in-memory knowledge graph that our tools and agents could interact with efficiently. This approach was critical, enabling the high-speed semantic navigation and validation required for the Level 3 agent.
The Scorecard: Defining Success
We graded every prediction against our golden set using two distinct metrics designed to measure both perfect precision and general recall.
Strict Accuracy (Exact Set Match): This is the gold standard for clinical-grade precision. A prediction was scored as correct only if the set of predicted SNOMED CT concept IDs was identical to the set of correct IDs in our golden set for that term. This is a measure of perfect precision and recall; any extra, missing, or incorrect predictions for a given term resulted in a score of zero.
Lenient Accuracy (Overlap Match): This more forgiving metric measures a method’s ability to retrieve any relevant information. A prediction was scored as correct if there was at least one overlapping concept ID between the predicted set and the golden set. This score does not penalize for extra, incorrect predictions and is invaluable for understanding a method’s raw recall — its ability to find a signal, even if it’s surrounded by noise.
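The two scoring rules reduce to simple set operations, sketched below over predicted versus golden concept-ID sets (the IDs are illustrative).

```python
# Sketch of the two scoring rules: exact set match vs. any overlap.

def strict_match(predicted, golden):
    """Exact set match: any extra, missing, or wrong ID scores zero."""
    return set(predicted) == set(golden)

def lenient_match(predicted, golden):
    """Overlap match: at least one predicted ID appears in the golden set."""
    return bool(set(predicted) & set(golden))

golden = {"233958001"}
print(strict_match(["233958001"], golden))                # exact set: correct
print(strict_match(["233958001", "99999999"], golden))    # extra ID: fails strict
print(lenient_match(["233958001", "99999999"], golden))   # overlap: passes lenient
```

This pair of rules is what later separates a "go wide" model (high lenient, near-zero strict) from a "go narrow" one.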
With the methodology established, we first set out to prove why a new approach was needed by testing the baselines.
Part 2: Establishing the Baseline: Why Common Approaches Fail
Before demonstrating OGAR, we had to prove why a new approach was necessary. We established three baselines representing the most common strategies: a simple lexical search, unguided LLMs, and a brute-force large context prompt. All three failed, but they failed in uniquely instructive ways.
Baseline 1: The Brittle 50/50 Shot of Lexical Search
First, we established a non-AI baseline to represent the classical approach used before advanced retrieval became common. The method is intentionally simple and deterministic: for each clinical term, we performed a direct search across SNOMED’s official descriptions (Preferred Term, Fully Specified Name, and Synonyms). The very first match found was immediately accepted as the correct one.
This approach is extremely fast and has no external dependencies, but its limitations are severe:
It is highly sensitive to even minor spelling differences, abbreviations, and morphological changes.
It completely misses synonyms or clinical expressions not explicitly listed in SNOMED.
It has no capacity to handle ambiguity or nuanced context.
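The baseline's first-match behavior is easy to sketch. The description list and IDs below are illustrative stand-ins for SNOMED's Preferred Terms, Fully Specified Names, and Synonyms; the failure mode to notice is that anything short of a verbatim hit simply misses.

```python
# Sketch of the first-match lexical baseline: exact string equality only.

DESCRIPTIONS = [  # (description text, concept ID) -- illustrative data
    ("abdominal aortic aneurysm", "233958001"),
    ("aaa - abdominal aortic aneurysm", "233958001"),
    ("hypertension", "38341003"),
]

def lexical_first_match(term):
    t = term.strip().lower()
    for text, cid in DESCRIPTIONS:
        if text == t:            # the very first exact match wins
            return cid
    return None                  # abbreviations and misspellings simply miss

print(lexical_first_match("Hypertension"))   # hit: verbatim description
print(lexical_first_match("HTN"))            # miss: abbreviation not listed
```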
Results:
Correct: 620
Incorrect: 530
Exact Match Accuracy: 53.91%
What this tells us:
This simple method is essentially a coin flip. It successfully captures about half of the dataset, but only the “easy” cases — standardized phrases that appear verbatim in SNOMED’s official descriptions. The 530 misses are a clear map of real-world clinical language: abbreviations, spelling variants, and paraphrases that clinicians use every day but that formal ontologies don’t list.
This gives us a strong, zero-cost baseline. It defines the wall that any intelligent system must overcome: the roughly 50% of clinical language that requires more than a simple lookup. It sets the stage for OGAR, which is designed to recover these 530 errors without resorting to handcrafted rules.
Baseline 2: The Catastrophic Failure of Unguided LLMs
Next, we tested the “memory-only” hypothesis that a sufficiently advanced LLM could simply recall the correct SNOMED ID. We prompted four leading models with a simple, direct request for each of the 1,150 terms:
“Provide the SNOMED CT concept ID and preferred term for the condition [term].”
There were no tools, no retrieval, and no ontology provided in the context. The result was a total system failure.
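One reason this failure mode is so dangerous is that, without grounding, fabricated IDs can only be caught after the fact by checking model output against the ontology. A minimal sketch of that audit (the valid-ID set and the raw outputs are made up for illustration):

```python
# Sketch: auditing unguided model output for fabricated concept IDs.

VALID_IDS = {"233958001", "38341003"}   # illustrative ontology ID set

def audit(model_outputs):
    """Count how many returned IDs do not exist in the ontology at all."""
    fabricated = [cid for cid in model_outputs if cid not in VALID_IDS]
    return len(fabricated)

# Hypothetical raw outputs from an unguided model run:
outputs = ["233958001", "12345678", "98765432"]
print(audit(outputs))  # 2 of the 3 IDs are fabrications
```

OGAR inverts this: instead of auditing fabrications downstream, it makes them impossible upstream.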
Per-Model Results
The data shows a consistent and profound inability to perform this task.
The aggregate performance across all 4,600 predictions was just 3.80% exact accuracy, with a median of 3.26%.
What this tells us:
ID Recall is Unusable and Hallucinations are Rampant. Even the best-performing model, GPT-5, achieved a jaw-droppingly low 8.43% accuracy. More critically, the models frequently produced non-existent, fabricated IDs and non-standard descriptions. This isn’t just a failure of accuracy; it’s a failure of safety.
The Grok-4 Paradox: Semantic Understanding without Precision. The most fascinating result came from Grok-4. While scoring a near-zero 1.22% on exact matches, it achieved a remarkable 44.00% on the looser score.
Bigger Models Don’t Fix a Structural Flaw. Scaling from a “mini” model to a large one did not close the gap. The failure is architectural, not a matter of scale. Without a bounded answer space or a versioned reference to check against, the model is simply guessing.
The Grok-4 Paradox: A Strategy of “Going Wide” vs. “Going Narrow”
Grok-4’s near-zero 1.22% on Strict Accuracy alongside a remarkable 44.00% on Lenient Accuracy isn’t an anomaly; it reflects a fundamental difference in model strategy.
Grok’s approach appears to be to “go wide” — it returns a broader set of potential answers for a given query. In contrast, models like GPT tend to “go narrow,” often providing only a single, highly confident guess.
This strategic choice directly explains the score discrepancy. By going wide, Grok is far more likely to include at least one correct concept in its output, dramatically boosting its Lenient Accuracy. However, that same output is often cluttered with hallucinated answers, guaranteeing failure on our Strict Accuracy metric, which requires a perfect set with no extras.
This is the smoking gun: it proves the model understands the semantic context of clinical language but is structurally incapable of pinpointing the precise, canonical identifier. It can find the right neighborhood, but it can’t find the right address. Its strategy ensures the correct answer is often somewhere in the pile, but it’s buried in noise.
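The two scoring rules behind this paradox can be made concrete. The sketch below assumes the formulation described in this section: "strict" requires the predicted set to equal the gold set exactly, while "lenient" gives credit if any correct concept appears anywhere in the output.

```python
# Sketch of the two scoring rules described above (assumed formulation:
# "strict" = predicted set exactly equals the gold set; "lenient" =
# at least one predicted ID appears in the gold set).

def strict_match(predicted: set, gold: set) -> bool:
    """Exact set equality: no misses and no hallucinated extras."""
    return predicted == gold

def lenient_match(predicted: set, gold: set) -> bool:
    """Credit if any correct concept appears anywhere in the output."""
    return len(predicted & gold) > 0

# A "go wide" answer buries the right ID in noise: lenient passes, strict fails.
wide = {"44054006", "11111111", "22222222"}   # one real hit plus fabrications
gold = {"44054006"}

print(strict_match(wide, gold))   # False: extras count against the model
print(lenient_match(wide, gold))  # True: the correct concept is in the pile
```

This is exactly why a "go wide" strategy inflates lenient accuracy while guaranteeing strict failure.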
Conclusion on this Baseline:
The takeaway is unequivocal: a production-grade clinical mapper cannot rely on an LLM’s memory. Pure, unguided generation yields clinically unacceptable accuracy and rampant identifier fabrication. This baseline proves that to have any chance of success, we must introduce an external source of truth and fundamentally constrain the model’s decision-making process.
Baseline 3: The $18,000 Fallacy of Brute-Force Context
Debunking the Myth that “More Context” Is Always Better
Having seen that LLMs fail without grounding, we analyzed a control experiment that tested the most common brute-force solution: what if you just give the model all the possible answers in the prompt?
Before adopting OGAR’s targeted retrieval, our UK partner, Dr. Tisloh Danboyi, ran this exact test. The idea was to try the most direct thing imaginable: for each query, send a single clinical term along with the entire SNOMED Disorders file and ask the model to pick the right one. No tools, no clever retrieval — just a massive data dump.
How It Was Set Up
For each clinical term, the prompt bundled:
The term itself (e.g., “AAA”).
The full Disorders CSV file, containing approximately 143,025 distinct entries.
The model was explicitly instructed to choose the correct concept and label from the provided in-prompt file. Preliminary runs were so token-heavy (consuming about 850,000 input tokens per term) that projecting the cost for our full dataset, even with a low-cost model like GPT-4o-mini, implied a bill of ≈$18,235 in input fees alone, before any validation or reruns.
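The cost projection is simple arithmetic: every query re-sends the entire ontology as input tokens. The term and token counts below come from the experiment; the per-million-token price is deliberately left as a parameter, since actual rates vary by model and change over time.

```python
# Rough cost model for the "send the whole ontology every time" setup.
# NUM_TERMS and TOKENS_PER_TERM are the figures reported above; the
# price per million input tokens is a caller-supplied placeholder.

NUM_TERMS = 1_150            # full Health Tags dataset
TOKENS_PER_TERM = 850_000    # observed in preliminary runs

def input_cost(usd_per_million_tokens: float) -> float:
    total_tokens = NUM_TERMS * TOKENS_PER_TERM
    return total_tokens / 1e6 * usd_per_million_tokens

total_tokens_m = NUM_TERMS * TOKENS_PER_TERM / 1e6
print(f"{total_tokens_m:,.1f}M input tokens total")  # 977.5M
# Almost all of these tokens are the same static CSV, re-sent per query.
```

Nearly a billion input tokens for a single pass over the dataset, and almost none of them carry new information.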
What Happened: A Lesson in Futility
The experiment wasn’t just economically insane; it was a technical failure.
Attention Overload and Instability. Drowning the model in a 143,000-entry list resulted in classic long-context saturation. Its selections were inconsistent, with identical inputs sometimes yielding different outputs across runs. The model couldn’t reliably maintain its focus across the massive, undifferentiated context.
It Still Hallucinated. This was the most critical finding. Despite being shown the official list of correct answers, the model still occasionally returned non-existent or non-standard IDs and labels. This is a powerful reminder that simply exposing a model to the truth does not enforce a constraint to use it.
Poor Economics. The vast majority of the token cost was for static, repeated ontology text, not for active reasoning. As the context ballooned, accuracy on nuanced terms actually fell while the cost rose steeply, making the entire approach impractical at scale.
Takeaway: Context is Not a Substitute for Constraints
This experiment serves as a powerful cautionary tale. Stuffing the entire ontology into a prompt for every query is neither reliable nor economical. The model struggles to reason effectively over massive, noisy contexts and remains free to fabricate when not explicitly constrained.
This failure is precisely why we moved to OGAR. Instead of drowning the model in the entire ocean of the ontology, OGAR intelligently retrieves just the relevant, targeted slice. It then asks the model to reason within that small, bounded set and validates the outcome. This delivers superior accuracy, stability, and sane costs — proving that for precision tasks, a well-defined boundary is far more valuable than a massive context window.
Part 3: The OGAR Progression: From Simple Safeguards to Autonomous Success
Having established a clear picture of what doesn’t work — brittle lexical searches, hallucinatory unguided LLMs, and unstable brute-force prompts — we turned to the OGAR framework. Instead of a single method, we tested three progressively sophisticated implementations. Each level introduces more automation and autonomy, systematically addressing the failures of the baselines and demonstrating how combining LLM reasoning with ontological tools unlocks dramatic performance gains.
OGAR Level 1: The Safety Net & Smarter Adjudicator
We began with the most minimal implementation of OGAR possible. The goal here was not to achieve maximum accuracy, but to solve the most dangerous problem first — hallucinations — and to isolate the value of LLM reasoning on a fixed set of candidates.
What We Ran
The process was intentionally simple, using the exact same retrieval method as our first baseline.
Retrieve: For each term, we ran a basic substring search against the ontology’s descriptions (Preferred Term, FSN, Synonyms) to get a list of candidate concepts.
Reason: We then passed that bounded list to a small LLM and asked it to pick the best one, rather than just blindly taking the first result.
Hallucinations: None. By construction, the model can only choose from valid, ontology-provided candidates.
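The constrain-then-reason pattern can be sketched in a few lines. The tiny ontology and the `choose` stub below are illustrative stand-ins (in the real system an LLM does the choosing); the structural point is that the model returns an index into a bounded candidate list, never a free-form ID.

```python
# Minimal sketch of OGAR Level 1: substring retrieval produces a bounded
# candidate list, and the model may only answer with an index into it.
# The ontology and the choose() stub are illustrative stand-ins; in the
# real system an LLM does the choosing.

ONTOLOGY = {
    "44054006": "Diabetes mellitus type 2 (disorder)",
    "46635009": "Diabetes mellitus type 1 (disorder)",
}

def retrieve(term):
    """Case-insensitive substring search over descriptions."""
    t = term.lower()
    return [(cid, desc) for cid, desc in ONTOLOGY.items()
            if t in desc.lower()]

def map_term(term, choose):
    candidates = retrieve(term)
    if not candidates:
        return None                      # nothing to choose from: abstain
    idx = choose(term, candidates)       # model returns an index, never an ID
    return candidates[idx]               # hallucination is impossible by design

# Stand-in adjudicator: prefer the candidate mentioning "type 2".
result = map_term("diabetes", lambda t, cs: next(
    i for i, (_, d) in enumerate(cs) if "type 2" in d.lower()))
print(result)  # ('44054006', 'Diabetes mellitus type 2 (disorder)')
```

Whatever the chooser does, the output is always a real concept from the ontology or an abstention, never an invented code.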
Analysis
This result reveals a subtle but important finding. The value of OGAR Level 1 is twofold and significant:
It Provides a Perfect Safety Net. First and foremost, the “multiple-choice” constraint completely solves the hallucination problem from Baseline 2. This single design choice makes the system fundamentally trustworthy from the start.
An LLM is a Smarter Adjudicator. Compared to the naïve non-AI baseline (53.91%), we see a modest but real gain of +1.57 points in exact accuracy, finding 18 more correct matches. Where did these come from? The baseline’s “first hit wins” rule is brittle. If a search returned a list like [Incorrect Match, Correct Match], the baseline failed. OGAR Level 1, however, gave this entire list to the LLM. The LLM used its superior reasoning to look past the first noisy result and correctly select the right answer, rescuing it from the candidate list.
This improved adjudication is also reflected in the looser score, which saw a +4.69 point gain over the baseline. This confirms that even when the exact match isn’t available, the LLM is better at identifying the most semantically relevant option in the candidate pool. In short, while constraints solve for safety, the LLM’s reasoning delivers a tangible, measurable improvement in precision, even on a poor candidate list.
Why We Kept It Minimal
It’s important to understand the advanced retrieval techniques we deliberately avoided at this stage, such as BM25/TF-IDF scoring, fuzzy matching, abbreviation dictionaries, or biomedical synonym embeddings (e.g., SapBERT).
The goal was to isolate the core OGAR effect: constrain then reason. This level proves that while constraints alone solve for safety, you cannot expect an LLM to fix a fundamentally flawed candidate list. This powerfully sets the stage for Level 2, where we address the core problem: improving candidate recall.
OGAR Level 2: The Creative Query Expander — Solving the Recall Problem
Level 1 proved that an LLM can’t fix a poor candidate list; you can’t reason your way out of a “garbage in, garbage out” problem. The next logical step, therefore, was to use the LLM to create a better list.
Level 2 addresses the critical recall limitations of simple search by leveraging the LLM’s greatest strength: its linguistic creativity. Instead of just selecting from candidates, the LLM first acts as a query expander, brainstorming alternative clinical phrases, synonyms, and abbreviations to enrich the search. For example, given the term:
“heart failure with preserved EF,”
The LLM might generate variants that a clinician would use, but a simple search would miss:
“HFpEF,” “diastolic heart failure,” or “chronic heart failure with preserved ejection fraction.”
Each of these LLM-generated variants was then used to perform a new, exact search against the ontology, dramatically increasing the chances of finding the correct concept.
What We Ran
The process remained simple and safe, but added a crucial creative step:
Run an exact ontology search on the original term.
If no hit, ask the LLM to generate up to five alternate strings (synonyms, abbreviations, paraphrases).
Run new exact searches on each of those strings.
Finally, ask an LLM to choose the best concept from the combined list of candidates.
Critically, the LLM never outputs IDs, only search strings. Every potential answer remains fully grounded in the ontology.
Results
The impact on performance was immediate and substantial.
This level represents the first major breakthrough in performance. The gains over both the baseline and Level 1 are significant:
vs. Baseline 1 (Naïve Non-AI): We saw a +10.00 point lift in exact accuracy, finding 115 more correct concepts. The looser score skyrocketed by +27.39 points.
vs. OGAR Level 1: Because Level 2 fixed the underlying recall problem, it also jumped +10.00 points in exact accuracy over Level 1, along with a +22.70 point gain in the looser score.
This result highlights a core OGAR insight: while LLMs fail to reliably memorize concept IDs, they excel at generating the strings clinicians actually use. When given a few chances, the model successfully produced realistic surface forms like “HFpEF,” which then mapped cleanly via exact lookup.
The remaining gap between the looser and exact scores (904 − 735 = 169 cases) is the final piece of the puzzle. It tells us that in ~15% of cases, the LLM-expanded search found the right neighborhood but not the precise concept. This sets the stage perfectly for our final level, where an autonomous agent is designed to navigate that neighborhood and pinpoint the exact match.
OGAR Level 3: The Autonomous Agent — From ‘Nearby’ to ‘Exact’
Level 2 was a breakthrough. By using an LLM to expand our search queries, we solved the fundamental recall problem, jumping from ~54% to ~79% on the looser accuracy score. However, it left us with 169 cases — nearly 15% of our dataset — that were in the right neighborhood but not the precise location.
To solve this “last mile” problem of precision, we unleashed OGAR’s most advanced implementation: a fully autonomous agent.
How the Agent Works
In this final level, the LLM is no longer a simple chooser or query expander; it becomes the pilot. Using our Buffaly framework, the LLM is equipped with a suite of ontology tools and a single goal: “Map the clinical term to the correct SNOMED concept.” It then enters an iterative reason-act loop:
Search: It begins by performing searches to retrieve an initial set of candidates.
Evaluate: It analyzes the results. Are they good enough? Is there ambiguity?
Navigate & Refine: If uncertain, it uses its hierarchical navigation tool to explore the ontology, examining parent, child, or sibling concepts to better understand the context and subtle differences between candidates.
Validate & Decide: Once satisfied, it validates its choice against the ontology’s rules and either locks in a final answer or, crucially, chooses to safely abstain if confidence remains low.
To ensure this was both efficient and cost-effective, we used a small model (“x-ai small”) and several practical optimizations, like starting with quick exact-match checks and minimizing redundant calls.
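The agent's control flow can be sketched as a small loop. The tool functions below are hypothetical stand-ins for Buffaly's ontology tools, not its actual API; the point is the shape of the loop: retrieve, evaluate, navigate when ambiguous, validate, and abstain if confidence never clears the bar.

```python
# Skeleton of the reason-act loop described above. The tools object is a
# hypothetical stand-in for Buffaly's ontology tools.

def run_agent(term, tools, max_steps=5, threshold=0.9):
    candidates = tools.search(term)          # quick exact/lexical pass first
    for _ in range(max_steps):
        best, confidence = tools.evaluate(term, candidates)
        if confidence >= threshold:
            if tools.validate(best):         # check against ontology rules
                return best                  # lock in a grounded answer
            candidates = [c for c in candidates if c != best]
        else:
            # Explore parents/children/siblings to disambiguate candidates.
            candidates = candidates + tools.navigate(best)
    return None                              # abstain: confidence never cleared

# Demo with stubbed tools standing in for the real ontology interface.
class StubTools:
    def search(self, term): return ["Heart failure (disorder)"]
    def evaluate(self, term, cands): return cands[0], 0.95
    def validate(self, c): return True
    def navigate(self, c): return []

print(run_agent("CHF", StubTools()))  # Heart failure (disorder)
```

Because the loop can only return a validated candidate or `None`, safe abstention falls out of the design rather than being bolted on.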
Results
The agent’s ability to iteratively refine its understanding delivered the highest performance of the entire study.
This represents the peak of the OGAR framework’s performance, delivering substantial gains over every other method.
A Monumental Lift Over Baselines: Compared to the best-performing pure LLM baseline (GPT-5 at 8.43%), the agent achieved a 9.4x higher exact-match accuracy, delivering a massive +71.05 point absolute gain and eliminating 77.6% of the errors.
Solving the Precision Problem: The agent’s primary job was to convert the “near misses” from Level 2 into exact hits. It succeeded. The gap between the looser and exact scores shrank from 169 cases (14.7% of the dataset) in Level 2 down to just 115 cases (10.0%). This is concrete proof that the agent’s hierarchical navigation effectively resolved ambiguity and honed in on the precise concept.
The Power of the Framework, Not Just the Model: And here is the most remarkable part: the agent was powered by the “x-ai small” model, which is Grok-4-fast-nonreasoning. This is the exact same model that scored a near-zero 0.26% in our unguided baseline. This proves, unequivocally, that the model itself is not the source of success. The OGAR framework transformed one of the worst-performing models from a useless guesser into an ~80% accurate reasoning engine.
The agent’s iterative retrieve-constrain-validate loop delivered high accuracy, zero hallucinations, and production-friendly cost-efficiency. It achieved results that raw LLMs and brute-force prompts simply cannot reach.
Summary of Results: A Head-to-Head Comparison
This table summarizes the performance of all tested methods on the 1,150-term Health Tags dataset. The progression clearly shows how grounding and increased autonomy dramatically improve performance while eliminating critical failures like hallucinations.
This three-level journey illustrates a clear and powerful principle. By leveraging ontology-guided constraints, we harness an LLM’s strengths while mitigating its weaknesses, dramatically increasing reliability and accuracy with each step.
Level 1 (Selection): Solved the safety problem by eliminating hallucinations.
Level 2 (Expansion): Solved the recall problem by using the LLM’s linguistic creativity to find more relevant candidates.
Level 3 (Agentic): Solved the precision problem by using an autonomous agent to navigate and refine its choices, converting near misses into exact hits.
Analysis: Why Grounding, Not Scale, Is the Key to Reliability
The results are not just a scoreboard; they tell a fundamental story about how to build effective AI for high-stakes environments. The data points to one undeniable conclusion: the single biggest driver of accuracy and reliability wasn’t the size of the model, but the quality of its constraints. By grounding an LLM within the verifiable boundaries of an ontology, we transformed a brilliant-but-unreliable guesser into a high-performance reasoning engine.
Let’s break down exactly what this means.
The Architectural Flaw of Unguided AI
Our baseline tests revealed a hard truth: for precision tasks, unguided LLMs are built to fail. The catastrophic <9% accuracy wasn’t a fluke; it’s a feature of the architecture. LLMs are generative models optimized for probabilistic creativity, not deterministic fact-checking. Asking one to recall a specific SNOMED ID is a misuse of its core function.
They can’t reliably memorize identifiers. A code like 44054006 is a sparse, arbitrary token, not semantic language. This makes it incredibly difficult to recall with certainty.
They invent “imaginary codes.” An unguided LLM has no concept of “ground truth.” It doesn’t know if a code is valid, outdated, or a complete fabrication, so it generates plausible-sounding nonsense.
The Grok-4 Paradox is the smoking gun. Its dismal 1.22% exact accuracy versus a strong 44.00% looser accuracy proves the point: the model understands clinical context but is structurally incapable of pinpointing a precise, canonical identifier.
Stuffing the prompt with more context, as the $18,000 experiment showed, doesn’t fix this. It only drowns the model in irrelevant information, leading to attention saturation and making the precision problem even harder.
OGAR’s Impact: A Fundamentally Smarter Approach
OGAR didn’t just outperform the baselines — it demonstrated fundamentally smarter behavior by systematically addressing each of these failure points. Instead of asking the LLM to do what it’s bad at (recall IDs), it leverages what it’s good at (understanding language and making choices). The impact was dramatic:
Hallucinations Disappeared. The first and most critical result was safety. By forcing the LLM into the role of a “Selector” from a multiple-choice list (Level 1), we immediately constrained its output to valid, ontology-approved concepts. This single design choice eliminated the risk of fabricated codes.
Accuracy Soared. We achieved our ~80% accuracy through a powerful one-two punch. First, Level 2 acted as a “Creative Query Expander,” using the LLM’s linguistic talent to translate messy clinical shorthand (“HFpEF”) into formal terms the ontology could find. This solved the recall problem. Then, the Level 3 agent navigated the rich set of candidates, using the ontology’s hierarchy to resolve ambiguity and pinpoint the exact concept. This solved the precision problem.
Cost and Complexity Plummeted. Our best results came from a small, economical model. This is a critical finding. High performance wasn’t achieved by scaling to a bigger, more expensive LLM, but by providing smarter boundaries. OGAR proves that intelligent system design is a far more efficient path to accuracy than raw computational power.
It Knew Its Own Limits. Perhaps most impressively, the OGAR agent demonstrated the ability to intelligently abstain. Unlike a generative model that will always provide a best guess, the agent could determine when confidence was too low and stop — a crucial feature for building trustworthy AI in regulated fields like healthcare.
In the end, the path to reliable AI in specialized domains isn’t about building a bigger brain. It’s about giving that brain a map and a compass. OGAR’s success proves that for clinical term mapping — and likely many other grounded reasoning tasks — clearly defined boundaries produce far better results than sheer scale or computational strength alone.
Conclusion: A Reusable Pattern for Trustworthy AI
Our journey through 1,150 clinical terms delivered a clear verdict: for tasks that demand precision, the path to reliable AI is not paved with bigger models or vaster context windows, but with smarter boundaries. By transforming a powerful LLM from an unguided guesser into a grounded decider, the OGAR framework moved us from a catastrophic <9% accuracy to a robust ~80%, all while eliminating hallucinations and using a small, cost-effective model.
This is more than just a solution for SNOMED mapping; it’s a blueprint for a class of problems known as grounded symbolic mapping. These challenges share a common DNA:
A structured ontology or controlled vocabulary provides a definitive set of correct answers.
The input is unstructured, ambiguous human language (shorthand, slang, errors).
Accuracy is critical, and mistakes have significant downstream consequences.
OGAR provides a general, reusable design pattern for any of them. The core logic remains the same whether you are mapping:
Cybersecurity reports to official CVE vulnerability IDs.
Legal contract language to specific statutory references.
Scientific papers to standardized chemical identifiers (ChEBI) or gene ontologies.
E-commerce product descriptions to internal SKU taxonomies.
The Road Ahead: Building a More Reliable Ecosystem
This work is a significant step, but it also illuminates a clear path forward for building AI that is auditable, practical, and reliable. Our future efforts will focus on:
Portability and Rapid Adaptation: Demonstrating OGAR’s ability to be quickly adapted to other critical ontologies like ICD-10 (for billing), RxNorm (for medications), and CVE (for security), with a goal of creating automated ontology-to-adapter pipelines.
Real-Time and Interactive Tools: Deploying agentic OGAR in real-time applications, such as a clinical documentation assistant that suggests validated codes as a doctor types or a security tool that instantly maps threat intelligence to known vulnerabilities.
Dynamic Ontology Management: Building robust mechanisms to handle evolving standards. This is crucial for maintaining accuracy as ontologies change versions, retire old concepts, and introduce new ones.
Active Learning and Human-in-the-Loop Systems: Leveraging the agent’s ability to intelligently abstain. By routing uncertain mappings directly to human experts, we can create a powerful feedback loop where human validation continually improves the system’s performance.
Ultimately, OGAR represents a shift in philosophy — a move away from the endless pursuit of scale and toward a future built on precision and auditable reasoning. By giving our powerful models a map and a compass, we can finally begin to deploy them with the confidence and reliability that critical real-world applications demand.
Special thanks to Dr. Tisloh Danboyi for his invaluable collaboration and for providing the foundational dataset that made this research possible.
CMS’s 2026 Updates Signal a New Era for In-House Remote Care Coordination
Healthcare is on the brink of a fundamental shift. The forthcoming 2026 CMS Physician Fee Schedule updates are far more significant than mere billing adjustments; they signal a new era in remote care coordination. Practices that adapt early will not only enhance patient care but also secure long-term operational advantages.
While Remote Patient Monitoring (RPM) has rightly captured attention, RPM alone is only part of a larger, more transformative opportunity: comprehensive, integrated remote care coordination. The CMS 2026 updates represent an ideal moment for healthcare leaders to internalize these capabilities and build robust, patient-centered remote care strategies.
What's Changing with CMS in 2026?
The upcoming CMS changes (finalized November 2025, effective January 1, 2026) reshape the operational landscape for remote healthcare:
Flexible Monitoring Requirements: New CPT codes (e.g., 99XX4) now cover patient monitoring for as little as 2–15 days, eliminating previous barriers linked to the 16-day minimum requirement.
Shorter Interaction Thresholds: The introduction of CPT code 99XX5 (10–19 minutes of interaction) better aligns with real-world clinical workflows, supplementing the existing 20-minute engagement standards.
Permanent Virtual Supervision: Physicians can now permanently supervise clinical teams remotely via audio-video technology, making it possible to centralize staffing and streamline operations.
Revised Valuation Model: A transition to Outpatient Prospective Payment System (OPPS)-based valuations ensures reimbursements accurately reflect operational realities.
Together, these adjustments significantly improve feasibility and profitability for in-house remote care coordination.
Building Your Comprehensive Remote Care Ecosystem
To maximize these new guidelines, practices must think beyond RPM alone and adopt a broader vision of integrated remote care. Leaders should consider including:
Remote Therapeutic Monitoring (RTM): For therapy adherence, medication compliance, and behavioral health support.
Chronic Care Management (CCM): Providing continuous, proactive management of chronic conditions.
Transitional Care Management (TCM): Ensuring smooth post-hospitalization transitions to reduce readmission risks.
Advanced Primary Care Management (APCM): Facilitating proactive, comprehensive care tailored to value-based primary care frameworks.
Primary Care Management (PCM): Targeted management for patients with specific chronic conditions, enhancing quality and engagement.
By integrating these programs, practices will foster a more comprehensive remote care environment that improves patient satisfaction, clinical outcomes, and operational efficiencies.
Why Now Is the Ideal Moment for In-House Implementation
Historically, practices have hesitated to implement remote care internally due to complexity and resource constraints. However, modern platforms such as FairPath have dramatically simplified the operational landscape. Now practices can manage patient enrollment, interactions, documentation, and billing through a single intuitive system.
The advantages of bringing these programs in-house are compelling:
Enhanced Revenue Control: Minimize third-party expenses and retain more revenue within your practice.
Customized Patient Experiences: Tailor workflows specifically to your patient population, rather than relying on generic external solutions.
Operational Scalability: Leverage virtual supervision to streamline staffing, optimize overhead costs, and expand operational capacity.
Streamlined Compliance: Automated tracking and reporting facilitate effortless compliance with evolving CMS regulations.
Action Steps for Forward-Thinking Practices
To fully leverage CMS’s 2026 updates, healthcare leaders should begin preparations now:
Assess Patient Eligibility: Determine patient cohorts eligible for RPM, CCM, RTM, TCM, APCM, and PCM under the new CMS standards.
Consolidate Technology Platforms: Adopt unified software solutions that integrate all remote care services into one user-friendly environment.
Establish Robust Virtual Oversight: Develop and implement standardized virtual supervision protocols to ensure consistency and quality across your remote care programs.
Conduct Targeted Staff Training: Train clinical teams to effectively manage shorter, frequent patient interactions aligned with the new CPT codes.
Launch Early and Iterate Quickly: Start operations on January 1, 2026, and continuously refine workflows based on real-world feedback and performance data.
Guardrails to Avoid Common Pitfalls
As your practice transitions to comprehensive remote care coordination, stay vigilant about common pitfalls:
Avoid technology fragmentation by committing to unified, scalable platforms.
Clearly define roles within virtual supervision models to prevent ambiguity and compliance risks.
Ensure accurate and timely documentation—particularly critical with shorter interaction windows—to maintain compliance and optimize reimbursement.
Regularly audit workflows to identify inefficiencies or compliance vulnerabilities early.
Embrace the Opportunity of Comprehensive Remote Care
The CMS 2026 updates are more than a regulatory adjustment—they are a gateway to comprehensive, integrated remote care. RPM will continue to be a crucial component, but practices that expand their vision to integrate CCM, RTM, TCM, APCM, and PCM will reap the greatest rewards.
Healthcare leaders who act decisively now will set the standard for future patient care, positioning their organizations for long-term clinical, operational, and financial success.
Ready to explore how these CMS changes can transform your practice? Reach out now to schedule a complimentary strategy session and start building your integrated remote care solution.
Disclaimer: This article is informational only; specific rates and policies vary by MAC and payer plan.
Stop Choosing Between APCM and Your RPM/RTM Revenue
The $1.2 Million Mistake Most Practices Are Making Right Now
If your practice adopted APCM by shutting down RPM and RTM programs, you left money on the table. If you're running all three programs separately, you're burning cash on duplicate documentation and exposing yourself to compliance risk.
The correct answer isn't either-or; it's coordinated integration. Practices that get this right are generating $225-325 net margin per patient monthly while reducing administrative burden by up to 30%.
Here's how the economics actually work, and what separates winning practices from everyone else.
Why Practices Get This Wrong
CMS introduced APCM as a structural upgrade to care management, not a replacement for monitoring programs. Yet most practices treat it as one:
The Replacement Trap: Practices abandon profitable RPM and RTM programs, assuming APCM covers everything. It doesn't. You lose monitoring revenue and weaken care continuity.
The Silo Trap: Practices run all three programs independently, creating redundant workflows, conflicting documentation, and billing errors that invite audits.
Both approaches cost you money. The first sacrifices revenue. The second burns it on overhead.
The Integration Model: Three Programs, One System
Successful practices recognize that APCM, RPM, and RTM serve distinct clinical and financial functions:
APCM provides the overall care management structure—provider accountability, care planning, and transition management.
RPM and RTM deliver continuous patient data that drives specific interventions within that structure.
Integration means these programs share one care plan, one documentation system, and one accountability framework. You bill separately for each service, but you execute them as a unified operation.
What This Looks Like Operationally
Single Care Plan: RPM glucose readings or RTM therapy adherence data flow directly into the APCM care plan, triggering interventions automatically.
Unified Task Management: All outreach, education, and monitoring tasks appear on one centralized list—not scattered across three platforms.
Automated Documentation: Software captures activity in real time, meeting all program-specific billing requirements without duplicate data entry.
One Accountability System: Care navigators, nurses, and providers coordinate under a single supervisory framework rather than juggling separate program rules.
This eliminates the false trade-off between patient volume and compliance. Practices scale both simultaneously.
The Financial Case: Real Numbers from In-House Programs
Most practices running siloed programs capture $150-180 per patient monthly across RPM or basic care management. They're leaving significant reimbursement unclaimed.
Integrated in-house APCM + RPM + RTM programs using modern automation generate $250-350 per patient per month in combined reimbursement. Program costs run approximately $25 per patient monthly ($10 software, $15 per device rental).
Net margin per patient: $225-325 per month, depending on complexity and time documented.
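The per-patient arithmetic is straightforward; spelled out below with the article's estimated figures (which are estimates, not guaranteed reimbursement rates):

```python
# Per-patient margin math from the figures above. Reimbursement and cost
# values are the article's estimates, not guaranteed rates.

MONTHLY_COST_PER_PATIENT = 25          # ~$10 software + ~$15 device rental

def net_margin(monthly_reimbursement: float) -> float:
    return monthly_reimbursement - MONTHLY_COST_PER_PATIENT

print(net_margin(250))  # low end of the $250-350 combined range -> 225
print(net_margin(350))  # high end -> 325
```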
Staffing efficiency compounds these gains. In siloed programs, a three-person care team (two RNs, one MA) manages 600-700 patients due to documentation overhead and system friction. Integrated systems with automation enable the same team to handle 900-1,000 patients while maintaining compliance.
For the integrated team managing 1,000 patients, an estimated 55% margin after all costs works out to $151,525 in monthly profit.
That's a $106,725 monthly difference, or $1.28 million annually, with identical headcount.
These figures reflect actual CMS reimbursement rates and reported results from practices running integrated programs in 2024-2025. The difference comes from eliminating waste and capturing all available compliant reimbursement—not from aggressive billing.
Clinical Scenarios Where Integration Drives Value
Chronic Disease Management (RPM + APCM)
A diabetes patient's RPM glucose monitor flags elevated readings. In an integrated system, those readings automatically update the APCM care plan and trigger an intervention protocol. One documentation event satisfies both programs' billing requirements.
Post-Surgical Rehabilitation (RTM + APCM)
A patient recovering from knee surgery stops engaging with home therapy exercises tracked through RTM. Integrated software alerts the APCM care team immediately, enabling intervention before outcomes deteriorate. Both programs bill compliantly from the same workflow.
Complex Post-Hospitalization Care (RPM + RTM + APCM)
A COPD patient discharged from the hospital needs breathing monitoring (RPM), therapy adherence tracking (RTM), and transition management (APCM). All three run from one system, preventing readmission while maximizing compliant reimbursement.
In each case, integration creates clinical value and financial value simultaneously—not by gaming the system, but by eliminating waste.
Compliance: The Four Non-Negotiables
Integration increases revenue only if you maintain clear program separation in documentation and billing:
Differentiate Services Clearly: Document what RPM, RTM, and APCM each provide. Never blur the lines.
Prevent Time Overlap: If you bill 20 minutes for APCM, that same 20 minutes cannot count toward RPM time requirements.
Document Care Transitions: APCM requires thorough transition documentation. Automate this wherever possible, but verify completeness.
Audit Monthly: Run internal reviews to catch billing errors before external audits do.
Automation handles most of this oversight, but governance remains essential. The practices that avoid trouble treat compliance as a system design problem, not a documentation problem.
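The time-overlap rule in particular lends itself to enforcement in software rather than policy. A minimal sketch of a per-patient minute ledger that refuses to count the same clinician time toward two programs (names are illustrative, not any vendor's actual API):

```python
from collections import defaultdict

class TimeLedger:
    """Allocates each block of clinician minutes to exactly one program."""

    def __init__(self):
        # (patient_id, program) -> total minutes allocated this month
        self.minutes = defaultdict(int)
        # patient_id -> list of (start, end, program) blocks already claimed
        self.blocks = defaultdict(list)

    def allocate(self, patient_id, start, end, program):
        """Record a time block, rejecting overlap with any prior block."""
        for s, e, _ in self.blocks[patient_id]:
            if start < e and s < end:  # intervals intersect
                raise ValueError(
                    f"minutes {start}-{end} overlap a block already billed; "
                    "the same time cannot count toward two programs")
        self.blocks[patient_id].append((start, end, program))
        self.minutes[(patient_id, program)] += end - start

ledger = TimeLedger()
ledger.allocate("pt-001", 0, 20, "APCM")   # 20 minutes billed to APCM
ledger.allocate("pt-001", 20, 35, "RPM")   # distinct 15 minutes: allowed
# ledger.allocate("pt-001", 10, 25, "RPM") would raise: overlapping minutes
```

Treating the ledger as the single source of truth for billed time is one way to make "prevent time overlap" a property of the system rather than a habit of the staff.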
Why Automation Is Non-Negotiable
Integration without automation is theory. Automation makes it operational reality.
Platforms like FairPath centralize patient records, automatically manage care tasks against billing criteria, and generate audit-ready documentation in real time. This isn't about convenience—it's about making integration financially viable.
Without automation, the administrative load of running three coordinated programs exceeds the efficiency gains. With automation, you unlock the full revenue potential while reducing overhead.
What to Do Next
If you're running APCM, RPM, or RTM in isolation, you're likely generating $150-180 per patient monthly—and leaving $100-170 per patient unclaimed. If you're avoiding APCM because you think it conflicts with existing programs, you're missing a seven-figure annual opportunity.
The strategic question isn't whether to integrate. It's how quickly you can operationalize integration with the right automation and cost structure to capture $225-325 net margin per patient.
Do This Next:
Audit your current APCM, RPM, and RTM programs separately—identify overlap, gaps, and billing inefficiencies
Calculate your current per-patient monthly net margin across all programs (including software and device costs)
Model the revenue impact of full integration using the $225-325 net margin per patient benchmark and your current census
Evaluate whether your current software can support unified workflows or if you need a purpose-built platform
Schedule a 45-minute APCM Integration Review to map your specific opportunity and compliance requirements
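The modeling step in the checklist above takes only a few lines. A sketch using the article's benchmark range, with a hypothetical census and current margin plugged in (your actual figures will vary by payer mix and cost structure):

```python
def integration_upside(census, current_margin_per_patient,
                       benchmark_low=225, benchmark_high=325):
    """Monthly and annual upside of moving to the integrated benchmark."""
    current = census * current_margin_per_patient
    low, high = census * benchmark_low, census * benchmark_high
    return {
        "current_monthly": current,
        "integrated_monthly": (low, high),
        "annual_upside": ((low - current) * 12, (high - current) * 12),
    }

# Example: an 800-patient census currently netting $165/patient/month.
result = integration_upside(800, 165)
```

For this hypothetical practice the model projects $180,000-260,000 in monthly integrated margin against $132,000 today, which is the kind of seven-figure annual gap the article describes.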
Integration isn't the future of care management—it's the present. The only question is whether you'll capture the opportunity this year or watch competitors do it first.
Disclaimer: This article provides general information only. Specific reimbursement rules and eligibility vary by MAC, payer, and contract year. Consult with compliance and billing specialists before implementing new programs.