Secure, deterministic semantic data access — no hallucinations
SemDB is a semantic database developed by Intelligence Factory to address the limitations of existing retrieval and generation systems when working with legacy, fragmented, or semantically complex data. Built as part of our core technology stack in Orlando, Florida, SemDB is designed for environments where trust, auditability, and system integration are essential—especially in regulated industries like healthcare. Unlike vector-only or generative systems that suffer from hallucinations, semantic ambiguity, and limited reasoning capability, SemDB uses structured ontologies, hybrid embeddings, and local integration to create a data layer that can be queried deterministically and acted upon directly.
The Problem
Why Existing Systems Fail - Comparison
Capability
Typical RAG Systems
SemDB
Complex Query Handling
Often returns plausible but incorrect results due to weak semantic grounding
Returns precise, ontology-aligned results for domain-specific queries
Relational Understanding
Lacks structure; can't model policy rules or cross-record relationships
Understands formal relationships via graph and ontology logic
Execution Capability
Stops at retrieval; no ability to trigger actions or update data
Can populate fields, tag records, or trigger downstream routines
Security & Compliance
Cloud-based with potential for data leakage and compliance risk
Runs locally, encrypts data, no cloud dependency
Off-the-shelf retrieval and RAG (Retrieval-Augmented Generation) systems have several limitations that made them unsuitable for our use cases:
Inaccuracy in complex domains: Traditional RAG architectures depend on word embeddings to retrieve similar chunks of text. This leads to semantically plausible but incorrect results when queries are nuanced or domain-specific.
Lack of relational understanding: Vector databases retrieve data based on similarity, not structure. They can't represent relationships like "which billing codes apply under insurer X's policy" or "which records are incomplete across systems."
No execution layer: RAG systems stop at retrieval and augmented text generation. They cannot update a database, fill in missing fields, or act on the retrieved information.
Cloud dependency and security concerns: Many vendor solutions require data to leave the local environment, creating compliance risk and complicating integration with secure legacy systems.
We built SemDB in-house to resolve these constraints and support our need for structured, secure, ontology-aware data access that integrates seamlessly into existing systems.
Design and Implementation
SemDB is deployed as a local semantic data layer that unifies structured and unstructured data sources—from PDFs and call transcripts to SQL databases and EHRs. It uses the following architectural elements:
OGAR
Ontology-Guided Augmented Retrieval (OGAR): A structured ontology defines domain-specific concepts, properties, and relationships. OGAR interprets queries with contextual awareness, enabling precise filtering and disambiguation without relying on statistical generation.
Relevant, Structured Results
Hybrid Retrieval Engine: Combines vector embeddings for similarity scoring with symbolic graph traversal and ontology alignment. This allows SemDB to match intent with both contextual meaning and formal structure.
Deterministic, rules-based logic
ProtoScript Execution Layer: Semantic queries can trigger action routines via ProtoScript, a rule-based scripting language developed internally. This allows SemDB to populate missing fields, tag records for review, or integrate directly with CRM, billing, or workflow systems.
Semantic Extraction Pipeline: For unstructured data, such as audio transcripts or documents, SemDB extracts entities and relations and stores them as structured JSON aligned with the ontology. These can be used immediately to update other systems.
Deployment Model: SemDB is designed to run locally or in private infrastructure, with no data egress. It integrates via standard REST APIs and supports formats like JSON-LD for structured export.
Example Applications
Healthcare Claims Workflows: At Medek Health Systems, SemDB enables retrieval of insurer-specific policies, billing codes, and denial reasons from legacy EHRs and call transcripts. It has supported the processing of ~1 million claims with a 90% success rate and recovered over $250K in otherwise unrecoverable cash flow.
CRM Data Hygiene: SemDB extracts contact information from unstructured call transcripts and automatically updates missing fields in CRM records, improving data quality and reducing manual entry by over 70% in some deployments.
Supply Chain Audits: In manufacturing contexts, SemDB maps shipment delays, supplier performance, and inventory status across multiple internal databases. The semantic layer allows direct querying of state transitions and relational events over time.
Key Characteristics
Zero Hallucinations: All responses are directly retrieved from validated data. No generative inference is used, eliminating ambiguity and enabling auditability.
Structured and Unstructured Data Fusion: Supports both schema-first ingestion (from SQL and spreadsheets) and unstructured extraction (from PDFs, emails, and transcripts), harmonized through the ontology.
Real-Time Actionability: Extracted data can be transformed and dispatched immediately, enabling automation of downstream systems.
Scalability: In-memory graphs for development; persistent ontology-backed storage for production. Proven scale to tens of thousands of documents with sub-second retrieval.
Security and Compliance: Runs locally, encrypts data with AES-256, and meets HIPAA/GDPR requirements. No cloud dependency, no third-party inference.
Why We Built It
We needed a system that could:
Understand structured and unstructured legacy data
Retrieve precise answers under formal constraints
Act on that data deterministically
Remain fully auditable and secure
No existing tool provided this combination. SemDB is the result of that gap—a semantic system built for controlled environments, real-world data messiness, and execution-focused use cases.
Learn More
SemDB is now a core layer in our technology stack, enabling safe and explainable AI across medical billing, CRM automation, and internal data harmonization. For technical documentation or a demonstration, visit semdb.ai.