
ADR-0042 — Hypothesis Testing Engine (Phase 4C)

Date: 2026-04-09
Status: Accepted
Deciders: GNAT Platform Team


Context

Threat intelligence analysis is fundamentally a hypothesis-driven activity. An analyst observing a cluster of suspicious indicators might form the hypothesis “192.0.2.1 is a Lazarus Group command-and-control server” and then accumulate evidence for or against that claim over days or weeks.

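Prior to this ADR, GNAT had no structured mechanism for tracking hypotheses. Analysts recorded their assessments as free-text investigation notes, which meant that hypotheses were not machine-readable, their evidence was referenced by name rather than by STIX ID, and their confidence was not tracked as new evidence arrived.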

The ReasoningEngine (ADR-0044) needed a structured hypothesis type to feed into its scoring pipeline, and the HypothesisEngine itself needed a home in the GNAT architecture that was consistent with the existing STIX custom object pattern (ADR-0032).


Decision

STIXHypothesis Custom SDO

A new custom STIX Domain Object is defined in gnat/stix/sdos/hypothesis.py:

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

# STIXBase is GNAT's existing base class for STIX objects (its import is not shown here).


@dataclass
class STIXHypothesis(STIXBase):
    """
    x-gnat-hypothesis: STIX custom SDO for analyst hypotheses.

    Represents a structured claim about a threat actor, campaign, or observable
    that can be confirmed, refuted, or left inconclusive by accumulated evidence.
    """

    type: str = "x-gnat-hypothesis"
    schema_version: int = 1

    # Core fields
    statement: str = ""                   # Natural-language hypothesis text
    confidence: float = 0.2               # [0.0, 1.0]; updated by evaluate()
    status: str = "pending"               # pending | confirmed | refuted | inconclusive

    # Evidence arrays: STIX relationship IDs (relationship--<uuid>)
    supporting_evidence: list[str] = field(default_factory=list)
    refuting_evidence: list[str] = field(default_factory=list)

    # Provenance
    created_by: str = ""                  # initiated_by from the creating ExecutionContext
    workspace_id: str = ""
    created_at: datetime | None = None
    last_evaluated_at: datetime | None = None

STIXHypothesis is registered in gnat/stix/sdos/__init__.py alongside other custom SDOs (x-gnat-report-summary, x-gnat-enrichment-log).

Evidence is stored as STIX relationship IDs (strings matching the STIX relationship--<uuid> pattern) rather than as the IDs of the evidence objects themselves, so that the evidence relationship carries the semantic link (e.g. relationship_type: "supports" or relationship_type: "refutes").
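A minimal sketch of the ID check implied above, assuming evidence IDs use the canonical lowercase relationship--<uuid> form. is_relationship_id and validate_evidence are hypothetical helpers for illustration, not part of the GNAT codebase; rejecting anything else keeps direct object IDs (such as indicator IDs) out of the evidence arrays.

```python
import re
import uuid

# Hypothetical helper: assumes evidence IDs use the canonical lowercase
# "relationship--<uuid>" form produced by str(uuid.uuid4()).
_REL_ID = re.compile(
    r"relationship--[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_relationship_id(value: str) -> bool:
    """Return True if value is, in its entirety, a STIX relationship ID."""
    return _REL_ID.fullmatch(value) is not None


def validate_evidence(ids: list[str]) -> list[str]:
    """Return ids unchanged, or raise ValueError on the first non-relationship ID."""
    for value in ids:
        if not is_relationship_id(value):
            raise ValueError(f"Not a STIX relationship ID: {value!r}")
    return ids


if __name__ == "__main__":
    print(is_relationship_id(f"relationship--{uuid.uuid4()}"))  # True
    print(is_relationship_id(f"indicator--{uuid.uuid4()}"))     # False
```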

Status State Machine

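propose() ──► pending

pending ── evaluate(), confidence ≥ 0.75 ───────────────────────► confirmed
pending ── evaluate(), confidence ≤ 0.15 AND refuting_evidence ─► refuted
pending ── evaluate(), otherwise ───────────────────────────────► pending (unchanged)
pending ── close(verdict) ──────────────────────────────────────► confirmed | refuted | inconclusive

confirmed, refuted and inconclusive are terminal: close() raises HypothesisAlreadyClosedError for them, and closed hypotheses are not eligible for further evaluate() calls.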
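The transitions above can be encoded as a lookup table, which makes illegal moves easy to reject. This is a hypothetical sketch (TERMINAL, TRANSITIONS and check_transition are illustrative names, not GNAT code), assuming, as close() does, that confirmed, refuted and inconclusive are terminal.

```python
# Hypothetical encoding of the status state machine. Statuses with no
# entry in TRANSITIONS (the terminal ones) allow no further changes.
TERMINAL = frozenset({"confirmed", "refuted", "inconclusive"})

TRANSITIONS: dict[tuple[str, str], frozenset[str]] = {
    # evaluate() may leave a hypothesis pending or auto-classify it
    ("pending", "evaluate"): frozenset({"pending", "confirmed", "refuted"}),
    # close() applies the analyst's final verdict
    ("pending", "close"): frozenset({"confirmed", "refuted", "inconclusive"}),
}


def check_transition(current: str, operation: str, new: str) -> None:
    """Raise ValueError if operation may not move a hypothesis from current to new."""
    allowed = TRANSITIONS.get((current, operation), frozenset())
    if new not in allowed:
        raise ValueError(f"{operation}() cannot move {current!r} to {new!r}")
```

One consequence of the table: evaluate() can never produce inconclusive; only an analyst's close() can.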

HypothesisEngine

gnat/reasoning/hypothesis.py provides the lifecycle manager:

from gnat.core.trust import TRUST_WEIGHTS


class HypothesisEngine:
    """
    Manages the full lifecycle of STIXHypothesis objects:
    propose → evaluate → close.
    """

    def __init__(
        self,
        store: WorkspaceStore,
        search_index: SearchIndex,  # SolrSearchIndex or NullSearchIndex
        trust_weights: dict[str, float] | None = None,
    ) -> None:
        self._store = store
        self._search = search_index
        self._weights = trust_weights or TRUST_WEIGHTS  # from gnat.core.trust
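The search_index parameter accepts either SolrSearchIndex or NullSearchIndex. The sketch below assumes an interface inferred from the single call in evaluate(), query(text, fields=[...]) returning a list of hit dicts; the Protocol, the NullSearchIndex body and the hits_for helper are illustrative, not GNAT's actual classes.

```python
from typing import Any, Protocol


class SearchIndex(Protocol):
    """Assumed interface, inferred from the query() call in evaluate()."""

    def query(self, text: str, fields: list[str]) -> list[dict[str, Any]]:
        ...


class NullSearchIndex:
    """Illustrative no-op index: every query returns no hits."""

    def query(self, text: str, fields: list[str]) -> list[dict[str, Any]]:
        return []


def hits_for(index: SearchIndex, statement: str) -> list[dict[str, Any]]:
    """Run the same query evaluate() runs (hypothetical helper)."""
    return index.query(statement, fields=["name", "pattern", "description"])
```

With a null index every evaluate() call sees zero hits, so raw_corroboration is 0.0 and confidence is multiplied by 0.4 on each call (a 0.2 prior becomes 0.08).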

propose()

Creates and persists a new STIXHypothesis in the workspace:

def propose(
    self,
    statement: str,
    initial_evidence: list[str],
    ctx: ExecutionContext,
    confidence: float = 0.2,
) -> STIXHypothesis:
    """
    Parameters
    ----------
    statement : str
        Natural-language hypothesis text (e.g. "192.0.2.1 is Lazarus C2").
    initial_evidence : list[str]
        STIX relationship IDs linking the hypothesis to supporting objects.
    ctx : ExecutionContext
        Execution context; workspace_id and initiated_by are taken from here.
    confidence : float
        Initial confidence score in [0.0, 1.0].  Defaults to 0.2 (weak prior).

    Returns
    -------
    STIXHypothesis
        The persisted hypothesis object.
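    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence!r}")
    hyp = STIXHypothesis(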
        id=f"x-gnat-hypothesis--{uuid4()}",
        statement=statement,
        confidence=confidence,
        status="pending",
        supporting_evidence=list(initial_evidence),
        refuting_evidence=[],
        created_by=ctx.initiated_by,
        workspace_id=ctx.workspace_id,
        created_at=datetime.utcnow(),
    )
    self._store.upsert(hyp, ctx)
    return hyp
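A stripped-down sketch of what propose() guarantees, using a hypothetical HypothesisRecord in place of STIXHypothesis and omitting the store and ExecutionContext. The range check mirrors the documented [0.0, 1.0] contract, and list(initial_evidence) copies the caller's list so later mutations by the caller do not leak into the new object.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class HypothesisRecord:
    """Hypothetical stand-in for STIXHypothesis with only the fields used here."""

    id: str
    statement: str
    confidence: float = 0.2
    status: str = "pending"
    supporting_evidence: list[str] = field(default_factory=list)


def propose_record(
    statement: str, initial_evidence: list[str], confidence: float = 0.2
) -> HypothesisRecord:
    """Build a new pending hypothesis (hypothetical helper, no persistence)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence!r}")
    return HypothesisRecord(
        id=f"x-gnat-hypothesis--{uuid.uuid4()}",
        statement=statement,
        confidence=confidence,
        # Copy so later changes to the caller's list do not leak into the object.
        supporting_evidence=list(initial_evidence),
    )
```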

evaluate()

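Queries the search index (Solr) for objects matching the hypothesis statement, weights each hit by the trust level of its source, and updates confidence and, if a threshold is crossed, status: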

def evaluate(
    self,
    hypothesis_id: str,
    ctx: ExecutionContext,
) -> STIXHypothesis:
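    """
    Re-scores a hypothesis by querying the search index (Solr) for objects
    matching its statement, then updates its confidence and (if a threshold
    is crossed) its status. Closed hypotheses are rejected, as in close().
    """
    hyp = self._store.get(hypothesis_id, STIXHypothesis)
    if hyp.status in ("confirmed", "refuted", "inconclusive"):
        raise HypothesisAlreadyClosedError(hypothesis_id)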

    # 1. Solr full-text query on the hypothesis statement
    hits = self._search.query(hyp.statement, fields=["name", "pattern", "description"])

    # 2. Weight each hit by the trust level of its source connector
    weighted_sum = 0.0
    for hit in hits:
        trust = hit.get("source_trust_level", "semi_trusted")
        weighted_sum += self._weights.get(trust, 0.6)

    # 3. Mean trust weight of the hits, capped at 1.0 (0.0 when there are no hits)
    raw_corroboration = min(weighted_sum / max(len(hits), 1), 1.0)

    # 4. Blend with the existing confidence: a fixed-weight linear blend
    #    (Bayesian-inspired, not a formal Bayesian update)
    new_confidence = 0.4 * hyp.confidence + 0.6 * raw_corroboration
    new_confidence = round(max(0.0, min(1.0, new_confidence)), 4)

    # 5. Auto-classify
    new_status = hyp.status
    if new_confidence >= 0.75:
        new_status = "confirmed"
    elif new_confidence <= 0.15 and hyp.refuting_evidence:
        new_status = "refuted"

    hyp.confidence = new_confidence
    hyp.status = new_status
    hyp.last_evaluated_at = datetime.utcnow()
    self._store.upsert(hyp, ctx)
    return hyp
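Steps 2 to 4 can be pulled out into a pure function to make the arithmetic easy to check. rescore is a hypothetical name, and the weights are the defaults from the trust-level table that follows; the logic is otherwise copied from evaluate().

```python
# Default weights per source trust level, as listed in this ADR.
TRUST_WEIGHTS = {
    "trusted_internal": 0.9,
    "semi_trusted": 0.6,
    "untrusted_external": 0.3,
}


def rescore(prior: float, hits: list[dict], weights: dict[str, float] = TRUST_WEIGHTS) -> float:
    """Steps 2-4 of evaluate() as a pure function (hypothetical helper)."""
    weighted_sum = 0.0
    for hit in hits:
        # Missing trust levels default to semi_trusted; unknown ones weigh 0.6.
        trust = hit.get("source_trust_level", "semi_trusted")
        weighted_sum += weights.get(trust, 0.6)
    raw = min(weighted_sum / max(len(hits), 1), 1.0)  # mean trust weight of the hits
    new = 0.4 * prior + 0.6 * raw
    return round(max(0.0, min(1.0, new)), 4)


hits = [
    {"source_trust_level": "trusted_internal"},    # 0.9
    {"source_trust_level": "untrusted_external"},  # 0.3
    {},                                            # missing: semi_trusted, 0.6
]
print(rescore(0.2, hits))  # 0.4*0.2 + 0.6*0.6 = 0.44
print(rescore(0.2, []))    # no hits: 0.4*0.2 = 0.08
```

Because raw_corroboration is the mean trust weight of the hits, the number of hits does not affect it: ten semi_trusted hits score the same as one.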

Confidence scoring weights by trust level:

Source trust level    Weight used in corroboration
trusted_internal      0.9
semi_trusted          0.6
untrusted_external    0.3

Auto-classification thresholds:

Condition                                            New status
confidence ≥ 0.75                                    confirmed
confidence ≤ 0.15 AND refuting_evidence non-empty    refuted
Neither threshold met                                Unchanged (remains pending)
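Because each evaluate() call computes c' = 0.4c + 0.6r, repeated calls with the same mean trust weight r converge to c = r (the fixed point of c = 0.4c + 0.6r). The hypothetical simulate helper below replays that blend together with the thresholds above, assuming the search results do not change between calls.

```python
def classify(confidence: float, has_refuting: bool, current: str = "pending") -> str:
    """Step 5 of evaluate(): apply the auto-classification thresholds."""
    if confidence >= 0.75:
        return "confirmed"
    if confidence <= 0.15 and has_refuting:
        return "refuted"
    return current


def simulate(
    prior: float, mean_trust: float, has_refuting: bool, max_steps: int = 50
) -> tuple[str, float, int]:
    """Repeat evaluate()'s blend with a fixed mean trust weight (hypothetical helper).

    Returns (status, confidence, step) at the first status change, or the
    final values after max_steps if the hypothesis stays pending.
    """
    confidence = prior
    for step in range(1, max_steps + 1):
        confidence = round(0.4 * confidence + 0.6 * mean_trust, 4)
        status = classify(confidence, has_refuting)
        if status != "pending":
            return status, confidence, step
    return "pending", confidence, max_steps
```

With the default weights this has two consequences. Hits that are all semi_trusted (r = 0.6) never reach 0.75, so auto-confirmation needs a mean weight of at least 0.75 (all trusted_internal hits confirm a 0.2 prior on the second call: 0.62, then 0.788). And since every hit weighs at least 0.3, any call that returns hits leaves confidence at or above 0.6 × 0.3 = 0.18, so auto-refutation only happens after calls that return no hits.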

close()

Locks the hypothesis with a final analyst verdict:

def close(
    self,
    hypothesis_id: str,
    verdict: str,  # "confirmed" | "refuted" | "inconclusive"
    ctx: ExecutionContext,
) -> STIXHypothesis:
    """
    Closes a hypothesis with a final analyst-provided verdict.
    Closed hypotheses are not eligible for further evaluate() calls.
    """
    if verdict not in ("confirmed", "refuted", "inconclusive"):
        raise ValueError(f"Invalid verdict: {verdict!r}")
    hyp = self._store.get(hypothesis_id, STIXHypothesis)
    if hyp.status in ("confirmed", "refuted", "inconclusive"):
        raise HypothesisAlreadyClosedError(hypothesis_id)
    hyp.status = verdict
    hyp.last_evaluated_at = datetime.utcnow()
    self._store.upsert(hyp, ctx)
    return hyp

Evidence Linkage via STIX Relationships

When an analyst (or an automated pipeline) identifies a STIX object that supports or refutes a hypothesis, a STIX relationship is created linking the two objects and the relationship ID is appended to the appropriate evidence list:

# Analyst adds supporting evidence
rel = STIXRelationship(
    relationship_type="supports",
    source_ref=suspicious_ip.id,
    target_ref=hyp.id,
)
workspace.upsert(rel, ctx)
hyp.supporting_evidence.append(rel.id)
workspace.upsert(hyp, ctx)          # persist the updated evidence list; evaluate() reloads from the store
hyp = engine.evaluate(hyp.id, ctx)  # re-score and keep the returned, updated object
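The routing rule (supports goes to supporting_evidence, refutes goes to refuting_evidence) can be sketched without the GNAT classes. Hyp and link_evidence are hypothetical; the relationship is a plain dict rather than STIXRelationship and omits properties a valid STIX 2.1 relationship also requires (spec_version, created, modified). Persisting the relationship and the hypothesis is left out.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Hyp:
    """Hypothetical stand-in holding only the hypothesis ID and evidence lists."""

    id: str
    supporting_evidence: list[str] = field(default_factory=list)
    refuting_evidence: list[str] = field(default_factory=list)


def link_evidence(hyp: Hyp, source_ref: str, relationship_type: str) -> dict:
    """Build a relationship dict and record its ID in the matching evidence list."""
    if relationship_type == "supports":
        target_list = hyp.supporting_evidence
    elif relationship_type == "refutes":
        target_list = hyp.refuting_evidence
    else:
        raise ValueError(f"Unsupported relationship_type: {relationship_type!r}")
    rel = {
        "type": "relationship",
        "id": f"relationship--{uuid.uuid4()}",
        "relationship_type": relationship_type,
        "source_ref": source_ref,  # the evidence object
        "target_ref": hyp.id,      # the hypothesis
    }
    target_list.append(rel["id"])
    return rel
```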

This approach means that every evidence link is itself a first-class STIX object: queryable, exportable as part of a STIX bundle, and auditable via the lineage tracker (ADR-0038).

Storage

STIXHypothesis is persisted via the existing WorkspaceStore.upsert() mechanism. No new database tables are required; the object is stored in the workspace_objects table like any other STIX object. The idempotency key (ADR-0041) ensures that repeated evaluate() calls on the same hypothesis update its existing row rather than creating duplicates.
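The ADR does not say how the idempotency key from ADR-0041 is derived. The in-memory stand-in below assumes it is the STIX ID, so upserting an updated object replaces its row. InMemoryWorkspaceStore is hypothetical; it deep-copies on write and on read, as a database would, so changes to a loaded object only take effect once it is upserted again.

```python
import copy


class InMemoryWorkspaceStore:
    """Hypothetical stand-in for WorkspaceStore: one row per STIX ID."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def upsert(self, obj: dict) -> None:
        # Keyed on the object ID, so re-saving an updated object replaces the
        # existing row instead of adding a second one.
        self._rows[obj["id"]] = copy.deepcopy(obj)

    def get(self, object_id: str) -> dict:
        # Return a copy: edits to it do not touch the stored row until upserted.
        return copy.deepcopy(self._rows[object_id])

    def __len__(self) -> int:
        return len(self._rows)
```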


Consequences

Positive

Negative / Trade-offs

Deferred


Alternatives Considered

Free-text analyst notes

Keeping hypotheses as free-text entries in investigation notes was the simplest option and required no new code. Rejected because:

  1. Notes are not machine-readable; SOAR and reporting systems cannot filter on status == "confirmed".
  2. Evidence linkage is lost; the note references the evidence by name but not by STIX ID, breaking the audit chain.
  3. Confidence is not tracked; analysts must manually re-assess every note when new evidence arrives.

External hypothesis management tools (e.g. Jupyter notebooks, Jira)

Using an external tool (Jira tickets, Jupyter analysis notebooks) to track hypotheses was considered. Rejected because it breaks GNAT’s single-data-model principle: all threat intelligence objects should be representable in STIX and stored in the workspace. An external tool would require a synchronisation bridge and would not benefit from Solr corroboration, lineage tracking, or the ReasoningEngine scoring pipeline.

Native STIX 2.1 opinion SDO

STIX 2.1 includes an opinion SDO that expresses an assessment about the correctness of STIX content. Using opinion directly was considered. Rejected because opinion records agreement or disagreement with existing objects (its object_refs) on a fixed enumerated scale (strongly-disagree to strongly-agree); its explanation field is a free-text justification rather than a testable statement, and it has no lifecycle status and no separation of supporting from refuting evidence. STIXHypothesis (x-gnat-hypothesis) extends the STIX custom object pattern consistently with ADR-0032 and can produce an opinion on close() as a derived output (deferred).


Licensed under the Apache License, Version 2.0