GNAT High-Level Architecture

GNAT (CTM Toolkit) is a production-ready Python library providing a unified client interface for security and threat-intelligence platforms. This document describes the overall system architecture and links to the individual Architecture Decision Records (ADRs) that capture the rationale behind each major design choice.

Visual diagrams:

Architectural Diagrams — system overview, connector architecture, AI agent layer, ingestion pipeline (PNG, generated with the diagrams library)

Workflow Diagrams — sequence and flow diagrams for ingestion, analysis, export, scheduling, and AI agent request flows (Mermaid, compatible with Grafly)

Layers at a Glance

┌─────────────────────────────────────────────────────────────────────┐
│                        CLI / Web Dashboard                          │
│          gnat/cli/  ·  gnat/serve/  ·  gnat/viz/tui.py              │
├─────────────────────────────────────────────────────────────────────┤
│              Dissemination  (gnat/dissemination/)                   │
│     ExportService · WebhookNotifier · TAXII 2.1 · REST Gateway      │
├────────────────────┬────────────────────┬───────────────────────────┤
│   Analysis Layer   │   Reporting Layer  │   Investigation Builder   │
│   gnat/analysis/   │   gnat/reporting/  │   gnat/investigations/    │
│ Confidence · TLP   │ Report lifecycle   │ 5-step evidence graph     │
│ Correlation        │ STIX SDO export    │ Cross-platform pipeline   │
│ Timeline · Graph   │ AI drafting assist │ Workspace materialisation │
├────────────────────┴────────────────────┴───────────────────────────┤
│                         GNATClient facade                           │
│                          gnat/client.py                             │
├────────────────────┬────────────────────┬───────────────────────────┤
│   Ingest Pipeline  │  AI Agent Layer    │  Research Library         │
│   gnat/ingest/     │  gnat/agents/      │  gnat/research/           │
├────────────────────┴────────────────────┴───────────────────────────┤
│                     STIX 2.1 ORM  (gnat/orm/)                       │
├──────────────────────────────────┬──────────────────────────────────┤
│  159 Platform Connectors         │  Export Pipeline                 │
│   gnat/connectors/               │  gnat/export/                    │
├──────────────────────────────────┴──────────────────────────────────┤
│          HTTP Client Layer  (gnat/clients/  ·  gnat/async_client/)  │
│            urllib3 (sync)  ·  httpx (async)                         │
├─────────────────────────────────────────────────────────────────────┤
│   Context & Workspace  │  Scheduling    │  Search Sidecar           │
│   gnat/context/        │  gnat/schedule/│  gnat/search/ (Solr)      │
└─────────────────────────────────────────────────────────────────────┘

Core Subsystems

HTTP Client Layer

All network I/O goes through a thin wrapper around urllib3.PoolManager for synchronous work and httpx.AsyncClient for async work. The layer provides connection pooling, configurable retries, and a uniform GNATClientError exception that carries the HTTP status code and response body.
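
The wrapper below is a minimal sketch of that pattern, not GNAT's actual client: the class name ThinHTTPClient, the request_json() method, and the retry defaults are assumptions. It shows the two ideas described above: urllib3 retries configured on the PoolManager, and every non-2xx response converted into one exception type that carries the status and body. The pool is injectable so the error path can be exercised without network access.

```python
import json
from typing import Any, Optional


class GNATClientError(Exception):
    """Carries the HTTP status code and raw response body (sketch)."""

    def __init__(self, status: int, body: bytes) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status
        self.body = body


class ThinHTTPClient:
    """Hypothetical thin wrapper around urllib3.PoolManager."""

    def __init__(self, base_url: str, pool: Any = None, retries: int = 3) -> None:
        self.base_url = base_url.rstrip("/")
        if pool is None:
            # Imported lazily so a test double can be injected without urllib3.
            import urllib3
            from urllib3.util.retry import Retry

            pool = urllib3.PoolManager(
                retries=Retry(
                    total=retries,
                    backoff_factor=0.5,
                    status_forcelist=(429, 500, 502, 503, 504),
                    raise_on_status=False,  # hand the last response back to us
                )
            )
        self._pool = pool

    def request_json(self, method: str, path: str, payload: Optional[dict] = None) -> Any:
        body = json.dumps(payload).encode("utf-8") if payload is not None else None
        resp = self._pool.request(
            method,
            f"{self.base_url}/{path.lstrip('/')}",
            body=body,
            headers={"Content-Type": "application/json"},
        )
        # Every non-2xx status becomes the same exception type.
        if not 200 <= resp.status < 300:
            raise GNATClientError(resp.status, resp.data)
        return json.loads(resp.data) if resp.data else None
```

Setting raise_on_status=False lets the final retried response reach the status check, so callers see GNATClientError instead of a urllib3 MaxRetryError when status retries are exhausted. Connection-level failures still surface as urllib3 exceptions in this sketch.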

→ ADR-0001: HTTP Client Layer → ADR-0007: Async Client

STIX 2.1 ORM

STIXBase is a pure Python class — not a SQLAlchemy model or Pydantic model. Core STIX fields are real instance attributes; all other properties live in a _properties dict exposed via __getattr__/__setattr__. Serialisation is done via to_dict() / from_dict() / to_stix_bundle(). Non-standard extension fields use the x_ prefix per STIX 2.1.
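
A stripped-down illustration of the attribute-plus-property-bag split follows. MiniSTIXBase and its CORE_FIELDS tuple are hypothetical; the real STIXBase has its own set of core fields and a to_stix_bundle() method, which is omitted here.

```python
from typing import Any, Dict


class MiniSTIXBase:
    """Hypothetical sketch: core fields are real attributes, the rest
    live in a _properties dict reached via __getattr__/__setattr__."""

    CORE_FIELDS = ("type", "id", "spec_version", "created", "modified")

    def __init__(self, **kwargs: Any) -> None:
        # Bypass our own __setattr__ to create the backing dict.
        object.__setattr__(self, "_properties", {})
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails. object.__getattribute__ avoids
        # infinite recursion if _properties does not exist yet (e.g. during copy).
        props = object.__getattribute__(self, "_properties")
        try:
            return props[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        if name in self.CORE_FIELDS:
            object.__setattr__(self, name, value)
        else:
            self._properties[name] = value  # includes x_ extension fields

    def to_dict(self) -> Dict[str, Any]:
        core = {f: getattr(self, f) for f in self.CORE_FIELDS if f in self.__dict__}
        return {**core, **self._properties}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniSTIXBase":
        return cls(**data)
```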

→ ADR-0002: ORM / STIX Compatibility

Analysis Layer

The gnat.analysis package is the analyst-facing layer that transforms ingested CTI data into intelligence products. It provides:

Confidence scoring — ConfidenceScore combines the NATO Admiralty Scale (source reliability A–F, information credibility 1–6) with a STIX 2.1 numeric confidence value (0–100).
TLP 2.0 classification — TLPLevel enum covering WHITE/CLEAR/GREEN/AMBER/AMBER+STRICT/RED, shared across the analysis, reporting, and dissemination layers.
Analyst investigations — Investigation objects with a four-state lifecycle (OPEN → IN_PROGRESS → REVIEW → CLOSED), hypothesis tracking, analyst notes, tasks, and artifact linking. InvestigationService enforces transitions; InvestigationStore persists via SQLAlchemy.
Correlation engine — EntityResolver deduplicates IOCs across platforms; RelationshipScorer scores co-occurrence; ClusterDetector groups related indicators; EnrichmentDispatcher fans out enrichment queries best-effort.
Timeline reconstruction — TimelineBuilder assembles chronological event sequences from investigations, evidence graphs, or raw platform records.
Graph queries — GraphQuery provides BFS pivot/expand/filter over EvidenceGraph objects without a separate graph database. The infra_roles filter enables querying by infrastructure classification (C2, staging, exfiltration, delivery, proxy, credential harvest).
Attribution & campaigns — CampaignService manages campaign lifecycles (SUSPECTED → ACTIVE → DORMANT → CONCLUDED), AttributionEngine tracks competing hypotheses with Admiralty Scale scoring, DiamondAnalyzer constructs ACIV tuples, KillChainTracker monitors ATT&CK tactic progression, InfrastructureClassifier labels indicators by operational role, and CampaignBuilder promotes ClusterDetector output to formal campaigns.
Analyst assistance — GapDetector surfaces missing evidence via rule-based gap analysis; ReportDraftingAssistant generates LLM-backed executive summaries and key-findings narratives.
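
The BFS pivot that GraphQuery performs can be sketched as follows. SketchNode, SketchGraph, and bfs_expand() are hypothetical names, and two semantics are assumptions: edges are treated as undirected, and the infra_roles filter restricts which nodes are returned while traversal still passes through nodes with other roles.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SketchNode:
    id: str
    kind: str                         # e.g. "ipv4-addr", "domain-name", "malware"
    infra_role: Optional[str] = None  # e.g. "c2", "staging", "delivery"


@dataclass
class SketchGraph:
    nodes: Dict[str, SketchNode] = field(default_factory=dict)
    adj: Dict[str, Set[str]] = field(default_factory=dict)

    def add_edge(self, a: SketchNode, b: SketchNode) -> None:
        for n in (a, b):
            self.nodes.setdefault(n.id, n)
            self.adj.setdefault(n.id, set())
        # Pivoting works in both directions, so store the edge both ways.
        self.adj[a.id].add(b.id)
        self.adj[b.id].add(a.id)


def bfs_expand(graph: SketchGraph, seed: str, max_hops: int,
               infra_roles: Optional[Set[str]] = None) -> List[str]:
    """Return ids of nodes within max_hops of seed, in BFS order.

    infra_roles, if given, filters the returned nodes only."""
    seen = {seed}
    queue = deque([(seed, 0)])
    found: List[str] = []
    while queue:
        node_id, depth = queue.popleft()
        if node_id != seed:
            role = graph.nodes[node_id].infra_role
            if infra_roles is None or role in infra_roles:
                found.append(node_id)
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nbr in sorted(graph.adj.get(node_id, ())):  # sorted for determinism
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return found
```

Because the traversal is a plain BFS over an in-memory adjacency map, no separate graph database is needed; the hop limit keeps pivots from fanning out across the whole evidence graph.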

→ ADR-0031: Analysis Layer Architecture → ADR-0033: Confidence Scoring Model → ADR-0051: Attribution & Campaign Tracking → ADR-0053: Infrastructure Graph Labels → How-to: Use the Analysis Layer

Investigation Builder

gnat.investigations.InvestigationBuilder orchestrates a five-step cross-platform evidence collection pipeline: seed expansion → incident expansion → normalisation → correlation → materialisation. It translates raw platform records into a unified EvidenceGraph of EvidenceNode and EvidenceEdge objects, then writes them to a GNAT workspace as STIX objects and Relationship SROs. It works with any subset of connected platform clients.

→ ADR-0031: Analysis Layer Architecture → How-to: Build Cross-Platform Investigations

HuntGNAT (Detection Rule Translation)

gnat.plugins.huntgnat translates STIX indicator patterns into platform-native detection rules. A recursive descent parser produces a typed AST from STIX patterns, which four translators consume:

Sigma — log-source-aware YAML rules with field-name resolution
YARA — hash-based file detection rules
Suricata — network alert rules (rejects host-only patterns via UntranslatableError)
Snort — Snort 3 IPS rules

Hunt packages (HuntPackage) bundle hypotheses, evidence, detection rules, and ATT&CK coverage into STIX Grouping objects with a lifecycle (DRAFT → PEER_REVIEWED → ACTIVE → RETIRED). CoverageAnalyzer builds ATT&CK technique × rule coverage matrices and identifies gaps. DeploymentTracker monitors where rules are deployed and DriftDetector identifies when on-platform copies diverge from canonical versions. ValidationRun scores whether rules actually fire during Atomic Red Team-style test executions.
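
To make the parser-plus-translator split concrete, here is a toy version covering only a tiny subset of STIX patterning: equality comparisons joined by AND/OR inside a single observation expression, with AND binding tighter than OR as in STIX 2.1. All names are hypothetical stand-ins, the Suricata rule layout is a simple illustration, and the translator handles only OR-ed ipv4-addr:value comparisons, raising its own UntranslatableError for anything else (mirroring the behaviour described for host-only patterns).

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

TOKEN_SPEC = [
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("EQ", r"="),
    ("STRING", r"'(?:[^'\\]|\\.)*'"),
    ("PATH", r"[a-z0-9-]+:[A-Za-z0-9_.]+"),
    ("KW", r"\b(?:AND|OR)\b"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class UntranslatableError(ValueError):
    """Raised when a pattern has no equivalent in the target rule language."""


@dataclass
class Comparison:
    path: str   # e.g. "ipv4-addr:value"
    value: str  # literal with the surrounding quotes stripped (no unescaping)


@dataclass
class BoolOp:
    op: str         # "AND" or "OR"
    operands: list  # Comparison or BoolOp nodes


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    """pattern := '[' or_expr ']'   or_expr := and_expr ('OR' and_expr)*
    and_expr := cmp ('AND' cmp)*     cmp := PATH '=' STRING"""

    def __init__(self, tokens: List[Tuple[str, str]]) -> None:
        self.toks, self.i = tokens, 0

    def _peek(self) -> Tuple[str, str]:
        return self.toks[self.i] if self.i < len(self.toks) else ("EOF", "")

    def _expect(self, kind: str) -> str:
        k, v = self._peek()
        if k != kind:
            raise SyntaxError(f"expected {kind}, got {k}")
        self.i += 1
        return v

    def parse(self):
        self._expect("LBRACK")
        node = self._chain("OR", lambda: self._chain("AND", self._comparison))
        self._expect("RBRACK")
        if self.i != len(self.toks):
            raise SyntaxError("trailing tokens after ']'")
        return node

    def _chain(self, kw, sub):
        parts = [sub()]
        while self._peek() == ("KW", kw):
            self.i += 1
            parts.append(sub())
        return parts[0] if len(parts) == 1 else BoolOp(kw, parts)

    def _comparison(self) -> Comparison:
        path = self._expect("PATH")
        self._expect("EQ")
        return Comparison(path, self._expect("STRING")[1:-1])


def to_suricata(ast, sid: int) -> str:
    """Translate OR-ed ipv4-addr:value comparisons into one IP alert rule."""
    comps = ast.operands if isinstance(ast, BoolOp) and ast.op == "OR" else [ast]
    ips = []
    for c in comps:
        if not isinstance(c, Comparison) or c.path != "ipv4-addr:value":
            raise UntranslatableError(f"no network rule for {c}")
        ips.append(c.value)
    return (f'alert ip any any -> [{",".join(ips)}] any '
            f'(msg:"GNAT IOC match"; sid:{sid}; rev:1;)')
```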

→ ADR-0050: HuntGNAT — Detection Rule Translation

Telemetry Ingestion

gnat.ingest.telemetry provides high-volume sensor event ingestion for lab infrastructure:

KafkaSourceReader — SourceReader subclass consuming JSON events from Kafka topics
SensorSchema — normalises five sensor types (honeypot, netflow, IDS alert, DNS log, generic) into a common SensorEvent intermediate format
TelemetryMapper — RecordMapper subclass producing STIX Indicators for IPs, domains, URLs, and file hashes with private-IP filtering and severity gating
RedisDeduplicationCache — SHA-256 fingerprint dedup via Redis SET operations with automatic in-memory fallback when Redis is unavailable
CampaignLinker — pipeline transform that auto-links ingested indicators to active campaigns

Install with pip install "gnat[telemetry]" (kafka-python-ng + redis).
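
The dedup behaviour can be sketched as below. DedupCache is a hypothetical name; the sketch assumes a redis-py style client whose sadd() returns the number of newly added members, and it treats any exception from Redis as "unavailable", switching to a process-local set for the rest of the run.

```python
import hashlib
import json
from typing import Any, Optional


class DedupCache:
    """Hypothetical SHA-256 fingerprint dedup with in-memory fallback."""

    def __init__(self, redis_client: Optional[Any] = None, key: str = "gnat:seen") -> None:
        self._redis = redis_client
        self._key = key
        self._local: set = set()

    @staticmethod
    def fingerprint(event: dict) -> str:
        # Canonical JSON (sorted keys, no whitespace) so equal events hash equally.
        canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_new(self, event: dict) -> bool:
        fp = self.fingerprint(event)
        if self._redis is not None:
            try:
                # SADD returns 1 if the member was added (new), 0 if already present.
                return self._redis.sadd(self._key, fp) == 1
            except Exception:
                # e.g. redis.exceptions.ConnectionError: degrade to the local set.
                self._redis = None
        if fp in self._local:
            return False
        self._local.add(fp)
        return True
```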

→ ADR-0052: Telemetry Ingestion

Analysis Rule Engine

gnat.analysis.rules provides automated hypothesis evaluation via declarative rules. Three engine implementations are available, selected with the engine key in the [rules] section of the config file:

Hy (default) — Lisp/S-expression rules via defrule macro
YAML — declarative condition DSL referencing 26 helpers by name
Prolog — SWI-Prolog logic rules via pyswip for complex inference

All engines share a common evaluation pipeline: RuleContext, Decision types, AuditWriter, RuleOrchestrator, and 26 helper functions covering evidence, confidence, temporal, status, policy, and source/trust checks. Rules are advisors: they return decisions without mutating state, and the orchestrator applies those decisions via InvestigationService. The rule engine sits behind a feature flag that is off by default.
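
The advisor pattern can be illustrated in plain Python rather than Hy, YAML, or Prolog. The dataclass fields, the enough_evidence_for_review rule and its thresholds, and the Orchestrator class are hypothetical; the apply callback stands in for InvestigationService and the audit list for AuditWriter.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RuleContext:
    investigation_id: str
    status: str
    evidence_count: int
    mean_confidence: int  # 0-100


@dataclass(frozen=True)
class Decision:
    rule: str
    action: str  # e.g. "suggest_transition"
    target: str
    reason: str


Rule = Callable[[RuleContext], List[Decision]]


def enough_evidence_for_review(ctx: RuleContext) -> List[Decision]:
    # Advisory only: inspects the frozen context and returns decisions.
    if ctx.status == "IN_PROGRESS" and ctx.evidence_count >= 3 and ctx.mean_confidence >= 70:
        return [Decision("enough_evidence_for_review", "suggest_transition", "REVIEW",
                         f"{ctx.evidence_count} items at mean confidence {ctx.mean_confidence}")]
    return []


class Orchestrator:
    def __init__(self, rules: List[Rule], apply: Callable[[str, Decision], None]) -> None:
        self.rules = rules
        self.apply = apply               # stand-in for InvestigationService
        self.audit: List[Decision] = []  # stand-in for AuditWriter

    def run(self, ctx: RuleContext) -> List[Decision]:
        decisions = [d for rule in self.rules for d in rule(ctx)]
        for d in decisions:
            self.audit.append(d)  # record before applying
            self.apply(ctx.investigation_id, d)
        return decisions
```

Keeping rules side-effect free means the same rule set can be dry-run against historical investigations, and every state change flows through one audited code path.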

→ ADR-0054: Analysis Rule Engine

Reporting Layer

gnat.reporting provides first-class intelligence report objects with a formal five-state lifecycle (DRAFT → REVIEW → APPROVED → PUBLISHED → ARCHIVED). ReportService enforces the state machine and, when publish() is called, automatically generates a STIX 2.1 bundle containing a Report SDO. Published reports are immutable; a revision creates a new draft linked to its predecessor via parent_report_id. This layer is distinct from gnat.reports, the operational PDF/DOCX generator: gnat.reporting produces structured, traceable finished intelligence.
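
A sketch of the state machine and the revision rule follows, assuming a strictly linear transition table (the real ReportService may allow other moves, such as sending a report back from review). Report, transition(), publish(), and revise() are hypothetical names, and the emitted Report SDO is trimmed: the created, modified, and published timestamps are omitted for brevity.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# Strictly linear lifecycle as listed in the text (an assumption).
ALLOWED = {
    "DRAFT": {"REVIEW"},
    "REVIEW": {"APPROVED"},
    "APPROVED": {"PUBLISHED"},
    "PUBLISHED": {"ARCHIVED"},
    "ARCHIVED": set(),
}


class InvalidTransition(Exception):
    pass


@dataclass
class Report:
    title: str
    object_refs: List[str] = field(default_factory=list)
    status: str = "DRAFT"
    parent_report_id: Optional[str] = None
    id: str = field(default_factory=lambda: f"report--{uuid.uuid4()}")


def transition(report: Report, new_status: str) -> None:
    if new_status not in ALLOWED[report.status]:
        raise InvalidTransition(f"{report.status} -> {new_status}")
    report.status = new_status


def publish(report: Report) -> dict:
    transition(report, "PUBLISHED")
    # Trimmed Report SDO: timestamps omitted.
    sdo = {"type": "report", "spec_version": "2.1", "id": report.id,
           "name": report.title, "object_refs": list(report.object_refs)}
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": [sdo]}


def revise(report: Report) -> Report:
    # Published content is never edited in place; a revision is a new draft.
    if report.status != "PUBLISHED":
        raise InvalidTransition("only published reports are revised")
    return Report(report.title, list(report.object_refs), parent_report_id=report.id)
```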

→ ADR-0034: Report Lifecycle → ADR-0032: STIX Custom Objects → How-to: Create Intelligence Reports

Dissemination Layer

gnat.dissemination handles the outbound delivery of finished intelligence:

ExportService — serialises published Report objects to STIX 2.1 bundle, JSON, or PDF.
WebhookNotifier — fans out HTTP POST notifications to registered subscribers, with TLP-based filtering and optional HMAC-SHA256 request signing.
TAXII 2.1 router — build_taxii_router() returns a FastAPI router exposing full TAXII 2.1 Discovery / Collections / Objects endpoints.
REST gateway — build_gateway_router() exposes report listing, export download, and API key administration; bearer-token auth with TLP-restricted key scopes.
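
The TLP filtering and HMAC-SHA256 signing done by WebhookNotifier can be sketched with the standard library alone. The subscriber dict shape, the TLP_ORDER list, the X-GNAT-Signature header name, and the "sha256=" prefix are assumptions, and deliveries are returned rather than POSTed so the logic stays testable offline.

```python
import hashlib
import hmac
import json
from typing import Dict, List, Tuple

# TLP 2.0 levels from least to most restrictive.
TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"]


def sign(secret: bytes, body: bytes) -> str:
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def build_deliveries(report: dict, subscribers: List[dict]) -> List[Tuple[str, bytes, Dict[str, str]]]:
    """Return (url, body, headers) for each subscriber cleared for the report's TLP."""
    body = json.dumps(report, sort_keys=True).encode("utf-8")
    level = TLP_ORDER.index(report["tlp"])
    out = []
    for sub in subscribers:
        if TLP_ORDER.index(sub["max_tlp"]) < level:
            continue  # subscriber not cleared for this marking
        headers = {"Content-Type": "application/json"}
        if sub.get("secret"):
            # Signature covers the exact bytes that will be sent.
            headers["X-GNAT-Signature"] = "sha256=" + sign(sub["secret"], body)
        out.append((sub["url"], body, headers))
    return out


def verify(secret: bytes, body: bytes, header: str) -> bool:
    # Receiver side: constant-time comparison avoids timing leaks.
    return hmac.compare_digest(header, "sha256=" + sign(secret, body))
```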

→ ADR-0028: TAXII 2.1 Server → ADR-0031: Analysis Layer Architecture → How-to: Disseminate Intelligence

Connector Architecture

Each connector uses dual inheritance — BaseClient (HTTP) and ConnectorMixin (STIX contract). Every connector must implement authenticate(), to_stix(), from_stix(), health_check(), and the four CRUD methods. Connectors are registered in CLIENT_REGISTRY in gnat/clients/__init__.py. The library ships with 159 connectors covering SIEM, XDR, TIP, ASM, OT/IoT, vulnerability management, sandboxes, MDR, identity/ITDR, email security, insider risk/UEBA, BAS, DFIR, certificate transparency, bug bounty, and AI platforms.
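
The dual-inheritance contract might look like this. BaseClient and ConnectorMixin here are simplified stand-ins, the CRUD method names (create_object, get_object, update_object, delete_object) are hypothetical because the text does not name them, and the register() decorator is an assumed convenience; GNAT's own CLIENT_REGISTRY may simply be a dict literal. Because ConnectorMixin uses abc, a connector that omits part of the contract fails at instantiation time rather than in the middle of a pipeline.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

CLIENT_REGISTRY: Dict[str, type] = {}


def register(name: str):
    """Hypothetical helper: add a connector class to CLIENT_REGISTRY."""
    def decorator(cls: type) -> type:
        CLIENT_REGISTRY[name] = cls
        return cls
    return decorator


class BaseClient:
    """Simplified stand-in for the HTTP base class."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


class ConnectorMixin(ABC):
    """Simplified stand-in for the STIX contract; CRUD names are hypothetical."""

    @abstractmethod
    def authenticate(self) -> None: ...
    @abstractmethod
    def health_check(self) -> bool: ...
    @abstractmethod
    def to_stix(self, record: Dict[str, Any]) -> Dict[str, Any]: ...
    @abstractmethod
    def from_stix(self, obj: Dict[str, Any]) -> Dict[str, Any]: ...
    @abstractmethod
    def create_object(self, obj: Dict[str, Any]) -> str: ...
    @abstractmethod
    def get_object(self, object_id: str) -> Dict[str, Any]: ...
    @abstractmethod
    def update_object(self, object_id: str, obj: Dict[str, Any]) -> None: ...
    @abstractmethod
    def delete_object(self, object_id: str) -> None: ...


@register("memory")
class InMemoryConnector(BaseClient, ConnectorMixin):
    """Toy connector that keeps native records in a dict."""

    def __init__(self, base_url: str = "memory://") -> None:
        super().__init__(base_url)
        self._store: Dict[str, Dict[str, Any]] = {}

    def authenticate(self) -> None:
        pass  # nothing to log in to

    def health_check(self) -> bool:
        return True

    def to_stix(self, record):
        # Trimmed indicator: a real mapping adds spec_version, pattern_type, etc.
        return {"type": "indicator", "id": record["id"], "pattern": record["pattern"]}

    def from_stix(self, obj):
        return {"id": obj["id"], "pattern": obj["pattern"]}

    def create_object(self, obj):
        self._store[obj["id"]] = self.from_stix(obj)
        return obj["id"]

    def get_object(self, object_id):
        return self.to_stix(self._store[object_id])

    def update_object(self, object_id, obj):
        self._store[object_id] = self.from_stix(obj)

    def delete_object(self, object_id):
        del self._store[object_id]
```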

→ ADR-0003: Connector Architecture

Ingestion Framework

Three composable abstractions form the ingest pipeline:

Abstraction	Role
`SourceReader`	Reads raw records from any source (file, API, TAXII, RSS, SQL…)
`RecordMapper`	Converts raw records into `STIXBase` objects
`IngestPipeline`	Wires reader → mapper → dedup → connector write

14 built-in readers and 12 built-in mappers cover the most common formats. Custom readers and mappers can be dropped in by subclassing.
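
A minimal sketch of how the three abstractions compose. Only the class names SourceReader, RecordMapper, and IngestPipeline come from the text; the read() and map() method names, the LinesReader and IPv4Mapper examples, and dedup by pattern string are assumptions, and the write callable stands in for a connector.

```python
import ipaddress
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, Any]


class SourceReader(ABC):
    @abstractmethod
    def read(self) -> Iterator[Record]:
        """Yield raw records from some source."""


class RecordMapper(ABC):
    @abstractmethod
    def map(self, record: Record) -> Optional[Record]:
        """Return a STIX-shaped dict, or None to drop the record."""


class LinesReader(SourceReader):
    """One IOC per line; blank lines and '#' comments are skipped."""

    def __init__(self, lines: Iterable[str]) -> None:
        self.lines = lines

    def read(self) -> Iterator[Record]:
        for line in self.lines:
            line = line.strip()
            if line and not line.startswith("#"):
                yield {"value": line}


class IPv4Mapper(RecordMapper):
    def map(self, record: Record) -> Optional[Record]:
        try:
            ip = ipaddress.IPv4Address(record["value"])
        except ValueError:
            return None  # not an IPv4 address
        return {"type": "indicator", "pattern_type": "stix",
                "pattern": f"[ipv4-addr:value = '{ip}']"}


class IngestPipeline:
    """reader -> mapper -> dedup -> write."""

    def __init__(self, reader: SourceReader, mapper: RecordMapper,
                 write: Callable[[Record], None]) -> None:
        self.reader, self.mapper, self.write = reader, mapper, write

    def run(self) -> int:
        seen, written = set(), 0
        for raw in self.reader.read():
            obj = self.mapper.map(raw)
            if obj is None or obj["pattern"] in seen:
                continue  # rejected by the mapper, or a duplicate within this run
            seen.add(obj["pattern"])
            self.write(obj)
            written += 1
        return written
```

A custom source only needs a new SourceReader subclass; the mapper, dedup step, and write target are reused unchanged.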

→ ADR-0004: Ingestion Framework

Context and Workspace

A GlobalContextRegistry tracks named connector instances and their read/write permissions. WorkspaceManager creates and manages investigation workspaces, each with its own object graph and diff/commit lifecycle. Workspaces are serialised to JSON for persistence; an optional SQLAlchemy back end is available via the persist extra.

→ ADR-0005: Context System → ADR-0006: Workspace Persistence → ADR-0027: Multi-Tenant Workspace Isolation

Visualization

Three rendering targets are supported out of the box:

Target	Module	Best for
Tabular (pandas / rich)	`gnat/viz/tabular.py`	CLI output, quick review
Graph (sigma.js / pyvis)	`gnat/viz/graph.py`	Relationship exploration
Grafana / Power BI export	`gnat/viz/`	Operational dashboards

→ ADR-0008: Visualization — Tabular → ADR-0009: Visualization — Graph → ADR-0010: Visualization — Grafana vs Power BI

CLI

The CLI (gnat/cli/main.py) uses argparse subcommands with no framework dependency. It surfaces ingest, export, scheduling, workspaces, connectors, reports, and code generation as top-level subcommands.

→ ADR-0011: CLI Design → ADR-0023: Terminal UI — Textual

Code Generation

gnat/codegen/ scaffolds new connector packages from an OpenAPI specification. It generates the directory layout, __init__.py, client.py stub with the full ConnectorMixin contract, unit test skeleton, INI example block, and ADR stub.

→ ADR-0012: Code Generation → ADR-0024: XSOAR Content Pack Generator

Configuration

INI-based configuration via stdlib configparser. Search order: GNAT_CONFIG env var → ~/.gnat/config.ini → ./gnat.ini. Each platform gets its own section; shared settings live in [global]. No external config library is used.
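
The search order can be sketched with configparser as below. Two behaviours are assumptions: the first existing file wins (files are not layered), and a key missing from a platform section falls back to [global]. candidate_paths(), load_config(), and get_setting() are hypothetical helpers.

```python
import configparser
import os
from pathlib import Path
from typing import List, Optional


def candidate_paths() -> List[Path]:
    """GNAT_CONFIG env var, then ~/.gnat/config.ini, then ./gnat.ini."""
    paths = []
    env = os.environ.get("GNAT_CONFIG")
    if env:
        paths.append(Path(env))
    paths.append(Path.home() / ".gnat" / "config.ini")
    paths.append(Path.cwd() / "gnat.ini")
    return paths


def load_config() -> configparser.ConfigParser:
    parser = configparser.ConfigParser()
    for path in candidate_paths():
        if path.is_file():
            parser.read(path, encoding="utf-8")
            break  # first match wins (assumption)
    return parser


def get_setting(parser: configparser.ConfigParser, section: str, key: str,
                default: Optional[str] = None) -> Optional[str]:
    # Platform section first, then [global], then the caller's default.
    for sec in (section, "global"):
        if parser.has_option(sec, key):
            return parser.get(sec, key)
    return default
```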

→ ADR-0013: Configuration

Testing Strategy

Unit tests live in tests/unit/ and mock at the HTTP layer via mock_pool_manager. Integration tests in tests/integration/ are gated behind @pytest.mark.integration and the --run-integration pytest flag; they require live credentials in GNAT_CONFIG. Minimum coverage is 70%.

→ ADR-0014: Testing Strategy

Packaging and Extras

GNAT uses setuptools extras so users install only what they need. The core package requires only urllib3. Optional feature groups (yaml, taxii, ingest, async, persist, schedule, reports, viz, serve, telemetry) are installed on demand. The all extra pulls in everything.

→ ADR-0015: Packaging and Extras

Feed Scheduling

FeedJob wraps a (SourceReader, RecordMapper, connector) triple with a cron expression. FeedScheduler runs jobs via croniter, tracks each job's last_success, and passes a JobRunContext to each reader factory so that readers can fetch incrementally from the last successful run.

→ ADR-0016: Feed Scheduling

Export Pipeline

The export layer converts STIXBase objects to delivery-ready formats. Built-in targets include EDL (plain-text IP/domain/URL block lists) and Netskope CE. A filter chain (ConfidenceFilter, TLPFilter, SectorFilter) gates what reaches each target.
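
The gating idea can be sketched as a list of predicates applied with logical AND before an EDL is written. The Indicator dataclass and the confidence_filter() and tlp_filter() factories are hypothetical function-style stand-ins for ConfidenceFilter and TLPFilter; SectorFilter is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

TLP_RANK = {"CLEAR": 0, "GREEN": 1, "AMBER": 2, "AMBER+STRICT": 3, "RED": 4}


@dataclass(frozen=True)
class Indicator:
    value: str       # IP, domain, or URL
    confidence: int  # 0-100
    tlp: str


Filter = Callable[[Indicator], bool]


def confidence_filter(minimum: int) -> Filter:
    return lambda ind: ind.confidence >= minimum


def tlp_filter(max_tlp: str) -> Filter:
    # Keep only indicators no more restrictive than the target allows.
    return lambda ind: TLP_RANK[ind.tlp] <= TLP_RANK[max_tlp]


def to_edl(indicators: Iterable[Indicator], filters: List[Filter]) -> str:
    """Apply every filter, then emit one unique entry per line (plain-text EDL)."""
    kept = [i.value for i in indicators if all(f(i) for f in filters)]
    return "\n".join(sorted(set(kept))) + ("\n" if kept else "")
```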

→ ADR-0017: Export / Integration Pipeline

AI Agent Layer

ResearchAgent (a SourceReader) and ParsingAgent (a RecordMapper) drop directly into the existing IngestPipeline and FeedJob infrastructure. They call the Claude API using stdlib urllib (no anthropic SDK dependency). Every AI-extracted STIX object is capped at ai_confidence_ceiling (default 60) and tagged x_source_type: "ai_extracted" to require human review before high-stakes propagation. CopilotReader connects to Microsoft 365 via the Bot Framework DirectLine v3 API.
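
The ceiling and provenance tag amount to a small post-processing step, sketched here on plain STIX dicts. apply_ai_guardrails() and needs_review() are hypothetical names, and treating a missing confidence value as the ceiling is an assumption.

```python
from typing import Any, Dict

AI_CONFIDENCE_CEILING = 60  # default stated in the text


def apply_ai_guardrails(obj: Dict[str, Any], ceiling: int = AI_CONFIDENCE_CEILING) -> Dict[str, Any]:
    """Return a copy with confidence capped and provenance tagged."""
    out = dict(obj)  # leave the caller's object untouched
    # Missing confidence is treated as the ceiling itself (assumption).
    out["confidence"] = min(int(out.get("confidence", ceiling)), ceiling)
    out["x_source_type"] = "ai_extracted"
    return out


def needs_review(obj: Dict[str, Any]) -> bool:
    return obj.get("x_source_type") == "ai_extracted"
```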

→ ADR-0018: AI Agent Layer

Research Library

ResearchLibrary provides a curated, searchable store of threat reports, news, and analyst notes. CurationJob automates ingestion from monitored RSS/web sources. The library integrates with the AI agent layer for AI-assisted summarisation and with the Solr search sidecar for full-text search.

→ ADR-0019: Shared Research Library

NLP Query Layer

A natural-language query interface sits in front of workspace objects and the research library. Queries are translated to structured filters by the AI agent layer, allowing analysts to ask questions like “show me all ransomware indicators added this week” without writing code.

→ ADR-0020: NLP Query Layer

Rust Native Extension

An optional Rust extension module (gnat._core) accelerates hot-path IOC operations: classify, defang, refang, extract pattern value, and batch classify. The Python shim (gnat/ingest/_ioc_classifier.py) detects whether the compiled extension is available and falls back to the pure-Python implementation transparently.
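
The fallback pattern is a guarded import. In this sketch the imported function names and the pure-Python defang()/refang() conventions (hxxp scheme, [.] for dots) are assumptions, not a description of the actual gnat._core interface.

```python
import re

try:
    # Compiled Rust extension, if it was built; these names are assumptions.
    from gnat._core import defang, refang  # type: ignore
    NATIVE = True
except ImportError:
    NATIVE = False

    def defang(ioc: str) -> str:
        """Pure-Python fallback: make a URL, domain, or IP non-clickable."""
        ioc = re.sub(r"^http", "hxxp", ioc, flags=re.IGNORECASE)
        return ioc.replace(".", "[.]")

    def refang(ioc: str) -> str:
        """Inverse of defang() for values it produced."""
        ioc = ioc.replace("[.]", ".")
        return re.sub(r"^hxxp", "http", ioc, flags=re.IGNORECASE)
```

Callers import defang and refang from the shim module only, so whether the native or fallback path is active stays invisible to them; NATIVE is exposed only for diagnostics.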

→ ADR-0021: Rust Native Extension

Web Dashboard

gnat/serve/ exposes a FastAPI-based web dashboard for browsing workspaces, running queries, and reviewing AI-extracted objects. It is an optional component installed via the serve extra (fastapi + uvicorn).

→ ADR-0022: Web Dashboard

Upstream Contribution Pipeline

A pipeline that formats GNAT-curated intelligence as pull requests or API submissions to open-source threat-intel communities (MISP galaxies, OpenCTI, TAXII 2.1 servers). Governed by configurable confidence thresholds and TLP markings.

→ ADR-0025: Upstream Contribution Pipeline

Connector Health Monitor

A background service that calls each registered connector’s health_check() method on a configurable interval, records latency and availability metrics, and surfaces connector status in the web dashboard and CLI.

→ ADR-0026: Connector Health Monitor

TAXII 2.1 Server

An embedded TAXII 2.1 server (gnat/serve/taxii/) allows GNAT to act as a threat-intel distribution point. Collections map to connector namespaces or workspace snapshots. Requires the serve extra.

→ ADR-0028: TAXII 2.1 Server

Docker Containerisation

Official Docker images and a docker-compose.yml ship with the repository. The compose stack includes the GNAT API server, Solr (search sidecar), and a scheduler container. Configuration is injected via environment variables that map to INI keys.

→ ADR-0029: Docker Containerization

Architecture Decision Records Index

All ADRs are stored in docs/explanation/architecture/adrs/ and listed in the ADR README.

#	Title	Topic
0001	HTTP Client Layer	Infrastructure
0002	ORM / STIX Compatibility	Data model
0003	Connector Architecture	Integration
0004	Ingestion Framework	Data pipeline
0005	Context System	State management
0006	Workspace Persistence	State management
0007	Async Client	Infrastructure
0008	Visualization — Tabular	UX
0009	Visualization — Graph	UX
0010	Visualization — Grafana vs Power BI	UX
0011	CLI Design	UX
0012	Code Generation	Developer experience
0013	Configuration	Infrastructure
0014	Testing Strategy	Quality
0015	Packaging and Extras	Distribution
0016	Feed Scheduling	Data pipeline
0017	Export / Integration Pipeline	Data pipeline
0018	AI Agent Layer	Intelligence
0019	Shared Research Library	Intelligence
0020	NLP Query Layer	Intelligence
0021	Rust Native Extension	Performance
0022	Web Dashboard	UX
0023	Terminal UI — Textual	UX
0024	XSOAR Content Pack Generator	Developer experience
0025	Upstream Contribution Pipeline	Integration
0026	Connector Health Monitor	Operations
0027	Multi-Tenant Workspace Isolation	State management
0028	TAXII 2.1 Server	Integration
0029	Docker Containerization	Operations
0030	Adopt Diátaxis and ADRs	Documentation
0031	Analysis Layer Architecture	Intelligence
0032	STIX Custom Objects	Data model
0033	Confidence Scoring Model	Intelligence
0034	Report Lifecycle State Machine	Intelligence
0035	Quality Agents	Quality
0036	Security Agents (Phase B)	Quality
0037	Responsible Disclosure, DCO, and Apache 2.0 Compliance	Governance

Key Design Principles

Principle	Rationale
urllib3 over requests	Direct control, no extra abstraction layer, compatible with async path
Pure-Python ORM	STIX objects are not DB-bound; serialise to JSON, not sessions
`ConnectorMixin` contract	Every connector exposes the same CRUD + STIX surface; no special casing in pipelines
Extras-based packaging	Users pay only for the dependencies they actually use
AI confidence ceiling	AI-extracted intel requires human review before high-stakes propagation
INI configuration	Zero external config library; works everywhere `configparser` works
Diátaxis docs	Each document has one purpose — tutorial, how-to, reference, or explanation

Licensed under the Apache License, Version 2.0