GNAT High-Level Architecture
GNAT (CTM Toolkit) is a production-ready Python library providing a unified client interface for security and threat-intelligence platforms. This document describes the overall system architecture and links to the individual Architecture Decision Records (ADRs) that capture the rationale behind each major design choice.
Visual diagrams:
- Architectural Diagrams: system overview, connector architecture, AI agent layer, ingestion pipeline (PNG, generated with the diagrams library)
- Workflow Diagrams: sequence and flow diagrams for ingestion, analysis, export, scheduling, and AI agent request flows (Mermaid, compatible with Grafly)
Layers at a Glance
┌─────────────────────────────────────────────────────────────────────┐
│ CLI / Web Dashboard │
│ gnat/cli/ · gnat/serve/ · gnat/viz/tui.py │
├─────────────────────────────────────────────────────────────────────┤
│ Dissemination (gnat/dissemination/) │
│ ExportService · WebhookNotifier · TAXII 2.1 · REST Gateway │
├────────────────────┬────────────────────┬───────────────────────────┤
│ Analysis Layer │ Reporting Layer │ Investigation Builder │
│ gnat/analysis/ │ gnat/reporting/ │ gnat/investigations/ │
│ Confidence · TLP │ Report lifecycle │ 5-step evidence graph │
│ Correlation │ STIX SDO export │ Cross-platform pipeline │
│ Timeline · Graph │ AI drafting assist │ Workspace materialisation │
├────────────────────┴────────────────────┴───────────────────────────┤
│ GNATClient facade │
│ gnat/client.py │
├────────────────────┬────────────────────┬───────────────────────────┤
│ Ingest Pipeline │ AI Agent Layer │ Research Library │
│ gnat/ingest/ │ gnat/agents/ │ gnat/research/ │
├────────────────────┴────────────────────┴───────────────────────────┤
│ STIX 2.1 ORM (gnat/orm/) │
├──────────────────────────────────┬──────────────────────────────────┤
│ 159 Platform Connectors │ Export Pipeline │
│ gnat/connectors/ │ gnat/export/ │
├──────────────────────────────────┴──────────────────────────────────┤
│ HTTP Client Layer (gnat/clients/ · gnat/async_client/) │
│ urllib3 (sync) · httpx (async) │
├─────────────────────────────────────────────────────────────────────┤
│ Context & Workspace │ Scheduling │ Search Sidecar │
│ gnat/context/ │ gnat/schedule/│ gnat/search/ (Solr) │
└─────────────────────────────────────────────────────────────────────┘
Core Subsystems
HTTP Client Layer
All network I/O is handled by a thin wrapper around urllib3.PoolManager for synchronous work and httpx.AsyncClient for async work. The layer provides connection pooling, configurable retries, and a uniform GNATClientError exception that carries HTTP status and body.
→ ADR-0001: HTTP Client Layer → ADR-0007: Async Client
STIX 2.1 ORM
STIXBase is a pure Python class — not a SQLAlchemy model or Pydantic model. Core STIX fields are real instance attributes; all other properties live in a _properties dict exposed via __getattr__/__setattr__. Serialisation is done via to_dict() / from_dict() / to_stix_bundle(). Non-standard extension fields use the x_ prefix per STIX 2.1.
→ ADR-0002: ORM / STIX Compatibility
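The attribute routing described above can be sketched as follows. This is a minimal illustration, not the library's STIXBase: the class name `SketchSTIXObject` and the chosen set of core fields are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any

# Hypothetical subset of "core" fields; the real STIXBase may track a different set.
CORE_FIELDS = ("type", "id", "created", "modified")


class SketchSTIXObject:
    """Illustrative stand-in for STIXBase, not the library's implementation."""

    def __init__(self, type: str, id: str, created: str = "",
                 modified: str = "", **extra: Any) -> None:
        # Bypass __setattr__ so _properties exists before any routing happens.
        object.__setattr__(self, "_properties", {})
        self.type = type
        self.id = id
        self.created = created
        self.modified = modified
        for key, value in extra.items():
            setattr(self, key, value)

    def __setattr__(self, name: str, value: Any) -> None:
        if name in CORE_FIELDS:
            object.__setattr__(self, name, value)  # real instance attribute
        else:
            self._properties[name] = value  # everything else lands in the dict

    def __getattr__(self, name: str) -> Any:
        # Only invoked when normal lookup fails, i.e. for non-core properties.
        props = object.__getattribute__(self, "_properties")
        try:
            return props[name]
        except KeyError:
            raise AttributeError(name) from None

    def to_dict(self) -> dict[str, Any]:
        core = {field: getattr(self, field) for field in CORE_FIELDS}
        return {**core, **self._properties}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SketchSTIXObject":
        return cls(**data)
```

With this shape, an extension field such as `x_source` is set and read like any other attribute but is stored in `_properties`, so `to_dict()` can emit it alongside the core fields.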
Analysis Layer
The gnat.analysis package is the analyst-facing layer that transforms ingested CTI data into
intelligence products. It provides:
- Confidence scoring: ConfidenceScore combines the NATO Admiralty Scale (source reliability A–F, information credibility 1–6) with a STIX 2.1 numeric confidence value (0–100).
- TLP 2.0 classification: TLPLevel enum covering WHITE/CLEAR/GREEN/AMBER/AMBER+STRICT/RED, shared across the analysis, reporting, and dissemination layers.
- Analyst investigations: Investigation objects with a four-state lifecycle (OPEN → IN_PROGRESS → REVIEW → CLOSED), hypothesis tracking, analyst notes, tasks, and artifact linking. InvestigationService enforces transitions; InvestigationStore persists via SQLAlchemy.
- Correlation engine: EntityResolver deduplicates IOCs across platforms; RelationshipScorer scores co-occurrence; ClusterDetector groups related indicators; EnrichmentDispatcher fans out enrichment queries best-effort.
- Timeline reconstruction: TimelineBuilder assembles chronological event sequences from investigations, evidence graphs, or raw platform records.
- Graph queries: GraphQuery provides BFS pivot/expand/filter over EvidenceGraph objects without a separate graph database. The infra_roles filter enables querying by infrastructure classification (C2, staging, exfiltration, delivery, proxy, credential harvest).
- Attribution & campaigns: CampaignService manages campaign lifecycles (SUSPECTED → ACTIVE → DORMANT → CONCLUDED), AttributionEngine tracks competing hypotheses with Admiralty Scale scoring, DiamondAnalyzer constructs ACIV tuples, KillChainTracker monitors ATT&CK tactic progression, InfrastructureClassifier labels indicators by operational role, and CampaignBuilder promotes ClusterDetector output to formal campaigns.
- Analyst assistance: GapDetector surfaces missing evidence via rule-based gap analysis; ReportDraftingAssistant generates LLM-backed executive summaries and key-findings narratives.
→ ADR-0031: Analysis Layer Architecture → ADR-0033: Confidence Scoring Model → ADR-0051: Attribution & Campaign Tracking → ADR-0053: Infrastructure Graph Labels → How-to: Use the Analysis Layer
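The in-memory BFS expansion mentioned for graph queries can be illustrated with a small sketch. The function `bfs_expand`, the edge-tuple representation, and the `keep` predicate are assumptions for this example; they do not reflect the actual GraphQuery or EvidenceGraph API.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Callable, Iterable


def bfs_expand(
    edges: Iterable[tuple[str, str]],
    seed: str,
    max_hops: int,
    keep: Callable[[str], bool] = lambda node: True,
) -> dict[str, int]:
    """Return {node: hop distance} for nodes within max_hops of seed.

    Edges are treated as undirected. `keep` filters the result only;
    traversal still passes through nodes that are filtered out.
    """
    adjacency: dict[str, set[str]] = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(adjacency[node]):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {node: hops for node, hops in distance.items() if keep(node)}
```

Because the graph is a plain adjacency map held in memory, a pivot from one indicator to its neighbours needs no external graph database, which matches the design goal stated above.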
Investigation Builder
gnat.investigations.InvestigationBuilder orchestrates a five-step cross-platform evidence
collection pipeline: seed expansion → incident expansion → normalisation → correlation →
materialisation. It translates raw platform records into a unified EvidenceGraph of
EvidenceNode and EvidenceEdge objects, then writes them to a GNAT workspace as STIX objects
and Relationship SROs. The builder works with any subset of connected platform clients.
→ ADR-0031: Analysis Layer Architecture → How-to: Build Cross-Platform Investigations
HuntGNAT (Detection Rule Translation)
gnat.plugins.huntgnat translates STIX indicator patterns into platform-native detection rules.
A recursive descent parser produces a typed AST from STIX patterns, which four translators consume:
- Sigma: log-source-aware YAML rules with field-name resolution
- YARA: hash-based file detection rules
- Suricata: network alert rules (rejects host-only patterns via UntranslatableError)
- Snort: Snort 3 IPS rules
Hunt packages (HuntPackage) bundle hypotheses, evidence, detection rules, and ATT&CK coverage
into STIX Grouping objects with a lifecycle (DRAFT → PEER_REVIEWED → ACTIVE → RETIRED).
CoverageAnalyzer builds ATT&CK technique × rule coverage matrices and identifies gaps.
DeploymentTracker monitors where rules are deployed and DriftDetector identifies when
on-platform copies diverge from canonical versions. ValidationRun scores whether rules
actually fire during Atomic Red Team-style test executions.
→ ADR-0050: HuntGNAT — Detection Rule Translation
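To make the parse-then-translate idea concrete, here is a minimal recursive descent parser for a deliberately tiny subset of STIX patterning (equality comparisons joined by AND/OR inside one pair of brackets). The grammar subset, the class names, and the `render` output format are assumptions for illustration; the sketch treats AND as binding tighter than OR and does not reproduce HuntGNAT's parser or any real rule format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Union

TOKEN_RE = re.compile(
    r"\s*(?:(?P<punct>[\[\]()=])"
    r"|(?P<kw>AND|OR)\b"
    r"|(?P<string>'[^']*')"
    r"|(?P<path>[A-Za-z0-9_-]+:[A-Za-z0-9_.-]+))"
)


@dataclass(frozen=True)
class Comparison:
    path: str
    value: str


@dataclass(frozen=True)
class BoolOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Comparison, BoolOp]


def tokenize(text: str) -> list[tuple[str, str]]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens: list[tuple[str, str]]) -> None:
        self.tokens, self.pos = tokens, 0

    def peek(self) -> tuple[str, str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

    def expect(self, kind: str, value: str | None = None) -> str:
        token = self.peek()
        if token[0] != kind or (value is not None and token[1] != value):
            raise ValueError(f"expected {value or kind}, got {token[1]!r}")
        self.pos += 1
        return token[1]

    def parse(self) -> Node:
        self.expect("punct", "[")
        node = self.or_expr()
        self.expect("punct", "]")
        self.expect("eof")
        return node

    def or_expr(self) -> Node:  # lowest precedence
        node = self.and_expr()
        while self.peek() == ("kw", "OR"):
            self.pos += 1
            node = BoolOp("OR", node, self.and_expr())
        return node

    def and_expr(self) -> Node:
        node = self.comparison()
        while self.peek() == ("kw", "AND"):
            self.pos += 1
            node = BoolOp("AND", node, self.comparison())
        return node

    def comparison(self) -> Node:
        if self.peek() == ("punct", "("):
            self.pos += 1
            node = self.or_expr()
            self.expect("punct", ")")
            return node
        path = self.expect("path")
        self.expect("punct", "=")
        return Comparison(path, self.expect("string")[1:-1])  # strip quotes


def parse_pattern(text: str) -> Node:
    return Parser(tokenize(text)).parse()


def render(node: Node) -> str:
    """Toy 'translator': walks the AST and emits a generic boolean expression."""
    if isinstance(node, Comparison):
        return f'{node.path} == "{node.value}"'
    return f"({render(node.left)} {node.op.lower()} {render(node.right)})"
```

Each translator in a design like this only needs to walk the typed AST, so pattern syntax errors are caught once in the parser rather than separately per output format.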
Telemetry Ingestion
gnat.ingest.telemetry provides high-volume sensor event ingestion for lab infrastructure:
- KafkaSourceReader: SourceReader subclass consuming JSON events from Kafka topics
- SensorSchema: normalises five sensor types (honeypot, netflow, IDS alert, DNS log, generic) into a common SensorEvent intermediate format
- TelemetryMapper: RecordMapper subclass producing STIX Indicators for IPs, domains, URLs, and file hashes with private-IP filtering and severity gating
- RedisDeduplicationCache: SHA-256 fingerprint dedup via Redis SET operations with automatic in-memory fallback when Redis is unavailable
- CampaignLinker: pipeline transform that auto-links ingested indicators to active campaigns
Install with pip install "gnat[telemetry]" (kafka-python-ng + redis).
→ ADR-0052: Telemetry Ingestion
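The fingerprint-and-fallback behaviour described for deduplication might look roughly like the sketch below. The class name `FingerprintCache`, the Redis key, and the choice to fingerprint the canonical JSON of the whole event are assumptions; the only external call used is the Redis `SADD` command through a client object exposing `sadd`, which returns the number of members newly added.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Optional


class FingerprintCache:
    """Sketch of SHA-256 dedup with an in-memory fallback (not the GNAT class)."""

    KEY = "gnat:telemetry:seen"  # hypothetical Redis set name

    def __init__(self, redis_client: Optional[Any] = None) -> None:
        self._redis = redis_client
        self._seen: set[str] = set()

    @staticmethod
    def fingerprint(event: dict) -> str:
        # Canonical JSON so that key order does not change the digest.
        canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_new(self, event: dict) -> bool:
        digest = self.fingerprint(event)
        if self._redis is not None:
            try:
                # SADD reports how many members were actually added.
                return self._redis.sadd(self.KEY, digest) == 1
            except Exception:
                # Assumption: any client error means Redis is unavailable;
                # a real implementation would catch the specific error type.
                self._redis = None
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Once the fallback is triggered in this sketch, the cache stays in-memory for the rest of its lifetime; whether and how to reconnect is a separate design decision.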
Analysis Rule Engine
gnat.analysis.rules provides automated hypothesis evaluation via declarative rules.
Three engine implementations are available, selectable via [rules] engine in config:
- Hy (default): Lisp/S-expression rules via the defrule macro
- YAML: declarative condition DSL referencing 26 helpers by name
- Prolog: SWI-Prolog logic rules via pyswip for complex inference
All engines share the evaluation pipeline: RuleContext, Decision types, AuditWriter,
RuleOrchestrator, and 26 helper functions (evidence, confidence, temporal, status, policy,
source/trust). Rules are advisors — they return decisions without mutating state. The
orchestrator applies decisions via InvestigationService. Feature flag default: OFF.
→ ADR-0054: Analysis Rule Engine
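The "rules are advisors" idea can be shown in plain Python, independent of the Hy, YAML, or Prolog front ends. Everything here is hypothetical: the field names on `SketchContext` and `SketchDecision`, the two example rules, and the status strings are chosen only to illustrate the separation between evaluating and applying.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class SketchContext:
    """Read-only view handed to rules (hypothetical fields)."""
    hypothesis_id: str
    confidence: int
    evidence_count: int


@dataclass(frozen=True)
class SketchDecision:
    rule: str
    action: str
    reason: str


Rule = Callable[[SketchContext], Optional[SketchDecision]]


def promote_when_supported(ctx: SketchContext) -> Optional[SketchDecision]:
    if ctx.confidence >= 80 and ctx.evidence_count >= 3:
        return SketchDecision("promote_when_supported", "promote", "high confidence, 3+ evidence items")
    return None


def flag_when_thin(ctx: SketchContext) -> Optional[SketchDecision]:
    if ctx.evidence_count < 2:
        return SketchDecision("flag_when_thin", "flag", "fewer than 2 evidence items")
    return None


def evaluate(rules: Iterable[Rule], ctx: SketchContext) -> list[SketchDecision]:
    """Rules only return decisions; nothing is mutated here."""
    return [decision for rule in rules if (decision := rule(ctx)) is not None]


def apply_decisions(status: str, decisions: Iterable[SketchDecision]) -> str:
    """Stand-in for the orchestrator step that actually changes state."""
    for decision in decisions:
        if decision.action == "promote":
            status = "supported"
        elif decision.action == "flag":
            status = "needs_evidence"
    return status
```

Keeping the context frozen makes it hard for a rule to mutate state by accident, and the list of decisions doubles as an audit trail before anything is applied.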
Reporting Layer
gnat.reporting provides first-class intelligence report objects with a formal five-state
lifecycle (DRAFT → REVIEW → APPROVED → PUBLISHED → ARCHIVED). ReportService enforces the state
machine and generates a STIX 2.1 report SDO bundle automatically on publish(). Published
reports are immutable; revisions create a new draft linked via parent_report_id. Distinct from
gnat.reports (operational PDF/DOCX generator) — this layer produces structured, traceable
finished intelligence.
→ ADR-0034: Report Lifecycle → ADR-0032: STIX Custom Objects → How-to: Create Intelligence Reports
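A small state machine illustrates the lifecycle and immutability rules above. This sketch assumes strictly forward transitions and uses hypothetical names (`SketchReport`, `advance`, `edit_title`, `revise`); the actual ReportService may permit other transitions, such as returning a report from review to draft.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReportState(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Assumption: each state has exactly one legal successor.
NEXT_STATE = {
    ReportState.DRAFT: ReportState.REVIEW,
    ReportState.REVIEW: ReportState.APPROVED,
    ReportState.APPROVED: ReportState.PUBLISHED,
    ReportState.PUBLISHED: ReportState.ARCHIVED,
}


@dataclass
class SketchReport:
    report_id: str
    title: str
    state: ReportState = ReportState.DRAFT
    parent_report_id: Optional[str] = None


def advance(report: SketchReport, target: ReportState) -> None:
    if NEXT_STATE.get(report.state) is not target:
        raise ValueError(f"illegal transition {report.state.name} -> {target.name}")
    report.state = target


def edit_title(report: SketchReport, title: str) -> None:
    if report.state in (ReportState.PUBLISHED, ReportState.ARCHIVED):
        raise PermissionError("published reports are immutable; create a revision")
    report.title = title


def revise(report: SketchReport, new_id: str) -> SketchReport:
    """Create a new draft linked to a published report via parent_report_id."""
    if report.state is not ReportState.PUBLISHED:
        raise ValueError("only published reports can be revised")
    return SketchReport(new_id, report.title, parent_report_id=report.report_id)
```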
Dissemination Layer
gnat.dissemination handles the outbound delivery of finished intelligence:
- ExportService: serialises published Report objects to STIX 2.1 bundle, JSON, or PDF.
- WebhookNotifier: fans out HTTP POST notifications to registered subscribers, with TLP-based filtering and optional HMAC-SHA256 request signing.
- TAXII 2.1 router: build_taxii_router() returns a FastAPI router exposing full TAXII 2.1 Discovery / Collections / Objects endpoints.
- REST gateway: build_gateway_router() exposes report listing, export download, and API key administration; bearer-token auth with TLP-restricted key scopes.
→ ADR-0028: TAXII 2.1 Server → ADR-0031: Analysis Layer Architecture → How-to: Disseminate Intelligence
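HMAC-SHA256 request signing for webhooks generally follows the pattern below, built only on the standard library. The header name `X-Signature-SHA256` and the helper names are assumptions for illustration; the source does not state which header or encoding WebhookNotifier uses.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the exact bytes that will be sent."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def build_request(secret: bytes, payload: dict) -> tuple[bytes, dict[str, str]]:
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Signature-SHA256": sign_payload(secret, body),  # hypothetical header name
    }
    return body, headers


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Subscriber side: recompute and compare in constant time."""
    return hmac.compare_digest(sign_payload(secret, body), signature)
```

The subscriber must verify against the raw request bytes rather than a re-serialised copy, since any difference in whitespace or key order would change the digest.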
Connector Architecture
Each connector uses dual inheritance — BaseClient (HTTP) and ConnectorMixin (STIX contract). Every connector must implement authenticate(), to_stix(), from_stix(), health_check(), and the four CRUD methods. Connectors are registered in CLIENT_REGISTRY in gnat/clients/__init__.py. The library ships with 159 connectors covering SIEM, XDR, TIP, ASM, OT/IoT, vulnerability management, sandboxes, MDR, identity/ITDR, email security, insider risk/UEBA, BAS, DFIR, certificate transparency, bug bounty, and AI platforms.
→ ADR-0003: Connector Architecture
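The dual-inheritance shape can be sketched with an abstract mixin. The classes below are stand-ins, not GNAT's BaseClient or ConnectorMixin, and the four CRUD method names (`create_object`, `get_object`, `update_object`, `delete_object`) are hypothetical because the source does not name them. The example connector stores objects in memory instead of calling a platform API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SketchBaseClient:
    """HTTP side: holds connection settings (no network I/O in this sketch)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")


class SketchConnectorMixin(ABC):
    """STIX contract: every connector must provide these methods."""

    @abstractmethod
    def authenticate(self) -> None: ...
    @abstractmethod
    def to_stix(self, record: dict) -> dict: ...
    @abstractmethod
    def from_stix(self, obj: dict) -> dict: ...
    @abstractmethod
    def health_check(self) -> bool: ...
    # CRUD method names are hypothetical.
    @abstractmethod
    def create_object(self, obj: dict) -> str: ...
    @abstractmethod
    def get_object(self, object_id: str) -> dict: ...
    @abstractmethod
    def update_object(self, object_id: str, obj: dict) -> None: ...
    @abstractmethod
    def delete_object(self, object_id: str) -> None: ...


class InMemoryConnector(SketchBaseClient, SketchConnectorMixin):
    def __init__(self, base_url: str) -> None:
        super().__init__(base_url)
        self._store: dict[str, dict] = {}
        self._authenticated = False

    def authenticate(self) -> None:
        self._authenticated = True

    def to_stix(self, record: dict) -> dict:
        return {"type": "indicator", "id": f"indicator--{record['id']}",
                "pattern": f"[ipv4-addr:value = '{record['ip']}']"}

    def from_stix(self, obj: dict) -> dict:
        return {"id": obj["id"].split("--", 1)[1], "ip": obj["pattern"].split("'")[1]}

    def health_check(self) -> bool:
        return self._authenticated

    def create_object(self, obj: dict) -> str:
        self._store[obj["id"]] = obj
        return obj["id"]

    def get_object(self, object_id: str) -> dict:
        return self._store[object_id]

    def update_object(self, object_id: str, obj: dict) -> None:
        self._store[object_id] = obj

    def delete_object(self, object_id: str) -> None:
        del self._store[object_id]


# Hypothetical registry in the spirit of CLIENT_REGISTRY.
SKETCH_REGISTRY = {"in_memory": InMemoryConnector}
```

Because the mixin derives from `ABC`, a connector that omits any contract method cannot be instantiated, which is what lets pipelines treat all connectors uniformly.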
Ingestion Framework
Three composable abstractions form the ingest pipeline:
| Abstraction | Role |
|---|---|
| SourceReader | Reads raw records from any source (file, API, TAXII, RSS, SQL…) |
| RecordMapper | Converts raw records into STIXBase objects |
| IngestPipeline | Wires reader → mapper → dedup → connector write |
14 built-in readers and 12 built-in mappers cover the most common formats. Custom readers and mappers can be dropped in by subclassing.
→ ADR-0004: Ingestion Framework
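The reader → mapper → dedup → write flow can be sketched with plain functions. The names `csv_reader`, `ip_mapper`, and `run_pipeline` are hypothetical, plain dicts stand in for STIXBase objects, and the deterministic UUIDv5 id is used here only so that duplicates collapse; it is not a statement about how GNAT assigns identifiers.

```python
from __future__ import annotations

import csv
import io
import uuid
from typing import Callable, Iterable, Iterator, Optional


def csv_reader(text: str) -> Iterator[dict]:
    """Reader: yields one raw record per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def ip_mapper(raw: dict) -> Optional[dict]:
    """Mapper: turns a raw row into an indicator-like dict, or None to skip."""
    value = (raw.get("ip") or "").strip()
    if not value:
        return None
    return {
        "type": "indicator",
        # Deterministic id (assumption) so identical values deduplicate.
        "id": f"indicator--{uuid.uuid5(uuid.NAMESPACE_URL, value)}",
        "pattern": f"[ipv4-addr:value = '{value}']",
    }


def run_pipeline(
    reader: Iterable[dict],
    mapper: Callable[[dict], Optional[dict]],
    write: Callable[[dict], None],
) -> int:
    """Wire reader -> mapper -> dedup -> write; return the number written."""
    seen: set[str] = set()
    written = 0
    for raw in reader:
        obj = mapper(raw)
        if obj is None or obj["id"] in seen:
            continue
        seen.add(obj["id"])
        write(obj)
        written += 1
    return written
```

A custom source or format only needs a new reader or mapper with the same call shape; the wiring function stays unchanged.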
Context and Workspace
A GlobalContextRegistry tracks named connector instances and their read/write permissions. WorkspaceManager creates and manages investigation workspaces, each with its own object graph and diff/commit lifecycle. Workspaces are serialised to JSON for persistence; an optional SQLAlchemy back-end is available via the persist extra.
→ ADR-0005: Context System → ADR-0006: Workspace Persistence → ADR-0027: Multi-Tenant Workspace Isolation
Visualization
Three rendering targets are supported out of the box:
| Target | Module | Best for |
|---|---|---|
| Tabular (pandas / rich) | gnat/viz/tabular.py | CLI output, quick review |
| Graph (sigma.js / pyvis) | gnat/viz/graph.py | Relationship exploration |
| Grafana / Power BI export | gnat/viz/ | Operational dashboards |
→ ADR-0008: Visualization — Tabular → ADR-0009: Visualization — Graph → ADR-0010: Visualization — Grafana vs Power BI
CLI
The CLI (gnat/cli/main.py) uses argparse subcommands with no framework dependency. It surfaces ingest, export, scheduling, workspaces, connectors, reports, and code generation as top-level subcommands.
→ ADR-0011: CLI Design → ADR-0023: Terminal UI — Textual
Code Generation
gnat/codegen/ scaffolds new connector packages from an OpenAPI specification. It generates the directory layout, __init__.py, client.py stub with the full ConnectorMixin contract, unit test skeleton, INI example block, and ADR stub.
→ ADR-0012: Code Generation → ADR-0024: XSOAR Content Pack Generator
Configuration
INI-based configuration via stdlib configparser. Search order: GNAT_CONFIG env var → ~/.gnat/config.ini → ./gnat.ini. Each platform gets its own section; shared settings live in [global]. No external config library is used.
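The search order can be sketched with configparser alone. This example assumes that the first existing file wins (rather than merging all three) and that a platform section falls back to [global] for missing keys; both behaviours, the helper names, and the `misp` section are assumptions for illustration.

```python
from __future__ import annotations

import configparser
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


def candidate_paths(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
    cwd: Optional[Path] = None,
) -> list[Path]:
    """Return config locations in search order: env var, home dir, working dir."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    cwd = Path.cwd() if cwd is None else cwd
    paths = []
    if env.get("GNAT_CONFIG"):
        paths.append(Path(env["GNAT_CONFIG"]))
    paths.append(home / ".gnat" / "config.ini")
    paths.append(cwd / "gnat.ini")
    return paths


def load_config(paths: Iterable[Path]) -> configparser.ConfigParser:
    parser = configparser.ConfigParser()
    for path in paths:
        if path.is_file():
            parser.read(path, encoding="utf-8")
            break  # assumption: first match wins, no merging
    return parser


def get_setting(parser: configparser.ConfigParser, section: str, key: str,
                default: Optional[str] = None) -> Optional[str]:
    """Look in the platform section first, then fall back to [global]."""
    if parser.has_option(section, key):
        return parser.get(section, key)
    return parser.get("global", key, fallback=default)
```

A typical call would be `get_setting(load_config(candidate_paths()), "misp", "url")`, where `misp` is a hypothetical platform section name.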
Testing Strategy
Unit tests live in tests/unit/ and mock at the HTTP layer via mock_pool_manager. Integration tests in tests/integration/ are gated behind @pytest.mark.integration and the --run-integration pytest flag; they require live credentials in GNAT_CONFIG. Minimum coverage is 70 %.
Packaging and Extras
GNAT uses setuptools extras so users install only what they need. The core package requires only urllib3. Optional feature groups (yaml, taxii, ingest, async, persist, schedule, reports, viz, serve) are installed on demand. The all extra pulls everything.
→ ADR-0015: Packaging and Extras
Feed Scheduling
FeedJob wraps a (SourceReader, RecordMapper, connector) triple with a cron expression. FeedScheduler runs jobs via croniter, tracks last_success, and passes a JobRunContext to each reader factory so incremental fetches work correctly.
Export Pipeline
The export layer converts STIXBase objects to delivery-ready formats. Built-in targets include EDL (plain-text IP/domain/URL block lists) and Netskope CE. A filter chain (ConfidenceFilter, TLPFilter, SectorFilter) gates what reaches each target.
→ ADR-0017: Export / Integration Pipeline
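A filter chain in front of an EDL target can be sketched as a list of predicates. The function names, the dict keys (`value`, `confidence`, `tlp`), and the decision to treat unlabelled objects as the most restrictive TLP level are assumptions; they are not the signatures of ConfidenceFilter or TLPFilter.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Least to most restrictive (TLP 2.0 labels).
TLP_ORDER = ("clear", "green", "amber", "amber+strict", "red")

Filter = Callable[[dict], bool]


def confidence_filter(minimum: int) -> Filter:
    return lambda obj: obj.get("confidence", 0) >= minimum


def tlp_filter(max_level: str) -> Filter:
    limit = TLP_ORDER.index(max_level)
    # Objects without a TLP label are treated as "red" (fail closed).
    return lambda obj: TLP_ORDER.index(obj.get("tlp", "red")) <= limit


def render_edl(objects: Iterable[dict], filters: Iterable[Filter]) -> str:
    """Return a plain-text block list: one value per line, duplicates removed."""
    checks = list(filters)
    values = [obj["value"] for obj in objects if all(check(obj) for check in checks)]
    return "\n".join(dict.fromkeys(values))  # dict preserves first-seen order
```

Composing filters as independent predicates lets each export target carry its own chain, for example a stricter TLP limit for an externally hosted list.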
AI Agent Layer
ResearchAgent (a SourceReader) and ParsingAgent (a RecordMapper) drop directly into the existing IngestPipeline and FeedJob infrastructure. They call the Claude API using stdlib urllib (no anthropic SDK dependency). Every AI-extracted STIX object is capped at ai_confidence_ceiling (default 60) and tagged x_source_type: "ai_extracted" to require human review before high-stakes propagation. CopilotReader connects to Microsoft 365 via the Bot Framework DirectLine v3 API.
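The confidence ceiling amounts to a clamp plus a provenance tag. The sketch below uses plain dicts and hypothetical helper names; how the library treats objects that arrive without a confidence value is not stated in the source, so assigning the ceiling in that case is an assumption.

```python
from __future__ import annotations

AI_CONFIDENCE_CEILING = 60  # default stated in the text above


def cap_ai_object(obj: dict, ceiling: int = AI_CONFIDENCE_CEILING) -> dict:
    """Return a copy with confidence clamped and the AI provenance tag set."""
    capped = dict(obj)
    # Assumption: a missing confidence value is treated as the ceiling.
    capped["confidence"] = min(int(obj.get("confidence", ceiling)), ceiling)
    capped["x_source_type"] = "ai_extracted"
    return capped


def needs_human_review(obj: dict) -> bool:
    """Downstream stages can gate on the tag before high-stakes propagation."""
    return obj.get("x_source_type") == "ai_extracted"
```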
Research Library
ResearchLibrary provides a curated, searchable store of threat reports, news, and analyst notes. CurationJob automates ingestion from monitored RSS/web sources. The library integrates with the AI agent layer for AI-assisted summarisation and with the Solr search sidecar for full-text search.
→ ADR-0019: Shared Research Library
NLP Query Layer
A natural-language query interface sits in front of workspace objects and the research library. Queries are translated to structured filters by the AI agent layer, allowing analysts to ask questions like “show me all ransomware indicators added this week” without writing code.
Rust Native Extension
An optional Rust extension module (gnat._core) accelerates hot-path IOC operations: classify, defang, refang, extract pattern value, and batch classify. The Python shim (gnat/ingest/_ioc_classifier.py) detects whether the compiled extension is available and falls back to the pure-Python implementation transparently.
→ ADR-0021: Rust Native Extension
Web Dashboard
gnat/serve/ exposes a FastAPI-based web dashboard for browsing workspaces, running queries, and reviewing AI-extracted objects. It is an optional component installed via the serve extra (fastapi + uvicorn).
Upstream Contribution Pipeline
This pipeline formats GNAT-curated intelligence as pull requests or API submissions to open-source threat-intel communities (MISP galaxies, OpenCTI, TAXII 2.1 servers). Submissions are governed by configurable confidence thresholds and TLP markings.
→ ADR-0025: Upstream Contribution Pipeline
Connector Health Monitor
A background service calls each registered connector's health_check() method on a configurable interval, records latency and availability metrics, and surfaces connector status in the web dashboard and CLI.
→ ADR-0026: Connector Health Monitor
TAXII 2.1 Server
An embedded TAXII 2.1 server (gnat/serve/taxii/) allows GNAT to act as a threat-intel distribution point. Collections map to connector namespaces or workspace snapshots. Requires the serve extra.
Docker Containerisation
Official Docker images and a docker-compose.yml ship with the repository. The compose stack includes the GNAT API server, Solr (search sidecar), and a scheduler container. Configuration is injected via environment variables that map to INI keys.
→ ADR-0029: Docker Containerization
Architecture Decision Records Index
All ADRs are stored in docs/explanation/architecture/adrs/ and listed in the ADR README.
| # | Title | Topic |
|---|---|---|
| 0001 | HTTP Client Layer | Infrastructure |
| 0002 | ORM / STIX Compatibility | Data model |
| 0003 | Connector Architecture | Integration |
| 0004 | Ingestion Framework | Data pipeline |
| 0005 | Context System | State management |
| 0006 | Workspace Persistence | State management |
| 0007 | Async Client | Infrastructure |
| 0008 | Visualization — Tabular | UX |
| 0009 | Visualization — Graph | UX |
| 0010 | Visualization — Grafana vs Power BI | UX |
| 0011 | CLI Design | UX |
| 0012 | Code Generation | Developer experience |
| 0013 | Configuration | Infrastructure |
| 0014 | Testing Strategy | Quality |
| 0015 | Packaging and Extras | Distribution |
| 0016 | Feed Scheduling | Data pipeline |
| 0017 | Export / Integration Pipeline | Data pipeline |
| 0018 | AI Agent Layer | Intelligence |
| 0019 | Shared Research Library | Intelligence |
| 0020 | NLP Query Layer | Intelligence |
| 0021 | Rust Native Extension | Performance |
| 0022 | Web Dashboard | UX |
| 0023 | Terminal UI — Textual | UX |
| 0024 | XSOAR Content Pack Generator | Developer experience |
| 0025 | Upstream Contribution Pipeline | Integration |
| 0026 | Connector Health Monitor | Operations |
| 0027 | Multi-Tenant Workspace Isolation | State management |
| 0028 | TAXII 2.1 Server | Integration |
| 0029 | Docker Containerization | Operations |
| 0030 | Adopt Diátaxis and ADRs | Documentation |
| 0031 | Analysis Layer Architecture | Intelligence |
| 0032 | STIX Custom Objects | Data model |
| 0033 | Confidence Scoring Model | Intelligence |
| 0034 | Report Lifecycle State Machine | Intelligence |
| 0035 | Quality Agents | Quality |
| 0036 | Security Agents (Phase B) | Quality |
| 0037 | Responsible Disclosure, DCO, and Apache 2.0 Compliance | Governance |
Key Design Principles
| Principle | Rationale |
|---|---|
| urllib3 over requests | Direct control, no extra abstraction layer, compatible with async path |
| Pure-Python ORM | STIX objects are not DB-bound; serialise to JSON, not sessions |
| ConnectorMixin contract | Every connector exposes the same CRUD + STIX surface; no special casing in pipelines |
| Extras-based packaging | Users pay only for the dependencies they actually use |
| AI confidence ceiling | AI-extracted intel requires human review before high-stakes propagation |
| INI configuration | Zero external config library; works everywhere configparser works |
| Diátaxis docs | Each document has one purpose — tutorial, how-to, reference, or explanation |
Licensed under the Apache License, Version 2.0