GNAT Implementation and Architecture Plan

GNAT — Universal Cyber Threat Management Library
Version 1.0.0 | Architecture as of April 2026

Overview

GNAT is a universal Python client and STIX 2.1 ORM library providing a single abstracted interface across multiple security platforms. It solves the platform proliferation problem: security teams typically maintain point-to-point integrations between every combination of tools, each with its own auth model, data format, and API quirks. GNAT replaces that web of bespoke integrations with one library, one configuration file, and one operational model.

Architecture

Layer model

┌─────────────────────────────────────────────────────────────────┐
│  Application Layer                                               │
│  (analysts, scheduled jobs, report generation, export pipelines) │
├─────────────────────────────────────────────────────────────────┤
│  Orchestration Layer                                             │
│  FeedScheduler · FeedJob · ExportJob · ReportJob · CurationJob  │
├──────────────────────────────────┬──────────────────────────────┤
│  Intelligence Layer              │  AI Agent Layer               │
│  ResearchLibrary · Workspace     │  ResearchAgent · ParsingAgent │
│  GlobalContextRegistry           │  CopilotReader                │
├──────────────────────────────────┴──────────────────────────────┤
│  Pipeline Layer                                                  │
│  IngestPipeline · ExportPipeline · ReportGenerator               │
│  SourceReaders · RecordMappers · ExportFilters · Transforms      │
├─────────────────────────────────────────────────────────────────┤
│  Abstraction Layer (the middle layer)                            │
│  STIX 2.1 ORM · ConnectorMixin · BaseClient                      │
├─────────────────────────────────────────────────────────────────┤
│  Connector Layer                                                 │
│  159 platform connectors across SIEM, TIP, EDR, ASM, VM,        │
│  SOAR, IDS/IPS, MDR, ITDR, Email, UEBA, BAS, DFIR, Sandboxes,   │
│  Cert Transparency, Bug Bounty, OSINT, and AI categories        │
└─────────────────────────────────────────────────────────────────┘

The abstraction layer is the value

Every connector implements the same five-method interface: authenticate(), get_object(), list_objects(), upsert_object(), delete_object() — plus to_stix() and from_stix() for data translation. Code written against this interface works with any connector. A pipeline that ingests from ThreatQ works identically against CrowdStrike or VirusTotal by changing one string in the config file.
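To make the shared interface concrete, the following is a minimal sketch rather than GNAT's actual ConnectorMixin. Only the five method names come from the text above. The signatures, the `Connector` and `InMemoryConnector` classes, and the `sync` helper are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

StixDict = Dict[str, Any]


class Connector(ABC):
    """Illustrative five-method interface; signatures are assumptions."""

    @abstractmethod
    def authenticate(self) -> None: ...

    @abstractmethod
    def get_object(self, object_id: str) -> Optional[StixDict]: ...

    @abstractmethod
    def list_objects(self) -> List[StixDict]: ...

    @abstractmethod
    def upsert_object(self, obj: StixDict) -> str: ...

    @abstractmethod
    def delete_object(self, object_id: str) -> bool: ...


class InMemoryConnector(Connector):
    """Toy backend standing in for any real platform connector."""

    def __init__(self) -> None:
        self._objects: Dict[str, StixDict] = {}
        self.authenticated = False

    def authenticate(self) -> None:
        self.authenticated = True

    def get_object(self, object_id: str) -> Optional[StixDict]:
        return self._objects.get(object_id)

    def list_objects(self) -> List[StixDict]:
        return list(self._objects.values())

    def upsert_object(self, obj: StixDict) -> str:
        self._objects[obj["id"]] = dict(obj)
        return obj["id"]

    def delete_object(self, object_id: str) -> bool:
        return self._objects.pop(object_id, None) is not None


def sync(source: Connector, target: Connector) -> int:
    """Copy every object from source to target using only the shared interface."""
    source.authenticate()
    target.authenticate()
    copied = 0
    for obj in source.list_objects():
        target.upsert_object(obj)
        copied += 1
    return copied
```

Because `sync` only touches the abstract methods, swapping either side for a different connector class would not require changing the function.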

Package Structure

gnat/
├── orm/                    STIX 2.1 ORM (Indicator, ThreatActor, Vulnerability, ...)
├── clients/                BaseClient (urllib3), CLIENT_REGISTRY
├── connectors/             159 platform connectors
├── ingest/                 SourceReaders (14), RecordMappers (12), IngestPipeline
├── export/                 ExportFilters, Transforms (EDL, Netskope), Delivery, ExportJob
├── schedule/               FeedJob, FeedScheduler, APScheduler/Celery adapters
├── context/                Workspace, WorkspaceManager, GlobalContextRegistry, stores
│   └── tenant.py           Multi-tenant workspace isolation (TenantRegistry)
├── search/                 Solr search sidecar (GNATIndexer, SearchMixin, ORM/pipeline integration)
├── agents/                 ResearchAgent, ParsingAgent, CopilotReader, ClaudeClient
│   └── health_monitor.py   ConnectorHealthJob — periodic health + API schema drift detection
├── research/               ResearchLibrary, ResearchEntry, CurationJob
├── reports/                ReportGenerator, ReportJob, 4 renderers, 2 delivery targets
├── viz/                    TabularView, GraphView (3 layout algorithms), GrafanaServer, sigma.js
├── nlp/                    NLPQueryEngine, QuerySpec, builtin + Claude backends
├── tui/                    GNATApp (Textual 8.x), 4 screens, STIXTable/JobTable widgets
├── serve/                  Web dashboard (FastAPI, X-Api-Key auth, rate limiting)
│   └── taxii/              TAXII 2.1 server (workspaces as collections)
├── stix/                   STIX pattern validator (validate_pattern, PatternValidationError)
├── async_client/           AsyncBaseClient, AsyncGNATClient (httpx)
└── codegen/                OpenAPI connector scaffold + XSOAR content pack generator
    └── contribute.py       ContributionPipeline (7-step gate + draft PR)

Connector Map

Platform	Module	Auth	Read	Write	Status
AlienVault OTX	alienvault	API key	✓	—	Stable
Armis Centrix	armis	Secret key	✓	—	Stable
AWS Security Hub / GuardDuty	aws_security	AWS SigV4	✓	—	Stable
Axonius	axonius	API key + secret	✓	—	Stable
BitSight Security Ratings	bitsight	API token	✓	—	Stable
VMware Carbon Black Cloud	carbon_black	API key (composite)	✓	—	Stable
Censys Internet Intelligence	censys	API ID + secret	✓	—	Stable
OpenAI ChatGPT	chatgpt	API key	✓	—	LLM
CISA KEV Catalog	CISA	None (public)	✓	—	Stable
Cisco Umbrella	cisco_umbrella	Bearer / API key	✓	—	Stable
Claroty Platform (OT/IoT)	claroty	API token	✓	—	Stable
CloudSEK Digital Risk	cloudsek	API key	✓	—	Stable
MS Copilot for Security	copilot	DirectLine / Bearer	✓	—	LLM
Palo Alto Cortex XDR / XSIAM	cortex_xdr	HMAC API key pair	✓	—	Stable
Cortex Xpanse (External ASM)	cortex_xpanse	API key	✓	—	Stable
Cribl Stream	cribl	Bearer	✓	—	Stable
CrowdStrike Falcon	crowdstrike	OAuth2	✓	✓	Stable
Cyble Vision	cyble_vision	API key	✓	—	Stable
CyCognito ASM	cycognito	Bearer	✓	—	Stable
Darktrace Enterprise	darktrace	HMAC public/private	✓	—	Stable
Datadog Cloud SIEM	datadog	API key + App key	✓	—	Stable
DefectDojo	defectdojo	API token	✓	✓	Stable
MS Defender Threat Intelligence	defenderti	OAuth2 (Azure AD)	✓	—	Stable
Discord	discord	Bot token	✓	✓	Beta
Dragos Platform (OT/ICS)	dragos	Basic (key+secret)	✓	—	Stable
Elastic SIEM	elastic	API key / Basic	✓	✓	Stable
ExtraHop Reveal(x) NDR	extrahop	API key / OAuth2	✓	—	Stable
Feedly Threat Intelligence	feedly	OAuth2 / API key	✓	—	Stable
Flare (Darknet/Threat Exposure)	flare	Bearer	✓	—	Stable
Flashpoint Underground	flashpoint	Bearer	✓	—	Stable
Fortinet FortiEDR	fortiedr	Basic	✓	—	Stable
Fortinet FortiSIEM	fortisiem	Basic	✓	—	Stable
Fortinet FortiSOAR	fortisoar	JWT / Basic	✓	✓	Stable
Google Gemini	gemini	API key	✓	—	LLM
Google Chronicle (SecOps SIEM)	google_chronicle	OAuth2 / Svc account	✓	—	Stable
Graylog	graylog	API key / Basic	✓	—	Stable
Greenbone / OpenVAS	greenbone	GMP username/password	✓	—	Stable
GreyMatter	greymatter	API key	✓	✓	Stable
GreyNoise	greynoise	API key	✓	—	Stable
Grok AI	grok	API key	✓	—	LLM
Group-IB Threat Intelligence	group_ib	API token	✓	—	Stable
Have I Been Pwned (HIBP)	hibp	API key	✓	—	Stable
Hudson Rock Breach Intelligence	hudsonrock	API key	✓	—	Stable
Intel 471 Cybercrime Intel	intel471	Bearer	✓	—	Stable
Atlassian Jira	jira	Basic / Bearer	✓	✓	Stable
JupiterOne (CAASM)	jupiterone	Bearer	✓	—	Stable
Lansweeper IT Asset Management	lansweeper	OAuth2 / API key	✓	—	Stable
LogRhythm NextGen SIEM	logrhythm	Bearer	✓	—	Stable
Mandiant Advantage	mandiant	OAuth2	✓	—	Stable
MISP Threat Sharing	misp	API key	✓	✓	Stable
Netskope SASE / SSE	netskope	API token	✓	✓	Stable
Nozomi Networks Guardian	nozomi	API token	✓	—	Stable
Nucleus Security	nucleus	API key	✓	✓	Stable
OpenCTI	opencti	API key	✓	✓	Stable
Orca Security (CNAPP)	orca	Bearer	✓	—	Stable
OSINT Feed (TAXII / STIX-JSON)	osint_feed	Various	✓	—	Stable
OSSIM	ossim	Basic auth	✓	—	Stable
Palo Alto Prisma Cloud	prisma_cloud	JWT (access key)	✓	—	Stable
Proofpoint TAP	proofpoint	Basic auth	✓	—	Stable
PulseDive	pulsedive	API key	✓	—	Stable
IBM QRadar	qradar	API token	✓	—	Stable
Qualys VMDR	qualys	Basic	✓	—	Stable
Rapid7 InsightVM / IDR	rapid7	API key	✓	Partial	Stable
Recorded Future	recordedfuture	API token	✓	—	Stable
RiskRecon	riskrecon	OAuth2	✓	—	Stable
Security Onion	security_onion	Bearer	✓	—	Stable
SecurityScorecard	securityscorecard	API token	✓	—	Stable
Microsoft Sentinel	sentinel	OAuth2 (Azure AD)	✓	✓	Stable
SentinelOne Singularity XDR	sentinelone	API token	✓	—	Stable
ServiceNow ITSM	servicenow	Basic / Bearer	✓	✓	Stable
ServiceNow SecOps (SIR/VR)	servicenow_secops	Basic / Bearer	✓	✓	Stable
Shadowserver Foundation	shadowserver	HMAC	✓	—	Stable
Shodan	shodan	API key	✓	—	Stable
Snort IDS	snort	File / Syslog	✓	—	Stable
SOCRadar Extended TI	socradar	API key	✓	—	Stable
Sophos Central	sophos	OAuth2	✓	—	Stable
Splunk	splunk	Basic / token	✓	✓	Stable
Stellar Cyber Open XDR	stellarcyber	API key	✓	—	Stable
Suricata IDS/IPS	suricata	File / EVE JSON	✓	—	Stable
Vertex Project Synapse	synapse	API key / Bearer	✓	—	Stable
Tanium Endpoint Management	tanium	API token / session	✓	—	Stable
Tenable One Exposure Mgmt	tenable_one	X-ApiKeys	✓	—	Stable
TheHive Security IR	thehive	API key	✓	✓	Stable
ThreatConnect	threatconnect	OAuth2 / API token	✓	✓	Stable
ThreatQ	threatq	OAuth2	✓	✓	Stable
Anomali ThreatStream (OPTIC)	threatstream	API key + username	✓	✓	Stable
Trellix XDR / ePO	trellix	OAuth2	✓	—	Stable
Trend Micro Vision One XDR	trendmicro_visionone	Bearer	✓	—	Stable
UpGuard Vendor Risk / CAASM	upguard	API key	✓	—	Stable
Vectra AI NDR	vectra	Bearer	✓	—	Stable
VirusTotal	virustotal	API key	✓	—	Stable
Wazuh SIEM/XDR	wazuh	API key / Basic	✓	—	Stable
Whistic (Vendor Risk)	whistic	API key	✓	—	Stable
Wiz CNAPP	wiz	OAuth2	✓	—	Stable
Palo Alto XSOAR	xsoar	API key	✓	✓	Stable
YETI (Everyday Threat Intel)	yeti	API key	✓	✓	Stable
Zeek Network Monitor	zeek	File / TSV-JSON	✓	—	Stable
ZeroFox Digital Risk Protection	zerofox	Bearer	✓	—	Stable
ControlUp DEX	controlup	Bearer	✓	—	Stable

Status legend:

Stable = full ConnectorMixin interface implemented and tested.
LLM = LLM assistant connector that wraps an AI API rather than a security platform; read-only, no STIX write-back.
Beta = connector scaffolded but not yet fully integrated.

Deployment Architecture

Recommended: Single Azure B2ms VM, three systemd services

Why single VM: All workloads are I/O-bound (network calls to APIs, file writes). CPU is idle 95%+ of the time. A B2ms (2 vCPU, 8GB RAM) handles 30+ scheduled feeds, export pipelines, and daily reports without contention.

Three services:

Service 1: gnat-scheduler.service
  — FeedScheduler running all ingest, export, curation, and report jobs
  — Restarts automatically on failure (missed one run = acceptable)

Service 2: gnat-edl.service
  — EDLServer on port 8080 (or behind nginx)
  — Reads pre-written files; completely independent of scheduler
  — Firewalls poll this; uptime matters more than scheduler uptime

Service 3: gnat-monitor.service (optional)
  — Lightweight HTTP endpoint returning scheduler.summary() as JSON
  — Azure Monitor or simple ping check hooks into this

Why separate the EDL server: If the scheduler crashes and restarts (e.g. after a failed API call), the EDL server continues serving the last-written files. Firewalls never see a gap. Keeping them as separate services provides the only fault isolation that actually matters.

Azure specifics

VM size: B2ms or B4ms — burstable, accumulates CPU credits while idle
Storage: Premium SSD P6 (64GB) for OS + workspace store + report output
Networking: Private endpoint preferred — firewalls reach EDL server via Azure VNet private IP, eliminating public IP costs and outbound transfer costs
Cost estimate: B2ms (~$70/mo) + P6 disk (~$10/mo) + private IP = ~$80/mo
Scaling path: If AI agent workloads grow, offload research jobs to Azure Container Instances (per-job cost) rather than upsizing the VM

systemd service template

# /etc/systemd/system/gnat-scheduler.service
[Unit]
Description=GNAT Feed Scheduler
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=ctmsak
WorkingDirectory=/opt/gnat
ExecStart=/opt/gnat/venv/bin/python -m gnat.scheduler_main
Restart=on-failure
RestartSec=30
Environment=GNAT_CONFIG=/etc/gnat/config.ini

[Install]
WantedBy=multi-user.target

Data Flow

Ingest flow

External Source (ThreatQ, RF, Feedly, TAXII, CSV, ...)
  ↓  SourceReader._iter_records()
  ↓  RawRecord (dict)
  ↓  RecordMapper.map()
  ↓  STIXBase objects
  ↓  DeduplicationCache (optional)
  ↓  Workspace.add() or GNATClient.upsert_object()
  ↓  IngestResult (total_records, written_objects, errors)
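The ingest stages above can be sketched with plain functions. This is an illustration under stated assumptions, not the IngestPipeline implementation: `iter_records`, `map_record`, and `run_ingest` are hypothetical names, the workspace is a plain list, and deduplication keys on the STIX pattern string. Only the three `IngestResult` field names are taken from the flow.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Set


@dataclass
class IngestResult:
    total_records: int = 0
    written_objects: int = 0
    errors: List[str] = field(default_factory=list)


def iter_records(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Reader stage: yield raw records one at a time."""
    yield from rows


def map_record(record: Dict[str, str]) -> Dict[str, str]:
    """Mapper stage: raw dict -> STIX-like indicator dict (KeyError if 'ip' is absent)."""
    return {
        "type": "indicator",
        "pattern_type": "stix",
        "pattern": f"[ipv4-addr:value = '{record['ip']}']",
    }


def run_ingest(rows: Iterable[Dict[str, str]], workspace: List[dict]) -> IngestResult:
    result = IngestResult()
    seen: Set[str] = set()  # stands in for the optional deduplication cache
    for record in iter_records(rows):
        result.total_records += 1
        try:
            obj = map_record(record)
        except KeyError as exc:
            # A bad record is reported but does not abort the run.
            result.errors.append(f"missing field {exc}")
            continue
        if obj["pattern"] in seen:
            continue
        seen.add(obj["pattern"])
        workspace.append(obj)
        result.written_objects += 1
    return result
```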

Export flow

Workspace or GNATClient
  ↓  ExportFilter.apply() (type, confidence, TLP, sector, ...)
  ↓  ExportTransform.transform() (EDL text, Netskope CE JSON, STIX bundle, CSV)
  ↓  ExportDelivery.deliver() (file, HTTP, EDL server, platform, email, SP)
  ↓  DeliveryResult
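A compact sketch of the filter, transform, and deliver stages follows. The function names are hypothetical, the transform handles only simple IPv4 patterns, and the write-then-rename delivery is an assumption about how a file target might avoid exposing half-written lists to pollers. It is not a description of GNAT's ExportDelivery.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

_IPV4 = re.compile(r"\[ipv4-addr:value = '([^']+)'\]")


def apply_filter(objects: Iterable[Dict], min_confidence: int) -> List[Dict]:
    """Keep indicators at or above the confidence threshold."""
    return [
        o for o in objects
        if o.get("type") == "indicator" and o.get("confidence", 0) >= min_confidence
    ]


def to_edl(objects: Iterable[Dict]) -> str:
    """Render one value per line, preserving order and dropping duplicates."""
    values: List[str] = []
    for obj in objects:
        match = _IPV4.fullmatch(obj.get("pattern", ""))
        if match and match.group(1) not in values:
            values.append(match.group(1))
    return "\n".join(values) + ("\n" if values else "")


def deliver_to_file(text: str, path: Path) -> int:
    """Write to a temporary sibling, then rename over the target."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(text, encoding="utf-8")
    tmp.replace(path)
    return len(text.splitlines())
```

Writing to a sibling file and renaming means a separate file server process only ever sees the previous list or the new one.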

AI research flow

Topic / Monitored URL list
  ↓  ResearchAgent._iter_records() [Claude + web search]
  ↓  RawRecord {text, url, topic, metadata}
  ↓  ParsingAgent.map() [Claude structured extraction]
  ↓  STIXBase objects (confidence ≤ 60, x_source_type=ai_extracted)
  ↓  Workspace → analyst review
  ↓  ResearchLibrary.promote() → staging
  ↓  CurationJob → library

Report generation flow

Workspace / Library
  ↓  DataAggregator.run() — pure data, no AI
  ↓  ReportAggregates (counts, distributions, top-N lists)
  ↓  ReportSynthesizer.synthesize() — one Claude call per section
  ↓  ReportDocument (ordered sections: data + narrative)
  ↓  Renderer (Markdown, HTML, PDF, or DOCX)
  ↓  Delivery (email SMTP, SharePoint Graph API)

Key Design Decisions

1. urllib3 foundation, no requests

BaseClient uses urllib3 directly. No third-party HTTP library dependency beyond urllib3 (already a transitive dep of most Python environments). Connection pooling, retry/backoff, and SSL verification all handled at the BaseClient level — connectors inherit this transparently.

2. STIX as lingua franca

All cross-platform data passes through STIX 2.1 objects. to_stix() converts native platform data to STIX; from_stix() converts back. This means a workflow like “pull from CrowdStrike, enrich with Recorded Future, push to ThreatQ” requires no platform-specific logic in the pipeline — just connector configuration.
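As a rough illustration of the translation pair, the sketch below maps a made-up native record with `ip` and `score` fields to a STIX 2.1 indicator dictionary and back. The native record shape and the module-level functions are assumptions. Real connectors expose to_stix() and from_stix() as methods and handle many more object types.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Any, Dict

_PATTERN = re.compile(r"^\[ipv4-addr:value = '([^']+)'\]$")


def _now() -> str:
    # STIX timestamps are RFC 3339 strings in UTC with a trailing "Z".
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


def to_stix(native: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical native record -> STIX 2.1 indicator dict."""
    now = _now()
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "valid_from": now,
        "pattern_type": "stix",
        "pattern": f"[ipv4-addr:value = '{native['ip']}']",
        "confidence": native["score"],
    }


def from_stix(obj: Dict[str, Any]) -> Dict[str, Any]:
    """STIX indicator dict -> the same hypothetical native shape."""
    match = _PATTERN.match(obj["pattern"])
    if match is None:
        raise ValueError(f"unsupported pattern: {obj['pattern']}")
    return {"ip": match.group(1), "score": obj["confidence"]}
```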

3. INI configuration, no code for deployment changes

Every parameter that varies between deployments (hosts, credentials, TTLs, sectors, report schedules) lives in config.ini. Operators can reconfigure the system without touching Python code. Connector targets can be swapped by editing one line.
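The one-line swap can be illustrated with the standard library's configparser. The section and key names below (`[ingest]`, `source`, `host`, `verify_ssl`) and the `resolve_source` helper are invented for the example and may not match GNAT's real config.ini layout.

```python
import configparser
from typing import Dict, Tuple

SAMPLE_CONFIG = """
[ingest]
source = threatq

[threatq]
host = https://threatq.example.internal
verify_ssl = true

[crowdstrike]
host = https://api.crowdstrike.example
verify_ssl = true
"""


def resolve_source(text: str) -> Tuple[str, Dict[str, str]]:
    """Return the configured source name and that connector's settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    name = parser.get("ingest", "source")
    if not parser.has_section(name):
        raise KeyError(f"no [{name}] section in config")
    return name, dict(parser[name])
```

Changing `source = threatq` to `source = crowdstrike` is the only edit needed for `resolve_source` to return the other connector's settings.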

4. FeedJob / FeedScheduler threading model

One daemon thread per job. Threads sleep in 1-second increments (not one long sleep) so stop() responds within ~1 second. Drift-corrected timing keeps hourly jobs at hourly intervals even when runs take variable time. overlap_policy="skip" (default) prevents queue buildup on slow sources.
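The timing behaviour described here can be approximated as follows. This is a sketch under assumptions, not FeedJob itself: `IntervalJob` and `next_run_time` are hypothetical names, and the short sleeps are done with `threading.Event.wait` instead of `time.sleep`, which keeps the short-increment idea but wakes immediately when stop is requested. Scheduling against a fixed anchor gives drift correction, and jumping to the next slot after "now" is one way to get skip-on-overlap behaviour.

```python
import threading
import time
from typing import Callable


def next_run_time(anchor: float, interval: float, now: float) -> float:
    """Return the first slot on the anchor grid that lies strictly after `now`."""
    slots_elapsed = int((now - anchor) // interval) + 1
    return anchor + slots_elapsed * interval


class IntervalJob:
    def __init__(self, func: Callable[[], None], interval: float, tick: float = 1.0) -> None:
        self._func = func
        self._interval = interval
        self._tick = tick
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self.runs = 0

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> None:
        self._stop_event.set()
        self._thread.join(timeout)

    def is_running(self) -> bool:
        return self._thread.is_alive()

    def _loop(self) -> None:
        anchor = time.monotonic()
        due = anchor  # first run happens immediately
        while not self._stop_event.is_set():
            if time.monotonic() >= due:
                self._func()
                self.runs += 1
                # Slots missed while the job was running are skipped, not queued.
                due = next_run_time(anchor, self._interval, time.monotonic())
            # Wait in short increments so a stop request is noticed quickly.
            remaining = max(0.0, due - time.monotonic())
            self._stop_event.wait(min(self._tick, remaining))
```

For an hourly job anchored at t=0 whose run finishes at t=3700, `next_run_time(0.0, 3600.0, 3700.0)` yields 7200.0, so the schedule stays on the hour rather than drifting by the run duration.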

5. AI confidence ceiling

All AI-extracted objects carry confidence ≤ ai_confidence_ceiling (default 60) and x_source_type="ai_extracted". In practice:

ConfidenceFilter(min_confidence=70) in export pipelines excludes AI intel by default, so analyst review is required before it reaches EDLs
The x_source_type tag lets analysts find and verify AI-extracted objects
The ceiling should never be raised to 100 without a review process in place
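The interaction between the ceiling and the export threshold can be shown in a few lines. The function names are hypothetical and objects are plain dictionaries. Only the default values (60 and 70) and the `x_source_type` tag come from the text.

```python
from typing import Dict, Iterable, List

AI_CONFIDENCE_CEILING = 60  # default stated above


def tag_ai_extracted(obj: Dict, ceiling: int = AI_CONFIDENCE_CEILING) -> Dict:
    """Clamp confidence to the ceiling and mark the object's provenance."""
    tagged = dict(obj)
    tagged["confidence"] = min(int(obj.get("confidence", ceiling)), ceiling)
    tagged["x_source_type"] = "ai_extracted"
    return tagged


def confidence_filter(objects: Iterable[Dict], min_confidence: int = 70) -> List[Dict]:
    """Keep only objects at or above the export threshold."""
    return [o for o in objects if o.get("confidence", 0) >= min_confidence]
```

With these defaults, an AI-extracted object that the model scored at 95 is clamped to 60 and therefore dropped by the filter until an analyst reviews it and assigns a higher confidence.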

6. Research library three-tier model

Personal workspaces → staging → curated library. Analysts never write directly to the library. The CurationJob is the only thing that promotes staging entries to the library, providing a consistent curation gate. Deduplication keeps the library pruned to one authoritative entry per topic (most recent wins). TTLs per category ensure stale research is clearly flagged.
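The "most recent wins" rule and per-category TTLs can be sketched as below. The entry fields (`topic`, `category`, `updated`) and both function names are assumptions for illustration. The real ResearchEntry and CurationJob may be shaped differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable


def curate(staging: Iterable[Dict]) -> Dict[str, Dict]:
    """Keep one entry per topic, preferring the most recently updated."""
    library: Dict[str, Dict] = {}
    for entry in staging:
        current = library.get(entry["topic"])
        if current is None or entry["updated"] > current["updated"]:
            library[entry["topic"]] = entry
    return library


def is_stale(entry: Dict, ttl_days: Dict[str, int], now: datetime) -> bool:
    """Flag an entry whose age exceeds the TTL configured for its category."""
    return now - entry["updated"] > timedelta(days=ttl_days[entry["category"]])
```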

7. Report generation two-pass model

Aggregation (pure data, no AI) runs first. AI synthesis receives compact structured aggregates, not raw STIX blobs. This keeps prompts small and focused, makes individual section failures recoverable, and ensures reports can be generated with ai_mode=NONE without any API dependency.
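A minimal sketch of the two passes follows. The function names, the section names, and the `llm` callable are hypothetical, and any `ai_mode` value other than "NONE" is treated here simply as "AI enabled". The aim is only to show that aggregation needs no AI and that a failed section leaves the rest of the report intact.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Optional


def aggregate(objects: Iterable[Dict], top_n: int = 3) -> Dict[str, Any]:
    """Pass 1: pure data, no AI."""
    objects = list(objects)
    by_type = Counter(o["type"] for o in objects)
    sectors = Counter(s for o in objects for s in o.get("x_target_sectors", []))
    return {
        "total": len(objects),
        "by_type": dict(by_type),
        "top_sectors": sectors.most_common(top_n),
    }


def synthesize(
    aggregates: Dict[str, Any],
    ai_mode: str = "NONE",
    llm: Optional[Callable[[str, Any], str]] = None,
) -> List[Dict[str, Any]]:
    """Pass 2: one optional narrative call per section."""
    sections = []
    for name in ("by_type", "top_sectors"):
        narrative = None
        if ai_mode != "NONE" and llm is not None:
            try:
                narrative = llm(name, aggregates[name])
            except Exception:
                # A failed section keeps its data and simply has no narrative.
                narrative = None
        sections.append({"name": name, "data": aggregates[name], "narrative": narrative})
    return sections
```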

Sector / Industry Normalization (PENDING)

The canonical field for sector data across all GNAT objects is x_target_sectors — a list of strings on any STIX object.

Status: Placeholder in ThreatQ connector. Connector needs updating once ThreatQ field names are verified (see CHANGELOG.md for completed work).

Architecture:

ThreatQ API response
  ↓  ThreatQClient.to_stix()
  ↓  x_target_sectors = [normalize(v) for v in tq_industries + tq_sectors]

[sector_aliases] in config.ini
  "healthcare = Healthcare, Health, Medical, H-ISAC, ..."
  ↓  SectorFilter._matches_sector()
  ↓  Alias-expanded matching across all platforms
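Alias expansion of this kind could be implemented along the following lines. This is a sketch with assumed details: the second alias line, the lowercase canonical names, and the `load_aliases` and `matches_sector` helpers are illustrative, and the stand-alone `normalize` takes the lookup table as an explicit argument.

```python
import configparser
from typing import Dict

SAMPLE_CONFIG = """
[sector_aliases]
healthcare = Healthcare, Health, Medical, H-ISAC
finance = Financial Services, Banking, FS-ISAC
"""


def load_aliases(text: str) -> Dict[str, str]:
    """Build a lowercase alias -> canonical sector lookup from the INI section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    lookup: Dict[str, str] = {}
    for canonical, raw in parser["sector_aliases"].items():
        lookup[canonical.lower()] = canonical
        for alias in raw.split(","):
            lookup[alias.strip().lower()] = canonical
    return lookup


def normalize(value: str, lookup: Dict[str, str]) -> str:
    """Map a platform-specific label to its canonical sector; unknowns pass through lowercased."""
    key = value.strip().lower()
    return lookup.get(key, key)


def matches_sector(obj: Dict, wanted: str, lookup: Dict[str, str]) -> bool:
    target = normalize(wanted, lookup)
    return any(normalize(v, lookup) == target for v in obj.get("x_target_sectors", []))
```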

Testing

Test suite: 3,071+ unit tests across 44 test files (plus connector-embedded tests).

# Run all tests
pytest tests/

# Run by module
pytest tests/unit/connectors/
pytest tests/unit/ingest/
pytest tests/unit/export/
pytest tests/unit/schedule/
pytest tests/unit/agents/
pytest tests/unit/research/
pytest tests/unit/reports/

# Run with coverage
pytest --cov=gnat tests/

All tests use unittest.mock — no live API calls, no network required. New connectors follow the pattern in tests/unit/connectors/test_connectors.py.
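For readers unfamiliar with the approach, a mocked connector test might look like the sketch below. `TinyClient` is a hypothetical stand-in, not a GNAT class, and the actual pattern in test_connectors.py may differ. The injected `http` object is assumed to behave like a urllib3 PoolManager, whose `request(method, url)` returns a response with `status` and `data` attributes.

```python
import json
import unittest
from unittest import mock


class TinyClient:
    """Hypothetical client; `http` is any object with a PoolManager-like request()."""

    def __init__(self, base_url, http):
        self._base_url = base_url.rstrip("/")
        self._http = http

    def get_object(self, object_id):
        response = self._http.request("GET", f"{self._base_url}/objects/{object_id}")
        if response.status != 200:
            return None
        return json.loads(response.data.decode("utf-8"))


class TinyClientTest(unittest.TestCase):
    def setUp(self):
        # The mock replaces the HTTP layer, so no network access is needed.
        self.http = mock.Mock()
        self.client = TinyClient("https://platform.example/api/", self.http)

    def test_returns_parsed_body(self):
        self.http.request.return_value = mock.Mock(status=200, data=b'{"id": "indicator--1"}')
        self.assertEqual(self.client.get_object("indicator--1"), {"id": "indicator--1"})
        self.http.request.assert_called_once_with(
            "GET", "https://platform.example/api/objects/indicator--1"
        )

    def test_missing_object_returns_none(self):
        self.http.request.return_value = mock.Mock(status=404, data=b"")
        self.assertIsNone(self.client.get_object("nope"))
```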

Dependencies

Core (no extras required)

Python 3.9+
urllib3 — HTTP client foundation
Standard library only for: STIX ORM, INI config, schedule, HMAC signing, JSON handling, email delivery

Optional extras

pip install "gnat[async]"           # httpx — AsyncGNATClient
pip install "gnat[taxii]"           # taxii2-client — TAXIICollectionReader
pip install "gnat[rss]"             # feedparser — RSS feed reader
pip install "gnat[ingest]"          # taxii2-client + feedparser — full ingest pipeline
pip install "gnat[persist]"         # sqlalchemy — WorkspaceStore (SQLite/PostgreSQL)
pip install "gnat[viz]"             # plotly, networkx, openpyxl — GraphView, TabularView
pip install "gnat[schedule]"        # croniter — cron expression scheduling
pip install "gnat[reports]"         # reportlab + python-docx — PDF and DOCX rendering
pip install "gnat[serve]"           # fastapi, uvicorn — Web dashboard + TAXII server
pip install "gnat[tui]"             # textual — Terminal UI
pip install "gnat[nlp]"             # NLP query interface (builtin backend; zero extra deps)
pip install "gnat[stix-validate]"   # stix2-patterns — full ANTLR STIX pattern validation
pip install "gnat[fast]"            # gnat-core Rust wheel — accelerated IOC classify/defang
pip install "gnat[all]"             # All of the above (except fast/stix-validate)
pip install "gnat[dev]"             # All + dev tools (ruff, mypy, pytest, bandit)

Full install

pip install "gnat[all]"

Roadmap

Completed ✅

All items from the previous near-term and medium-term roadmap have shipped:

Item	Status
ThreatQ / RF / CrowdStrike sector normalization	✅ Done
`SectorFilter` moved to `gnat/export/filters.py`	✅ Done
`gnat report run` CLI subcommand	✅ Done
Web UI — research library, scheduler, report viewer	✅ Done (#23b)
Terminal UI — 4 screens, NLP query bar	✅ Done (#23a)
Connector health + drift monitoring agent	✅ Done (#24)
Upstream contribution pipeline	✅ Done (#25)
DOCX rendering (python-docx, no Node.js)	✅ Done
Docker containerization (`docker/`, `docker-compose.yml`)	✅ Done (#22)
XSOAR content pack generator	✅ Done (#21)
NLP query interface	✅ Done (#18)
Client capability reflection	✅ Done (#19)
TAXII 2.1 server (`gnat/serve/taxii/`)	✅ Done
STIX pattern validator (`gnat/stix/`)	✅ Done
Docker-based integration test harness	✅ Done
Solr/Grafana observability (`gnat/viz/grafana/`)	✅ Done
Multi-tenant workspace isolation (`gnat/context/tenant.py`)	✅ Done
Rust native extension (`rust_core/`, `gnat[fast]`)	✅ Done
Expanded connector coverage (159 connectors)	✅ Done

In progress / Near term

All near-term items have shipped. See completed list above.

Medium term

STIX 2.1 full object-level validation against official spec
Analyst workflow UI — structured review/approval queue before promoting AI-extracted intel to production workspaces
Discord connector full integration (currently Beta)

Long term

Federated multi-GNAT deployments with cross-instance workspace sync

Licensed under the Apache License, Version 2.0