
Architecture overview

SandGNAT is a malware runtime-analysis sandbox. It takes a binary, detonates it inside an isolated Windows VM, captures behavioural artifacts (registry, filesystem, network, processes), and emits STIX 2.1 into PostgreSQL. A pre-detonation Linux static-analysis stage clusters new submissions against known ones and skips detonation when a new sample is a near-duplicate of something already analysed.

This page is the anchor for the “how does it fit together” question. For the canonical design-of-record see MALWARE_ANALYSIS_SYSTEM_DESIGN.md.

Infrastructure topology

flowchart TB
    subgraph Proxmox["Proxmox host"]
        direction TB
        subgraph MgmtBr["Management bridge (vmbr0)"]
            Orchestrator["Job Orchestrator VM<br/>Celery + intake/export API"]
            Postgres["PostgreSQL VM<br/>STIX, jobs, signatures"]
            Redis["Redis VM<br/>Celery broker"]
        end
        subgraph AnalysisBr["Analysis bridge (vmbr.analysis)<br/>no IP on host"]
            Firewall["OPNsense firewall<br/>default-deny"]
            Windows1["Windows VM<br/>vmid 9100..9199"]
            Windows2["Windows VM"]
            Linux1["Linux static VM<br/>vmid 9200..9299"]
            Linux2["Linux static VM"]
            INetSim["INetSim<br/>fake DNS/HTTP"]
        end
        SMB[("SMB / NFS staging share")]
        Quarantine[("Quarantine store<br/>immutable, append-only")]
    end

    User((Analyst)) -->|POST /submit| Orchestrator
    GNAT((GNAT connector)) -->|GET /analyses/...| Orchestrator

    Orchestrator --> Redis
    Orchestrator --> Postgres
    Orchestrator --> SMB
    Orchestrator -.Proxmox API.-> Windows1
    Orchestrator -.Proxmox API.-> Linux1

    Windows1 -.polls staging.-> SMB
    Linux1 -.polls staging.-> SMB
    Windows1 -->|egress| Firewall
    Firewall --> INetSim

    Orchestrator --> Quarantine

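Key isolation points, as shown in the diagram: the analysis bridge (vmbr.analysis) has no IP on the Proxmox host, Windows guest egress passes through the default-deny OPNsense firewall to INetSim (fake DNS/HTTP), and the quarantine store is immutable and append-only.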

See isolation-model.md for the full threat model.
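
The vmid ranges in the topology diagram (9100..9199 for Windows, 9200..9299 for Linux) suggest how a DB-backed lease might hand out guests. The sketch below is only an illustration of that idea: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `vm_leases` table and the function names are hypothetical, not taken from vm_pool.py.

```python
import sqlite3
from typing import Iterable, Optional

# vmid ranges as labelled in the topology diagram.
WINDOWS_VMIDS = range(9100, 9200)
LINUX_VMIDS = range(9200, 9300)


def init_leases(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the PRIMARY KEY is what makes a lease exclusive.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vm_leases ("
        " vmid INTEGER PRIMARY KEY,"
        " analysis_id TEXT NOT NULL)"
    )


def acquire_lease(
    conn: sqlite3.Connection, vmids: Iterable[int], analysis_id: str
) -> Optional[int]:
    """Claim the first free vmid in `vmids`; return None if all are leased."""
    for vmid in vmids:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO vm_leases (vmid, analysis_id) VALUES (?, ?)",
                    (vmid, analysis_id),
                )
            return vmid
        except sqlite3.IntegrityError:
            continue  # already leased by another job, try the next vmid
    return None


def release_lease(conn: sqlite3.Connection, vmid: int) -> None:
    """Free a vmid so the next task can claim it."""
    with conn:
        conn.execute("DELETE FROM vm_leases WHERE vmid = ?", (vmid,))
```

Letting a uniqueness constraint decide who wins means two workers racing for the same vmid cannot both succeed, which is the property a lease needs regardless of which database enforces it.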

Pipeline shape

A single submission traverses four stages: intake validates and stages the sample; static analysis, when enabled, runs next; detonation captures dynamic behaviour; and export exposes the result to downstream consumers.

flowchart LR
    Start([POST /submit]) --> Validate[Validate size + hash + dedupe]
    Validate -->|reject| Rejected([400 rejected])
    Validate -->|duplicate| Duplicate([200 duplicate])
    Validate --> VT[VT hash lookup]
    VT --> YARA[YARA scan]
    YARA --> Insert[(Insert analysis_jobs)]
    Insert --> Stage[Stage bytes to SMB share]
    Stage --> Enqueue{Static enabled?}
    Enqueue -->|no| Detonate
    Enqueue -->|yes| Static[static_analyze_sample<br/>Linux pool]
    Static --> Trigrams[Compute byte + opcode MinHash]
    Trigrams --> Similar[LSH similarity lookup]
    Similar -->|≥ threshold| ShortCircuit[Mark near-duplicate<br/>link lineage]
    Similar -->|< threshold| Detonate[analyze_malware_sample<br/>Windows pool]
    ShortCircuit --> Done
    Detonate --> Parse[Parse ProcMon + RegShot + PCAP]
    Parse --> STIX[Build STIX 2.1 bundle]
    STIX --> Persist[(Persist to Postgres)]
    Persist --> Quarantine[Move dropped files to quarantine]
    Quarantine --> Evasion[Detect anti-VM behaviour<br/>evasion_detector.py]
    Evasion --> Done([status=completed<br/>evasion_observed set])

    Done -.GET /analyses/id/bundle.-> GNATConsumer((GNAT connector))
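
The first branch of the pipeline (validate size + hash + dedupe) can be sketched as a pure function. This is a minimal illustration, not the code in intake.py: the 64 MiB limit, the use of SHA-256, the rejection of empty input, and the `IntakeDecision` type are all assumptions. The status codes follow the diagrams on this page (400 rejected, 200 duplicate, 202 accepted).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Set

MAX_SAMPLE_BYTES = 64 * 1024 * 1024  # assumed limit, not from the source


@dataclass(frozen=True)
class IntakeDecision:
    http_status: int
    outcome: str  # "rejected", "duplicate" or "accepted"
    sha256: Optional[str]


def validate_submission(
    data: bytes, known_sha256: Set[str], max_bytes: int = MAX_SAMPLE_BYTES
) -> IntakeDecision:
    """Size check first, then hash, then exact-duplicate lookup."""
    if not data or len(data) > max_bytes:
        return IntakeDecision(400, "rejected", None)
    digest = hashlib.sha256(data).hexdigest()
    if digest in known_sha256:
        return IntakeDecision(200, "duplicate", digest)
    return IntakeDecision(202, "accepted", digest)
```

Exact-hash dedupe at intake is cheap and catches byte-identical resubmissions; the near-duplicate detection later in the pipeline handles samples that differ by only a few bytes.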

Component model

flowchart LR
    subgraph Host["Orchestrator host"]
        IntakeAPI["intake_api.py<br/>Flask: POST /submit, GET /jobs/id"]
        ExportAPI["export_api.py<br/>Flask blueprint: GET /analyses/*"]
        Intake["intake.py<br/>validate -> hash -> dedupe -> VT -> YARA"]
        TasksStatic["tasks_static.py<br/>Celery static_analyze_sample"]
        TasksDetonation["tasks.py<br/>Celery analyze_malware_sample"]
        GuestDriver["guest_driver.py<br/>submit_job / wait_for_result"]
        VmPool["vm_pool.py<br/>DB-backed lease"]
        Analyzer["analyzer.py<br/>artifacts -> STIX"]
        StaticAnalysis["static_analysis.py<br/>envelope -> bundle"]
        Similarity["similarity.py<br/>LSH lookup + decision"]
        Persistence["persistence.py<br/>all SQL lives here"]
        StixBuilder["stix_builder.py<br/>factories + UUIDv5 IDs"]
    end

    subgraph Shared["Shared wire schema (stdlib only)"]
        Schema["schema.py<br/>JobManifest / ResultEnvelope"]
        Trigrams["trigrams.py<br/>byte/opcode trigrams + MinHash"]
    end

    subgraph WinGuest["Windows guest (PyInstaller)"]
        WinWatcher["watcher.py"]
        WinRunner["runner.py"]
        WinCapture["capture/procmon, tshark, regshot, dropped"]
    end

    subgraph LinGuest["Linux static-analysis guest"]
        LinWatcher["watcher.py"]
        LinRunner["runner.py"]
        LinTools["tools/pe_elf, fuzzy, strings_entropy,<br/>yara_deep, capa, disasm_trigrams"]
    end

    IntakeAPI --> Intake
    IntakeAPI --> ExportAPI
    Intake --> Persistence
    Intake --> TasksStatic
    Intake --> TasksDetonation
    TasksStatic --> GuestDriver
    TasksStatic --> VmPool
    TasksStatic --> StaticAnalysis
    TasksStatic --> Similarity
    TasksStatic --> Persistence
    TasksDetonation --> GuestDriver
    TasksDetonation --> VmPool
    TasksDetonation --> Analyzer
    TasksDetonation --> Persistence
    Analyzer --> StixBuilder
    StaticAnalysis --> Trigrams
    Similarity --> Persistence

    GuestDriver -->|JobManifest| Schema
    WinWatcher --> Schema
    LinWatcher --> Schema
    LinTools --> Trigrams

    ExportAPI --> Persistence
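
To make the roles of trigrams.py (byte/opcode trigrams + MinHash) and similarity.py (LSH lookup + decision) concrete, here is a stdlib-only sketch of the general technique. It is not the project's implementation: the signature length, band count, 0.85 threshold, and the choice of salted BLAKE2b as the hash family are all assumptions made for illustration.

```python
import hashlib
from typing import List, Set, Tuple

NUM_HASHES = 64  # assumed signature length
BANDS = 16       # assumed: 16 bands of 4 rows each


def byte_trigrams(data: bytes) -> Set[bytes]:
    """Every overlapping 3-byte window in the sample."""
    return {data[i:i + 3] for i in range(len(data) - 2)}


def _hash(seed: int, item: bytes) -> int:
    # Salted 64-bit hash; deterministic across processes, unlike built-in hash().
    h = hashlib.blake2b(item, digest_size=8, salt=seed.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


def minhash(items: Set[bytes], num_hashes: int = NUM_HASHES) -> List[int]:
    """One minimum per hash function; equal minima estimate Jaccard overlap."""
    if not items:
        raise ValueError("cannot MinHash an empty set")
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimate_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def lsh_bands(sig: List[int], bands: int = BANDS) -> List[Tuple[int, Tuple[int, ...]]]:
    """Split a signature into (band index, rows) keys for candidate lookup."""
    rows = len(sig) // bands
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]


def is_near_duplicate(sig: List[int], other: List[int], threshold: float = 0.85) -> bool:
    return estimate_jaccard(sig, other) >= threshold
```

In a layout like this, two samples become candidates when they share at least one band key, and only those candidates pay for the full signature comparison.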

Happy-path sequence (detonation)

sequenceDiagram
    participant U as User / Upstream
    participant API as intake_api
    participant PG as Postgres
    participant Q as Redis
    participant ST as static task
    participant DT as detonation task
    participant SMB as staging share
    participant LG as Linux guest
    participant WG as Windows guest

    U->>API: POST /submit (bytes)
    API->>API: validate, hash, dedupe, VT, YARA
    API->>PG: INSERT analysis_jobs (status=queued)
    API->>SMB: write samples/{id}/name
    API->>Q: enqueue static_analyze_sample
    API-->>U: 202 {analysis_id, priority, ...}

    Q->>ST: static_analyze_sample(id)
    ST->>PG: acquire linux vmid lease
    ST->>SMB: publish manifest (mode=static_analysis)
    LG->>SMB: claim job, read bytes
    LG->>LG: pefile, ssdeep, YARA, CAPA, trigrams
    LG->>SMB: write static_analysis.json + trigrams + result.json
    ST->>SMB: poll for result.json
    ST->>PG: persist static_analysis, sample_trigrams, bands
    ST->>PG: LSH candidate fetch + Jaccard
    alt Jaccard >= threshold
        ST->>PG: mark near_duplicate_of
        ST-->>Q: done, no detonation
    else Jaccard < threshold
        ST->>Q: enqueue analyze_malware_sample
    end

    Q->>DT: analyze_malware_sample(id)
    DT->>PG: acquire windows vmid lease
    DT->>SMB: publish manifest (mode=detonation)
    WG->>SMB: claim job
    WG->>WG: RegShot baseline, start ProcMon + tshark
    WG->>WG: execute sample with timeout
    WG->>WG: collect dropped files, RegShot diff
    WG->>SMB: write artifacts + result.json
    DT->>SMB: poll for result.json
    DT->>DT: parse + analyze -> STIX bundle
    DT->>PG: persist STIX + normalised rows
    DT->>PG: update analysis_jobs (status=completed)
    DT->>PG: release vmid lease
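
The host side of the filesystem handshake in the sequence above (publish a manifest, then poll for result.json) could look roughly like this. The names `submit_job` and `wait_for_result` come from the component model, but their signatures here, the `JobManifest` fields, and the `jobs/<id>/` directory layout are hypothetical. The sketch also assumes the guest writes result.json atomically; `os.replace` is atomic on a local POSIX filesystem, and behaviour over SMB or NFS is not guaranteed by this example.

```python
import json
import os
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class JobManifest:
    analysis_id: str
    mode: str         # "static_analysis" or "detonation"
    sample_name: str
    timeout_s: int


def submit_job(staging: Path, manifest: JobManifest) -> Path:
    """Publish a manifest where a polling guest can claim it."""
    job_dir = staging / "jobs" / manifest.analysis_id  # hypothetical layout
    job_dir.mkdir(parents=True, exist_ok=True)
    tmp = job_dir / "manifest.json.tmp"
    tmp.write_text(json.dumps(asdict(manifest)))
    # Rename into place so the guest never sees a half-written manifest.
    os.replace(tmp, job_dir / "manifest.json")
    return job_dir


def wait_for_result(
    job_dir: Path, timeout_s: float, poll_s: float = 1.0
) -> Optional[dict]:
    """Poll for result.json; return its contents, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    result = job_dir / "result.json"
    while True:
        if result.exists():
            return json.loads(result.read_text())
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```

Keeping the protocol to plain files means the guest needs no inbound network listener, which fits the isolation picture drawn in the topology diagram.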

Request-time dependencies

Submissions are synchronous only up to the enqueue. A full detonation takes minutes (queue wait, VM boot, execution timeout, capture export), so the client polls GET /jobs/<id> or GET /analyses/<id> until status becomes completed or failed.

| Step | Typical latency | Blocking? |
| --- | --- | --- |
| POST /submit | 50–500 ms | yes |
| Static stage | 15–120 s | no (async) |
| Windows VM boot + detonation | 3–10 min | no (async) |
| Artifact export to SMB | 5–30 s | no (async) |
| STIX persist + quarantine | 1–5 s | no (async) |
| Bundle fetch | 20–200 ms | yes |
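
A client-side polling loop for the asynchronous stages might look like the sketch below. The default timeout is simply the sum of the upper bounds of the async rows in the latency table (120 s + 10 min + 30 s + 5 s = 755 s) and does not include queue wait, so treat it as a starting point. The JSON response shape with a `status` field is an assumption, as are the function names.

```python
import json
import time
import urllib.request
from typing import Callable

# Sum of the upper bounds of the asynchronous rows, in seconds.
WORST_CASE_S = 120 + 10 * 60 + 30 + 5  # 755

TERMINAL = {"completed", "failed"}


def fetch_status(base_url: str, analysis_id: str) -> str:
    """GET /jobs/<id>; assumes a JSON body carrying a `status` field."""
    url = f"{base_url}/jobs/{analysis_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["status"]


def poll_until_done(
    get_status: Callable[[], str],
    timeout_s: float = WORST_CASE_S,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `get_status` until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

A caller would typically wire the two together with something like `poll_until_done(lambda: fetch_status(base_url, analysis_id))`, where `base_url` points at the orchestrator.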

What lives where

| Concern | Module |
| --- | --- |
| HTTP surface | intake_api.py, export_api.py |
| Input validation + prioritisation | intake.py |
| VT + YARA pre-checks | vt_client.py, yara_scanner.py |
| Celery tasks | tasks.py, tasks_static.py |
| VM pool | vm_pool.py |
| Proxmox API calls | proxmox_client.py |
| Host ↔ guest filesystem protocol | guest_driver.py |
| Wire schema (shared) | schema.py |
| Artifact parsers (pure) | parsers/*.py, static_analysis.py |
| STIX factories | stix_builder.py |
| Similarity engine | similarity.py, trigrams.py |
| Anti-analysis mitigations | guest_agent/activity/, guest_agent/stealth/ |
| Evasion detection (post-run) | evasion_detector.py |
| All SQL | persistence.py |

This split matters: parsers and STIX factories are pure (no DB, no network) so they’re trivially unit-testable; Celery tasks glue pure code to the real world.
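
As an illustration of why pure factories are easy to test, here is a sketch of a deterministic, UUIDv5-based STIX ID helper in the spirit of what the component model attributes to stix_builder.py. The namespace value, the function names, and the choice of the SHA-256 digest as the natural key are assumptions. This is a simplified scheme, not the deterministic-ID procedure defined in the STIX 2.1 specification.

```python
import uuid

# Hypothetical project namespace; any fixed UUID works as long as it never changes.
SANDGNAT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/sandgnat")


def stix_id(object_type: str, natural_key: str) -> str:
    """Same type + key always yields the same ID, so re-runs do not duplicate rows."""
    return f"{object_type}--{uuid.uuid5(SANDGNAT_NAMESPACE, natural_key)}"


def file_object(sha256: str, name: str) -> dict:
    """Pure factory: no DB, no network, output depends only on the arguments."""
    return {
        "type": "file",
        "spec_version": "2.1",
        "id": stix_id("file", sha256),
        "hashes": {"SHA-256": sha256},
        "name": name,
    }
```

Because the output depends only on the inputs, a unit test needs no fixtures: call the factory twice with the same hash and compare the IDs.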