Malware Runtime Analysis Environment — Design & Implementation Plan
Document Version: 1.0
Date: April 2026
Purpose: Comprehensive design specification for an automated malware detonation and artifact analysis system built on Proxmox with STIX 2.1 output and Postgres persistence.
Executive Summary
This document outlines a production-grade malware runtime analysis system that automates the detonation of suspicious binaries in isolated Windows virtual machines, captures behavioral artifacts, and exports findings as structured STIX 2.1 objects stored in PostgreSQL. The system enforces isolation at the infrastructure level using Proxmox network bridges, integrates industry-standard monitoring tools (ProcMon, RegShot, Wireshark, CaptureBAT), and provides an extensible job queue for scaling analysis capacity.
Key Design Principles:
- Isolation-by-default: Separate network bridge for analysis VMs, OPNsense/pfSense firewall control
- Snapshot-based reset: Clean state guaranteed between analyses via VM snapshots
- Comprehensive artifact capture: Registry deltas, file I/O, network traffic, process chains
- STIX 2.1 native: All behavioral findings modeled as STIX Malware, Indicator, File, Process, and NetworkTraffic objects
- Postgres source of truth: All analysis results, metadata, and audit trails stored in Postgres; search sidecar optional
- Minimal operational complexity: Infrastructure-as-code approach with Proxmox API, Python job orchestration
Architecture Overview
1. Infrastructure Topology
┌─────────────────────────────────────────────────────────────────┐
│ PROXMOX HOST (KVM) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────┐ ┌──────────────────────┐ │
│ │ Management Bridge (vmbr0) │ │ Analysis Bridge │ │
│ │ 172.16.0.0/24 │ │ (vmbr.analysis) │ │
│ │ No IP on Proxmox host │ │ 192.168.100.0/24 │ │
│ │ (management isolated) │ │ No IP on Proxmox │ │
│ │ │ │ (untrusted VMs only) │ │
│ └────────────────────────────────┘ └──────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────┘ │
│ │ │
│ ┌────▼─────────────────────────────────────────────────┐ │
│ │ OPNsense/pfSense Firewall VM (on vmbr.analysis) │ │
│ │ - DMZ gateway for analysis network │ │
│ │ - Explicit allow rules (INetSim, DNS control) │ │
│ │ - Default deny egress (kill-switch policy) │ │
│ └────┬─────────────────────────────────────────────────┘ │
│ │ │
│ ┌────▼──────────────────────────────────────────────────┐ │
│ │ Job Orchestrator VM (Debian/Rocky on vmbr.analysis) │ │
│ │ - Python job queue (Celery or RQ) │ │
│ │ - Proxmox API client │ │
│ │ - STIX model layer + Postgres driver │ │
│ │ - Malware sample intake & validation │ │
│ │ - Result aggregation & export │ │
│ └────┬──────────────────────────────────────────────────┘ │
│ │ │
│ ┌────▼──────────────────────────────────────────────────┐ │
│ │ Analysis Guest VMs (Windows 10/11 Hardened) │ │
│ │ [FLARE-VM Template] │ │
│ │ - Minimal attack surface (no AV, bloatware removed) │ │
│ │ - ProcMon, RegShot, Wireshark, CaptureBAT │ │
│ │ - RDP bridge (inbound only, firewall restricted) │ │
│ │ - Dedicated snapshot for revert │ │
│ │ - Dropped file collector agent (Windows Service) │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL VM (Dedicated LVM volume) │ │
│ │ - STIX schema (Malware, Indicator, File, Process, etc.) │ │
│ │ - Analysis metadata tables │ │
│ │ - Full-text index on IOCs │ │
│ │ - Audit trail & versioning │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Quarantine/Storage (Dedicated LVM + NFS) │ │
│ │ - Isolated dropped file repository │ │
│ │ - Immutable archive (append-only) │ │
│ │ - Hash verification on ingestion │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Dual ISP (Xfinity + Verizon 5G)
Host WAN Isolation (no outbound to VMs)
Key Decisions:
- vmbr.analysis: A dedicated Linux bridge with NO IP address on the Proxmox host itself. This ensures VMs cannot reach Proxmox management. All inbound/outbound controlled by OPNsense/pfSense running on the same bridge.
- OPNsense/pfSense: Acts as the firewall and gateway for the analysis network. Rules allow egress only to legitimate analysis sinks (INetSim, DNS control host). Host WAN link blocked to prevent data exfiltration.
- Job Orchestrator: Central controller running Python job queue (Celery + Redis or RQ). Communicates with Proxmox API, coordinates sample intake, VM lifecycle, and result aggregation.
- PostgreSQL: Dedicated VM with its own LVM logical volume. Stores all STIX objects, analysis metadata, and search indices.
- Quarantine Storage: Separate NFS/SMB volume for dropped file retention, immutable audit trail.
2. Analysis VM Configuration (FLARE-VM Template)
2.1 Base Image & Hardening
Template Spec:
- OS: Windows 10 22H2 or Windows 11 (patched)
- RAM: 4–8 GB (per analysis capacity plan)
- vCPU: 2–4 cores
- Disk: 60–80 GB (FLARE-VM full installation ~60 GB)
- Network: vmbr.analysis (bridged, 192.168.100.0/24)
- Storage: Local or cluster LVM
Installed Tools (FLARE-VM Kit):
- ProcMon (Sysinternals): Real-time file, registry, process, network monitoring. Filter by target PID to reduce noise. Export to CSV/XML for STIX ingestion.
- Process Explorer (Sysinternals): Process tree, DLL inspection, memory dump capability.
- RegShot: Registry snapshot before/after malware execution. Output: .txt diff (parsed into STIX windows-registry-key and file objects).
- Wireshark: Packet capture (bridged mode). Output: .pcap for network IOC extraction.
- CaptureBAT (now superseded by Cuckoo agent, but still useful): File I/O capture, registry logging, produces structured logs.
- INetSim (on firewall VM): Fake DNS/HTTP/FTP to prevent real C2 callbacks and capture network IOCs.
Hardening:
- Remove bloatware (Office, Store, Cortana, Defender Auto-update).
- Disable Windows Update to preserve snapshot state.
- Antivirus: DISABLED (analysis-side; detect via monitoring tools instead).
- Network: Static IP (192.168.100.10–20), DNS controlled by firewall (INetSim).
- RDP: Enabled (local admin only), reachable only from the Job Orchestrator; firewall rules block all other access over the analysis bridge.
- Windows Defender/SmartScreen: Disabled (we want to see true malware behavior).
- User Account Control (UAC): Disabled (admin operations must run unimpeded).
Snapshot Workflow:
- Create VM from template.
- Boot, install FLARE-VM, run system optimization.
- Boot clean state, start all monitoring tools, take “clean” snapshot.
- Run malware, capture artifacts, shut down.
- Revert to “clean” snapshot.
- Repeat.
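The revert step of this workflow maps onto a single Proxmox API call. The sketch below only builds the HTTP request for a snapshot rollback and does not send it. The host name, node name, VM ID, and API token are placeholders, and the helper name `build_rollback_request` is hypothetical rather than part of the orchestrator described in this plan.

```python
from urllib.parse import quote
from urllib.request import Request


def build_rollback_request(host, node, vmid, snapshot, token_id, token_secret):
    """Build (but do not send) the Proxmox API request that rolls a VM back to a snapshot."""
    url = (
        f"https://{host}:8006/api2/json/nodes/{quote(node)}"
        f"/qemu/{int(vmid)}/snapshot/{quote(snapshot)}/rollback"
    )
    # Proxmox API tokens are passed as: PVEAPIToken=USER@REALM!TOKENID=SECRET
    headers = {"Authorization": f"PVEAPIToken={token_id}={token_secret}"}
    return Request(url, method="POST", headers=headers)


# Placeholder values for illustration only.
req = build_rollback_request(
    host="pve.example.internal",
    node="pve1",
    vmid=101,
    snapshot="clean",
    token_id="orchestrator@pve!revert",
    token_secret="00000000-0000-0000-0000-000000000000",
)
print(req.get_method(), req.full_url)
```

Proxmox runs a rollback as a background task, so a real orchestrator would likely need to poll the task status before booting the VM for the next job.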
3. Monitoring & Artifact Capture
3.1 Pre-Execution Baseline
- Registry Snapshot (RegShot 1st Shot): Captures HKLM, HKCU, HKU. Output: Registry.txt (baseline).
- File Baseline (FSUtil):
    fsutil fsinfo statistics c: > c:\baseline\fsstat.txt
    dir /s c:\windows > c:\baseline\windows_baseline.txt
    dir /s "c:\program files" > c:\baseline\programfiles_baseline.txt
- Process Baseline:
    tasklist /v > c:\baseline\processes_baseline.txt
- Wireshark Start:
    tshark -i Ethernet -w c:\captures\capture.pcap -f "not (arp or stp)"
- ProcMon Start: Capture filter: include all events; filtering happens post-execution. Export path: c:\captures\procmon.pml.
3.2 Malware Execution
- Sample Source: Attested by hash check against known malware databases (VirusTotal, etc.).
- Delivery Method: SMB share from Job Orchestrator, or RDP copy + manual execution via orchestrator command.
- Timeout: 2–5 minutes (configurable per sample class: RAT/Worm/Dropper).
- Interaction: Basic (e.g., allow initial dialog; intercept C2 via INetSim).
- Evasion Detection: Monitor for VM detection (look for VBOX, VMWARE strings, hyper-v checks). Log and flag for potential evasion.
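One simple way to flag possible VM-detection logic is to scan a sample, or a dropped file, for hypervisor-related strings. The following is a minimal sketch assuming a plain ASCII string scan. The marker list extends the VBOX/VMWARE/Hyper-V examples above with `qemu` (an assumption, since the guests run on KVM), and the function name is illustrative.

```python
import re

# Illustrative marker list; "qemu" is an added assumption for KVM-based guests.
VM_MARKERS = (b"vbox", b"vmware", b"hyper-v", b"qemu")


def find_vm_markers(data: bytes, min_len: int = 4) -> set[str]:
    """Return the VM-related markers found in printable ASCII runs of `data`."""
    hits = set()
    # Extract runs of printable ASCII, similar to the classic `strings` tool.
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        text = match.group().lower()
        for marker in VM_MARKERS:
            if marker in text:
                hits.add(marker.decode())
    return hits


sample = b"\x00\x01SOFTWARE\\Oracle\\VBoxGuest\x00junk\x00VMware Tools\x00"
markers = find_vm_markers(sample)
print(sorted(markers))
```

This only sees ASCII strings present in the file. UTF-16 strings, packed samples, and checks based on CPUID or timing would need other techniques, so a hit is a flag for review rather than proof of evasion.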
3.3 Post-Execution Capture
- Registry Snapshot (RegShot 2nd Shot): Diff → Registry.txt.html.
- ProcMon Stop & Export: Export CSV filtered by malware PID + children. Extract RegSetValue, WriteFile, RegCreateKey, RegDeleteKey events.
- Wireshark Stop & Export:
    tshark -r c:\captures\capture.pcap -T fields -e frame.time -e ip.src -e ip.dst -e dns.qry.name > c:\captures\network_summary.txt
- File I/O Collection: Scan AppData\Local\Temp, AppData\Roaming, ProgramData, Windows\Temp, $Recycle.Bin. Hash all new files (MD5, SHA1, SHA256). Copy to quarantine.
- Memory Dump (Optional): On signs of code injection (CreateRemoteThread, WriteProcessMemory).
- Artifact Compression & Staging: Copy c:\artifacts\* to \\192.168.100.1\analysis\[job_id]\.
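The "malware PID + children" filter on the ProcMon export can be sketched as a single pass over the CSV. This assumes ProcMon's default CSV columns (Time of Day, Process Name, PID, Operation, Path, Result, Detail), that child PIDs appear in the Detail field of `Process Create` events as `PID: <n>`, and that key creation and deletion are logged under ProcMon's operation names `RegCreateKey` and `RegDeleteKey`. The function name and sample rows are made up for illustration.

```python
import csv
import io
import re

WANTED_OPS = {"RegSetValue", "RegCreateKey", "RegDeleteKey", "WriteFile"}
CHILD_PID = re.compile(r"PID:\s*(\d+)")


def filter_events(csv_text: str, root_pid: int) -> list[dict]:
    """Keep registry/file write events from root_pid and any processes it spawned."""
    tracked = {root_pid}
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["PID"]) not in tracked:
            continue
        if row["Operation"] == "Process Create":
            # Assumed Detail format: "PID: 5200, Command line: ..."
            match = CHILD_PID.search(row["Detail"])
            if match:
                tracked.add(int(match.group(1)))
        elif row["Operation"] in WANTED_OPS:
            kept.append(row)
    return kept


SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:01","sample.exe","4120","RegSetValue","HKCU\Software\Microsoft\Windows\CurrentVersion\Run\upd","SUCCESS","Type: REG_SZ"
"10:00:02","sample.exe","4120","Process Create","C:\Users\a\AppData\Local\Temp\drop.exe","SUCCESS","PID: 5200, Command line: drop.exe"
"10:00:03","drop.exe","5200","WriteFile","C:\ProgramData\x.dll","SUCCESS","Offset: 0, Length: 4096"
"10:00:04","explorer.exe","900","WriteFile","C:\Users\a\ntuser.dat","SUCCESS","Offset: 0"
'''

events = filter_events(SAMPLE, root_pid=4120)
print([(e["PID"], e["Operation"]) for e in events])
```

The single pass relies on the export being in chronological order, so that a `Process Create` row is seen before the child's own events. When reading a real export from disk, opening the file with `encoding="utf-8-sig"` avoids a byte-order mark ending up in the first column name.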
4. STIX 2.1 Object Modeling
All captured artifacts are modeled as STIX 2.1 Cyber Observables and Malware objects. Design constraint: all hypothesis and evidence objects are STIX types, stored in Postgres as JSONB with normalized indices.
Object types used: malware, file, process, network-traffic, indicator, directory, ipv4-addr, domain-name. Every object carries an x_analysis_metadata extension tying it back to an analysis_id, VM UUID, sample hash, and tool provenance.
See orchestrator/stix_builder.py for the canonical factories and docs/stix_examples/ for wire-format samples.
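To make the modeling concrete, here is a minimal sketch of a STIX 2.1 `file` observable carrying the `x_analysis_metadata` property, wrapped in a bundle. It uses plain dictionaries rather than the factories in orchestrator/stix_builder.py, whose actual interface is not shown in this plan. The key names inside `x_analysis_metadata` and both helper names are assumptions.

```python
import json
import uuid

# SHA-256 of the bytes b"abc", used purely as a stand-in value.
SAMPLE_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"


def make_file_sco(name, sha256, analysis_id, vm_uuid, sample_sha256, tool):
    """Build a STIX 2.1 file observable as a plain dict."""
    return {
        "type": "file",
        "spec_version": "2.1",
        # Random UUID for brevity; STIX recommends deterministic UUIDv5 IDs for SCOs.
        "id": f"file--{uuid.uuid4()}",
        "name": name,
        "hashes": {"SHA-256": sha256},
        # Hypothetical layout of the metadata property described above.
        "x_analysis_metadata": {
            "analysis_id": analysis_id,
            "vm_uuid": vm_uuid,
            "sample_sha256": sample_sha256,
            "tool": tool,
        },
    }


def make_bundle(objects):
    """Wrap STIX objects in a bundle."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": list(objects)}


file_obj = make_file_sco(
    name="drop.exe",
    sha256=SAMPLE_SHA256,
    analysis_id="job-42",
    vm_uuid=str(uuid.uuid4()),
    sample_sha256=SAMPLE_SHA256,
    tool="procmon",
)
bundle = make_bundle([file_obj])
print(json.dumps(bundle, indent=2))
```

Because each object is already JSON-serializable, the same dictionary could be stored in a JSONB column unchanged, with columns such as the hash extracted alongside it for indexing.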
5. PostgreSQL Schema
The authoritative schema lives at migrations/001_initial_schema.sql. It defines:
- analysis_jobs: job lifecycle metadata.
- stix_malware, stix_observables, stix_indicators: STIX SDO/SCO storage (raw JSONB + extracted columns + GIN indices).
- dropped_files: immutable audit trail for quarantined artifacts.
- registry_modifications: RegShot-derived registry deltas with persistence flag.
- network_iocs: Wireshark/INetSim-derived network indicators.
- analysis_audit_log: append-only event log.
- ioc_fts: optional full-text search sidecar with a tsvector trigger.
6. Job Queue & Orchestration
6.1 Job Lifecycle
1. Sample submission
2. Intake validation (hash check, YARA scan, file-type validation)
3. Enqueue (row in analysis_jobs, message to Celery)
4. VM lifecycle (spin up from snapshot, boot, start monitoring)
5. Malware execution (SMB/RDP delivery, timeout)
6. Artifact capture (stop tools, enumerate drops, copy to quarantine)
7. STIX generation (parsers → STIX factories)
8. Postgres persistence (bundle + normalized tables + audit log)
9. VM reset (revert snapshot)
10. Ready for next job
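The lifecycle above is effectively a linear state machine with a failure exit. A minimal sketch follows, assuming one state per completed step. The state names are illustrative and are not taken from the analysis_jobs schema.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = 1    # step 1
    VALIDATED = 2    # step 2
    QUEUED = 3       # step 3
    VM_READY = 4     # step 4
    EXECUTED = 5     # step 5
    CAPTURED = 6     # step 6
    STIX_BUILT = 7   # step 7
    PERSISTED = 8    # step 8
    VM_RESET = 9     # steps 9-10: VM reverted, ready for next job
    FAILED = 99


_ORDER = [s for s in JobState if s is not JobState.FAILED]


def advance(state: JobState, ok: bool = True) -> JobState:
    """Move to the next lifecycle state, or to FAILED if the current step failed."""
    if state in (JobState.VM_RESET, JobState.FAILED):
        raise ValueError(f"{state.name} is terminal")
    if not ok:
        return JobState.FAILED
    return _ORDER[_ORDER.index(state) + 1]


state = JobState.SUBMITTED
path = [state]
while state is not JobState.VM_RESET:
    state = advance(state)
    path.append(state)
print(" -> ".join(s.name for s in path))
```

A failed job would still need its VM reverted before the slot is reused; that cleanup path is left out of this sketch.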
6.2 Tech Stack
- Message Broker: Redis (default) or RabbitMQ.
- Job Queue: Celery. Failed tasks are retried with exponential backoff through Celery's task retry mechanism; Celery beat schedules any periodic jobs.
- Task entry point: orchestrator.tasks.analyze_malware_sample.
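As an illustration of what exponential backoff means for retry timing, the sketch below computes the delay before each retry in plain Python. The base delay, cap, and retry count are assumed values, not settings from this plan, and the function is not part of Celery.

```python
def backoff_delays(base: float = 30.0, factor: float = 2.0,
                   max_delay: float = 600.0, max_retries: int = 5) -> list[float]:
    """Delay in seconds before each retry: base * factor**n, capped at max_delay."""
    return [min(base * factor ** n, max_delay) for n in range(max_retries)]


delays = backoff_delays()
print(delays)
```

In Celery itself, comparable behavior is usually configured on the task with options such as `autoretry_for`, `retry_backoff`, `retry_backoff_max`, and `max_retries`, rather than computed by hand.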
7. Isolation & Security Best Practices
7.1 Network Isolation
OPNsense default-deny ruleset:
Default: DENY all
Allow:
- Inbound DNS (UDP 53) to INetSim (192.168.100.2)
- Inbound NTP (UDP 123) to time server
- Inbound HTTP/HTTPS (80, 443) to INetSim honeypot
- Inbound SMB (445) from Job Orchestrator only
- Inbound RDP (3389) from Job Orchestrator only
Deny:
- ALL outbound to Xfinity/Verizon gateway
- ALL outbound to management network (172.16.0.0/24)
- ALL multicast, broadcast
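The policy above can be modeled as an ordered rule list with a default-deny fallback, which is handy for unit-testing intended behavior before the rules are entered in OPNsense. This is a sketch of the policy logic only, not OPNsense configuration. The Job Orchestrator address (192.168.100.5) is hypothetical, and the NTP rule is omitted because the plan does not give the time server's address.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

ANALYSIS_NET = "192.168.100.0/24"
MGMT_NET = "172.16.0.0/24"
INETSIM = "192.168.100.2/32"
ORCHESTRATOR = "192.168.100.5/32"  # hypothetical address


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    proto: str                  # "tcp", "udp", or "any"
    src: str                    # source network (CIDR)
    dst: str                    # destination network (CIDR)
    port: Optional[int] = None  # None matches any port


RULES = [
    Rule("deny", "any", ANALYSIS_NET, MGMT_NET),
    Rule("allow", "udp", ANALYSIS_NET, INETSIM, 53),
    Rule("allow", "tcp", ANALYSIS_NET, INETSIM, 80),
    Rule("allow", "tcp", ANALYSIS_NET, INETSIM, 443),
    Rule("allow", "tcp", ORCHESTRATOR, ANALYSIS_NET, 445),
    Rule("allow", "tcp", ORCHESTRATOR, ANALYSIS_NET, 3389),
]


def evaluate(rules, src, dst, proto, port):
    """First matching rule wins; anything unmatched is denied."""
    for rule in rules:
        if (rule.proto in ("any", proto)
                and ip_address(src) in ip_network(rule.src)
                and ip_address(dst) in ip_network(rule.dst)
                and (rule.port is None or rule.port == port)):
            return rule.action
    return "deny"


print(evaluate(RULES, "192.168.100.10", "192.168.100.2", "udp", 53))  # guest -> INetSim DNS
print(evaluate(RULES, "192.168.100.10", "8.8.8.8", "udp", 53))        # guest -> internet DNS
```

Listing the management-network deny first means a later, broader allow rule cannot accidentally open a path to 172.16.0.0/24.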
7.2 Host-Level Hardening
- Proxmox UFW: deny from 192.168.100.0/24 entirely; allow management only from 172.16.0.0/24.
- Monthly Proxmox security patching.
- Disable nested virtualization on analysis VMs (reduce escape surface).
- IOMMU/VFIO only for explicitly isolated device pass-through.
7.3 Snapshot-Based Reset
- Never modify a “clean” snapshot; always revert to it.
- Weekly off-storage backup of the clean snapshot.
- Audit every revert (Proxmox task log + analysis_audit_log).
7.4 Sample Handling & Chain of Custody
- Hash sample on intake.
- Verify against VirusTotal / known-bad lists.
- Store sample in encrypted, access-controlled quarantine.
- Never expose sample hash/path in logs or UI outside authenticated contexts.
- After analysis, delete original sample; retain only quarantined drops + STIX objects.
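Hashing on intake can be done in one streaming pass so that large samples are never loaded fully into memory. The sketch below computes the same three digests this plan uses for dropped files (MD5, SHA1, SHA256). The function name and the temporary file are illustrative.

```python
import hashlib
import os
import tempfile


def hash_file(path, chunk_size=1 << 20):
    """Compute MD5, SHA-1, and SHA-256 of a file in a single streaming pass."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


# Stand-in for a submitted sample.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
digests = hash_file(tmp.name)
os.unlink(tmp.name)
print(digests["sha256"])
```

Recording these digests at intake gives a fixed reference that later custody checks can compare against.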
8. STIX Output & Threat Intelligence Integration
- Postgres remains source of truth. STIX bundle export via orchestrator.stix_builder.export_bundle(analysis_id).
- Planned connectors: ThreatQ, Recorded Future, OpenCTI (indicator push).
- YARA rule auto-generation from high-confidence indicators.
- Snort/Zeek rule export from network_iocs.
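As a rough idea of what rule export from network_iocs could look like, the sketch below turns one IP indicator into a basic Snort alert rule. The function name, message format, and SID numbering are assumptions, and the rule shape (any traffic from `$HOME_NET` to the indicator address) is the simplest useful form rather than a recommended signature.

```python
import re
from ipaddress import ip_address


def snort_rule_for_ip(ip: str, sid: int, analysis_id: str) -> str:
    """Render a simple Snort alert rule for traffic to a single IP indicator."""
    addr = ip_address(ip)  # raises ValueError on malformed input
    # Keep only characters that are safe inside the quoted msg option.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "", analysis_id)
    msg = f"Sandbox IOC {safe_id} traffic to {addr}"
    return f'alert ip $HOME_NET any -> {addr} any (msg:"{msg}"; sid:{sid}; rev:1;)'


rule = snort_rule_for_ip("203.0.113.7", 1000001, "job-42")
print(rule)
```

The SID starts at 1000001 because values from 1,000,000 upward are conventionally used for locally written rules. Addresses served by INetSim would presumably need to be excluded before export, since they are simulation artifacts rather than real infrastructure.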
9. Implementation Roadmap
Phase 1: Foundation (Weeks 1–3)
- Proxmox infrastructure: vmbr.analysis, OPNsense VM, firewall rules.
- FLARE-VM template + clean snapshot.
- PostgreSQL VM + schema migration.
- Job Orchestrator VM + Celery + Proxmox API client.
Phase 2: Monitoring & Capture (Weeks 4–5)
- Pre/post-execution collectors (ProcMon, RegShot, Wireshark).
- Dropped-file collector Windows service.
- Artifact staging NFS share.
Phase 3: STIX Generation & Persistence (Weeks 6–8)
- STIX factories.
- Parsers: ProcMon CSV, RegShot diff, Wireshark PCAP.
- Postgres persistence + transactional audit trail.
Phase 4: Job Queue & Automation (Weeks 9–10)
- Celery task wiring.
- Intake validation (VT + YARA).
- End-to-end orchestration test.
Phase 5: Integration & Hardening (Weeks 11–12)
- TI platform connectors.
- YARA/Snort/Zeek auto-rule export.
- Security audit + runbooks.
10. Operational Considerations
10.1 Capacity Planning
- 4 parallel analysis VMs, 8 vCPU / 16 GB RAM total reservation.
- Throughput: ~10 samples/hour (5 min exec + 5 min capture + 5 min reset).
- Disk: 80 GB template, 320 GB live deltas, 500 GB–1 TB quarantine, 100 GB Postgres.
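The capacity figures can be cross-checked with simple arithmetic. Assuming the four VMs run fully in parallel and each sample occupies a VM for the whole 15-minute cycle, the sketch below computes an upper bound on throughput and the live-delta disk total. The function name is illustrative.

```python
def samples_per_hour(vms: int, exec_min: float, capture_min: float, reset_min: float) -> float:
    """Upper bound on throughput when every VM is busy for the full cycle."""
    cycle_min = exec_min + capture_min + reset_min
    return vms * 60 / cycle_min


ceiling = samples_per_hour(vms=4, exec_min=5, capture_min=5, reset_min=5)
live_delta_gb = 4 * 80  # four VMs, each at most the 80 GB template size
print(ceiling, live_delta_gb)
```

Under these assumptions the ceiling is 16 samples/hour, so the ~10 samples/hour planning figure leaves headroom for boot time, queueing, and failed runs (an interpretation, not something the plan states). The 320 GB of live deltas matches four VMs at the 80 GB template size.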
10.2 Observability
- Prometheus: queue depth, success rate, execution time.
- Grafana: pipeline status dashboard.
- Alerts: failure rate > 5%, Postgres disk usage > 85%, execution time > 10 min.
10.3 Incident Response
- Sandbox escape: immediately null OPNsense outbound rules, snapshot logs, revert affected VMs, review ProcMon for hypervisor-interaction attempts.
- Data loss: Postgres PITR via WAL archiving + daily full backup on separate media.
11. Conclusion
This design provides a production-grade malware runtime analysis environment that enforces infrastructure-level isolation, captures comprehensive behavioral artifacts, models findings as STIX 2.1, persists in Postgres, and automates via a job queue — with integration hooks for threat-intelligence platforms.
Immediate action items:
- Validate Proxmox dual-ISP connectivity and network topology.
- Stand up OPNsense firewall VM + rules.
- Create FLARE-VM template and test snapshot/revert cycle.
- Apply migrations/001_initial_schema.sql and exercise the STIX builder.
- Prototype the end-to-end flow against a single VM.