How to configure YARA rules
SandGNAT has two independent YARA scan points:
- Intake-time quick scan (INTAKE_YARA_RULES_DIR): runs on the orchestrator against every submission before enqueue. Matches bump priority and annotate the job.
- Deep scan on the Linux static-analysis guest (STATIC_YARA_DEEP_RULES_DIR on the host; LINUX_GUEST_YARA_DEEP_RULES_DIR on the guest): runs as part of the static stage with a heavier ruleset.
Both use yara-python. Both are optional: a missing library or an
empty rules directory degrades to a no-op.
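The optional behaviour can be sketched roughly as below. This is a minimal illustration, not SandGNAT's actual loader: the helper name `build_scanner` and its input shape are assumptions, while `yara.compile(filepaths=...)` is the yara-python call that compiles several files under separate namespaces.

```python
def build_scanner(rule_files):
    """Return compiled YARA rules, or None when scanning should be a no-op.

    Hypothetical helper. `rule_files` maps a namespace to a rule file path
    and may be empty when the rules directory contains no rule files.
    """
    if not rule_files:
        return None  # empty rules directory: nothing to compile

    try:
        import yara  # optional dependency, imported lazily
    except ImportError:
        return None  # library missing: degrade to a no-op

    # yara-python compiles all files into one Rules object.
    return yara.compile(filepaths=dict(rule_files))
```

Callers would then treat a `None` scanner as "no matches" rather than as an error.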
When to use which
- Intake quick scan: fast, cheap, run on every submission. Good for known-family fingerprints, triage tags, and obvious-badness rules that should bump priority immediately.
- Deep scan: slower, runs in the isolated VM. Good for rules that touch many strings, use imports (pe.imphash etc.), or need the capstone-disassembled view.
They’re independent; you can run both, either, or neither.
Install yara-python
The intake path needs the yara optional extra:
pip install -e '.[yara]'
yara-python bundles libyara by default on Linux, but installation can
fail when pip has to build it from source and the build toolchain
(compiler, headers) is incomplete. If you see “libyara.so not
found,” your distro probably ships a separate libyara package:
# Debian/Ubuntu:
apt-get install libyara-dev yara
On the Linux static-analysis guest, the tool wrapper imports
yara-python lazily and degrades to skipped if it is missing. A guest
without libyara still works; the deep scan just doesn’t run.
Configure rule directories
On the orchestrator (intake + export):
INTAKE_YARA_RULES_DIR=/etc/sandgnat/yara-intake
STATIC_YARA_DEEP_RULES_DIR=/etc/sandgnat/yara-deep
On the Linux static-analysis guest:
LINUX_GUEST_YARA_DEEP_RULES_DIR=/etc/sandgnat/yara-deep
(You’ll typically mount the same rules volume on both host and guest via a shared filesystem; see build-linux-guest.md.)
Both directories are scanned recursively for *.yar / *.yara
files. Every rule file is compiled once at service start; compile
errors surface at boot, not at first sample.
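The recursive discovery step could look something like this sketch. The function name `discover_rule_files` and the choice of the relative path as namespace are assumptions for illustration; only the `*.yar` / `*.yara` suffixes and the recursive walk come from the description above.

```python
from pathlib import Path

RULE_SUFFIXES = {".yar", ".yara"}


def discover_rule_files(rules_dir):
    """Map each rule file's path relative to rules_dir to its full path.

    Hypothetical helper: the relative path doubles as a unique namespace,
    which suits yara.compile(filepaths=...).
    """
    root = Path(rules_dir)
    if not root.is_dir():
        return {}

    found = {}
    for path in sorted(root.rglob("*")):  # walks subdirectories too
        if path.is_file() and path.suffix in RULE_SUFFIXES:
            found[path.relative_to(root).as_posix()] = str(path)
    return found
```

Compiling the resulting mapping once at startup is what makes a syntax error show up at boot instead of on the first sample.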
Writing rules that SandGNAT cares about
Intake promotes priority (prioritized decision, priority ≤ 2) for:
- Rules with a meta.severity of "high" or "critical".
- Rules tagged with any of malware, apt, ransomware, rat, stealer.
Everything else is matched and recorded but doesn’t bump priority.
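As a rough sketch, the promotion test amounts to the check below. The function name `promotes` is hypothetical, and the match is assumed to be a dict shaped like the entries in the /submit response (`rule`, `tags`, `meta`); exact, case-sensitive comparison is also an assumption.

```python
PROMOTING_SEVERITIES = {"high", "critical"}
PROMOTING_TAGS = {"malware", "apt", "ransomware", "rat", "stealer"}


def promotes(match):
    """Return True if a single YARA match should bump priority.

    Hypothetical helper; `match` is assumed to look like
    {"rule": ..., "tags": [...], "meta": {"severity": ...}}.
    """
    severity = match.get("meta", {}).get("severity")
    tags = set(match.get("tags", []))
    return severity in PROMOTING_SEVERITIES or bool(tags & PROMOTING_TAGS)
```

Either condition alone is enough: a rule tagged `apt` with no severity promotes, and so does an untagged rule with severity "critical".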
Example “high-severity” rule that would promote:
rule EvilCorp_Stealer_v3 : stealer malware
{
    meta:
        author = "your-analyst"
        severity = "high"
        description = "Known EvilCorp credential-stealer variant v3"
    strings:
        $config_magic = "ECSC3" wide
        $c2_pattern = /\bec[a-z0-9]{3,}\.example\b/
    condition:
        $config_magic and $c2_pattern
}
An “advisory” rule that would just annotate:
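import "math"

rule High_Entropy_Code_Section
{
    meta:
        severity = "info"
        description = "Code section entropy suggests packing"
    condition:
        // Entropy of the final 1 KiB; the size guard keeps the offset non-negative.
        filesize >= 1024 and math.entropy(filesize - 1024, 1024) >= 7.0
}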
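To get a feel for the 7.0 threshold: YARA's math.entropy reports Shannon entropy in bits per byte, from 0 (constant data) to 8 (uniformly random bytes). The sketch below reproduces that measure in plain Python so a threshold can be sanity-checked against a sample; the function name `shannon_entropy` is only illustrative.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def tail_entropy(blob: bytes, window: int = 1024) -> float:
    """Entropy of the final `window` bytes, mirroring the rule's range."""
    return shannon_entropy(blob[-window:])
```

Compressed or encrypted payloads typically land close to 8, while plain text and ordinary machine code usually sit well below 7.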
Deep-scan rules
The deep scan is free to use heavier features:
- pe module (pe.imphash, pe.imports): the quick scan runs on raw bytes too, so these work there, but PE-based rules match nothing on ELFs and vice versa.
- Large string sets: compile time grows with the rule count, but scan time is bounded by the VM’s CPU time.
If you have vendor-licensed rulesets (e.g. from a threat-intel feed), put them in the deep dir — they’re typically too heavy for every intake.
Verify rules loaded
Check the intake-service logs at startup. You should see:
INFO orchestrator.yara_scanner: Compiling 14 YARA rule files from /etc/sandgnat/yara-intake
(The number is the count of distinct rule files, not individual rules.)
Submit a known-bad sample and verify the /submit response:
{
"decision": "prioritized",
"priority": 2,
"yara_matches": [
{"rule": "EvilCorp_Stealer_v3", "tags": ["stealer", "malware"], "meta": {"severity": "high"}}
]
}
And in the DB:
SELECT yara_matches FROM analysis_jobs WHERE id = '...';
Failure modes
- Rule file with a syntax error — compile fails at service start with a logged error; the scanner falls back to disabled. Fix the file and restart.
- Rule file with the same name but different content on two hosts — intake and deep scans are independent; rules don’t have to match. For reproducibility, source both directories from the same canonical store (git, NFS).
- YARA runtime error on a specific sample — logged at WARNING, scan returns empty matches for that sample only. Intake still enqueues the job.
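The last failure mode can be sketched as a thin wrapper around the match call. This is an illustration under assumptions, not the orchestrator's code: `safe_match` and the logger name are hypothetical, and `rules` is assumed to be a compiled yara-python Rules object (or None when scanning is disabled).

```python
import logging

log = logging.getLogger("yara_sketch")  # hypothetical logger name


def safe_match(rules, data, sample_id):
    """Scan one sample; never let a YARA failure block the job.

    Returns a list of matches, or an empty list if scanning is
    disabled or the scan raised for this particular sample.
    """
    if rules is None:
        return []
    try:
        return list(rules.match(data=data))
    except Exception as exc:
        # yara-python raises yara.Error subclasses here; catching broadly
        # keeps this sketch importable without the library installed.
        log.warning("YARA scan failed for sample %s: %s", sample_id, exc)
        return []
```

Because the error is contained per sample, the caller can enqueue the job with an empty match list and carry on.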
Managing rules
SandGNAT has no opinion about how you maintain the rules directory. Common patterns:
- Git repo: one repo per rule class (intake-quick vs deep), with CI that compiles every rule file (for example with yarac) to validate syntax before merge. Deploy via git pull on the host and guest.
- Shared NFS mount: same directory mounted on host and guest. Simplest for small teams; relies on the NFS being up.
- Bundled into a container image: rules baked into the orchestrator image, versioned with the service. Great for reproducibility; slow for rule iteration.
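For the CI validation step, one possible approach is to compile each file on its own so that a failure points at a single file. The sketch below is an assumption about how such a check might be written: `validate_rule_files` is a hypothetical name, and the compile step is passed in so the logic can be exercised without yara-python installed.

```python
def validate_rule_files(paths, compile_file):
    """Try to compile each rule file; return {path: error message}.

    Hypothetical CI helper. `compile_file` is any callable that raises
    on a bad file, e.g. `lambda p: yara.compile(filepath=p)`.
    """
    failures = {}
    for path in paths:
        try:
            compile_file(path)
        except Exception as exc:  # yara.SyntaxError when using yara-python
            failures[str(path)] = str(exc)
    return failures
```

In a pipeline, this could be called with `compile_file=lambda p: yara.compile(filepath=p)`, failing the build whenever the returned mapping is non-empty.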
Security
- Rule files are executed (compiled and matched). A malicious rule file could trigger bugs in libyara and potentially achieve code execution on the scanner. Keep your rule sources trusted.
- Don’t load rules directly from sample submissions or any untrusted input — they go in the sample pile, not the rules directory.
- The intake YARA scanner runs on the orchestrator host, not inside a VM. That’s deliberate — it’s a cheap triage signal — but it does mean libyara’s attack surface is on your orchestrator. Keep yara-python patched.