ADR-0019: Shared Research Library
Decision: Three-tier model (personal → staging → library) with all access
through ResearchLibrary. No direct workspace manipulation by analysts.
Why three tiers, not two
A flat shared workspace has two failure modes: analysts write unreviewed, low-quality entries directly to the shared space, or concurrent writes from multiple analysts corrupt entries. The staging tier absorbs both. Analysts write to staging freely because it is an inbox, not a source of truth. The curation job is the only writer to the library, so concurrent analyst activity can never leave the library in an inconsistent state.
Deduplication: most recent wins
When multiple analysts research the same topic and promote to staging, the
curation job keeps the entry with the latest promoted_at timestamp and
archives the rest. Archived entries remain in storage — nothing is deleted —
so the history of who researched what is preserved for audit. The entry_id
is a SHA-256 fingerprint of (topic_key, promoted_at) so it’s deterministic
and collision-resistant.
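The deduplication rule and the fingerprint can be sketched as follows. This is a minimal illustration, not the curation job's actual code: the `StagedEntry` record, the `deduplicate` helper, and the `topic_key|ISO-timestamp` serialization fed to SHA-256 are all assumptions, since the ADR does not specify them.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StagedEntry:
    # Hypothetical record shape; the real staging schema is not shown in this ADR.
    topic_key: str
    researcher: str
    promoted_at: datetime


def entry_id(topic_key: str, promoted_at: datetime) -> str:
    # Assumed serialization: "<topic_key>|<ISO-8601 timestamp>" encoded as UTF-8.
    payload = f"{topic_key}|{promoted_at.isoformat()}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(staged):
    """Return (winners, archived); the latest promoted_at per topic wins."""
    winners = {}
    archived = []
    # Visit entries oldest first so each newer entry displaces the older one.
    for entry in sorted(staged, key=lambda e: e.promoted_at):
        previous = winners.get(entry.topic_key)
        if previous is not None:
            archived.append(previous)  # kept for audit, never deleted
        winners[entry.topic_key] = entry
    return winners, archived


if __name__ == "__main__":
    early = StagedEntry("apt29", "analyst1", datetime(2024, 1, 1, 9, tzinfo=timezone.utc))
    late = StagedEntry("apt29", "analyst2", datetime(2024, 1, 1, 12, tzinfo=timezone.utc))
    winners, archived = deduplicate([late, early])
    print(winners["apt29"].researcher, [e.researcher for e in archived])
```

Because `promoted_at` is part of the fingerprint, two promotions of the same topic at different times get different IDs, which is what lets archived entries coexist with the winner in storage.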
TTL categories
| Category | Default | Rationale |
|---|---|---|
| indicator | 24h | IOCs rotate or get sinkholed quickly |
| vulnerability | 72h | Exploitability status changes within days |
| campaign | 14d | Campaign activity evolves over weeks |
| threat_actor | 30d | Actor TTPs and infrastructure change slowly |
| other | 7d | Conservative fallback |
All defaults can be overridden in the [research_library] INI section. The TTL is
set at curation time, not promotion time, so the clock starts when the entry
enters the library, not when the analyst finished their research.
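As a rough sketch of how the defaults, the INI override, and the curation-time clock fit together, the example below parses TTL strings such as `24h` and `14d` and computes an expiry from the curation timestamp. The `ttl_<category>` key names and the helpers `parse_ttl`, `load_ttls`, and `expires_at` are hypothetical; the ADR only states that the values live in the `[research_library]` section.

```python
import configparser
import re
from datetime import datetime, timedelta, timezone

# Defaults taken from the TTL table in this ADR.
DEFAULT_TTLS = {
    "indicator": "24h",
    "vulnerability": "72h",
    "campaign": "14d",
    "threat_actor": "30d",
    "other": "7d",
}

_UNITS = {"h": "hours", "d": "days"}


def parse_ttl(text: str) -> timedelta:
    """Parse strings like '24h' or '14d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported TTL: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def load_ttls(ini_text: str) -> dict:
    """Merge INI overrides over the defaults and return timedeltas."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    merged = dict(DEFAULT_TTLS)
    if parser.has_section("research_library"):
        for category in DEFAULT_TTLS:
            key = f"ttl_{category}"  # hypothetical key name, not from the ADR
            if parser.has_option("research_library", key):
                merged[category] = parser.get("research_library", key)
    return {category: parse_ttl(value) for category, value in merged.items()}


def expires_at(category: str, curated_at: datetime, ttls: dict) -> datetime:
    # The clock starts at curation time; unknown categories use "other".
    return curated_at + ttls.get(category, ttls["other"])


if __name__ == "__main__":
    ttls = load_ttls("[research_library]\nttl_indicator = 12h\n")
    curated = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(expires_at("indicator", curated, ttls).isoformat())
```

Passing the curation timestamp explicitly, rather than reading it from the staged entry, makes it harder to start the clock at promotion time by accident.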
Check-before-research pattern
from gnat.research import ResearchLibrary

lib = ResearchLibrary.default()
if lib.is_fresh("APT29"):
    # Use cached research: load into workspace, save API costs
    lib.load_into_workspace("APT29", my_workspace)
else:
    # Run agents, review, then promote
    # ... research ...
    lib.promote(my_workspace, topic="APT29", researcher="analyst1",
                note="New C2 infra confirmed by Unit42 and Mandiant.")
is_fresh returns True only for curated (library) entries within their TTL.
Pending staging entries are invisible to is_fresh and get. This means
analysts always see curator-reviewed data, never raw staging entries.
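The visibility rule can be illustrated with a toy in-memory model. `ToyLibrary` is a hypothetical stand-in written for this sketch, not the real `ResearchLibrary`; it assumes an entry counts as fresh strictly before its expiry and that curation moves every staged topic at once.

```python
from datetime import datetime, timedelta, timezone


class ToyLibrary:
    """In-memory stand-in for illustration; not the real ResearchLibrary."""

    def __init__(self):
        self.staging = {}  # topic -> promoted_at (pending, not yet curated)
        self.library = {}  # topic -> expires_at (curated)

    def promote(self, topic, now):
        # Analysts only ever write to staging.
        self.staging[topic] = now

    def curate(self, now, ttl):
        # Only curation writes to the library; the TTL clock starts here.
        for topic in list(self.staging):
            del self.staging[topic]
            self.library[topic] = now + ttl

    def is_fresh(self, topic, now):
        # Staging is never consulted, so pending entries are invisible.
        expires = self.library.get(topic)
        return expires is not None and now < expires


if __name__ == "__main__":
    lib = ToyLibrary()
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    lib.promote("APT29", t0)
    print(lib.is_fresh("APT29", t0))  # pending in staging only
    lib.curate(t0 + timedelta(hours=4), timedelta(days=30))
    print(lib.is_fresh("APT29", t0 + timedelta(days=1)))
```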
The optional note field
lib.promote(..., note="...") is deliberately optional. Making it required
adds friction that reduces promotion rates. Making it optional means analysts
who want to share context can do so; those in a hurry can skip it. The note
appears in list_entries() and search() output, so a descriptive note
increases discoverability by colleagues.
CurationJob scheduling
from gnat.research import ResearchLibrary, CurationJob
from gnat.schedule import FeedScheduler

lib = ResearchLibrary.default()
job = CurationJob(lib, interval_seconds=4 * 3600)  # every 4 hours
with FeedScheduler() as sched:
    sched.add(job)
Four hours is a reasonable default — staging entries don’t sit unreviewed for
long, but the curation job doesn’t run so frequently that it becomes noisy in
the scheduler status output. For teams that need faster promotion, cron="0 * * * *"
(hourly) works equally well.
Licensed under the Apache License, Version 2.0