Skip to the content.

ADR-0009: Visualization — Graph

Decision: Tiered renderer + tiered layout based on node count.

Layout algorithm selection

Node count Algorithm Complexity Approx. time
≤ 200 Fruchterman-Reingold (networkx) O(n²) < 0.1s
200–1000 Barnes-Hut ForceAtlas2 (pure Python) O(n log n) 0.1–2s
> 1000 Type-cluster (Fibonacci spiral) O(n) < 0.01s

Barnes-Hut implementation details:

Type-cluster layout tradeoff: At 1000+ nodes, type-cluster sacrifices topological accuracy for speed. You see where types are but not how individual nodes relate. If the relationship structure is the signal (e.g., tracing a specific campaign), override: GraphView(ws, cluster_threshold=5000).to_html(...) to force Barnes-Hut even at large scale.

Renderer selection

Node count Renderer Technology Notes
≤ 300 Plotly 3D WebGL (Plotly) 3D, Jupyter-native, ~3MB JS
> 300 sigma.js WebGL (sigma) 2D, 100K node capacity, ~50KB JS

sigma.js HTML features:

to_graph_json() format:

{
  "nodes": [
    {"key": "indicator--abc", "label": "evil.com", "x": 1.2, "y": -0.8,
     "size": 12, "color": "#4ea8de", "type": "indicator",
     "attributes": {"confidence": 80, "x_rf_risk_score": 90}}
  ],
  "edges": [
    {"key": "e-0", "source": "indicator--abc", "target": "malware--xyz",
     "label": "indicates", "color": "#4ea8de"}
  ]
}

Use this to feed the Grafana Node Graph panel or build custom sigma.js apps.

Intent-driven rendering API — primary user-facing interface:

The five intent methods remove the need to know layout algorithms or renderer names. Each one encodes “what you want to see” and configures everything automatically:

Method Primary question Layout Renderer Edges
render_relationship_graph() How are objects connected? Barnes-Hut (always) sigma/Plotly Prominent (opacity 0.7)
render_type_graph() What types are in this workspace? Type-cluster (always) sigma/Plotly Secondary (opacity 0.25)
render_campaign_graph() What connects to these seeds? Barnes-Hut + BFS ego sigma/Plotly Standard (0.65)
render_timeline_graph() How did this evolve over time? X=timestamp, Y=type lane sigma only Standard
render_risk_heatmap() What has high risk vs low confidence? X=field, Y=field (value-driven) sigma only None

Key design decisions in intent methods:

render_relationship_graph overrides cluster_threshold to n+1 so Barnes-Hut is used at any scale — type-cluster would hide the relational topology that is the entire point of this view.

render_type_graph overrides cluster_threshold to 0 so type-cluster is always used — and sets uniform_node_size=True so visual density reflects object counts rather than score distribution.

render_campaign_graph uses BFS from seed nodes. Auto-seeds to top-3 by degree centrality if no seed_ids given. Result is always a strict subgraph of the workspace — never shows disconnected objects.

render_timeline_graph places objects at X = (timestamp - min) / range × 20, Y = type-lane index × 4 + jitter. Objects without a parseable timestamp get X = -5 (visibly outside the axis, not hidden). Uses sigma always because timelines can be very wide.

render_risk_heatmap places objects at X = x_field/10, Y = y_field/10. Objects missing either field get random jitter near origin — they cluster visibly at (0,0) so the “coverage gap” is itself informative. No edges drawn.

Plotly fallback in _render_intent: If plotly is not installed and the graph is below plotly_threshold, the intent methods automatically fall back to sigma.js rather than raising ImportError. The low-level figure() method still raises ImportError as documented.

max_nodes caps by degree centrality: Keeps the most-connected nodes. For large workspaces this means hub indicators (seen by many sources) survive the cap; isolated singletons are dropped. Run view.summary() first to understand the graph before capping.


Licensed under the Apache License, Version 2.0