ADR-0048 — Query Budget and Cost Tracking (Phase 4E)
Date: 2026-04-09
Status: Accepted
Deciders: GNAT Platform Team
Context
GNAT coordinates calls to up to 99 external connector platforms. Each connector call may count against a paid API quota, consume compute time, or contribute to rate-limit thresholds imposed by the upstream provider.
Prior to this ADR, two mechanisms provided partial protection:
- AgentGovernor rate limiting (ADR-0045): a sliding-window counter per agent per time window, expressed in number of governor-checked agent actions. It does not account for the number of HTTP calls each action generates, which may be many (e.g. a list_objects() that pages through 5 000 results).
- QueryBudget on ExecutionContext (ADR-0039): a max_connector_calls field on the context dataclass. It was designed as a placeholder and had no enforcement mechanism: BaseClient._request() did not check it, and there was no BudgetExceeded exception class.
The consequence was that an agent or pipeline with unrestricted connector access could:
- Page through an entire VirusTotal result set in a single list_objects() call, exhausting the day's API quota for the entire deployment.
- Create a thundering-herd problem in which multiple parallel enrichment pipelines all call the same rate-limited platform simultaneously.
- Run without any cost attribution: there was no record of which connector, agent, or pipeline consumed the most API calls over a given period.
These gaps made GNAT unsuitable for deployments with strict API cost controls or quota-sharing across teams.
Decision
Extend QueryBudget (introduced as a stub in ADR-0039) into a fully
functional cost-tracking and enforcement mechanism, and wire it into the hot
path of BaseClient._request().
QueryBudget Dataclass (Extended)
Located in gnat/core/context.py, replacing the stub from ADR-0039:
from dataclasses import dataclass, field


@dataclass
class QueryBudget:
    """Per-execution resource budget for connector API calls.

    Parameters
    ----------
    max_units : int
        Maximum total cost units for this execution. Each connector call
        deducts ``COST_UNIT`` units from the budget. ``consume()`` raises
        ``BudgetExceeded`` when a deduction would exceed the budget.
    """

    max_units: int
    _consumed: int = field(default=0, repr=False, init=False)

    @property
    def remaining(self) -> int:
        """Remaining cost units."""
        return self.max_units - self._consumed

    @property
    def is_exhausted(self) -> bool:
        """True when no budget remains."""
        return self._consumed >= self.max_units

    def consume(self, units: int, connector: str) -> None:
        """Deduct *units* from the budget on behalf of *connector*.

        Parameters
        ----------
        units : int
            Cost units to deduct. Use ``BaseClient.COST_UNIT`` (default 1)
            for single-item requests; use larger values for bulk/search ops.
        connector : str
            Connector class name, used for cost attribution logging.

        Raises
        ------
        BudgetExceeded
            If deducting *units* would exceed ``max_units``.
        """
        if self._consumed + units > self.max_units:
            raise BudgetExceeded(
                connector=connector,
                cost=units,
                remaining=self.remaining,
            )
        self._consumed += units
BudgetExceeded Exception
class BudgetExceeded(GNATClientError):
    """Raised when a connector call would exceed the active QueryBudget.

    Attributes
    ----------
    connector : str
        Name of the connector that attempted the call.
    cost : int
        Cost units the call would have consumed.
    remaining : int
        Budget units remaining at the time of the attempt.
    """

    def __init__(self, connector: str, cost: int, remaining: int) -> None:
        self.connector = connector
        self.cost = cost
        self.remaining = remaining
        super().__init__(
            f"Budget exhausted: connector='{connector}' attempted "
            f"cost={cost} but only {remaining} units remain."
        )
BudgetExceeded inherits from GNATClientError (from gnat.clients.base)
so it is caught by the standard error handling path and propagates through
pipelines identically to any other HTTP-layer failure.
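The enforcement behaviour described above can be sketched from the caller's side. This is a minimal illustration, not the shipped code: the classes are trimmed copies of the definitions in this ADR, and GNATClientError is replaced by a plain Exception subclass as an assumption, since its real definition in gnat.clients.base is not shown here.

```python
from dataclasses import dataclass, field


class GNATClientError(Exception):
    """Stand-in for gnat.clients.base.GNATClientError (assumed Exception subclass)."""


class BudgetExceeded(GNATClientError):
    def __init__(self, connector: str, cost: int, remaining: int) -> None:
        self.connector = connector
        self.cost = cost
        self.remaining = remaining
        super().__init__(
            f"Budget exhausted: connector='{connector}' attempted "
            f"cost={cost} but only {remaining} units remain."
        )


@dataclass
class QueryBudget:
    max_units: int
    _consumed: int = field(default=0, repr=False, init=False)

    @property
    def remaining(self) -> int:
        return self.max_units - self._consumed

    def consume(self, units: int, connector: str) -> None:
        # Reject the call before deducting anything if it would overshoot.
        if self._consumed + units > self.max_units:
            raise BudgetExceeded(connector, units, self.remaining)
        self._consumed += units


budget = QueryBudget(max_units=25)
budget.consume(10, connector="VirusTotalClient")
budget.consume(10, connector="VirusTotalClient")

caught = None
try:
    # A third 10-unit call would bring the total to 30, above the limit of 25.
    budget.consume(10, connector="VirusTotalClient")
except GNATClientError as exc:  # caught through the shared base class
    caught = exc

print(caught)
```

Note that a rejected call deducts nothing: the 5 remaining units stay available for a cheaper call, even though the 10-unit call was refused.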
COST_UNIT Class Variable on BaseClient
class BaseClient:
    COST_UNIT: int = 1  # default: 1 unit per HTTP request
    TRUST_LEVEL: str = "semi_trusted"

    def _request(self, method: str, path: str, **kwargs) -> urllib3.HTTPResponse:
        if self._context and self._context.budget:
            self._context.budget.consume(
                self.COST_UNIT,
                connector=type(self).__name__,
            )
        # ... existing HTTP dispatch ...
Connectors that make bulk or search calls override COST_UNIT to reflect
their relative expense:
| Connector Category | COST_UNIT | Rationale |
|---|---|---|
| Standard single-object GET / POST | 1 | Default; one API call, one result |
| Bulk list / paginated results | 10 | One call may return hundreds of objects |
| Full-text search queries | 5 | Search indexes are expensive to query at scale |
| AI inference calls (LLM connectors) | 20 | Token cost is orders of magnitude above REST calls |
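To make the multipliers concrete, the following sketch works out how many calls of each category fit into a single budget. The budget of 500 units is an illustrative value, and the helper name max_calls is hypothetical. The result is a floor division because consume() rejects any call that would push consumption past max_units.

```python
# COST_UNIT values taken from the table above.
COST_UNITS = {
    "single GET/POST": 1,
    "full-text search": 5,
    "bulk list / paginated": 10,
    "AI inference": 20,
}


def max_calls(budget_units: int, cost_unit: int) -> int:
    """Number of calls of a given cost that fit into the budget."""
    return budget_units // cost_unit


calls_at_500 = {name: max_calls(500, cost) for name, cost in COST_UNITS.items()}

for name, calls in calls_at_500.items():
    print(f"{name}: {calls} calls")
```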
Example for the VirusTotal connector, which supports paginated list endpoints:
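import json
from urllib.parse import urlencode


class VirusTotalClient(BaseClient):
    COST_UNIT = 1  # single-lookup default

    def list_objects(self, query: str, limit: int = 100) -> list[dict]:
        # Bulk paging: charge 10 units per page in total. _request() already
        # deducts COST_UNIT, so only the difference is consumed here.
        results = []
        cursor = None
        while True:
            if self._context and self._context.budget:
                self._context.budget.consume(
                    10 - self.COST_UNIT, connector="VirusTotalClient"
                )
            # Leave the cursor out of the first request rather than sending "None".
            params = {"query": query}
            if cursor:
                params["cursor"] = cursor
            response = self._request("GET", f"/intelligence/search?{urlencode(params)}")
            page = json.loads(response.data)
            # ... parse and accumulate ...
            if not page.get("meta", {}).get("cursor"):
                break
            cursor = page["meta"]["cursor"]
        return results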
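The class-level override mentioned above ("Connectors that make bulk or search calls override COST_UNIT") can be sketched as follows. Everything here is a simplified stand-in: the Budget class only records deductions, HTTP dispatch is stubbed out, and the two connector class names are hypothetical.

```python
from types import SimpleNamespace


class Budget:
    """Tiny stand-in for QueryBudget that also records each deduction."""

    def __init__(self, max_units: int) -> None:
        self.max_units = max_units
        self.consumed = 0
        self.ledger: list[tuple[str, int]] = []

    def consume(self, units: int, connector: str) -> None:
        if self.consumed + units > self.max_units:
            raise RuntimeError("budget exceeded")  # stand-in for BudgetExceeded
        self.consumed += units
        self.ledger.append((connector, units))


class BaseClient:
    COST_UNIT = 1

    def __init__(self, context=None) -> None:
        self._context = context

    def _request(self, method: str, path: str, **kwargs) -> dict:
        if self._context and self._context.budget:
            self._context.budget.consume(
                self.COST_UNIT, connector=type(self).__name__
            )
        return {"method": method, "path": path}  # HTTP dispatch stubbed out


class IndicatorLookupClient(BaseClient):  # hypothetical: keeps the default of 1
    pass


class LLMSummaryClient(BaseClient):  # hypothetical: AI inference category
    COST_UNIT = 20


ctx = SimpleNamespace(budget=Budget(max_units=100))
IndicatorLookupClient(ctx)._request("GET", "/indicators/1")
LLMSummaryClient(ctx)._request("POST", "/summaries")

print(ctx.budget.ledger)
```

Because _request() reads self.COST_UNIT, a subclass only has to redeclare the constant; no method override is needed for connectors whose calls all fall into one cost category.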
ExecutionContext.create() with Budget
The max_budget_units parameter on ExecutionContext.create() is now wired:
ctx = ExecutionContext.create(
    initiated_by="enrichment-pipeline",
    domain="analysis",
    workspace_id="production",
    max_budget_units=500,
)
# ctx.budget is a QueryBudget(max_units=500)

# With no budget limit:
ctx = ExecutionContext.create(
    initiated_by="manual",
    domain="ingestion",
    workspace_id="sandbox",
    # max_budget_units omitted → ctx.budget is None → unlimited
)
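One way the factory could map max_budget_units onto ctx.budget is sketched below. The real ExecutionContext has more fields than shown, and two details are assumptions: that context IDs are UUID strings (suggested by the VARCHAR(36) column in query_cost_log), and that a value of 0 is treated the same as an omitted value.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryBudget:
    max_units: int


@dataclass
class ExecutionContext:
    context_id: str
    initiated_by: str
    domain: str
    workspace_id: str
    budget: Optional[QueryBudget] = None

    @classmethod
    def create(
        cls,
        initiated_by: str,
        domain: str,
        workspace_id: str,
        max_budget_units: Optional[int] = None,
    ) -> "ExecutionContext":
        # Assumption: None and 0 both mean "unlimited", so no budget is attached.
        budget = QueryBudget(max_units=max_budget_units) if max_budget_units else None
        return cls(str(uuid.uuid4()), initiated_by, domain, workspace_id, budget)


limited = ExecutionContext.create("enrichment-pipeline", "analysis", "production", 500)
unlimited = ExecutionContext.create("manual", "ingestion", "sandbox")

print(limited.budget, unlimited.budget)
```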
Cost Logging — query_cost_log Table
Every call to QueryBudget.consume() appends a row to the query_cost_log
table (Alembic migration 0008_add_query_cost_log.py):
| Column | Type | Notes |
|---|---|---|
| id | INTEGER | Auto-increment primary key |
| context_id | VARCHAR(36) | FK → execution_log.id |
| connector | VARCHAR(200) | Connector class name |
| cost_units | INTEGER | Units deducted by this call |
| cumulative_consumed | INTEGER | Budget state after deduction |
| budget_max | INTEGER | max_units of the owning QueryBudget |
| recorded_at | DATETIME | UTC timestamp |
-- Migration 0008 (excerpt)
CREATE TABLE query_cost_log (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    context_id          VARCHAR(36) NOT NULL,
    connector           VARCHAR(200) NOT NULL,
    cost_units          INTEGER NOT NULL,
    cumulative_consumed INTEGER NOT NULL,
    budget_max          INTEGER NOT NULL,
    recorded_at         DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (context_id) REFERENCES execution_log(id)
);
CREATE INDEX ix_query_cost_log_context ON query_cost_log (context_id);
CREATE INDEX ix_query_cost_log_connector ON query_cost_log (connector, recorded_at);
Logging is best-effort: a failure to write to query_cost_log is caught and
logged at WARNING level but does not propagate. The budget deduction itself
always occurs before the log write, so enforcement is never skipped.
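The deduct-first, log-second ordering can be sketched with the standard-library sqlite3 module. The function name consume_and_log, the dict-based budget state, and the logger name are hypothetical; the sketch also omits the foreign key to execution_log so that it runs standalone.

```python
import logging
import sqlite3

logger = logging.getLogger("gnat.budget")  # hypothetical logger name


def consume_and_log(state, conn, context_id, connector, units):
    """Deduct first, then attempt the log write; a failed write is only warned about."""
    if state["consumed"] + units > state["max_units"]:
        raise RuntimeError("budget exceeded")  # stand-in for BudgetExceeded
    state["consumed"] += units  # enforcement happens before any logging
    try:
        conn.execute(
            "INSERT INTO query_cost_log "
            "(context_id, connector, cost_units, cumulative_consumed, budget_max) "
            "VALUES (?, ?, ?, ?, ?)",
            (context_id, connector, units, state["consumed"], state["max_units"]),
        )
        conn.commit()
    except sqlite3.Error as exc:
        logger.warning("query_cost_log write failed: %s", exc)


state = {"consumed": 0, "max_units": 100}

# Case 1: the table is missing, so the INSERT fails, but the deduction stands.
broken = sqlite3.connect(":memory:")
consume_and_log(state, broken, "ctx-1", "VirusTotalClient", 10)
consumed_after_failed_log = state["consumed"]

# Case 2: the table exists, so the same call also leaves a log row.
working = sqlite3.connect(":memory:")
working.execute(
    "CREATE TABLE query_cost_log ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, context_id VARCHAR(36) NOT NULL, "
    "connector VARCHAR(200) NOT NULL, cost_units INTEGER NOT NULL, "
    "cumulative_consumed INTEGER NOT NULL, budget_max INTEGER NOT NULL, "
    "recorded_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)
consume_and_log(state, working, "ctx-1", "VirusTotalClient", 10)
logged_rows = working.execute("SELECT COUNT(*) FROM query_cost_log").fetchone()[0]
```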
Querying Cost Attribution
from datetime import datetime

from gnat.core.context import CostAttributionQuery

report = CostAttributionQuery(db_session).by_connector(
    connector="VirusTotalClient",
    since=datetime(2026, 4, 1),
)
# Returns list of (date, connector, total_units, call_count)

report = CostAttributionQuery(db_session).by_context(context_id="...")
# Returns per-connector breakdown for a single execution
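The implementation of CostAttributionQuery is not shown in this ADR. As a rough sketch of the aggregations its two methods describe, the following runs equivalent GROUP BY queries directly against an in-memory SQLite copy of query_cost_log. The sample rows and the connector name ExampleClient are invented for illustration, and the foreign key is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE query_cost_log ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, context_id VARCHAR(36) NOT NULL, "
    "connector VARCHAR(200) NOT NULL, cost_units INTEGER NOT NULL, "
    "cumulative_consumed INTEGER NOT NULL, budget_max INTEGER NOT NULL, "
    "recorded_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)
conn.executemany(
    "INSERT INTO query_cost_log "
    "(context_id, connector, cost_units, cumulative_consumed, budget_max, recorded_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ctx-a", "VirusTotalClient", 10, 10, 500, "2026-04-02 09:00:00"),
        ("ctx-a", "VirusTotalClient", 10, 20, 500, "2026-04-02 09:00:05"),
        ("ctx-a", "ExampleClient", 1, 21, 500, "2026-04-02 09:01:00"),
        ("ctx-b", "VirusTotalClient", 1, 1, 200, "2026-04-03 12:00:00"),
    ],
)


def by_connector(conn, connector, since):
    """Daily totals for one connector: (date, connector, total_units, call_count)."""
    return conn.execute(
        "SELECT date(recorded_at), connector, SUM(cost_units), COUNT(*) "
        "FROM query_cost_log WHERE connector = ? AND recorded_at >= ? "
        "GROUP BY date(recorded_at), connector ORDER BY date(recorded_at)",
        (connector, since),
    ).fetchall()


def by_context(conn, context_id):
    """Per-connector breakdown for one execution: (connector, total_units, call_count)."""
    return conn.execute(
        "SELECT connector, SUM(cost_units), COUNT(*) FROM query_cost_log "
        "WHERE context_id = ? GROUP BY connector ORDER BY connector",
        (context_id,),
    ).fetchall()


vt_report = by_connector(conn, "VirusTotalClient", "2026-04-01 00:00:00")
ctx_report = by_context(conn, "ctx-a")
```

The by_connector query filters on connector and recorded_at, which is the column pair covered by the ix_query_cost_log_connector index in the migration.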
Configuration
[context]
default_budget_units = 0 ; 0 = unlimited (default for manual runs)
pipeline_budget_units = 1000 ; budget applied to scheduled pipeline runs
agent_budget_units = 200 ; budget applied to each agent session
When pipeline_budget_units is set, FeedScheduler automatically creates
an ExecutionContext with max_budget_units=pipeline_budget_units for every
scheduled feed run.
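Reading these settings could look roughly like the sketch below, which uses the standard-library configparser. The helper budget_units_for and its run_kind argument are hypothetical; how GNAT actually loads config.ini is not described in this ADR. The parser needs inline_comment_prefixes because the values carry trailing ; comments.

```python
import configparser

CONFIG_TEXT = """
[context]
default_budget_units = 0      ; 0 = unlimited (default for manual runs)
pipeline_budget_units = 1000  ; budget applied to scheduled pipeline runs
agent_budget_units = 200      ; budget applied to each agent session
"""

# Without inline_comment_prefixes the "; ..." text would be part of the value.
parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
parser.read_string(CONFIG_TEXT)


def budget_units_for(run_kind: str):
    """Return the configured budget for a run kind, or None for unlimited."""
    units = parser.getint("context", f"{run_kind}_budget_units", fallback=0)
    return units or None  # 0 means unlimited, so map it to "no budget"


pipeline_units = budget_units_for("pipeline")
agent_units = budget_units_for("agent")
manual_units = budget_units_for("default")
```

The returned value can be passed straight to ExecutionContext.create(max_budget_units=...), where None leaves ctx.budget unset.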
Consequences
Positive
- Hard resource limit for pipelines and agents: a misconfigured ResearchAgent looping over VirusTotal will hit BudgetExceeded after max_budget_units / COST_UNIT calls rather than running indefinitely.
- First-class error with actionable context: BudgetExceeded carries connector, cost, and remaining, so the operator can immediately see which connector triggered the limit and by how much.
- Per-connector cost attribution: query_cost_log provides a persistent, queryable record of which connectors consumed what share of the budget over any time window. This enables quota planning and chargeback reporting for MSSP deployments.
- Negligible overhead when no budget is set: if ctx.budget is None, the if guard in _request() short-circuits after a single attribute lookup. Deployments that do not need budget enforcement pay effectively nothing.
- Bulk and search overrides enable accurate cost modelling: connectors that page through large result sets can declare realistic COST_UNIT multipliers rather than counting every paginated request as 1 unit.
Negative / Trade-offs
- COST_UNIT is a class constant, not a per-call value: a connector cannot dynamically adjust the cost of a call based on the response size (e.g. charging more for a response with 10 000 results than one with 10). Per-call dynamic costing is deferred.
- Cost logging adds one INSERT per connector call when a budget is active: high-frequency pipelines may produce large volumes of cost log rows. A retention or aggregation policy is needed for long-running deployments.
- Budget is per-execution-context, not global: two concurrent pipelines each with a budget of 1 000 units can together consume 2 000 units from a platform with a 1 500-unit daily quota. Cross-context global quota enforcement requires a shared counter (deferred).
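The per-context limitation can be reproduced in a few lines. The sketch uses a minimal stand-in budget class and the numbers from the trade-off above: two budgets of 1 000 units each, drawing on a platform with a 1 500-unit daily quota.

```python
class Budget:
    """Minimal stand-in for QueryBudget: tracks only its own consumption."""

    def __init__(self, max_units: int) -> None:
        self.max_units = max_units
        self.consumed = 0

    def consume(self, units: int) -> None:
        if self.consumed + units > self.max_units:
            raise RuntimeError("budget exceeded")
        self.consumed += units


PLATFORM_DAILY_QUOTA = 1500

pipeline_a = Budget(max_units=1000)
pipeline_b = Budget(max_units=1000)

# Each pipeline makes 100 bulk calls at 10 units; neither budget ever raises.
for budget in (pipeline_a, pipeline_b):
    for _ in range(100):
        budget.consume(10)

total_consumed = pipeline_a.consumed + pipeline_b.consumed
overrun = total_consumed - PLATFORM_DAILY_QUOTA
print(f"consumed {total_consumed} units, {overrun} over the platform quota")
```

Neither budget knows about the other, so both stay within their own limits while the combined total overshoots the upstream quota.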
Deferred
- Global quota pool shared across concurrent ExecutionContext instances (requires a Redis or database-backed counter)
- Dynamic per-call cost calculation based on response size or token count
- query_cost_log retention policy and aggregation rollups
- Cost attribution dashboard in the TUI
- Per-connector quota configuration in config.ini (e.g. [virustotal] daily_quota = 500)
Alternatives Considered
Connector-Level Rate Limits Only
Apply rate limits at the connector level rather than introducing a budget
concept on ExecutionContext. For example, each connector would track its
own call count and sleep or raise when a per-hour limit is reached. Rejected
because:
- Connector-level limits do not aggregate across connectors. A pipeline that calls five connectors 200 times each has made 1 000 total calls, but no connector-level limit would fire.
- Rate limits and budgets serve different purposes: rate limits protect against throughput spikes; budgets protect against total cost within an execution. Both are needed; budget enforcement complements rather than replaces rate limiting.
OS-Level Resource Limits (cgroups / resource.setrlimit)
Applying OS-level CPU or memory limits to pipeline processes was considered as a coarser alternative. Rejected because it does not provide per-connector cost attribution, does not integrate with the GNAT audit trail, and does not map naturally to API quota units (which are a business concept, not an OS resource).
OpenAI / Anthropic Cost Estimators as the Model
Using the token-count-based cost estimation models from LLM providers as the
primary budget unit was considered. Rejected because GNAT’s connectors are
predominantly REST API clients, not LLM callers. A unified unit (abstract
cost units with connector-specific COST_UNIT multipliers) is more flexible
and does not require token counting infrastructure for non-LLM connectors.
Queue-Based Throttling (Celery / RQ)
Routing all connector calls through a task queue and configuring per-connector concurrency limits was prototyped. Rejected because it introduces a mandatory message broker dependency for a feature that should be available in single-process deployments. Queue-based throttling remains an option for scale-out deployments but should not be required for the core use case.
Licensed under the Apache License, Version 2.0