Getting Started

Tokenist is a guardrails layer for OpenAI API integrations. It tracks per-user token consumption and costs, lets you set spending limits, and automatically blocks users who exceed them — without changing your existing OpenAI code.

The SDK is designed for server-side use: call /sdk/check before forwarding a request to OpenAI, then /sdk/record once you have a response. Use /sdk/log to store the full request/response payload for auditing and sentiment analysis.

1. Get an API key

Log in to the Tokenist dashboard and create an API key under Settings → API Keys. Keys are prefixed ug_ and used as Bearer tokens on all SDK requests.

2. Install the TypeScript SDK

npm install tokenist-js

Or use the HTTP API directly — all endpoints accept standard JSON over HTTPS. No SDK required.
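Calling the HTTP API directly needs nothing beyond a POST with a Bearer header. A minimal sketch of a pre-flight /sdk/check call with fetch — the helper name buildCheckRequest is illustrative, not part of the API:

```typescript
// Shape of the /sdk/check request body (see the HTTP API reference below).
type CheckBody = {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  estimatedTokens?: number;
  feature?: string;
};

// Build the URL and fetch options for a /sdk/check call.
function buildCheckRequest(baseUrl: string, apiKey: string, body: CheckBody) {
  return {
    url: `${baseUrl}/sdk/check`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // ug_-prefixed key
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (assumes TOKENIST_API_KEY holds a valid key):
// const { url, init } = buildCheckRequest(
//   "https://api.tokenist.dev",
//   process.env.TOKENIST_API_KEY!,
//   { userId: "user_alice", model: "gpt-4o", requestType: "chat" },
// );
// const check = await (await fetch(url, init)).json();
```

/sdk/record and /sdk/log follow the same pattern with their own bodies.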

HTTP API Reference

All endpoints are authenticated with your API key as a Bearer token. Base URL: https://api.tokenist.dev (or your self-hosted instance).

Agent prompt
Integrate the Tokenist HTTP API. Base URL: https://api.tokenist.dev — authenticate every request with: Authorization: Bearer <TOKENIST_API_KEY>

Three endpoints to call server-side around each OpenAI request:

POST /sdk/check — call BEFORE forwarding to OpenAI
  Required: userId (string), model (string), requestType ("chat"|"realtime"|"embeddings")
  Optional: estimatedTokens (number), feature (string)
  Returns: { allowed: boolean, reason?: string, usage: { tokens, costUsd }, remaining?: { tokens, costUsd } }
  → If allowed is false, abort the request and surface reason to the user.

POST /sdk/record — call AFTER the OpenAI response completes
  Required: userId, model, requestType, inputTokens (number), outputTokens (number), latencyMs (number), success (boolean)
  Optional: feature (string)
  Returns: { recorded: boolean, usage?: { tokens, costUsd }, blocked: boolean }

POST /sdk/log — call AFTER recording to persist the full payload
  Required: model (string), request (object — the body you sent to OpenAI)
  Optional: response (object), userId, userEmail, userName, conversationId, feature, latencyMs, status ("success"|"error")
  Returns: { logged: boolean, logId: string }
POST /sdk/check

Pre-flight check before forwarding a request to OpenAI. Returns whether the user is allowed to proceed based on their current usage and any active limits.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| userId | string | required | Your application's identifier for the end user. |
| model | string | required | OpenAI model ID, e.g. "gpt-4o" or "gpt-4o-realtime-preview". |
| requestType | "chat" \| "realtime" \| "embeddings" | required | Type of OpenAI request being made. |
| estimatedTokens | number | optional | Token estimate for threshold pre-checking. Used to detect near-limit users before the request completes. |
| feature | string | optional | Feature tag for grouping usage in the dashboard. |

Response

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| allowed | boolean | required | Whether the user may proceed with their request. |
| reason | string | optional | Present when allowed is false, e.g. "User is blocked: Exceeded fair usage". |
| usage.tokens | number | optional | Total tokens consumed by this user in the current period. |
| usage.costUsd | number | optional | Total cost in USD consumed in the current period. |
| remaining.tokens | number | optional | Remaining token budget; 0 if no token limit is configured. |
| remaining.costUsd | number | optional | Remaining cost budget in USD; 0 if no cost limit is configured. |

Example request

{
  "userId": "user_alice",
  "model": "gpt-4o",
  "requestType": "chat",
  "estimatedTokens": 500,
  "feature": "customer-support"
}

Example response (allowed)

{
  "allowed": true,
  "usage": { "tokens": 1200, "costUsd": 0.08 },
  "remaining": { "tokens": 8800, "costUsd": 9.92 }
}
POST /sdk/record

Record actual token usage after an OpenAI request completes. Updates the user's running totals and re-evaluates their block status against any configured limits.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| userId | string | required | End user identifier. |
| model | string | required | OpenAI model that processed the request. |
| requestType | "chat" \| "realtime" \| "embeddings" | required | Type of request. |
| inputTokens | number | required | Actual input tokens consumed. |
| outputTokens | number | required | Actual output tokens consumed. |
| latencyMs | number | required | Round-trip latency in milliseconds. |
| success | boolean | required | Whether the OpenAI request succeeded. |
| feature | string | optional | Feature tag; should match the value used in /sdk/check. |

Response

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| recorded | boolean | required | Always true on success. |
| usage.tokens | number | optional | Updated total tokens for this user. |
| usage.costUsd | number | optional | Updated total cost in USD. |
| blocked | boolean | required | Whether this usage pushed the user over a limit and triggered a block. |

Example request

{
  "userId": "user_alice",
  "model": "gpt-4o",
  "requestType": "chat",
  "inputTokens": 412,
  "outputTokens": 318,
  "latencyMs": 1240,
  "success": true,
  "feature": "customer-support"
}

Example response

{
  "recorded": true,
  "usage": { "tokens": 1930, "costUsd": 0.11 },
  "blocked": false
}
POST /sdk/log

Store the full request and response payload for a completed OpenAI call. Enables conversation history, cost breakdowns per request, and sentiment analysis in the dashboard.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | required | OpenAI model used. |
| request | object | required | The original request body sent to OpenAI. |
| response | object | optional | The response body received from OpenAI. |
| userId | string | optional | End user identifier for attribution. |
| userEmail | string | optional | User email for display in the dashboard. |
| userName | string | optional | User display name. |
| conversationId | string | optional | Groups multiple log entries into a single conversation thread. |
| feature | string | optional | Feature tag. |
| latencyMs | number | optional | Total request latency in milliseconds. |
| status | "success" \| "error" | optional | Whether the call succeeded. Defaults to "success". |

Example request

{
  "model": "gpt-4o",
  "request": {
    "messages": [
      { "role": "user", "content": "Summarise my order history" }
    ]
  },
  "response": {
    "choices": [
      { "message": { "role": "assistant", "content": "You have placed 3 orders..." } }
    ],
    "usage": { "prompt_tokens": 18, "completion_tokens": 42 }
  },
  "userId": "user_alice",
  "conversationId": "conv_abc123",
  "feature": "order-assistant",
  "latencyMs": 980
}

Example response

{
  "logged": true,
  "logId": "log_01jq3..."
}

TypeScript SDK

tokenist-js is a typed Node.js client for the Tokenist API. It wraps the three SDK endpoints and provides full TypeScript types for all request and response shapes.

Agent prompt
Integrate the tokenist-js TypeScript SDK into my project.

npm install tokenist-js

import { TokenistClient } from "tokenist-js";
const tokenist = new TokenistClient({
  apiKey: process.env.TOKENIST_API_KEY!, // ug_... prefix
  baseUrl: "https://api.tokenist.dev",
});

Three methods to call server-side around each OpenAI request:

tokenist.sdk.check(req) → Promise<SdkCheckResponse>
  req: { userId, model, requestType: "chat"|"realtime"|"embeddings", estimatedTokens?, feature? }
  res: { allowed: boolean, reason?: string, usage: { tokens, costUsd }, remaining?: { tokens, costUsd } }
  → Abort if !res.allowed

tokenist.sdk.record(req) → Promise<SdkRecordResponse>
  req: { userId, model, requestType, inputTokens, outputTokens, latencyMs, success: boolean, feature? }
  res: { recorded: boolean, usage?: { tokens, costUsd }, blocked: boolean }

tokenist.sdk.log(req) → Promise<SdkLogResponse>
  req: { model, request: object, response?: object, userId?, userEmail?, userName?, conversationId?, feature?, latencyMs?, status?: "success"|"error" }
  res: { logged: boolean, logId: string }

Installation

npm install tokenist-js

import { TokenistClient } from "tokenist-js";

const tokenist = new TokenistClient({
  apiKey: process.env.TOKENIST_API_KEY!, // ug_...
  baseUrl: "https://api.tokenist.dev",   // or your self-hosted URL
});

client.sdk.check()

Check whether a user is allowed to make an OpenAI request. Call this before forwarding to OpenAI and abort if allowed is false.

const result = await tokenist.sdk.check({
  userId: "user_alice",
  model: "gpt-4o",
  requestType: "chat",
  estimatedTokens: 500,    // optional
  feature: "support-chat", // optional
});

if (!result.allowed) {
  throw new Error(`Request blocked: ${result.reason}`);
}

Type signatures

interface SdkCheckRequest {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  estimatedTokens?: number;
  feature?: string;
}

interface SdkCheckResponse {
  allowed: boolean;
  reason?: string;
  usage: { tokens: number; costUsd: number };
  remaining?: { tokens: number; costUsd: number };
}

client.sdk.record()

Record actual token usage after a completed OpenAI call. Returns the user's updated totals and whether they were automatically blocked.

await tokenist.sdk.record({
  userId: "user_alice",
  model: "gpt-4o",
  requestType: "chat",
  inputTokens: openAiResponse.usage.prompt_tokens,
  outputTokens: openAiResponse.usage.completion_tokens,
  latencyMs: Date.now() - startTime,
  success: true,
  feature: "support-chat",
});

Type signatures

interface SdkRecordRequest {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  success: boolean;
  feature?: string;
}

interface SdkRecordResponse {
  recorded: boolean;
  usage?: { tokens: number; costUsd: number };
  blocked: boolean;
}

client.sdk.log()

Persist the full request/response payload. Required to see conversation history and enable sentiment analysis in the dashboard.

await tokenist.sdk.log({
  model: "gpt-4o",
  request: { messages },        // the body sent to OpenAI
  response: openAiResponse,     // the full response object
  userId: "user_alice",
  userEmail: "[email protected]",
  userName: "Alice",
  conversationId: sessionId,
  feature: "support-chat",
  latencyMs: Date.now() - startTime,
  status: "success",
});

Type signatures

interface SdkLogRequest {
  model: string;
  request: Record<string, unknown>;
  response?: Record<string, unknown>;
  userId?: string;
  userEmail?: string;
  userName?: string;
  conversationId?: string;
  feature?: string;
  latencyMs?: number;
  status?: "success" | "error";
}

interface SdkLogResponse {
  logged: boolean;
  logId: string;
}

client.listRules()

Retrieve all rules configured for an organisation. Returns an array of Rule objects, each carrying the full rule definition — trigger conditions (token limits, cost thresholds, policy violations), restriction config (rate-limit windows, throttle delays, blocks), subject targeting, and timestamps.

Also available as client.admin.listRules(orgId, opts?) when you need the paginated wrapper including the total count.

import type { Rule, ListRulesOptions } from "tokenist-js";

// All rules for an org
const rules: Rule[] = await tokenist.listRules("org_123");

// Narrow to active rate-limit rules only
const rateLimits: Rule[] = await tokenist.listRules("org_123", {
  restrictionType: "rate_limit",
  enabled: true,
});

// Paginated variant with total count
const { rules: list, total } = await tokenist.admin.listRules("org_123");

Options

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| subjectType | "user" \| "group" \| "feature" | optional | Only return rules targeting this subject type. |
| restrictionType | "warning" \| "rate_limit" \| "throttle" \| "block" | optional | Only return rules with this restriction action. |
| enabled | boolean | optional | true returns active rules only; false returns disabled rules only. |

Type signatures

// Shared building block
interface TimeWindow {
  count: number;
  unit: "minute" | "hour" | "day" | "month";
}

// What causes the rule to fire
type RuleTriggerConfig =
  | { type: "token_limit";       tokens: number;  window: TimeWindow }
  | { type: "cost_limit";        costUsd: number; window: TimeWindow }
  | { type: "policy_violation";  policyId: string }
  | { type: "inactivity";        duration: TimeWindow };

// What happens when it fires
type RuleRestrictionConfig =
  | { type: "warning" }
  | { type: "rate_limit"; maxRequests: number; window: TimeWindow }
  | { type: "throttle";   delayMs: number }
  | { type: "block" };

interface Rule {
  id: string;
  name: string;
  enabled: boolean;
  subject: { type: "user" | "group" | "feature"; ids: string[] };
  trigger: RuleTriggerConfig;
  restriction: RuleRestrictionConfig;
  notifications: {
    webhookUrl?: string;
    injectResponse?: boolean;
    responseMessage?: string;
  };
  createdAt: string;
  updatedAt: string;
  createdBy?: string;
  lastTriggeredAt?: string | null;
}

Example response

[
  {
    "id": "rule_01jq3abc",
    "name": "Cap token usage per hour",
    "enabled": true,
    "subject": { "type": "user", "ids": [] },
    "trigger": { "type": "token_limit", "tokens": 50000, "window": { "count": 1, "unit": "hour" } },
    "restriction": { "type": "rate_limit", "maxRequests": 10, "window": { "count": 1, "unit": "hour" } },
    "notifications": { "injectResponse": true, "responseMessage": "Hourly token limit reached." },
    "createdAt": "2026-03-01T10:00:00Z",
    "updatedAt": "2026-03-05T14:22:00Z",
    "lastTriggeredAt": "2026-03-07T09:11:00Z"
  },
  {
    "id": "rule_01jq3xyz",
    "name": "Block on high spend",
    "enabled": true,
    "subject": { "type": "user", "ids": [] },
    "trigger": { "type": "cost_limit", "costUsd": 50, "window": { "count": 1, "unit": "month" } },
    "restriction": { "type": "block" },
    "notifications": { "injectResponse": true, "responseMessage": "Monthly budget exceeded." },
    "createdAt": "2026-03-02T08:00:00Z",
    "updatedAt": "2026-03-02T08:00:00Z",
    "lastTriggeredAt": null
  }
]
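Because RuleTriggerConfig is a discriminated union on type, TypeScript narrows it in a switch. A self-contained sketch (types copied from the signatures above; the describeTrigger helper is illustrative, not part of the SDK) that renders a rule's trigger as display text:

```typescript
// Types repeated locally so the snippet stands alone.
interface TimeWindow {
  count: number;
  unit: "minute" | "hour" | "day" | "month";
}

type RuleTriggerConfig =
  | { type: "token_limit"; tokens: number; window: TimeWindow }
  | { type: "cost_limit"; costUsd: number; window: TimeWindow }
  | { type: "policy_violation"; policyId: string }
  | { type: "inactivity"; duration: TimeWindow };

// Each case narrows `t` to one branch of the union, so the
// branch-specific fields (tokens, costUsd, ...) are typed.
function describeTrigger(t: RuleTriggerConfig): string {
  switch (t.type) {
    case "token_limit":
      return `${t.tokens} tokens per ${t.window.count} ${t.window.unit}`;
    case "cost_limit":
      return `$${t.costUsd} per ${t.window.count} ${t.window.unit}`;
    case "policy_violation":
      return `violation of policy ${t.policyId}`;
    case "inactivity":
      return `inactive for ${t.duration.count} ${t.duration.unit}`;
  }
}

// describeTrigger({ type: "cost_limit", costUsd: 50, window: { count: 1, unit: "month" } })
// → "$50 per 1 month"
```

The same pattern applies to RuleRestrictionConfig.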

Putting it together

A typical server-side middleware pattern using all three methods:

async function openAiWithGuardrails(
  userId: string,
  messages: { role: string; content: string }[],
) {
  // 1. Check before the request
  const check = await tokenist.sdk.check({
    userId,
    model: "gpt-4o",
    requestType: "chat",
    feature: "support-chat",
  });
  if (!check.allowed) throw new Error(check.reason);

  // 2. Call OpenAI
  const start = Date.now();
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
  });
  const latencyMs = Date.now() - start;

  // 3. Record usage
  await tokenist.sdk.record({
    userId,
    model: "gpt-4o",
    requestType: "chat",
    inputTokens: response.usage!.prompt_tokens,
    outputTokens: response.usage!.completion_tokens,
    latencyMs,
    success: true,
  });

  // 4. Log full payload
  await tokenist.sdk.log({
    model: "gpt-4o",
    request: { messages },
    response,
    userId,
    latencyMs,
  });

  return response;
}
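The pattern above only covers the happy path; /sdk/record also accepts failed calls (success: false) so error rates and latency stay accurate. A hedged sketch of that failure path — the recorder is passed in as a parameter so the snippet stays self-contained (in real code, pass tokenist.sdk.record and tighten the types):

```typescript
// Mirrors the /sdk/record body from the reference above.
type RecordInput = {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  success: boolean;
};

type Usage = { prompt_tokens: number; completion_tokens: number };

// Run an OpenAI call and record usage whether it succeeds or throws.
async function withUsageRecording<T extends { usage?: Usage }>(
  userId: string,
  model: string,
  record: (r: RecordInput) => Promise<unknown>, // e.g. tokenist.sdk.record
  call: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    const response = await call();
    await record({
      userId,
      model,
      requestType: "chat",
      inputTokens: response.usage?.prompt_tokens ?? 0,
      outputTokens: response.usage?.completion_tokens ?? 0,
      latencyMs: Date.now() - start,
      success: true,
    });
    return response;
  } catch (err) {
    // No usage object is available on failure: record zeros with
    // success: false so the failed request and its latency still count.
    await record({
      userId,
      model,
      requestType: "chat",
      inputTokens: 0,
      outputTokens: 0,
      latencyMs: Date.now() - start,
      success: false,
    });
    throw err;
  }
}
```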