Getting Started

Tokenist is a guardrails layer for OpenAI API integrations. It tracks per-user token consumption and costs, lets you set spending limits, and automatically blocks users who exceed them — without changing your existing OpenAI code.

The SDK is designed for server-side use: call /sdk/check before forwarding a request to OpenAI, then /sdk/record once you have a response. Use /sdk/log to store the full request/response payload for auditing and sentiment analysis.

1. Get an API key

Log in to the Tokenist dashboard and create an API key under Settings → API Keys. Keys are prefixed ug_ and used as Bearer tokens on all SDK requests.

2. Install the TypeScript SDK

npm install tokenist-js

Or use the HTTP API directly — all endpoints accept standard JSON over HTTPS. No SDK required.
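Calling the HTTP API directly needs nothing beyond a POST with a Bearer header. A minimal sketch of a pre-flight /sdk/check call with fetch — the helper name buildCheckRequest is illustrative, not part of the API:

```typescript
// Shape of the /sdk/check request body (see the HTTP API reference below).
type CheckBody = {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  estimatedTokens?: number;
  feature?: string;
};

// Build the URL and fetch options for a /sdk/check call.
function buildCheckRequest(baseUrl: string, apiKey: string, body: CheckBody) {
  return {
    url: `${baseUrl}/sdk/check`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // ug_-prefixed key
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (assumes TOKENIST_API_KEY holds a valid key):
// const { url, init } = buildCheckRequest(
//   "https://api.tokenist.dev",
//   process.env.TOKENIST_API_KEY!,
//   { userId: "user_alice", model: "gpt-4o", requestType: "chat" },
// );
// const check = await (await fetch(url, init)).json();
```

/sdk/record and /sdk/log follow the same pattern with their own bodies.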

HTTP API Reference

All endpoints are authenticated with your API key as a Bearer token. Base URL: https://api.tokenist.dev (or your self-hosted instance).

Agent prompt
Integrate the Tokenist HTTP API. Base URL: https://api.tokenist.dev — authenticate every request with: Authorization: Bearer <TOKENIST_API_KEY>

Three endpoints to call server-side around each OpenAI request:

POST /sdk/check — call BEFORE forwarding to OpenAI
  Required: userId (string), model (string), requestType ("chat"|"realtime"|"embeddings")
  Optional: estimatedTokens (number), feature (string)
  Returns: { allowed: boolean, reason?: string, usage: { tokens, costUsd }, remaining?: { tokens, costUsd } }
  → If allowed is false, abort the request and surface reason to the user.

POST /sdk/record — call AFTER the OpenAI response completes
  Required: userId, model, requestType, inputTokens (number), outputTokens (number), latencyMs (number), success (boolean)
  Optional: feature (string)
  Returns: { recorded: boolean, usage?: { tokens, costUsd }, blocked: boolean }

POST /sdk/log — call AFTER recording to persist the full payload
  Required: model (string), request (object — the body you sent to OpenAI)
  Optional: response (object), userId, userEmail, userName, conversationId, feature, latencyMs, status ("success"|"error")
  Returns: { logged: boolean, logId: string }
POST /sdk/check

Pre-flight check before forwarding a request to OpenAI. Returns whether the user is allowed to proceed based on their current usage and any active limits.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| userId | string | required | Your application's identifier for the end user. |
| model | string | required | OpenAI model ID, e.g. "gpt-4o" or "gpt-4o-realtime-preview". |
| requestType | "chat" \| "realtime" \| "embeddings" | required | Type of OpenAI request being made. |
| estimatedTokens | number | optional | Token estimate for threshold pre-checking. Used to detect near-limit users before the request completes. |
| feature | string | optional | Feature tag for grouping usage in the dashboard. |

Response

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| allowed | boolean | required | Whether the user may proceed with their request. |
| reason | string | optional | Present when allowed is false, e.g. "User is blocked: Exceeded fair usage". |
| usage.tokens | number | optional | Total tokens consumed by this user in the current period. |
| usage.costUsd | number | optional | Total cost in USD consumed in the current period. |
| remaining.tokens | number | optional | Remaining token budget; 0 if no token limit is configured. |
| remaining.costUsd | number | optional | Remaining cost budget in USD; 0 if no cost limit is configured. |

Example request

{
  "userId": "user_alice",
  "model": "gpt-4o",
  "requestType": "chat",
  "estimatedTokens": 500,
  "feature": "customer-support"
}

Example response (allowed)

{
  "allowed": true,
  "usage": { "tokens": 1200, "costUsd": 0.08 },
  "remaining": { "tokens": 8800, "costUsd": 9.92 }
}
POST /sdk/record

Record actual token usage after an OpenAI request completes. Updates the user's running totals and re-evaluates their block status against any configured limits.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| userId | string | required | End user identifier. |
| model | string | required | OpenAI model that processed the request. |
| requestType | "chat" \| "realtime" \| "embeddings" | required | Type of request. |
| inputTokens | number | required | Actual input tokens consumed. |
| outputTokens | number | required | Actual output tokens consumed. |
| latencyMs | number | required | Round-trip latency in milliseconds. |
| success | boolean | required | Whether the OpenAI request succeeded. |
| feature | string | optional | Feature tag; should match the value used in /sdk/check. |

Response

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| recorded | boolean | required | Always true on success. |
| usage.tokens | number | optional | Updated total tokens for this user. |
| usage.costUsd | number | optional | Updated total cost in USD. |
| blocked | boolean | required | Whether this usage pushed the user over a limit and triggered a block. |

Example request

{
  "userId": "user_alice",
  "model": "gpt-4o",
  "requestType": "chat",
  "inputTokens": 412,
  "outputTokens": 318,
  "latencyMs": 1240,
  "success": true,
  "feature": "customer-support"
}

Example response

{
  "recorded": true,
  "usage": { "tokens": 1930, "costUsd": 0.11 },
  "blocked": false
}
POST /sdk/log

Store the full request and response payload for a completed OpenAI call. Enables conversation history, cost breakdowns per request, and sentiment analysis in the dashboard.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | required | OpenAI model used. |
| request | object | required | The original request body sent to OpenAI. |
| response | object | optional | The response body received from OpenAI. |
| userId | string | optional | End user identifier for attribution. |
| userEmail | string | optional | User email for display in the dashboard. |
| userName | string | optional | User display name. |
| conversationId | string | optional | Groups multiple log entries into a single conversation thread. |
| feature | string | optional | Feature tag. |
| latencyMs | number | optional | Total request latency in milliseconds. |
| status | "success" \| "error" | optional | Whether the call succeeded. Defaults to "success". |

Example request

{
  "model": "gpt-4o",
  "request": {
    "messages": [
      { "role": "user", "content": "Summarise my order history" }
    ]
  },
  "response": {
    "choices": [
      { "message": { "role": "assistant", "content": "You have placed 3 orders..." } }
    ],
    "usage": { "prompt_tokens": 18, "completion_tokens": 42 }
  },
  "userId": "user_alice",
  "conversationId": "conv_abc123",
  "feature": "order-assistant",
  "latencyMs": 980
}

Example response

{
  "logged": true,
  "logId": "log_01jq3..."
}

TypeScript SDK

tokenist-js is a typed Node.js client for the Tokenist API. It wraps the three SDK endpoints and provides full TypeScript types for all request and response shapes.

Agent prompt
Integrate the tokenist-js TypeScript SDK into my project.

npm install tokenist-js

import { TokenistClient } from "tokenist-js";
const tokenist = new TokenistClient({
  apiKey: process.env.TOKENIST_API_KEY!, // ug_... prefix
  baseUrl: "https://api.tokenist.dev",
});

Three methods to call server-side around each OpenAI request:

tokenist.sdk.check(req) → Promise<SdkCheckResponse>
  req: { userId, model, requestType: "chat"|"realtime"|"embeddings", estimatedTokens?, feature? }
  res: { allowed: boolean, reason?: string, usage: { tokens, costUsd }, remaining?: { tokens, costUsd } }
  → Abort if !res.allowed

tokenist.sdk.record(req) → Promise<SdkRecordResponse>
  req: { userId, model, requestType, inputTokens, outputTokens, latencyMs, success: boolean, feature? }
  res: { recorded: boolean, usage?: { tokens, costUsd }, blocked: boolean }

tokenist.sdk.log(req) → Promise<SdkLogResponse>
  req: { model, request: object, response?: object, userId?, userEmail?, userName?, conversationId?, feature?, latencyMs?, status?: "success"|"error" }
  res: { logged: boolean, logId: string }

Installation

npm install tokenist-js

import { TokenistClient } from "tokenist-js";

const tokenist = new TokenistClient({
  apiKey: process.env.TOKENIST_API_KEY!, // ug_...
  baseUrl: "https://api.tokenist.dev",   // or your self-hosted URL
});

client.sdk.check()

Check whether a user is allowed to make an OpenAI request. Call this before forwarding to OpenAI and abort if allowed is false.

const result = await tokenist.sdk.check({
  userId: "user_alice",
  model: "gpt-4o",
  requestType: "chat",
  estimatedTokens: 500,    // optional
  feature: "support-chat", // optional
});

if (!result.allowed) {
  throw new Error(`Request blocked: ${result.reason}`);
}

Type signatures

interface SdkCheckRequest {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  estimatedTokens?: number;
  feature?: string;
}

interface SdkCheckResponse {
  allowed: boolean;
  reason?: string;
  usage: { tokens: number; costUsd: number };
  remaining?: { tokens: number; costUsd: number };
}

client.sdk.record()

Record actual token usage after a completed OpenAI call. Returns the user's updated totals and whether they were automatically blocked.

await tokenist.sdk.record({
  userId: "user_alice",
  model: "gpt-4o",
  requestType: "chat",
  inputTokens: openAiResponse.usage.prompt_tokens,
  outputTokens: openAiResponse.usage.completion_tokens,
  latencyMs: Date.now() - startTime,
  success: true,
  feature: "support-chat",
});

Type signatures

interface SdkRecordRequest {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  success: boolean;
  feature?: string;
}

interface SdkRecordResponse {
  recorded: boolean;
  usage?: { tokens: number; costUsd: number };
  blocked: boolean;
}

client.sdk.log()

Persist the full request/response payload. Required to see conversation history and enable sentiment analysis in the dashboard.

await tokenist.sdk.log({
  model: "gpt-4o",
  request: { messages },        // the body sent to OpenAI
  response: openAiResponse,     // the full response object
  userId: "user_alice",
  userEmail: "[email protected]",
  userName: "Alice",
  conversationId: sessionId,
  feature: "support-chat",
  latencyMs: Date.now() - startTime,
  status: "success",
});

Type signatures

interface SdkLogRequest {
  model: string;
  request: Record<string, unknown>;
  response?: Record<string, unknown>;
  userId?: string;
  userEmail?: string;
  userName?: string;
  conversationId?: string;
  feature?: string;
  latencyMs?: number;
  status?: "success" | "error";
}

interface SdkLogResponse {
  logged: boolean;
  logId: string;
}

client.listRules()

Retrieve all rules configured for an organisation. Returns an array of Rule objects, each carrying the full rule definition — trigger conditions (token limits, cost thresholds, policy violations), restriction config (rate-limit windows, throttle delays, blocks), subject targeting, and timestamps.

Also available as client.admin.listRules(orgId, opts?) when you need the paginated wrapper including the total count.

import type { Rule, ListRulesOptions } from "tokenist-js";

// All rules for an org
const rules: Rule[] = await tokenist.listRules("org_123");

// Narrow to active rate-limit rules only
const rateLimits: Rule[] = await tokenist.listRules("org_123", {
  restrictionType: "rate_limit",
  enabled: true,
});

// Paginated variant with total count
const { rules: list, total } = await tokenist.admin.listRules("org_123");

Options

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| subjectType | "user" \| "group" \| "feature" | optional | Only return rules targeting this subject type. |
| restrictionType | "warning" \| "rate_limit" \| "throttle" \| "block" | optional | Only return rules with this restriction action. |
| enabled | boolean | optional | true returns active rules only; false returns disabled rules only. |

Type signatures

// Shared building block
interface TimeWindow {
  count: number;
  unit: "minute" | "hour" | "day" | "month";
}

// What causes the rule to fire
type RuleTriggerConfig =
  | { type: "token_limit";       tokens: number;  window: TimeWindow }
  | { type: "cost_limit";        costUsd: number; window: TimeWindow }
  | { type: "policy_violation";  policyId: string }
  | { type: "inactivity";        duration: TimeWindow };

// What happens when it fires
type RuleRestrictionConfig =
  | { type: "warning" }
  | { type: "rate_limit"; maxRequests: number; window: TimeWindow }
  | { type: "throttle";   delayMs: number }
  | { type: "block" };

interface Rule {
  id: string;
  name: string;
  enabled: boolean;
  subject: { type: "user" | "group" | "feature"; ids: string[] };
  trigger: RuleTriggerConfig;
  restriction: RuleRestrictionConfig;
  notifications: {
    webhookUrl?: string;
    injectResponse?: boolean;
    responseMessage?: string;
  };
  createdAt: string;
  updatedAt: string;
  createdBy?: string;
  lastTriggeredAt?: string | null;
}

Example response

[
  {
    "id": "rule_01jq3abc",
    "name": "Cap token usage per hour",
    "enabled": true,
    "subject": { "type": "user", "ids": [] },
    "trigger": { "type": "token_limit", "tokens": 50000, "window": { "count": 1, "unit": "hour" } },
    "restriction": { "type": "rate_limit", "maxRequests": 10, "window": { "count": 1, "unit": "hour" } },
    "notifications": { "injectResponse": true, "responseMessage": "Hourly token limit reached." },
    "createdAt": "2026-03-01T10:00:00Z",
    "updatedAt": "2026-03-05T14:22:00Z",
    "lastTriggeredAt": "2026-03-07T09:11:00Z"
  },
  {
    "id": "rule_01jq3xyz",
    "name": "Block on high spend",
    "enabled": true,
    "subject": { "type": "user", "ids": [] },
    "trigger": { "type": "cost_limit", "costUsd": 50, "window": { "count": 1, "unit": "month" } },
    "restriction": { "type": "block" },
    "notifications": { "injectResponse": true, "responseMessage": "Monthly budget exceeded." },
    "createdAt": "2026-03-02T08:00:00Z",
    "updatedAt": "2026-03-02T08:00:00Z",
    "lastTriggeredAt": null
  }
]
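Because RuleTriggerConfig is a discriminated union on type, TypeScript narrows it in a switch. A self-contained sketch (types copied from the signatures above; the describeTrigger helper is illustrative, not part of the SDK) that renders a rule's trigger as display text:

```typescript
// Types repeated locally so the snippet stands alone.
interface TimeWindow {
  count: number;
  unit: "minute" | "hour" | "day" | "month";
}

type RuleTriggerConfig =
  | { type: "token_limit"; tokens: number; window: TimeWindow }
  | { type: "cost_limit"; costUsd: number; window: TimeWindow }
  | { type: "policy_violation"; policyId: string }
  | { type: "inactivity"; duration: TimeWindow };

// Each case narrows `t` to one branch of the union, so the
// branch-specific fields (tokens, costUsd, ...) are typed.
function describeTrigger(t: RuleTriggerConfig): string {
  switch (t.type) {
    case "token_limit":
      return `${t.tokens} tokens per ${t.window.count} ${t.window.unit}`;
    case "cost_limit":
      return `$${t.costUsd} per ${t.window.count} ${t.window.unit}`;
    case "policy_violation":
      return `violation of policy ${t.policyId}`;
    case "inactivity":
      return `inactive for ${t.duration.count} ${t.duration.unit}`;
  }
}

// describeTrigger({ type: "cost_limit", costUsd: 50, window: { count: 1, unit: "month" } })
// → "$50 per 1 month"
```

The same pattern applies to RuleRestrictionConfig.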

Putting it together

A typical server-side middleware pattern using all three methods:

async function openAiWithGuardrails(
  userId: string,
  messages: { role: string; content: string }[],
) {
  // 1. Check before the request
  const check = await tokenist.sdk.check({
    userId,
    model: "gpt-4o",
    requestType: "chat",
    feature: "support-chat",
  });
  if (!check.allowed) throw new Error(check.reason);

  // 2. Call OpenAI
  const start = Date.now();
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
  });
  const latencyMs = Date.now() - start;

  // 3. Record usage
  await tokenist.sdk.record({
    userId,
    model: "gpt-4o",
    requestType: "chat",
    inputTokens: response.usage!.prompt_tokens,
    outputTokens: response.usage!.completion_tokens,
    latencyMs,
    success: true,
  });

  // 4. Log full payload
  await tokenist.sdk.log({
    model: "gpt-4o",
    request: { messages },
    response,
    userId,
    latencyMs,
  });

  return response;
}
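The pattern above only covers the happy path; /sdk/record also accepts failed calls (success: false) so error rates and latency stay accurate. A hedged sketch of that failure path — the recorder is passed in as a parameter so the snippet stays self-contained (in real code, pass tokenist.sdk.record and tighten the types):

```typescript
// Mirrors the /sdk/record body from the reference above.
type RecordInput = {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  success: boolean;
};

type Usage = { prompt_tokens: number; completion_tokens: number };

// Run an OpenAI call and record usage whether it succeeds or throws.
async function withUsageRecording<T extends { usage?: Usage }>(
  userId: string,
  model: string,
  record: (r: RecordInput) => Promise<unknown>, // e.g. tokenist.sdk.record
  call: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    const response = await call();
    await record({
      userId,
      model,
      requestType: "chat",
      inputTokens: response.usage?.prompt_tokens ?? 0,
      outputTokens: response.usage?.completion_tokens ?? 0,
      latencyMs: Date.now() - start,
      success: true,
    });
    return response;
  } catch (err) {
    // No usage object is available on failure: record zeros with
    // success: false so the failed request and its latency still count.
    await record({
      userId,
      model,
      requestType: "chat",
      inputTokens: 0,
      outputTokens: 0,
      latencyMs: Date.now() - start,
      success: false,
    });
    throw err;
  }
}
```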