Getting Started
Tokenist is a guardrails layer for OpenAI API integrations. It tracks per-user token consumption and costs, lets you set spending limits, and automatically blocks users who exceed them — without changing your existing OpenAI code.
The SDK is designed for server-side use: call /sdk/check before forwarding a request to OpenAI, then /sdk/record once you have a response. Use /sdk/log to store the full request/response payload for auditing and sentiment analysis.
Get an API key
Log in to the Tokenist dashboard and create an API key under Settings → API Keys. Keys are prefixed with ug_ and are used as Bearer tokens on all SDK requests.
Install the TypeScript SDK
```bash
npm install tokenist-js
```

Or use the HTTP API directly — all endpoints accept standard JSON over HTTPS. No SDK required.
HTTP API Reference
All endpoints are authenticated with your API key as a Bearer token. Base URL: https://api.tokenist.dev (or your self-hosted instance).
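As a sketch of the raw HTTP shape, the following calls /sdk/check with Node 18+'s built-in fetch. `buildSdkRequest` is a hypothetical helper (not part of any SDK), split out so the Bearer-token header shape is easy to verify without a network call:

```typescript
// Hypothetical helper (not part of tokenist-js): builds the fetch options
// shared by all /sdk/* endpoints, with the API key as a Bearer token.
function buildSdkRequest(apiKey: string, body: object) {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

// Raw /sdk/check call using Node 18+'s global fetch.
async function sdkCheck(baseUrl: string, apiKey: string, body: object) {
  const res = await fetch(`${baseUrl}/sdk/check`, buildSdkRequest(apiKey, body));
  if (!res.ok) throw new Error(`/sdk/check failed with HTTP ${res.status}`);
  return res.json();
}
```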
/sdk/check

Pre-flight check before forwarding a request to OpenAI. Returns whether the user is allowed to proceed based on their current usage and any active limits.
Request
| Field | Type | Required | Description |
|---|---|---|---|
| userId | string | required | Your application's identifier for the end user. |
| model | string | required | OpenAI model ID, e.g. "gpt-4o" or "gpt-4o-realtime-preview". |
| requestType | "chat" \| "realtime" \| "embeddings" | required | Type of OpenAI request being made. |
| estimatedTokens | number | optional | Token estimate for threshold pre-checking. Used to detect near-limit users before the request completes. |
| feature | string | optional | Optional feature tag for grouping usage in the dashboard. |
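An exact count is unnecessary for estimatedTokens; a rough figure is enough for pre-flight checking. A minimal sketch, assuming the common ~4-characters-per-token rule of thumb for English text (use a real tokenizer such as tiktoken when precision matters):

```typescript
// Rough token estimate for pre-flight checks: ~4 characters per token is a
// common rule of thumb for English text. Not exact; good enough for
// detecting near-limit users before the request runs.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```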
Response
| Field | Type | Required | Description |
|---|---|---|---|
| allowed | boolean | required | Whether the user may proceed with their request. |
| reason | string | optional | Present when allowed is false. E.g. "User is blocked: Exceeded fair usage". |
| usage.tokens | number | optional | Total tokens consumed by this user in the current period. |
| usage.costUsd | number | optional | Total cost in USD consumed in the current period. |
| remaining.tokens | number | optional | Remaining token budget. 0 if no token limit is configured. |
| remaining.costUsd | number | optional | Remaining cost budget in USD. 0 if no cost limit is configured. |
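Beyond the hard allowed gate, the usage and remaining fields let you warn users before they hit a limit. A sketch: the CheckResponse shape mirrors the table above, the 10% threshold is an arbitrary choice, and remaining.tokens of 0 is treated as "no limit configured" per the table.

```typescript
// Response shape per the table above; usage/remaining may be absent.
interface CheckResponse {
  allowed: boolean;
  reason?: string;
  usage?: { tokens: number; costUsd: number };
  remaining?: { tokens: number; costUsd: number };
}

// True when the user's remaining token budget is below thresholdRatio of
// their total budget. remaining.tokens === 0 means no token limit is
// configured, so it never warns in that case.
function nearTokenLimit(res: CheckResponse, thresholdRatio = 0.1): boolean {
  if (!res.usage || !res.remaining || res.remaining.tokens === 0) return false;
  const budget = res.usage.tokens + res.remaining.tokens;
  return res.remaining.tokens / budget < thresholdRatio;
}
```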
Request

```json
{
  "userId": "user_alice",
  "model": "gpt-4o",
  "requestType": "chat",
  "estimatedTokens": 500,
  "feature": "customer-support"
}
```

Response (allowed)

```json
{
  "allowed": true,
  "usage": { "tokens": 1200, "costUsd": 0.08 },
  "remaining": { "tokens": 8800, "costUsd": 9.92 }
}
```

/sdk/record

Record actual token usage after an OpenAI request completes. Updates the user's running totals and re-evaluates their block status against any configured limits.
Request
| Field | Type | Required | Description |
|---|---|---|---|
| userId | string | required | End user identifier. |
| model | string | required | OpenAI model that processed the request. |
| requestType | "chat" \| "realtime" \| "embeddings" | required | Type of request. |
| inputTokens | number | required | Actual input tokens consumed. |
| outputTokens | number | required | Actual output tokens consumed. |
| latencyMs | number | required | Round-trip latency in milliseconds. |
| success | boolean | required | Whether the OpenAI request succeeded. |
| feature | string | optional | Feature tag; should match the value used in /sdk/check. |
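inputTokens and outputTokens map directly onto the usage object OpenAI returns (prompt_tokens and completion_tokens). A small illustrative mapper, not part of any SDK:

```typescript
// prompt_tokens/completion_tokens are OpenAI's standard usage field names.
interface OpenAiUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Illustrative mapper from an OpenAI usage object to /sdk/record fields.
function toRecordPayload(
  userId: string,
  model: string,
  usage: OpenAiUsage,
  latencyMs: number,
  success: boolean,
) {
  return {
    userId,
    model,
    requestType: "chat" as const,
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    latencyMs,
    success,
  };
}
```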
Response
| Field | Type | Required | Description |
|---|---|---|---|
| recorded | boolean | required | Always true on success. |
| usage.tokens | number | optional | Updated total tokens for this user. |
| usage.costUsd | number | optional | Updated total cost in USD. |
| blocked | boolean | required | Whether this usage pushed the user over a limit and triggered a block. |
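The blocked flag means the user crossed a limit on this very request, so you can surface a message immediately instead of waiting for their next /sdk/check to fail. A minimal sketch; the response shape follows the table above and the message text is illustrative:

```typescript
// Response shape per the table above (usage fields omitted when absent).
interface RecordResponse {
  recorded: boolean;
  blocked: boolean;
  usage?: { tokens: number; costUsd: number };
}

// Returns a user-facing notice when this request triggered a block,
// or null when the user is still within their limits.
function blockNotice(res: RecordResponse): string | null {
  return res.blocked
    ? "Usage limit reached; further requests will be rejected."
    : null;
}
```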
Request

```json
{
  "userId": "user_alice",
  "model": "gpt-4o",
  "requestType": "chat",
  "inputTokens": 412,
  "outputTokens": 318,
  "latencyMs": 1240,
  "success": true,
  "feature": "customer-support"
}
```

Response

```json
{
  "recorded": true,
  "usage": { "tokens": 1930, "costUsd": 0.11 },
  "blocked": false
}
```

/sdk/log

Store the full request and response payload for a completed OpenAI call. Enables conversation history, cost breakdowns per request, and sentiment analysis in the dashboard.
Request
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | OpenAI model used. |
| request | object | required | The original request body sent to OpenAI. |
| response | object | optional | The response body received from OpenAI. |
| userId | string | optional | End user identifier for attribution. |
| userEmail | string | optional | User email for display in the dashboard. |
| userName | string | optional | User display name. |
| conversationId | string | optional | Groups multiple log entries into a single conversation thread. |
| feature | string | optional | Feature tag. |
| latencyMs | number | optional | Total request latency in milliseconds. |
| status | "success" \| "error" | optional | Whether the call succeeded. Defaults to "success". |
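Only model and request are mandatory, so a thin wrapper can default the rest. A hypothetical helper (not part of any SDK) that derives status from a boolean; field names follow the table above:

```typescript
// Hypothetical wrapper assembling an /sdk/log payload; status is derived
// from a success flag instead of being passed as a string literal.
function buildLogEntry(
  model: string,
  request: Record<string, unknown>,
  response: Record<string, unknown> | undefined,
  succeeded: boolean,
) {
  return {
    model,
    request,
    response,
    status: succeeded ? ("success" as const) : ("error" as const),
  };
}
```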
Request

```json
{
  "model": "gpt-4o",
  "request": {
    "messages": [
      { "role": "user", "content": "Summarise my order history" }
    ]
  },
  "response": {
    "choices": [
      { "message": { "role": "assistant", "content": "You have placed 3 orders..." } }
    ],
    "usage": { "prompt_tokens": 18, "completion_tokens": 42 }
  },
  "userId": "user_alice",
  "conversationId": "conv_abc123",
  "feature": "order-assistant",
  "latencyMs": 980
}
```

Response

```json
{
  "logged": true,
  "logId": "log_01jq3..."
}
```

TypeScript SDK
tokenist-js is a typed Node.js client for the Tokenist API. It wraps the three SDK endpoints and provides full TypeScript types for all request and response shapes.
Installation
```bash
npm install tokenist-js
```

```typescript
import { TokenistClient } from "tokenist-js";

const tokenist = new TokenistClient({
  apiKey: process.env.TOKENIST_API_KEY!, // ug_...
  baseUrl: "https://api.tokenist.dev", // or your self-hosted URL
});
```

client.sdk.check()
Check whether a user is allowed to make an OpenAI request. Call this before forwarding to OpenAI and abort if allowed is false.
```typescript
const result = await tokenist.sdk.check({
  userId: "user_alice",
  model: "gpt-4o",
  requestType: "chat",
  estimatedTokens: 500, // optional
  feature: "support-chat", // optional
});

if (!result.allowed) {
  throw new Error(`Request blocked: ${result.reason}`);
}
```

Type signatures
```typescript
interface SdkCheckRequest {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  estimatedTokens?: number;
  feature?: string;
}

interface SdkCheckResponse {
  allowed: boolean;
  reason?: string;
  usage: { tokens: number; costUsd: number };
  remaining?: { tokens: number; costUsd: number };
}
```

client.sdk.record()
Record actual token usage after a completed OpenAI call. Returns the user's updated totals and whether they were automatically blocked.
```typescript
await tokenist.sdk.record({
  userId: "user_alice",
  model: "gpt-4o",
  requestType: "chat",
  inputTokens: openAiResponse.usage.prompt_tokens,
  outputTokens: openAiResponse.usage.completion_tokens,
  latencyMs: Date.now() - startTime,
  success: true,
  feature: "support-chat",
});
```

Type signatures
```typescript
interface SdkRecordRequest {
  userId: string;
  model: string;
  requestType: "chat" | "realtime" | "embeddings";
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  success: boolean;
  feature?: string;
}
```

client.sdk.log()
Persist the full request/response payload. Required to see conversation history and enable sentiment analysis in the dashboard.
```typescript
await tokenist.sdk.log({
  model: "gpt-4o",
  request: { messages }, // the body sent to OpenAI
  response: openAiResponse, // the full response object
  userId: "user_alice",
  userEmail: "[email protected]",
  userName: "Alice",
  conversationId: sessionId,
  feature: "support-chat",
  latencyMs: Date.now() - startTime,
  status: "success",
});
```

Type signatures
```typescript
interface SdkLogRequest {
  model: string;
  request: Record<string, unknown>;
  response?: Record<string, unknown>;
  userId?: string;
  userEmail?: string;
  userName?: string;
  conversationId?: string;
  feature?: string;
  latencyMs?: number;
  status?: "success" | "error";
}
```

client.listRules()
Retrieve all rules configured for an organisation. Returns an array of Rule objects, each carrying the full rule definition — trigger conditions (token limits, cost thresholds, policy violations), restriction config (rate-limit windows, throttle delays, blocks), subject targeting, and timestamps.
Also available as client.admin.listRules(orgId, opts?) when you need the paginated wrapper including the total count.
```typescript
import type { Rule, ListRulesOptions } from "tokenist-js";

// All rules for an org
const rules: Rule[] = await tokenist.listRules("org_123");

// Narrow to active rate-limit rules only
const rateLimits: Rule[] = await tokenist.listRules("org_123", {
  restrictionType: "rate_limit",
  enabled: true,
});

// Paginated variant with total count
const { rules: list, total } = await tokenist.admin.listRules("org_123");
```

Options
| Field | Type | Required | Description |
|---|---|---|---|
| subjectType | "user" \| "group" \| "feature" | optional | Only return rules targeting this subject type. |
| restrictionType | "warning" \| "rate_limit" \| "throttle" \| "block" | optional | Only return rules with this restriction action. |
| enabled | boolean | optional | true → active rules only; false → disabled rules only. |
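These options behave like simple predicate filters. A client-side sketch of the same narrowing applied to an already-fetched list (Rule shape simplified to the fields the filters touch; unset options match everything):

```typescript
// Simplified slice of the Rule shape: only the fields the filters inspect.
interface RuleLite {
  subject: { type: "user" | "group" | "feature" };
  restriction: { type: "warning" | "rate_limit" | "throttle" | "block" };
  enabled: boolean;
}

// Each option, when set, must match; when unset, it is ignored.
function filterRules(
  rules: RuleLite[],
  opts: { subjectType?: string; restrictionType?: string; enabled?: boolean },
): RuleLite[] {
  return rules.filter(
    (r) =>
      (opts.subjectType === undefined || r.subject.type === opts.subjectType) &&
      (opts.restrictionType === undefined ||
        r.restriction.type === opts.restrictionType) &&
      (opts.enabled === undefined || r.enabled === opts.enabled),
  );
}
```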
Type signatures
```typescript
// Shared building block
interface TimeWindow {
  count: number;
  unit: "minute" | "hour" | "day" | "month";
}

// What causes the rule to fire
type RuleTriggerConfig =
  | { type: "token_limit"; tokens: number; window: TimeWindow }
  | { type: "cost_limit"; costUsd: number; window: TimeWindow }
  | { type: "policy_violation"; policyId: string }
  | { type: "inactivity"; duration: TimeWindow };

// What happens when it fires
type RuleRestrictionConfig =
  | { type: "warning" }
  | { type: "rate_limit"; maxRequests: number; window: TimeWindow }
  | { type: "throttle"; delayMs: number }
  | { type: "block" };

interface Rule {
  id: string;
  name: string;
  enabled: boolean;
  subject: { type: "user" | "group" | "feature"; ids: string[] };
  trigger: RuleTriggerConfig;
  restriction: RuleRestrictionConfig;
  notifications: {
    webhookUrl?: string;
    injectResponse?: boolean;
    responseMessage?: string;
  };
  createdAt: string;
  updatedAt: string;
  createdBy?: string;
  lastTriggeredAt?: string | null;
}
```

Example response
```json
[
  {
    "id": "rule_01jq3abc",
    "name": "Cap token usage per hour",
    "enabled": true,
    "subject": { "type": "user", "ids": [] },
    "trigger": { "type": "token_limit", "tokens": 50000, "window": { "count": 1, "unit": "hour" } },
    "restriction": { "type": "rate_limit", "maxRequests": 10, "window": { "count": 1, "unit": "hour" } },
    "notifications": { "injectResponse": true, "responseMessage": "Hourly token limit reached." },
    "createdAt": "2026-03-01T10:00:00Z",
    "updatedAt": "2026-03-05T14:22:00Z",
    "lastTriggeredAt": "2026-03-07T09:11:00Z"
  },
  {
    "id": "rule_01jq3xyz",
    "name": "Block on high spend",
    "enabled": true,
    "subject": { "type": "user", "ids": [] },
    "trigger": { "type": "cost_limit", "costUsd": 50, "window": { "count": 1, "unit": "month" } },
    "restriction": { "type": "block" },
    "notifications": { "injectResponse": true, "responseMessage": "Monthly budget exceeded." },
    "createdAt": "2026-03-02T08:00:00Z",
    "updatedAt": "2026-03-02T08:00:00Z",
    "lastTriggeredAt": null
  }
]
```

Putting it together
A typical server-side middleware pattern using all three methods:
```typescript
async function openAiWithGuardrails(
  userId: string,
  messages: { role: string; content: string }[],
) {
  // 1. Check before the request
  const check = await tokenist.sdk.check({
    userId,
    model: "gpt-4o",
    requestType: "chat",
    feature: "support-chat",
  });
  if (!check.allowed) throw new Error(check.reason);

  // 2. Call OpenAI
  const start = Date.now();
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
  });
  const latencyMs = Date.now() - start;

  // 3. Record usage
  await tokenist.sdk.record({
    userId,
    model: "gpt-4o",
    requestType: "chat",
    inputTokens: response.usage!.prompt_tokens,
    outputTokens: response.usage!.completion_tokens,
    latencyMs,
    success: true,
  });

  // 4. Log full payload
  await tokenist.sdk.log({
    model: "gpt-4o",
    request: { messages },
    response,
    userId,
    latencyMs,
  });

  return response;
}
```