Skimly API Documentation

Reduce LLM costs with intelligent content blobbing and Smart Compression Timing. Upload large context once, reference it with lightweight pointers in chat requests, and let AI optimize compression decisions.

New: Official SDKs Available

Get started quickly with our official SDKs for Node.js and Python, or use raw HTTP if you prefer. Both approaches are fully supported.

Quick Start

Node.js / TypeScript

npm install @skimly/sdk

import { fromEnv } from '@skimly/sdk'
const client = fromEnv()

const resp = await client.messages.create({
  provider: 'openai',
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
})


Python

pip install skimly

from skimly import Skimly
client = Skimly.from_env()

resp = client.chat(
    provider="anthropic",
    model="claude-3-sonnet",
    messages=[{"role": "user", "content": "Hello!"}]
)


Core Concepts

1. Blob Once

Upload large, rarely changing content (policies, docs, threads) once and receive a blob ID in return.

2. Reference in Chat

Use the blob ID as a pointer in chat requests instead of sending the full content.

3. Save Tokens

Each request carries a short pointer instead of the full content, which reduces token usage and cost while the full context remains available.
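
The three steps above can be sketched as two request payloads. This is a minimal illustration rather than the documented schema: the blob field names (content, mime_type), the "pointer" content part, and the example blob ID are assumptions made for this sketch. Only provider, model, and messages are taken from the Quick Start. Check the API Reference for the actual request shapes of POST /v1/blobs and POST /v1/chat.

```python
import json


def build_blob_payload(content: str, mime_type: str = "text/plain") -> dict:
    """Body for POST /v1/blobs. Field names are assumed, not documented here."""
    return {"content": content, "mime_type": mime_type}


def build_chat_payload(blob_id: str, question: str) -> dict:
    """Body for POST /v1/chat that points at a blob instead of inlining it.

    provider/model/messages mirror the Quick Start; the "pointer" content
    part is a hypothetical shape used only for illustration.
    """
    return {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "pointer", "ref": blob_id},  # assumed pointer format
                ],
            }
        ],
    }


# Step 1: large, rarely changing content is uploaded once.
policy_text = "Refunds are accepted within 30 days. " * 200
blob_payload = build_blob_payload(policy_text)

# Step 2: later chat requests carry only the blob ID.
# In a real flow the ID comes back from POST /v1/blobs; this one is made up.
chat_payload = build_chat_payload("b_example123", "What is the refund window?")

# Step 3: compare how much each request body has to carry.
inline_size = len(json.dumps(blob_payload))
pointer_size = len(json.dumps(chat_payload))
print(f"blob upload body: {inline_size} bytes, chat body with pointer: {pointer_size} bytes")
```

The upload cost is paid once, while every later chat request stays small regardless of how large the referenced content is.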

Smart Compression Technology

Skimly goes beyond simple size thresholds with AI-powered compression that understands your content and workflow:

Smart Timing

Predicts when users will access content based on tool context and content analysis. For example, build logs are compressed aggressively, while error details are preserved.

Content Analysis

Detects patterns in build logs, error stacks, diffs, and other development content to make intelligent compression decisions.
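
To make the idea concrete, here is a small sketch of pattern-based content detection feeding a compression policy. The regular expressions, the label names, and the policy strings are hypothetical and chosen for illustration; Skimly's actual detectors are not described in this document. The only behavior taken from the text above is that build logs are compressed aggressively and error details are preserved.

```python
import re

# Hypothetical detectors, checked in order. Not Skimly's real rules.
PATTERNS = [
    ("diff", re.compile(r"^(diff --git |@@ .* @@)", re.MULTILINE)),
    (
        "error_stack",
        re.compile(
            r"^(Traceback \(most recent call last\):|\s+at \S+ \(.*:\d+:\d+\))",
            re.MULTILINE,
        ),
    ),
    ("build_log", re.compile(r"^(\[\d+/\d+\] |npm (WARN|ERR!) |make(\[\d+\])?: )", re.MULTILINE)),
]

# Policies named in the text above; anything else falls back to "default".
POLICY = {"build_log": "compress_aggressively", "error_stack": "preserve"}


def classify(text: str) -> str:
    """Return the label of the first matching pattern, or "other"."""
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "other"


def policy_for(text: str) -> str:
    """Map detected content type to an (assumed) compression policy name."""
    return POLICY.get(classify(text), "default")


build_log_sample = "[1/4] Compiling main.c\n[2/4] Linking app"
stack_sample = 'Traceback (most recent call last):\n  File "app.py", line 3\nValueError: bad'

print(classify(build_log_sample), policy_for(build_log_sample))
print(classify(stack_sample), policy_for(stack_sample))
```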

Cost Optimization

Avoids negative savings by estimating dereference (deref) costs, that is, the cost of fetching the full content back later, before compressing. Content is compressed only when the expected savings exceed that cost.

Session Tracking

Tracks compression decisions and calculates the realized ROI across user sessions, so you can compare actual savings against predicted savings.
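
The cost check and the session-level comparison can be illustrated with a simple token model. Everything here is an assumption made for the sketch: the Decision fields, the rule that a dereference costs the full original token count, and the numbers. It is not Skimly's actual cost model, only one way to reason about predicted versus actual savings.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    original_tokens: int        # tokens if the content were sent inline
    pointer_tokens: int         # tokens for the pointer sent instead
    deref_probability: float    # predicted chance the full content is fetched later
    dereferenced: bool = False  # what actually happened during the session


def tokens_saved(d: Decision) -> int:
    return d.original_tokens - d.pointer_tokens


def predicted_net_savings(d: Decision) -> float:
    # Assumed model: a dereference costs the full original token count.
    return tokens_saved(d) - d.deref_probability * d.original_tokens


def should_compress(d: Decision) -> bool:
    # Compress only when the expected savings are positive.
    return predicted_net_savings(d) > 0


def actual_net_savings(d: Decision) -> int:
    deref_cost = d.original_tokens if d.dereferenced else 0
    return tokens_saved(d) - deref_cost


def session_report(decisions: list) -> dict:
    """Compare predicted and actual savings for the decisions that were compressed."""
    compressed = [d for d in decisions if should_compress(d)]
    return {
        "compressed": len(compressed),
        "predicted": sum(predicted_net_savings(d) for d in compressed),
        "actual": sum(actual_net_savings(d) for d in compressed),
    }


build_log = Decision(original_tokens=4000, pointer_tokens=50, deref_probability=0.10)
error_detail = Decision(original_tokens=600, pointer_tokens=50, deref_probability=0.95)
diff_chunk = Decision(original_tokens=1200, pointer_tokens=40, deref_probability=0.20, dereferenced=True)

report = session_report([build_log, error_detail, diff_chunk])
print(should_compress(build_log), should_compress(error_detail))
print(report)
```

In this model the rarely revisited build log is worth compressing, while the error detail that is almost certain to be read again is not, because the expected deref cost outweighs the tokens saved.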

Smart Compression

Advanced compression technology that understands your content and workflow.

Features: Timing predictions, content analysis, cost optimization, session tracking

Production Ready

Enterprise-grade features with easy monitoring and rollback options.

Includes: Usage tracking, cost analysis, compression metrics, audit logs

Key Endpoints

Core Operations

POST /v1/chat - Chat completions with compression
POST /v1/blobs - Create content blobs
GET /v1/fetch - Retrieve blob content
POST /v1/transform - Smart content compression

Management

GET /v1/keys - List API keys
POST /v1/keys - Create new API key
DELETE /v1/keys/{id} - Revoke API key
GET /v1/config - System configuration
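
For raw HTTP use, the endpoints listed above can be collected into a small routing table. The sketch below only builds request objects with Python's standard library and sends nothing. The base URL is a placeholder, and the Bearer authorization header, the JSON content type, the operation names, and the example key ID are assumptions; only the methods and paths come from the list above.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.skimly.example"  # placeholder host, not the real API URL

# Methods and paths as listed under Key Endpoints; the dict keys are made-up names.
ENDPOINTS = {
    "chat": ("POST", "/v1/chat"),
    "create_blob": ("POST", "/v1/blobs"),
    "fetch": ("GET", "/v1/fetch"),
    "transform": ("POST", "/v1/transform"),
    "list_keys": ("GET", "/v1/keys"),
    "create_key": ("POST", "/v1/keys"),
    "revoke_key": ("DELETE", "/v1/keys/{id}"),
    "config": ("GET", "/v1/config"),
}


def build_request(op, api_key, body=None, **path_params):
    """Build (but do not send) a urllib Request for one of the listed endpoints."""
    method, path = ENDPOINTS[op]
    # Percent-encode path parameters such as the key ID in /v1/keys/{id}.
    safe_params = {k: quote(str(v), safe="") for k, v in path_params.items()}
    url = BASE_URL + path.format(**safe_params)
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


revoke = build_request("revoke_key", "sk_test_placeholder", id="key_123")
print(revoke.get_method(), revoke.full_url)

list_keys = build_request("list_keys", "sk_test_placeholder")
print(list_keys.get_method(), list_keys.full_url)
```

Passing the resulting object to urllib.request.urlopen would perform the call; that step is left out here because the real host and authentication details are not given in this document.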

Ready to get started?

Start with our Quickstart Guide to get up and running in minutes, then explore the API Reference for complete documentation of all features.