TOON File Format Examples
Professional TOON format samples demonstrating token-efficient data serialization for LLM applications. Reduce API costs by 30-60% with real-world examples.
Download by Size (10KB - 1MB)
Need a specific file size for load testing or benchmarking? Download automatically generated dummy files in exactly the size you need without previewing.
10 KB
Small sample for basic testing
100 KB
Medium sample for throughput testing
1 MB
Large sample for benchmark testing
Real-World TOON Format Examples
User Profile Array (TOON Format)
Standard JSON array converted to TOON format demonstrating key deduplication and tabular structure for maximum token efficiency.
users:
id,name,email,role,verified,city,country
1,Alice Johnson,alice@example.com,admin,true,San Francisco,USA
2,Bob Smith,bob@example.com,user,false,London,UK
3,Carol White,carol@example.com,editor,true,Berlin,Germany
4,David Chen,david@example.com,user,true,Tokyo,Japan

E-commerce Products with Pricing
Product catalog in TOON format showing how nested pricing and inventory data compresses efficiently for LLM applications.
products:
sku,name,category,price,stock,rating,currency
A101,Wireless Headphones,Electronics,299.99,50,4.8,USD
A102,Standing Desk,Furniture,599.00,12,4.6,USD
A103,Yoga Mat,Fitness,29.99,200,4.9,USD
A104,Coffee Maker,Appliances,89.95,35,4.7,USD
A105,Laptop Stand,Accessories,49.99,87,4.5,USD

API Request Logs (Streaming Format)
Server log entries in TOON format ideal for feeding to LLMs analyzing API performance and error patterns.
logs:
timestamp,status,method,endpoint,duration_ms,user_id
2025-06-15T14:30:22Z,200,GET,/api/v1/users,45,u_1001
2025-06-15T14:30:23Z,201,POST,/api/v1/orders,120,u_1002
2025-06-15T14:30:24Z,404,GET,/api/v1/products/999,12,u_1003
2025-06-15T14:30:25Z,200,GET,/api/v1/dashboard,89,u_1001
2025-06-15T14:30:26Z,500,POST,/api/v1/payments,234,u_1004

Application Configuration
Environment config and feature flags in TOON format, reducing token overhead while maintaining readability.
config:
app_name,MergeFlow
debug,false
api_version,v2.1
max_connections,100
timeout_seconds,30
features:
dark_mode,true
maintenance_mode,false
beta_access,true
analytics_enabled,true

Analytics Dashboard Data
Time-series analytics data optimized for LLM-powered insights and natural language reporting.
daily_metrics:
date,pageviews,sessions,bounce_rate,avg_duration
2025-06-10,12450,8920,0.42,185
2025-06-11,13200,9150,0.39,192
2025-06-12,11800,8650,0.45,178
2025-06-13,14500,9800,0.37,205
2025-06-14,15200,10100,0.35,210

LLM Training Conversation Sample
Conversation turns formatted in TOON for efficient fine-tuning and prompt context windows.
conversation:
turn,role,content,tokens
1,user,What is TOON format?,5
2,assistant,TOON is Token-Optimized Object Notation designed to reduce LLM token consumption.,12
3,user,How much can I save?,5
4,assistant,Typically 30-60% token reduction compared to standard JSON.,9
5,user,Where is it used?,4
6,assistant,API integrations chatbots data pipelines and LLM applications.,9

Understanding TOON Format
TOON (Token-Optimized Object Notation) is a data serialization format specifically designed for Large Language Model applications. Unlike JSON which repeats keys for every object in an array, TOON declares keys once as headers and represents data in delimited rows, similar to CSV but optimized for nested structures.
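To make the keys-once idea concrete, here is a minimal Python sketch (using made-up sample records, not data from any specific library) that serializes the same array as JSON and as a TOON-style table and compares their sizes:

```python
import json

users = [
    {"id": 1, "name": "Alice Johnson", "role": "admin"},
    {"id": 2, "name": "Bob Smith", "role": "user"},
]

# JSON repeats every key, brace, quote, and colon for every object
as_json = json.dumps(users)

# TOON declares the keys once as a header, then emits delimited rows
header = ",".join(users[0].keys())
rows = [",".join(str(v) for v in u.values()) for u in users]
as_toon = "users:\n" + header + "\n" + "\n".join(rows)

print(as_toon)
print(f"JSON: {len(as_json)} chars, TOON: {len(as_toon)} chars")
```

Even at two records the TOON form is shorter; the gap widens as the array grows, because each additional row adds only values, never keys.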
Why TOON Format Matters for AI Applications
Language models like GPT-4, Claude, and Gemini charge based on token consumption. Standard JSON wastes tokens on redundant syntax: curly braces, quotes around keys, colons, and commas. For an array of 100 user objects with 10 fields each, JSON repeats those 10 keys 100 times. TOON declares them once.
- Token Reduction: Achieve 30-60% fewer tokens compared to equivalent JSON data structures.
- Cost Savings: At GPT-4 input pricing of $0.03 per 1K tokens, a 30-60% token reduction translates directly into lower monthly API bills.
- Human Readable: Unlike binary formats, TOON maintains readability for debugging and manual inspection.
- LLM Compatible: Language models process TOON format naturally in prompt contexts without special handling.
The format excels with homogeneous arrays of objects (user lists, product catalogs, log entries) where repetitive structure creates maximum redundancy in JSON. TOON transforms these into table-like representations with headers declared once.
Convert & Optimize Your Data
Transform your existing JSON files to TOON format instantly. All processing happens in your browser with complete privacy and no server uploads.
Production Use Cases
Chatbot Context Windows
Load more conversation history or knowledge base content within LLM token limits by using TOON format instead of JSON.
E-commerce Product Feeds
Send entire product catalogs to LLMs for search relevance ranking while staying under context window constraints with 50% token reduction.
Analytics & Reporting
Generate natural language insights from tabular data more cost-effectively by feeding TOON-formatted metrics to GPT-4 or Claude.
TOON File Format Specifications
Technical specifications for the Token-Optimized Object Notation format, including syntax rules, supported structures, and integration guidelines.
| Property | Value |
|---|---|
| File Extension | .toon |
| MIME Type | text/plain (or text/toon) |
| Default Encoding | UTF-8 |
| Token Efficiency | 30-60% reduction vs JSON (varies by structure) |
| Primary Use Case | LLM prompt optimization, API cost reduction |
| Data Structure | Delimiter-separated values with header row for arrays |
| Default Delimiter | Comma (,) - configurable to tab, pipe, or semicolon |
| Nested Object Support | Key folding with dot notation (e.g., user.address.city) |
| Compatible LLMs | GPT-4, GPT-3.5, Claude, Gemini, Llama models |
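The key-folding row above can be illustrated with a short Python sketch; `flatten` is a hypothetical helper written for this example, not part of any official TOON library:

```python
def flatten(obj, prefix=""):
    """Fold nested dicts into dot-notation keys, e.g. user.address.city."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse, extending the dotted prefix for nested objects
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat

user = {"id": 1, "address": {"city": "Berlin", "country": "Germany"}}
print(flatten(user))
# → {'id': 1, 'address.city': 'Berlin', 'address.country': 'Germany'}
```

Once flattened this way, an array of nested objects can share a single header row (`id,address.city,address.country`) like any other TOON table.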
How to Use TOON Format Files
TOON files work best when feeding structured data to Large Language Models. The format reduces token overhead while maintaining human readability for debugging. Below are practical examples for common programming languages.
Using TOON in Python LLM Applications
Read TOON files and include them in prompts sent to OpenAI, Anthropic, or other LLM providers. The reduced token count directly lowers API costs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Read TOON formatted data
with open("sample-users-toon.toon", "r", encoding="utf-8") as f:
    toon_data = f.read()

# Include in LLM prompt with minimal token overhead
prompt = f"""
Context Data:
{toon_data}

Task: Summarize the user distribution by country.
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

Loading TOON in JavaScript/Node.js
Use TOON format in frontend applications or Node.js backends to optimize token consumption when interacting with language model APIs.
const fs = require('fs');
const OpenAI = require('openai');

// Read TOON file
const toonData = fs.readFileSync('sample-products-toon.toon', 'utf-8');

// Prepare optimized prompt
const prompt = `Product Catalog:\n${toonData}\n\nGenerate SEO descriptions for top 3 products.`;

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
  });
  console.log(completion.choices[0].message.content);
}

main();

Converting JSON to TOON Programmatically
Integrate TOON conversion into your data pipeline for automatic optimization before sending to LLMs. This Python example shows basic array-to-table conversion.
import json
import csv
from io import StringIO

def json_to_toon(json_data, delimiter=','):
    """Convert a JSON array of objects to TOON tabular format."""
    if isinstance(json_data, list) and json_data and isinstance(json_data[0], dict):
        # Use the first object's keys as the header row
        keys = list(json_data[0].keys())
        # Create header and rows
        output = StringIO()
        writer = csv.writer(output, delimiter=delimiter)
        writer.writerow(keys)
        for item in json_data:
            row = [item.get(key, '') for key in keys]
            writer.writerow(row)
        return output.getvalue().strip()
    # Fall back to plain JSON for non-tabular data
    return json.dumps(json_data)

# Example usage
with open('products.json', 'r') as f:
    data = json.load(f)

toon_output = json_to_toon(data['products'])

# Save to file
with open('products.toon', 'w') as f:
    f.write(f"products:\n{toon_output}")

print(f"TOON output is ~{len(toon_output) / len(json.dumps(data['products'])) * 100:.0f}% of the JSON size (character estimate; verify with a real tokenizer)")

Bash/Shell Script for Batch Conversion
Automate TOON conversion in CI/CD pipelines or batch processing workflows using command-line tools.
#!/bin/bash
# convert_to_toon.sh - Convert all JSON files in directory to TOON
for jsonfile in *.json; do
echo "Converting $jsonfile..."
# Extract filename without extension
basename="${jsonfile%.json}"
# Convert using jq (assumes each file is a top-level array of objects with these fields)
jq -r '.[] | [.id, .name, .email, .role] | @csv' "$jsonfile" > "${basename}.toon"
# Add header
echo "id,name,email,role" | cat - "${basename}.toon" > temp && mv temp "${basename}.toon"
echo "Created ${basename}.toon"
done
echo "Conversion complete. TOON files ready for LLM integration."

Token & Cost Comparison: JSON vs TOON
Real-world examples demonstrating token reduction and cost savings across different data structures. Calculations based on GPT-4 pricing at $0.03 per 1,000 input tokens.
| Data Type | JSON Tokens | TOON Tokens | Reduction | Monthly Savings* |
|---|---|---|---|---|
| 100 User Profiles | 3,200 | 1,400 | 56% | $54 |
| 500 Product Catalog | 12,500 | 5,800 | 54% | $201 |
| 1,000 API Logs | 18,000 | 7,200 | 60% | $324 |
| Analytics Dashboard (30 days) | 2,800 | 1,200 | 57% | $48 |
* Savings calculated assuming 1,000 API calls per month at GPT-4 input token pricing ($0.03/1K tokens)
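The savings column can be reproduced with simple arithmetic under the stated assumptions (1,000 calls per month, $0.03 per 1K input tokens); the helper below is a sketch for checking the table, not a billing tool:

```python
PRICE_PER_1K = 0.03    # GPT-4 input pricing used in the table
CALLS_PER_MONTH = 1000

def monthly_savings(json_tokens, toon_tokens):
    """Dollar savings per month from sending TOON instead of JSON."""
    saved_tokens = (json_tokens - toon_tokens) * CALLS_PER_MONTH
    return round(saved_tokens / 1000 * PRICE_PER_1K, 2)

print(monthly_savings(3200, 1400))    # 100 user profiles → 54.0
print(monthly_savings(18000, 7200))   # 1,000 API logs → 324.0
```

Both results match the table rows above, so the figures are internally consistent with the stated pricing assumptions.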
TOON Format Best Practices
- Use for Homogeneous Arrays: TOON excels with arrays of objects sharing the same structure (users, products, logs). Heterogeneous data benefits less.
- Choose Appropriate Delimiters: Use comma for standard data, tab for better visual alignment, pipe when data contains commas.
- Maintain Readability: TOON should remain debuggable. Avoid excessive compression that makes manual inspection difficult.
- Document the Format: When sharing TOON files with team members, include format documentation explaining the structure.
- Version Control Friendly: TOON text format works well with git and other VCS systems, showing clear line-by-line diffs.
- Test with Actual Tokenizers: Validate savings using actual LLM provider tokenizers rather than assumptions about compression ratios.
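As a concrete instance of the delimiter advice above, this sketch switches to a pipe delimiter when field values contain commas; `pick_delimiter` is an illustrative helper of my own, not a standard TOON API:

```python
import csv
from io import StringIO

def pick_delimiter(rows):
    """Use comma by default; fall back to pipe if any value contains a comma."""
    for row in rows:
        if any("," in str(v) for v in row):
            return "|"
    return ","

rows = [
    ["A101", "Headphones, wireless", "299.99"],  # embedded comma
    ["A102", "Standing Desk", "599.00"],
]

delim = pick_delimiter(rows)
out = StringIO()
csv.writer(out, delimiter=delim).writerows(rows)
print(out.getvalue())
```

With the pipe delimiter the embedded comma needs no quoting, which keeps rows clean for both LLMs and human readers; quoting commas inside a comma-delimited row would work too, but costs extra tokens.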