Public · Claude · May 15, 2026

Prompt Cache Prewarm — Cut Time-to-First-Token on Claude API

Send your system prompt in a cheap throwaway request before the real user request arrives. Claude writes the prompt prefix to the cache while generating almost no output, so when the actual user request lands it hits a warm cache — reducing time-to-first-token by up to 52% for 160k+ token system prompts.

# prewarm.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1,  # the API requires at least 1; this call exists only to prime the cache
    system=[{
        "type": "text",
        "text": BIG_SYSTEM_PROMPT,  # your large, static system prompt string
        "cache_control": {"type": "ephemeral"},  # mark this prefix as cacheable
    }],
    messages=[{"role": "user", "content": "."}],  # minimal throwaway user turn
)
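To get the cache hit, the real request must send a byte-identical system block with the same `cache_control` marker. The sketch below factors the shared request kwargs into one helper so the prewarm and the real call can't drift apart; the `client`, model name, and `prewarm_then_ask` helper are illustrative assumptions (pass in a client created with `anthropic.Anthropic()`), and the cache-hit check uses the `usage.cache_read_input_tokens` field the API reports on cached reads.

```python
def build_request(system_prompt: str, user_content: str, max_tokens: int) -> dict:
    """Shared kwargs so the prewarm and the real request use an
    identical cacheable system prefix (required for a cache hit)."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": max_tokens,
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": user_content}],
    }

def prewarm_then_ask(client, system_prompt: str, question: str):
    # Prewarm: one-token generation; the request primes the cache.
    client.messages.create(**build_request(system_prompt, ".", 1))
    # Real request: same system block byte-for-byte -> warm cache.
    resp = client.messages.create(**build_request(system_prompt, question, 1024))
    # resp.usage.cache_read_input_tokens > 0 confirms the cache hit.
    return resp
```

If traffic is bursty, you can run the prewarm on a timer slightly inside the cache's TTL so the prefix never goes cold between user requests.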
