Public · Claude · May 15, 2026 · 133 views

Prompt Cache Prewarm — Cut Time-to-First-Token on Claude API

Send your system prompt in a separate request ahead of the real user request. Claude writes it to the prompt cache but skips generating any output, so when the real user request lands, it hits a warm cache. For system prompts of 160k+ tokens, this reduces time-to-first-token by up to 52%.

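# prewarm.py
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

BIG_SYSTEM_PROMPT = "..."  # fill in: your large system prompt

# Prewarm request: writes the system prompt to the cache without generating output.
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=0,  # disable generation
    system=[{
        "type": "text",
        "text": BIG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # cache breakpoint after this block
    }],
    messages=[{"role": "user", "content": "."}],  # placeholder user turn
)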
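
The prewarm only pays off if the real request sends the same model and a byte-identical system block, since the cache matches on the exact prefix. A minimal sketch of that follow-up step, assuming `client` is the `anthropic.Anthropic()` instance from the snippet: the helper names (`cached_system`, `prewarm_kwargs`, `request_kwargs`, `cache_was_hit`, `prewarm_then_ask`) are hypothetical, and `max_tokens=0` is carried over from the snippet as-is. The cache check reads `cache_read_input_tokens` from the response's `usage` object.

```python
from typing import Any

MODEL = "claude-opus-4-7"  # model name taken from the snippet above


def cached_system(prompt: str) -> list[dict[str, Any]]:
    """Build the system block with a cache breakpoint.

    Both requests must send exactly this block for the cache to match.
    """
    return [{
        "type": "text",
        "text": prompt,
        "cache_control": {"type": "ephemeral"},
    }]


def prewarm_kwargs(prompt: str) -> dict[str, Any]:
    """Arguments for the prewarm request (mirrors prewarm.py)."""
    return {
        "model": MODEL,
        "max_tokens": 0,  # as in the snippet: no generation
        "system": cached_system(prompt),
        "messages": [{"role": "user", "content": "."}],
    }


def request_kwargs(prompt: str, user_text: str, max_tokens: int = 1024) -> dict[str, Any]:
    """Arguments for the real request: same model and system block, real user turn."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "system": cached_system(prompt),
        "messages": [{"role": "user", "content": user_text}],
    }


def cache_was_hit(usage: Any) -> bool:
    """Return True if the response reports tokens read from the prompt cache."""
    return (getattr(usage, "cache_read_input_tokens", 0) or 0) > 0


def prewarm_then_ask(client: Any, prompt: str, user_text: str) -> tuple[Any, bool]:
    """Send the prewarm request, then the real one, and report whether the cache was read."""
    client.messages.create(**prewarm_kwargs(prompt))
    response = client.messages.create(**request_kwargs(prompt, user_text))
    return response, cache_was_hit(response.usage)
```

In practice the prewarm would run earlier than the real request (for example, when a session opens), so the two calls in `prewarm_then_ask` would be split across those two moments. If `cache_was_hit` returns `False`, the usual suspects are a system prompt that differs between the two calls or a cache entry that expired before the real request arrived.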
