Prompt Cache Prewarm — Cut Time-to-First-Token on Claude API
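Send your system prompt in its own request before the real user request arrives. Claude writes it to the cache but skips generating any output. When the real user request lands, it hits a warm cache, reducing time-to-first-token by up to 52% for 160k+ token system prompts. The follow-up request must send the same system block, byte for byte, so that it matches the cached prefix.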
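# prewarm.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
BIG_SYSTEM_PROMPT = "..."  # placeholder: your large system prompt

# Write the system prompt to the cache without generating a reply.
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=0,  # disable generation
    system=[{
        "type": "text",
        "text": BIG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
    }],
    messages=[{"role": "user", "content": "."}],  # minimal placeholder user turn
)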
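The prewarm only pays off if the real request reuses the exact same system block. A minimal sketch of that pairing follows, assuming a client object shaped like the `anthropic` SDK client. The helper names `cached_system`, `prewarm`, and `ask` are hypothetical and not part of the original snippet; `cache_read_input_tokens` is the usage field the API reports for tokens served from the cache.

```python
# Hypothetical helpers: the names below are illustrative, not from the snippet.

def cached_system(prompt: str) -> list:
    """Build the system block once so prewarm and real requests share it exactly."""
    return [{
        "type": "text",
        "text": prompt,
        "cache_control": {"type": "ephemeral"},
    }]


def prewarm(client, model: str, prompt: str):
    """Send the system prompt alone so it is written to the cache."""
    return client.messages.create(
        model=model,
        max_tokens=0,  # same setting as the prewarm snippet: no generation
        system=cached_system(prompt),
        messages=[{"role": "user", "content": "."}],
    )


def ask(client, model: str, prompt: str, user_text: str, max_tokens: int = 1024):
    """Send the real user request and report how many tokens came from the cache."""
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        system=cached_system(prompt),  # identical block, so the prefix can match
        messages=[{"role": "user", "content": user_text}],
    )
    # A non-zero value suggests the cached system prompt was reused.
    cache_read = getattr(response.usage, "cache_read_input_tokens", 0) or 0
    return response, cache_read
```

With `client = anthropic.Anthropic()`, one plausible pattern is to call `prewarm(client, "claude-opus-4-7", BIG_SYSTEM_PROMPT)` when a session opens and `ask(...)` for each user message. If `cache_read` comes back as 0, the system block probably differs between the two calls or the cache entry has expired.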