Prompt Cache Prewarm — Cut Time-to-First-Token on Claude API
Send a throwaway request containing your system prompt before the real user request arrives. Claude writes the prompt to the cache while generating essentially no output. When the real user request lands, it hits a warm cache — reducing time-to-first-token by up to 52% for 160k+ token system prompts.
# prewarm.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1,  # the API requires at least 1; this keeps generation effectively off
    system=[{
        "type": "text",
        "text": BIG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # mark the system prompt for caching
    }],
    messages=[{"role": "user", "content": "."}],  # minimal throwaway user turn
)
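To confirm the follow-up request actually hit the warm cache, inspect the `usage` block the Messages API returns: `cache_creation_input_tokens` counts tokens written to the cache, and `cache_read_input_tokens` counts tokens served from it. A minimal sketch — the `cache_hit_fraction` helper and the sample payloads below are hypothetical, not part of the Anthropic SDK:

```python
def cache_hit_fraction(usage: dict) -> float:
    """Fraction of prompt tokens served from cache, given a Messages API
    usage payload. Returns 0.0 when nothing was read from cache."""
    read = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    fresh = usage.get("input_tokens", 0)
    total = read + written + fresh
    return read / total if total else 0.0

# Prewarm call: the big system prompt is written to the cache, nothing read yet.
prewarm_usage = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 160_000,
    "cache_read_input_tokens": 0,
}

# Real user request: the system prompt comes back from the warm cache.
real_usage = {
    "input_tokens": 250,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 160_000,
}
```

If `cache_hit_fraction(real_usage)` is near 1.0, the prewarm worked; near 0.0 means the cache entry expired (ephemeral entries have a short TTL) or the cached prefix didn't match byte-for-byte.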