Prompt Management
Use Langfuse to effectively manage and version your prompts. Langfuse prompt management is a Prompt CMS (Content Management System).
What is prompt management?
Prompt management is a systematic approach to storing, versioning, and retrieving prompts in LLM applications. Key aspects include version control, decoupling prompts from code, monitoring, logging, and optimizing prompts, as well as integrating them with the rest of your application and tool stack.
Why use prompt management?
Can’t I just hardcode my prompts in my application and track them in Git? Yes, well… you can and all of us have done it.
Typical benefits of using a CMS apply here:
- Decoupling: deploy new prompts without redeploying your application.
- Non-technical users can create and update prompts via the Langfuse Console.
- Quickly rollback to a previous version of a prompt.
Platform benefits:
- Track performance of prompt versions in Langfuse Tracing.
Performance benefits compared to other implementations:
- No latency impact after first use of a prompt due to client-side caching and asynchronous cache refreshing.
- Support for text and chat prompts.
- Edit/manage via UI, SDKs, or API.
Langfuse prompt object
{
  "name": "movie-critic",
  "type": "text",
  "prompt": "As a {{criticLevel}} movie critic, do you like {{movie}}?",
  "config": {
    "model": "gpt-3.5-turbo",
    "temperature": 0.5,
    "supported_languages": ["en", "fr"]
  },
  "version": 1,
  "labels": ["production", "staging", "latest"],
  "tags": ["movies"]
}
- name: Unique name of the prompt within a Langfuse project.
- type: The type of the prompt content (text or chat). Default is text.
- prompt: The text template with variables (e.g. This is a prompt with a {{variable}}). For chat prompts, this is a list of chat messages, each with role and content.
- config: Optional JSON object to store any parameters (e.g. model parameters or model tools).
- version: Integer to indicate the version of the prompt. The version is automatically incremented when creating a new prompt version.
- labels: Labels that can be used to fetch specific prompt versions in the SDKs.
  - When using a prompt without specifying a label, Langfuse will serve the version with the production label.
  - latest points to the most recently created version.
  - You can create any additional labels, e.g. for different environments (staging, production) or tenants (tenant-1, tenant-2).
How it works
Create/update prompt
If you already have a prompt with the same name, the prompt will be added as a new version.
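Prompts can be created and updated via the UI, the SDKs, or the API. A minimal sketch using the Python SDK, assuming its create_prompt method and the name/type/prompt/labels/config parameters shown here:

from langfuse import Langfuse

langfuse = Langfuse()

# Creates the prompt, or a new version if "movie-critic" already exists
langfuse.create_prompt(
    name="movie-critic",
    type="text",
    prompt="As a {{criticLevel}} movie critic, do you like {{movie}}?",
    labels=["production"],  # serve this version by default
    config={"model": "gpt-3.5-turbo", "temperature": 0.5},
)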
Use prompt
At runtime, you can fetch the latest production version from Langfuse.
from langfuse import Langfuse
# Initialize Langfuse client
langfuse = Langfuse()
# Get current `production` version of a text prompt
prompt = langfuse.get_prompt("movie-critic")
# Insert variables into prompt template
compiled_prompt = prompt.compile(criticLevel="expert", movie="Dune 2")
# -> "As an expert movie critic, do you like Dune 2?"
Chat prompts
# Get current `production` version of a chat prompt
chat_prompt = langfuse.get_prompt("movie-critic-chat", type="chat") # type arg infers the prompt type (default is 'text')
# Insert variables into chat prompt template
compiled_chat_prompt = chat_prompt.compile(criticLevel="expert", movie="Dune 2")
# -> [{"role": "system", "content": "You are an expert movie critic"}, {"role": "user", "content": "Do you like Dune 2?"}]
Optional parameters
# Get specific version
prompt = langfuse.get_prompt("movie-critic", version=1)
# Get specific label
prompt = langfuse.get_prompt("movie-critic", label="staging")
# Get latest prompt version. The 'latest' label is automatically maintained by Langfuse.
prompt = langfuse.get_prompt("movie-critic", label="latest")
# Extend cache TTL from default 60 to 300 seconds
prompt = langfuse.get_prompt("movie-critic", cache_ttl_seconds=300)
# Number of retries on fetching prompts from the server. Default is 2.
prompt = langfuse.get_prompt("movie-critic", max_retries=3)
# Timeout per call to the Langfuse API in seconds. Default is 20.
prompt = langfuse.get_prompt("movie-critic", fetch_timeout_seconds=3)
Attributes
# Raw prompt including {{variables}}. For chat prompts, this is a list of chat messages.
prompt.prompt
# Config object
prompt.config
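Since config is returned as a plain dict, model parameters stored with the prompt can be read at runtime instead of being hardcoded. A minimal sketch, assuming the movie-critic prompt and config shown above:

prompt = langfuse.get_prompt("movie-critic")

# Read model parameters stored alongside the prompt
model = prompt.config.get("model", "gpt-3.5-turbo")
temperature = prompt.config.get("temperature", 0.5)
supported_languages = prompt.config.get("supported_languages", [])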
Link with Langfuse Tracing (optional)
Add the prompt object to the generation call in the SDKs to link the generation in Langfuse Tracing to the prompt version. This linkage enables tracking of metrics by prompt version and name, such as "movie-critic", directly in the Langfuse UI. Metrics like scores per prompt version provide insights into how modifications to prompts impact the quality of the generations. If a fallback prompt is used, no link will be created.
This is currently unavailable when using the LlamaIndex integration.
Decorators
from langfuse import Langfuse
from langfuse.decorators import langfuse_context, observe

# Initialize Langfuse client
langfuse = Langfuse()

@observe(as_type="generation")
def nested_generation():
    prompt = langfuse.get_prompt("movie-critic")

    langfuse_context.update_current_observation(
        prompt=prompt,
    )

@observe()
def main():
    nested_generation()

main()
Low-level SDK
langfuse.generation(
...
+ prompt=prompt
...
)
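For illustration, a fuller sketch of the same call. Only the prompt parameter is the linking mechanism documented here; the other fields and values are assumptions for the example:

generation = langfuse.generation(
    name="movie-critic-generation",
    model="gpt-3.5-turbo",
    input=compiled_prompt,
    prompt=prompt,  # links this generation to the fetched prompt version
)

# ... call the LLM with compiled_prompt ...

generation.end(output="Yes, Dune 2 is fantastic.")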
Rollbacks (optional)
When a prompt has a production label, that version will be served by default in the SDKs. You can quickly roll back to a previous version by setting the production label to that previous version in the Langfuse UI.
End-to-end examples
The following example notebooks include end-to-end examples of prompt management:
We also used Prompt Management for our Docs Q&A Chatbot and traced it with Langfuse. You can get view-only access to the project by signing up for the public demo.
Caching in client SDKs
Langfuse prompts are served from a client-side cache in the SDKs. Therefore, Langfuse Prompt Management does not add any latency to your application when a cached prompt is available from a previous use. Optionally, you can pre-fetch prompts on application startup to ensure that the cache is populated (example below).
Optional: Pre-fetch prompts on application start
To ensure that your application never hits an empty cache at runtime (which would add an initial delay while the prompt is fetched), you can pre-fetch the prompts during application startup. This pre-fetching populates the cache and ensures that the prompts are readily available when needed.
Example implementations:
from flask import Flask, jsonify
from langfuse import Langfuse

# Initialize the Flask app and Langfuse client
app = Flask(__name__)
langfuse = Langfuse()

def fetch_prompts_on_startup():
    # Fetch and cache the production version of the prompt
    langfuse.get_prompt("movie-critic")

# Call the function during application startup
fetch_prompts_on_startup()

@app.route('/get-movie-prompt/<movie>', methods=['GET'])
def get_movie_prompt(movie):
    prompt = langfuse.get_prompt("movie-critic")
    compiled_prompt = prompt.compile(criticLevel="expert", movie=movie)
    return jsonify({"prompt": compiled_prompt})

if __name__ == '__main__':
    app.run(debug=True)
Optional: Customize caching duration (TTL)
The caching duration is configurable if you wish to reduce the network overhead of the Langfuse client. The default cache TTL is 60 seconds. After the TTL expires, the SDKs will refetch the prompt in the background and update the cache. Refetching is done asynchronously and does not block the application.
# Get current `production` prompt version and cache for 5 minutes
prompt = langfuse.get_prompt("movie-critic", cache_ttl_seconds=300)
Optional: Disable caching
You can disable caching by setting cache_ttl_seconds to 0. This ensures that the prompt is fetched from the Langfuse API on every call. This is recommended only for non-production use cases where you want to ensure that the prompt is always up to date with the latest version in Langfuse.
prompt = langfuse.get_prompt("movie-critic", cache_ttl_seconds=0)
# Common in non-production environments, no cache + latest version
prompt = langfuse.get_prompt("movie-critic", cache_ttl_seconds=0, label="latest")
Performance measurement of initial fetch (empty client-side cache)
We measured the execution time of the following snippet with fully disabled caching.
prompt = langfuse.get_prompt("perf-test")
prompt.compile(input="test")
Results from 1000 sequential executions in a local Jupyter notebook using Langfuse Cloud (includes network latency):
count 1000.000000
mean 0.178465 sec
std 0.058125 sec
min 0.137314 sec
25% 0.161333 sec
50% 0.165919 sec
75% 0.171736 sec
max 0.687994 sec
Optional: Guaranteed availability
Implementing this is usually not necessary as it adds complexity to your application and the Langfuse API is highly available. However, if you require 100% availability, you can use the following options.
The Langfuse API has high uptime and prompts are cached locally in the SDKs to prevent network issues from affecting your application.
However, get_prompt() / getPrompt() will throw an exception if:
- no locally cached prompt (fresh or stale) is available, i.e. a new application instance is fetching the prompt for the first time, and
- the network request fails (networking or Langfuse API issue, after retries).
To guarantee 100% availability, there are two options:
- Pre-fetch prompts on application startup and exit the application if the prompt is not available.
- Provide a fallback prompt that will be used in these cases.
Option 1: Pre-fetch prompts on application startup and exit if not available
from langfuse import Langfuse
import sys

# Initialize Langfuse client
langfuse = Langfuse()

def fetch_prompts_on_startup():
    try:
        # Fetch and cache the production version of the prompt
        langfuse.get_prompt("movie-critic")
    except Exception as e:
        print(f"Failed to fetch prompt on startup: {e}")
        sys.exit(1)  # Exit the application if the prompt is not available

# Call the function during application startup
fetch_prompts_on_startup()

# Your application code here
Option 2: Fallback
from langfuse import Langfuse
langfuse = Langfuse()
# Get `text` prompt with fallback
prompt = langfuse.get_prompt(
    "movie-critic",
    fallback="Do you like {{movie}}?"
)

# Get `chat` prompt with fallback
chat_prompt = langfuse.get_prompt(
    "movie-critic-chat",
    type="chat",
    fallback=[{"role": "system", "content": "You are an expert on {{movie}}"}]
)
# True if the prompt is a fallback
prompt.is_fallback
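A short sketch of how this flag can be used, e.g. to surface that the fallback is being served:

import logging

if prompt.is_fallback:
    # The prompt could not be fetched from Langfuse; the local fallback is in use
    logging.warning("Serving fallback prompt for 'movie-critic'")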