Features
- Dual Deployment Support: Works with the Cloudflare Workers AI binding (low-latency, in-process) or a REST API fallback (for standalone binaries)
- 14 Hazard Categories: Detects violence, hate speech, sexual content, self-harm, and more
- Flexible Actions: Either reject unsafe requests or log warnings while allowing them through
- Conversation-Aware: Can scan individual messages or entire conversation history
- Token Usage Tracking: Reports token consumption for monitoring and cost control
Hazard Categories
Llama Guard 3 detects the following categories (S1-S14):
- S1: Violent Crimes
- S2: Non-Violent Crimes
- S3: Sex-Related Crimes
- S4: Child Sexual Exploitation
- S5: Defamation
- S6: Specialized Advice (financial, medical, legal)
- S7: Privacy Violations
- S8: Intellectual Property
- S9: Indiscriminate Weapons
- S10: Hate Speech
- S11: Suicide & Self-Harm
- S12: Sexual Content
- S13: Elections
- S14: Code Interpreter Abuse
Configuration Options
action (string)
- reject: Block requests containing unsafe content (default)
- warn: Log warnings but allow requests to proceed
- Default: "reject"
scanAllMessages (boolean)
- true: Scans all messages in the conversation
- false: Only scans the latest message
- Default: false
rejectMessage (string)
- Custom message to return when content is rejected
- If not provided, returns a message listing the violated categories
- Example: "Your message violates our content policy"
temperature (number)
- Controls randomness in model output (0-5)
- Lower values = more consistent classification
- Default: 0.6
maxTokens (integer)
- Maximum tokens to generate in the response (1-2048)
- Default: 256
cloudflareAccountId (string)
- Required for the REST API fallback (standalone binary)
- Your Cloudflare Account ID
- Not needed when running in Cloudflare Workers
cloudflareAuthToken (string)
- Required for the REST API fallback (standalone binary)
- Cloudflare API token with Workers AI access
- Not needed when running in Cloudflare Workers
Example Configurations
Basic Usage (Cloudflare Workers)
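A minimal sketch of the options for this mode, written as a TypeScript object. The variable name is illustrative, and the exact shape in which your gateway or Worker accepts the plugin configuration may differ; inside Cloudflare Workers no credentials are needed.

```typescript
// Rely on the defaults: block unsafe content, scan only the latest message.
const llamaGuardConfig = {
  action: "reject",        // default
  scanAllMessages: false,  // default
};
```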
Scan Entire Conversation
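The same caveats apply; this sketch enables conversation-wide scanning.

```typescript
// Scan every message in the conversation history, not just the latest one.
const llamaGuardConfig = {
  action: "reject",
  scanAllMessages: true,
};
```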
Warning Mode (Non-Blocking)
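A sketch of non-blocking operation, with the same caveats as above.

```typescript
// Log violations in the debug logs but let requests through.
const llamaGuardConfig = {
  action: "warn",
  scanAllMessages: true, // optional: watch the whole conversation
};
```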
Standalone Binary with REST API
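A sketch for running outside Cloudflare Workers; the credential values are placeholders.

```typescript
// The REST API fallback needs an account ID and an API token with Workers AI access.
const llamaGuardConfig = {
  action: "reject",
  cloudflareAccountId: "YOUR_ACCOUNT_ID",  // placeholder
  cloudflareAuthToken: "YOUR_API_TOKEN",   // placeholder; load from a secret store
};
```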
Response Behavior
When Content is Safe
- Request proceeds normally
- No modifications to messages
- Debug logs indicate “Content classified as SAFE”
When Content is Unsafe (action: “reject”)
- Request is blocked
- Returns rejection reason with violated categories
- Example:
"Content safety check failed. Violated categories: S10, S11"
When Content is Unsafe (action: “warn”)
- Request proceeds with warning in debug logs
- No modifications to messages
- Warning includes violated categories
- Useful for monitoring without blocking users
On Error
- Plugin fails gracefully
- Request is allowed to proceed, with a warning in the debug logs
- Error details logged for troubleshooting
Deployment Modes
Cloudflare Workers (Recommended)
Uses the native AI binding for low-latency, cost-efficient inference (see the sketch after this list):
- ✅ Zero cold start
- ✅ Lower latency
- ✅ No auth tokens needed
- ✅ Integrated billing
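For orientation, a rough sketch of what a call through the AI binding looks like. The Env interface, function name, and response handling are illustrative; the model ID is Cloudflare's published @cf/meta/llama-guard-3-8b, but the plugin's internals and exact response schema may differ.

```typescript
// Sketch of a Workers AI binding call (typings from @cloudflare/workers-types).
export interface Env {
  AI: Ai;
}

export async function classify(
  env: Env,
  messages: { role: string; content: string }[],
) {
  // The binding runs the model inside Cloudflare's network; no auth token is needed.
  const result = await env.AI.run("@cf/meta/llama-guard-3-8b", { messages });
  // Guard models typically return a short verdict such as "safe" or "unsafe\nS10";
  // parse it according to the schema you actually observe.
  return result;
}
```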
Standalone Binary (REST API Fallback)
Uses Cloudflare’s REST API when the AI binding is not available (see the sketch after this list):
- ✅ Works outside Cloudflare Workers
- ✅ Self-hosted deployment
- ✅ Same model and accuracy
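A matching sketch of the REST fallback. The endpoint follows Cloudflare's Workers AI REST API pattern (`/accounts/{account_id}/ai/run/{model}`); treat the request and response details as assumptions.

```typescript
// Sketch of the REST API fallback used outside Cloudflare Workers.
export async function classifyViaRest(
  accountId: string,
  authToken: string,
  messages: { role: string; content: string }[],
) {
  const url =
    `https://api.cloudflare.com/client/v4/accounts/${accountId}` +
    `/ai/run/@cf/meta/llama-guard-3-8b`;
  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${authToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok) {
    // Surfaces as e.g. "Cloudflare API error (401)" for a bad token.
    throw new Error(`Cloudflare API error (${res.status})`);
  }
  return res.json(); // standard { result, success, errors, messages } envelope
}
```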
Use Cases
Content Moderation
Block harmful content before it reaches your LLM:
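A hedged sketch (same caveats as the example configurations above; the variable name is illustrative):

```typescript
const contentModeration = {
  action: "reject",
  rejectMessage: "Your message violates our content policy",
};
```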
Conversation Safety
Monitor entire conversations for policy violations:
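For example (sketch):

```typescript
const conversationSafety = {
  action: "reject",
  scanAllMessages: true, // inspect the whole history, not just the latest turn
};
```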
Compliance Monitoring
Log safety issues without blocking users:
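For example (sketch):

```typescript
const complianceMonitoring = {
  action: "warn",        // never blocks; violations appear only in debug logs
  scanAllMessages: true,
};
```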
Fine-Grained Control
Adjust temperature for stricter or looser classification:
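For example (sketch):

```typescript
const fineGrainedControl = {
  action: "reject",
  temperature: 0.2, // below the 0.6 default for more deterministic verdicts
};
```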
Performance
- Cloudflare Workers: ~100-200ms latency
- REST API: ~200-500ms latency (depending on network)
- Token Usage: Typically 50-200 tokens per request
- Cost: ~$0.03 per 1M output tokens
Limitations
- Message Format: Only processes user and assistant roles (system/developer/tool messages are treated as user)
- Text Only: Images, audio, and other media are not analyzed
- Language: Optimized for English, may have reduced accuracy in other languages
- Context Length: Limited by model’s context window (~8K tokens)
Troubleshooting
Error: “cloudflareAccountId and cloudflareAuthToken are required”
Cause: Running outside Cloudflare Workers without REST API credentials
Solution: Add your Cloudflare Account ID and API token to the configuration
Error: “Cloudflare API error (401)”
Cause: Invalid or expired API token
Solution:
- Generate a new API token at https://dash.cloudflare.com/profile/api-tokens
- Ensure it has “Workers AI” permissions
- Update cloudflareAuthToken in the configuration
False Positives
Cause: The model may flag legitimate content
Solution:
- Increase temperature (e.g., 0.8) for less strict classification
- Use action: "warn" to monitor without blocking
- Review debug logs to understand the classification reasoning
High Latency
Cause: Using the REST API instead of the Workers AI binding
Solution:
- Deploy to Cloudflare Workers for optimal performance
- Consider caching results for repeated content
- Use scanAllMessages: false to scan only the latest message
Integration Examples
With Prompt Engineering
With Logging Plugin
Multi-Layer Protection
Use alongside other plugins for defense-in-depth (see the sketch after this list):
- Regex Detect: Block obvious patterns (profanity, PII)
- Llama Guard: Catch nuanced safety issues
- Invisible Text: Prevent steganography attacks
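A hedged sketch of one way such a chain might be expressed; the plugin identifiers and the array-of-plugins shape are hypothetical and depend on how your gateway composes plugins.

```typescript
// Hypothetical plugin chain, cheapest checks first.
const pluginChain = [
  { name: "regex-detect", config: { /* profanity / PII patterns */ } },
  { name: "llama-guard", config: { action: "reject" } },
  { name: "invisible-text", config: {} },
];
```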