Extracts text from images via OCR, detects PII within the text, and returns redacted images with sensitive information obscured.Documentation Index
Fetch the complete documentation index at: https://docs.datawizz.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The Image Redaction Plugin processes images in multimodal AI requests, using Optical Character Recognition (OCR) to extract text, detecting PII within that text, and then visually redacting the sensitive areas by overlaying them with colored boxes. This prevents PII in screenshots, documents, or photos from being exposed to AI models.How It Works
- Image Processing: Extracts images from message content (supports both URLs and base64 data URIs)
- OCR Analysis: Uses Tesseract OCR to extract text from images
- PII Detection: Analyzes extracted text using Microsoft Presidio to identify PII
- Visual Redaction: Overlays detected PII regions with colored boxes to obscure the text
- Return Modified Images: Returns images as base64 data URIs with PII redacted
Supported PII Types
The plugin can detect and redact 30+ entity types across multiple regions:Personal Information
PERSON- Person namesEMAIL_ADDRESS- Email addressesPHONE_NUMBER- Phone numbersDATE_TIME- Dates and timesLOCATION- Geographic locationsURL- Web addressesIP_ADDRESS- IP addresses
Financial
CREDIT_CARD- Credit card numbersCRYPTO- Cryptocurrency wallet addressesIBAN_CODE- International bank account numbers
United States
US_SSN- Social Security NumbersUS_DRIVER_LICENSE- Driver’s license numbersUS_PASSPORT- Passport numbersUS_BANK_NUMBER- Bank account numbersUS_ITIN- Individual Taxpayer Identification Numbers
International
UK_NHS- UK National Health Service numbersSG_NRIC_FIN- Singapore NRIC/FIN numbersAU_ABN,AU_ACN,AU_TFN,AU_MEDICARE- Australian identifiersIN_PAN,IN_AADHAAR,IN_VEHICLE_REGISTRATION- Indian identifiersES_NIF- Spanish tax identificationIT_FISCAL_CODE,IT_DRIVER_LICENSE,IT_VAT_CODE,IT_PASSPORT,IT_IDENTITY_CARD- Italian identifiers
Healthcare
MEDICAL_LICENSE- Medical license numbersNRP- Medical prescriber numbers
Configuration
Basic Settings
entities (optional, array of strings)
List of PII entity types to detect and redact in images. If not specified, all detected entities are redacted.
language (string, default: "en")
Language code for OCR text analysis (e.g., "en", "es", "de").
score_threshold (number, default: 0.5)
Minimum confidence score (0-1) required to redact an entity. Lower values catch more PII but may increase false positives.
Visual Redaction Settings
fill_color (string or RGB array, default: "black")
Fill color for redacted areas. Can be:
- Color name:
"black","white","gray","red", etc. - RGB tuple:
[0, 0, 0]for black,[255, 255, 255]for white,[255, 0, 0]for red
padding (number, default: 10)
Padding in pixels around detected text to ensure complete coverage. Higher values provide more margin but may obscure surrounding content.
Advanced OCR Settings
ocr_kwargs (optional, object)
Additional keyword arguments to pass to the OCR engine (Tesseract). Common options:
lang: Language code (e.g.,"eng","spa","deu")config: Tesseract configuration string (e.g.,"--psm 6"for uniform text block)
Custom Pattern Recognition
ad_hoc_recognizers (optional, array of objects)
Custom regex-based recognizers for detecting domain-specific patterns not covered by standard entity types.
Structure:
Advanced Detection
allow_list (optional, array of strings)
Terms/patterns that should NOT be redacted from images, even if they match detection patterns.
deny_list (optional, array of strings)
Terms/patterns that should ALWAYS be redacted from images, regardless of detection confidence.
Example Configurations
Basic Image Redaction
Custom Fill Color
RGB: 255, 0, 0) boxes.
Enhanced OCR for Non-English
Custom Pattern Detection
EMP-123456 or PROJ-ABC-1234 in addition to standard emails.
High Sensitivity Mode
Image Format Support
The plugin handles multiple image input formats:HTTP/HTTPS URLs
Base64 Data URIs
Behavior
- Fail-open: If the plugin encounters an error, the original messages are returned unmodified
- Multi-image support: Processes all images in all messages independently
- Format preservation: Maintains message structure (multimodal content arrays)
- URL conversion: Converts fetched URLs to base64 data URIs for redacted images
- Debug output: Returns detailed processing information when enabled in Gateway UI
- No blocking: Always allows requests to proceed (unlike Detection Plugin)
Performance Considerations
- OCR latency: Image processing takes 1-3 seconds per image depending on size and complexity
- Image size limits: Large images (>10MB) may timeout; consider resizing before processing
- Cold starts: First container invocation may take 2-3 seconds
- Base64 size: Redacted images returned as base64 may be large; monitor response sizes
Use Cases
- Screenshot sanitization: Remove PII from screenshots before sharing with AI models
- Document processing: Redact sensitive information from scanned documents
- Support tickets: Process user-submitted images containing PII in customer support scenarios
- Compliance: Ensure uploaded images don’t expose regulated data (HIPAA, GDPR)
- Testing: Sanitize production screenshots for use in development/testing environments
- Multi-modal AI safety: Prevent vision models from accessing PII in image content
Limitations
- OCR quality: Detection accuracy depends on image quality, text clarity, and font legibility
- Handwritten text: OCR may struggle with handwriting; results vary
- Complex layouts: Dense or overlapping text may reduce detection accuracy
- Non-text PII: Cannot detect faces, objects, or other non-textual PII
- Language support: OCR quality varies by language; best results with Latin scripts
Configuration Schema
Supported Phases
- Request Phase: Supports processing during the REQUEST phase
- Response Phase: Supports processing during the RESPONSE phase
- Log Phase: Supports processing during the LOG phase