Inference API
When you call a deployed reranker model through Datawizz (e.g. Datawizz Serverless or your endpoint), you use the chat completions request format. The gateway transforms it to the backend `/v1/rerank` format and converts the rerank response back to a chat completion, so logging and evaluation work as with other models.
Request (chat completion format)
Send a standard chat completion request with:

- `model`: Your deployed reranker model name.
- `messages`:
  - One `system` message: its `content` is the query (string).
  - One `user` message: the documents to score. Either:
    - A string (single document), or
    - An array of content parts: `{"type": "text", "text": "Document content"}` for each document.
Recommendation (Qwen3-Reranker): For best performance with Qwen3-Reranker models, use the native `<Instruct>`, `<Query>`, and `<Document>` tag format in your messages. The simplified format (plain query in system, plain document in user) also works, but the tagged format aligns with how the model was trained and may yield better relevance scores. The default Instruct for Qwen3-Reranker is: "Given a web search query, retrieve relevant passages that answer the query". You can customize this for your specific task.

The request must contain:

- Exactly one `system` message with non-empty string `content` (the query, or `<Instruct>` + `<Query>` tags for Qwen3 format).
- Exactly one `user` message whose `content` is either a non-empty string or an array of `{ "type": "text", "text": "..." }` with at least one document (or `<Document>` tags for Qwen3 format).
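As a concrete sketch, a request body satisfying these rules could be assembled like this (the model name, query, and documents are placeholder values, not anything Datawizz-specific):

```python
import json

# Hypothetical deployed model name; replace with your own.
MODEL = "my-qwen3-reranker"

query = "What is the capital of France?"
documents = [
    "Paris is the capital and most populous city of France.",
    "Berlin is the capital of Germany.",
]

# One system message (the query) and one user message (the documents
# as an array of text content parts), per the rules above.
request_body = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": query},
        {"role": "user", "content": [
            {"type": "text", "text": doc} for doc in documents
        ]},
    ],
}

print(json.dumps(request_body, indent=2))
```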
Response (chat completion format)
The response is returned as a normal chat completion. The assistant message content is a JSON string of the rerank results (index and score only; document text is not echoed):

- `choices[0].message.content`: JSON string of an array of `{ "index": number, "relevance_score": number }`, ordered by the backend (typically by relevance, highest first).
  - `index`: Original document index (0-based) in the request.
  - `relevance_score`: Relevance score in [0, 1] (higher = more relevant).
- `usage`: Token counts.
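Because the scores arrive as a JSON string inside the assistant message, the caller has to parse them back out. A minimal sketch, using a hand-built stand-in for a real gateway response:

```python
import json

# Stand-in for a chat completion response from the gateway.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": '[{"index": 1, "relevance_score": 0.92}, {"index": 0, "relevance_score": 0.13}]',
        }
    }],
    "usage": {"prompt_tokens": 42, "completion_tokens": 0},
}

# The content is a JSON string of [{index, relevance_score}, ...].
results = json.loads(response["choices"][0]["message"]["content"])

# Map scores back to the original documents by index.
best = max(results, key=lambda r: r["relevance_score"])
print(f"Most relevant document index: {best['index']} (score {best['relevance_score']})")
```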
Feedback / Improvement Signal
To send feedback on reranker results, use the feedback endpoint: the `X-Feedback-Url` and `X-Feedback-Token` headers provide the URL and JWT token for submitting feedback. See Feedback Signals for full details.
For rerankers, provide an improvement object with the corrected relevance scores in the same structure as the model’s output:
- Single result: one `{ "index": ..., "relevance_score": ... }` object; set `relevance_score` to your ideal relevance score (0–1).
- Batch results (multiple documents): an array of such objects; set each `relevance_score` to your corrected score for that document.
- Pointwise (binary) feedback: use 1 for relevant ("Yes") and 0 for not relevant ("No").

In every case, `index` corresponds to the original document index in the request. This feedback can be used for supervised fine-tuning or to track model performance over time.
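A sketch of what a feedback submission might look like with Python's standard library. The URL, token, and Bearer authorization header are assumptions; the exact wrapping of `improvement` isn't pinned down here, so this mirrors the model's output shape:

```python
import json
from urllib.request import Request, urlopen

# Placeholders: in practice these come from the X-Feedback-Url and
# X-Feedback-Token values returned with the inference response.
feedback_url = "https://example.com/feedback/abc123"
feedback_token = "eyJ..."

payload = {
    # Required by the schema but not meaningful for rerankers.
    "score": 0,
    "weight": 1.0,
    # Corrected relevance scores, same shape as the model output.
    "improvement": [
        {"index": 0, "relevance_score": 1.0},
        {"index": 1, "relevance_score": 0.0},
    ],
}

req = Request(
    feedback_url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {feedback_token}",  # auth scheme is an assumption
        "Content-Type": "application/json",
    },
    method="POST",
)
# urlopen(req)  # uncomment to actually send the feedback
```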
`score` and `weight` are required fields but not meaningful for reranker feedback; just use the defaults (`score: 0`, `weight: 1.0`). The important part is the `improvement` object with your corrected relevance scores. See Feedback Signals API for full schema details.

Training Data Format
Reranker training in Datawizz uses the Swift worker. You provide data in the Datawizz format (one sample per row in your dataset or in a JSON/JSONL file). Each sample has `input` (query + document) and `output` (relevance label).
Per-sample structure
Each training example must have:

| Field | Type | Description |
|---|---|---|
| `input` | array | Exactly two messages: system = query (or `<Instruct>` + `<Query>` tags), user = document (or `<Document>` tag). |
| `output` | object | Single message with role `"assistant"` and content = relevance label (see below). |
| `id` | string | Optional; preserved for tracking. |
- `input[0]`: `{ "role": "system", "content": "..." }` (the search query, or `<Instruct>` + `<Query>` tags for Qwen3 format)
- `input[1]`: `{ "role": "user", "content": "..." }` (the document to be judged, or `<Document>` tag for Qwen3 format)
Recommendation (Qwen3-Reranker): Use the same `<Instruct>`/`<Query>`/`<Document>` tag format in training data as at inference. This aligns training with how you call the model and may improve fine-tuning results. The simplified format (plain query in system, plain document in user) also works. Use the default Instruct "Given a web search query, retrieve relevant passages that answer the query" or customize it for your specific task.

Label values (`output.content`):
- Pointwise (default, `loss_type: "pointwise_reranker"`): Binary 0/1.
  - Text: `"Yes"`, `"True"`, `"1"`, `"relevant"` → 1; anything else (e.g. `"No"`, `"False"`, `"0"`) → 0.
  - Numeric: Value in [0, 1]; ≥ 0.5 → 1, < 0.5 → 0.
- MSE (`loss_type: "mse"`): Continuous score in [0, 1].
  - Text: `"Yes"`/`"True"`/`"1"`/`"relevant"` → 1.0; `"No"`/`"False"`/`"0"` → 0.0; otherwise 0.5.
  - Numeric: Clamped to [0, 1] and used as the target.
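The label-mapping rules above can be sketched as a small helper (an illustration of the documented rules, not the Swift worker's actual implementation):

```python
def normalize_label(content, loss_type="pointwise_reranker"):
    """Map an output.content label to a training target per the documented rules."""
    positives = {"yes", "true", "1", "relevant"}
    negatives = {"no", "false", "0"}

    if isinstance(content, str):
        text = content.strip().lower()
        if text in positives:
            value = 1.0
        elif loss_type == "mse" and text not in negatives:
            value = 0.5  # MSE only: unrecognized text falls back to 0.5
        else:
            value = 0.0  # negatives, and (for pointwise) anything else
    else:
        # Numeric labels: clamp into [0, 1].
        value = min(max(float(content), 0.0), 1.0)

    if loss_type == "pointwise_reranker":
        return 1 if value >= 0.5 else 0  # binarize at 0.5
    return value  # mse: continuous target in [0, 1]
```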
Sample training file (Datawizz format)
Datawizz supports importing reranker training data only in JSONL format (one JSON object per line); a JSON array is not supported for import. Each line must be a single object with the structure described above. For MSE training, use continuous labels in `output.content`, e.g. 0.9, 0.2, or strings that map to 0.0/0.5/1.0 as above.
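As a sketch, a two-row training file in the tagged Qwen3 format could be generated like this (the exact tag syntax, here `<Tag>: value`, is an assumption based on the tag names described above; the queries and documents are placeholders):

```python
import json

instruct = "Given a web search query, retrieve relevant passages that answer the query"

samples = [
    # Positive example: binary "Yes" label, suitable for pointwise training.
    {
        "input": [
            {"role": "system",
             "content": f"<Instruct>: {instruct}\n<Query>: capital of France"},
            {"role": "user",
             "content": "<Document>: Paris is the capital of France."},
        ],
        "output": {"role": "assistant", "content": "Yes"},
    },
    # Low-relevance example with a continuous label, usable for MSE training.
    {
        "input": [
            {"role": "system",
             "content": f"<Instruct>: {instruct}\n<Query>: capital of France"},
            {"role": "user",
             "content": "<Document>: Berlin is the capital of Germany."},
        ],
        "output": {"role": "assistant", "content": "0.1"},
    },
]

# One JSON object per line (JSONL); a JSON array is not accepted for import.
with open("reranker_train.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```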
Training configuration (Swift reranker)
When creating a reranker model in the app, you choose:

- Loss type: `pointwise_reranker` (binary 0/1) or `mse` (regression in [0, 1]).
- Base models: e.g. Qwen3-Reranker-0.6B/4B, bge-reranker-v2-m3, gte-reranker-modernbert-base.
Summary
| Use case | Request / data | Response / label |
|---|---|---|
| Inference | Chat completion: messages = system (query or <Instruct>/<Query> tags) + user (documents or <Document> tags) | choices[0].message.content = JSON string of [{index, relevance_score}, ...] |
| Feedback | POST /{project_uid}/{endpoint_uid}/feedback/{inference_log_id} with improvement containing corrected scores | — |
| Training | Datawizz format: input = [system=query/tags, user=document/tags], output = assistant content = “Yes”/“No” or 0/1 or float [0,1] | N/A (labels define relevance for training) |
Each training sample needs two `input` messages (system then user) and one `output` with the appropriate label type for your chosen `loss_type`.