Create a dataset
From the project sidebar, open Datasets and create a new dataset with a name and optional description. New datasets start empty; use the toolbar actions below to add data.Dataset list
The list view shows all datasets in the project: name, creation time, item count, size, and a Manage link to open each dataset. The table is sortable by column.Inside a dataset
After opening a dataset, the page has a top toolbar and three tabs: Items, Stats, and Jobs. The dataset detail page has three tabs:- Items — A paginated table of every sample. Each row shows ID, created time, tags, input (user prompt), and output (model response). You can edit input/output in place, manage tags, and use the row menu for actions like adding outputs or deleting an item.
- Stats — A token-length histogram and summary stats (min, P95, P99, max) to help you set
max_seq_lengthfor training. - Jobs — Lists all background jobs (uploads, imports, exports, annotations, regenerations) with their status (Done / Failed / In Progress), progress, and timestamps. Expand a row for logs and details; completed export files are downloadable here.
Toolbar actions
Import Logs
Import Logs pulls inference requests from your Datawizz endpoints into the dataset. You can filter by endpoint, model, feedback signals, tags, and more; use the Query tab for advanced filters or the Views tab for saved presets. Selected logs are imported as dataset items (input + output).Import File
Import File uploads data from CSV, JSONL, or Parquet. You pick a format (Text, Full, Data, or OpenAI) and optionally a system message; preview is available before confirming. Supported formats:| Format | Description |
|---|---|
| Text | Input and output are plain strings; auto-wrapped into chat format. |
| Full | Input is a JSON array of messages; output is a single message. |
| Data | Raw columns; you map fields in the next step. |
| OpenAI | OpenAI-style messages array. |
Create Split
Create Split divides the dataset into two parts (e.g. 80/20 for train/eval). You set the split ratio and name the new datasets; they can then be used in training and evaluation configs.Annotate Dataset
Annotate Dataset runs evaluators on each item’s outputs and attaches scores. You choose evaluators, set weights, and start the job; scores show on the Items table and can drive KTO training or evaluation.Regen Outputs
Regen Outputs generates new assistant responses for all items with a chosen model. New outputs are added alongside existing ones. You configure provider/model (temperature, max tokens, etc.) and at least one output tag. Optional: a regeneration instruction and context mode (instruction + original input, or also include a selected output column for improvement-style prompts).Export Dataset
Export Dataset starts a background job that exports the dataset to JSONL or CSV. When the job finishes, the file appears in the Jobs tab and can be downloaded from there. The video below shows starting an export and finding the completed file in the Jobs tab.Edit
Edit lets you change the dataset name and description. If the dataset has evaluator scores, you can set Scorer Weights and Normalize to build a composite score.Outputs, tags, and scores
Each item can hold multiple outputs - essential for preference-based training:- SFT - Single output per item. No tags needed, or choose your output based on a specific tag.
- DPO - Two outputs per item tagged chosen and rejected.
- KTO - One output tag plus a score threshold. Scores come from importing logs that already have feedback (collected via the feedback API), or by running Annotate Dataset.
- GRPO - Scored outputs for group-relative optimization.
Adding data from Logs
You can also add data directly from the Logs page. Select rows and click Add to Dataset. Choose how to handle improvements: ignore them, use as the primary output, or add as a separate output (useful for building DPO pairs).Import from other tools
Langfuse
Langfuse
Export from Langfuse and use Import File - Datawizz supports the native Langfuse format. See the video walkthrough.
LangSmith
LangSmith
Use the Datawizz export script to produce a CSV in OpenAI format, then import. See the LangSmith guide.
Humanloop
Humanloop
Use this Colab notebook to export into a Datawizz-compatible CSV.
Other tools
Other tools
Convert your data to CSV or JSONL with input and output columns (plain text or chat-format JSON) and use Import File.