Dataset Formats

Supported dataset formats.

Axolotl supports a variety of dataset formats. It is recommended to use a JSONL format. The schema of the JSONL depends upon the task and the prompt template you wish to use. Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.

Below are these various formats organized by task:

Title Description
Pre-training Data format for a pre-training completion task.
Instruction Tuning Instruction tuning formats for supervised fine-tuning.
Conversation Conversation format for supervised fine-tuning.
Template-Free Construct prompts without a template.
Custom Pre-Tokenized Dataset How to use a custom pre-tokenized dataset.
No matching items