Datasets
Datasets let you organize application trace data in a consistent way, so that it can be used for running evals, LLM Judge labeling, and fine-tuning custom evaluators.
A Dataset should contain at least one of these columns:
input
: Input to the application (e.g. a user question for a Q&A system)
output
: The response generated by the application (e.g. LLM generation)
ground_truth
: Factual data, either the ideal correct response or the context used to respond (e.g. data retrieved from a vector DB)
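As a rough illustration of the column layout described above, the sketch below writes a small CSV file with all three columns using only the Python standard library. The row contents and the file name `my_dataset.csv` are made-up examples, and the helper `write_dataset_csv` is hypothetical, not part of any SDK.

```python
import csv
import tempfile
from pathlib import Path

# Column names come from the list above; the rows are invented sample data.
COLUMNS = ["input", "output", "ground_truth"]

rows = [
    {
        "input": "What is the capital of France?",
        "output": "The capital of France is Paris.",
        "ground_truth": "Paris is the capital and largest city of France.",
    },
    {
        "input": "Who wrote Hamlet?",
        "output": "Hamlet was written by William Shakespeare.",
        "ground_truth": "Hamlet is a tragedy by William Shakespeare.",
    },
]


def write_dataset_csv(path, rows):
    """Write the rows to a CSV file whose header row is COLUMNS."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# A temporary directory is used here so the example leaves no stray files.
csv_path = write_dataset_csv(Path(tempfile.mkdtemp()) / "my_dataset.csv", rows)

# Read the header line back to confirm the layout.
header = csv_path.read_text(encoding="utf-8").splitlines()[0]
print(header)
```

Since only one of the columns is required, a file with just `input`, for example, would follow the same pattern with a shorter `COLUMNS` list.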
For compound AI systems, you can use Datasets to manage data for intermediate steps as well as the end-to-end flow. For example, in a multi-agent application, a Dataset can be used to capture individual agent traces, which can be evaluated separately.
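One way to prepare per-agent data is to group raw trace records by the agent that produced them, so that each group can become its own Dataset. The sketch below assumes each trace is a dictionary with an `agent` field; that field, the agent names, and the `split_by_agent` helper are all hypothetical and shown only for illustration.

```python
from collections import defaultdict

# Hypothetical trace records: the "agent" field and its values are illustrative.
traces = [
    {"agent": "retriever", "input": "refund policy?", "output": "doc_12, doc_48"},
    {"agent": "writer", "input": "doc_12, doc_48", "output": "Refunds are issued within 30 days."},
    {"agent": "retriever", "input": "shipping time?", "output": "doc_7"},
]


def split_by_agent(traces):
    """Group traces by agent, keeping only the input and output columns."""
    grouped = defaultdict(list)
    for trace in traces:
        grouped[trace["agent"]].append(
            {"input": trace["input"], "output": trace["output"]}
        )
    return dict(grouped)


per_agent_rows = split_by_agent(traces)

for agent, agent_rows in per_agent_rows.items():
    print(agent, len(agent_rows))
```

Each value in `per_agent_rows` is a list of rows that could be written to its own file and managed as a separate Dataset.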
Navigate to the Dataset Library to manage existing Datasets or create new ones.
Create a new Dataset
UI
Navigate to Dataset Library and click + New Dataset.
API
See the API section for more information, including how to provision API keys and further examples.
- python
from lastmile.lib.auto_eval import AutoEval
import pandas as pd

# The api_token argument is only needed if LASTMILE_API_TOKEN is not set
client = AutoEval(api_token="api_token_if_LASTMILE_API_TOKEN_not_set")

# Path to the CSV file to upload
dataset_csv = "path_to_dataset.csv"

# Upload the file as a new Dataset
dataset_id = client.upload_dataset(
    file_path=dataset_csv,
    name="My New Dataset",
    description="This Dataset is the latest batch of application trace data"
)
print(dataset_id)
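Before uploading, it can be handy to check locally that the CSV header includes at least one of the columns a Dataset is expected to have. The following is a minimal sketch of such a check using the standard library; the `recognized_columns` helper and the demo files are hypothetical, and this page does not say how the platform itself treats files without these columns.

```python
import csv
import tempfile
from pathlib import Path

# Column names listed earlier on this page.
RECOGNIZED_COLUMNS = {"input", "output", "ground_truth"}


def recognized_columns(csv_path):
    """Return the recognized Dataset columns present in the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    names = {name.strip() for name in header}
    return sorted(RECOGNIZED_COLUMNS & names)


# Two small demo files, written to a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
good = demo_dir / "good.csv"
good.write_text("input,output\nHi,Hello\n", encoding="utf-8")
bad = demo_dir / "bad.csv"
bad.write_text("question,answer\nHi,Hello\n", encoding="utf-8")

good_columns = recognized_columns(good)
bad_columns = recognized_columns(bad)

print(good_columns)
print(bad_columns)
```

An empty result, as for `bad.csv` here, suggests renaming the columns before the file is uploaded.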
Download a Dataset
UI
Navigate to Dataset Library and open a Dataset. Click the Download Dataset button (top right).
API
- python
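from lastmile.lib.auto_eval import AutoEval
import pandas as pd

# The api_token argument is only needed if LASTMILE_API_TOKEN is not set
client = AutoEval(api_token="api_token_if_LASTMILE_API_TOKEN_not_set")

# Download the Dataset; output_file_path is optional
dataset_df = client.download_dataset(
    dataset_id="my_dataset_id",
    output_file_path="optional_path_to_save_file"
)

# Show the first five rows
print(dataset_df.head(5))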
List Datasets
UI
Navigate to Dataset Library. All Datasets that you have access to are listed there.
API
- python
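from lastmile.lib.auto_eval import AutoEval
import pandas as pd

# The api_token argument is only needed if LASTMILE_API_TOKEN is not set
client = AutoEval(api_token="api_token_if_LASTMILE_API_TOKEN_not_set")

# Print the ID and name of every Dataset you have access to
datasets = client.list_datasets()
for dataset in datasets:
    print(f"Dataset ID: {dataset['id']}, Name: {dataset['name']}")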
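Once the list is available, a small helper can narrow it down by name. The sketch below assumes each entry can be indexed with `"id"` and `"name"`, as the loop above does; the `find_datasets_by_name` helper and the sample entries (including their IDs) are hypothetical stand-ins for a real listing.

```python
def find_datasets_by_name(datasets, text):
    """Return entries whose name contains text, ignoring case."""
    needle = text.lower()
    return [d for d in datasets if needle in d["name"].lower()]


# Made-up sample entries standing in for a real listing.
sample_datasets = [
    {"id": "ds_001", "name": "My New Dataset"},
    {"id": "ds_002", "name": "Support Bot Traces"},
    {"id": "ds_003", "name": "new eval batch"},
]

matches = find_datasets_by_name(sample_datasets, "new")

for dataset in matches:
    print(f"Dataset ID: {dataset['id']}, Name: {dataset['name']}")
```

The ID of a matching entry can then be used wherever a Dataset ID is required, such as when downloading a Dataset.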