LastMile SDK

tip

Recommendation: This page covers the LastMile Python client, which provides more ergonomic APIs for AutoEval. See the other sections for the primitive LastMile API, which is available via REST, Python, and Node.js (TypeScript) clients.

Setup

Provision API key

To get started, provision your free LastMile API key.

Installation

pip install lastmile

Capabilities

Run Evals

The most basic function of AutoEval is running evaluations at scale. The Evaluator Models usage guide covers how to do this via the API.

Learn more about evaluation metrics that are supported out of the box.

Manage Datasets

The Datasets section covers managing Datasets using the API.

Synthetic Labeling

The Synthetic Labeling usage guide covers generating labels given custom evaluation criteria using an LLM Judge.

Fine-tuning

The evaluator fine-tuning section covers examples of scheduling fine-tuning jobs via the API.

Guides

These guides provide targeted tutorials for accomplishing specific real-world tasks with the LastMile API.

AutoEval Client Reference

# Imports
from lastmile.lib.auto_eval import AutoEval, BuiltinMetrics, Metric

# api_token falls back to the LASTMILE_API_TOKEN environment variable if omitted
client = AutoEval(api_token="api_token_if_LASTMILE_API_TOKEN_not_set")

AutoEval class

A high-level client wrapper for the LastMile SDK that simplifies common workflows.

Initialization

def __init__(
    self,
    api_token: Optional[str] = None,
    base_url: str = "https://lastmileai.dev",
    logger: Optional[logging.Logger] = None,
)

Parameters:

  • api_token (Optional[str]): API token for authentication; if not provided, the LASTMILE_API_TOKEN environment variable is used.
  • base_url (str): Base URL of the LastMile API (default: https://lastmileai.dev).
  • logger (Optional[logging.Logger]): Optional logger for client log output.

Evaluation Methods

evaluate_data

Evaluates data with specified metrics.

def evaluate_data(
    self,
    data: pd.DataFrame,
    metrics: Union[Metric, List[Metric]],
) -> pd.DataFrame

Parameters:

  • data (pd.DataFrame): Data to evaluate.
  • metrics (Union[Metric, List[Metric]]): Metrics to use.

Returns: pd.DataFrame — DataFrame with evaluation scores.


evaluate_dataset

Evaluates a dataset directly with specified metrics.

def evaluate_dataset(
    self,
    dataset_id: str,
    metrics: Union[Metric, List[Metric]],
) -> pd.DataFrame

Parameters:

  • dataset_id (str): ID of the dataset.
  • metrics (Union[Metric, List[Metric]]): Metrics to use.

Returns: pd.DataFrame — DataFrame with evaluation scores.
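
For example, a minimal sketch that scores a hosted dataset with a built-in metric (the dataset ID below is a placeholder for one returned by upload_dataset):

# Score a hosted dataset; the ID is a placeholder from a prior upload_dataset call
scores = client.evaluate_dataset(
    dataset_id="your_dataset_id",
    metrics=BuiltinMetrics.FAITHFULNESS,
)
print(scores.head())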


list_metrics

Lists all available metrics.

def list_metrics(self) -> List[Metric]

Returns: List[Metric] — List of metrics.
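
For example, to see which metrics are currently available to your account:

# Print every metric the client can use
for metric in client.list_metrics():
    print(metric)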


get_metric

Retrieves a specific metric.

def get_metric(
    self,
    metric: Metric,
) -> Metric

Parameters:

  • metric (Metric): Metric to retrieve (name or ID must be set).

Returns: Metric — The retrieved metric.
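
A sketch of looking up a metric by name; this assumes Metric can be constructed with a name keyword, matching the requirement above that name or ID be set:

# Hypothetical lookup by name; Metric(name=...) is assumed to be a valid constructor call
faithfulness = client.get_metric(Metric(name="Faithfulness"))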


wait_for_metric_online

Waits for a metric to become available for evaluation.

def wait_for_metric_online(
    self,
    metric: Metric,
    timeout: int = 360,
    retry_interval: int = 10,
) -> Metric

Parameters:

  • metric (Metric): Metric to wait for.
  • timeout (int): Maximum wait time in seconds.
  • retry_interval (int): Status check interval in seconds.

Returns: Metric — Online metric.
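
This is most useful after fine-tuning, when a newly created metric may not yet be serving. A minimal sketch, reusing the metric object from the previous example:

# Block until the metric can serve evaluations, polling every 15 seconds
online_metric = client.wait_for_metric_online(
    metric=faithfulness,
    timeout=600,
    retry_interval=15,
)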


Built-in Metrics

The BuiltinMetrics class includes predefined metrics:

  • FAITHFULNESS: Measures adherence to the context.
  • RELEVANCE: Evaluates the semantic relevance of the response to the input.
  • TOXICITY: Detects potentially harmful content.
  • ANSWER_CORRECTNESS: Validates correctness of answers.
  • SUMMARIZATION: Evaluates quality of summaries.

Dataset Management Methods

upload_dataset

Uploads a dataset to LastMile.

def upload_dataset(
    self,
    file_path: str,
    name: str,
    description: Optional[str] = None,
) -> str

Parameters:

  • file_path (str): Path to the dataset CSV file.
  • name (str): Name of the dataset.
  • description (Optional[str]): Description of the dataset.

Returns: str — ID of the created dataset.


download_dataset

Downloads a dataset as a pandas DataFrame, optionally saving it to a local file.

def download_dataset(
    self,
    dataset_id: str,
    output_file_path: Optional[str] = None,
) -> pd.DataFrame

Parameters:

  • dataset_id (str): ID of the dataset to download.
  • output_file_path (Optional[str]): Path to save the dataset locally.

Returns: pd.DataFrame — The dataset.
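
A sketch that loads a dataset into memory and also writes a local CSV copy (the ID and path are placeholders):

# Fetch the dataset as a DataFrame and save it to disk as well
df = client.download_dataset(
    dataset_id="your_dataset_id",
    output_file_path="downloaded_data.csv",
)
print(df.shape)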


list_datasets

Lists all datasets available to the user.

def list_datasets(self) -> List[Dataset]

Returns: List[Dataset] — List of available datasets.
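
For example, to enumerate the datasets you can access:

# Print every dataset visible to the authenticated user
for dataset in client.list_datasets():
    print(dataset)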


Labeling Methods

label_dataset

Labels a dataset using a predefined metric's prompt template or a custom template.

def label_dataset(
    self,
    dataset_id: str,
    prompt_template: Union[str, Metric],
    few_shot_examples: Optional[pd.DataFrame] = None,
    wait_for_completion: bool = False,
) -> str

Parameters:

  • dataset_id (str): ID of the dataset.
  • prompt_template (Union[str, Metric]): Prompt template or predefined metric.
  • few_shot_examples (Optional[pd.DataFrame]): Few-shot examples.
  • wait_for_completion (bool): Wait for the job to complete.

Returns: str — Job ID of the labeling task.
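
A sketch that labels a dataset with a built-in metric's prompt template and blocks until the job finishes (the dataset ID is a placeholder):

# Label with the built-in faithfulness template and wait for the job to finish
job_id = client.label_dataset(
    dataset_id="your_dataset_id",
    prompt_template=BuiltinMetrics.FAITHFULNESS,
    wait_for_completion=True,
)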


wait_for_label_dataset_job

Waits for a label dataset job to complete.

def wait_for_label_dataset_job(
    self,
    job_id: str,
    timeout: int = 3600,
    interval: int = 30,
) -> None

Parameters:

  • job_id (str): ID of the labeling job.
  • timeout (int): Maximum wait time in seconds (default: 3600s).
  • interval (int): Status check interval in seconds (default: 30s).
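
As an alternative to wait_for_completion=True above, a sketch that starts the labeling job without blocking and then waits explicitly:

# Start labeling asynchronously, then poll every 15 seconds for up to 30 minutes
job_id = client.label_dataset(
    dataset_id="your_dataset_id",
    prompt_template=BuiltinMetrics.FAITHFULNESS,
)
client.wait_for_label_dataset_job(job_id, timeout=1800, interval=15)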

get_prompt_templates

Retrieves predefined prompt templates for labeling and evaluation.

def get_prompt_templates(self) -> Dict[Metric, str]

Returns: Dict[Metric, str] — Mapping of metrics to prompt templates.
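
A sketch that prints the beginning of each predefined template:

# Show the first 80 characters of each built-in prompt template
for metric, template in client.get_prompt_templates().items():
    print(metric, template[:80])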


Fine-Tuning

fine_tune_model

Fine-tunes a model on training and test datasets.

def fine_tune_model(
    self,
    train_dataset_id: str,
    test_dataset_id: str,
    model_name: str,
    baseline_model_id: Optional[str] = None,
    selected_columns: Optional[List[str]] = None,
    wait_for_completion: bool = False,
) -> str

Parameters:

  • train_dataset_id (str): Training dataset ID.
  • test_dataset_id (str): Test dataset ID.
  • model_name (str): Name for the fine-tuned model.
  • baseline_model_id (Optional[str]): Baseline model ID.
  • selected_columns (Optional[List[str]]): Columns to use for training.
  • wait_for_completion (bool): Wait for the fine-tuning process to complete.

Returns: str — Job ID of the fine-tuning job.


wait_for_fine_tune_job

Waits for a fine-tune job to complete.

def wait_for_fine_tune_job(
    self,
    job_id: str,
    timeout: int = 3600,
    interval: int = 30,
) -> None

Parameters:

  • job_id (str): ID of the fine-tuning job.
  • timeout (int): Maximum wait time in seconds.
  • interval (int): Status check interval in seconds.
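
A sketch that starts fine-tuning without blocking and then waits for the job explicitly (the dataset IDs are placeholders):

# Kick off fine-tuning asynchronously, then poll every 60 seconds for up to 2 hours
job_id = client.fine_tune_model(
    train_dataset_id="train_id",
    test_dataset_id="test_id",
    model_name="FineTunedModel",
)
client.wait_for_fine_tune_job(job_id, timeout=7200, interval=60)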

Examples

Upload a Dataset

dataset_id = client.upload_dataset(
    file_path="data.csv",
    name="Example Dataset",
    description="Sample data for evaluation."
)

Fine-Tune a Model

fine_tune_job_id = client.fine_tune_model(
    train_dataset_id="train_id",
    test_dataset_id="test_id",
    model_name="FineTunedModel"
)

Evaluate Data with Metrics

results = client.evaluate_data(
    data=dataframe,
    metrics=[BuiltinMetrics.FAITHFULNESS, BuiltinMetrics.RELEVANCE]
)