LastMile SDK

tip

Recommendation: This page covers the LastMile Python client, which provides more ergonomic APIs for AutoEval. See the other sections for the primitive LastMile API, which is available via REST, Python, and Node.js (TypeScript) clients.

Setup

Provision API key

To get started, provision your free LastMile API key.

Installation

pip install lastmile

Capabilities

Run Evals

The most basic function of AutoEval is running evaluations at scale. The Evaluator Models usage guide covers how to do this via the API.

Learn more about evaluation metrics that are supported out of the box.

Manage Datasets

The Datasets section covers managing Datasets using the API.

Synthetic Labeling

The Synthetic Labeling usage guide covers generating labels given custom evaluation criteria using an LLM Judge.

Fine-tuning

The evaluator fine-tuning section covers examples of scheduling fine-tuning jobs via the API.

Guides

These guides provide targeted tutorials for accomplishing specific real-world tasks with the LastMile API.

AutoEval Client Reference

# Imports
from lastmile.lib.auto_eval import AutoEval, BuiltinMetrics, Metric

# api_token falls back to the LASTMILE_API_TOKEN environment variable if omitted
client = AutoEval(api_token="api_token_if_LASTMILE_API_TOKEN_not_set")

AutoEval class

A high-level client wrapper for the LastMile SDK that simplifies common workflows.

Initialization

def __init__(
    self,
    api_token: Optional[str] = None,
    base_url: str = "https://lastmileai.dev",
    logger: Optional[logging.Logger] = None,
)

Parameters:

  • api_token (Optional[str]): API token for authentication; if not provided, the LASTMILE_API_TOKEN environment variable is used.
  • base_url (str): Base URL of the LastMile API (default: https://lastmileai.dev).
  • logger (Optional[logging.Logger]): Optional logger for client log output.

Evaluation Methods

evaluate_data

Evaluates data with specified metrics.

def evaluate_data(
    self,
    data: pd.DataFrame,
    metrics: Union[Metric, List[Metric]],
) -> pd.DataFrame

Parameters:

  • data (pd.DataFrame): Data to evaluate.
  • metrics (Union[Metric, List[Metric]]): Metrics to use.

Returns: pd.DataFrame — DataFrame with evaluation scores.


evaluate_dataset

Evaluates a dataset directly with specified metrics.

def evaluate_dataset(
    self,
    dataset_id: str,
    metrics: Union[Metric, List[Metric]],
) -> pd.DataFrame

Parameters:

  • dataset_id (str): ID of the dataset.
  • metrics (Union[Metric, List[Metric]]): Metrics to use.

Returns: pd.DataFrame — DataFrame with evaluation scores.
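
For example, a minimal sketch that scores a hosted dataset with a built-in metric (the dataset ID below is a placeholder for one returned by upload_dataset):

# Score a hosted dataset; the ID is a placeholder from a prior upload_dataset call
scores = client.evaluate_dataset(
    dataset_id="your_dataset_id",
    metrics=BuiltinMetrics.FAITHFULNESS,
)
print(scores.head())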


list_metrics

Lists all available metrics.

def list_metrics(self) -> List[Metric]

Returns: List[Metric] — List of metrics.
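
For example, to see which metrics are currently available to your account:

# Print every metric the client can use
for metric in client.list_metrics():
    print(metric)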


get_metric

Retrieves a specific metric.

def get_metric(
    self,
    metric: Metric,
) -> Metric

Parameters:

  • metric (Metric): Metric to retrieve (name or ID must be set).

Returns: Metric — The retrieved metric.
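
A sketch of looking up a metric by name; this assumes Metric can be constructed with a name keyword, matching the requirement above that name or ID be set:

# Hypothetical lookup by name; Metric(name=...) is assumed to be a valid constructor call
faithfulness = client.get_metric(Metric(name="Faithfulness"))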


wait_for_metric_online

Waits for a metric to become available for evaluation.

def wait_for_metric_online(
    self,
    metric: Metric,
    timeout: int = 360,
    retry_interval: int = 10,
) -> Metric

Parameters:

  • metric (Metric): Metric to wait for.
  • timeout (int): Maximum wait time in seconds.
  • retry_interval (int): Status check interval in seconds.

Returns: Metric — Online metric.
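
This is most useful after fine-tuning, when a newly created metric may not yet be serving. A minimal sketch, reusing the metric object from the previous example:

# Block until the metric can serve evaluations, polling every 15 seconds
online_metric = client.wait_for_metric_online(
    metric=faithfulness,
    timeout=600,
    retry_interval=15,
)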


Built-in Metrics

The BuiltinMetrics class includes predefined metrics:

  • FAITHFULNESS: Measures adherence to the context.
  • RELEVANCE: Evaluates the semantic relevance of the response to the input.
  • TOXICITY: Detects potentially harmful content.
  • ANSWER_CORRECTNESS: Validates correctness of answers.
  • SUMMARIZATION: Evaluates quality of summaries.

Dataset Management Methods

upload_dataset

Uploads a dataset to LastMile.

def upload_dataset(
    self,
    file_path: str,
    name: str,
    description: Optional[str] = None,
) -> str

Parameters:

  • file_path (str): Path to the dataset CSV file.
  • name (str): Name of the dataset.
  • description (Optional[str]): Description of the dataset.

Returns: str — ID of the created dataset.


download_dataset

Downloads a dataset as a pandas DataFrame, optionally saving it to a local file.

def download_dataset(
    self,
    dataset_id: str,
    output_file_path: Optional[str] = None,
) -> pd.DataFrame

Parameters:

  • dataset_id (str): ID of the dataset to download.
  • output_file_path (Optional[str]): Path to save the dataset locally.

Returns: pd.DataFrame — The dataset.
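
A sketch that loads a dataset into memory and also writes a local CSV copy (the ID and path are placeholders):

# Fetch the dataset as a DataFrame and save it to disk as well
df = client.download_dataset(
    dataset_id="your_dataset_id",
    output_file_path="downloaded_data.csv",
)
print(df.shape)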


list_datasets

Lists all datasets available to the user.

def list_datasets(self) -> List[Dataset]

Returns: List[Dataset] — List of available datasets.
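
For example, to enumerate the datasets you can access:

# Print every dataset visible to the authenticated user
for dataset in client.list_datasets():
    print(dataset)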


Labeling Methods

label_dataset

Labels a dataset using a predefined metric's prompt template or a custom template.

def label_dataset(
    self,
    dataset_id: str,
    prompt_template: Union[str, Metric],
    few_shot_examples: Optional[pd.DataFrame] = None,
    wait_for_completion: bool = False,
) -> str

Parameters:

  • dataset_id (str): ID of the dataset.
  • prompt_template (Union[str, Metric]): Prompt template or predefined metric.
  • few_shot_examples (Optional[pd.DataFrame]): Few-shot examples.
  • wait_for_completion (bool): Wait for the job to complete.

Returns: str — Job ID of the labeling task.
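
A sketch that labels a dataset with a built-in metric's prompt template and blocks until the job finishes (the dataset ID is a placeholder):

# Label with the built-in faithfulness template and wait for the job to finish
job_id = client.label_dataset(
    dataset_id="your_dataset_id",
    prompt_template=BuiltinMetrics.FAITHFULNESS,
    wait_for_completion=True,
)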


wait_for_label_dataset_job

Waits for a label dataset job to complete.

def wait_for_label_dataset_job(
    self,
    job_id: str,
    timeout: int = 3600,
    interval: int = 30,
) -> None

Parameters:

  • job_id (str): ID of the labeling job.
  • timeout (int): Maximum wait time in seconds (default: 3600s).
  • interval (int): Status check interval in seconds (default: 30s).
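
As an alternative to wait_for_completion=True above, a sketch that starts the labeling job without blocking and then waits explicitly:

# Start labeling asynchronously, then poll every 15 seconds for up to 30 minutes
job_id = client.label_dataset(
    dataset_id="your_dataset_id",
    prompt_template=BuiltinMetrics.FAITHFULNESS,
)
client.wait_for_label_dataset_job(job_id, timeout=1800, interval=15)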

get_prompt_templates

Retrieves predefined prompt templates for labeling and evaluation.

def get_prompt_templates(self) -> Dict[Metric, str]

Returns: Dict[Metric, str] — Mapping of metrics to prompt templates.
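
A sketch that prints the beginning of each predefined template:

# Show the first 80 characters of each built-in prompt template
for metric, template in client.get_prompt_templates().items():
    print(metric, template[:80])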


Fine-Tuning

fine_tune_model

Fine-tunes a model on training and test datasets.

def fine_tune_model(
    self,
    train_dataset_id: str,
    test_dataset_id: str,
    model_name: str,
    baseline_model_id: Optional[str] = None,
    selected_columns: Optional[List[str]] = None,
    wait_for_completion: bool = False,
) -> str

Parameters:

  • train_dataset_id (str): Training dataset ID.
  • test_dataset_id (str): Test dataset ID.
  • model_name (str): Name for the fine-tuned model.
  • baseline_model_id (Optional[str]): Baseline model ID.
  • selected_columns (Optional[List[str]]): Columns to use for training.
  • wait_for_completion (bool): Wait for the fine-tuning process to complete.

Returns: str — Job ID of the fine-tuning job.


wait_for_fine_tune_job

Waits for a fine-tune job to complete.

def wait_for_fine_tune_job(
    self,
    job_id: str,
    timeout: int = 3600,
    interval: int = 30,
) -> None

Parameters:

  • job_id (str): ID of the fine-tuning job.
  • timeout (int): Maximum wait time in seconds.
  • interval (int): Status check interval in seconds.
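
A sketch that starts fine-tuning without blocking and then waits for the job explicitly (the dataset IDs are placeholders):

# Kick off fine-tuning asynchronously, then poll every 60 seconds for up to 2 hours
job_id = client.fine_tune_model(
    train_dataset_id="train_id",
    test_dataset_id="test_id",
    model_name="FineTunedModel",
)
client.wait_for_fine_tune_job(job_id, timeout=7200, interval=60)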

Examples

Upload a Dataset

dataset_id = client.upload_dataset(
    file_path="data.csv",
    name="Example Dataset",
    description="Sample data for evaluation."
)

Fine-Tune a Model

fine_tune_job_id = client.fine_tune_model(
    train_dataset_id="train_id",
    test_dataset_id="test_id",
    model_name="FineTunedModel"
)

Evaluate Data with Metrics

results = client.evaluate_data(
    data=dataframe,
    metrics=[BuiltinMetrics.FAITHFULNESS, BuiltinMetrics.RELEVANCE]
)