Introduction

LastMile is the full-stack developer platform for debugging, evaluating, and improving LLM applications. We make it easy to fine-tune custom evaluators, set up guardrails, and monitor app performance.

from lastmile.lib.auto_eval import AutoEval, Metric
import pandas as pd

# Evaluate a single example with the built-in Faithfulness metric;
# here the output ("France") contradicts the ground truth ("England").
result = AutoEval().evaluate_data(
    data=pd.DataFrame({
        "input": ["Where did the author grow up?"],
        "output": ["France"],
        "ground_truth": ["England"]
    }),
    metrics=[Metric(name="Faithfulness")]
)

print("Evaluation result:", result)

Design your own metric

Use the fine-tuning service to design your own evaluators that capture custom quality criteria for your application.
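
As a rough sketch of the workflow, the snippet below prepares labeled examples for a custom criterion and submits them for fine-tuning. The fine_tune_model method, its parameters, and the metric name are hypothetical placeholders rather than confirmed SDK calls; consult the SDK docs for the actual interface.

from lastmile.lib.auto_eval import AutoEval
import pandas as pd

client = AutoEval()

# Labeled examples that encode the custom criterion:
# label 1.0 = the output meets it, 0.0 = it does not.
train_data = pd.DataFrame({
    "input": ["Summarize the refund policy.", "Summarize the refund policy."],
    "output": [
        "Refunds are issued within 30 days of purchase.",
        "Our store also sells refurbished laptops.",
    ],
    "label": [1.0, 0.0],
})

# Hypothetical call -- the real method name and parameters may differ.
job = client.fine_tune_model(
    train_data=train_data,
    base_model="alBERTa",   # assumption: fine-tunes an alBERTa SLM
    metric_name="OnTopic",  # the name your evaluator is registered under
)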

Out-of-the-box metrics

Batteries-included evaluation metrics covering common AI application types, such as RAG and multi-agent compound AI systems.
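
Built-in metrics plug into the same evaluate_data call shown in the quickstart above. In the sketch below, Faithfulness comes from the earlier example; the second metric name is illustrative and may not match the catalog exactly.

from lastmile.lib.auto_eval import AutoEval, Metric
import pandas as pd

data = pd.DataFrame({
    "input": ["What is the capital of France?"],
    "output": ["Paris is the capital of France."],
    "ground_truth": ["The capital of France is Paris."],
})

# Score one example against several metrics in a single call.
# "Relevance" is an illustrative name and may differ from the catalog.
result = AutoEval().evaluate_data(
    data=data,
    metrics=[
        Metric(name="Faithfulness"),
        Metric(name="Relevance"),
    ],
)
print("Evaluation result:", result)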

Meet alBERTa 🍁

alBERTa is a family of small language models (SLMs) designed for evaluation. They are optimized to be:

  • small -- a 400M-parameter entailment model
  • fast -- runs inference on CPU in under 300 ms
  • customizable -- fine-tune it for custom evaluation tasks (see the sketch after this list)
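
To make the entailment framing concrete, the sketch below reuses evaluate_data to check whether an answer is supported by reference text. Treating the Faithfulness metric as alBERTa-backed is an assumption here.

from lastmile.lib.auto_eval import AutoEval, Metric
import pandas as pd

# Entailment-style check: is the output supported by the reference text?
data = pd.DataFrame({
    "input": ["When did the bridge open?"],
    "output": ["The bridge opened in 1937."],
    "ground_truth": ["The Golden Gate Bridge opened to traffic in 1937."],
})

result = AutoEval().evaluate_data(
    data=data,
    metrics=[Metric(name="Faithfulness")],  # assumed to run on alBERTa
)
print("Evaluation result:", result)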

Explore our guides