AutoEval developer platform

LastMile is the full-stack developer platform to debug, evaluate, and improve LLM applications. We make it easy to fine-tune custom evaluators, set up guardrails, and monitor application performance.

from lastmile import LastMile
LastMile.eval("Hello world")

Meet alBERTa 🍁

alBERTa is a family of small language models designed for evaluation. They are optimized to be:

  • small: a 400M-parameter entailment model
  • fast: can run inference on a CPU in under 300 ms
  • customizable: can be fine-tuned for custom evaluation tasks

Out-of-the-box metrics

Batteries-included evaluation metrics covering common AI application types, such as RAG and multi-agent compound AI systems.
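
To make the idea of a metric concrete, here is a minimal sketch of what a RAG-style faithfulness score and a guardrail check on it might look like. This is not the LastMile SDK and not alBERTa: the function names (tokenize, toy_faithfulness, passes_guardrail), the token-overlap heuristic, and the 0.8 threshold are all hypothetical stand-ins chosen for illustration. A real evaluator would replace the overlap heuristic with a model-based score.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_faithfulness(answer: str, context: str) -> float:
    """Hypothetical stand-in metric: fraction of answer tokens found in the context.

    Returns a score in [0, 1]; an empty answer scores 0.0.
    """
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    supported = answer_tokens & tokenize(context)
    return len(supported) / len(answer_tokens)


def passes_guardrail(score: float, threshold: float = 0.8) -> bool:
    """Hypothetical guardrail: accept a response only if its score meets the threshold."""
    return score >= threshold


if __name__ == "__main__":
    context = "The capital of France is Paris."
    for answer in ("Paris is the capital", "Berlin is the capital"):
        score = toy_faithfulness(answer, context)
        print(f"{answer!r}: score={score:.2f}, passes={passes_guardrail(score)}")
```

The shape is the point here rather than the heuristic: a metric maps an application's output (plus any context or reference) to a score, and a guardrail is a threshold applied to that score.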

Design your own metric

Use the fine-tuning service to design your own evaluators that represent custom criteria for your app quality.

Explore our guides