AutoEval developer platform

LastMile is the full-stack developer platform for debugging, evaluating, and improving LLM applications. We make it easy to fine-tune custom evaluators, set up guardrails, and monitor app performance.

Developer quickstart

Compute your first evaluation metric within 5 minutes.

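python

from lastmile import LastMile

# Compute an evaluation metric for a sample string
LastMile.eval("Hello world")

node.js

const { LastMile } = require('lastmile');

// Compute an evaluation metric for a sample string
LastMile.eval("Hello world");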

Meet alBERTa 🍁

alBERTa is a family of small language models designed for evaluation. They are optimized to be:

small -- 400M parameter entailment model
fast -- can run inference on CPU in < 300ms
customizable -- fine-tune for custom evaluation tasks

alBERTa-512 🍁

512-token context, specialized for evaluation tasks (like faithfulness), and returns a numeric score between 0 and 1.
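
Because the evaluator returns a number between 0 and 1, one common way to act on it is to compare it against a threshold. The sketch below is illustrative only: `passes_guardrail`, the 0.5 default threshold, and the sample scores are assumptions made for this example, not part of the LastMile API or its defaults.

```python
def passes_guardrail(score: float, threshold: float = 0.5) -> bool:
    """Return True when a 0-to-1 evaluation score meets the threshold.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not LastMile defaults.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return score >= threshold


# Made-up scores standing in for evaluator output.
sample_scores = [0.92, 0.31, 0.50]
decisions = [passes_guardrail(s) for s in sample_scores]
print(decisions)
```

In practice the threshold would be tuned per metric and per application rather than fixed at 0.5.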

alBERTa-LC-8k 🍁

Long-context window variant that can scale to 128k+ tokens using a scaled dot-product attention layer.
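
For readers unfamiliar with the term, scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V. The pure-Python sketch below shows only that generic formula on toy data. It is not alBERTa's implementation, and how the long-context variant applies the layer is not described here. The function names and the example vectors are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy example: one query attending over two key/value pairs.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
result = scaled_dot_product_attention(q, k, v)
print(result)
```

The query matches the first key more closely, so the output leans toward the first value vector.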

Out-of-the-box metrics

Batteries-included evaluation metrics covering common AI application types, such as RAG and multi-agent compound AI systems.

Design your own metric

Use the fine-tuning service to design your own evaluators that represent custom criteria for your app quality.

Out-of-the-box metrics

Faithfulness
Semantic Similarity
Summarization Quality
Toxicity

Design your own metric

Datasets
LLM Judge
Fine-tune
Run Evals

Explore our guides

Retrieval systems
Multi-agent applications
Real-time guardrails
