Evaluate Dataset

POST /api/2/auto_eval/evaluation/evaluate_dataset

Evaluates a metric on a dataset and returns a score for each example. Identify the metric by metric.id or metric.name. Results are persisted as an EvaluationRun for later use.

Request

application/json

Body

metric

object

The metric to compute for the dataset. Use if only a single metric is required. For multiple metrics, use 'metrics'.

id string

name string

description string

deploymentStatus ModelDeploymentStatus (string)

Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]

metrics

object[]

required

Array [

id string

name string

description string

deploymentStatus ModelDeploymentStatus (string)

Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]

]

datasetId string required

The dataset to evaluate

projectId string

The project where the evaluation run will be persisted

metadata

object

Common metadata relevant to the application configuration from which all request inputs were derived. E.g. 'llm_model', 'chunk_size'

fields

object

required

property name*

object

Ordered row values with length always equal to num_rows on the corresponding view.

property name* any

Ordered row values with length always equal to num_rows on the corresponding view.

experimentId string

If specified, the evaluation run will be associated with this experiment
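As a rough illustration of the request body described above, the following Python sketch assembles a payload and wraps it in a POST request without sending it. The host `https://example.invalid`, the identifier values, and the helper names `build_payload` and `build_request` are assumptions made for this example. Authentication is not covered in this section, so any required headers would have to be passed in by the caller. The `metadata` field is left out.

```python
import json
import urllib.request

ENDPOINT_PATH = "/api/2/auto_eval/evaluation/evaluate_dataset"


def build_payload(dataset_id, metric_names, project_id=None, experiment_id=None):
    """Assemble a request body; datasetId and metrics are the fields marked required."""
    payload = {
        "datasetId": dataset_id,
        # Each metric is identified here by name; an "id" key could be used instead.
        "metrics": [{"name": name} for name in metric_names],
    }
    # Optional fields are only included when the caller supplies them.
    if project_id is not None:
        payload["projectId"] = project_id
    if experiment_id is not None:
        payload["experimentId"] = experiment_id
    return payload


def build_request(base_url, payload, extra_headers=None):
    """Wrap the payload in a POST request object (hypothetical helper, nothing is sent)."""
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder identifiers and host, chosen only for illustration.
payload = build_payload("my-dataset-id", ["my-metric"], project_id="my-project-id")
request = build_request("https://example.invalid", payload)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body could then be decoded with `json.load`.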

Responses

Successful operation

application/json

Schema

metric

object

required

id string

name string

description string

deploymentStatus ModelDeploymentStatus (string)

Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]

scores number[] required

runId string required

metricScores

object[]

required

Array [

metric

object

required

id string

name string

description string

deploymentStatus ModelDeploymentStatus (string)

Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]

scores number[] required

]

Example (from schema)

{
  "metric": {
    "id": "string",
    "name": "string",
    "description": "string",
    "deploymentStatus": "MODEL_DEPLOYMENT_STATUS_UNSPECIFIED"
  },
  "scores": [
    0
  ],
  "runId": "string",
  "metricScores": [
    {
      "metric": {
        "id": "string",
        "name": "string",
        "description": "string",
        "deploymentStatus": "MODEL_DEPLOYMENT_STATUS_UNSPECIFIED"
      },
      "scores": [
        0
      ]
    }
  ]
}
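A response of this shape could be consumed along the following lines. This is a minimal Python sketch: the helper names `scores_by_metric` and `mean_scores` and every value in `sample_response` are invented for illustration, and averaging the per-example scores is just one possible summary, not something the API itself performs.

```python
from statistics import fmean


def scores_by_metric(response):
    """Map each metric's name (falling back to its id) to its per-example scores."""
    result = {}
    for entry in response.get("metricScores", []):
        metric = entry["metric"]
        key = metric.get("name") or metric.get("id")
        result[key] = list(entry["scores"])
    return result


def mean_scores(response):
    """Average the per-example scores of each metric, skipping empty score lists."""
    return {
        name: fmean(scores)
        for name, scores in scores_by_metric(response).items()
        if scores
    }


# Hypothetical response with the same structure as the schema example.
sample_response = {
    "metric": {
        "id": "metric-1",
        "name": "my-metric",
        "description": "An illustrative metric",
        "deploymentStatus": "MODEL_DEPLOYMENT_STATUS_ONLINE",
    },
    "scores": [0.5, 1.0],
    "runId": "run-123",
    "metricScores": [
        {
            "metric": {
                "id": "metric-1",
                "name": "my-metric",
                "description": "An illustrative metric",
                "deploymentStatus": "MODEL_DEPLOYMENT_STATUS_ONLINE",
            },
            "scores": [0.5, 1.0],
        }
    ],
}

per_metric = scores_by_metric(sample_response)
averages = mean_scores(sample_response)
```

The `runId` value identifies the persisted EvaluationRun and is worth keeping alongside the scores.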
