
Evaluate Dataset

POST /api/2/auto_eval/evaluation/evaluate_dataset

Evaluates a metric on a dataset and returns a score for each example. Identify the metric with metric.id or metric.name. The results are persisted as an EvaluationRun so that they remain available after the call returns.
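The rule that a metric is identified by metric.id or metric.name can be checked on the client before a request is sent. The helper below is a minimal sketch, not part of any official SDK; the function name metric_ref and the example metric names are hypothetical.

```python
from typing import Optional


def metric_ref(metric_id: Optional[str] = None, name: Optional[str] = None) -> dict:
    """Build a metric object for the request body.

    Hypothetical client-side helper: the endpoint identifies a metric by
    id or name, so at least one of them must be supplied.
    """
    if not metric_id and not name:
        raise ValueError("specify metric.id or metric.name")
    ref: dict = {}
    if metric_id:
        ref["id"] = metric_id
    if name:
        ref["name"] = name
    return ref
```

For example, metric_ref(name="faithfulness") yields {"name": "faithfulness"}, which can be placed in the metric field or as one element of metrics.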

Request

Body

    metric

    object

    The metric to compute for the dataset. Use this field when only a single metric is needed; for multiple metrics, use 'metrics'.

    id string
    name string
    description string
    deploymentStatus ModelDeploymentStatus (string)

    Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]
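When handling metric objects in code, the five deploymentStatus values listed above map naturally onto a string enum. This is a sketch under the assumption that the field is serialized as these exact strings; treating a missing value as UNSPECIFIED is also an assumption, not documented behavior.

```python
from enum import Enum
from typing import Optional


class ModelDeploymentStatus(str, Enum):
    """Deployment states listed for the deploymentStatus field."""

    UNSPECIFIED = "MODEL_DEPLOYMENT_STATUS_UNSPECIFIED"
    PENDING = "MODEL_DEPLOYMENT_STATUS_PENDING"
    ONLINE = "MODEL_DEPLOYMENT_STATUS_ONLINE"
    OFFLINE = "MODEL_DEPLOYMENT_STATUS_OFFLINE"
    PAUSED = "MODEL_DEPLOYMENT_STATUS_PAUSED"


def parse_status(raw: Optional[str]) -> ModelDeploymentStatus:
    # Assumption: an absent deploymentStatus is treated as UNSPECIFIED.
    if raw is None:
        return ModelDeploymentStatus.UNSPECIFIED
    # Raises ValueError for any string not in the documented list.
    return ModelDeploymentStatus(raw)
```

Subclassing str keeps each member equal to its wire value, so members can be compared directly against raw JSON strings.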

    metrics

    object[]

    required

    Array [

    id string
    name string
    description string
    deploymentStatus ModelDeploymentStatus (string)

    Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]

    ]

    datasetId string required

    The ID of the dataset to evaluate.

    projectId string

    The project in which the evaluation run is persisted.

    metadata

    object

    Common metadata describing the application configuration from which all request inputs were derived, for example 'llm_model' or 'chunk_size'.

    fields

    object

    required

    property name*

    object

    Ordered row values with length always equal to num_rows on the corresponding view.

    property name* any

    Ordered row values with length always equal to num_rows on the corresponding view.
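The metadata object can be assembled from application configuration keys such as 'llm_model' and 'chunk_size'. The sketch below follows the schema as rendered on this page (a metadata object with a required fields map); whether the server expects the fields wrapper literally, or a flat object, is an assumption worth verifying. The helper name build_metadata and the example values are hypothetical.

```python
import json
from typing import Any, Dict


def build_metadata(**config: Any) -> Dict[str, Any]:
    """Wrap application configuration values in the metadata.fields shape.

    Assumption: every value must be JSON-serializable.
    """
    json.dumps(config)  # raises TypeError early if a value is not serializable
    return {"fields": dict(config)}
```

Calling build_metadata(llm_model="example-llm", chunk_size=512) produces a value suitable for the metadata field of the request body.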

    experimentId string

    If specified, the evaluation run is associated with this experiment.
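Putting the body fields together, a request can be assembled with Python's standard library. This is a sketch: the base URL is a placeholder, the camelCase field names are taken from the schema above, and because this page does not describe authentication, any credentials are left to a hypothetical headers argument.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

ENDPOINT = "/api/2/auto_eval/evaluation/evaluate_dataset"


def build_evaluate_dataset_request(
    base_url: str,
    dataset_id: str,
    metrics: List[Dict[str, Any]],
    project_id: Optional[str] = None,
    experiment_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for this endpoint."""
    body: Dict[str, Any] = {"datasetId": dataset_id, "metrics": metrics}
    # Optional fields are omitted entirely when not provided.
    if project_id is not None:
        body["projectId"] = project_id
    if experiment_id is not None:
        body["experimentId"] = experiment_id
    if metadata is not None:
        body["metadata"] = metadata
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen(req) and decode the JSON response. Optional fields are left out rather than sent as null, which keeps the body limited to what the caller actually set.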

Responses

Successful operation

Schema

    metric

    object

    required

    id string
    name string
    description string
    deploymentStatus ModelDeploymentStatus (string)

    Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]

    scores number[] required
    runId string required

    metricScores

    object[]

    required

    Array [

    metric

    object

    required

    id string
    name string
    description string
    deploymentStatus ModelDeploymentStatus (string)

    Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]

    scores number[] required
    ]
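A client might parse the response into per-metric score lists and summary averages. The sketch assumes each scores array is ordered like the dataset's examples and that the JSON keys match the field names shown above; keying results by metric name (falling back to id) is a design choice of this example, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Dict, List


@dataclass
class EvaluationResult:
    run_id: str
    scores: List[float]  # top-level scores for the response's metric
    per_metric: Dict[str, List[float]]  # metricScores keyed by name, else id


def parse_evaluation_response(payload: Dict[str, Any]) -> EvaluationResult:
    per_metric: Dict[str, List[float]] = {}
    for entry in payload["metricScores"]:
        metric = entry["metric"]
        # Prefer the human-readable name; fall back to the id.
        key = metric.get("name") or metric.get("id", "")
        per_metric[key] = [float(s) for s in entry["scores"]]
    return EvaluationResult(
        run_id=payload["runId"],
        scores=[float(s) for s in payload["scores"]],
        per_metric=per_metric,
    )


def mean_scores(result: EvaluationResult) -> Dict[str, float]:
    """Average each metric's per-example scores (empty lists are skipped)."""
    return {name: mean(vals) for name, vals in result.per_metric.items() if vals}
```

Keeping the raw per-example lists alongside the averages lets a caller inspect individual low-scoring examples instead of only the summary.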
