Evaluate Run
POST /api/2/auto_eval/evaluation/evaluate_run
Similar to Evaluate, but persists the results as an EvaluationRun for further capabilities.
Request
- application/json
Body
metrics
object[] required
Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]
The project where the evaluation run will be persisted.
If specified, the evaluation run will be associated with this experiment.
metadata
object
Common metadata relevant to the application configuration from which all request inputs were derived, e.g. 'llm_model', 'chunk_size'.
fields
object
required
property name*
object
Ordered row values, with length always equal to num_rows on the corresponding view.
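As a rough illustration of the request body described above, here is a minimal Python sketch that builds a payload and POSTs it to the endpoint. The base URL, auth header, metric reference shape, the project/experiment field names, and the inner shape of each `fields` entry are assumptions for illustration; only `metrics`, `metadata`, and `fields` appear in the schema above.

```python
import requests  # third-party HTTP client (pip install requests)

BASE_URL = "https://example.api.host"  # assumption: your deployment's host
TOKEN = "YOUR_API_TOKEN"               # assumption: bearer-token auth

# Assumed payload shape: only `metrics`, `metadata`, and `fields` appear in the
# schema above; the other keys and inner value shapes are illustrative guesses.
payload = {
    "metrics": [{"id": "my-metric-id"}],           # assumed metric reference shape
    "projectId": "my-project",                     # project where the run will be persisted
    "experimentId": "my-experiment",               # optional experiment association
    "metadata": {"llm_model": "gpt-4o", "chunk_size": "512"},
    "fields": {
        # each property holds ordered row values; lengths must equal num_rows
        "input": {"values": ["What is RAG?", "Define BLEU."]},   # assumed inner shape
        "output": {"values": ["...", "..."]},
    },
}

resp = requests.post(
    f"{BASE_URL}/api/2/auto_eval/evaluation/evaluate_run",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["runId"])
```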
Responses
- 200
Successful operation
- application/json
Schema
runId
string
metricScores
object[] required
metric
object required
Possible values: [MODEL_DEPLOYMENT_STATUS_UNSPECIFIED, MODEL_DEPLOYMENT_STATUS_PENDING, MODEL_DEPLOYMENT_STATUS_ONLINE, MODEL_DEPLOYMENT_STATUS_OFFLINE, MODEL_DEPLOYMENT_STATUS_PAUSED]
Example (from schema)

```json
{
  "runId": "string",
  "metricScores": [
    {
      "metric": {
        "id": "string",
        "name": "string",
        "description": "string",
        "deploymentStatus": "MODEL_DEPLOYMENT_STATUS_UNSPECIFIED"
      },
      "scores": [
        0
      ]
    }
  ]
}
```
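Assuming a response shaped like the example above, a short sketch of reading the run id and per-metric scores (the helper name and sample values are hypothetical):

```python
from typing import Any, Dict

def summarize_run(run: Dict[str, Any]) -> None:
    """Print the run id and a mean score per metric from an evaluate_run response."""
    print("run:", run["runId"])
    for entry in run["metricScores"]:
        metric, scores = entry["metric"], entry["scores"]
        mean = sum(scores) / len(scores) if scores else float("nan")
        print(f'  {metric["name"]} ({metric["deploymentStatus"]}): mean={mean:.3f}')

# Usage with a response shaped like the example above (values are made up):
summarize_run({
    "runId": "run-123",
    "metricScores": [{
        "metric": {"id": "m1", "name": "faithfulness", "description": "",
                   "deploymentStatus": "MODEL_DEPLOYMENT_STATUS_ONLINE"},
        "scores": [0.8, 0.9],
    }],
})
```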