POST /v1/retrievers/benchmarks
Create benchmark
curl --request POST \
  --url https://api.mixpeek.com/v1/retrievers/benchmarks \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '
{
  "benchmark_name": "<string>",
  "baseline_retriever_id": "<string>",
  "candidate_retriever_ids": [
    "<string>"
  ],
  "session_filter": {
    "retriever_ids": [
      "<string>"
    ],
    "taxonomy_node_ids": [
      "<string>"
    ],
    "time_range": {
      "start": "2023-11-07T05:31:56Z",
      "end": "2023-11-07T05:31:56Z"
    },
    "min_interactions": 1,
    "interaction_types": [
      "<string>"
    ],
    "sample_strategy": "random",
    "interaction_weights": {
      "weights": {}
    }
  },
  "session_count": 1000
}
'
{
  "benchmark_id": "<string>",
  "benchmark_name": "<string>",
  "baseline_retriever_id": "<string>",
  "candidate_retriever_ids": [
    "<string>"
  ],
  "session_count": 123,
  "status": "pending",
  "created_at": "2023-11-07T05:31:56Z",
  "session_filter": {
    "retriever_ids": [
      "<string>"
    ],
    "taxonomy_node_ids": [
      "<string>"
    ],
    "time_range": {
      "start": "2023-11-07T05:31:56Z",
      "end": "2023-11-07T05:31:56Z"
    },
    "min_interactions": 1,
    "interaction_types": [
      "<string>"
    ],
    "sample_strategy": "random",
    "interaction_weights": {
      "weights": {}
    }
  },
  "results": [
    {
      "retriever_id": "<string>",
      "retriever_name": "<string>",
      "pipeline_hash": "<string>",
      "metrics": {
        "ndcg_at_k": {},
        "mean_rank_clicked": 123,
        "recall_at_k": {},
        "avg_position_delta": 123,
        "items_promoted": 1,
        "items_demoted": 1,
        "sessions_improved": 1,
        "sessions_degraded": 1,
        "sessions_neutral": 1,
        "mean_rank_purchased": 123
      },
      "latency": {
        "p50_ms": 1,
        "p90_ms": 1,
        "p99_ms": 1,
        "mean_ms": 1,
        "stage_latencies": {}
      },
      "failed_sessions": 1,
      "taxonomy_deltas": {},
      "error_summary": {}
    }
  ],
  "comparison": {
    "baseline_retriever_id": "<string>",
    "comparisons": [
      {
        "candidate_retriever_id": "<string>",
        "ndcg_delta": {},
        "recall_delta": {},
        "latency_delta_ms": 123,
        "p_value": 123,
        "confidence_interval": {
          "[0]": 123,
          "[1]": 123
        },
        "taxonomy_wins": [
          "<string>"
        ],
        "taxonomy_losses": [
          "<string>"
        ]
      }
    ],
    "recommendation": "<string>"
  },
  "started_at": "2023-11-07T05:31:56Z",
  "completed_at": "2023-11-07T05:31:56Z",
  "error_message": "<string>"
}

Headers

Authorization
string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer YOUR_API_KEY"

X-Namespace
string

Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'. Falls back to ?namespace= query parameter if the header is omitted.

Examples:

"ns_abc123def456"

"production"

"my-namespace"

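As a minimal sketch of the two headers described above, the following Python helper assembles them into a dictionary. The function name `build_headers` and the sample key and namespace values are illustrative assumptions, not part of the Mixpeek API. Only the header names and the `Bearer` format come from this page.

```python
from typing import Optional


def build_headers(api_key: str, namespace: Optional[str] = None) -> dict:
    """Assemble request headers for this endpoint (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key is required")
    headers = {
        # Documented format: 'Bearer sk_xxxxxxxxxxxxx'
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # X-Namespace accepts either a namespace ID or a namespace name.
    # It is left out here when no namespace is given.
    if namespace:
        headers["X-Namespace"] = namespace
    return headers


headers = build_headers("sk_example", namespace="production")
```

The resulting dictionary can be passed to whichever HTTP client you prefer.
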
Body

application/json

Request to create a new benchmark run.

benchmark_name
string
required

Human-readable name for this benchmark.

Required string length: 1 - 255

baseline_retriever_id
string
required

ID of the baseline retriever pipeline to compare against.

candidate_retriever_ids
string[]
required

IDs of candidate retriever pipelines to evaluate.

Minimum array length: 1

session_filter
SessionFilter · object

Optional filter criteria for selecting sessions to replay.

session_count
integer
default:1000

Number of sessions to include in the benchmark.

Required range: 10 <= x <= 10000
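
The body constraints listed above can be checked on the client before sending a request. The sketch below is one way to do that in Python. The helper name `build_benchmark_request` and the retriever IDs are hypothetical placeholders. The `session_filter` keys are taken from the sample request on this page, and their exact semantics are not described here.

```python
import json
from typing import List, Optional


def build_benchmark_request(
    benchmark_name: str,
    baseline_retriever_id: str,
    candidate_retriever_ids: List[str],
    session_count: int = 1000,  # documented default
    session_filter: Optional[dict] = None,
) -> dict:
    """Build a request body, enforcing the documented limits (hypothetical helper)."""
    if not 1 <= len(benchmark_name) <= 255:
        raise ValueError("benchmark_name must be 1 to 255 characters long")
    if not baseline_retriever_id:
        raise ValueError("baseline_retriever_id is required")
    if len(candidate_retriever_ids) < 1:
        raise ValueError("candidate_retriever_ids needs at least one ID")
    if not 10 <= session_count <= 10000:
        raise ValueError("session_count must be between 10 and 10000")

    body = {
        "benchmark_name": benchmark_name,
        "baseline_retriever_id": baseline_retriever_id,
        "candidate_retriever_ids": list(candidate_retriever_ids),
        "session_count": session_count,
    }
    # session_filter is optional, so it is only sent when provided.
    if session_filter is not None:
        body["session_filter"] = session_filter
    return body


payload = build_benchmark_request(
    benchmark_name="candidate-vs-baseline",
    baseline_retriever_id="ret_baseline",       # placeholder ID
    candidate_retriever_ids=["ret_candidate"],  # placeholder ID
    session_filter={"min_interactions": 1, "sample_strategy": "random"},
)
print(json.dumps(payload, indent=2))
```

Validating locally only catches the limits stated on this page. The server may apply further checks.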

Response

Successful Response

Response containing benchmark details and results.

benchmark_id
string
required

Unique benchmark identifier.

benchmark_name
string
required

Human-readable name.

baseline_retriever_id
string
required

Baseline retriever ID.

candidate_retriever_ids
string[]
required

Candidate retriever IDs.

session_count
integer
required

Number of sessions in the benchmark.

status
enum<string>
required

Current benchmark status.

Available options: pending, building_sessions, replaying, computing_metrics, completed, failed

created_at
string<date-time>
required

Creation timestamp.

session_filter
SessionFilter · object

Filter criteria used.

results
BenchmarkResult · object[] | null

Results per pipeline (available when completed).

comparison
BenchmarkComparison · object

Statistical comparison (available when completed).

started_at
string<date-time> | null

Execution start time.

completed_at
string<date-time> | null

Completion time.

error_message
string | null

Error message if failed.
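
Because `results` and `comparison` are only available once the run has completed, and `error_message` is only set on failure, client code usually branches on `status`. The sketch below shows one way to do that in Python. It assumes `completed` and `failed` are the only terminal statuses, which this page implies but does not state. The helper names are hypothetical.

```python
from typing import Optional

# Assumption: a run stops changing once it reaches one of these statuses.
TERMINAL_STATUSES = {"completed", "failed"}


def is_finished(benchmark: dict) -> bool:
    """Return True when the benchmark has reached a terminal status."""
    return benchmark["status"] in TERMINAL_STATUSES


def summarize(benchmark: dict) -> Optional[str]:
    """Return a one-line summary, or None while the run is still in progress."""
    if not is_finished(benchmark):
        return None
    if benchmark["status"] == "failed":
        return f"failed: {benchmark.get('error_message')}"
    # results may be null, so fall back to an empty list.
    results = benchmark.get("results") or []
    comparison = benchmark.get("comparison") or {}
    recommendation = comparison.get("recommendation")
    return f"completed: {len(results)} result(s), recommendation: {recommendation}"
```

A response with `"status": "replaying"` yields `None`, so a caller can keep checking until a summary is returned.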