Create benchmark

curl --request POST \
  --url https://api.mixpeek.com/v1/retrievers/benchmarks \
  --header 'Content-Type: application/json' \
  --data '
{
  "benchmark_name": "<string>",
  "baseline_retriever_id": "<string>",
  "candidate_retriever_ids": [
    "<string>"
  ],
  "session_filter": {
    "retriever_ids": [
      "<string>"
    ],
    "taxonomy_node_ids": [
      "<string>"
    ],
    "time_range": {
      "start": "2023-11-07T05:31:56Z",
      "end": "2023-11-07T05:31:56Z"
    },
    "min_interactions": 1,
    "interaction_types": [
      "<string>"
    ],
    "sample_strategy": "random",
    "interaction_weights": {
      "weights": {}
    }
  },
  "session_count": 1000
}
'

{
  "benchmark_id": "<string>",
  "benchmark_name": "<string>",
  "baseline_retriever_id": "<string>",
  "candidate_retriever_ids": [
    "<string>"
  ],
  "session_count": 123,
  "status": "pending",
  "created_at": "2023-11-07T05:31:56Z",
  "session_filter": {
    "retriever_ids": [
      "<string>"
    ],
    "taxonomy_node_ids": [
      "<string>"
    ],
    "time_range": {
      "start": "2023-11-07T05:31:56Z",
      "end": "2023-11-07T05:31:56Z"
    },
    "min_interactions": 1,
    "interaction_types": [
      "<string>"
    ],
    "sample_strategy": "random",
    "interaction_weights": {
      "weights": {}
    }
  },
  "results": [
    {
      "retriever_id": "<string>",
      "retriever_name": "<string>",
      "pipeline_hash": "<string>",
      "metrics": {
        "ndcg_at_k": {},
        "mean_rank_clicked": 123,
        "recall_at_k": {},
        "avg_position_delta": 123,
        "items_promoted": 1,
        "items_demoted": 1,
        "sessions_improved": 1,
        "sessions_degraded": 1,
        "sessions_neutral": 1,
        "mean_rank_purchased": 123
      },
      "latency": {
        "p50_ms": 1,
        "p90_ms": 1,
        "p99_ms": 1,
        "mean_ms": 1,
        "stage_latencies": {}
      },
      "failed_sessions": 1,
      "taxonomy_deltas": {},
      "error_summary": {}
    }
  ],
  "comparison": {
    "baseline_retriever_id": "<string>",
    "comparisons": [
      {
        "candidate_retriever_id": "<string>",
        "ndcg_delta": {},
        "recall_delta": {},
        "latency_delta_ms": 123,
        "p_value": 123,
        "confidence_interval": {
          "[0]": 123,
          "[1]": 123
        },
        "taxonomy_wins": [
          "<string>"
        ],
        "taxonomy_losses": [
          "<string>"
        ]
      }
    ],
    "recommendation": "<string>"
  },
  "started_at": "2023-11-07T05:31:56Z",
  "completed_at": "2023-11-07T05:31:56Z",
  "error_message": "<string>"
}

Headers

Authorization

string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Example:

"Bearer YOUR_API_KEY"

X-Namespace

string

Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'. Falls back to ?namespace= query parameter if the header is omitted.

Examples:

"ns_abc123def456"

"production"

"my-namespace"
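As a rough illustration of how the two headers above fit together, the following sketch assembles (without sending) a POST request using only the Python standard library. The function name `build_request` and the sample key and namespace values are hypothetical; the URL and header names come from this page.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.mixpeek.com/v1/retrievers/benchmarks"


def build_request(api_key: str, namespace: str, payload: dict) -> urllib.request.Request:
    """Assemble, but do not send, a POST request carrying the documented headers.

    `build_request` is an illustrative helper, not part of any Mixpeek SDK.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # format: 'Bearer sk_xxxxxxxxxxxxx'
        "X-Namespace": namespace,              # namespace name or ns_... ID
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch can be run without network access or a real key.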

Body

application/json

Request to create a new benchmark run.

benchmark_name

string

required

Human-readable name for this benchmark.

Required string length: 1 - 255

baseline_retriever_id

string

required

ID of the baseline retriever pipeline to compare against.

candidate_retriever_ids

string[]

required

IDs of candidate retriever pipelines to evaluate.

Minimum array length: 1

session_filter

SessionFilter · object

Optional filter criteria for selecting sessions to replay.

session_count

integer

default: 1000

Number of sessions to include in the benchmark.

Required range: 10 <= x <= 10000
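The body constraints listed above can be checked client-side before a request is made. The sketch below is one possible way to do that; `build_benchmark_payload` is a hypothetical helper, and the checks mirror only the limits stated on this page (name length 1 to 255, at least one candidate, `session_count` between 10 and 10000 with a default of 1000). The server may enforce additional rules not shown here.

```python
from typing import Optional


def build_benchmark_payload(
    benchmark_name: str,
    baseline_retriever_id: str,
    candidate_retriever_ids: list,
    session_count: int = 1000,
    session_filter: Optional[dict] = None,
) -> dict:
    """Build the JSON body for a benchmark, enforcing the documented limits."""
    if not 1 <= len(benchmark_name) <= 255:
        raise ValueError("benchmark_name must be 1 to 255 characters long")
    if len(candidate_retriever_ids) < 1:
        raise ValueError("candidate_retriever_ids needs at least one ID")
    if not 10 <= session_count <= 10000:
        raise ValueError("session_count must be between 10 and 10000")

    payload = {
        "benchmark_name": benchmark_name,
        "baseline_retriever_id": baseline_retriever_id,
        "candidate_retriever_ids": list(candidate_retriever_ids),
        "session_count": session_count,
    }
    # session_filter is optional, so it is only included when supplied.
    if session_filter is not None:
        payload["session_filter"] = session_filter
    return payload
```

Raising `ValueError` early keeps invalid requests from reaching the API, at the cost of duplicating limits that could change on the server side.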

Response

Successful Response

Response containing benchmark details and results.

benchmark_id

string

required

Unique benchmark identifier.

benchmark_name

string

required

Human-readable name.

baseline_retriever_id

string

required

Baseline retriever ID.

candidate_retriever_ids

string[]

required

Candidate retriever IDs.

session_count

integer

required

Number of sessions in benchmark.

status

enum<string>

required

Current benchmark status.

Available options: pending, building_sessions, replaying, computing_metrics, completed, failed
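Because a new benchmark starts out as `pending` and moves through several states, a client will typically poll until the run finishes. The sketch below assumes that `completed` and `failed` are the only terminal states, which the names suggest but this page does not state outright. The `fetch` callable is hypothetical: this page does not document an endpoint for reading a benchmark, so the caller supplies whatever function returns the latest benchmark object.

```python
import time
from typing import Callable

# Assumption: these two statuses end the run; the rest are in progress.
TERMINAL_STATUSES = {"completed", "failed"}
IN_PROGRESS_STATUSES = {"pending", "building_sessions", "replaying", "computing_metrics"}


def wait_for_benchmark(
    fetch: Callable[[], dict],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` repeatedly until the benchmark reaches a terminal status."""
    for _ in range(max_polls):
        benchmark = fetch()
        status = benchmark["status"]
        if status in TERMINAL_STATUSES:
            return benchmark
        if status not in IN_PROGRESS_STATUSES:
            raise ValueError(f"unexpected benchmark status: {status!r}")
        sleep(interval_s)
    raise TimeoutError(f"benchmark still running after {max_polls} polls")
```

Injecting `sleep` keeps the helper easy to exercise without real delays; the default interval and poll limit are arbitrary illustration values.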

created_at

string<date-time>

required

Creation timestamp.

session_filter

SessionFilter · object

Filter criteria used.

results

BenchmarkResult · object[] | null

Results per pipeline (available when completed).

comparison

BenchmarkComparison · object

Statistical comparison (available when completed).

started_at

string<date-time> | null

Execution start time.

completed_at

string<date-time> | null

Completion time.

error_message

string | null

Error message if failed.
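Since `results` and `comparison` are only available once the benchmark has completed, code that reads the response should guard against missing or null fields. The following sketch pulls a few per-candidate values out of a response shaped like the sample at the top of this page; `summarize_benchmark` is a hypothetical helper, and the choice of fields to extract is purely illustrative.

```python
def summarize_benchmark(benchmark: dict) -> list:
    """Return one summary row per candidate, or an empty list if not completed."""
    if benchmark.get("status") != "completed":
        return []

    # `comparison` may be absent or null before completion, so fall back to {}.
    comparison = benchmark.get("comparison") or {}
    rows = []
    for entry in comparison.get("comparisons") or []:
        rows.append(
            {
                "candidate_retriever_id": entry["candidate_retriever_id"],
                "latency_delta_ms": entry.get("latency_delta_ms"),
                "p_value": entry.get("p_value"),
            }
        )
    return rows
```

For a `failed` benchmark the function returns an empty list; callers would read `error_message` from the response directly in that case.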
