Mixpeek CLI

Installation

pip install mixpeek
mixpeek --version

Quick Start

# 1. Create plugin
mixpeek plugin init my_extractor --category text

# 2. Edit processors/core.py with your logic

# 3. Test locally
cd my_extractor && mixpeek plugin test

# 4. Publish
mixpeek plugin publish --namespace ns_xxx

See Custom Extractors for the full extractor development guide.

Configuration

export MIXPEEK_API_KEY="sk_your_api_key"
export MIXPEEK_NAMESPACE="ns_your_namespace"

Option	Environment Variable	Description
`--api-key`	`MIXPEEK_API_KEY`	Your Mixpeek API key
`--base-url`	`MIXPEEK_BASE_URL`	API base URL (default: https://api.mixpeek.com)
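
The table pairs each flag with an environment variable. The sketch below shows one way such a lookup could be resolved. It assumes that an explicit flag value wins over the environment variable and that the documented default is the last fallback; that precedence is an assumption, and `resolve_setting` is a hypothetical helper, not part of the Mixpeek CLI.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.mixpeek.com"  # default listed in the table above


def resolve_setting(
    flag_value: Optional[str],
    env_name: str,
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the flag value if given, else the environment variable, else the default.

    The precedence order (flag > environment > default) is assumed, not documented.
    """
    env = os.environ if env is None else env
    if flag_value:
        return flag_value
    return env.get(env_name) or default
```

For example, `resolve_setting(None, "MIXPEEK_BASE_URL", DEFAULT_BASE_URL)` falls back to the default URL when neither the flag nor the variable is set.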

Commands

`mixpeek plugin init`

Create a new plugin from a template.

mixpeek plugin init <name> [options]

Option	Description
`--category`	`text`, `image`, `video`, `audio`, `document`, `multimodal`
`--description`	Plugin description
`--author`	Author name
`--output`	Output directory

# Examples
mixpeek plugin init sentiment_analyzer --category text
mixpeek plugin init face_detector --category image --description "Detect faces"

`mixpeek plugin test`

Validate and test a plugin locally.

mixpeek plugin test [options]

Option	Description
`--path`	Plugin directory (default: `.`)
`--sample-data`	JSON/CSV file with test data
`--verbose`	Detailed output

Validates:

Structure (manifest.py, pipeline.py exist)
Schemas (valid Pydantic models)
Pipeline (build_steps() callable)
Tests (runs pytest if tests/ exists)
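
The structure check in the list above can be approximated before running the CLI. This is a rough sketch, not the CLI's actual implementation: `check_structure` is a hypothetical helper, and it only looks for the text `def build_steps(` instead of importing the module.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("manifest.py", "pipeline.py")  # files named in the list above


def check_structure(plugin_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks fine."""
    root = Path(plugin_dir)
    problems = [
        f"missing {name}" for name in REQUIRED_FILES if not (root / name).is_file()
    ]
    pipeline = root / "pipeline.py"
    # Cheap textual check only; it does not prove build_steps() is callable.
    if pipeline.is_file() and "def build_steps(" not in pipeline.read_text(encoding="utf-8"):
        problems.append("pipeline.py does not define build_steps()")
    return problems
```

Calling `check_structure(".")` from inside the plugin directory returns an empty list when both files exist and `pipeline.py` defines `build_steps`.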

# Examples
mixpeek plugin test
mixpeek plugin test --path ./my_extractor --verbose
mixpeek plugin test --sample-data samples.json

`mixpeek plugin publish`

Upload and deploy a plugin to Mixpeek.

mixpeek plugin publish [options]

Option	Description
`--path`	Plugin directory
`--namespace`	Target namespace ID
`--dry-run`	Validate without uploading

What happens:

1. Validates structure and schemas
2. Runs security scan
3. Creates .tar.gz archive
4. Uploads to S3 via presigned URL
5. Confirms and triggers deployment
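
The archive step in the list above can be sketched with the standard library. This is an illustration only: `build_archive` is a hypothetical helper, and the choice to skip `__pycache__` and `.git` is an assumption, not documented CLI behavior.

```python
import tarfile
from pathlib import Path
from typing import List

SKIP_DIRS = {"__pycache__", ".git"}  # assumption: not worth shipping


def build_archive(plugin_dir: str, archive_path: str) -> List[str]:
    """Pack plugin_dir into a gzip-compressed tarball and return the member names."""
    root = Path(plugin_dir)
    # Collect files first so a new archive written inside plugin_dir
    # is not packed into itself.
    members = sorted(
        path.relative_to(root)
        for path in root.rglob("*")
        if path.is_file()
        and not SKIP_DIRS.intersection(path.relative_to(root).parts)
    )
    with tarfile.open(archive_path, "w:gz") as tar:
        for rel in members:
            # arcname keeps paths relative to the plugin root.
            tar.add(root / rel, arcname=rel.as_posix())
    return [rel.as_posix() for rel in members]
```

Inspecting the returned names is a quick way to confirm that `manifest.py`, `pipeline.py`, and `processors/core.py` all made it into the archive.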

# Examples
mixpeek plugin publish
mixpeek plugin publish --namespace ns_abc123 --dry-run

`mixpeek plugin list`

List plugins in a namespace.

mixpeek plugin list [options]

Option	Description
`--namespace`	Namespace ID
`--source`	`all`, `builtin`, `custom`, `community`

mixpeek plugin list --source custom

Plugin Structure

my_extractor/
├── manifest.py      # Metadata + schemas
├── pipeline.py      # Batch processing
├── realtime.py      # HTTP endpoint (optional, Enterprise)
└── processors/
    └── core.py      # Your logic

manifest.py

from pydantic import BaseModel, Field
from typing import List

class MyInput(BaseModel):
    text: str

class MyOutput(BaseModel):
    embedding: List[float]

class MyParams(BaseModel):
    threshold: float = Field(default=0.5)

metadata = {
    "feature_extractor_name": "my_extractor",
    "version": "1.0.0",
    "description": "My extractor",
    "category": "text",
}

input_schema = MyInput
output_schema = MyOutput
parameter_schema = MyParams
supported_input_types = ["text"]

features = [
    {
        "feature_name": "my_embedding",
        "feature_type": "embedding",
        "embedding_dim": 384,
        "distance_metric": "cosine",
    },
]
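
The `features` entry declares a 384-dimensional embedding compared by cosine distance. The sketch below illustrates what those two fields imply for a vector produced by the plugin. `check_dimension` and `cosine_distance` are hypothetical helpers written for this example, and cosine distance is taken here as 1 minus cosine similarity, which is the common definition but not confirmed as the one Mixpeek uses.

```python
import math
from typing import Dict, Sequence

feature = {
    "feature_name": "my_embedding",
    "feature_type": "embedding",
    "embedding_dim": 384,
    "distance_metric": "cosine",
}


def check_dimension(feature: Dict, vector: Sequence[float]) -> None:
    """Raise ValueError if the vector length differs from the declared embedding_dim."""
    expected = feature["embedding_dim"]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm
```

A dimension mismatch between the model output and `embedding_dim` is worth catching locally, since the declared value describes what the plugin promises to emit.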

processors/core.py

from dataclasses import dataclass
import pandas as pd

@dataclass
class MyConfig:
    threshold: float = 0.5

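class MyProcessor:
    # progress_actor is accepted but not used in this example.
    def __init__(self, config: MyConfig, progress_actor=None):
        self.config = config
        self._model = None  # loaded lazily on first call

    def _load_model(self):
        # Import and load on first use so the model is created once per processor.
        if self._model is None:
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        self._load_model()
        # Missing text becomes "" so every row still gets an embedding.
        texts = batch["text"].fillna("").tolist()
        # One 384-dimensional vector per row, matching embedding_dim in manifest.py.
        batch["my_embedding"] = self._model.encode(texts).tolist()
        return batch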

pipeline.py

from typing import Any, Dict, Optional
from engine.plugins.extractors.pipeline import (
    PipelineDefinition, ResourceType, RowCondition, StepDefinition, build_pipeline_steps
)
from .manifest import MyParams, metadata
from .processors.core import MyConfig, MyProcessor

def build_steps(
    extractor_request: Any,
    container: Optional[Any] = None,
    base_steps: Optional[list] = None,
    **kwargs
) -> Dict[str, Any]:
    params = MyParams(**(extractor_request.extractor_config.parameters or {}))

    steps = [
        StepDefinition(
            service_class=MyProcessor,
            resource_type=ResourceType.CPU,
            config=MyConfig(threshold=params.threshold),
            condition=RowCondition.IS_TEXT,
        ),
    ]

    pipeline = PipelineDefinition(
        name=metadata["feature_extractor_name"],
        version=metadata["version"],
        steps=steps,
    )
    return {
        "steps": (base_steps or []) + build_pipeline_steps(pipeline),
        "prepare": lambda ds: ds,  # identity: the dataset passes through unchanged
    }

realtime.py (Enterprise)

from typing import Any, Dict

class RealtimeHandler:
    def __init__(self):
        self._model = None

    def predict(self, request: Dict[str, Any]) -> Dict[str, Any]:
        if self._model is None:
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer("all-MiniLM-L6-v2")

        text = request.get("text", "")
        embedding = self._model.encode([text])[0].tolist()
        return {"embedding": embedding}

Resource Types

Type	Use For
`ResourceType.CPU`	Text embeddings, classification
`ResourceType.GPU`	Local models (Whisper, CLIP)
`ResourceType.API`	External APIs (OpenAI, Vertex)

Row Conditions

RowCondition.IS_TEXT       # text/* MIME types
RowCondition.IS_IMAGE      # image/* MIME types
RowCondition.IS_VIDEO      # video/* MIME types
RowCondition.IS_AUDIO      # audio/* MIME types
RowCondition.IS_PDF        # application/pdf
RowCondition.ALWAYS        # All rows (default)
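
The comments above describe each condition as a MIME-type match. The sketch below shows that matching logic with a local stand-in enum. `Condition` and `matches` are hypothetical names for illustration and are not the engine's `RowCondition`; the handling of MIME parameters and letter case is also an assumption.

```python
from enum import Enum


class Condition(Enum):
    """Local stand-in for the documented row conditions."""

    IS_TEXT = "text/"
    IS_IMAGE = "image/"
    IS_VIDEO = "video/"
    IS_AUDIO = "audio/"
    IS_PDF = "application/pdf"
    ALWAYS = ""


def matches(condition: Condition, mime_type: str) -> bool:
    """Return True if a row with this MIME type would satisfy the condition."""
    if condition is Condition.ALWAYS:
        return True
    # Drop parameters such as "; charset=utf-8" and ignore case.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    if condition is Condition.IS_PDF:
        return normalized == condition.value
    return normalized.startswith(condition.value)
```

Under these assumptions, a step guarded by the text condition would run on `text/plain` and `text/html` rows and skip `image/png` rows.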

Security Constraints

Plugins are scanned before deployment. The following patterns are forbidden:

Pattern	Reason
`subprocess`, `os.system`	Shell execution
`eval`, `exec`	Dynamic code
`socket`	Direct network
`ctypes`	Memory access
`__import__`	Dynamic imports
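
To catch these patterns before publishing, a local pre-check along the following lines may help. It is a sketch built on Python's `ast` module and is not Mixpeek's scanner: `scan_source` is a hypothetical helper, it only detects direct imports and calls, and the real scan may apply different or stricter rules.

```python
import ast
from typing import List

FORBIDDEN_MODULES = {"subprocess", "socket", "ctypes"}
FORBIDDEN_CALLS = {"eval", "exec", "__import__"}


def scan_source(source: str) -> List[str]:
    """Return findings such as 'line 3: os.system()', ordered by line number."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in FORBIDDEN_CALLS:
                findings.append((node.lineno, f"{func.id}()"))
            elif (
                isinstance(func, ast.Attribute)
                and func.attr == "system"
                and isinstance(func.value, ast.Name)
                and func.value.id == "os"
            ):
                findings.append((node.lineno, "os.system()"))
    # ast.walk is not ordered by line, so sort before formatting.
    return [f"line {lineno}: {message}" for lineno, message in sorted(findings)]
```

Running it over each `.py` file in the plugin directory gives a quick list of lines to review; an empty result does not guarantee that the server-side scan will pass.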

Using Your Plugin

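After publishing, reference the plugin by name and version when creating a collection (`client` is an already initialized Mixpeek SDK client):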

client.collections.create(
    collection_name="my_collection",
    source={"type": "bucket", "bucket_ids": ["bkt_..."]},
    feature_extractor={
        "feature_extractor_name": "my_extractor",
        "version": "1.0.0",
        "parameters": {"threshold": 0.7}
    }
)

API Reference

Endpoint	Method	Description
`/v1/namespaces/{namespace_id}/plugins/uploads`	POST	Get presigned upload URL
`/v1/namespaces/{namespace_id}/plugins/uploads/{upload_id}/confirm`	POST	Confirm upload
`/v1/namespaces/{namespace_id}/plugins`	GET	List plugins
`/v1/namespaces/{namespace_id}/plugins/{plugin_id}`	GET	Get plugin details
`/v1/namespaces/{namespace_id}/plugins/{plugin_id}`	DELETE	Delete plugin
`/v1/namespaces/{namespace_id}/plugins/{plugin_id}/deploy`	POST	Deploy for realtime (Enterprise)
`/v1/namespaces/{namespace_id}/plugins/{plugin_id}/status`	GET	Check deployment status
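
The plugin endpoints share one path prefix, so their URLs can be assembled with a small helper. This sketch only builds URLs and sends nothing; `plugin_url` is a hypothetical function, and the plugin ID in the usage note is a made-up placeholder. Request bodies and authentication headers are not covered here because this page does not specify them.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://api.mixpeek.com"  # default base URL from the Configuration section


def plugin_url(
    namespace_id: str,
    plugin_id: Optional[str] = None,
    action: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build a plugin endpoint URL for a namespace, optional plugin, and optional action."""
    parts = ["v1", "namespaces", namespace_id, "plugins"]
    if plugin_id:
        parts.append(plugin_id)
    if action:
        parts.append(action)
    # Percent-encode each segment so unusual IDs cannot alter the path.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base_url.rstrip('/')}/{path}"
```

For instance, `plugin_url("ns_abc123")` gives the list endpoint, and `plugin_url("ns_abc123", "plugin_123", "status")` gives the deployment status endpoint for a placeholder plugin ID.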

Troubleshooting

Issue	Solution
Plugin not found	Check namespace, wait for deployment
Import errors	Ensure `__init__.py` files exist
Security scan fails	Remove forbidden patterns
Validation errors	Check manifest.py exports metadata/schemas

Debugging commands:

mixpeek plugin test --verbose
mixpeek plugin publish --dry-run
