Models run inside custom extractors. The model registry handles downloading, caching, and serving — you just declare which model to use and the infrastructure shares it across all workers via Ray’s object store.

Three Ways to Load Models

| Approach | When to Use |
| --- | --- |
| Built-in Models | Common tasks: embeddings, transcription, reranking. No code needed, just reference the feature URI. |
| HuggingFace Models | Any public HF model. Cached cluster-wide on first download. |
| Custom Models (Enterprise) | Your own fine-tuned weights uploaded as .tar.gz. Stored in S3, deployed to Ray. |

Use LazyModelMixin in your extractor’s pipeline. Models load on first batch, not at actor creation, and are shared zero-copy across all workers.
import torch  # needed for torch.no_grad() below

from engine.models.lazy import LazyModelMixin
from engine.inference.services import BaseBatchInferenceService

class MyEmbeddingProcessor(LazyModelMixin, BaseBatchInferenceService):
    model_id = "intfloat/multilingual-e5-large-instruct"
    model_class = "AutoModel"
    tokenizer_class = "AutoTokenizer"
    torch_dtype = "float16"
    model_source = "huggingface"

    def _process_batch(self, batch):
        model, tokenizer = self.get_model()

        inputs = tokenizer(
            batch["text"].tolist(),
            padding=True,
            truncation=True,
            return_tensors="pt",
        )

        with torch.no_grad():
            outputs = model(**inputs)

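        # Mean-pool token embeddings over the sequence dimension.
        # Note: this plain mean also averages padding positions.
        batch["embedding"] = outputs.last_hidden_state.mean(dim=1).tolist()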
        return batch

LazyModelMixin Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| model_id | str | "" | HuggingFace model ID or namespace model ID |
| model_class | str | "AutoModel" | Transformers model class name |
| tokenizer_class | str \| None | "AutoTokenizer" | Tokenizer class, or None to skip |
| torch_dtype | str | "float32" | "float16", "float32", or "bfloat16" |
| model_source | str | "huggingface" | "huggingface" or "namespace" |

Call self.get_model() to get a (model, tokenizer) tuple. Override _instantiate_model(cached_data) for non-standard architectures.
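
The load-on-first-use behavior described above can be illustrated with a small standalone sketch. This is not Mixpeek's implementation: `LazyLoader` and `fake_load` are hypothetical names invented for the example, and the cluster-wide sharing that the real mixin provides is left out entirely.

```python
from typing import Any, Callable, Optional, Tuple


class LazyLoader:
    """Hypothetical stand-in that defers an expensive load until first use."""

    def __init__(self, load_fn: Callable[[], Tuple[Any, Any]]) -> None:
        # Construction is cheap: the loader function is stored, not called.
        self._load_fn = load_fn
        self._cached: Optional[Tuple[Any, Any]] = None
        self.load_count = 0

    def get_model(self) -> Tuple[Any, Any]:
        # Load on the first call only; later calls reuse the cached tuple.
        if self._cached is None:
            self._cached = self._load_fn()
            self.load_count += 1
        return self._cached


def fake_load() -> Tuple[str, str]:
    # Placeholder for downloading and instantiating a model and tokenizer.
    return ("model-object", "tokenizer-object")


loader = LazyLoader(fake_load)
loads_before_first_call = loader.load_count  # nothing loaded at construction

model, tokenizer = loader.get_model()  # first call triggers the load
loader.get_model()                     # second call hits the cache
loads_after_two_calls = loader.load_count
```

The point of the pattern is that creating the processor stays cheap, and the cost of loading is paid once, on the first batch that needs the model.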

Custom Models (Enterprise)

Custom models require an Enterprise subscription. Contact sales to enable.
Upload fine-tuned weights and use them in extractors. Three steps:

1. Upload

tar -czvf my_model.tar.gz ./model_weights/

curl -X POST "$MP_API_URL/v1/namespaces/$NS_ID/models" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -F "file=@my_model.tar.gz" \
  -F "name=my-embedding-model" \
  -F "version=1.0.0" \
  -F "model_format=pytorch" \
  -F "task_type=embedding" \
  -F "num_gpus=0" \
  -F "memory_gb=4.0"
Supported formats: pytorch (.pt, .pth), safetensors, onnx, huggingface (directory).

2. Deploy to Ray

curl -X POST "$MP_API_URL/v1/namespaces/$NS_ID/models/my-embedding-model_1_0_0/deploy" \
  -H "Authorization: Bearer $MP_API_KEY"

3. Use in your extractor

Set model_source = "namespace" and override _instantiate_model():
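from engine.models.lazy import LazyModelMixin
from engine.inference.services import BaseBatchInferenceService

class MyCustomProcessor(LazyModelMixin, BaseBatchInferenceService):
    model_id = "my-embedding-model_1_0_0"
    model_source = "namespace"

    def _instantiate_model(self, weights):
        import torch
        # Rebuild the architecture, then load the uploaded weights (a state dict) into it.
        model = torch.nn.Linear(768, 256)
        model.load_state_dict(weights)
        model.to(self._detect_device())
        model.eval()  # inference mode: disables dropout and similar training-only behavior
        # get_model() returns a (model, tokenizer) tuple; this model has no tokenizer.
        return model, None

    def _process_batch(self, batch):
        model, _ = self.get_model()
        # Use model...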

Model Versioning

Models are versioned independently. Deploy a new version alongside the existing one, test in staging, then shift traffic:
my-embedding-model_1_0_0  (production)
my-embedding-model_2_0_0  (staging — validate before promoting)
The feature URI updates with the extractor version (mixpeek://my_extractor@2.0.0/my_embedding), so both versions can coexist.
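
The examples on this page pair the name `my-embedding-model` and version `1.0.0` with the model ID `my-embedding-model_1_0_0`. Assuming that pattern holds in general (name, an underscore, then the version with dots replaced by underscores), which the page does not state explicitly, the hypothetical helpers below sketch how IDs for coexisting versions could be built and parsed. When using the SDK, the `model_id` returned by the upload call is the safer source.

```python
from typing import Tuple


def model_id_for(name: str, version: str) -> str:
    """Build a model ID from a name and a dotted version (assumed convention)."""
    return f"{name}_{version.replace('.', '_')}"


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split an ID back into (name, version), assuming a three-part version."""
    # rsplit from the right so underscores inside the name are preserved.
    name, major, minor, patch = model_id.rsplit("_", 3)
    return name, f"{major}.{minor}.{patch}"


production_id = model_id_for("my-embedding-model", "1.0.0")
staging_id = model_id_for("my-embedding-model", "2.0.0")

# Both IDs are distinct, so the two versions can be deployed side by side.
production_name, production_version = split_model_id(production_id)
staging_name, staging_version = split_model_id(staging_id)
```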

Python SDK

from mixpeek import Mixpeek

client = Mixpeek(api_key="sk_...")

with open("my_model.tar.gz", "rb") as f:
    result = client.models.upload(
        namespace_id="ns_abc123",
        file=f,
        name="my-reranker",
        version="1.0.0",
        model_format="pytorch",
        task_type="reranking",
    )

client.models.deploy(
    namespace_id="ns_abc123",
    model_id=result.model_id,
)

Limits

| Limit | Value |
| --- | --- |
| Max models per namespace | 50 |
| Max archive size | 10 GB |
| Supported formats | pytorch, safetensors, onnx, huggingface |
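
These limits can be checked locally before uploading. The sketch below uses only the Python standard library to build the `.tar.gz` archive and validate it against the table above. The helper names `build_archive` and `check_upload` are hypothetical, and reading "10 GB" as decimal gigabytes is an assumption (the page does not say whether GB or GiB is meant), chosen because it is the stricter interpretation.

```python
import os
import tarfile
import tempfile

SUPPORTED_FORMATS = {"pytorch", "safetensors", "onnx", "huggingface"}
MAX_ARCHIVE_BYTES = 10 * 1000**3  # assumption: "10 GB" read as decimal gigabytes


def build_archive(weights_dir: str, archive_path: str) -> int:
    """Pack weights_dir into a gzip-compressed tar and return its size in bytes."""
    top_level = os.path.basename(os.path.normpath(weights_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(weights_dir, arcname=top_level)
    return os.path.getsize(archive_path)


def check_upload(model_format: str, archive_bytes: int) -> None:
    """Raise ValueError if the upload would violate the documented limits."""
    if model_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported model_format: {model_format!r}")
    if archive_bytes > MAX_ARCHIVE_BYTES:
        raise ValueError(f"archive is {archive_bytes} bytes, over the size limit")


# Demo with a throwaway directory standing in for real model weights.
with tempfile.TemporaryDirectory() as tmp:
    weights_dir = os.path.join(tmp, "model_weights")
    os.makedirs(weights_dir)
    with open(os.path.join(weights_dir, "weights.bin"), "wb") as f:
        f.write(b"\x00" * 1024)

    archive_path = os.path.join(tmp, "my_model.tar.gz")
    archive_size = build_archive(weights_dir, archive_path)
    check_upload("pytorch", archive_size)  # passes silently when within limits

    with tarfile.open(archive_path) as tar:
        archive_members = tar.getnames()
```

Running the check client-side avoids waiting on a large upload only to have it rejected; the server remains the authority on what is accepted.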

Custom Extractors

Package and deploy extractors that use these models.

Extractor Quickstart

Build a working extractor with model loading end-to-end.

Model API Reference

Upload, deploy, list, and delete model archives.

Self-Improving CV Pipeline

Full tutorial: deploy YOLO, annotate, fine-tune, redeploy.