Model Registry

Models run inside custom extractors. The model registry handles downloading, caching, and serving: you declare which model to use, and the infrastructure shares it across all workers via Ray’s object store.

Three Ways to Load Models

| Approach | When to Use |
| --- | --- |
| Built-in Models | Common tasks: embeddings, transcription, reranking. No code needed, just reference the feature URI. |
| HuggingFace Models | Any public HF model. Cached cluster-wide on first download. |
| Custom Models (Enterprise) | Your own fine-tuned weights uploaded as `.tar.gz`. Stored in S3, deployed to Ray. |

HuggingFace Models (Recommended)

Use `LazyModelMixin` in your extractor’s pipeline. Models load on first batch, not at actor creation, and are shared zero-copy across all workers.

import torch  # needed for torch.no_grad() in _process_batch

from engine.models.lazy import LazyModelMixin
from engine.inference.services import BaseBatchInferenceService

class MyEmbeddingProcessor(LazyModelMixin, BaseBatchInferenceService):
    model_id = "intfloat/multilingual-e5-large-instruct"
    model_class = "AutoModel"
    tokenizer_class = "AutoTokenizer"
    torch_dtype = "float16"
    model_source = "huggingface"

    def _process_batch(self, batch):
        model, tokenizer = self.get_model()

        inputs = tokenizer(
            batch["text"].tolist(),
            padding=True,
            truncation=True,
            return_tensors="pt",
        )

        with torch.no_grad():
            outputs = model(**inputs)

        batch["embedding"] = outputs.last_hidden_state.mean(dim=1).tolist()
        return batch
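The example above averages `last_hidden_state` over every token position, which includes padding positions when `padding=True`. A common alternative is to weight the average by the attention mask. The sketch below shows that arithmetic in plain Python on nested lists; `masked_mean_pool` is a hypothetical helper written for illustration, not part of the engine, and real code would do the same computation on tensors.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, skipping padded positions.

    hidden_states: list of sequences, each a list of token vectors.
    attention_mask: list of sequences of 0/1 flags, 1 marking a real token.
    """
    pooled = []
    for tokens, mask in zip(hidden_states, attention_mask):
        totals = [0.0] * len(tokens[0])
        count = 0
        for vector, keep in zip(tokens, mask):
            if keep:
                count += 1
                for i, value in enumerate(vector):
                    totals[i] += value
        # Guard against an all-padding row to avoid dividing by zero.
        pooled.append([t / max(count, 1) for t in totals])
    return pooled


# Two sequences padded to length 3; the second ends with a padding vector.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 4.0], [100.0, 100.0]],
]
mask = [[1, 1, 1], [1, 1, 0]]
pooled = masked_mean_pool(hidden, mask)
```

With the mask applied, the large padding vector in the second sequence does not affect its pooled embedding.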

LazyModelMixin Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| `model_id` | str | `""` | HuggingFace model ID or namespace model ID |
| `model_class` | str | `"AutoModel"` | Transformers model class name |
| `tokenizer_class` | str \| None | `"AutoTokenizer"` | Tokenizer class, or `None` to skip |
| `torch_dtype` | str | `"float32"` | `"float16"`, `"float32"`, or `"bfloat16"` |
| `model_source` | str | `"huggingface"` | `"huggingface"` or `"namespace"` |

Call `self.get_model()` to get a `(model, tokenizer)` tuple. Override `_instantiate_model(cached_data)` for non-standard architectures.
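To make the load-on-first-batch behavior concrete, here is a minimal sketch of the general lazy-loading pattern. It is not the `LazyModelMixin` implementation: `LazyLoader`, `fake_load`, and `load_count` are hypothetical names, and the sketch does not cover sharing through Ray's object store.

```python
class LazyLoader:
    """Hypothetical stand-in that illustrates load-on-first-use caching."""

    def __init__(self, load_fn):
        self._load_fn = load_fn  # called at most once
        self._cached = None
        self.load_count = 0

    def get_model(self):
        # Load on the first call only; later calls reuse the cached tuple.
        if self._cached is None:
            self._cached = self._load_fn()
            self.load_count += 1
        return self._cached


def fake_load():
    # Placeholder for an expensive download or deserialization step.
    return ("model-object", "tokenizer-object")


loader = LazyLoader(fake_load)      # nothing is loaded at construction time
loads_before_first_batch = loader.load_count
first = loader.get_model()          # first batch triggers the load
second = loader.get_model()         # later batches reuse the cached result
```

Deferring the load this way keeps object construction cheap, which matters when many workers are created before any of them receives data.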

Custom Models (Enterprise)

Custom models require an Enterprise subscription. Contact sales to enable.

Upload fine-tuned weights and use them in extractors. Three steps:

Upload

tar -czvf my_model.tar.gz ./model_weights/

curl -X POST "$MP_API_URL/v1/namespaces/$NS_ID/models" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -F "file=@my_model.tar.gz" \
  -F "name=my-embedding-model" \
  -F "version=1.0.0" \
  -F "model_format=pytorch" \
  -F "task_type=embedding" \
  -F "num_gpus=0" \
  -F "memory_gb=4.0"

Supported formats: `pytorch` (`.pt`, `.pth`), `safetensors`, `onnx`, `huggingface` (directory).
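If you prefer to build the archive from Python instead of the `tar` command, the standard-library `tarfile` module can produce an equivalent `.tar.gz`. This is a sketch under assumptions: `build_archive` is a hypothetical helper, the dummy `model.pt` file is only there for the demonstration, and the directory layout the registry expects inside the archive is not specified here.

```python
import tarfile
import tempfile
from pathlib import Path


def build_archive(weights_dir: Path, archive_path: Path) -> list:
    """Create a gzip-compressed tar of weights_dir and return its member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name
        # instead of the absolute path on disk.
        tar.add(weights_dir, arcname=weights_dir.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demonstration with a throwaway directory and a dummy weights file.
with tempfile.TemporaryDirectory() as tmp:
    weights_dir = Path(tmp) / "model_weights"
    weights_dir.mkdir()
    (weights_dir / "model.pt").write_bytes(b"\x00" * 16)
    members = build_archive(weights_dir, Path(tmp) / "my_model.tar.gz")
```

Listing the members after writing is a cheap way to confirm the archive contains what you intend to upload.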

Deploy to Ray

curl -X POST "$MP_API_URL/v1/namespaces/$NS_ID/models/my-embedding-model_1_0_0/deploy" \
  -H "Authorization: Bearer $MP_API_KEY"
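The deploy path uses `my-embedding-model_1_0_0`, which looks like the uploaded name joined to the version with dots replaced by underscores. That rule is inferred from this one example and is not documented here, so treat the model ID returned by the upload call as authoritative. The sketch below only illustrates the apparent convention; `model_id_for`, `deploy_url`, and the `api.example.com` host are hypothetical.

```python
def model_id_for(name: str, version: str) -> str:
    # Assumption inferred from the example: dots in the version become underscores.
    return f"{name}_{version.replace('.', '_')}"


def deploy_url(api_url: str, namespace_id: str, model_id: str) -> str:
    # Mirrors the path used in the curl command above.
    base = api_url.rstrip("/")
    return f"{base}/v1/namespaces/{namespace_id}/models/{model_id}/deploy"


model_id = model_id_for("my-embedding-model", "1.0.0")
url = deploy_url("https://api.example.com/", "ns_abc123", model_id)
```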

Use in your extractor

Set `model_source = "namespace"` and override `_instantiate_model()`:

class MyCustomProcessor(LazyModelMixin, BaseBatchInferenceService):
    model_id = "my-embedding-model_1_0_0"
    model_source = "namespace"

    def _instantiate_model(self, weights):
        import torch
        model = torch.nn.Linear(768, 256)
        model.load_state_dict(weights)
        model.to(self._detect_device())
        model.eval()
        return model, None

    def _process_batch(self, batch):
        model, _ = self.get_model()
        # Use model...
        return batch

Model Versioning

Models are versioned independently. Deploy a new version alongside the existing one, test in staging, then shift traffic:

my-embedding-model_1_0_0  (production)
my-embedding-model_2_0_0  (staging — validate before promoting)

The feature URI carries the extractor version (for example, `mixpeek://my_extractor@2.0.0/my_embedding`), so it changes when you release a new extractor version and both versions can coexist.
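Because the version is embedded in the URI, two releases of the same extractor yield distinct feature identifiers. The sketch below parses a URI under the assumed shape `mixpeek://<extractor>@<version>/<feature>`, which is inferred from the single example above; `parse_feature_uri` and `FeatureURI` are hypothetical, and the `1.0.0` URI is made up for comparison.

```python
import re
from typing import NamedTuple


class FeatureURI(NamedTuple):
    extractor: str
    version: str
    feature: str


# Assumed shape: mixpeek://<extractor>@<version>/<feature>
_URI_PATTERN = re.compile(
    r"^mixpeek://(?P<extractor>[^@/]+)@(?P<version>[^/]+)/(?P<feature>.+)$"
)


def parse_feature_uri(uri: str) -> FeatureURI:
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a recognized feature URI: {uri!r}")
    return FeatureURI(**match.groupdict())


old = parse_feature_uri("mixpeek://my_extractor@1.0.0/my_embedding")
new = parse_feature_uri("mixpeek://my_extractor@2.0.0/my_embedding")
```

Here `old` and `new` share an extractor and feature name but differ in version, so they remain separate identifiers.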

Python SDK

from mixpeek import Mixpeek

client = Mixpeek(api_key="sk_...")

with open("my_model.tar.gz", "rb") as f:
    result = client.models.upload(
        namespace_id="ns_abc123",
        file=f,
        name="my-reranker",
        version="1.0.0",
        model_format="pytorch",
        task_type="reranking",
    )

client.models.deploy(
    namespace_id="ns_abc123",
    model_id=result.model_id,
)

Limits

| Limit | Value |
| --- | --- |
| Max models per namespace | 50 |
| Max archive size | 10 GB |
| Supported formats | pytorch, safetensors, onnx, huggingface |
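A client-side check before uploading can catch violations of these limits early. The following is a sketch, not an official validator: `preflight` is a hypothetical helper, and it assumes decimal gigabytes (10 × 10^9 bytes) because the table does not say whether GB means decimal or binary units.

```python
MAX_ARCHIVE_BYTES = 10 * 10**9  # assumes decimal GB; the limit's unit is not specified
SUPPORTED_FORMATS = {"pytorch", "safetensors", "onnx", "huggingface"}


def preflight(archive_name: str, size_bytes: int, model_format: str) -> list:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    if not archive_name.endswith(".tar.gz"):
        problems.append("archive must be a .tar.gz file")
    if size_bytes > MAX_ARCHIVE_BYTES:
        problems.append("archive exceeds the 10 GB limit")
    if model_format not in SUPPORTED_FORMATS:
        problems.append(f"unsupported model_format: {model_format}")
    return problems


ok = preflight("my_model.tar.gz", 2 * 10**9, "pytorch")
bad = preflight("my_model.zip", 11 * 10**9, "tensorflow")
```

Returning all problems at once, instead of raising on the first, lets a caller report every issue in a single pass.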

Related

Custom Extractors

Package and deploy extractors that use these models.

Extractor Quickstart

Build a working extractor with model loading end-to-end.

Model API Reference

Upload, deploy, list, and delete model archives.

Self-Improving CV Pipeline

Full tutorial: deploy YOLO, annotate, fine-tune, redeploy.
