Overview
The Email connector lets you ingest documents by forwarding emails to a dedicated address. Each email becomes a bucket object with the body as a text blob, each attachment as a typed blob, and the original .eml file preserved for chain of custody.
This is built for compliance-oriented workflows — legal document intake, healthcare record forwarding, secure support inboxes — where email is the transport and the documents (attachments) are the payload.
Prerequisites
- A Mixpeek account with an active namespace
- A bucket and sync configured for the email connection
- A Cloudflare account with mixpeek.com (or your custom domain) on Cloudflare DNS. Email Routing is free on all plans.
How It Works
- Create an email connection: Mixpeek assigns a unique inbound address (e.g., conn_abc123@inbound.mixpeek.com).
- Cloudflare receives the email: MX records point to Cloudflare Email Routing, which routes the message to a Worker.
- The Worker POSTs the raw .eml: the Cloudflare Email Worker reads the raw RFC 2822 bytes and POSTs them to the Mixpeek webhook.
- Mixpeek parses and stores the email: MIME parsing extracts headers → metadata, body → text blob, attachments → S3-backed blobs, and raw .eml → S3.
Configuration
Connection-level fields
| Field | Required | Default | Description |
|---|---|---|---|
| allowed_senders | No | [] (all) | Sender allowlist. Exact addresses or domain wildcards (*@company.com). Empty = accept all. |
| store_raw_eml | No | true | Store the original .eml file as an additional blob for chain of custody. |
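The allowlist rules above can be sketched as a small matching function. This is a minimal Python illustration, not Mixpeek's implementation, and sender_allowed is a hypothetical name. It assumes matching is case-insensitive, that a display-name form such as "Alice <alice@company.com>" is reduced to its bare address first, and that a wildcard like *@company.com does not also match subdomains such as sub.company.com.

```python
from email.utils import parseaddr


def sender_allowed(sender: str, allowed_senders: list) -> bool:
    """Sketch of allowlist matching (not Mixpeek's actual implementation)."""
    if not allowed_senders:
        # Empty allowlist: accept every sender.
        return True
    # Reduce "Alice <alice@company.com>" to "alice@company.com".
    # Assumption: matching is case-insensitive.
    _, address = parseaddr(sender)
    address = address.strip().lower()
    for rule in allowed_senders:
        rule = rule.strip().lower()
        if rule.startswith("*@"):
            # Domain wildcard: "*@company.com" matches any address ending in
            # "@company.com". Assumption: subdomains do not match.
            if address.endswith(rule[1:]):
                return True
        elif address == rule:
            # Exact address match.
            return True
    return False
```

Anchoring the wildcard on "@company.com" rather than "company.com" keeps look-alike domains such as evilcompany.com from slipping through.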
Auto-provisioned fields (read-only)
| Field | Description |
|---|---|
| inbound_address | System-assigned email address for this connection (e.g., conn_abc123@inbound.mixpeek.com) |
| webhook_secret | Auto-generated HMAC-SHA256 signing secret for webhook verification |
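Verifying a payload against webhook_secret comes down to recomputing the HMAC-SHA256 of the request body and comparing it with the received signature in constant time. The sketch below assumes the signature is a hex digest computed over the raw body alone. The header that carries it, and any timestamp or prefix scheme, are not documented on this page, so sign_payload and verify_signature are hypothetical helpers rather than part of a Mixpeek SDK.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed signing scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received.strip().lower())
```

Any change to the body, or a different secret, produces a different digest, so tampered or misrouted payloads fail verification.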
Setup
Deploy the Cloudflare Email Worker
The Email Worker receives emails at *@inbound.mixpeek.com and POSTs the raw .eml bytes to the Mixpeek webhook. The worker source is in server/infra/cloudflare/email-worker/.
Optionally, set a global signing key for the worker.
Enable Cloudflare Email Routing
In the Cloudflare Dashboard:
- Go to your domain (mixpeek.com) → Email Routing.
- Enable Email Routing. Cloudflare automatically adds MX records for inbound.mixpeek.com.
- Go to Routing rules → Catch-all address.
- Set the action to Send to a Worker and select mixpeek-email-ingest.
Cloudflare Email Routing is free on all plans. MX records are managed automatically — no manual DNS configuration needed.
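With routing in place, each message reaches Mixpeek as a single HTTP POST whose body is the unmodified RFC 2822 bytes. The Python sketch below only builds such a request to show its shape; the real worker lives in server/infra/cloudflare/email-worker/ and runs on Cloudflare's Workers runtime. The webhook URL is a placeholder, the message/rfc822 content type is an assumption, build_webhook_request is a hypothetical name, and request signing is left out because its header name is not documented on this page.

```python
import urllib.request


def build_webhook_request(webhook_url: str, raw_eml: bytes) -> urllib.request.Request:
    """Build (without sending) a POST whose body is the unmodified .eml bytes."""
    return urllib.request.Request(
        webhook_url,
        data=raw_eml,  # raw RFC 2822 bytes, passed through untouched
        method="POST",
        # Assumption: content type; the worker's actual headers may differ.
        headers={"Content-Type": "message/rfc822"},
    )
```

Sending it would then be a call such as urllib.request.urlopen(req). Passing the bytes through untouched is what lets the stored raw_eml match the original message exactly.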
Object Structure
Each email becomes one bucket object with multiple blobs:
| Blob Property | Type | Content |
|---|---|---|
| email_body | text | Email body (plain text preferred, HTML fallback) |
| attachment_0, attachment_1, … | varies | Each attachment, typed by MIME type (image, pdf, video, etc.) |
| raw_eml | text | Original .eml file stored in S3 (if store_raw_eml is enabled) |
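The blob layout above can be approximated with Python's standard email package. This sketch is not Mixpeek's parser. It assumes the body is chosen with get_body(preferencelist=("plain", "html")), that attachments are the immediate non-body sub-parts returned by iter_attachments(), and that the MIME-to-type mapping in blob_type_for, including its "file" catch-all, is hypothetical. Both function names are illustrative.

```python
from email import policy
from email.parser import BytesParser


def blob_type_for(mime_type: str) -> str:
    """Hypothetical coarse mapping from a MIME type to a blob type."""
    main, _, sub = mime_type.partition("/")
    if main in ("image", "video", "audio", "text"):
        return main
    if sub == "pdf":
        return "pdf"
    return "file"  # hypothetical catch-all for other types


def build_blobs(raw: bytes, store_raw_eml: bool = True) -> list:
    """Split a raw .eml into email_body, attachment_N and raw_eml blobs."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    blobs = []

    # Body: prefer text/plain, fall back to text/html.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        blobs.append({"property": "email_body", "type": "text",
                      "data": body.get_content()})

    # Attachments: immediate non-body sub-parts, numbered in order.
    for i, part in enumerate(msg.iter_attachments()):
        blobs.append({
            "property": f"attachment_{i}",
            "type": blob_type_for(part.get_content_type()),
            "filename": part.get_filename(),
            "data": part.get_content(),  # bytes for binary parts
        })

    # Original bytes kept verbatim for chain of custody.
    if store_raw_eml:
        blobs.append({"property": "raw_eml", "type": "text", "data": raw})
    return blobs
```

Because iter_attachments() only looks at immediate sub-parts, attachments nested inside a forwarded message/rfc822 part would need a recursive walk in this sketch.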
Email metadata fields
These are set as root-level fields on the object and can be mapped to your bucket schema:
| Field | Type | Description |
|---|---|---|
| email_from | string | Sender address |
| email_to | list[string] | Recipient addresses |
| email_cc | list[string] | CC addresses |
| email_subject | string | Subject line |
| email_date | string (ISO 8601) | Date the email was sent |
| email_message_id | string | RFC 2822 Message-ID (used for deduplication) |
| email_in_reply_to | string | Parent message ID (for threading) |
| email_references | list[string] | Thread reference IDs |
| email_attachment_count | integer | Number of attachments |
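The metadata fields can be derived from the raw headers with the same standard email package. In this sketch, extract_metadata is a hypothetical name, not part of the Mixpeek API. It assumes addresses are reduced to bare addr-specs, the Date header is converted to an ISO 8601 string, References is split on whitespace, and Message-ID keeps its angle brackets.

```python
from email import policy
from email.parser import BytesParser


def extract_metadata(raw: bytes) -> dict:
    """Derive the email_* metadata fields from raw .eml bytes (sketch)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    def addresses(name: str) -> list:
        # Bare addr-specs from every occurrence of an address header.
        out = []
        for header in msg.get_all(name, []):
            out.extend(a.addr_spec for a in header.addresses)
        return out

    def text(name: str):
        value = msg.get(name)
        return str(value).strip() if value is not None else None

    date_header = msg.get("Date")
    sent = date_header.datetime if date_header is not None else None
    senders = addresses("From")
    return {
        "email_from": senders[0] if senders else None,
        "email_to": addresses("To"),
        "email_cc": addresses("Cc"),
        "email_subject": text("Subject") or "",
        "email_date": sent.isoformat() if sent else None,
        "email_message_id": text("Message-ID"),    # deduplication key
        "email_in_reply_to": text("In-Reply-To"),  # parent message (threading)
        "email_references": (text("References") or "").split(),
        "email_attachment_count": sum(1 for _ in msg.iter_attachments()),
    }
```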
Schema Mapping
Map email fields to your collection schema to make them searchable, then use attribute_filter in your retriever to query by sender, date, or subject.
Security
| Feature | Description |
|---|---|
| Sender allowlist | Only accept emails from specified addresses or domains |
| Webhook signature | HMAC-SHA256 verification of inbound payloads |
| Deduplication | Duplicate emails (same Message-ID) are skipped |
| Chain of custody | Raw .eml uploaded to S3 with SHA-256 hash for forensic integrity |
| Credential encryption | Webhook secret encrypted at rest (Fernet / CSFLE) |
| Audit logging | All connection events logged to ClickHouse (365-day retention) |
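Deduplication by Message-ID and the SHA-256 custody hash can be illustrated together. This in-memory sketch is hypothetical: a real service would persist the seen set, and Deduplicator and custody_hash are illustrative names. It keys on the Message-ID string as given and hashes the exact raw .eml bytes, so any change to the stored file would change the digest.

```python
import hashlib
from typing import Dict, Optional


def custody_hash(raw: bytes) -> str:
    """Hex SHA-256 of the exact raw .eml bytes."""
    return hashlib.sha256(raw).hexdigest()


class Deduplicator:
    """In-memory sketch; a real service would persist this state."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # Message-ID -> custody hash

    def accept(self, message_id: str, raw: bytes) -> Optional[str]:
        """Return the custody hash for a new email, or None for a duplicate."""
        key = message_id.strip()
        if key in self._seen:
            return None  # same Message-ID already ingested: skip
        digest = custody_hash(raw)
        self._seen[key] = digest
        return digest
```

Keying on Message-ID rather than on the content hash means a re-sent copy with slightly different transport headers is still treated as a duplicate.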
Compliance Notes
| Requirement | How Mixpeek addresses it |
|---|---|
| HIPAA — encryption in transit | Cloudflare enforces TLS on MX; webhook endpoint requires HTTPS (TLS 1.2+) |
| HIPAA — encryption at rest | Credentials encrypted via CSFLE; all blobs (body, attachments, raw .eml) stored in encrypted S3 |
| HIPAA — audit trail | All access logged to ClickHouse audit service |
| eDiscovery — immutability | Raw .eml in S3 with SHA-256 hash, stored alongside parsed content |
| eDiscovery — chain of custody | Source tracking: source_provider=email, source_object_id=email://{message_id} |
| SOC 2 — access control | Per-namespace RBAC (ADMIN, MEMBER, VIEWER) with granular operations |
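Because the raw .eml is stored alongside its SHA-256 hash, integrity can be re-checked later by hashing the stored bytes again and comparing. The sketch below also builds the source-tracking fields listed above. It assumes source_object_id embeds the Message-ID exactly as given (including angle brackets, if present), and both function names are hypothetical.

```python
import hashlib


def source_tracking(message_id: str) -> dict:
    """Source-tracking fields; assumes the Message-ID is embedded verbatim."""
    return {
        "source_provider": "email",
        "source_object_id": f"email://{message_id}",
    }


def verify_custody(stored_eml: bytes, recorded_sha256: str) -> bool:
    """Re-hash the stored .eml and compare with the hash recorded at ingest."""
    return hashlib.sha256(stored_eml).hexdigest() == recorded_sha256.strip().lower()
```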
Mixpeek does not currently hold a HIPAA BAA. If you need a BAA for PHI handling, contact us at sales@mixpeek.com to discuss your requirements.
Troubleshooting
| Issue | Solution |
|---|---|
| 403 — sender not in allowlist | Add the sender’s address or domain to allowed_senders |
| 404 — connection not found | Verify the connection_id in the webhook URL matches an active email connection |
| 400 — no active bucket sync | Create a bucket sync linked to this email connection |
| Duplicate emails skipped | Expected behavior — emails with the same Message-ID are deduplicated |
| Attachments not appearing | Check that the email service is sending the full raw RFC 2822 message, not a stripped-down version |
Related
- Buckets — Bucket schemas and objects
- Create Sync Configuration — Link connections to buckets
- Attribute Filter — Filter by email metadata

