MongoDB Enterprise Advanced

Enterprise-grade features for security, management, and support. Everything in Community plus advanced capabilities for mission-critical deployments.

MongoDB EA Component Stack
Application Layer
🖥️ Application Drivers (all languages)
🔌 BI Connector (SQL → MQL translation)
🧭 Compass GUI
🐚 mongosh
TLS / HTTPS
Management Layer - Ops Manager
⚙️ Automation
📊 Monitoring
💾 Backup
🔔 Alerting
🔧 REST API
🖥️ Web UI
Agent Protocol
Database Layer - MongoDB Enterprise Server
WiredTiger (Encrypted)
In-Memory Engine
🔐 LDAP / Kerberos / x.509
📋 Auditing
🔑 CSFLE / Queryable Encryption
🔗 Cluster-to-Cluster Sync
Process Mgmt
Infrastructure Layer
☸️ Kubernetes Operator
🐳 Container Support
🖧 Bare Metal / VM
☁️ Any Cloud (AWS, GCP, Azure)

EA vs Community Comparison

Key differentiators that make Enterprise Advanced the choice for production workloads.

Feature | Community | Enterprise Advanced
Storage Engine (WiredTiger) | ✓ | ✓
Encryption at Rest (native) | ✗ | ✓ KMIP, AWS KMS, Azure Key Vault, GCP KMS
In-Memory Storage Engine | ✗ | ✓
LDAP Authentication & Authorization | ✗ | ✓
Kerberos Authentication | ✗ | ✓
Audit Logging | ✗ | ✓ Configurable filters
Client-Side Field Level Encryption | ⚠ Manual only | ✓ Automatic + Queryable
Ops Manager | ✗ | ✓ Full management platform
BI Connector | ✗ | ✓ SQL → MQL
Kubernetes Operator | ⚠ Community Operator | ✓ Enterprise Operator
Enterprise Support | ✗ | ✓ 24/7 SLA-backed
Cluster-to-Cluster Sync | ✗ | ✓ mongosync
SNMP Monitoring | ✗ | ✓

EA Product Components

βš™οΈ Ops Manager
Self-hosted management platform. Deploy, monitor, back up, and scale MongoDB clusters. Includes automation, alerting, and a web UI. The on-premises equivalent of Atlas.
πŸ” Enterprise Server
Enhanced mongod/mongos with encryption at rest, LDAP, Kerberos, auditing, in-memory engine, SNMP, FIPS 140-2 compliance, and Queryable Encryption.
☸️ Kubernetes Operator
CustomResourceDefinitions for MongoDB, MongoDBOpsManager, MongoDBUser. Manages lifecycle via StatefulSets. Integrates with Ops Manager for automation.
πŸ”Œ BI Connector
Translates SQL queries to MQL. Enables BI tools (Tableau, Power BI, Looker) to query MongoDB directly via a MySQL-compatible wire protocol.
πŸ”— Cluster-to-Cluster Sync
mongosync enables continuous data synchronization between clusters. Supports migrations, DR, and active-passive topologies.
πŸ›‘οΈ Enterprise Support
24/7 SLA-backed support with <1 hour P1 response time. Access to MongoDB field engineers, proactive health checks, and upgrade advisory.
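As a rough illustration of the translation the BI Connector performs, the SQL query below maps to an equivalent aggregation pipeline. This is a sketch only: the `sales` collection and its fields are hypothetical, and the connector's actual generated plan may differ.

```javascript
// Illustrative only: the kind of MQL a SQL query corresponds to.
// Collection and field names are hypothetical.
//
// SQL:  SELECT region, SUM(amount) AS total
//       FROM sales WHERE year = 2024
//       GROUP BY region ORDER BY total DESC;
const pipeline = [
  { $match: { year: 2024 } },                                  // WHERE
  { $group: { _id: "$region", total: { $sum: "$amount" } } },  // GROUP BY + SUM
  { $sort: { total: -1 } },                                    // ORDER BY ... DESC
];
// In mongosh this would run as: db.sales.aggregate(pipeline)
```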

Ops Manager Architecture

Ops Manager is a self-hosted management platform. It consists of the Application Server, a backing Application Database, agents deployed on every managed host, and dedicated backup infrastructure.

High-Level Ops Manager Topology
Users & Integrations
👤 Ops Manager Web UI
🔧 REST API v2.0
🔗 Terraform Provider
📧 Alert Webhooks / Email / PagerDuty / Slack
HTTPS :8443
Ops Manager Application Servers (2+ for HA)
🖥️ HTTP Server (Jetty)
💾 Backup Daemon
🔔 Alert Engine
⚙️ Goal State Engine
📊 Metric Aggregator
👥 Auth (LDAP/SAML/SCRAM)
MongoDB Wire Protocol
Ops Manager Backing Databases
Application DB (3-node RS)
Blockstore (snapshot chunks)
Oplog Store (oplog slices)
Head DB (staging)
Agent ← Poll → OM
Managed MongoDB Hosts (Agents)
🤖 MongoDB Agent (unified)
→ Automation Module
→ Monitoring Module
→ Backup Module

Ops Manager Subservices β€” Deep Dive

Click each subservice to expand its technical details, internal workings, and interaction patterns.

⚙️
Automation Agent
Runs on every managed host
Manages the entire lifecycle of MongoDB processes: provisioning, configuration, upgrades, and topology changes.
  • Pull-based model: Agent polls Ops Manager every 10s (configurable) for the "goal state", a JSON document describing the desired topology
  • Convergence engine: Compares current state to goal state and takes actions: start/stop mongod, modify configs, initiate replica set reconfig, add shards
  • Upgrade orchestration: Rolling upgrades one member at a time. Steps down the primary last. Waits for secondaries to catch up before proceeding
  • Process management: Starts mongod/mongos with the correct flags. Monitors process health. Restarts on crash with backoff
  • Configuration: Generates mongod.conf from the goal state. Handles TLS certs, keyfiles, LDAP config, audit config
  • Auth bootstrap: Creates the first admin user, configures keyfile auth, enables auth on the replica set
  • Port: Outbound HTTPS to Ops Manager (port 8443). No inbound ports required
  • Log: /var/log/mongodb-mms-automation/automation-agent.log
  • Failure mode: If the agent is down, existing MongoDB processes keep running. No new changes are applied until the agent reconnects
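The pull/compare/converge cycle can be sketched as a small state-diffing function. This is an illustrative simplification, not the agent's real code; the process and goal-state shapes here are hypothetical.

```javascript
// Simplified sketch of the convergence step: diff current processes against
// the goal state and emit the actions needed to converge. Hypothetical shapes.
function converge(currentProcs, goalProcs) {
  const actions = [];
  const current = new Map(currentProcs.map(p => [p.name, p]));
  for (const goal of goalProcs) {
    const cur = current.get(goal.name);
    if (!cur) {
      actions.push({ do: "start", name: goal.name });           // missing process
    } else if (cur.version !== goal.version) {
      actions.push({ do: "upgrade", name: goal.name, to: goal.version });
    }
    current.delete(goal.name);
  }
  for (const name of current.keys()) {
    actions.push({ do: "stop", name });                         // not in goal state
  }
  return actions;
}
```

The real agent applies such actions one member at a time (rolling), reporting status back to Ops Manager after each step.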
📊
Monitoring Agent
Runs on every managed host
Collects granular performance metrics from every MongoDB process and pushes them to Ops Manager for visualization and alerting.
  • Data collection methods: Runs serverStatus, replSetGetStatus, dbStats, collStats, top, currentOp, connPoolStats
  • Collection interval: Default every 60 seconds. Configurable down to 10 seconds
  • Hardware metrics: CPU usage, disk IOPS, disk utilization, memory (RSS, mapped, virtual), network I/O. Collected via the host agent
  • Replication lag: Measures the optime difference between the primary and each secondary. Alerts on configurable thresholds
  • Push model: Batches metrics and sends a compressed payload to an Ops Manager HTTP endpoint
  • Metric retention: 1-minute granularity for 48 hours → 5-minute for 7 days → 1-hour for 90 days → daily for 2 years
  • Custom metrics: Can define custom serverStatus-based metrics for dashboards
  • Profiler integration: Can collect slow query logs (from profiler level 1/2) and display them in Ops Manager
  • Data sent: ~2-5 KB per mongod per collection interval (compressed)
💾
Backup Agent
Runs on every managed host
Handles oplog tailing for continuous backup and coordinates with the Backup Daemon for snapshot creation.
  • Oplog tailing: Connects to each replica set member and tails the local.oplog.rs collection continuously
  • Initial sync: On first backup, performs a full data copy. Uses mongodump internally or filesystem snapshots
  • Oplog slicing: Divides the oplog into time-based slices and sends compressed slices to the Oplog Store
  • Coordination: Reports status to the Backup Daemon on Ops Manager. The Daemon orchestrates when to take snapshots
  • Sharded cluster backup: Coordinates across all shard agents to create a consistent checkpoint using a balancer pause + config server oplog position
  • Compression: Oplog and snapshot data are compressed with zstd before transmission and storage
  • Bandwidth control: Configurable max oplog transfer rate per agent to avoid saturating the network
  • Encryption: Data encrypted in transit (TLS). Optional encryption at rest in the blockstore/oplog store
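Oplog slicing can be pictured as bucketing tailed entries into fixed time windows before compression and upload. A minimal sketch, assuming a hypothetical entry shape ({ ts } in seconds) and window size:

```javascript
// Hypothetical sketch: group tailed oplog entries into fixed time-window
// slices, as the Backup Agent does before shipping them to the Oplog Store.
function sliceOplog(entries, windowSecs) {
  const buckets = new Map(); // sliceStart (secs) -> entries in that window
  for (const e of entries) {
    const start = Math.floor(e.ts / windowSecs) * windowSecs;
    if (!buckets.has(start)) buckets.set(start, []);
    buckets.get(start).push(e);
  }
  return [...buckets.entries()].map(([start, ops]) => ({
    start,
    end: start + windowSecs,
    ops,
  }));
}
```

Each resulting slice covers a contiguous time range, which is what makes point-in-time restore between snapshots possible.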
πŸ—οΈ
Backup Daemon
Runs on Ops Manager App Server
Central coordinator for all backup operations. Manages snapshot scheduling, retention policies, and restore orchestration.
  • Snapshot scheduling: Creates snapshots at configurable intervals (default: every 6 hours). Base + incremental approach
  • Head Database: Maintains a staging copy of each backed-up replica set. Applies oplog slices to keep it current. Used as the base for snapshots
  • Snapshot creation: Takes a point-in-time copy of the Head DB. Chunks the data and stores in the Blockstore
  • Incremental snapshots: After initial full snapshot, subsequent snapshots only store changed blocks (deduplication)
  • Retention policy: Configurable per-project. Example: 24 hourly, 7 daily, 4 weekly, 12 monthly snapshots
  • Restore types: Download snapshot (.tar.gz), Restore to another cluster (automated), Point-in-time restore (to any second), Queryable backup (mount as read-only)
  • HA: Only ONE active backup daemon per deployment. If the active Ops Manager node fails, another takes over automatically
  • Storage backends: Blockstore (MongoDB RS), S3-compatible (AWS S3, MinIO, etc.), Filesystem store (NFS/SAN)
🔔
Alert Engine
Runs on the Ops Manager App Server
Evaluates alert conditions against collected metrics and triggers notifications via multiple integration channels.
  • Alert conditions: Threshold-based (e.g., connections > 500), rate-based (e.g., page faults/sec), boolean (e.g., primary step-down)
  • Built-in alerts: Host down, replication lag, disk usage, oplog window, election events, backup delay, agent disconnect
  • Custom alerts: Create on any collected metric with AND/OR conditions
  • Integrations: Email, PagerDuty, Slack, OpsGenie, VictorOps, webhook (custom), SNMP traps, HipChat
  • Evaluation cycle: Checks conditions every monitoring interval (default 60s)
  • Alert states: OPEN → ACKNOWLEDGED → CLOSED. Configurable auto-resolution
  • Maintenance windows: Suppress alerts during planned downtime
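A threshold alert's OPEN/CLOSED lifecycle can be sketched as a small state transition. This assumes a simplified condition shape; the real engine also supports rate-based and compound (AND/OR) conditions.

```javascript
// Minimal sketch of threshold alert evaluation with auto-resolution.
// Condition shape ({ op, threshold }) is a hypothetical simplification.
function evaluateAlert(condition, value, state) {
  const breached = condition.op === ">"
    ? value > condition.threshold
    : value < condition.threshold;
  if (breached && state !== "OPEN") return "OPEN";    // condition newly breached
  if (!breached && state === "OPEN") return "CLOSED"; // auto-resolution
  return state;                                       // no transition
}
```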
🖥️
HTTP Server & API
Runs on the Ops Manager App Server
Jetty-based HTTP server serving the Web UI and REST API. Handles authentication, authorization, and session management.
  • Web UI: React-based SPA served by embedded Jetty. Provides dashboards, cluster management, user management, project settings
  • REST API v2.0: Full CRUD for all resources: organizations, projects, clusters, users, alerts, backup configs. Digest or API key auth
  • Default ports: 8080 (HTTP) / 8443 (HTTPS). TLS strongly recommended for production
  • Authentication: Local accounts (SCRAM), LDAP bind, SAML 2.0 (SSO with Okta, ADFS, etc.), x.509 client certs
  • Authorization: RBAC with roles: Global Owner, Org Admin, Project Owner, Project Read Only, etc.
  • Sessions: HTTP sessions stored in the Application Database. Configurable timeout (default 12 hours)
  • Rate limiting: Configurable per-user and per-API-key rate limits to protect against abuse
  • Load balancer: For HA, place 2+ Ops Manager app servers behind an L7 LB with sticky sessions

Agent ↔ Ops Manager Interaction Flow

Automation Agent Convergence Loop
1. Agent polls: GET /api/agent/v1/goalState every 10s via HTTPS
2. Goal state received: JSON describing the desired topology & config
3. Compare: diff current state vs goal state
4. Converge: apply changes (start/stop processes, reconfig RS)
5. Report: POST status back to Ops Manager

Monitoring Data Flow
1. mongod / mongos: serverStatus, replSetGetStatus, dbStats, currentOp
2. Monitoring Module: collects every 60s, plus host hardware stats
3. Compressed push: HTTPS POST to Ops Manager
4. Metric Aggregator: stores in the App DB with time-based rollup
5. UI / Alerts: dashboards, charts, alert evaluation

💡
Unified Agent (4.0+): Since Ops Manager 4.0, all three functions (Automation, Monitoring, Backup) are bundled into a single mongodb-agent binary. The agent runs as a single process and enables/disables modules based on Ops Manager configuration. This simplifies deployment from 3 agents to 1 per host.

Ops Manager Backing Databases

Ops Manager relies on several dedicated MongoDB instances for its own operation. These are critical infrastructure.

Application Database
Purpose: Stores all Ops Manager metadata: users, organizations, projects, cluster configs, alert definitions, metric data, automation goal states, audit logs.

Sizing: Requires 3+ node replica set. SSD recommended. Size depends on # of managed hosts: ~2 GB per 100 servers.

Requirements: Must use WiredTiger. Must have oplog sized for at least 24 hours. Dedicated hardware recommended (not co-located with managed clusters).
Blockstore
Purpose: Stores snapshot data as compressed, deduplicated chunks. Each snapshot is a series of blocks referencing a base snapshot plus deltas.

Sizing: Depends on total data being backed up. Rule of thumb: 2-3x the total dataset size (for retention + overhead).

Options: MongoDB replica set (default), S3-compatible (AWS S3, GCS, MinIO), filesystem (NFS/SAN). Can have multiple blockstores for distribution.
Oplog Store
Purpose: Stores oplog slices between snapshots. These slices enable point-in-time restore to any second between two snapshots.

Sizing: Grows based on write volume. Typically 10-20% of blockstore size. Oplog slices are compressed.

Retention: Slices are garbage-collected after the next snapshot covers that time range plus the retention window.
Head Database
Purpose: Temporary staging database per backed-up replica set. The Backup Daemon applies oplog slices to keep it up-to-date, then takes periodic snapshots from it.

Lifecycle: Created automatically. One per backed-up replica set. Stored on Backup Daemon host or dedicated storage.

Note: High I/O component. Place on fast SSD. Size equals the dataset of the replica set being backed up.

Backup & Recovery Architecture

Ops Manager provides enterprise-grade backup with continuous point-in-time recovery, automated scheduling, and multiple restore options.

Backup Data Flow
Source: Managed MongoDB Clusters
Primary (initial sync source)
Secondary (oplog tailing)
📋 local.oplog.rs (capped collection)
Oplog Tail
Processing: Backup Agent + Daemon
🤖 Backup Agent tails the oplog continuously
📦 Slices oplog into time-based chunks
🏗️ Backup Daemon applies slices to Head DB
📸 Takes periodic snapshots from Head DB
Compressed Write
Storage: Blockstore + Oplog Store
💾 Blockstore: compressed snapshot chunks
📋 Oplog Store: oplog slices (PIT restore)
☁️ Optional: S3-compatible backend
📁 Optional: filesystem (NFS/SAN)

Security & Encryption

MongoDB Enterprise Advanced provides defense-in-depth security across authentication, authorization, encryption, and auditing.

Security Layers
Network Security
🔒 TLS/SSL (all connections)
🌐 VPC / VPN / Private Link
🛡️ IP Allowlists
🔗 x.509 mutual TLS
Authentication
🔐 SCRAM-SHA-256
📂 LDAP (Active Directory)
🎫 Kerberos (GSSAPI)
📜 x.509 Certificates
🔑 AWS IAM (Atlas)
Authorization
👤 RBAC (100+ built-in roles)
📂 LDAP Group → Role Mapping
🔧 Custom Roles (collection-level)
🔍 Privilege Actions (180+)
Encryption
💽 Encryption at Rest (WiredTiger)
🔑 KMIP / AWS KMS / Azure / GCP
🔐 Client-Side FLE (CSFLE)
🔐 Queryable Encryption
Auditing & Compliance
📋 Database Audit Log
🔍 Configurable Audit Filters
📜 FIPS 140-2
🏛️ SOC2, HIPAA, PCI-DSS, GDPR

Monitoring & Logging

Ops Manager provides deep monitoring of every MongoDB process with real-time dashboards, alerting, and log management.

Metric Retention & Granularity

Time Range | Granularity | Retention
Last 48 hours | 1 minute | 2 days
Last 7 days | 5 minutes | 7 days
Last 90 days | 1 hour | 90 days
Last 2 years | 1 day | 730 days
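The rollup schedule reads naturally as an age-to-granularity lookup. A sketch (ages in hours; Ops Manager performs this rollup server-side, this is purely illustrative):

```javascript
// Map a metric's age to the granularity retained at that age,
// per the retention table above.
function granularityFor(ageHours) {
  if (ageHours <= 48) return "1 minute";
  if (ageHours <= 24 * 7) return "5 minutes";
  if (ageHours <= 24 * 90) return "1 hour";
  return "1 day";
}
```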

Logging Architecture

📄 MongoDB Server Logs
Structured JSON logs (since 4.4). Components: ACCESS, COMMAND, CONTROL, ELECTION, GEO, INDEX, NETWORK, QUERY, REPL, SHARDING, STORAGE, WRITE.

Configurable verbosity per component (0-5). Slow query logging via the profiler or slowOpThresholdMs (default 100ms).
🤖 Agent Logs
/var/log/mongodb-mms-automation/

automation-agent.log: goal state changes, process starts/stops, config changes.
monitoring-agent.log: metric collection events, connection issues.
backup-agent.log: oplog tailing status, sync progress.
🖥️ Ops Manager Logs
/opt/mongodb/mms/logs/

mms0.log: application server logs (HTTP requests, API calls).
daemon.log: backup daemon operations.
mms0-audit.log: audit of Ops Manager user actions.
🔗 External Integration
Forward logs to external systems via:

Syslog: systemLog.destination: syslog
File → Fluentd/Logstash: ship JSON logs to ELK/Splunk
SNMP: enterprise SNMP traps for integration with Nagios, Zabbix

High Availability: From Nodes to Datacenters

MongoDB EA HA isn't a single feature; it's a layered defense. Each level protects against a broader class of failure, and they all build on the same replica set foundation. A production deployment should address every level.

Level 1: Process
A single mongod crashes, runs OOM, or is killed. The replica set auto-elects a new primary in ~5-10s. Zero data loss with journaling.
Level 2: Host / Rack
A physical server dies or a rack loses power. Members in other racks/AZs elect a new primary. Spread members across failure domains within a DC.
Level 3: Datacenter
An entire DC goes offline due to network, power, or disaster. Replica set members in surviving DCs elect a new primary. Requires members distributed across DCs.
Level 4: Region / Global
A region-wide outage, or you need zero write latency in each region. Zone sharding pins data locally; mongosync enables standby clusters for manual failover.
💡
The same rs.initiate() and w: "majority" mechanics work identically whether members are on the same rack or spread across continents. MongoDB does not distinguish between "local HA" and "geo HA": it's all replica set replication. The only things that change are network latency, member placement, and election configuration.

Component HA Matrix

Every component in the EA stack must be resilient. Here's the HA strategy for each, applicable whether running in a single DC or across multiple.

Component | HA Strategy | Min Nodes | Failover
Managed MongoDB RS | 3+ member replica set, spread across AZs (single DC) or DCs (multi-DC) | 3 (5 for multi-DC) | Automatic election: ~5-10s within a DC, ~10-30s cross-DC
Ops Manager App Server | 2+ instances behind an L7 load balancer (sticky sessions); can span DCs | 2 | LB routes to a healthy instance; zero downtime
Application Database | Standard MongoDB RS; should span DCs in multi-DC deployments | 3 | Automatic election: ~5-10s single DC, ~10-30s cross-DC
Backup Daemon | Active-passive across Ops Manager instances | 2 (OM servers) | Auto failover to another OM instance; 1-2 min
Blockstore / Oplog Store | MongoDB RS(s); multiple stores for load; should span DCs for DR | 3 per RS | RS automatic election; multiple stores for distribution
MongoDB Agent | Systemd auto-restart; Ops Manager alerts on disconnect | 1 per host | Agent down = no new automation; running processes unaffected
mongos (Sharded) | Stateless; multiple instances behind an app-level LB | 2+ | Driver reconnects to an available mongos; no state to lose
Config Servers | 3-member RS (CSRS); spread across DCs for sharded multi-DC | 3 | Automatic election; cluster read-only if the primary is lost during metadata ops

Production Deployment Topology

Whether running in one datacenter or three, the stack is the same β€” only the member placement changes.

Single-DC Production Topology
Load Balancer (L7, sticky sessions)
🌐 HAProxy / Nginx / AWS ALB / F5
🔒 TLS termination or passthrough
❤️ Health check: GET /user/login (HTTP 200)
Round Robin
Ops Manager Application Servers (2+)
🖥️ OM Server 1 (HTTP + Backup Daemon ACTIVE)
🖥️ OM Server 2 (HTTP + Backup Daemon STANDBY)
🖥️ OM Server 3 (HTTP + Backup Daemon STANDBY)
Wire Protocol
Backing Databases (all replica sets, spread across AZs/racks)
App DB: P + S + S (3 nodes)
Blockstore 1: P + S + S
Blockstore 2: P + S + S (optional)
Oplog Store: P + S + S
HTTPS Outbound
Managed MongoDB Infrastructure
🤖 Agent on every host (auto-restart via systemd)
📦 MongoDB RS / Sharded: members across AZs or racks
⚠️
Critical: The Ops Manager Application Database should never be managed by Ops Manager itself. Deploy it independently on separate hardware with its own backup strategy (e.g., filesystem snapshots or a mongodump cron job).
💡
Backup Daemon HA: Only one Backup Daemon is active at any time. If the active OM node goes down, another instance automatically takes over within 1-2 minutes. Head DB files must be on shared storage (NFS/SAN), or the new daemon rebuilds from the blockstore.

The topology above protects against node and rack failures within a single DC. But what happens if the entire datacenter goes down? The same replica set just needs its members placed across DCs.

Extending to Multi-DC: Same Replica Set, Wider Placement

There is no separate "multi-DC mode." You take the same replica set and distribute members across datacenters. Elections, replication, and write concerns all work identically; the only difference is network latency between members.

Multi-DC Replica Set: 5 Members Across 3 Datacenters
Datacenter A: Primary Region (e.g., us-east-1)
P (Primary), priority 10
S1 (Secondary), priority 8
Oplog Replication (~5-50ms cross-DC)
Datacenter B: Secondary Region (e.g., us-west-2)
S2 (Secondary), priority 6
S3 (Secondary), priority 6
Heartbeat / Vote (~10-100ms)
Datacenter C: Tiebreaker (e.g., eu-west-1)
S4 or Arbiter, priority 0 (never primary)
💡
Why 3 fault domains? With only 2 DCs and a 2+3 member split, losing the 3-member DC means the remaining 2 members cannot form a majority (2 of 5). The cluster becomes read-only. A 3rd-DC tiebreaker ensures any single DC failure always leaves a majority, enabling automatic failover with zero data loss.
⏱️ Election Timing
Default electionTimeoutMillis: 10000 (10s). In cross-DC setups, increase it to 15-30s to avoid spurious elections from transient network blips between DCs.
🏷️ Priority-Based DC Preference
Set higher priority for members in the preferred primary DC (priority: 10), lower priority for remote DCs (priority: 6). The highest-priority member with the most current oplog wins elections.
🔀 Network Partition Handling
If DC-A partitions from DC-B+C: the 3-member side forms a majority and elects a new primary. DC-A members step down (no majority). On heal, they rejoin and roll back any uncommitted w:1 writes.
📊 Rollback Protection
w: "majority" ensures writes replicate across DCs before acknowledgment. These survive any single DC failure with zero rollback. Only w:1 writes that haven't replicated will be rolled back.
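The 5-member, 3-DC layout above could be expressed as a replica set config along the following lines. Hostnames are placeholders, and the raised electionTimeoutMillis reflects the cross-DC guidance; adapt values to your own topology.

```javascript
// Replica set config matching the 3-DC diagram; hostnames are placeholders.
// In mongosh, this would be passed to rs.initiate(cfg).
const cfg = {
  _id: "prodRS",
  settings: { electionTimeoutMillis: 20000 }, // raised for cross-DC links
  members: [
    { _id: 0, host: "a1.dc-a.example.net:27017", priority: 10, tags: { dc: "A" } },
    { _id: 1, host: "a2.dc-a.example.net:27017", priority: 8,  tags: { dc: "A" } },
    { _id: 2, host: "b1.dc-b.example.net:27017", priority: 6,  tags: { dc: "B" } },
    { _id: 3, host: "b2.dc-b.example.net:27017", priority: 6,  tags: { dc: "B" } },
    // Tiebreaker in DC-C: votes in elections but can never become primary.
    { _id: 4, host: "c1.dc-c.example.net:27017", priority: 0,  tags: { dc: "C" } },
  ],
};
// Majority of 5 voters = 3, so any single DC loss leaves a quorum.
```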

What Happens When a Datacenter Fails

The election process is the same whether one node crashes or an entire DC goes dark. The only difference is how many members are lost at once.

1
DC-A Goes Offline
DC-A loses connectivity: Primary (P) and Secondary (S1) become unreachable simultaneously. In-flight writes to the primary are interrupted.
↓
2
Heartbeat Timeout (10s)
S2, S3 (DC-B) and S4 (DC-C) stop receiving heartbeats from P and S1. After electionTimeoutMillis (10s default), the secondaries detect the primary is unreachable, the same mechanism as for a single node failure.
↓
3
Majority Check: 3 of 5 Alive
Surviving members: S2 + S3 + S4 = 3 of 5 votes. This is a majority. S2 (highest priority among survivors, most current oplog) calls an election.
↓
4
S2 Elected Primary in DC-B (~10-15s total)
S3 and S4 vote for S2. Election completes. S2 is the new primary, now accepting writes in DC-B. S3 continues as secondary. S4 remains tiebreaker.
↓
5
Drivers Reconnect Automatically
MongoDB drivers detect the topology change via SDAM (Server Discovery & Monitoring). Writes redirect to S2. Reads with nearest preference already target the local DC. Applications resume within ~1-3s after election.
↓
6
DC-A Recovers & Rejoins
When DC-A comes back, P and S1 contact S2 (the new primary). P steps down to secondary. Any w:1 writes not replicated before the failure are rolled back to a file. All w:"majority" writes are intact; they were acknowledged across DCs before the failure.

Write Concern & Read Preference: The Knobs That Control Durability vs Latency

These settings work the same in single-DC and multi-DC. The difference is that in multi-DC, w: "majority" must wait for cross-DC replication, adding network latency to every acknowledged write.

Write Concern | Single-DC Latency | Multi-DC Latency | Durability on DC Failure
w: 1 | ~1ms (local ack) | ~1ms (local ack) | ⚠ May lose writes if the primary DC fails before replication
w: "majority" | ~2-5ms (local RS) | ~50-100ms (cross-DC round trip) | ✓ Zero data loss; survives any single DC failure
w: "majority", j: true | ~5-10ms | ~50-150ms | ✓ Maximum; survives simultaneous power loss at any DC
w: 3 (numeric) | ~2-5ms | Variable | ✓ 3 copies, but may not guarantee cross-DC spread

Read Preference | Behavior | Best For
primary | Always reads from the primary; cross-DC latency if the app is in a different DC | Strong consistency required
primaryPreferred | Primary if available; secondary during failover | Prefer consistency, tolerate stale reads during failover
nearest | Lowest-latency member; always reads from the local DC if a member exists there | ✓ Recommended for multi-DC: local reads everywhere
nearest + tag_sets | Nearest member matching tags (e.g., dc: "us-west") | Explicit DC-aware routing, data locality
secondary | Only secondaries; can target a local DC secondary for analytics | Offload reads from the primary for reporting

⚠️
The latency trade-off: In a multi-DC RS with 50ms cross-DC latency, w: "majority" adds ~50-100ms to every write. For latency-sensitive workloads, you have three options: (1) accept w:1 with some RPO risk, (2) use an Active-Standby pattern with mongosync, or (3) use Zone Sharding to keep writes local.
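The knobs in the tables above show up as option objects on each operation. A sketch in mongosh-style options; the database and collection names in the comments are placeholders, and exact option shapes vary slightly by driver.

```javascript
// Option objects illustrating the durability-vs-latency trade-offs above.
const durableWrite = { writeConcern: { w: "majority", j: true, wtimeout: 5000 } };
const fastWrite    = { writeConcern: { w: 1 } };    // ~1ms ack, RPO risk on DC loss
const localReads   = { readPreference: "nearest" }; // serve reads from the local DC
// Tag-aware routing: read from the nearest member tagged dc: "us-west"
const dcReads = { readPreference: { mode: "nearest", tags: [{ dc: "us-west" }] } };
// e.g. in mongosh: db.orders.insertOne({ sku: "a1", qty: 2 }, durableWrite)
```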

Alternative Multi-DC Patterns

A 3-DC replica set is ideal but not always possible. Here are the alternatives and their trade-offs.

Making the Entire Stack Multi-DC

It's not just the data clusters: every supporting component should also span DCs for true datacenter-level resilience.

🖥️ Ops Manager App Servers
Deploy 2+ OM instances across DCs behind a global load balancer (AWS Global Accelerator, Cloudflare, F5 GTM). If the primary DC's OM fails, the LB routes agents and users to the surviving DC's OM.
🗄️ Ops Manager App DB
The Application Database RS should span DCs, same as managed clusters. Use w: "majority" so OM metadata survives a DC failure.
💾 Backup Stores
Blockstore and Oplog Store RSs should span DCs too, or use S3/GCS with cross-region replication. Backup data must survive a DC failure independently.
🤖 Agent Connectivity
Agents connect to OM via HTTPS. Configure mmsBaseUrl to a global LB endpoint so agents automatically reach whichever OM instance is healthy, regardless of which DC it's in.
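Pointing agents at a global endpoint is a one-line change in the agent's config file. A sketch with placeholder values (the key names follow the MongoDB Agent config-file convention; verify against your agent version's documentation):

```
# /etc/mongodb-mms/automation-agent.config (values are placeholders)
mmsBaseUrl=https://om.example.net   # global LB endpoint, not a single OM host
mmsGroupId=<project-id>
mmsApiKey=<agent-api-key>
```

With mmsBaseUrl set to the LB, a DC failover requires no per-host reconfiguration: the agents simply reach the surviving OM instance.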

Deployment Models

MongoDB Enterprise Advanced supports both traditional bare metal/VM deployments and modern Kubernetes-native deployments via the Enterprise Operator.

Comparison: Kubernetes vs Bare Metal / VM

Aspect | Bare Metal / VM | Kubernetes
Deployment | RPM/DEB packages, manual or Ansible/Terraform | Enterprise Operator + CRDs (kubectl apply)
Process Lifecycle | Automation Agent manages mongod processes | Operator manages StatefulSets → Pods → mongod
Storage | Local SSD, SAN, NAS; full control | PersistentVolumeClaims (PVC); StorageClass dependent
Networking | Static IPs, DNS, direct port access | Kubernetes Services, Headless Services, optional Ingress/LoadBalancer
Scaling | Provision new VMs, install agent, add to topology | kubectl patch to change replica count; the Operator handles the rest
Upgrades | Ops Manager rolling upgrade via Automation Agent | Operator performs a rolling update of StatefulSet pods
HA / Anti-affinity | Manual rack/AZ placement | Pod anti-affinity rules, topology spread constraints
Resource Isolation | Dedicated hardware, cgroups | Resource limits/requests, QoS classes, node selectors
TLS/Cert Management | Manual cert deployment or Vault integration | cert-manager integration, automatic rotation
Monitoring | Ops Manager Agent (native) | Ops Manager Agent in sidecar + Prometheus endpoints
Backup | Ops Manager Backup Agent (native) | Ops Manager Backup Agent in sidecar
Ops Manager itself | Installed on dedicated VMs | Can run in K8s via the MongoDBOpsManager CRD
Best For | Max control, air-gapped, regulatory, legacy infra | Cloud-native, GitOps, auto-scaling, dev/staging
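Under the Operator model, a replica set is declared as a custom resource rather than provisioned host by host. A hedged sketch of a MongoDB resource; names, the ConfigMap, and the credentials Secret are placeholders, and exact fields vary by operator version:

```yaml
# Sketch of an Enterprise Operator MongoDB custom resource (placeholder names).
apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: prod-rs
spec:
  type: ReplicaSet
  members: 3
  version: "7.0.5-ent"
  opsManager:
    configMapRef:
      name: om-project          # ConfigMap linking to an Ops Manager project
  credentials: om-api-credentials  # Secret holding Ops Manager API keys
  security:
    authentication:
      enabled: true
      modes: ["SCRAM"]
```

Applying this with kubectl apply lets the Operator create the StatefulSet and register the deployment with Ops Manager; scaling is then a matter of editing `members`.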

When to Use Which?

Choose Bare Metal / VM when...
  • Air-gapped / disconnected environments
  • Strict regulatory requirements (gov, financial)
  • Maximum control over hardware and networking
  • Existing VM infrastructure (VMware, OpenStack)
  • Very high-performance workloads needing NVMe/local SSD
  • Team has limited Kubernetes expertise
Choose Kubernetes when...
  • Cloud-native infrastructure strategy
  • GitOps / Infrastructure-as-Code workflows
  • Rapid provisioning of dev/staging/test clusters
  • Auto-healing and self-service for developers
  • Multi-cloud or hybrid deployments
  • Integration with service mesh (Istio, Linkerd)