Categorizing Schema Validation Errors

Within the broader Automated Schema Enforcement & Monitoring framework, error categorization is the translation layer that turns a raw MongoDB write rejection into routable, machine-readable telemetry. This page delivers a complete, runnable pipeline: how to read the structured errInfo payload a $jsonSchema violation produces, how to map every rule failure to a deterministic severity category, and how to route each category to the correct remediation path — quarantine, coercion, developer alert, or a business rule engine. In production, a single BulkWriteError can carry dozens of distinct rule violations spanning missing required fields, bsonType mismatches, range breaches, and cross-field business logic. Treating that as one opaque error class obscures root cause and forces platform teams into reactive firefighting; a stateless taxonomy applied at the driver boundary makes failures diagnosable, alertable, and safe to automate against.

Architectural Context & Enforcement Boundaries

Categorization sits immediately downstream of the synchronous write path. When a document violates a validator configured with validationAction: "error", the MongoDB server rejects the write and the driver surfaces a WriteError (or BulkWriteError for batch operations) carrying error code 121. The actionable intelligence is not in the human-readable errmsg string — which drifts across driver and server versions — but in the nested errInfo.details.schemaRulesNotSatisfied array. Each element names the failing rule via operatorName, the affected property path, and, on MongoDB 5.0+, additional context about the actual value. Categorization is the deterministic function that consumes that array and emits a severity-tagged, routing-ready record.

This layer depends on validators being attached correctly in the first place, which is the concern of implementing collection-level validators. Its output feeds three destinations: the async validation monitoring dashboards that plot category distribution over time, the fallback validation chains that decide whether a rejected document is coerced or dead-lettered, and the custom error payloads for schema violations that standardize the shape emitted to downstream consumers. Because the categorizer is stateless and I/O-free, it can run synchronously inside an ingestion worker or asynchronously in a dedicated telemetry processor without changing behavior.

A production-grade taxonomy should be exhaustive, stateless, and aligned to operational severity tiers. The following four categories cover the overwhelming majority of failure modes in enterprise workloads:

Structural violations — missing required fields, or unexpected keys when additionalProperties: false is enforced. These usually indicate API contract drift or an upstream serialization bug and warrant developer-facing rejection.
Type & format mismatches — bsonType coercion failures, invalid enum values, or pattern (regex) mismatches. Often caused by client SDK version skew or legacy payload formats; candidates for automatic coercion or quarantine.
Constraint & range breaches — minimum, maximum, minLength, maxLength, minItems, or maxItems failures. Typically represent data-quality degradation or unbounded user input, and feed data-quality SLAs.
Business/domain logic — cross-field dependencies, state-machine transitions, or temporal constraints enforced via $expr, $and, or $or. These require contextual routing to domain-specific handlers rather than generic retry.

Prerequisites & Operational Requirements

The categorizer reads an errInfo shape that only stabilized in recent server releases. Confirm the following before deploying it into a pipeline:

Requirement	Minimum	Recommended	Why it matters
MongoDB server	4.2	5.0+	`errInfo` is populated from 4.2; the fully detailed `schemaRulesNotSatisfied` tree (with `operatorName`, `propertiesNotSatisfied`, `missingProperties`) requires 5.0+. On 4.2–4.4 the parser must degrade gracefully.
PyMongo driver	4.0	4.6+	Consistent `BulkWriteError.details` structure and exception hierarchy.
Collection state	validator attached	`validationAction: "error"`	`validationAction: "warn"` logs violations without raising, so no `WriteError` reaches the driver — see strict vs moderate validation levels.
Permissions	`insert`/`update` on target	plus `read` on telemetry sink	The categorizer itself needs no elevated privileges; it parses driver exceptions locally.
Python packages	`pymongo==4.*`	pin exact patch in `requirements.txt`	Prevents silent BSON serialization changes across upgrades.

Two environment assumptions are load-bearing. First, categorization must be deterministic: an identical errInfo payload must always yield the same category, regardless of retry attempts, so the mapping table is a constant, not a runtime lookup. Second, attach the active schema version to every emitted record; pairing categories with the validator version they were evaluated against is what makes rollback analysis and drift detection possible, as covered under schema versioning strategies for NoSQL.

Idempotent Categorization Workflow

Follow these steps to stand up categorization against a live collection. Each step is verifiable in isolation.

1. Inspect a real errInfo payload in the shell. Force a violation with mongosh so you can see the exact tree your parser must handle on this server version:

// Provoke a required-field violation and capture the full errInfo tree.
try {
  db.user_events.insertOne({ event_type: "click" }); // missing user_id, timestamp
} catch (e) {
  printjson(e.errInfo.details.schemaRulesNotSatisfied);
}

2. Confirm the discriminator key. On 5.0+, each rule element exposes operatorName (for example "required", "properties", "bsonType"). Do not branch on errmsg. Note that property-level failures nest one level deeper under propertiesNotSatisfied[].propertyName, while required failures list bare names under missingProperties.

3. Define the mapping as a constant. Encode every operatorName you expect into a static dictionary keyed to the four categories. Anything unmapped falls through to UNKNOWN, which is itself a routable signal (it means your taxonomy is behind the validator).

4. Parse at the driver boundary. Catch BulkWriteError (batches) and WriteError (single writes), walk writeErrors[].errInfo, and emit one categorized record per failing rule. Keep the function pure so it is unit-testable without a database.

5. Emit versioned telemetry. Attach the collection name, active schema version, and a stable idempotency key before publishing to the dashboard sink or dead-letter queue.
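
Step 5 can be sketched as a small wrapper around one categorized rule. This is a minimal illustration, not the pipeline's actual emitter: the function name `build_telemetry_record`, the `raw_err_info` field, and the choice of SHA-256 over canonical JSON for the idempotency key are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict


def build_telemetry_record(
    categorized: Dict[str, str],
    collection: str,
    schema_version: str,
    raw_err_info: Dict[str, Any],
) -> Dict[str, Any]:
    """Wrap one categorized rule in a versioned record with a stable key."""
    # Canonical JSON (sorted keys, fixed separators) so that an identical
    # payload always hashes to the same key, regardless of dict ordering.
    canonical = json.dumps(
        {
            "collection": collection,
            "schema_version": schema_version,
            "rule": categorized,
            "err_info": raw_err_info,
        },
        sort_keys=True,
        separators=(",", ":"),
        default=str,  # non-JSON values (e.g. ObjectId) are stringified for hashing only
    )
    return {
        "collection": collection,
        "schema_version": schema_version,
        "idempotency_key": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        **categorized,
        # Hypothetical field name: keeps the raw payload next to the derived category.
        "raw_err_info": raw_err_info,
    }
```

Because the key is derived only from the payload, collection, and schema version, a retried write that fails the same way produces the same key, which lets the sink de-duplicate without coordination.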

Production-Ready Automation Implementation

The implementation below extracts, categorizes, and normalizes MongoDB validation errors with pymongo. It is built for high-throughput ingestion where partial failures must be isolated without halting the batch. The core categorize_validation_error function is deliberately decoupled from I/O so it runs identically inside a synchronous worker or an async telemetry processor.

import logging
from typing import Dict, List, Any, Optional

from pymongo import MongoClient
from pymongo.errors import BulkWriteError, WriteError

logger = logging.getLogger(__name__)

# Operational constraints:
# 1. MongoDB 4.2+ populates errInfo; 5.0+ gives the full schemaRulesNotSatisfied tree.
# 2. PyMongo >= 4.0 for a consistent BulkWriteError.details structure.
# 3. Keep batches <= 100k documents to bound driver memory during error aggregation.
# 4. The mapping is a constant, so categorization is deterministic across retries.

# Maps errInfo operatorName values to semantic severity categories.
OPERATOR_CATEGORIES: Dict[str, str] = {
    "required": "STRUCTURAL",
    "additionalProperties": "STRUCTURAL",
    "bsonType": "TYPE_MISMATCH",
    "type": "TYPE_MISMATCH",
    "enum": "TYPE_MISMATCH",
    "pattern": "TYPE_MISMATCH",
    "minimum": "CONSTRAINT_BREACH",
    "maximum": "CONSTRAINT_BREACH",
    "minLength": "CONSTRAINT_BREACH",
    "maxLength": "CONSTRAINT_BREACH",
    "minItems": "CONSTRAINT_BREACH",
    "maxItems": "CONSTRAINT_BREACH",
    # The server reports query operators with their "$" prefix
    # (operatorName: "$expr"); the bare key is kept as a fallback.
    "$expr": "DOMAIN_LOGIC",
    "expr": "DOMAIN_LOGIC",
}


def _iter_rules(err_info: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a 121 errInfo tree into leaf rules that carry a real operatorName.

    The ``properties`` element is a wrapper, not a leaf: its actual operators live
    under propertiesNotSatisfied[].details[]. We expand those and drop the wrapper
    so it never lands in the UNKNOWN bucket.
    """
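    details = err_info.get("details", {})
    rules = details.get("schemaRulesNotSatisfied")
    if rules is None and "operatorName" in details:
        # Query-operator validators (for example $expr) report the failure on
        # details itself, with no schemaRulesNotSatisfied array to walk.
        rules = [details]
    flattened: List[Dict[str, Any]] = []
    for rule in rules or []: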
        props = rule.get("propertiesNotSatisfied")
        if props:
            for prop in props:
                inner_rules = prop.get("details", [])
                if inner_rules:
                    for inner in inner_rules:
                        inner = dict(inner)
                        inner.setdefault("propertyName", prop.get("propertyName"))
                        flattened.append(inner)
                else:
                    flattened.append(dict(prop))
        else:
            flattened.append(rule)
    return flattened


def categorize_validation_error(err_info: Optional[Dict[str, Any]]) -> List[Dict[str, str]]:
    """
    Parse errInfo from a WriteError (code 121) and map each rule violation to a
    semantic category. Pure function: no I/O, deterministic for a given payload.
    """
    if not err_info or "details" not in err_info:
        return [{"category": "UNKNOWN", "path": "root", "operator": "unparseable"}]

    categorized: List[Dict[str, str]] = []
    for rule in _iter_rules(err_info):
        operator = rule.get("operatorName", "unknown")
        category = OPERATOR_CATEGORIES.get(operator, "UNKNOWN")

        missing = rule.get("missingProperties", [])
        if missing:
            # Report every missing field, not only the first, so none is dropped.
            path = ",".join(str(name) for name in missing)
        else:
            path = rule.get("propertyName", "unknown")

        categorized.append({
            "category": category,
            "path": str(path),
            "operator": operator,
        })

    return categorized or [{"category": "UNKNOWN", "path": "root", "operator": "empty_rules"}]


def execute_validated_inserts(collection, documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Execute an unordered bulk insert and categorize every validation failure."""
    results: Dict[str, Any] = {
        "inserted": 0,
        "categorized_failures": [],
        "uncategorized_failures": [],
    }

    try:
        res = collection.insert_many(documents, ordered=False)
        results["inserted"] = len(res.inserted_ids)
    except BulkWriteError as bwe:
        # writeErrors are per-document; writeConcernErrors are cluster-level and
        # are NOT schema violations, so they stay uncategorized.
        for write_err in bwe.details.get("writeErrors", []):
            if write_err.get("code") == 121:
                results["categorized_failures"].extend(
                    categorize_validation_error(write_err.get("errInfo"))
                )
            else:
                results["uncategorized_failures"].append(write_err)
        results["inserted"] = bwe.details.get("nInserted", 0)
        results["uncategorized_failures"].extend(bwe.details.get("writeConcernErrors", []))
        logger.warning(
            "Bulk insert: %d inserted, %d categorized failures",
            results["inserted"], len(results["categorized_failures"]),
        )
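    except WriteError as we:
        # Defensive: insert_many reports failures as BulkWriteError, so this branch
        # only matters if the try block is reused around a single-document write.
        if we.code == 121:
            results["categorized_failures"].extend(
                categorize_validation_error((we.details or {}).get("errInfo"))
            )
        else:
            raise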

    return results

The bulk handler distinguishes three outcomes that platform engineers routinely conflate: successful inserts (nInserted), per-document validation rejections (code 121, categorized), and cluster-level writeConcernErrors (never schema failures, always uncategorized). Conflating the last two produces phantom “schema violation” spikes during replica-set elections. Once a record is categorized, each category maps to a distinct operational response: structural violations trigger developer alerts, type mismatches route to quarantine or coercion, constraint breaches feed data-quality SLAs, and domain-logic failures escalate to a business rule engine. The Python integration for schema checks tooling consumes the same taxonomy when it runs pre-flight validation off the write path.
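
The category-to-response mapping described above could be wired up roughly as follows. This is a sketch under stated assumptions: `route_records`, the `HANDLERS` table, and the in-memory lists standing in for an alert channel, quarantine queue, SLA feed, and rule engine are all hypothetical names, not part of the reference implementation.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Handler = Callable[[Record], None]


def route_records(
    records: List[Record],
    handlers: Dict[str, Handler],
    fallback: Handler,
) -> Dict[str, int]:
    """Send each categorized record to its handler and return per-category counts."""
    counts: Dict[str, int] = {}
    for record in records:
        category = record.get("category", "UNKNOWN")
        # Categories without a handler (including UNKNOWN) go to the fallback
        # instead of being dropped, so taxonomy gaps stay visible.
        handlers.get(category, fallback)(record)
        counts[category] = counts.get(category, 0) + 1
    return counts


# Hypothetical sinks: plain lists stand in for real destinations.
alerts: List[Record] = []
quarantine: List[Record] = []
sla_feed: List[Record] = []
rule_engine: List[Record] = []
triage: List[Record] = []

HANDLERS: Dict[str, Handler] = {
    "STRUCTURAL": alerts.append,          # developer alert
    "TYPE_MISMATCH": quarantine.append,   # quarantine or coercion
    "CONSTRAINT_BREACH": sla_feed.append, # data-quality SLA feed
    "DOMAIN_LOGIC": rule_engine.append,   # business rule engine
}
```

Calling `route_records(results["categorized_failures"], HANDLERS, triage.append)` would then fan the output of the bulk handler out to the four destinations, with anything unmapped landing in `triage`.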

Diagnostic Fingerprints & Fast Resolution

When categorized errors spike, incident responders need to isolate the failure domain in minutes. The table below maps observable fingerprints to the category and the first triage action.

Fingerprint	Where to see it	Category	First action
`WriteError` / `BulkWriteError`, `code: 121`	driver exception, `mongod` log	any	Confirm it is a validation failure, not `11000` (duplicate key) or `13` (auth).
`operatorName: "required"` + `missingProperties`	`errInfo.details`	STRUCTURAL	Block the upstream producer; check for a recent client deploy.
`operatorName: "enum"`, `"pattern"`, or `"bsonType"`	`errInfo.details`	TYPE_MISMATCH	Route to quarantine; run a type-cast remediation job.
`operatorName: "minimum"`, `"maxLength"`, or `"maxItems"`	`errInfo.details`	CONSTRAINT_BREACH	Feed the data-quality SLA; inspect input bounds.
`category: "UNKNOWN"` in telemetry	your sink	taxonomy gap	Dump the raw rule with the snippet below and extend `OPERATOR_CATEGORIES`.

Copy-paste diagnostics:

# mongod log: tally validation rejections by operator (JSON log format, 4.4+). The pipeline
# applies no time filter, so pre-filter the log by timestamp to limit it to the last hour.
grep '"code":121' /var/log/mongodb/mongod.log | jq -r '.attr.errInfo.details.schemaRulesNotSatisfied[].operatorName' | sort | uniq -c

# Application telemetry: find the operators that fell through to UNKNOWN.
jq 'select(.category=="UNKNOWN") | .operator' categorized-errors.ndjson | sort | uniq -c

// mongosh: count stored documents that would fail the active validator, as a fast
// pre-enforcement census. $jsonSchema is a valid query operator.
const schema = db.getCollectionInfos({ name: "user_events" })[0].options.validator.$jsonSchema;
db.user_events.countDocuments({ $nor: [{ $jsonSchema: schema }] });

Edge Cases, Gotchas & Known Limitations

Version-dependent errInfo shape. On MongoDB 4.2–4.4 the schemaRulesNotSatisfied tree is shallower and may omit operatorName; the parser returns UNKNOWN rather than crashing, which is the correct degradation. Only rely on the full tree on 5.0+.
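$expr is opaque by design. A failed $expr does not tell you which sub-expression failed. On 5.0+ the server names the operator with its dollar prefix (operatorName: "$expr") and reports it on errInfo.details itself rather than inside schemaRulesNotSatisfied, so confirm the exact shape in mongosh before relying on the DOMAIN_LOGIC mapping. Domain-logic failures therefore need application-side context (the document and the rule intent), not just the payload; this is where custom error payloads for schema violations earn their keep.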
Nested arrays flatten unpredictably. A violation inside an array element (via items) can report the array property name rather than the index. If you need positional precision, walk propertiesNotSatisfied[].details recursively; the reference implementation flattens one level, which is sufficient for routing but not for pinpointing element [i].
warn mode emits nothing to the driver. Under validationAction: "warn", violations are logged server-side and no WriteError is raised, so a categorizer wired only to driver exceptions will report zero failures. Read the mongod log or promote to error mode to categorize.
additionalProperties: false can flood STRUCTURAL. A single unexpected key on every document produces one structural violation per write. During schema tightening this can dominate the category distribution and mask real bugs; stage the change and watch the dashboard before enforcing.
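
The recursive walk mentioned for nested structures might look like the sketch below. It is an illustration, not a replacement for `_iter_rules`: the name `iter_leaf_rules` is hypothetical, and the handling of array wrappers assumes that an `items` failure carries an `itemIndex` key next to a nested `details` list, which should be verified against the output of your server version.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_leaf_rules(
    rule: Dict[str, Any], prefix: str = ""
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (path, leaf_rule) pairs from one schemaRulesNotSatisfied element."""
    props = rule.get("propertiesNotSatisfied")
    if props:
        # "properties" wrapper: descend into each failing property.
        for prop in props:
            name = str(prop.get("propertyName", "unknown"))
            path = f"{prefix}.{name}" if prefix else name
            inner_rules = prop.get("details") or []
            if not inner_rules:
                yield path, prop
            for inner in inner_rules:
                yield from iter_leaf_rules(inner, path)
        return

    nested = rule.get("details")
    if isinstance(nested, list) and nested:
        # ASSUMPTION: wrappers such as "items" expose itemIndex plus nested details.
        index = rule.get("itemIndex")
        path = f"{prefix}[{index}]" if index is not None else prefix
        for inner in nested:
            yield from iter_leaf_rules(inner, path)
        return

    # Leaf rule: nothing left to unwrap.
    yield prefix or "root", rule
```

For a rule under `address` whose own `properties` wrapper reports `zip`, this yields the path `address.zip`, which is enough to pinpoint the field while keeping the categorizer a pure function.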

Verification & Rollback Procedures

Verify the categorizer end to end before trusting its telemetry:

# Unit-verify the parser without a database: each sample rule must map to a known category.
sample = {"details": {"schemaRulesNotSatisfied": [
    {"operatorName": "required", "missingProperties": ["user_id"]},
    {"operatorName": "properties", "propertiesNotSatisfied": [
        {"propertyName": "event_type", "details": [{"operatorName": "enum"}]}
    ]},
]}}
out = categorize_validation_error(sample)
assert {r["category"] for r in out} == {"STRUCTURAL", "TYPE_MISMATCH"}, out
assert all(r["category"] != "UNKNOWN" for r in out), "taxonomy gap detected"
print("categorizer OK:", out)

Then confirm live coverage: after a deploy, query your sink for category == "UNKNOWN" and expect a rate near zero. A rising UNKNOWN rate means the validator introduced an operator your table does not know — extend OPERATOR_CATEGORIES and redeploy.
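
The live coverage check can be reduced to a small calculation over the emitted records. The sketch below assumes records shaped like the categorizer's output (`category`, `path`, `operator`); the function name `unknown_rate` and any alert threshold you compare it against are illustrative choices, not values from this page.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def unknown_rate(records: Iterable[Dict[str, str]]) -> Tuple[float, Counter]:
    """Return the share of UNKNOWN records and a tally of the operators behind them."""
    total = 0
    unknown_ops: Counter = Counter()
    for record in records:
        total += 1
        if record.get("category") == "UNKNOWN":
            unknown_ops[record.get("operator", "unknown")] += 1
    # An empty window counts as a zero rate rather than a division error.
    rate = sum(unknown_ops.values()) / total if total else 0.0
    return rate, unknown_ops
```

The tally doubles as the to-do list for extending `OPERATOR_CATEGORIES`: each key is an operator the validator emits that the table does not yet know.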

Rollback is purely code-side because the categorizer never mutates the database. To disable it safely: (1) revert the worker to the previous parser version, (2) leave the collection validator untouched — categorization and enforcement are independent, so rolling back the parser does not change write acceptance, and (3) if a bad mapping mislabeled records, re-run the pure function over the retained raw errInfo payloads to re-categorize historically. Retaining the raw payload alongside the derived category is what makes this a zero-data-loss rollback; never discard errInfo after categorizing.
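
Re-categorizing history from retained payloads might be sketched as below. The stored-record layout is an assumption: `raw_err_info` and `rules` are hypothetical field names, and the categorizer is passed in as a parameter so the corrected version of the pure function can be swapped in without touching this helper.

```python
from typing import Any, Callable, Dict, Iterable, List

Categorizer = Callable[[Dict[str, Any]], List[Dict[str, str]]]


def recategorize(
    stored: Iterable[Dict[str, Any]], categorizer: Categorizer
) -> List[Dict[str, Any]]:
    """Re-derive categories from raw payloads and flag records whose labels changed."""
    repaired: List[Dict[str, Any]] = []
    for record in stored:
        fresh = categorizer(record["raw_err_info"])
        repaired.append({
            **record,                 # copy, so the stored record is not mutated
            "rules": fresh,
            "relabelled": fresh != record.get("rules"),
        })
    return repaired
```

The `relabelled` flag gives a quick measure of how many historical records the bad mapping affected, which is useful when deciding whether dashboards need to be backfilled.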

Frequently Asked Questions

Why branch on operatorName instead of the errmsg string?

The errmsg text is human-readable and changes across server and driver versions, so string matching against it is brittle and silently breaks on upgrade. operatorName inside errInfo.details.schemaRulesNotSatisfied is structured and stable from MongoDB 5.0, making it the reliable discriminator for a deterministic taxonomy.

Does categorization add latency to the write path?

No. The categorizer only runs when a write has already been rejected, and it is a pure, in-memory function over the exception the driver already built. It performs no database round-trips, so it adds nothing to the success path and negligible cost to the failure path.

What does an UNKNOWN category actually mean?

It means a rule fired whose operatorName is not in your mapping table — usually because a validator was tightened with a new operator before the taxonomy was updated. Treat a rising UNKNOWN rate as a drift alarm: dump the raw rule, add the operator to OPERATOR_CATEGORIES, and redeploy.

How do I categorize failures when the collection runs in warn mode?

You cannot do it from driver exceptions — validationAction: "warn" accepts the write and only logs the violation server-side. Either parse the mongod log for 121 entries, or promote the collection to validationAction: "error" so rejections surface as WriteError objects the categorizer can read.
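
For the log-parsing route, a minimal reader could look like the sketch below. It assumes the structured (JSON-per-line) mongod log and that each relevant entry carries the payload under `attr.errInfo`, the same path the jq one-liner on this page uses; verify that path against your own log output. The function name `iter_logged_err_info` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_logged_err_info(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield errInfo payloads found in structured mongod log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines instead of aborting the scan
        attr = entry.get("attr") if isinstance(entry, dict) else None
        err_info = attr.get("errInfo") if isinstance(attr, dict) else None
        # Only payloads with a details tree are usable by the categorizer.
        if isinstance(err_info, dict) and "details" in err_info:
            yield err_info
```

Each yielded payload has the same shape the driver would have surfaced in error mode, so it can be passed straight to `categorize_validation_error`.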

Should writeConcernErrors be categorized as schema violations?

Never. writeConcernErrors are cluster-level acknowledgement failures (for example during a replica-set election), not document rejections. They carry no errInfo schema tree. Keep them in a separate uncategorized bucket, or you will see phantom validation spikes that correlate with failovers rather than data problems.

Automated Schema Enforcement & Monitoring — the parent architecture this categorization layer plugs into, spanning validators, middleware, and pre-flight checks.
Custom error payloads for schema violations — turn the categorized record into a stable, machine-readable payload for downstream consumers.
Implementing collection-level validators — the synchronous enforcement layer whose rejections this taxonomy consumes.
Async validation monitoring dashboards — where category distribution is plotted over time and correlated with deploys.
Building fallback validation chains — the routing target that decides whether a categorized document is coerced or dead-lettered.
